Implementing Observability

Instrument the application with Logging, Metrics, and Tracing (OpenTelemetry) to understand system behavior and debug production issues.

by diegosouzapw

Best use case

Implementing Observability is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Implementing Observability should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/implementing-observability/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/implementing-observability/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/implementing-observability/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Implementing Observability Compares

Feature / Agent	Implementing Observability	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Instrument the application with Logging, Metrics, and Tracing (OpenTelemetry) to understand system behavior and debug production issues.

Where can I find the source code?

The source is in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/development/implementing-observability/SKILL.md.

SKILL.md Source

# Implementing Observability

## Goal
Make the system's internal state inferable from its external outputs. Answer "Why is it slow?" and "Why did it fail?" without SSH-ing into a server.

## When to Use
- Before launching to production.
- When debugging a performance bottleneck.
- When integrating a new microservice or external API.

## Instructions

### 1. Structured Logging
Text logs are hard to query. Use JSON.
- **Context**: Every log entry must carry `trace_id` and `request_id`, plus `user_id` when the request is authenticated.
- **Levels**: `INFO` for normal operations, `WARN` for handled issues, `ERROR` for failures that need attention (such as unhandled exceptions).

```json
{"level": "info", "msg": "User logged in", "user_id": 123, "request_id": "req-456", "trace_id": "abc-123"}
```
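
One way to produce entries of this shape is a custom formatter on Python's standard `logging` module. The sketch below is illustrative only: `request_context`, `JsonFormatter`, and `build_logger` are hypothetical names, and it assumes middleware has already stored the IDs for the current request.

```python
import json
import logging
from contextvars import ContextVar

# Hypothetical per-request context; real middleware would set this per request.
request_context = ContextVar("request_context", default={})


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the request context merged in."""

    def format(self, record):
        payload = {
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "logger": record.name,
        }
        # trace_id, request_id, and user_id come from the current request context.
        payload.update(request_context.get())
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name="app"):
    """Return a logger whose only handler writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

After calling `request_context.set({"trace_id": "abc-123", "request_id": "req-456", "user_id": 123})`, a call such as `build_logger().info("User logged in")` emits one JSON line that carries those fields, which is also how the trace ID ends up in the logs.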

### 2. Distributed Tracing (OpenTelemetry)
Trace a request across boundaries (Frontend -> API -> DB).
- Instrument HTTP clients and server frameworks.
- Visualize the "waterfall" to find the slow span.
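
To make the "waterfall" idea concrete, here is a standard-library sketch of nested spans. It is not the OpenTelemetry API; `Span`, `span`, `finished_spans`, `self_time`, and `slowest_span` are hypothetical names used only to show how parent and child timings relate. A real service would use the OpenTelemetry SDK and its instrumentation libraries instead.

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float = 0.0

    @property
    def duration(self):
        return self.end - self.start


_current_span = ContextVar("current_span", default=None)
finished_spans = []  # stand-in for an exporter to a tracing backend


@contextmanager
def span(name, clock=time.perf_counter):
    """Open a span; nested calls inherit the trace ID and record their parent."""
    parent = _current_span.get()
    current = Span(
        name=name,
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        start=clock(),
    )
    token = _current_span.set(current)
    try:
        yield current
    finally:
        current.end = clock()
        _current_span.reset(token)
        finished_spans.append(current)


def self_time(target, spans):
    """Duration of `target` minus the time spent in its direct children."""
    children = sum(s.duration for s in spans if s.parent_id == target.span_id)
    return target.duration - children


def slowest_span(spans):
    """Return the span with the largest self time."""
    return max(spans, key=lambda s: self_time(s, spans))
```

Wrapping a request handler in `with span("GET /orders"):` and the database call inside it in `with span("db.query"):` yields two spans that share a trace ID. Ranking by self time rather than total duration keeps the root span from always looking like the slowest one.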

### 3. Golden Signals (Metrics)
Track the four key metrics for every service:
- **Latency**: Time to serve a request.
- **Traffic**: Request rate (RPS).
- **Errors**: Rate of failed requests (for example, 5xx responses).
- **Saturation**: CPU/Memory/Disk usage.
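
As a rough illustration, the four signals can be derived from a window of request records. This is a sketch under stated assumptions: `RequestRecord`, `percentile`, and `golden_signals` are hypothetical names, the saturation value is assumed to be supplied by the caller (for example a CPU utilization fraction), and in practice a metrics library and Prometheus would do this aggregation.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(records, window_seconds, saturation):
    """Summarize one observation window into the four golden signals."""
    durations = [r.duration_ms for r in records]
    total = len(records)
    errors = sum(1 for r in records if r.status >= 500)
    return {
        "latency_p50_ms": percentile(durations, 50) if durations else None,
        "latency_p95_ms": percentile(durations, 95) if durations else None,
        "traffic_rps": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
        "saturation": saturation,
    }
```

Reporting latency as percentiles rather than a mean is a deliberate choice in this sketch, since a handful of slow requests can hide behind a healthy average.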

### 4. Alerting
Alert on symptoms (High Error Rate), not causes (High CPU).
- **Page**: If `Error Rate > 1%` for 5 minutes.
- **Ticket**: If `Disk Usage > 80%`.
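
The "for 5 minutes" condition matters: the alert should fire only when the breach is sustained. The sketch below shows that logic in plain Python, assuming error-rate samples arrive at regular intervals. `Sample` and `should_page` are hypothetical names, and an alerting system such as Prometheus would normally evaluate this rule for you.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float   # seconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def should_page(samples, threshold=0.01, for_seconds=300):
    """True if error_rate has stayed above `threshold` for at least `for_seconds`."""
    if not samples:
        return False
    ordered = sorted(samples, key=lambda s: s.timestamp)
    breach_start = None
    for sample in ordered:
        if sample.error_rate > threshold:
            if breach_start is None:
                breach_start = sample.timestamp
        else:
            # Any recovery resets the timer, so short spikes never page.
            breach_start = None
    if breach_start is None:
        return False
    return ordered[-1].timestamp - breach_start >= for_seconds
```

Resetting the timer on every recovery is what keeps a brief spike from paging anyone, which also reduces flapping alerts.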

## Constraints

### ✅ Do
- **DO**: Use OpenTelemetry standards for portability.
- **DO**: Correlate logs and traces (inject trace ID into logs).
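- **DO**: Sample high-volume traces (for example, keep 10%) to save costs, but keep 100% of error traces. Keeping every error requires tail-based sampling, because a head-based sampler decides before the outcome of the request is known.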
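
The sampling rule above can be sketched as a deterministic decision on the trace ID. This is a minimal illustration, assuming the decision is made after the trace has finished so that its error status is known. `keep_trace` is a hypothetical helper, not an OpenTelemetry function.

```python
import hashlib


def keep_trace(trace_id, is_error, ratio=0.10):
    """Keep every error trace and a fixed fraction of the rest.

    Hashing the trace ID makes the decision repeatable, so every service
    that sees the same trace reaches the same verdict.
    """
    if is_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(ratio * 2**64)
```

Because the verdict depends only on the trace ID and the error flag, all spans of a kept trace survive together instead of leaving partial traces.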

### ❌ Don't
- **DON'T**: Log PII (Emails, Passwords, Credit Cards).
- **DON'T**: Create alerts that auto-resolve in seconds (flapping).
- **DON'T**: Rely solely on "system up" checks; check "business logic working".
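
For the PII rule, one option is to scrub log events before they are serialized. The sketch below is a starting point rather than a complete filter: `SENSITIVE_KEYS`, `EMAIL_RE`, and `redact` are hypothetical names, the key list is an assumption, and the email pattern is deliberately simple.

```python
import re

# Assumed field names; extend to match the fields your application actually logs.
SENSITIVE_KEYS = {"password", "email", "credit_card", "card_number"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(event):
    """Return a copy of `event` with sensitive fields and inline emails masked."""
    clean = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        elif isinstance(value, str):
            clean[key] = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
        else:
            clean[key] = value
    return clean
```

Redacting by key catches structured fields, while the regex pass catches addresses that leak into free-text messages. Neither replaces reviewing what the application logs in the first place.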

## Output Format
- `docker-compose.yml` with Prometheus/Grafana/Jaeger (for dev).
- Code instrumentation (e.g., `tracing.py`).

## Dependencies
- `backend/managing-flask-middleware/SKILL.md` (where instrumentation lives)
- `shared/debugging/SKILL.md`
