Claude · Developer Tools

distributed-debugging-debug-trace

You are a debugging expert specializing in setting up comprehensive debugging environments, distributed tracing, and diagnostic tools. Configure debugging workflows, implement tracing solutions, and establish troubleshooting practices for development and production environments.

31,392 stars

by sickn33

Complexity: easy


About this skill

This skill transforms your AI agent into a seasoned debugging expert, specializing in complex software environments. It enables the agent to provide sophisticated guidance on setting up robust debugging infrastructures, architecting distributed tracing solutions across microservices and cloud-native applications, and establishing effective diagnostic and troubleshooting practices. The AI can configure detailed debugging workflows, advise on implementing various tracing technologies (like OpenTelemetry), recommend diagnostic tools, and define best practices for both development lifecycle and production incident response, enhancing observability and reliability.

Best use case

Troubleshooting complex distributed systems, optimizing development debugging processes, implementing observability best practices, responding to production incidents, designing system diagnostics, and creating documentation or guidelines for debugging and tracing.

Comprehensive plans for debugging environments, detailed distributed tracing strategies and configurations, expert recommendations for diagnostic tools, best practice guidelines for troubleshooting, or step-by-step guidance for resolving complex software issues.

Practical example

Example input

Help me design a robust distributed tracing strategy for our new cloud-native application, which uses Kubernetes, Kafka, and a mix of Python and Node.js microservices. We need to effectively track requests, identify latency, and correlate logs for rapid issue resolution.

Example output

Okay, as your debugging expert, let's outline a robust distributed tracing strategy for your Kubernetes, Kafka, Python, and Node.js cloud-native application:

**1. Adopt OpenTelemetry (OTel) as the Standard:**
* **Why:** OpenTelemetry is a vendor-neutral, open-source standard for instrumenting, generating, and exporting telemetry data (traces, metrics, and logs). It provides consistent APIs and SDKs across languages, ensuring interoperability.
* **Action:** Integrate OTel SDKs into all your Python and Node.js microservices.

**2. Service Instrumentation:**
* **Auto-Instrumentation:** Utilize OTel's auto-instrumentation capabilities for common frameworks (e.g., Express.js, Flask, FastAPI, gRPC, HTTP clients) to automatically capture basic request spans.
* **Manual Instrumentation:** Implement manual instrumentation for critical business logic, database queries, external API calls, and especially for Kafka producer/consumer interactions. Ensure `trace_id` and `span_id` are propagated correctly across service boundaries (e.g., as HTTP headers, Kafka message headers).
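
As a rough illustration of what manual instrumentation records, here is a minimal stdlib-only sketch of a span context manager. It is not the OpenTelemetry SDK API; the names `span`, `_current`, and `finished_spans` are hypothetical, and a real service would use the OTel tracer instead.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager

# (trace_id, span_id) of the span currently in progress, or None at the root.
_current = contextvars.ContextVar("current_span", default=None)
finished_spans = []  # hypothetical stand-in for an exporter


@contextmanager
def span(name):
    """Record one unit of work, nested under whatever span is already active."""
    parent = _current.get()
    trace_id = parent[0] if parent else secrets.token_hex(16)  # 32 hex chars
    span_id = secrets.token_hex(8)                             # 16 hex chars
    token = _current.set((trace_id, span_id))
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        _current.reset(token)  # restore the parent as the active span
        finished_spans.append({
            "name": name,
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent[1] if parent else None,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


# Wrap critical business logic and the database call it makes.
with span("handle_order"):
    with span("db.query"):
        time.sleep(0.01)
```

The child span shares the parent's `trace_id` and stores the parent's `span_id`, which is the relationship a tracing backend uses to rebuild the call tree.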

**3. Context Propagation via Kafka:**
* **Producers:** Before sending a message to Kafka, extract the current OTel trace context and inject it into the Kafka message headers.
* **Consumers:** Upon receiving a message, extract the trace context from the Kafka message headers and use it to continue the trace. This ensures end-to-end visibility through your asynchronous message queues.
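
The producer and consumer steps can be sketched with plain Python, assuming the W3C Trace Context `traceparent` format (`00-<trace-id>-<span-id>-<flags>`) and Kafka headers represented as a list of `(str, bytes)` tuples. The `inject` and `extract` helpers are hypothetical; in practice the OTel propagator API or a Kafka instrumentation library would do this for you.

```python
import re
import secrets

# version 00, 32-hex trace-id, 16-hex parent span-id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Producer side: return a copy of headers carrying the trace context."""
    flags = "01" if sampled else "00"
    value = f"00-{trace_id}-{span_id}-{flags}".encode("ascii")
    kept = [h for h in headers if h[0] != "traceparent"]
    return kept + [("traceparent", value)]


def extract(headers):
    """Consumer side: return (trace_id, parent_span_id, sampled), or None
    when the header is absent or malformed."""
    for key, raw in headers:
        if key == "traceparent":
            match = TRACEPARENT_RE.match(raw.decode("ascii", errors="replace"))
            if match:
                sampled = (int(match.group(3), 16) & 1) == 1
                return match.group(1), match.group(2), sampled
    return None


trace_id = secrets.token_hex(16)
producer_span_id = secrets.token_hex(8)
headers = inject([("content-type", b"application/json")], trace_id, producer_span_id)
ctx = extract(headers)  # the consumer starts its span as a child of this context
```

Returning `None` for a missing or malformed header lets the consumer start a fresh trace instead of failing on messages from uninstrumented producers. This sketch does not reject all-zero IDs, which the W3C specification treats as invalid.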

**4. OpenTelemetry Collectors:**
* **Deployment:** Deploy OTel Collectors within your Kubernetes cluster. Run them as a DaemonSet (one agent per node) or as a sidecar (one per pod) close to high-traffic services, or as a Deployment that acts as a central gateway collecting traces from multiple services.
* **Function:** Collectors can receive traces from your services, batch, process (e.g., attribute modification, sampling), and export them to your chosen backend.
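
To make the receive, process, batch, and export stages concrete, here is a toy pipeline in plain Python. It only mimics the shape of a Collector pipeline; the function names, the span dictionary layout, and the blocked attribute key are illustrative assumptions, and a real Collector is configured declaratively rather than coded like this.

```python
def scrub_attributes(span, blocked=("http.request.header.authorization",)):
    """Processor: drop attributes that should never leave the cluster."""
    attrs = {k: v for k, v in span["attributes"].items() if k not in blocked}
    return {**span, "attributes": attrs}


def add_cluster(span, cluster="dev"):
    """Processor: stamp every span with the cluster it came from."""
    return {**span, "attributes": {**span["attributes"], "k8s.cluster.name": cluster}}


def run_pipeline(spans, processors, export, batch_size=2):
    """Apply each processor in order, then hand spans to export in batches."""
    batch = []
    for span in spans:
        for process in processors:
            span = process(span)
        batch.append(span)
        if len(batch) >= batch_size:
            export(batch)
            batch = []
    if batch:  # flush the final partial batch
        export(batch)


incoming = [
    {"name": "GET /orders", "attributes": {"http.request.header.authorization": "Bearer x"}},
    {"name": "db.query", "attributes": {}},
    {"name": "kafka.send", "attributes": {}},
]
exported = []
run_pipeline(incoming, [scrub_attributes, add_cluster], exported.append)
```

With three incoming spans and `batch_size=2`, the exporter is called twice: once with a full batch and once with the remainder.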

**5. Choose a Tracing Backend:**
* **Managed Services:** Integrate with a cloud-native APM solution (e.g., Datadog, New Relic, Honeycomb, AWS X-Ray, Google Cloud Trace) that natively supports OpenTelemetry. These offer robust UIs, alerting, and analysis.
* **Self-Hosted:** Consider deploying Jaeger or Grafana Tempo within your cluster for trace storage and visualization. These require more operational overhead but offer full control.

**6. Log and Metric Correlation:**
* **Logging:** Ensure your logging system (e.g., ELK stack, Grafana Loki) captures the `trace_id` and `span_id` from the OpenTelemetry context with every log line. This allows you to jump directly from a trace span to relevant logs for deep debugging.
* **Metrics:** Leverage OpenTelemetry to generate service-level metrics (e.g., request count, latency, error rates) alongside traces. Correlate these metrics with traces to identify performance anomalies and drill down into specific problematic requests.
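
One way to get `trace_id` and `span_id` onto every log line is a `logging.Filter` that copies them from the active context. The sketch below uses only the standard library; `trace_ctx` is a hypothetical context variable standing in for the OpenTelemetry context, and the IDs are example values.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active (trace_id, span_id); "-" when no trace is active.
trace_ctx = contextvars.ContextVar("trace_ctx", default=("-", "-"))


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span IDs to every record that passes through."""

    def filter(self, record):
        record.trace_id, record.span_id = trace_ctx.get()
        return True


stream = io.StringIO()  # captured here so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

trace_ctx.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
logger.info("payment authorized")
print(stream.getvalue(), end="")
# INFO trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7 payment authorized
```

Because the IDs appear as `key=value` fields, a log backend can index them and link a span directly to its log lines.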

**7. Debugging Workflow & Best Practices:**
* **Development:** Encourage developers to review local or dev environment traces to understand execution flows and identify performance issues early.
* **Production:** Configure dashboards (e.g., Grafana) to visualize service maps, trace dependencies, and critical path latencies. Set up alerts for unexpected trace patterns (e.g., increased error spans, long-running transactions).
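* **Sampling:** Implement sampling to manage trace volume while retaining critical traces. Head-based sampling decides when a trace starts (typically in the SDK), while tail-based sampling decides after the trace has completed (typically in the OTel Collector), which lets you keep traces that contain errors or unusually high latency.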
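
The difference between the two sampling styles can be sketched in a few lines. This is an illustration under stated assumptions, not the Collector's or SDK's implementation: `head_sample`, `tail_sample`, the span dictionary layout, and the 500 ms threshold are all hypothetical.

```python
def head_sample(trace_id, ratio):
    """Head-based: decide from the trace ID alone, so every service that
    sees the same trace_id reaches the same decision without coordination."""
    low_64_bits = int(trace_id[-16:], 16)
    return low_64_bits < int(ratio * (1 << 64))


def tail_sample(spans, slow_ms=500.0, baseline_ratio=0.0):
    """Tail-based: decide once the whole trace is available. Always keep
    traces with an error or a slow span; sample the rest at baseline_ratio."""
    if any(s["error"] for s in spans):
        return True
    if max(s["duration_ms"] for s in spans) >= slow_ms:
        return True
    return head_sample(spans[0]["trace_id"], baseline_ratio)


healthy = [{"trace_id": "0" * 31 + "1", "duration_ms": 12.0, "error": False}]
failed = [{"trace_id": "0" * 31 + "2", "duration_ms": 9.0, "error": True}]
slow = [{"trace_id": "0" * 31 + "3", "duration_ms": 900.0, "error": False}]

decisions = [tail_sample(t) for t in (healthy, failed, slow)]
```

Tail-based sampling needs all spans of a trace to reach the same place before the decision, which is why it is usually done in a central collector rather than in each service.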

**Key Implementation Steps:**
1. **Dependency Addition:** Add OTel SDKs and instrumentation libraries to all Python and Node.js services.
2. **Configuration:** Configure OTel exporters to send data to your OTel Collectors.
3. **Collector Deployment:** Deploy and configure OTel Collectors in Kubernetes.
4. **Backend Integration:** Set up your chosen tracing backend and ensure it's receiving data.
5. **Validation:** Test end-to-end tracing with representative request flows.
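
For the validation step, a small check over exported spans can catch the most common failure: a trace that breaks where context propagation was lost. The following is a minimal sketch; `validate_trace` and the span fields are hypothetical, and in practice the spans would come from your backend's query API or a test exporter.

```python
def validate_trace(spans):
    """Return a list of problems; an empty list means the spans form one
    connected trace with a single root."""
    problems = []
    trace_ids = {s["trace_id"] for s in spans}
    if len(trace_ids) != 1:
        problems.append(f"expected 1 trace_id, found {len(trace_ids)}")
    roots = [s for s in spans if s["parent_id"] is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root span, found {len(roots)}")
    span_ids = {s["span_id"] for s in spans}
    for s in spans:
        if s["parent_id"] is not None and s["parent_id"] not in span_ids:
            problems.append(f"{s['name']}: parent {s['parent_id']} missing")
    return problems


connected = [
    {"name": "gateway", "trace_id": "t1", "span_id": "a", "parent_id": None},
    {"name": "kafka.send", "trace_id": "t1", "span_id": "b", "parent_id": "a"},
    {"name": "kafka.consume", "trace_id": "t1", "span_id": "c", "parent_id": "b"},
]
# The consumer points at a parent that was never exported.
broken = [
    {"name": "gateway", "trace_id": "t1", "span_id": "a", "parent_id": None},
    {"name": "kafka.consume", "trace_id": "t1", "span_id": "c", "parent_id": "dead"},
]

print(validate_trace(connected))  # []
print(validate_trace(broken))     # ['kafka.consume: parent dead missing']
```

Running a check like this against a few representative request flows gives a quick pass or fail signal before relying on the traces during an incident.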

When to use this skill

When you need to design or refine debugging workflows for development teams.
When implementing or improving distributed tracing in microservices, serverless, or cloud-native architectures.
When establishing or enhancing incident response and troubleshooting practices for production systems.
When diagnosing performance bottlenecks, elusive bugs, or intermittent failures in complex applications.

When not to use this skill

For direct, real-time interaction with a live debugger or executing code on a remote system (unless integrated with a specific execution tool skill).
When requiring physical hardware debugging or very low-level machine code analysis without specific tooling integration.
When the primary need is for a specific, pre-defined tool invocation rather than expert advice or configuration generation.
For tasks that require direct human intuition or sensory perception that AI cannot replicate.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/distributed-debugging-debug-trace/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/distributed-debugging-debug-trace/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/distributed-debugging-debug-trace/SKILL.md inside your project.
3. Restart your AI agent. It will auto-discover the skill.

How distributed-debugging-debug-trace Compares

Feature / Agent	distributed-debugging-debug-trace	Standard Approach
Platform Support	Claude	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	easy	N/A

Frequently Asked Questions

What does this skill do?

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# Debug and Trace Configuration

You are a debugging expert specializing in setting up comprehensive debugging environments, distributed tracing, and diagnostic tools. Configure debugging workflows, implement tracing solutions, and establish troubleshooting practices for development and production environments.

## Use this skill when

- Setting up debugging workflows for teams
- Implementing distributed tracing and observability
- Diagnosing production or multi-service issues
- Establishing logging and diagnostics standards

## Do not use this skill when

- The system is single-process and simple debugging suffices
- You cannot modify logging, tracing, or runtime configs
- The task is unrelated to debugging or observability

## Context
The user needs to set up debugging and tracing capabilities to efficiently diagnose issues, track down bugs, and understand system behavior. Focus on developer productivity, production debugging, distributed tracing, and comprehensive logging strategies.

## Requirements
$ARGUMENTS

## Instructions

- Identify services, trace boundaries, and key spans.
- Configure local debugging and production-safe tracing.
- Standardize log/trace fields and correlation IDs.
- Validate end-to-end trace coverage and sampling.
- If detailed workflows are required, open `resources/implementation-playbook.md`.

## Safety

- Avoid enabling verbose tracing in production without safeguards.
- Redact secrets and PII from logs and traces.

## Resources

- `resources/implementation-playbook.md` for detailed tooling and configuration patterns.

Related Skills

environment-setup-guide

31392

from sickn33/antigravity-awesome-skills

Guide developers through setting up development environments with proper tools, dependencies, and configurations

Developer Tools · Claude

ios-debugger-agent

31392

from sickn33/antigravity-awesome-skills

Debug the current iOS project on a booted simulator with XcodeBuildMCP.

Mobile Development · Claude

error-diagnostics-smart-debug

31392

from sickn33/antigravity-awesome-skills

Use when working with error diagnostics smart debug

Debugging & Troubleshooting · Claude

error-diagnostics-error-trace

31392

from sickn33/antigravity-awesome-skills

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging,

DevOps & Infrastructure · Claude

error-debugging-multi-agent-review

31392

from sickn33/antigravity-awesome-skills

Use when working with error debugging multi agent review

Code Review · Claude

error-debugging-error-trace

31392

from sickn33/antigravity-awesome-skills

DevOps & Infrastructure · Claude

error-debugging-error-analysis

31392

from sickn33/antigravity-awesome-skills

You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.

DevOps & Infrastructure · Claude

distributed-tracing

31392

from sickn33/antigravity-awesome-skills

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

Observability & Monitoring · Claude

debugging-toolkit-smart-debug

31392

from sickn33/antigravity-awesome-skills

Use when working with debugging toolkit smart debug

Development Tools · Claude

debugging-strategies

31392

from sickn33/antigravity-awesome-skills

Transform debugging from frustrating guesswork into systematic problem-solving with proven strategies, powerful tools, and methodical approaches.

Developer Tools · Claude

debugger

31392

from sickn33/antigravity-awesome-skills

Debugging specialist for errors, test failures, and unexpected behavior. Use proactively when encountering any issues.

Development Tools · Claude

debug-buttercup

31392

from sickn33/antigravity-awesome-skills

All pods run in namespace crs. Use when pods in the crs namespace are in CrashLoopBackOff, OOMKilled, or restarting, multiple services restart simultaneously (cascade failure), or redis is unresponsive or showing AOF warnings.

DevOps & Operations · Claude