error-diagnostics-error-trace

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.

31,392 stars
Complexity: easy

About this skill

This skill transforms the AI agent into a seasoned error tracking and observability expert. It empowers the agent to offer comprehensive guidance, best practices, and actionable advice for implementing robust error monitoring solutions. The agent can provide insights on setting up error tracking systems, configuring effective alerts, implementing structured logging methodologies, and ensuring development teams can quickly identify and resolve production issues. It serves as a go-to resource for expert knowledge in maintaining system health and reliability.

Best use case

Users need expert advice or guidance on establishing or improving error monitoring systems, configuring alerts, implementing structured logging, or understanding observability best practices to resolve production issues.

Users will receive expert-level advice, clear best practices, actionable checklists, and comprehensive guidance related to error tracking, monitoring, structured logging, and observability, enabling them to make informed decisions and implement effective solutions.

Practical example

Example input

As the error tracking expert, what are the essential steps to set up a comprehensive error monitoring system for a new microservices application using modern best practices?

Example output

To set up a comprehensive error monitoring system for a new microservices application, follow these essential steps:

1.  **Choose a Monitoring Stack:** Select tools like Prometheus/Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Datadog, New Relic, Sentry, or Honeycomb. Consider your budget, existing infrastructure, and team's familiarity.
2.  **Implement Structured Logging:** Ensure all services emit logs in a structured format (e.g., JSON). Include critical metadata like `timestamp`, `service_name`, `environment`, `log_level`, `request_id`, `user_id`, `error_code`, `stack_trace` (if an error), etc. This makes logs searchable and analyzable.
3.  **Centralized Log Aggregation:** Use a log aggregator (e.g., Logstash, Fluentd, Vector) to collect logs from all microservices and forward them to your chosen monitoring stack's central storage (e.g., Elasticsearch, S3, cloud logging services).
4.  **Error Tracking Integration:** Integrate a dedicated error tracking service (e.g., Sentry, Bugsnag, Rollbar) into each microservice. This captures unhandled exceptions, provides stack traces, and aggregates similar errors.
5.  **Metrics Collection:** Collect key performance indicators (KPIs) and operational metrics from each service. This includes request rates, error rates, latency, CPU/memory usage, database query times. Tools like Prometheus or a cloud-native monitoring service are ideal.
6.  **Distributed Tracing:** Implement distributed tracing (e.g., OpenTelemetry, Jaeger, Zipkin) to visualize the flow of requests across multiple microservices. This is crucial for diagnosing latency issues and understanding service dependencies.
7.  **Alerting & Notifications:** Configure intelligent alerts based on critical metrics and error thresholds. Examples: 'high error rate on Service X', 'latency spike on API Y', 'database connection errors'. Route notifications to appropriate teams via Slack, PagerDuty, email, etc.
8.  **Dashboards & Visualization:** Create intuitive dashboards (e.g., Grafana, Kibana, Datadog) that provide a holistic view of your system's health, key metrics, error trends, and service dependencies.
9.  **Runbooks & Playbooks:** Document clear runbooks for common alerts and incidents, outlining steps for diagnosis and resolution.
10. **Regular Review & Iteration:** Continuously review your monitoring setup. Are alerts effective? Are dashboards providing useful insights? Adjust based on incidents and evolving application needs.

By following these steps, you'll establish a robust error monitoring system capable of providing deep insights into your microservices application's health and performance.
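
As an illustration of step 2, here is a minimal sketch of structured JSON logging. It assumes Python and only the standard `logging` module. The names `JsonFormatter` and `build_logger`, and the sample values `checkout`, `staging`, and `req-123`, are hypothetical and chosen for this example; the field names follow the metadata listed in step 2.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def __init__(self, service_name: str, environment: str) -> None:
        super().__init__()
        self.service_name = service_name
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "service_name": self.service_name,
            "environment": self.environment,
            "log_level": record.levelname,
            "message": record.getMessage(),
            # Values passed through `extra=` become attributes on the record.
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
            "error_code": getattr(record, "error_code", None),
        }
        if record.exc_info:
            # Only present when the record carries exception information.
            payload["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(service_name: str, environment: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service_name, environment))
        logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout", "staging")
    try:
        1 / 0
    except ZeroDivisionError:
        log.exception(
            "payment failed",
            extra={"request_id": "req-123", "error_code": "E_DIV"},
        )
```

Emitting one JSON object per line keeps the output easy for a log aggregator (step 3) to parse without custom patterns. A production setup would likely also need redaction of sensitive fields, which this sketch does not attempt.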

When to use this skill

  • When working on error tracking and monitoring tasks or workflows.
  • When needing guidance, best practices, or checklists for error tracking and monitoring.
  • When seeking expert advice on system observability and reliability.
  • When designing or refining error management strategies for software applications.

When not to use this skill

  • This skill is not designed for direct execution of code, interaction with external APIs to set up monitoring tools, or real-time diagnostics on live systems. Its purpose is advisory and informational, not for performing direct system operations.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/error-diagnostics-error-trace/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/error-diagnostics-error-trace/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/error-diagnostics-error-trace/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How error-diagnostics-error-trace Compares

| Feature / Agent | error-diagnostics-error-trace | Standard Approach |
| --- | --- | --- |
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

It makes the agent act as an error tracking and observability expert that advises on setting up error tracking systems, configuring alerts, implementing structured logging, and helping teams quickly identify and resolve production issues.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Error Tracking and Monitoring

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.

## Use this skill when

- Working on error tracking and monitoring tasks or workflows
- Needing guidance, best practices, or checklists for error tracking and monitoring

## Do not use this skill when

- The task is unrelated to error tracking and monitoring
- You need a different domain or tool outside this scope

## Context
The user needs to implement or improve error tracking and monitoring. Focus on real-time error detection, meaningful alerts, error grouping, performance monitoring, and integration with popular error tracking services.
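
One way to approach the error grouping mentioned above is to derive a stable fingerprint per exception. The following is a minimal sketch, assuming Python and the standard library only. The function `fingerprint`, the helpers `load_user` and `capture`, and the choice of hashing exception type plus frame locations are illustrative assumptions, not the scheme of any particular error tracking service.

```python
import hashlib
import traceback


def fingerprint(exc: BaseException, max_frames: int = 5) -> str:
    """Return a stable grouping key for an exception (hypothetical scheme).

    The key combines the exception type with the file and function name of
    the innermost frames. Messages and line numbers are deliberately left
    out, since they tend to differ between occurrences of the same bug.
    """
    frames = traceback.extract_tb(exc.__traceback__)[-max_frames:]
    parts = [type(exc).__name__]
    parts += [f"{frame.filename}:{frame.name}" for frame in frames]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


def load_user(user_id: int) -> dict:
    # Hypothetical failing call, used only to produce example exceptions.
    raise KeyError(f"user {user_id} not found")


def capture(user_id: int) -> str:
    """Trigger the failure and return the fingerprint of the exception."""
    try:
        load_user(user_id)
    except KeyError as exc:
        return fingerprint(exc)
    return ""


if __name__ == "__main__":
    # Different messages, same code path: both land in one group.
    print(capture(1) == capture(2))  # prints True
```

Leaving the message out of the key is a trade-off: it merges occurrences that differ only in IDs or timestamps, at the risk of grouping distinct causes that share a code path.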

## Requirements
$ARGUMENTS

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Output Format

1. **Error Tracking Analysis**: Current error handling assessment
2. **Integration Configuration**: Setup for error tracking services
3. **Logging Implementation**: Structured logging setup
4. **Alert Rules**: Intelligent alerting configuration
5. **Error Grouping**: Deduplication and grouping logic
6. **Recovery Strategies**: Automatic error recovery implementation
7. **Dashboard Setup**: Real-time error monitoring dashboard
8. **Documentation**: Implementation and troubleshooting guide

Focus on providing comprehensive error visibility, intelligent alerting, and quick error resolution capabilities.
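
To make the "Alert Rules" item concrete, here is a minimal sketch of a sliding-window error-rate check, assuming Python and the standard library. The class `ErrorRateAlert` and its default values (60-second window, 5% threshold, 20-request minimum) are hypothetical and would need tuning per service; real deployments would more likely express this rule in their monitoring tool's own configuration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateAlert:
    """Fire when the error ratio in a sliding time window exceeds a threshold."""

    window_seconds: float = 60.0
    threshold: float = 0.05   # alert when more than 5% of requests fail
    min_requests: int = 20    # skip tiny samples to reduce noisy alerts
    _events: deque = field(default_factory=deque)  # (timestamp, is_error)

    def record(self, timestamp: float, is_error: bool) -> None:
        """Register one request outcome at the given time (in seconds)."""
        self._events.append((timestamp, is_error))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def should_alert(self, now: float) -> bool:
        """Return True if the current window breaches the threshold."""
        self._evict(now)
        total = len(self._events)
        if total < self.min_requests:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total > self.threshold


if __name__ == "__main__":
    alert = ErrorRateAlert()
    for second in range(30):
        # Every fifth request fails: 6 errors out of 30 requests.
        alert.record(float(second), is_error=(second % 5 == 0))
    print(alert.should_alert(now=29.0))  # prints True
```

The `min_requests` guard is one simple way to keep a single failure on a quiet service from paging anyone; rate-of-change or multi-window rules are common alternatives.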

## Resources

- `resources/implementation-playbook.md` for detailed patterns and examples.

Related Skills

error-debugging-error-trace

31392
from sickn33/antigravity-awesome-skills

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.

DevOps & Infrastructure · Claude

error-debugging-error-analysis

31392
from sickn33/antigravity-awesome-skills

You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.

DevOps & Infrastructure · Claude

linux-shell-scripting

31392
from sickn33/antigravity-awesome-skills

Provide production-ready shell script templates for common Linux system administration tasks including backups, monitoring, user management, log analysis, and automation. These scripts serve as building blocks for security operations and penetration testing environments.

DevOps & Infrastructure · Claude

iterate-pr

31392
from sickn33/antigravity-awesome-skills

Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.

DevOps & Infrastructure · Claude

istio-traffic-management

31392
from sickn33/antigravity-awesome-skills

Comprehensive guide to Istio traffic management for production service mesh deployments.

DevOps & Infrastructure · Claude

incident-runbook-templates

31392
from sickn33/antigravity-awesome-skills

Production-ready templates for incident response runbooks covering detection, triage, mitigation, resolution, and communication.

DevOps & Infrastructure · Claude

incident-response-smart-fix

31392
from sickn33/antigravity-awesome-skills

[Extended thinking: This workflow implements a sophisticated debugging and resolution pipeline that leverages AI-assisted debugging tools and observability platforms to systematically diagnose and res

DevOps & Infrastructure · Claude · GitHub Copilot

incident-responder

31392
from sickn33/antigravity-awesome-skills

Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management.

DevOps & Infrastructure · Claude

expo-cicd-workflows

31392
from sickn33/antigravity-awesome-skills

Helps understand and write EAS workflow YAML files for Expo projects. Use this skill when the user asks about CI/CD or workflows in an Expo or EAS context, mentions .eas/workflows/, or wants help with EAS build pipelines or deployment automation.

DevOps & Infrastructure · Claude

docker-expert

31392
from sickn33/antigravity-awesome-skills

You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.

DevOps & Infrastructure · Claude

devops-troubleshooter

31392
from sickn33/antigravity-awesome-skills

Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability.

DevOps & Infrastructure · Claude

devops-deploy

31392
from sickn33/antigravity-awesome-skills

DevOps and application deployment: Docker, CI/CD with GitHub Actions, AWS Lambda, SAM, Terraform, infrastructure as code, and monitoring.

DevOps & Infrastructure · Claude · Cursor · Gemini