error-diagnostics-error-trace
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
About this skill
This skill transforms the AI agent into a seasoned error tracking and observability expert. It empowers the agent to offer comprehensive guidance, best practices, and actionable advice for implementing robust error monitoring solutions. The agent can provide insights on setting up error tracking systems, configuring effective alerts, implementing structured logging methodologies, and ensuring development teams can quickly identify and resolve production issues. It serves as a go-to resource for expert knowledge in maintaining system health and reliability.
Best use case
Users need expert advice or guidance on establishing or improving error monitoring systems, configuring alerts, implementing structured logging, or understanding observability best practices to resolve production issues.
Users will receive expert-level advice, clear best practices, actionable checklists, and comprehensive guidance related to error tracking, monitoring, structured logging, and observability, enabling them to make informed decisions and implement effective solutions.
Practical example
Example input
As the error tracking expert, what are the essential steps to set up a comprehensive error monitoring system for a new microservices application using modern best practices?
Example output
To set up a comprehensive error monitoring system for a new microservices application, follow these essential steps:

1. **Choose a Monitoring Stack:** Select tools like Prometheus/Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Datadog, New Relic, Sentry, or Honeycomb. Consider your budget, existing infrastructure, and team's familiarity.
2. **Implement Structured Logging:** Ensure all services emit logs in a structured format (e.g., JSON). Include critical metadata like `timestamp`, `service_name`, `environment`, `log_level`, `request_id`, `user_id`, `error_code`, `stack_trace` (if an error), etc. This makes logs searchable and analyzable.
3. **Centralized Log Aggregation:** Use a log aggregator (e.g., Logstash, Fluentd, Vector) to collect logs from all microservices and forward them to your chosen monitoring stack's central storage (e.g., Elasticsearch, S3, cloud logging services).
4. **Error Tracking Integration:** Integrate a dedicated error tracking service (e.g., Sentry, Bugsnag, Rollbar) into each microservice. This captures unhandled exceptions, provides stack traces, and aggregates similar errors.
5. **Metrics Collection:** Collect key performance indicators (KPIs) and operational metrics from each service. This includes request rates, error rates, latency, CPU/memory usage, and database query times. Tools like Prometheus or a cloud-native monitoring service are ideal.
6. **Distributed Tracing:** Implement distributed tracing (e.g., OpenTelemetry, Jaeger, Zipkin) to visualize the flow of requests across multiple microservices. This is crucial for diagnosing latency issues and understanding service dependencies.
7. **Alerting & Notifications:** Configure intelligent alerts based on critical metrics and error thresholds. Examples: 'high error rate on Service X', 'latency spike on API Y', 'database connection errors'. Route notifications to appropriate teams via Slack, PagerDuty, email, etc.
8. **Dashboards & Visualization:** Create intuitive dashboards (e.g., Grafana, Kibana, Datadog) that provide a holistic view of your system's health, key metrics, error trends, and service dependencies.
9. **Runbooks & Playbooks:** Document clear runbooks for common alerts and incidents, outlining steps for diagnosis and resolution.
10. **Regular Review & Iteration:** Continuously review your monitoring setup. Are alerts effective? Are dashboards providing useful insights? Adjust based on incidents and evolving application needs.

By following these steps, you'll establish a robust error monitoring system capable of providing deep insights into your microservices application's health and performance.
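The structured logging step in the example output can be made concrete with a small sketch. The following is one possible way to emit JSON logs with the fields named there, using only Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the service name `checkout` are hypothetical names chosen for illustration; they are not part of the skill.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def __init__(self, service_name: str, environment: str) -> None:
        super().__init__()
        self.service_name = service_name
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "service_name": self.service_name,
            "environment": self.environment,
            "log_level": record.levelname,
            "message": record.getMessage(),
            # These fields are optional and arrive through logging's `extra=` argument.
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
            "error_code": getattr(record, "error_code", None),
        }
        if record.exc_info:
            # Only present when the record was logged with exception info.
            entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(service_name: str, environment: str, stream=None) -> logging.Logger:
    """Return a logger that writes one JSON line per record to `stream` (stderr by default)."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service_name, environment))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout", "staging")
    try:
        1 / 0
    except ZeroDivisionError:
        log.exception(
            "payment failed",
            extra={"request_id": "req-1", "error_code": "E_PAY"},
        )
```

Keeping the optional identifiers in `extra=` means call sites that lack a request or user context still produce a valid record, with those fields set to `null`.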
When to use this skill
- When working on error tracking and monitoring tasks or workflows.
- When needing guidance, best practices, or checklists for error tracking and monitoring.
- When seeking expert advice on system observability and reliability.
- When designing or refining error management strategies for software applications.
When not to use this skill
- This skill is not designed for direct execution of code, interaction with external APIs to set up monitoring tools, or real-time diagnostics on live systems. Its purpose is advisory and informational, not for performing direct system operations.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/error-diagnostics-error-trace/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How error-diagnostics-error-trace Compares
| Feature / Agent | error-diagnostics-error-trace | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Easy | N/A |
Frequently Asked Questions
What does this skill do?
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
Which AI agents support this skill?
This skill is designed for Claude.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Top AI Agents for Productivity
See the top AI agent skills for productivity, workflow automation, operational systems, documentation, and everyday task execution.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
SKILL.md Source
# Error Tracking and Monitoring

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.

## Use this skill when

- Working on error tracking and monitoring tasks or workflows
- Needing guidance, best practices, or checklists for error tracking and monitoring

## Do not use this skill when

- The task is unrelated to error tracking and monitoring
- You need a different domain or tool outside this scope

## Context

The user needs to implement or improve error tracking and monitoring. Focus on real-time error detection, meaningful alerts, error grouping, performance monitoring, and integration with popular error tracking services.

## Requirements

$ARGUMENTS

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Output Format

1. **Error Tracking Analysis**: Current error handling assessment
2. **Integration Configuration**: Setup for error tracking services
3. **Logging Implementation**: Structured logging setup
4. **Alert Rules**: Intelligent alerting configuration
5. **Error Grouping**: Deduplication and grouping logic
6. **Recovery Strategies**: Automatic error recovery implementation
7. **Dashboard Setup**: Real-time error monitoring dashboard
8. **Documentation**: Implementation and troubleshooting guide

Focus on providing comprehensive error visibility, intelligent alerting, and quick error resolution capabilities.

## Resources

- `resources/implementation-playbook.md` for detailed patterns and examples.
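The "Error Grouping" item in the output format above refers to deduplication and grouping logic. As a rough illustration only, one common approach is to compute a fingerprint from the error type, a normalized message, and the top stack frames, so that events differing only in IDs or numbers fall into the same group. The function names, the event dictionary shape (`type`, `message`, `frames`), and the choice of three frames are assumptions made for this sketch; they do not come from the skill or from any particular error tracking service.

```python
import hashlib
import re
from collections import Counter

# Patterns for variable parts of a message that should not split groups.
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_HEX = re.compile(r"0x[0-9a-fA-F]+")
_NUM = re.compile(r"\d+")


def normalize_message(message: str) -> str:
    """Replace UUIDs, hex addresses, and numbers with placeholders."""
    message = _UUID.sub("<uuid>", message)
    message = _HEX.sub("<hex>", message)
    return _NUM.sub("<n>", message)


def fingerprint(error_type: str, message: str, top_frames: list[str]) -> str:
    """Hash the stable parts of an error into a short grouping key."""
    # Assumption: the first three frames are enough to identify the call site.
    key = "|".join([error_type, normalize_message(message), *top_frames[:3]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def group_errors(events: list[dict]) -> Counter:
    """Count events per fingerprint; each event has 'type', 'message', 'frames'."""
    counts: Counter = Counter()
    for event in events:
        counts[fingerprint(event["type"], event["message"], event["frames"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"type": "TimeoutError", "message": "order 1234 timed out after 30s",
         "frames": ["db.py:query", "orders.py:load"]},
        {"type": "TimeoutError", "message": "order 98 timed out after 45s",
         "frames": ["db.py:query", "orders.py:load"]},
        {"type": "KeyError", "message": "missing key 'sku'",
         "frames": ["cart.py:add"]},
    ]
    for key, count in group_errors(sample).items():
        print(key, count)
```

Normalizing before hashing is the main design choice here: without it, every distinct order ID would create its own group and defeat deduplication.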
Related Skills
error-debugging-error-trace
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
error-debugging-error-analysis
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
linux-shell-scripting
Provide production-ready shell script templates for common Linux system administration tasks including backups, monitoring, user management, log analysis, and automation. These scripts serve as building blocks for security operations and penetration testing environments.
iterate-pr
Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.
istio-traffic-management
Comprehensive guide to Istio traffic management for production service mesh deployments.
incident-runbook-templates
Production-ready templates for incident response runbooks covering detection, triage, mitigation, resolution, and communication.
incident-response-smart-fix
[Extended thinking: This workflow implements a sophisticated debugging and resolution pipeline that leverages AI-assisted debugging tools and observability platforms to systematically diagnose and res
incident-responder
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management.
expo-cicd-workflows
Helps understand and write EAS workflow YAML files for Expo projects. Use this skill when the user asks about CI/CD or workflows in an Expo or EAS context, mentions .eas/workflows/, or wants help with EAS build pipelines or deployment automation.
docker-expert
You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.
devops-troubleshooter
Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability.
devops-deploy
DevOps and application deployment: Docker, CI/CD with GitHub Actions, AWS Lambda, SAM, Terraform, infrastructure as code, and monitoring.