dd-apm
APM - traces, services, dependencies, performance analysis.
Best use case
dd-apm is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using dd-apm should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/dd-apm/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Datadog APM
Distributed tracing, service maps, and performance analysis.
## Requirements
Install the Datadog Labs Pup CLI (`pup`) with:
```bash
go install github.com/datadog-labs/pup@latest
```
## Quick Start
```bash
pup auth login
pup apm services list
pup apm traces list --service api-gateway --duration 1h
```
## Services
### List Services
```bash
pup apm services list
pup apm services list --env production
```
### Service Details
```bash
pup apm services get api-gateway --json
```
### Service Map
```bash
# View dependencies
pup apm service-map --service api-gateway --json
```
## Traces
### Search Traces
```bash
# By service
pup apm traces list --service api-gateway --duration 1h
# Errors only
pup apm traces list --service api-gateway --status error
# Slow traces (>1s)
pup apm traces list --service api-gateway --min-duration 1000ms
# With specific tag
pup apm traces list --query "@http.url:/api/users"
```
### Get Trace Detail
```bash
pup apm traces get <trace_id> --json
```
## Key Metrics
| Metric | What It Measures |
|--------|------------------|
| `trace.http.request.hits` | Request count |
| `trace.http.request.duration` | Latency |
| `trace.http.request.errors` | Error count |
| `trace.http.request.apdex` | User satisfaction |
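
The table describes Apdex only as "user satisfaction". As a rough sketch, the standard Apdex formula counts requests at or under a threshold T as satisfied, requests between T and 4T as tolerating, and everything else as frustrated. The function name `apdex` and the sample counts below are hypothetical, and the threshold actually applied to `trace.http.request.apdex` depends on how each service is configured in Datadog.

```bash
#!/usr/bin/env bash
# Standard Apdex: (satisfied + tolerating / 2) / total
# `apdex` is a hypothetical helper; the counts are made-up sample data.
apdex() {
  local satisfied=$1 tolerating=$2 frustrated=$3
  awk -v s="$satisfied" -v t="$tolerating" -v f="$frustrated" 'BEGIN {
    total = s + t + f
    if (total == 0) { print "n/a"; exit }   # no requests, no score
    printf "%.2f\n", (s + t / 2) / total
  }'
}

apdex 60 30 10   # (60 + 30/2) / 100 -> prints 0.75
```

A score of 1.00 means every request was satisfied; tolerating requests count for half.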
## ⚠️ Trace Sampling
**Not all traces are kept.** Understand sampling:
| Mode | What's Kept |
|------|-------------|
| **Head-based** | Random % at start |
| **Error/Slow** | All errors, slow traces |
| **Retention** | What's indexed (billed) |
```bash
# Check retention filters
pup apm retention-filters list
```
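
To make "Random % at start" concrete: with head-based sampling, the keep-or-drop decision is made once, when the trace begins, and every downstream span follows it. The sketch below is an illustration only and is not Datadog's actual sampling algorithm. It assumes a hypothetical `keep_trace` helper that keeps a trace when its numeric ID falls into the first `rate_pct` of 100 buckets, so any service seeing the same trace ID reaches the same decision.

```bash
#!/usr/bin/env bash
# Illustration only, not Datadog's algorithm.
# keep_trace is a hypothetical helper: the decision depends only on the trace ID.
keep_trace() {
  local trace_id=$1 rate_pct=$2
  [ $(( trace_id % 100 )) -lt "$rate_pct" ]
}

# Apply a 10% rate to 1000 sequential sample IDs.
kept=0
for id in $(seq 1 1000); do
  if keep_trace "$id" 10; then
    kept=$(( kept + 1 ))
  fi
done
echo "$kept of 1000 traces kept"   # prints: 100 of 1000 traces kept
```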
### Trace Retention Costs
| Retention | Cost |
|-----------|------|
| Indexed spans | $$$ per million |
| Ingested spans | $ per million |
**Best practice:** Only index what you need for search.
## Service Level Objectives
Link APM to SLOs:
```bash
pup slos create \
  --name "API Latency p99 < 200ms" \
  --type metric \
  --numerator "sum:trace.http.request.hits{service:api,@duration:<200000000}" \
  --denominator "sum:trace.http.request.hits{service:api}" \
  --target 99.0
```
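
The `@duration:<200000000` filter above is expressed in nanoseconds, the unit Datadog uses for span duration, so 200 ms becomes 200000000. A small sketch for deriving that value from a millisecond threshold follows; `ms_to_ns` is a hypothetical helper, not part of `pup`.

```bash
#!/usr/bin/env bash
# ms_to_ns is a hypothetical helper: 1 ms = 1,000,000 ns.
ms_to_ns() {
  echo $(( $1 * 1000000 ))
}

threshold_ns=$(ms_to_ns 200)
# Build the numerator query string used in the SLO command above.
echo "sum:trace.http.request.hits{service:api,@duration:<${threshold_ns}}"
```

Changing the latency target then only requires changing the millisecond value passed to `ms_to_ns`.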
## Common Queries
| Goal | Query |
|------|-------|
| Slowest endpoints | `avg:trace.http.request.duration{*} by {resource_name}` |
| Error rate | `sum:trace.http.request.errors{*} / sum:trace.http.request.hits{*}` |
| Throughput | `sum:trace.http.request.hits{*}.as_rate()` |
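
The error-rate query is a plain ratio of two counts. As a minimal sketch of the arithmetic, assuming the two sums have already been fetched (the function name `error_rate_pct` and the sample numbers are hypothetical):

```bash
#!/usr/bin/env bash
# error_rate_pct is a hypothetical helper: errors / hits, as a percentage.
error_rate_pct() {
  local errors=$1 hits=$2
  awk -v e="$errors" -v h="$hits" 'BEGIN {
    if (h == 0) { print "n/a"; exit }   # avoid dividing by zero when idle
    printf "%.2f\n", 100 * e / h
  }'
}

error_rate_pct 25 1000   # 25 errors out of 1000 hits -> prints 2.50
```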
## Troubleshooting
| Problem | Fix |
|---------|-----|
| No traces | Check that ddtrace is installed and `DD_TRACE_ENABLED=true` |
| Missing service | Verify that the `DD_SERVICE` env var is set |
| Traces not linked | Check that trace headers are propagated between services |
| High cardinality | Don't tag with user_id/request_id |
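
The first two rows can be checked from the shell of the instrumented process. The sketch below uses a hypothetical `check_dd_env` function and assumes tracing stays enabled when `DD_TRACE_ENABLED` is unset, so only an explicit false-like value is flagged; confirm that default for the tracing library in use.

```bash
#!/usr/bin/env bash
# check_dd_env is a hypothetical helper; it inspects the current environment only.
check_dd_env() {
  local status=0
  if [ -z "${DD_SERVICE:-}" ]; then
    echo "DD_SERVICE is not set"
    status=1
  fi
  # Assumption: tracing is on unless DD_TRACE_ENABLED is explicitly false.
  case "${DD_TRACE_ENABLED:-true}" in
    false|False|FALSE|0)
      echo "DD_TRACE_ENABLED disables tracing"
      status=1
      ;;
  esac
  return "$status"
}

check_dd_env || echo "Fix the settings above, then restart the service."
```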
## References/Docs
- [APM Setup](https://docs.datadoghq.com/tracing/)
- [Trace Search](https://docs.datadoghq.com/tracing/trace_explorer/)
- [Retention Filters](https://docs.datadoghq.com/tracing/trace_pipeline/trace_retention/)