deploy-otel

Deploy the OpenTelemetry observability stack (Prometheus, Grafana, OTEL Collector) to a Kind cluster for testing toolhive telemetry. Use when you need to set up monitoring, metrics collection, or observability infrastructure.

1,689 stars

Best use case

deploy-otel is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using deploy-otel can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/deploy-otel/SKILL.md --create-dirs "https://raw.githubusercontent.com/stacklok/toolhive/main/.claude/skills/deploy-otel/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/deploy-otel/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How deploy-otel Compares

| Feature / Agent | deploy-otel | Standard Approach |
|-----------------|-------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Deploy the OpenTelemetry observability stack (Prometheus, Grafana, OTEL Collector) to a Kind cluster for testing toolhive telemetry. Use when you need to set up monitoring, metrics collection, or observability infrastructure.

Where can I find the source code?

The source is in the stacklok/toolhive repository on GitHub, at .claude/skills/deploy-otel/SKILL.md (the same file the curl command under Installation downloads).

SKILL.md Source

# Deploy OTEL Observability Stack

Deploy a complete OpenTelemetry observability stack to a Kind cluster for testing ToolHive's telemetry capabilities.

## Steps

### 1. Verify Prerequisites

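Run all steps from the root of a toolhive repository checkout, because later steps reference values files under `examples/otel/`. Check that the required tools are installed: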

```bash
echo "Checking prerequisites..."
command -v kind >/dev/null 2>&1 || { echo "ERROR: kind is not installed"; exit 1; }
command -v helm >/dev/null 2>&1 || { echo "ERROR: helm is not installed"; exit 1; }
command -v kubectl >/dev/null 2>&1 || { echo "ERROR: kubectl is not installed"; exit 1; }
echo "All prerequisites met."
```
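Kind also needs a running container runtime (Docker by default). The sketch below is a stricter variant of the check above: `check_prereqs` reports every missing tool in one pass instead of stopping at the first, and `check_runtime` assumes Docker is the runtime Kind uses. Both function names are hypothetical and not part of the skill.

```bash
# Hypothetical helper (not part of the skill): report every missing tool
# in one pass instead of stopping at the first one.
check_prereqs() {
  local missing=""
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="${missing} ${tool}"
  done
  if [ -n "$missing" ]; then
    echo "ERROR: missing tools:${missing}" >&2
    return 1
  fi
  echo "All prerequisites met."
}

# Hypothetical helper: assumes Docker is Kind's container runtime.
# `docker info` fails when the Docker daemon is not reachable.
check_runtime() {
  docker info >/dev/null 2>&1 || { echo "ERROR: Docker daemon is not reachable" >&2; return 1; }
}

# Usage: check_prereqs kind helm kubectl docker && check_runtime
```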

### 2. Create Kind Cluster

Create the Kind cluster if it doesn't exist:

```bash
CLUSTER_NAME="toolhive"

if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
  echo "Kind cluster '${CLUSTER_NAME}' already exists"
else
  echo "Creating Kind cluster '${CLUSTER_NAME}'..."
  kind create cluster --name "${CLUSTER_NAME}"
fi

# Export kubeconfig
kind get kubeconfig --name "${CLUSTER_NAME}" > kconfig.yaml
echo "Kubeconfig written to kconfig.yaml"
```
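The remaining steps pass `--kubeconfig kconfig.yaml` on every call. As an optional alternative, both kubectl and Helm read the `KUBECONFIG` environment variable, so exporting it once points the whole shell session at the Kind cluster. This sketch assumes you run it in the directory where the step above wrote `kconfig.yaml`.

```bash
# Optional: point kubectl and helm at the Kind cluster for this shell session.
# An absolute path keeps the setting valid after a later `cd`.
export KUBECONFIG="${PWD}/kconfig.yaml"
echo "KUBECONFIG set to ${KUBECONFIG}"
```

Kind also switches the current context in your default kubeconfig to `kind-toolhive` when it creates the cluster, so this mainly matters if you change contexts afterwards.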

### 3. Add Helm Repositories

```bash
echo "Adding Helm repositories..."
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
echo "Helm repositories updated."
```

### 4. Install Prometheus/Grafana Stack

```bash
echo "Installing kube-prometheus-stack..."
helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -f examples/otel/prometheus-stack-values.yaml \
  -n monitoring --create-namespace \
  --kubeconfig kconfig.yaml \
  --wait --timeout 5m

echo "Prometheus/Grafana stack installed."
```
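Steps 4 to 6 install whatever chart version is newest at the time, which is the usual source of the values incompatibilities described under Troubleshooting. The sketch below wraps the Step 4 command so a chart version can optionally be pinned. `install_prom_stack` and `PROM_STACK_CHART_VERSION` are hypothetical names, and choosing a version known to work with `examples/otel/prometheus-stack-values.yaml` is up to you.

```bash
# Hypothetical wrapper around the Step 4 command. When PROM_STACK_CHART_VERSION
# is set, it is passed to Helm's --version flag; when empty, Helm installs the
# latest chart from the repository index.
install_prom_stack() {
  local version_args=()
  if [ -n "${PROM_STACK_CHART_VERSION:-}" ]; then
    version_args=(--version "${PROM_STACK_CHART_VERSION}")
  fi
  helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack \
    -f examples/otel/prometheus-stack-values.yaml \
    -n monitoring --create-namespace \
    --kubeconfig kconfig.yaml \
    "${version_args[@]}" \
    --wait --timeout 5m
}

# Usage: PROM_STACK_CHART_VERSION=<chart-version> install_prom_stack
```

The same pattern applies to the Tempo and Collector installs.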

### 5. Install Tempo for Distributed Tracing

```bash
echo "Installing Grafana Tempo..."
helm upgrade -i tempo grafana/tempo \
  -f examples/otel/tempo-values.yaml \
  -n monitoring \
  --kubeconfig kconfig.yaml \
  --wait --timeout 3m

echo "Grafana Tempo installed."
```

### 6. Install OpenTelemetry Collector

```bash
echo "Installing OpenTelemetry Collector..."
helm upgrade -i otel-collector open-telemetry/opentelemetry-collector \
  -f examples/otel/otel-values.yaml \
  -n monitoring \
  --kubeconfig kconfig.yaml \
  --wait --timeout 3m

echo "OpenTelemetry Collector installed."
```
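Because every install uses `--wait`, a failing chart normally stops the run at that step. When re-running the skill against an existing cluster, a quick sanity check is that all three releases report `STATUS: deployed` in `helm status` output. This is a sketch; `check_releases` is a hypothetical helper that assumes the release names used in Steps 4 to 6.

```bash
# Hypothetical helper: confirm each Helm release from Steps 4 to 6 reports
# "STATUS: deployed" in the monitoring namespace.
check_releases() {
  local release failed=0
  for release in kube-prometheus-stack tempo otel-collector; do
    if helm status "$release" -n monitoring --kubeconfig kconfig.yaml 2>/dev/null |
        grep -q '^STATUS: deployed'; then
      echo "OK: ${release}"
    else
      echo "NOT DEPLOYED: ${release}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```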

### 7. Verify Deployment

```bash
echo "Verifying deployment..."
kubectl get pods -n monitoring --kubeconfig kconfig.yaml
```
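`kubectl get pods` only prints a snapshot. To block until the pods are actually Ready, `kubectl wait` can be used; the sketch below wraps it in a hypothetical `wait_for_monitoring` function. The 180-second timeout is an assumed value, and the command assumes no completed Job pods remain in `monitoring`, because such pods never report Ready.

```bash
# Hypothetical helper: block until every pod in the monitoring namespace
# reports the Ready condition, or fail once the (assumed) timeout expires.
wait_for_monitoring() {
  kubectl wait --for=condition=Ready pods --all \
    -n monitoring \
    --kubeconfig kconfig.yaml \
    --timeout=180s
}

# Usage: wait_for_monitoring && echo "All monitoring pods are Ready."
```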

### 8. Display Access Instructions

```bash
cat <<'EOF'

=== OTEL Stack Deployment Complete ===

To access the UIs, run these port-forward commands:

  # Grafana (admin / admin)
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:3000 --kubeconfig kconfig.yaml

  # Prometheus
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090 --kubeconfig kconfig.yaml

EOF
```
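Each port-forward above occupies a terminal. When scripting, the sketch below runs both in the background and stops them when the shell exits. `start_port_forwards` and `PF_PIDS` are hypothetical names, and the `trap` replaces any EXIT trap already set in that shell.

```bash
# Hypothetical helper: start the Step 8 port-forwards in the background and
# stop them when the current shell exits.
start_port_forwards() {
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:3000 \
    --kubeconfig kconfig.yaml >/dev/null 2>&1 &
  PF_PIDS=("$!")
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090 \
    --kubeconfig kconfig.yaml >/dev/null 2>&1 &
  PF_PIDS+=("$!")
  # Note: this replaces any EXIT trap already set in the shell.
  trap 'kill "${PF_PIDS[@]}" 2>/dev/null || true' EXIT
  echo "Port-forwards running (PIDs: ${PF_PIDS[*]})"
}
```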

## Troubleshooting

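If a Helm installation fails because of incompatible values, the upstream chart has most likely changed and our values files in `examples/otel/` no longer match it. The install commands above do not pin a chart `--version`, so every run installs the latest release of each chart.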

**Chart Documentation:**
- OpenTelemetry Collector: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector
- Prometheus Stack: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
- Tempo: https://github.com/grafana/helm-charts/tree/main/charts/tempo

**If you encounter issues:**
1. Check the chart's `values.yaml` for schema changes in the chart version that was installed
2. Compare it with our values files in `examples/otel/`
3. Open an issue at https://github.com/stacklok/toolhive/issues that describes the problem and recommends a fix
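For step 1, Helm can list the chart version behind each installed release and print the default values of a specific chart version. A sketch follows; `list_installed_charts` and `dump_chart_defaults` are hypothetical helpers, and the output filename in the usage comment is arbitrary.

```bash
# Show installed releases; the CHART column includes the chart version
# (for example, kube-prometheus-stack-<version>).
list_installed_charts() {
  helm list -n monitoring --kubeconfig kconfig.yaml
}

# Hypothetical helper: save a chart's default values for a given version so
# they can be compared against the matching file in examples/otel/.
# Usage: dump_chart_defaults open-telemetry/opentelemetry-collector <version> upstream-otel-values.yaml
dump_chart_defaults() {
  local chart="$1" version="$2" out="$3"
  helm show values "$chart" --version "$version" > "$out"
  echo "Wrote default values for ${chart} ${version} to ${out}"
}
```

Our values files override only a subset of keys, so a full `diff` is noisy; look specifically for keys that were renamed or removed upstream.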

## What This Deploys

| Component | Description |
|-----------|-------------|
| Prometheus | Metrics storage, scrapes OTEL collector on port 8889 |
| Grafana | Visualization dashboards (admin/admin) |
| Tempo | Distributed tracing backend, receives traces from OTEL Collector |
| OTEL Collector | Receives OTLP metrics/traces, exports to Prometheus and Tempo |
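To check the Prometheus row of the table, you can query the `/api/v1/targets` endpoint of the Prometheus HTTP API and look for a scrape target on port 8889. This sketch assumes the Prometheus port-forward from Step 8 is running on localhost:9090 and that `curl` is installed; `check_collector_target` and `PROM_URL` are hypothetical names, and matching `:8889/` in scrape URLs is a heuristic rather than an exact check.

```bash
# Hypothetical helper: succeed if any active Prometheus scrape target URL
# uses port 8889 (the collector metrics port listed in the table above).
check_collector_target() {
  local prom_url="${PROM_URL:-http://localhost:9090}"
  local targets
  targets="$(curl -fsS "${prom_url}/api/v1/targets")" || {
    echo "ERROR: could not reach Prometheus at ${prom_url}" >&2
    return 1
  }
  if printf '%s' "$targets" | grep -q '"scrapeUrl": *"[^"]*:8889/'; then
    echo "Prometheus has a scrape target on port 8889."
  else
    echo "No scrape target on port 8889 found." >&2
    return 1
  fi
}
```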

## Cleanup

To remove everything:

```bash
task kind-destroy
```

Or manually:

```bash
kind delete cluster --name toolhive
rm -f kconfig.yaml
```
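To remove only the observability stack and keep the Kind cluster, uninstall the three Helm releases instead. A sketch follows; `uninstall_stack` is a hypothetical helper. Note that `helm uninstall` does not delete CRDs that kube-prometheus-stack installed, because they are cluster-scoped; remove those manually if you need a fully clean cluster.

```bash
# Hypothetical helper: uninstall the releases from Steps 4 to 6 in reverse order
# and delete the monitoring namespace, leaving the Kind cluster running.
uninstall_stack() {
  local release
  for release in otel-collector tempo kube-prometheus-stack; do
    helm uninstall "$release" -n monitoring --kubeconfig kconfig.yaml || true
  done
  kubectl delete namespace monitoring --kubeconfig kconfig.yaml --ignore-not-found
}
```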

Related Skills

deploying-vmcp-locally

from stacklok/toolhive

Deploys a VirtualMCPServer configuration locally for manual testing and verification

split-pr

from stacklok/toolhive

Analyzes current changes and suggests how to split them into smaller, reviewable PRs

code-review-assist

from stacklok/toolhive

Augments human code review by summarizing changes, surfacing key review questions, assessing test coverage, and identifying low-risk sections. Use when reviewing a diff, PR, or code snippet as a senior review partner.

toolhive-release

from stacklok/toolhive

Creates ToolHive release PRs by analyzing commits since the last release, categorizing changes, recommending semantic version bump type (major/minor/patch), and triggering the release workflow. Use when cutting a release, preparing a new version, checking what changed since last release, or when the user mentions "release", "version bump", or "cut a release".

doc-review

from stacklok/toolhive

Reviews documentation for factual accuracy

add-rule

from stacklok/toolhive

Captures a team convention or best practice and adds it to the appropriate .claude/rules/ or .claude/agents/ file

check-contribution

from stacklok/toolhive

Validates operator chart contribution practices (helm template, ct lint, docs generation, version bump) before committing changes.

toolhive-cli-user

from stacklok/toolhive

Guide for using ToolHive CLI (thv) to run and manage MCP servers and skills. Use when running, listing, stopping, building, or configuring MCP servers locally. Covers server lifecycle, registry browsing, secrets management, client registration, groups, container builds, exports, permissions, network isolation, authentication, and skill management (install, uninstall, list, info, build, push, validate). NOT for Kubernetes operator usage or ToolHive development/contributing.

vmcp-review

from stacklok/toolhive

Reviews vMCP code changes for known anti-patterns that make the codebase harder to understand or more brittle. Use when reviewing PRs, planning features, or refactoring vMCP code.

pr-review

from stacklok/toolhive

Submit inline review comments to GitHub PRs and reply to/resolve review threads using the GitHub CLI and GraphQL API.

deployment-patterns

from affaan-m/everything-claude-code

Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production readiness checklists for web applications. Use when setting up deployment infrastructure or planning releases.

makepad-deployment

from sickn33/antigravity-awesome-skills

CRITICAL: Use for Makepad packaging and deployment. Triggers on: deploy, package, APK, IPA, 打包, 部署, cargo-packager, cargo-makepad, WASM, Android, iOS, distribution, installer, .deb, .dmg, .nsis, GitHub Actions, CI, action, marketplace
