cohere-observability

Set up comprehensive observability for Cohere API v2 with metrics, traces, and alerts. Use when implementing monitoring for Chat/Embed/Rerank operations, setting up dashboards, or configuring alerts for Cohere integrations. Trigger with phrases like "cohere monitoring", "cohere metrics", "cohere observability", "monitor cohere", "cohere alerts", "cohere tracing".

by ComeOnOliver

Best use case

cohere-observability is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cohere-observability can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/cohere-observability/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/cohere-observability/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cohere-observability/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

SKILL.md Source

# Cohere Observability

## Overview
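Set up production observability for Cohere API v2 with Prometheus metrics, OpenTelemetry tracing, structured logging, and Prometheus alerting rules (routed through Alertmanager). Tracks per-endpoint latency, token usage, and error rates; the token counts are the basis for cost monitoring.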

## Prerequisites
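- Prometheus or compatible metrics backend
- OpenTelemetry SDK installed and configured (the examples use `@opentelemetry/api`)
- `cohere-ai` SDK v7+
- `prom-client`, `pino`, and `express` npm packages (used in the examples below)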

## Instructions

### Step 1: Metrics Collection

```typescript
import { Registry, Counter, Histogram, Gauge } from 'prom-client';

const registry = new Registry();

// Per-endpoint request counter
const requestCounter = new Counter({
  name: 'cohere_requests_total',
  help: 'Total Cohere API requests',
  labelNames: ['endpoint', 'model', 'status'],
  registers: [registry],
});

// Latency histogram
const requestDuration = new Histogram({
  name: 'cohere_request_duration_seconds',
  help: 'Cohere request duration',
  labelNames: ['endpoint', 'model'],
  buckets: [0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30],
  registers: [registry],
});

// Token usage tracking
const tokenCounter = new Counter({
  name: 'cohere_tokens_total',
  help: 'Total tokens consumed',
  labelNames: ['endpoint', 'model', 'direction'], // direction: input|output
  registers: [registry],
});

// Error counter by type
const errorCounter = new Counter({
  name: 'cohere_errors_total',
  help: 'Cohere errors by status code',
  labelNames: ['endpoint', 'status_code'],
  registers: [registry],
});

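// Rate limit headroom
// Note: the wrapper in Step 2 does not update this gauge; call rateLimitGauge.set()
// wherever your application learns its remaining quota.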
const rateLimitGauge = new Gauge({
  name: 'cohere_rate_limit_remaining',
  help: 'Remaining rate limit capacity',
  labelNames: ['endpoint'],
  registers: [registry],
});
```

### Step 2: Instrumented Client Wrapper

```typescript
import { CohereClientV2, CohereError, CohereTimeoutError } from 'cohere-ai';

const cohere = new CohereClientV2();

async function instrumentedCall<T>(
  endpoint: string,
  model: string,
  operation: () => Promise<T>
): Promise<T> {
  const timer = requestDuration.startTimer({ endpoint, model });

  try {
    const result = await operation();
    requestCounter.inc({ endpoint, model, status: 'success' });
    timer();

    // Track tokens from response
    const usage = (result as any)?.usage?.billedUnits;
    if (usage) {
      if (usage.inputTokens) {
        tokenCounter.inc({ endpoint, model, direction: 'input' }, usage.inputTokens);
      }
      if (usage.outputTokens) {
        tokenCounter.inc({ endpoint, model, direction: 'output' }, usage.outputTokens);
      }
    }

    return result;
  } catch (err) {
    requestCounter.inc({ endpoint, model, status: 'error' });
    timer();

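    if (err instanceof CohereTimeoutError) {
      errorCounter.inc({ endpoint, status_code: 'timeout' });
    } else if (err instanceof CohereError) {
      // statusCode can be undefined, so avoid emitting the literal label "undefined"
      errorCounter.inc({ endpoint, status_code: String(err.statusCode ?? 'unknown') });
    } else {
      // Network failures and other non-SDK errors
      errorCounter.inc({ endpoint, status_code: 'unknown' });
    }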

    throw err;
  }
}

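// Usage (run inside an async function or an ES module with top-level await)
const query = 'Summarize the latest incident report.'; // example input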
const response = await instrumentedCall('chat', 'command-a-03-2025', () =>
  cohere.chat({
    model: 'command-a-03-2025',
    messages: [{ role: 'user', content: query }],
  })
);
```
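
The wrapper above counts tokens but does not turn them into a cost figure. One way this could be done is sketched below. The price table, its values, and the names `PRICES` and `estimateCostUsd` are all hypothetical placeholders, not Cohere's actual pricing or API; substitute the rates from your own contract.

```typescript
// Hypothetical per-million-token prices in USD. These are placeholders,
// NOT real Cohere prices; replace them with your own rates.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICES: Record<string, ModelPrice> = {
  'command-a-03-2025': { inputPerMillion: 1.0, outputPerMillion: 2.0 },
};

// Shape of the billed-units object read in Step 2 (fields may be absent).
interface BilledTokens {
  inputTokens?: number;
  outputTokens?: number;
}

// Returns the estimated cost in USD, or undefined when the model has no price entry.
function estimateCostUsd(model: string, billed: BilledTokens): number | undefined {
  const price = PRICES[model];
  if (!price) return undefined;
  const input = ((billed.inputTokens ?? 0) / 1_000_000) * price.inputPerMillion;
  const output = ((billed.outputTokens ?? 0) / 1_000_000) * price.outputPerMillion;
  return input + output;
}
```

The result could be added to an extra counter (for example one named `cohere_cost_usd_total`, a name assumed here) next to the token counter increments, skipping models for which the function returns `undefined`.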

### Step 3: OpenTelemetry Tracing

```typescript
import { trace, SpanStatusCode, SpanKind } from '@opentelemetry/api';
import { CohereError } from 'cohere-ai';

const tracer = trace.getTracer('cohere-client', '1.0.0');

async function tracedCohereCall<T>(
  endpoint: string,
  model: string,
  operation: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(
    `cohere.${endpoint}`,
    { kind: SpanKind.CLIENT },
    async (span) => {
      span.setAttribute('cohere.model', model);
      span.setAttribute('cohere.endpoint', endpoint);

      try {
        const result = await operation();

        // Add token usage to span
        const usage = (result as any)?.usage?.billedUnits;
        if (usage) {
          span.setAttribute('cohere.tokens.input', usage.inputTokens ?? 0);
          span.setAttribute('cohere.tokens.output', usage.outputTokens ?? 0);
        }

        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (err: any) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
        span.recordException(err);

        if (err instanceof CohereError) {
          span.setAttribute('cohere.error.status', err.statusCode ?? 0);
        }
        throw err;
      } finally {
        span.end();
      }
    }
  );
}
```

### Step 4: Structured Logging

```typescript
import pino from 'pino';

const logger = pino({ name: 'cohere', level: process.env.LOG_LEVEL ?? 'info' });

function logCohereCall(
  endpoint: string,
  model: string,
  durationMs: number,
  status: 'success' | 'error',
  meta?: Record<string, unknown>
) {
  logger[status === 'error' ? 'error' : 'info']({
    service: 'cohere',
    endpoint,
    model,
    durationMs,
    status,
    ...meta,
  });
}

// Combined instrumentation
async function observedCall<T>(
  endpoint: string,
  model: string,
  fn: () => Promise<T>
): Promise<T> {
  return tracedCohereCall(endpoint, model, () =>
    instrumentedCall(endpoint, model, async () => {
      const start = Date.now();
      try {
        const result = await fn();
        logCohereCall(endpoint, model, Date.now() - start, 'success', {
          tokens: (result as any)?.usage?.billedUnits,
        });
        return result;
      } catch (err) {
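        logCohereCall(endpoint, model, Date.now() - start, 'error', {
          // Only label timeouts as 'timeout'; other non-SDK errors are 'unknown'
          error:
            err instanceof CohereTimeoutError ? 'timeout'
            : err instanceof CohereError ? err.statusCode
            : 'unknown',
        });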
        throw err;
      }
    })
  );
}
```

### Step 5: Alert Rules

```yaml
# prometheus/cohere-alerts.yml
groups:
  - name: cohere
    rules:
      - alert: CohereHighErrorRate
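        expr: |
          # Aggregate both sides to the same label set so the division matches series
          sum by (endpoint) (rate(cohere_errors_total[5m]))
            /
          sum by (endpoint) (rate(cohere_requests_total[5m])) > 0.05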
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cohere error rate > 5%"
          description: "{{ $labels.endpoint }} error rate: {{ $value | humanizePercentage }}"

      - alert: CohereRateLimited
        expr: rate(cohere_errors_total{status_code="429"}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Cohere rate limiting detected"

      - alert: CohereHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(cohere_request_duration_seconds_bucket[5m])
          ) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cohere P95 latency > 10s"

      - alert: CohereAuthFailure
        expr: increase(cohere_errors_total{status_code="401"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Cohere authentication failure — check API key"

      - alert: CohereHighTokenBurn
        expr: sum(increase(cohere_tokens_total[1h])) > 100000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Cohere token burn rate > 100K/hour"
```
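
When tuning token thresholds, keep in mind that PromQL's `rate()` returns a per-second average over its window, whatever the window length, while `increase()` returns the total growth over the window. A budget stated per hour therefore has to be converted before it is compared against a `rate()` value. The helper names below are illustrative only.

```typescript
const SECONDS_PER_HOUR = 3600;

// Convert an hourly budget into the per-second value that rate() reports.
function perHourToPerSecond(perHour: number): number {
  return perHour / SECONDS_PER_HOUR;
}

// Convert a per-second rate() value back into an hourly figure.
function perSecondToPerHour(perSecond: number): number {
  return perSecond * SECONDS_PER_HOUR;
}

// A budget of 100K tokens per hour is roughly 27.8 tokens per second.
const budgetPerSecond = perHourToPerSecond(100_000);
```

Read the other way, a `rate()` threshold of 100000 corresponds to 360 million tokens per hour, which is why the unit matters when writing the expression.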

### Step 6: Metrics Endpoint

```typescript
// GET /metrics
import express from 'express';

const app = express();

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});
```

## Dashboard Panels (Grafana)

| Panel | Query | Type |
|-------|-------|------|
| Request Rate | `rate(cohere_requests_total[5m])` | Time series |
| Error Rate | `sum by (endpoint) (rate(cohere_errors_total[5m])) / sum by (endpoint) (rate(cohere_requests_total[5m]))` | Stat |
| P50/P95 Latency | `histogram_quantile(0.95, sum by (le) (rate(cohere_request_duration_seconds_bucket[5m])))` (use `0.5` for P50) | Time series |
| Token Usage (per hour) | `sum by (direction) (increase(cohere_tokens_total[1h]))` | Bar chart |
| Errors by Code | `sum by (status_code)(rate(cohere_errors_total[5m]))` | Pie chart |
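
The latency panel relies on `histogram_quantile`, which estimates a quantile by linear interpolation inside the bucket that contains the target rank. The sketch below is a simplified reimplementation meant only to show why bucket boundaries limit precision; it is not Prometheus's code, and the `Bucket` type and function name are assumptions made for illustration.

```typescript
interface Bucket {
  le: number;    // upper bound; Infinity stands for the +Inf bucket
  count: number; // cumulative number of observations <= le
}

// Simplified quantile estimate over cumulative histogram buckets.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0 || q > 1 || buckets.length < 2) return NaN;
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);

  // Rank falls in the +Inf bucket: the best available answer is the
  // highest finite upper bound.
  if (idx === sorted.length - 1) return sorted[sorted.length - 2].le;

  const lower = idx === 0 ? 0 : sorted[idx - 1].le;
  const below = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - below;
  if (inBucket === 0) return lower;

  // Assume observations are spread evenly inside the bucket.
  return lower + (sorted[idx].le - lower) * ((rank - below) / inBucket);
}
```

With the buckets from Step 1, any request slower than 30 seconds lands in the +Inf bucket, so the estimate never exceeds 30. If your timeouts are longer than that, consider adding a larger bucket.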

## Output
- Prometheus metrics for requests, latency, tokens, and errors
- OpenTelemetry traces with Cohere-specific attributes
- Structured JSON logging with pino
- Prometheus alerting rules (delivered through Alertmanager) for error rate, rate limiting, latency, auth failures, and token burn

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
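| Missing token metrics | Usage not in response | Check `response.usage.billedUnits` (Chat). Embed and Rerank responses report billed units under `response.meta.billedUnits`, which the wrapper above does not read |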
| High cardinality | Too many model labels | Use model family, not exact version |
| Alert storm | Threshold too low | Tune thresholds for your traffic |
| Trace gaps | Missing context propagation | Ensure OTel context flows through async |
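
For the high-cardinality row, one possible way to derive a model family label is to strip the date suffix before passing the model name to the metrics. This is a minimal sketch that assumes dated model names end in `-MM-YYYY`, as `command-a-03-2025` does; `modelFamily` is a hypothetical helper, not part of the Cohere SDK.

```typescript
// Strip a trailing "-MM-YYYY" release date so that all releases of a model
// share one label value. Names without such a suffix are returned unchanged.
function modelFamily(model: string): string {
  return model.replace(/-\d{2}-\d{4}$/, '');
}

const exampleLabel = modelFamily('command-a-03-2025'); // 'command-a'
```

The helper would be applied at the call site, for example by passing `modelFamily(model)` as the `model` argument of the instrumented wrapper while still sending the exact model name to the API.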

## Resources
- [Prometheus Naming Conventions](https://prometheus.io/docs/practices/naming/)
- [OpenTelemetry JS](https://opentelemetry.io/docs/languages/js/)
- [Cohere API Reference](https://docs.cohere.com/reference/about)

## Next Steps
For incident response, see `cohere-incident-runbook`.
