aggregating-gauge-metrics

Aggregate pre-computed metrics (gauge, counter, delta types) using OPAL. Use when analyzing request counts, error rates, resource utilization, or any numeric metrics over time. Covers align + m() + aggregate pattern, summary vs time-series output, and common aggregation functions. For percentile metrics (tdigest), see analyzing-tdigest-metrics skill.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

aggregating-gauge-metrics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using aggregating-gauge-metrics should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/aggregating-gauge-metrics/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/aggregating-gauge-metrics/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/aggregating-gauge-metrics/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How aggregating-gauge-metrics Compares

Feature / Agent	aggregating-gauge-metrics	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Aggregating Gauge Metrics

Pre-computed metrics in Observe store aggregated measurements at regular intervals (typically every 5 minutes). This skill teaches how to query gauge, counter, and delta metric types using OPAL.

## When to Use This Skill

- Analyzing request counts, error rates, or throughput metrics
- Tracking resource utilization (CPU, memory, network)
- Computing totals, averages, or rates across time periods
- Creating dashboards with time-series charts
- Working with any gauge, counter, or delta metric type
- When you need summary statistics or trends over time

## Prerequisites

- Access to Observe tenant via MCP
- Understanding that metrics are pre-aggregated (not raw events)
- Metric dataset with type: gauge, counter, or delta
- Use `discover_context()` to find and inspect metrics

## Key Concepts

### What Are Gauge Metrics?

**Gauge metrics** are pre-aggregated numeric measurements collected at regular intervals:

**Pre-aggregated**: Already summarized at collection time (typically 5-minute intervals)
- More efficient than querying raw data
- Faster query performance
- Lower storage costs

**Common Metric Types**:
- **Gauge**: Point-in-time value (CPU utilization, memory usage, queue depth)
- **Counter**: Monotonically increasing value (total requests, bytes sent)
- **Delta**: Change between intervals (requests per interval, errors per interval)

**Examples**:
- `span_call_count_5m` - Number of requests per 5-minute interval
- `span_error_count_5m` - Number of errors per 5-minute interval
- `system_cpu_utilization_ratio` - CPU utilization percentage
- `k8s_pod_memory_available_bytes` - Available memory in bytes

### CRITICAL: The align Verb is REQUIRED

Unlike datasets (Events/Intervals), metrics **MUST** use the `align` verb:

```opal
# WRONG - Will not work ❌
m("span_call_count_5m")
| statsby total:sum(metric)

# CORRECT - Must use align ✅
align options(bins: 1), rate:sum(m("span_call_count_5m"))
aggregate total_requests:sum(rate)
```

**Why align is required**: Metrics are stored as time-series data that must be aligned to a time grid before aggregation.

### Summary vs Time-Series Output

OPAL metrics queries can produce two different output types:

| Output Type | Pattern | Result | Use Case |
|-------------|---------|--------|----------|
| **Summary** | `options(bins: 1)` | One row per group | Totals, overall statistics |
| **Time-Series** | `5m`, `1h`, or default | Many rows per group | Trending, dashboards, charts |

**Summary pattern** - Single statistics across entire time range:
```opal
align options(bins: 1), rate:sum(m("metric"))
aggregate total:sum(rate), group_by(service_name)
```
Output: One row per service

**Time-series pattern** - Values over time buckets:
```opal
align 5m, rate:sum(m("metric"))
| aggregate total:sum(rate), group_by(service_name)
```
Output: Multiple rows per service (one per 5-minute bucket)

**CRITICAL Syntax Difference**:
- Summary (`bins: 1`): NO pipe `|` between align and aggregate
- Time-series (`5m`): YES pipe `|` between align and aggregate

## Discovery Workflow

**Step 1: Search for metrics**
```
discover_context("request count", result_type="metric")
discover_context("error", result_type="metric")
discover_context("cpu memory", result_type="metric")
```

**Step 2: Get detailed metric schema**
```
discover_context(metric_name="span_call_count_5m")
```

**Step 3: Verify metric type**
Look for: `Type: gauge` (or `counter`, `delta`)

**Step 4: Note available dimensions**
These are used for `group_by()`:
- `service_name`, `service_namespace`
- `environment`, `span_name`
- `k8s_namespace_name`, `k8s_pod_name`
- etc. (shown in discovery output)

**Step 5: Write query**
Use `align` + `m()` + `aggregate` pattern with correct dimensions

## Basic Patterns

### Pattern 1: Total Count Across Time Range

Get overall totals (summary output):

```opal
align options(bins: 1), rate:sum(m("span_call_count_5m"))
aggregate total_requests:sum(rate)
```

**Output**: Single row with total count across entire time range.

**No group_by**: Aggregates everything together.

### Pattern 2: Totals Per Group

Get totals broken down by dimension:

```opal
align options(bins: 1), rate:sum(m("span_call_count_5m"))
aggregate total_requests:sum(rate), group_by(service_name)
```

**Output**: One row per service with total requests.

**group_by**: Use any dimension from metric schema.

### Pattern 3: Average Values Per Group

Calculate averages across time range:

```opal
align options(bins: 1), cpu:avg(m("system_cpu_utilization_ratio"))
aggregate avg_cpu:avg(cpu), group_by(service_name)
```

**Output**: Average CPU utilization per service.

**avg() function**: Used twice - once in align, once in aggregate.

### Pattern 4: Multiple Aggregations

Compute several statistics together:

```opal
align options(bins: 1), rate:sum(m("span_call_count_5m"))
aggregate total:sum(rate),
          average:avg(rate),
          maximum:max(rate),
          group_by(service_name)
```

**Output**: Multiple columns per service (total, average, maximum).

### Pattern 5: Time-Series for Trending

Track values over time buckets:

```opal
align 5m, rate:sum(m("span_call_count_5m"))
| aggregate requests_per_5min:sum(rate), group_by(service_name)
```

**Output**: Multiple rows per service (one per 5-minute interval).

**Note**: Pipe `|` required after align for time-series pattern.

**Output columns**:
- `_c_bucket` - Time bucket identifier
- `valid_from`, `valid_to` - Bucket boundaries
- Metric values

## Common Use Cases

### Counting Total Requests by Service

```opal
align options(bins: 1), rate:sum(m("span_call_count_5m"))
aggregate total_requests:sum(rate), group_by(service_name)
| sort desc(total_requests)
| limit 10
```

**Use case**: Identify top services by request volume.

### Counting Errors with Fill for Zero Values

```opal
align options(bins: 1), errors:sum(m("span_error_count_5m"))
aggregate total_errors:sum(errors), group_by(service_name)
fill total_errors:0
```

**Use case**: Show all services, even those with zero errors.

**fill verb**: Replaces null values with 0.

### Tracking Request Rate Over Time

```opal
align 1h, rate:sum(m("span_call_count_5m"))
| aggregate requests_per_hour:sum(rate), group_by(service_name)
```

**Use case**: Hourly request trends for dashboards.

**Output**: Time-series data for charting.

### Multiple Metrics in One Query

```opal
align options(bins: 1),
      requests:sum(m("span_call_count_5m")),
      errors:sum(m("span_error_count_5m"))
aggregate total_requests:sum(requests),
          total_errors:sum(errors),
          group_by(service_name)
| make_col error_rate:float64(total_errors) / float64(total_requests)
```

**Use case**: Calculate error rate from two metrics.

**make_col**: Add derived column after aggregation.

### Resource Utilization Averages

```opal
align options(bins: 1), cpu:avg(m("system_cpu_utilization_ratio"))
aggregate avg_cpu:avg(cpu),
          max_cpu:max(cpu),
          group_by(k8s_pod_name)
| sort desc(avg_cpu)
| limit 20
```

**Use case**: Find pods with highest CPU usage.

## Complete Example

**Scenario**: You want to analyze request and error rates for your microservices over the last 24 hours.

**Step 1: Discover available metrics**
```
discover_context("request error", result_type="metric")
```

Found metrics:
- `span_call_count_5m` (type: gauge)
- `span_error_count_5m` (type: gauge)

**Step 2: Get metric details**
```
discover_context(metric_name="span_call_count_5m")
```

Available dimensions: `service_name`, `service_namespace`, `environment`, `span_name`

**Step 3: Query for summary statistics**
```opal
align options(bins: 1),
      requests:sum(m("span_call_count_5m")),
      errors:sum(m("span_error_count_5m"))
aggregate total_requests:sum(requests),
          total_errors:sum(errors),
          group_by(service_name)
fill total_errors:0
| make_col error_rate:float64(total_errors) / float64(total_requests) * 100.0
| sort desc(total_requests)
```

**Step 4: Interpret results**

| service_name | total_requests | total_errors | error_rate |
|--------------|----------------|--------------|------------|
| frontend-proxy | 15660 | 0 | 0.0 |
| frontend | 15263 | 35 | 0.23 |
| featureflagservice | 11693 | 0 | 0.0 |
| productcatalogservice | 8813 | 0 | 0.0 |

**Insight**: Frontend has a 0.23% error rate - investigate errors.

**Step 5: Get hourly trends**
```opal
align 1h,
      requests:sum(m("span_call_count_5m")),
      errors:sum(m("span_error_count_5m"))
| aggregate requests_per_hour:sum(requests),
            errors_per_hour:sum(errors),
            group_by(service_name)
| filter service_name = "frontend"
```

**Output**: Time-series showing frontend requests and errors per hour.

## Common Pitfalls

### Pitfall 1: Forgetting align Verb

❌ **Wrong**:
```opal
m("span_call_count_5m")
| statsby total:sum(metric)
```

✅ **Correct**:
```opal
align options(bins: 1), rate:sum(m("span_call_count_5m"))
aggregate total:sum(rate)
```

**Why**: Metrics MUST use `align` verb - it's required, not optional.

### Pitfall 2: Wrong Pipe Usage

❌ **Wrong** (pipe with bins:1):
```opal
align options(bins: 1), rate:sum(m("metric"))
| aggregate total:sum(rate)
```

❌ **Wrong** (no pipe with time duration):
```opal
align 5m, rate:sum(m("metric"))
aggregate total:sum(rate)
```

✅ **Correct**:
```opal
# Summary - NO pipe
align options(bins: 1), rate:sum(m("metric"))
aggregate total:sum(rate)

# Time-series - YES pipe
align 5m, rate:sum(m("metric"))
| aggregate total:sum(rate)
```

**Why**: Syntax differs between summary and time-series patterns.

### Pitfall 3: Grouping by Non-Existent Dimension

❌ **Wrong**:
```opal
align options(bins: 1), rate:sum(m("metric"))
aggregate total:sum(rate), group_by(service_name)
```
Error: "field 'service_name' does not exist"

✅ **Correct**:
```opal
# First: discover_context(metric_name="metric") to see available dimensions
# Then: use only dimensions that exist
align options(bins: 1), rate:sum(m("metric"))
aggregate total:sum(rate), group_by(correct_dimension_name)
```

**Why**: Not all metrics have the same dimensions - always check first.

### Pitfall 4: Using statsby Instead of aggregate

❌ **Wrong**:
```opal
align options(bins: 1), rate:sum(m("metric"))
statsby total:sum(rate)
```

✅ **Correct**:
```opal
align options(bins: 1), rate:sum(m("metric"))
aggregate total:sum(rate)
```

**Why**: After `align`, use `aggregate` (not `statsby` which is for datasets).

## Aggregation Functions Reference

Common functions used with gauge metrics:

```opal
# Summing values
align options(bins: 1), metric:sum(m("metric_name"))
aggregate total:sum(metric)

# Averaging values
align options(bins: 1), metric:avg(m("metric_name"))
aggregate average:avg(metric)

# Maximum value
align options(bins: 1), metric:max(m("metric_name"))
aggregate maximum:max(metric)

# Minimum value
align options(bins: 1), metric:min(m("metric_name"))
aggregate minimum:min(metric)

# Count of samples
align options(bins: 1), metric:count(m("metric_name"))
aggregate sample_count:count(metric)
```

**Pattern**: Function used in both `align` and `aggregate`.

## Time Bucket Options

Common time durations for time-series queries:

```opal
align 1m, ...    # 1-minute buckets
align 5m, ...    # 5-minute buckets (common)
align 15m, ...   # 15-minute buckets
align 1h, ...    # 1-hour buckets
align 1d, ...    # 1-day buckets
```

**Default**: `align` without duration uses automatic binning (300 bins).

## Best Practices

1. **Always use discover_context() first** to find metrics and check dimensions
2. **Verify metric type** - this skill is for gauge/counter/delta (NOT tdigest)
3. **Use summary pattern** (`bins: 1`) for single statistics, reports, totals
4. **Use time-series pattern** (`5m`, `1h`) for dashboards, trending, charts
5. **Remember pipe rule**: bins:1 = no pipe, time duration = yes pipe
6. **Use fill** to replace nulls with zeros for complete results
7. **Add sort + limit** for top-N queries to avoid overwhelming output
8. **Check available dimensions** before using group_by

## Related Skills

- **analyzing-tdigest-metrics** - For percentile metrics (latency, duration p95/p99)
- **time-series-analysis** - For event/interval trending with timechart (different from metrics)
- **aggregating-event-datasets** - For aggregating raw events with statsby (different from metrics)
- **working-with-intervals** - For calculating durations from raw interval data

## Summary

Gauge metrics are pre-aggregated measurements that **require** the `align` verb:

- **Core pattern**: `align` + `m()` + `aggregate`
- **Metric types**: gauge, counter, delta (NOT tdigest)
- **Two output modes**:
  - Summary: `options(bins: 1)` → one row per group, NO pipe
  - Time-series: `5m`, `1h` → many rows per group, YES pipe
- **Common functions**: sum, avg, max, min, count
- **Discovery**: Use `discover_context()` to find metrics and dimensions

**Key distinction**: Metrics are pre-aggregated (use `align`), while Events/Intervals are raw data (use `statsby`/`timechart`).

---
**Last Updated**: November 14, 2025
**Version**: 1.0
**Tested With**: Observe OPAL (ServiceExplorer/Service Metrics)

Related Skills

application-metrics

from diegosouzapw/awesome-omni-skill

Guide for instrumenting applications with metrics. Use when adding observability, monitoring, metrics, counters, gauges, or instrumentation to code. Covers API endpoints, databases, queues, caching, and locks.

agile-metrics

from diegosouzapw/awesome-omni-skill

Master agile metrics with velocity, burn-down charts, cycle time, and team health indicators for data-driven improvement.

sentry-setup-metrics

from diegosouzapw/awesome-omni-skill

Setup Sentry Metrics in any project. Use this when asked to add Sentry metrics, track custom metrics, setup counters/gauges/distributions, or instrument application performance metrics. Supports JavaScript, TypeScript, Python, React, Next.js, and Node.js.

x-metrics-tracker

from diegosouzapw/awesome-omni-skill

Track X account metrics (followers, posts, following) with daily snapshots via browser automation. Use when you need to monitor X profile growth, capture daily analytics, or view historical follower/post trends.

analytics-metrics

from diegosouzapw/awesome-omni-skill

Build data visualization and analytics dashboards. Use when creating charts, KPI displays, metrics dashboards, or data visualization components. Triggers on analytics, dashboard, charts, metrics, KPI, data visualization, Recharts.

aggregating-event-datasets

from diegosouzapw/awesome-omni-skill

Aggregate and summarize event datasets (logs) using OPAL statsby. Use when you need to count, sum, or calculate statistics across log events. Covers make_col for derived columns, statsby for aggregation, group_by for grouping, aggregation functions (count, sum, avg, percentile), and topk for top N results. Returns single summary row per group across entire time range. For time-series trends, see time-series-analysis skill.

analyze-move-risk-gauges-leadlag

from diegosouzapw/awesome-omni-skill

用公開市場數據檢查「利率波動率（MOVE）是否對利率事件（如 JGB 殖利率變動）不恐慌，並且是否領先帶動 VIX / 信用利差走低」。

startup-metrics-framework

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks about \\\"key startup metrics", "SaaS metrics", "CAC and LTV", "unit economics", "burn multiple", "rule of 40", "marketplace metrics", or requests...

solo-metrics-track

from diegosouzapw/awesome-omni-skill

Set up PostHog metrics plan with event funnel, KPI benchmarks, and kill/iterate/scale decision thresholds. Use when user says "set up metrics", "track KPIs", "PostHog events", "funnel analysis", "when to kill or scale", or "success metrics". Do NOT use for SEO metrics (use /seo-audit).

analyzing-tdigest-metrics

from diegosouzapw/awesome-omni-skill

Analyze percentile metrics (tdigest type) using OPAL for latency analysis and SLO tracking. Use when calculating p50, p95, p99 from pre-aggregated duration or latency metrics. Covers the critical double-combine pattern with align + m_tdigest() + tdigest_combine + aggregate. For simple metrics (counts, averages), see aggregating-gauge-metrics skill.

aggregating-performance-metrics

from diegosouzapw/awesome-omni-skill

Aggregate and centralize performance metrics from applications, systems, databases, caches, and services. Use when consolidating monitoring data from multiple sources. Trigger with phrases like "aggregate metrics", "centralize monitoring", or "collect performance data".

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development