notion-observability
Set up observability for Notion integrations with metrics, traces, and alerts. Use when implementing monitoring for Notion API calls, setting up dashboards, or configuring alerting for Notion integration health. Trigger with phrases like "notion monitoring", "notion metrics", "notion observability", "monitor notion", "notion alerts", "notion tracing".
Best use case
notion-observability is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Set up observability for Notion integrations with metrics, traces, and alerts. Use when implementing monitoring for Notion API calls, setting up dashboards, or configuring alerting for Notion integration health. Trigger with phrases like "notion monitoring", "notion metrics", "notion observability", "monitor notion", "notion alerts", "notion tracing".
Teams using notion-observability should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/notion-observability/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How notion-observability Compares
| Feature / Agent | notion-observability | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Set up observability for Notion integrations with metrics, traces, and alerts. Use when implementing monitoring for Notion API calls, setting up dashboards, or configuring alerting for Notion integration health. Trigger with phrases like "notion monitoring", "notion metrics", "notion observability", "monitor notion", "notion alerts", "notion tracing".
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
SKILL.md Source
# Notion Observability
## Overview
Instrument Notion API calls with metrics, structured logging, and alerting. Track request rates, latencies, error rates, and rate limit headroom. This skill covers a full observability stack: an instrumented client wrapper, Prometheus metrics with histogram buckets tuned for Notion's typical 200-800ms latency, structured logging via pino, health check endpoints, and Prometheus alerting rules for error rate spikes, rate limit exhaustion, high latency, and service outages.
## Prerequisites
- `@notionhq/client` v2+ installed (`npm install @notionhq/client`)
- Python alternative: `notion-client` (`pip install notion-client`)
- Prometheus-compatible metrics backend (optional: Grafana, Datadog, or CloudWatch)
- Structured logging library: `pino` (Node.js) or `structlog` (Python)
## Instructions
### Step 1: Instrumented Notion Client Wrapper
Wrap every Notion API call with timing, error classification, and structured logging:
```typescript
import { Client, isNotionClientError, APIErrorCode } from '@notionhq/client';
interface NotionMetrics {
requestCount: number;
errorCount: number;
rateLimitCount: number;
totalLatencyMs: number;
latencyBuckets: Map<string, number[]>;
lastError: { code: string; message: string; timestamp: string } | null;
}
class InstrumentedNotionClient {
private client: Client;
private metrics: NotionMetrics = {
requestCount: 0,
errorCount: 0,
rateLimitCount: 0,
totalLatencyMs: 0,
latencyBuckets: new Map(),
lastError: null,
};
constructor(auth: string, timeoutMs = 30_000) {
this.client = new Client({ auth, timeoutMs });
}
async call<T>(operation: string, fn: (client: Client) => Promise<T>): Promise<T> {
const start = performance.now();
this.metrics.requestCount++;
try {
const result = await fn(this.client);
const durationMs = Math.round(performance.now() - start);
this.metrics.totalLatencyMs += durationMs;
this.recordLatency(operation, durationMs);
console.log(JSON.stringify({
level: 'info',
service: 'notion',
operation,
durationMs,
status: 'ok',
timestamp: new Date().toISOString(),
}));
return result;
} catch (error) {
const durationMs = Math.round(performance.now() - start);
this.metrics.totalLatencyMs += durationMs;
this.metrics.errorCount++;
this.recordLatency(operation, durationMs);
let errorInfo: { code: string; message: string; status: number };
if (isNotionClientError(error)) {
errorInfo = { code: error.code, message: error.message, status: error.status };
if (error.code === APIErrorCode.RateLimited) {
this.metrics.rateLimitCount++;
}
} else {
errorInfo = { code: 'unknown', message: String(error), status: 0 };
}
this.metrics.lastError = {
code: errorInfo.code,
message: errorInfo.message,
timestamp: new Date().toISOString(),
};
console.log(JSON.stringify({
level: 'error',
service: 'notion',
operation,
durationMs,
status: 'error',
errorCode: errorInfo.code,
httpStatus: errorInfo.status,
message: errorInfo.message,
timestamp: new Date().toISOString(),
}));
throw error;
}
}
private recordLatency(operation: string, durationMs: number) {
const existing = this.metrics.latencyBuckets.get(operation) || [];
existing.push(durationMs);
this.metrics.latencyBuckets.set(operation, existing);
}
getMetrics(): NotionMetrics & { avgLatencyMs: number; p95LatencyMs: number } {
const allLatencies = Array.from(this.metrics.latencyBuckets.values()).flat().sort((a, b) => a - b);
const p95Index = Math.floor(allLatencies.length * 0.95);
return {
...this.metrics,
avgLatencyMs: this.metrics.requestCount > 0
? Math.round(this.metrics.totalLatencyMs / this.metrics.requestCount)
: 0,
p95LatencyMs: allLatencies[p95Index] ?? 0,
};
}
}
// Usage
const notion = new InstrumentedNotionClient(process.env.NOTION_TOKEN!);
const pages = await notion.call('databases.query', (client) =>
client.databases.query({ database_id: dbId, page_size: 50 })
);
const user = await notion.call('users.me', (client) =>
client.users.me({})
);
```
**Python — instrumented wrapper:**
```python
import time
import json
import logging
from notion_client import Client, APIResponseError
logger = logging.getLogger("notion")
class InstrumentedNotion:
def __init__(self, token: str):
self.client = Client(auth=token, timeout_ms=30_000)
self.request_count = 0
self.error_count = 0
self.rate_limit_count = 0
self.total_latency_ms = 0.0
def call(self, operation: str, fn):
start = time.monotonic()
self.request_count += 1
try:
result = fn(self.client)
duration_ms = round((time.monotonic() - start) * 1000)
self.total_latency_ms += duration_ms
logger.info(json.dumps({
"service": "notion", "operation": operation,
"duration_ms": duration_ms, "status": "ok",
}))
return result
except APIResponseError as e:
duration_ms = round((time.monotonic() - start) * 1000)
self.total_latency_ms += duration_ms
self.error_count += 1
if e.status == 429:
self.rate_limit_count += 1
logger.error(json.dumps({
"service": "notion", "operation": operation,
"duration_ms": duration_ms, "status": "error",
"error_code": e.code, "http_status": e.status,
}))
raise
# Usage
notion = InstrumentedNotion(os.environ["NOTION_TOKEN"])
pages = notion.call("databases.query",
lambda c: c.databases.query(database_id=db_id, page_size=50))
```
### Step 2: Prometheus Metrics Export
```typescript
import { Registry, Counter, Histogram, Gauge } from 'prom-client';
const registry = new Registry();
const notionRequests = new Counter({
name: 'notion_requests_total',
help: 'Total Notion API requests',
labelNames: ['operation', 'status'],
registers: [registry],
});
const notionDuration = new Histogram({
name: 'notion_request_duration_seconds',
help: 'Notion API request latency in seconds',
labelNames: ['operation'],
// Buckets tuned for Notion's typical 200-800ms response times
buckets: [0.1, 0.25, 0.5, 0.8, 1, 2, 5, 10],
registers: [registry],
});
const notionErrors = new Counter({
name: 'notion_errors_total',
help: 'Notion API errors by error code',
labelNames: ['code'],
registers: [registry],
});
const notionRateLimitRemaining = new Gauge({
name: 'notion_rate_limit_remaining',
help: 'Estimated remaining rate limit headroom',
registers: [registry],
});
// Wrap every Notion call with Prometheus instrumentation
async function instrumentedCall<T>(
operation: string,
fn: () => Promise<T>
): Promise<T> {
const timer = notionDuration.startTimer({ operation });
try {
const result = await fn();
notionRequests.inc({ operation, status: 'success' });
return result;
} catch (error) {
notionRequests.inc({ operation, status: 'error' });
if (isNotionClientError(error)) {
notionErrors.inc({ code: error.code });
}
throw error;
} finally {
timer();
}
}
// Expose /metrics endpoint for Prometheus scraping
app.get('/metrics', async (_req, res) => {
res.set('Content-Type', registry.contentType);
res.send(await registry.metrics());
});
```
### Step 3: Health Check, Structured Logging, and Alerting
**Health check endpoint:**
```typescript
app.get('/health/notion', async (_req, res) => {
const checks: Record<string, any> = {};
// Test Notion API connectivity
const start = Date.now();
try {
const me = await notion.call('health.users.me', (c) => c.users.me({}));
checks.notion = {
status: 'connected',
latencyMs: Date.now() - start,
botName: me.name,
};
} catch (error) {
checks.notion = {
status: 'disconnected',
latencyMs: Date.now() - start,
error: isNotionClientError(error) ? error.code : 'unknown',
};
}
const healthy = checks.notion.status === 'connected';
res.status(healthy ? 200 : 503).json({
status: healthy ? 'healthy' : 'degraded',
checks,
metrics: notion.getMetrics(),
timestamp: new Date().toISOString(),
});
});
```
**Structured logging with pino:**
```typescript
import pino from 'pino';
const logger = pino({
name: 'notion-integration',
level: process.env.LOG_LEVEL || 'info',
formatters: {
level: (label) => ({ level: label }),
},
});
function logNotionCall(
operation: string,
durationMs: number,
result: 'ok' | 'error',
details?: Record<string, unknown>
) {
const entry = {
service: 'notion',
operation,
durationMs,
result,
...details,
};
if (result === 'error') {
logger.error(entry, `notion.${operation} failed (${durationMs}ms)`);
} else if (durationMs > 2000) {
logger.warn(entry, `notion.${operation} slow (${durationMs}ms)`);
} else {
logger.info(entry, `notion.${operation} ok (${durationMs}ms)`);
}
}
function logRateLimit(operation: string, retryAfterMs: number) {
logger.warn({
service: 'notion',
event: 'rate_limited',
operation,
retryAfterMs,
}, `Rate limited on ${operation}. Retry in ${retryAfterMs}ms`);
}
```
**Prometheus alerting rules:**
```yaml
groups:
- name: notion_alerts
rules:
- alert: NotionHighErrorRate
expr: >
rate(notion_errors_total[5m]) /
rate(notion_requests_total[5m]) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "Notion API error rate exceeds 5%"
description: "Error rate is {{ $value | humanizePercentage }}"
- alert: NotionRateLimited
expr: increase(notion_errors_total{code="rate_limited"}[5m]) > 10
for: 1m
labels:
severity: warning
annotations:
summary: "Notion rate limit hits increasing"
- alert: NotionHighLatency
expr: >
histogram_quantile(0.95,
rate(notion_request_duration_seconds_bucket[5m])) > 3
for: 5m
labels:
severity: warning
annotations:
summary: "Notion P95 latency exceeds 3 seconds"
- alert: NotionDown
expr: increase(notion_errors_total{code="service_unavailable"}[5m]) > 5
for: 2m
labels:
severity: critical
annotations:
summary: "Notion API appears down (repeated 503 errors)"
```
## Output
- Instrumented Notion client tracking all API calls with per-operation latency buckets
- Prometheus metrics for request rate, latency histograms, and error counters
- Structured JSON logging via pino with slow-query warnings (>2s)
- Health check endpoint with Notion connectivity status and aggregate metrics
- Alerting rules for error rate spikes, rate limiting, high latency, and outages
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| High cardinality metrics | Too many unique label values | Use fixed operation names (`databases.query`, `pages.create`) |
| Alert storms on Notion outage | All alerts fire simultaneously | Add `group_wait: 30s` in alertmanager config |
| Missing metrics for some calls | Not all API calls use wrapper | Enforce wrapper at architecture level |
| Log volume too high in prod | DEBUG level enabled | Set `LOG_LEVEL=info` or `warn` in production |
| P95 latency unreliable | Too few samples | Ensure minimum 100 requests in window |
| Rate limit counter never fires | Wrong error code check | Use `APIErrorCode.RateLimited` constant |
## Examples
### Quick Metrics Dashboard Query (PromQL)
```promql
# Request rate by operation
rate(notion_requests_total[5m])
# Error percentage
100 * rate(notion_errors_total[5m]) / rate(notion_requests_total[5m])
# P95 latency per operation
histogram_quantile(0.95, rate(notion_request_duration_seconds_bucket[5m]))
# Rate limit events in last hour
increase(notion_errors_total{code="rate_limited"}[1h])
```
### Inline Metrics Check (No Prometheus)
```typescript
// Quick console-based metrics for debugging
setInterval(() => {
const m = notion.getMetrics();
console.log(
`[Notion] requests=${m.requestCount} errors=${m.errorCount} ` +
`rate_limits=${m.rateLimitCount} avg_latency=${m.avgLatencyMs}ms ` +
`p95_latency=${m.p95LatencyMs}ms`
);
}, 60_000); // Log every minute
```
## Resources
- [Notion Request Limits](https://developers.notion.com/reference/request-limits) — 3 requests/second average
- [Notion Error Codes](https://developers.notion.com/reference/errors) — full error code reference
- [Prometheus Naming Best Practices](https://prometheus.io/docs/practices/naming/)
- [pino Logger](https://getpino.io/) — fast structured logging for Node.js
- [Grafana Dashboard Templates](https://grafana.com/grafana/dashboards/) — pre-built API monitoring dashboards
## Next Steps
For incident response procedures when monitoring detects failures, see `notion-incident-runbook`.Related Skills
windsurf-observability
Monitor Windsurf AI adoption, feature usage, and team productivity metrics. Use when tracking AI feature usage, measuring ROI, setting up dashboards, or analyzing Cascade effectiveness across your team. Trigger with phrases like "windsurf monitoring", "windsurf metrics", "windsurf analytics", "windsurf usage", "windsurf adoption".
webflow-observability
Set up observability for Webflow integrations — Prometheus metrics for API calls, OpenTelemetry tracing, structured logging with pino, Grafana dashboards, and alerting for rate limits, errors, and latency. Trigger with phrases like "webflow monitoring", "webflow metrics", "webflow observability", "monitor webflow", "webflow alerts", "webflow tracing".
vercel-observability
Set up Vercel observability with runtime logs, analytics, log drains, and OpenTelemetry tracing. Use when implementing monitoring for Vercel deployments, setting up log drains, or configuring alerting for function errors and performance. Trigger with phrases like "vercel monitoring", "vercel metrics", "vercel observability", "vercel logs", "vercel alerts", "vercel tracing".
veeva-observability
Veeva Vault observability for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva observability".
vastai-observability
Monitor Vast.ai GPU instance health, utilization, and costs. Use when setting up monitoring dashboards, configuring alerts, or tracking GPU utilization and spending. Trigger with phrases like "vastai monitoring", "vastai metrics", "vastai observability", "monitor vastai", "vastai alerts".
twinmind-observability
Monitor TwinMind transcription quality, meeting coverage, action item extraction rates, and memory vault health. Use when implementing observability, or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind observability", "twinmind observability".
speak-observability
Monitor Speak API health, assessment latency, session metrics, and pronunciation score distributions. Use when implementing observability, or managing Speak language learning platform operations. Trigger with phrases like "speak observability", "speak observability".
snowflake-observability
Set up Snowflake observability using ACCOUNT_USAGE views, alerts, and external monitoring. Use when implementing Snowflake monitoring dashboards, setting up query performance tracking, or configuring alerting for warehouse and pipeline health. Trigger with phrases like "snowflake monitoring", "snowflake metrics", "snowflake observability", "snowflake dashboard", "snowflake alerts".
shopify-observability
Set up observability for Shopify app integrations with query cost tracking, rate limit monitoring, webhook delivery metrics, and structured logging. Trigger with phrases like "shopify monitoring", "shopify metrics", "shopify observability", "monitor shopify API", "shopify alerts", "shopify dashboard".
salesforce-observability
Set up observability for Salesforce integrations with API limit monitoring, error tracking, and alerting. Use when implementing monitoring for Salesforce operations, tracking API consumption, or configuring alerting for Salesforce integration health. Trigger with phrases like "salesforce monitoring", "salesforce metrics", "salesforce observability", "monitor salesforce", "salesforce alerts", "salesforce API usage dashboard".
retellai-observability
Retell AI observability — AI voice agent and phone call automation. Use when working with Retell AI for voice agents, phone calls, or telephony. Trigger with phrases like "retell observability", "retellai-observability", "voice agent".
replit-observability
Monitor Replit deployments with health checks, uptime tracking, resource usage, and alerting. Use when setting up monitoring for Replit apps, building health dashboards, or configuring alerting for deployment health and performance. Trigger with phrases like "replit monitoring", "replit metrics", "replit observability", "monitor replit", "replit alerts", "replit uptime".