snowflake-observability

Set up Snowflake observability using ACCOUNT_USAGE views, alerts, and external monitoring. Use when implementing Snowflake monitoring dashboards, setting up query performance tracking, or configuring alerting for warehouse and pipeline health. Trigger with phrases like "snowflake monitoring", "snowflake metrics", "snowflake observability", "snowflake dashboard", "snowflake alerts".

1,868 stars

Best use case

snowflake-observability is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Set up Snowflake observability using ACCOUNT_USAGE views, alerts, and external monitoring. Use when implementing Snowflake monitoring dashboards, setting up query performance tracking, or configuring alerting for warehouse and pipeline health. Trigger with phrases like "snowflake monitoring", "snowflake metrics", "snowflake observability", "snowflake dashboard", "snowflake alerts".

Teams using snowflake-observability should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/snowflake-observability/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/snowflake-pack/skills/snowflake-observability/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/snowflake-observability/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How snowflake-observability Compares

Feature / Agentsnowflake-observabilityStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Set up Snowflake observability using ACCOUNT_USAGE views, alerts, and external monitoring. Use when implementing Snowflake monitoring dashboards, setting up query performance tracking, or configuring alerting for warehouse and pipeline health. Trigger with phrases like "snowflake monitoring", "snowflake metrics", "snowflake observability", "snowflake dashboard", "snowflake alerts".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Snowflake Observability

## Overview

Set up comprehensive observability for Snowflake using built-in ACCOUNT_USAGE views, Snowflake Alerts, and integration with external monitoring systems.

## Prerequisites

- Role with access to `SNOWFLAKE.ACCOUNT_USAGE` (ACCOUNTADMIN or granted)
- Notification integration configured for alerts
- Optional: Prometheus/Grafana or Datadog for external dashboards

## Instructions

### Step 1: Key Monitoring Queries

```sql
-- === QUERY PERFORMANCE ===
-- Average query time by warehouse (last 7 days)
SELECT warehouse_name,
       COUNT(*) AS query_count,
       ROUND(AVG(total_elapsed_time) / 1000, 1) AS avg_seconds,
       ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time) / 1000, 1) AS p95_seconds,
       ROUND(PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY total_elapsed_time) / 1000, 1) AS p99_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(days, -7, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
  AND query_type = 'SELECT'
GROUP BY warehouse_name
ORDER BY avg_seconds DESC;

-- === ERROR RATE ===
-- Error rate by hour
SELECT DATE_TRUNC('hour', start_time) AS hour,
       COUNT_IF(execution_status = 'SUCCESS') AS success,
       COUNT_IF(execution_status = 'FAIL') AS failures,
       ROUND(COUNT_IF(execution_status = 'FAIL') * 100.0 /
             NULLIF(COUNT(*), 0), 2) AS error_rate_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(hours, -24, CURRENT_TIMESTAMP())
GROUP BY hour
ORDER BY hour;

-- === CREDIT CONSUMPTION ===
-- Hourly credit usage
SELECT DATE_TRUNC('hour', start_time) AS hour,
       warehouse_name,
       SUM(credits_used) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(hours, -24, CURRENT_TIMESTAMP())
GROUP BY hour, warehouse_name
ORDER BY hour DESC, credits DESC;

-- === STORAGE GROWTH ===
-- Daily storage trend
SELECT usage_date,
       ROUND(storage_bytes / 1e12, 3) AS storage_tb,
       ROUND(stage_bytes / 1e12, 3) AS stage_tb,
       ROUND(failsafe_bytes / 1e12, 3) AS failsafe_tb
FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
WHERE usage_date >= DATEADD(days, -30, CURRENT_DATE())
ORDER BY usage_date;
```

### Step 2: Built-in Snowflake Alerts

```sql
-- Alert: High error rate
CREATE OR REPLACE ALERT high_error_rate_alert
  WAREHOUSE = ANALYTICS_WH
  SCHEDULE = '15 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE start_time >= DATEADD(minutes, -15, CURRENT_TIMESTAMP())
    GROUP BY ALL
    HAVING COUNT_IF(execution_status = 'FAIL') * 100.0 / COUNT(*) > 5
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'ops_notifications',
      'oncall@company.com',
      'Snowflake: Error rate > 5%',
      'Query error rate exceeded 5% in the last 15 minutes.'
    );

-- Alert: Warehouse stuck running (no auto-suspend)
CREATE OR REPLACE ALERT warehouse_running_alert
  WAREHOUSE = ANALYTICS_WH
  SCHEDULE = '60 MINUTE'
  IF (EXISTS (
    SELECT 1 FROM INFORMATION_SCHEMA.WAREHOUSES
    WHERE state = 'STARTED'
      AND DATEDIFF('hour', COALESCE(resumed_on, created_on), CURRENT_TIMESTAMP()) > 4
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'ops_notifications',
      'ops@company.com',
      'Snowflake: Warehouse running > 4 hours',
      'A warehouse has been running for over 4 hours. Check auto-suspend settings.'
    );

-- Alert: Task failures
CREATE OR REPLACE ALERT task_failure_alert
  WAREHOUSE = ANALYTICS_WH
  SCHEDULE = '10 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
      SCHEDULED_TIME_RANGE_START => DATEADD(minutes, -10, CURRENT_TIMESTAMP())
    ))
    WHERE state = 'FAILED'
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'ops_notifications',
      'oncall@company.com',
      'Snowflake: Task Failure',
      'One or more Snowflake tasks failed. Check TASK_HISTORY for details.'
    );

-- Resume all alerts
ALTER ALERT high_error_rate_alert RESUME;
ALTER ALERT warehouse_running_alert RESUME;
ALTER ALERT task_failure_alert RESUME;
```

### Step 3: Export Metrics to External Systems

```typescript
// src/snowflake/metrics-exporter.ts
// Export Snowflake metrics to Prometheus/Datadog

interface SnowflakeMetrics {
  queryCount: number;
  errorRate: number;
  avgLatencyMs: number;
  p95LatencyMs: number;
  creditsUsed: number;
  activeWarehouses: number;
}

async function collectSnowflakeMetrics(
  conn: snowflake.Connection
): Promise<SnowflakeMetrics> {
  const [queryStats] = await query(conn, `
    SELECT
      COUNT(*) AS query_count,
      COUNT_IF(execution_status = 'FAIL') * 100.0 / NULLIF(COUNT(*), 0) AS error_rate,
      AVG(total_elapsed_time) AS avg_latency,
      PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time) AS p95_latency
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE start_time >= DATEADD(minutes, -5, CURRENT_TIMESTAMP())
  `).then(r => r.rows);

  const [creditStats] = await query(conn, `
    SELECT COALESCE(SUM(credits_used), 0) AS credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE start_time >= CURRENT_DATE()
  `).then(r => r.rows);

  const [whStats] = await query(conn, `
    SELECT COUNT_IF(state = 'STARTED') AS active
    FROM INFORMATION_SCHEMA.WAREHOUSES
  `).then(r => r.rows);

  return {
    queryCount: queryStats.QUERY_COUNT,
    errorRate: queryStats.ERROR_RATE,
    avgLatencyMs: queryStats.AVG_LATENCY,
    p95LatencyMs: queryStats.P95_LATENCY,
    creditsUsed: creditStats.CREDITS,
    activeWarehouses: whStats.ACTIVE,
  };
}

// Prometheus exposition format
function formatPrometheus(metrics: SnowflakeMetrics): string {
  return [
    `snowflake_queries_total ${metrics.queryCount}`,
    `snowflake_error_rate_percent ${metrics.errorRate}`,
    `snowflake_avg_latency_ms ${metrics.avgLatencyMs}`,
    `snowflake_p95_latency_ms ${metrics.p95LatencyMs}`,
    `snowflake_credits_used_today ${metrics.creditsUsed}`,
    `snowflake_active_warehouses ${metrics.activeWarehouses}`,
  ].join('\n');
}
```

### Step 4: Operational Dashboard Queries

```sql
-- Pipeline health dashboard
SELECT
  'Tasks' AS component,
  COUNT_IF(state = 'started') AS running,
  COUNT_IF(state = 'suspended') AS suspended,
  (SELECT COUNT_IF(state = 'FAILED')
   FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
     SCHEDULED_TIME_RANGE_START => DATEADD(hours, -24, CURRENT_TIMESTAMP())
   ))) AS failures_24h
FROM INFORMATION_SCHEMA.TASKS

UNION ALL

SELECT 'Pipes',
  COUNT_IF(is_autoingest_enabled = 'true'), 0,
  0  -- Check PIPE_USAGE_HISTORY for errors
FROM INFORMATION_SCHEMA.PIPES

UNION ALL

SELECT 'Streams',
  COUNT_IF(stale = FALSE),
  COUNT_IF(stale = TRUE), 0
FROM INFORMATION_SCHEMA.STREAMS;
```

## Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| ACCOUNT_USAGE latency | Views have up to 45min lag | Use INFORMATION_SCHEMA for real-time data |
| Alert not firing | Alert suspended | `ALTER ALERT x RESUME` |
| Metrics gaps | Warehouse suspended | Only active warehouses report metrics |
| Email not delivered | Notification integration misconfigured | Check `ALLOWED_RECIPIENTS` |

## Resources

- [Account Usage Views](https://docs.snowflake.com/en/sql-reference/account-usage)
- [QUERY_HISTORY](https://docs.snowflake.com/en/sql-reference/account-usage/query_history)
- [WAREHOUSE_METERING_HISTORY](https://docs.snowflake.com/en/sql-reference/account-usage/warehouse_metering_history)
- [Snowflake Alerts](https://docs.snowflake.com/en/user-guide/alerts)

## Next Steps

For incident response, see `snowflake-incident-runbook`.

Related Skills

windsurf-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor Windsurf AI adoption, feature usage, and team productivity metrics. Use when tracking AI feature usage, measuring ROI, setting up dashboards, or analyzing Cascade effectiveness across your team. Trigger with phrases like "windsurf monitoring", "windsurf metrics", "windsurf analytics", "windsurf usage", "windsurf adoption".

webflow-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up observability for Webflow integrations — Prometheus metrics for API calls, OpenTelemetry tracing, structured logging with pino, Grafana dashboards, and alerting for rate limits, errors, and latency. Trigger with phrases like "webflow monitoring", "webflow metrics", "webflow observability", "monitor webflow", "webflow alerts", "webflow tracing".

vercel-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up Vercel observability with runtime logs, analytics, log drains, and OpenTelemetry tracing. Use when implementing monitoring for Vercel deployments, setting up log drains, or configuring alerting for function errors and performance. Trigger with phrases like "vercel monitoring", "vercel metrics", "vercel observability", "vercel logs", "vercel alerts", "vercel tracing".

veeva-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault observability for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva observability".

vastai-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor Vast.ai GPU instance health, utilization, and costs. Use when setting up monitoring dashboards, configuring alerts, or tracking GPU utilization and spending. Trigger with phrases like "vastai monitoring", "vastai metrics", "vastai observability", "monitor vastai", "vastai alerts".

twinmind-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor TwinMind transcription quality, meeting coverage, action item extraction rates, and memory vault health. Use when implementing observability, or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind observability", "twinmind observability".

speak-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor Speak API health, assessment latency, session metrics, and pronunciation score distributions. Use when implementing observability, or managing Speak language learning platform operations. Trigger with phrases like "speak observability", "speak observability".

snowflake-upgrade-migration

1868
from jeremylongshore/claude-code-plugins-plus-skills

Upgrade Snowflake drivers, handle breaking changes, and migrate between editions. Use when upgrading snowflake-sdk or snowflake-connector-python versions, migrating between Snowflake editions, or handling deprecations. Trigger with phrases like "upgrade snowflake", "snowflake migration", "snowflake breaking changes", "update snowflake driver", "snowflake version".

snowflake-security-basics

1868
from jeremylongshore/claude-code-plugins-plus-skills

Apply Snowflake security best practices: network policies, key rotation, MFA, encryption, and least-privilege access. Use when securing Snowflake access, implementing network policies, or auditing security configuration. Trigger with phrases like "snowflake security", "snowflake network policy", "secure snowflake", "snowflake MFA", "snowflake encryption".

snowflake-sdk-patterns

1868
from jeremylongshore/claude-code-plugins-plus-skills

Apply production-ready Snowflake SDK patterns for snowflake-sdk and snowflake-connector-python. Use when implementing connection pooling, async execute wrappers, streaming results, or establishing team coding standards for Snowflake. Trigger with phrases like "snowflake SDK patterns", "snowflake best practices", "snowflake code patterns", "idiomatic snowflake", "snowflake connection pool".

snowflake-reliability-patterns

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Snowflake reliability patterns: replication, failover, Time Travel recovery, and application-level resilience for Snowflake integrations. Use when building fault-tolerant pipelines, configuring disaster recovery, or adding resilience to production Snowflake services. Trigger with phrases like "snowflake reliability", "snowflake failover", "snowflake replication", "snowflake disaster recovery", "snowflake Time Travel".

snowflake-reference-architecture

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Snowflake reference architecture with medallion pattern and Snowflake-native design. Use when designing a new Snowflake data platform, setting up bronze/silver/gold layers, or establishing architecture standards for a Snowflake deployment. Trigger with phrases like "snowflake architecture", "snowflake medallion", "snowflake best practices layout", "snowflake data platform design".