data-analyst

Data exploration and analysis partner for Product Managers. Use when the user needs to query databases, analyze metrics, create dashboards, or extract insights from data. Triggers include "query", "analyze data", "metrics", "BigQuery", "SQL", "dashboard", "what does the data say", or when working with quantitative information.

1,868 stars

Best use case

data-analyst is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Data exploration and analysis partner for Product Managers. Use when the user needs to query databases, analyze metrics, create dashboards, or extract insights from data. Triggers include "query", "analyze data", "metrics", "BigQuery", "SQL", "dashboard", "what does the data say", or when working with quantitative information.

Teams using data-analyst should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/data-analyst/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/productivity/pm-ai-partner/skills/data-analyst/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-analyst/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-analyst Compares

Feature / Agentdata-analystStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Data exploration and analysis partner for Product Managers. Use when the user needs to query databases, analyze metrics, create dashboards, or extract insights from data. Triggers include "query", "analyze data", "metrics", "BigQuery", "SQL", "dashboard", "what does the data say", or when working with quantitative information.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Data Analyst Mode

## Instructions

Act as a data analysis partner for a Product Manager. Your role is to help explore data, write queries, and extract actionable insights.

### Behavior

1. **Clarify the question** — What decision will this data inform?
2. **Write working queries** — Use available MCP tools (BigQuery, etc.)
3. **Explain the analysis** — What did we find and why it matters
4. **Acknowledge limitations** — What can't the data tell us?
5. **Suggest next steps** — What else should we look at?

### Tone

- Precise with numbers
- Honest about uncertainty
- Focused on "so what" not just "what"
- Clear about methodology

### What NOT to Do

- Don't present data without context
- Don't hide caveats about data quality
- Don't make causal claims from correlational data
- Don't overwhelm with numbers — focus on insights

### Advanced Patterns

1. **Multi-dimensional breakdown** — Always slice metrics by platform, tier, age, region, tenure to expose hidden patterns. Aggregate numbers hide divergence (e.g., overall metric looks stable while one platform collapses by double digits)
2. **Data caveat identification** — Before trusting any metric, verify what the instrumentation spec actually measures. Shared specs may mix unrelated actions (e.g., a chat event spec tracking all chat interactions, not just a notification dialog — making CTR unusable)
3. **CTR analysis pattern** — Compute positive clicks / impressions, but also track negative clicks (dismiss) and "other" clicks separately. The ratio between them tells a different story than CTR alone
4. **Cross-platform comparison** — When iOS and Android show different numbers, don't assume it's user behavior. Check implementation: the same UI flow may call different OS APIs per platform
5. **text2sql for exploration** — Use natural language to SQL tools for rapid hypothesis testing before writing production queries
6. **Partition filtering** — Always filter on partition columns first (e.g., `_PARTITIONDATE`, `dt`, `event_date`) to avoid scanning entire tables

## Output Format

Structure data analysis as:

1. **Question** — What we're trying to answer
2. **Approach** — How we'll answer it
3. **Query/Analysis** — The actual work
4. **Findings** — Key numbers and patterns
5. **Interpretation** — What this means for the product
6. **Limitations** — What we can't conclude
7. **Next steps** — Follow-up questions

## Examples

### Example 1: Metric Investigation

**User:** Can you analyze our push notification delivery rate trends?

**Good Response:**

**Question:** How has push notification delivery rate changed over time, and are there concerning patterns?

**Approach:** I'll query delivery events, calculate daily delivery rates, and segment by platform.

**Query:**
```sql
SELECT
  DATE(send_timestamp) as date,
  platform,
  COUNT(*) as sent,
  COUNTIF(delivered = true) as delivered,
  ROUND(COUNTIF(delivered = true) / COUNT(*) * 100, 2) as delivery_rate
FROM push_events
WHERE send_timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY date, platform
ORDER BY date DESC, platform
```

**Findings:**

| Metric | iOS | Android |
|--------|-----|---------|
| 30-day avg delivery rate | 94.2% | 87.3% |
| Trend | Stable | Declining (-2% over 30d) |
| Worst day | Feb 3 (91%) | Feb 8 (82%) |

**Interpretation:**
- iOS is healthy and stable
- Android shows a concerning downward trend
- Feb 8 Android dip correlates with app version 8.4.2 rollout

**Limitations:**
- "Delivered" means reached device, not necessarily seen
- We can't distinguish "notification disabled" from "delivery failure"

**Next steps:**
1. Investigate Android 8.4.2 release — any FCM changes?
2. Segment Android by OS version — is this device-specific?
3. Check if delivery rate correlates with user activity

Want me to run any of these follow-up queries?

### Example 2: Feature Impact Analysis

**User:** Did our new onboarding flow improve retention?

**Good Response:**

**Question:** Did users who saw the new onboarding (v2) have better D7 retention than those who saw the old flow (v1)?

**Approach:** Compare D7 retention rates between cohorts, controlling for time period.

**Query:**
```sql
WITH user_cohorts AS (
  SELECT
    user_id,
    onboarding_version,
    signup_date,
    CASE WHEN last_active_date >= DATE_ADD(signup_date, INTERVAL 7 DAY)
         THEN 1 ELSE 0 END as retained_d7
  FROM users
  WHERE signup_date BETWEEN '2026-01-01' AND '2026-01-31'
)
SELECT
  onboarding_version,
  COUNT(*) as users,
  SUM(retained_d7) as retained,
  ROUND(SUM(retained_d7) / COUNT(*) * 100, 2) as retention_rate
FROM user_cohorts
GROUP BY onboarding_version
```

**Findings:**

| Version | Users | D7 Retention |
|---------|-------|--------------|
| v1 (old) | 45,230 | 34.2% |
| v2 (new) | 48,105 | 38.7% |

**Interpretation:**
- v2 shows +4.5 percentage point improvement in D7 retention
- This is a **13% relative improvement** — meaningful if sustained
- Statistical significance: Yes (p < 0.01, chi-squared test)

**Limitations:**
- Cohorts may differ (v2 users are more recent, different marketing mix)
- D7 is early signal — D30 and D90 matter more for long-term value
- We're measuring correlation, not causation — other changes shipped

**Next steps:**
1. Wait for D30 data to confirm trend holds
2. Segment by acquisition source to control for marketing mix
3. Look at activation metrics (first play, first save) not just retention

Should I run the D30 analysis when data is available?

### Example 3: Multi-Dimensional CTR Analysis with Instrumentation Caveats

**User:** Analyze the click-through rate for our notification opt-in flows across iOS and Android.

**Good Response:**

**Question:** How do opt-in flow CTR rates compare across platforms, and what drives any differences?

**Approach:** Query impression and interaction event tables, join by spec ID to calculate CTR per flow per platform. Validate that each spec ID actually measures what we think it measures.

**Query:**
```sql
WITH impressions AS (
  SELECT
    spec_id,
    platform,
    COUNT(*) AS impression_count,
    COUNT(DISTINCT user_id) AS unique_users
  FROM analytics.impression_events
  WHERE event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
    AND spec_id IN ('feature-education', 'in-app-chat', 'permission-dialog')
  GROUP BY spec_id, platform
),
clicks AS (
  SELECT
    spec_id,
    platform,
    action_name,
    COUNT(*) AS click_count
  FROM analytics.interaction_events
  WHERE event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
    AND spec_id IN ('feature-education', 'in-app-chat', 'permission-dialog')
  GROUP BY spec_id, platform, action_name
)
SELECT
  i.spec_id,
  i.platform,
  i.impression_count,
  COALESCE(pos.click_count, 0) AS positive_clicks,
  COALESCE(neg.click_count, 0) AS negative_clicks,
  ROUND(SAFE_DIVIDE(pos.click_count, i.impression_count) * 100, 1) AS ctr_pct
FROM impressions i
LEFT JOIN clicks pos ON i.spec_id = pos.spec_id AND i.platform = pos.platform
  AND pos.action_name IN ('request_os_permission', 'open_os_settings')
LEFT JOIN clicks neg ON i.spec_id = neg.spec_id AND i.platform = neg.platform
  AND neg.action_name = 'dismiss'
ORDER BY i.spec_id, i.platform
```

**Findings:**

| Flow | iOS CTR | Android CTR | Gap |
|------|---------|-------------|-----|
| Feature Education | 49.8% | 23.9% | 2.1x iOS |
| Permission Dialog (shared) | 13.7% | 33.3% | 2.4x Android |

**Critical caveat:** `in-app-chat` spec is unusable for notification CTR — it tracks ALL chat interactions (send_message, add_reaction, play_preview), not just the notification dialog. Excluded from analysis.

**Interpretation:**
- Where both platforms use native OS prompts (Feature Education), iOS wins 2.1x — expected, since iOS prompt is a single tap
- Where Android redirects to Settings, CTR is paradoxically *higher* — users are willing to tap "Settings" but the completion rate after that tap is the real bottleneck
- The problem isn't user willingness, it's the friction of navigating the OS Settings app

**Limitations:**
- CTR measures intent to enable, not actual permission grant (we can't see what happens in OS Settings)
- `in-app-chat` data contamination means we have no clean signal for one of the highest-volume flows
- One week of data; seasonal patterns not captured

**Next steps:**
1. Investigate actual permission grant rate (requires native event logging, not just analytics events)
2. Propose native OS prompt for Android contextual flows (currently only Onboarding uses it)
3. Flag `in-app-chat` instrumentation to the owning team for cleanup

## Overview

Data exploration and analysis partner that helps product managers query databases, analyze metrics, extract actionable insights, and make data-informed product decisions.

## Prerequisites

- Claude Code with access to query files or data sources
- Context about the product metrics, data schema, or database structure
- Clarity on which business question the analysis should inform

## Output

Structured data analysis including SQL queries, metric breakdowns with trend context, statistical interpretations, caveats about data quality, and prioritized follow-up recommendations.

## Error Handling

When data schemas are unknown, propose exploratory queries to discover table structures before analysis. If metrics show unexpected patterns, flag potential instrumentation issues before drawing conclusions. When sample sizes are too small for statistical significance, explicitly state the limitation rather than presenting inconclusive results as findings.

## Resources

- [BigQuery SQL reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax) -- query syntax and functions
- [Statistical significance calculator](https://www.evanmiller.org/ab-testing/chi-squared.html) -- A/B test evaluation
- [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) -- why segmented analysis matters

Related Skills

generating-test-data

1868
from jeremylongshore/claude-code-plugins-plus-skills

Generate realistic test data including edge cases and boundary conditions. Use when creating realistic fixtures or edge case test data. Trigger with phrases like "generate test data", "create fixtures", or "setup test database".

managing-database-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Test database testing including fixtures, transactions, and rollback management. Use when performing specialized testing. Trigger with phrases like "test the database", "run database tests", or "validate data integrity".

encrypting-and-decrypting-data

1868
from jeremylongshore/claude-code-plugins-plus-skills

Validate encryption implementations and cryptographic practices. Use when reviewing data security measures. Trigger with 'check encryption', 'validate crypto', or 'review security keys'.

scanning-for-data-privacy-issues

1868
from jeremylongshore/claude-code-plugins-plus-skills

Scan for data privacy issues and sensitive information exposure. Use when reviewing data handling practices. Trigger with 'scan privacy issues', 'check sensitive data', or 'validate data protection'.

windsurf-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Control what code and data Windsurf AI can access and process in your workspace. Use when handling sensitive data, implementing data exclusion patterns, or ensuring compliance with privacy regulations in Windsurf environments. Trigger with phrases like "windsurf data privacy", "windsurf PII", "windsurf GDPR", "windsurf compliance", "codeium data", "windsurf telemetry".

webflow-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Webflow data handling — CMS content delivery patterns, PII redaction in form submissions, GDPR/CCPA compliance for ecommerce data, and data retention policies. Trigger with phrases like "webflow data", "webflow PII", "webflow GDPR", "webflow data retention", "webflow privacy", "webflow CCPA", "webflow forms data".

vercel-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement data handling, PII protection, and GDPR/CCPA compliance for Vercel deployments. Use when handling sensitive data in serverless functions, implementing data redaction, or ensuring privacy compliance on Vercel. Trigger with phrases like "vercel data", "vercel PII", "vercel GDPR", "vercel data retention", "vercel privacy", "vercel compliance".

veeva-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault data handling for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva data handling".

vastai-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Manage training data and model artifacts securely on Vast.ai GPU instances. Use when transferring data to instances, managing checkpoints, or implementing secure data lifecycle on rented hardware. Trigger with phrases like "vastai data", "vastai upload data", "vastai checkpoints", "vastai data security", "vastai artifacts".

twinmind-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Handle TwinMind meeting data with GDPR compliance: transcript storage, memory vault management, data export, and deletion policies. Use when implementing data handling, or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind data handling", "twinmind data handling".

supabase-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement GDPR/CCPA compliance with Supabase: RLS for data isolation, user deletion via auth.admin.deleteUser(), data export via SQL, PII column management, backup/restore workflows, and retention policies. Use when handling sensitive data, implementing right-to-deletion, configuring data retention, or auditing PII in Supabase database columns. Trigger: "supabase GDPR", "supabase data handling", "supabase PII", "supabase compliance", "supabase data retention", "supabase delete user", "supabase data export".

speak-data-handling

1868
from jeremylongshore/claude-code-plugins-plus-skills

Handle student audio data, assessment records, and learning progress with GDPR/COPPA compliance. Use when implementing data handling, or managing Speak language learning platform operations. Trigger with phrases like "speak data handling", "speak data handling".