AI Agent Skill HUB

Claude, Data & Research

Skill: Explore Data

154 stars

by ai-analyst-lab

Complexity: medium

View on GitHub Installation ↓

About this skill

The Explore Data skill gives an AI agent a quick, interactive way to inspect a connected dataset without running a full analytical pipeline. It helps the agent understand the structure, content, and quality of the data and spot potential issues or patterns early.

The skill operates in one of three modes. A dataset overview lists tables with row counts and suggests initial questions. A table exploration shows column types, null rates, sample rows, and key statistics. A column deep-dive presents the distribution, null analysis, and outlier detection for a single column.

Use it after connecting a new dataset, when you need to understand the data's shape without a specific analytical question, or to form initial hypotheses before committing to a formal analysis.

Best use case

The primary use case is initial data reconnaissance and understanding newly connected or unfamiliar datasets. Data analysts, scientists, and any user interacting with an AI agent to extract insights from data will benefit by quickly grasping the dataset's structure, content, and potential issues before starting formal analysis.

Users should expect a clear, interactive summary of their dataset, a specific table, or a detailed breakdown of a column, including key statistics, samples, and potential data quality flags.

Practical example

Example input

Explore the `customer_transactions` table, focusing on the `transaction_amount` column.

Example output

**Column Deep-Dive: `customer_transactions.transaction_amount`**
- **Type:** DECIMAL
- **Nulls:** 0.5% (123 nulls). Pattern appears random.
- **Distribution:** (Histogram ASCII art or link to plot)
  - Min: $0.50, Max: $15,000.00
  - Mean: $78.25, Median: $45.00
  - Standard Deviation: $120.10
- **Outliers:** 2.1% of values are above $500 (the IQR method flags values above $420 as potential outliers).
- **Suggestions:** Consider analyzing transactions in relation to `customer_segment` for category-specific spending habits.

When to use this skill

When the user says `/explore` or similar phrases like 'let me explore the data' or 'what's in this dataset?'
After connecting a new dataset, before any formal analysis begins.
When the user wants to understand data shape without a specific analytical question.
To quickly identify data quality issues or form initial hypotheses about the data.

When not to use this skill

When performing complex statistical modeling or advanced machine learning tasks.
When executing a predefined, specific analytical query that doesn't require exploration.
When writing production-ready data transformation or ETL pipelines.
If the user already has a clear, complex question and knows the relevant tables/columns.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/explore/SKILL.md --create-dirs "https://raw.githubusercontent.com/ai-analyst-lab/ai-analyst/main/.claude/skills/explore/skill.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/explore/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Skill: Explore Data Compares

Feature / Agent	Skill: Explore Data	Standard Approach
Platform Support	Claude	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	medium	N/A

Frequently Asked Questions

What does this skill do?

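It provides quick, interactive data exploration without the full pipeline: previewing tables, checking distributions, spotting patterns, and forming hypotheses before committing to a formal analysis.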

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Skill: Explore Data

## Purpose
Quick, interactive data exploration without the full pipeline. Lets users
poke around the active dataset — preview tables, check distributions, spot
patterns, and form hypotheses before committing to a formal analysis.

## When to Use
- User says `/explore` or "let me explore the data" or "what's in this dataset?"
- After connecting a new dataset, before any formal analysis
- When the user wants to understand data shape without a specific question

## Invocation
- `/explore`: explore the active dataset
- `/explore {table}`: focus on a specific table
- `/explore {table} {column}`: deep-dive into a specific column
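
As an illustration, the three invocation forms map onto the three exploration modes roughly as sketched below. The function name `parse_explore_command` and the mode labels `dataset`, `table`, and `column` are hypothetical names chosen for this example; the skill itself is driven by the agent, not by this code.

```python
def parse_explore_command(command):
    """Map an /explore invocation to (mode, table, column).

    The mode labels are illustrative names, not part of the skill.
    """
    parts = command.split()
    if not parts or parts[0] != "/explore" or len(parts) > 3:
        raise ValueError(f"not a valid /explore invocation: {command!r}")
    table = parts[1] if len(parts) > 1 else None
    column = parts[2] if len(parts) > 2 else None
    # No arguments: dataset overview. One: table. Two: column deep-dive.
    mode = "column" if column else "table" if table else "dataset"
    return mode, table, column


if __name__ == "__main__":
    print(parse_explore_command("/explore customer_transactions transaction_amount"))
```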

## Instructions

### Step 1: Load Context
Read `.knowledge/active.yaml` to identify the active dataset.
Read `.knowledge/datasets/{active}/schema.md` for table/column reference.
Read `.knowledge/datasets/{active}/quirks.md` for known gotchas.

If no active dataset, prompt: "No dataset connected. Use `/connect-data` to add one."
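
A minimal Python sketch of this step follows. It is an illustration under stated assumptions, not the skill's implementation: the source does not show the layout of `active.yaml`, so the `active_dataset` key and the `load_context` helper are hypothetical, and the file is read with a simple line scan instead of a YAML library.

```python
from pathlib import Path

NO_DATASET_MSG = "No dataset connected. Use `/connect-data` to add one."


def load_context(root="."):
    """Return (dataset_name, schema_text, quirks_text), or None if no dataset is active."""
    knowledge = Path(root) / ".knowledge"
    active_file = knowledge / "active.yaml"
    if not active_file.exists():
        return None

    # ASSUMPTION: active.yaml contains a line such as `active_dataset: sales`.
    name = None
    for line in active_file.read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "active_dataset" and value.strip():
            name = value.strip().strip("'\"")
            break
    if name is None:
        return None

    dataset_dir = knowledge / "datasets" / name

    def read_if_present(filename):
        # Missing reference files are treated as empty, not as errors.
        path = dataset_dir / filename
        return path.read_text(encoding="utf-8") if path.exists() else ""

    return name, read_if_present("schema.md"), read_if_present("quirks.md")


if __name__ == "__main__":
    context = load_context()
    print(NO_DATASET_MSG if context is None else f"Active dataset: {context[0]}")
```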

### Step 2: Choose Exploration Mode

**Mode A: Dataset overview** (no table specified)
- List all tables with row counts and date ranges
- Highlight the 3-5 most analytically useful tables (most rows, most joins)
- Show key entities and how they connect
- Suggest 3 starting questions based on available data

**Mode B: Table exploration** (table specified)
- Show column list with types and null rates
- Sample 5 random rows
- For numeric columns: min, max, mean, median
- For categorical columns: top 5 values with counts
- For date columns: range and coverage
- Flag any quality issues (>5% nulls, low cardinality, suspicious values)
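
The per-column checks in Mode B could look something like the sketch below, which works on rows already fetched into memory as a list of dicts. The `profile_table` helper and its output keys are hypothetical; only the statistics and the 5% null threshold come from the list above.

```python
import statistics
from collections import Counter


def profile_table(rows, null_threshold=0.05):
    """Summarise each column of `rows` (a list of dicts sharing the same keys)."""
    if not rows:
        return {}
    profile = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None]
        null_rate = 1 - len(present) / len(values)
        summary = {"null_rate": null_rate, "flag_nulls": null_rate > null_threshold}
        # Treat a column as numeric only if every non-null value is an int or float.
        is_numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if is_numeric:
            summary.update(
                min=min(present),
                max=max(present),
                mean=statistics.mean(present),
                median=statistics.median(present),
            )
        elif present:
            # Categorical column: top 5 values with counts.
            summary["top_values"] = Counter(present).most_common(5)
        profile[column] = summary
    return profile


if __name__ == "__main__":
    sample = [
        {"amount": 10.0, "segment": "retail"},
        {"amount": 20.0, "segment": "retail"},
        {"amount": None, "segment": "wholesale"},
        {"amount": 60.0, "segment": None},
    ]
    for name, stats in profile_table(sample).items():
        print(name, stats)
```

Date-range coverage and the low-cardinality and suspicious-value checks are left out to keep the sketch short.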

**Mode C: Column deep-dive** (table + column specified)
- Full distribution: histogram for numeric, bar chart for categorical
- Null analysis: count, pattern (random vs systematic)
- Outlier detection: IQR method, flag extremes
- If date column: coverage heatmap by week
- Suggest related columns for cross-analysis
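
For the outlier step, the IQR method can be sketched with the standard library as below. The skill only says "IQR method"; the conventional 1.5 multiplier, the `inclusive` quantile method, and the `iqr_outliers` name are assumptions made for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule.

    Values below Q1 - k*IQR or above Q3 + k*IQR are reported as outliers.
    """
    data = sorted(v for v in values if v is not None)
    if len(data) < 2:
        # Too few points to estimate quartiles.
        return None, None, []
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in data if v < lower or v > upper]


if __name__ == "__main__":
    print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```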

### Step 3: Interactive Follow-Up
After presenting results, offer 2-3 contextual next actions:
- "Want to see how {column} varies by {dimension}?"
- "This looks like a good candidate for funnel analysis. Want to try `/run-pipeline`?"
- "There are quality issues in {column}. Want to run `/data-profiling`?"

### Step 4: Save Exploration Notes
Write a brief exploration summary to `working/explore_notes_{DATE}.md`:
- Tables examined
- Key observations
- Quality flags
- Suggested next steps

This file is available for subsequent agents (e.g., Question Framing can reference
exploration notes to inform hypothesis generation).
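
One way to write such a notes file is sketched below. The `save_exploration_notes` helper, the markdown layout, and the ISO format used for `{DATE}` are assumptions; the source specifies only the path pattern and the four sections.

```python
from datetime import date
from pathlib import Path


def save_exploration_notes(tables, observations, quality_flags, next_steps, root="."):
    """Write the four summary sections to working/explore_notes_{DATE}.md and return the path."""
    # ASSUMPTION: {DATE} is rendered as an ISO date such as 2024-01-31.
    path = Path(root) / "working" / f"explore_notes_{date.today().isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)

    sections = [
        ("Tables examined", tables),
        ("Key observations", observations),
        ("Quality flags", quality_flags),
        ("Suggested next steps", next_steps),
    ]
    lines = ["# Exploration notes", ""]
    for title, items in sections:
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```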

## Rules
1. Keep it fast — no more than 3-4 queries per exploration step
2. Always apply `swd_style()` if generating any chart
3. Never modify data during exploration
4. Always cite table and column names in output
5. If data source is CSV fallback, mention this to the user

## Edge Cases
- **Empty table:** Report row count = 0, suggest checking data load
- **Table not found:** Fuzzy-match against schema, suggest closest match
- **Column has all nulls:** Flag as BLOCKER, suggest checking data pipeline
- **Very wide table (>50 columns):** Group columns by category, show summary not full list
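
The "Table not found" case can be illustrated with Python's `difflib`. This is only a sketch: the `suggest_table` helper and the 0.6 similarity cutoff are assumptions, and the skill does not prescribe a particular fuzzy-matching method.

```python
import difflib


def suggest_table(requested, known_tables, cutoff=0.6):
    """Return the closest known table name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(requested, known_tables, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    tables = ["customer_transactions", "orders", "products"]
    print(suggest_table("custmer_transactions", tables))
```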
