eda
Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing patterns, or generating summary statistics for datasets.
Best use case
eda is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using eda should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/eda/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How eda Compares
| Feature / Agent | eda | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing patterns, or generating summary statistics for datasets.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Exploratory Data Analysis (EDA)
Analyze tabular datasets to understand distributions, data quality, and patterns.
## When to Use
- Understanding a new dataset before modeling
- Checking data quality (missing values, outliers, duplicates)
- Analyzing target variable distribution
- Identifying class imbalance
- Generating summary statistics
## Analysis Process
1. **Connect to data** - Verify access and inspect schema
2. **Analyze target variable first** - Understand class balance
3. **Check each column** - Distribution, missing data, cardinality
4. **Document findings** - Save reports for reproducibility
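The four steps above could be sketched in plain Python as follows. This is a minimal illustration using only the standard library. The inline sample data, the `target` column name, and the `load_rows` and `is_missing` helpers are assumptions made for the example and are not part of the skill.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real dataset.
SAMPLE_CSV = """id,age,plan,target
1,34,basic,no
2,,premium,yes
3,51,basic,no
4,29,basic,no
"""


def load_rows(text):
    """Step 1: connect to the data and return (column names, rows)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def is_missing(value):
    """Treat None and blank strings as missing (an assumption)."""
    return value is None or value.strip() == ""


columns, rows = load_rows(SAMPLE_CSV)

# Step 2: look at the target variable first.
target_counts = Counter(r["target"] for r in rows)

# Step 3: per-column null count and cardinality.
profile = {
    col: {
        "nulls": sum(is_missing(r[col]) for r in rows),
        "unique": len({r[col] for r in rows if not is_missing(r[col])}),
    }
    for col in columns
}

# Step 4: gather findings in one structure that could be saved as a report.
findings = {"columns": columns, "target": dict(target_counts), "profile": profile}
```

Keeping the findings in a single dictionary makes it straightforward to serialize them (for example with `json.dumps`) so the analysis can be rerun and compared later.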
## Available Analyses
| Analysis | Description |
|----------|-------------|
| Column Distribution | Value counts, percentages, cardinality assessment |
| Missing Data | Null counts, patterns (MCAR/MAR/MNAR) |
| Class Balance | Imbalance detection for classification targets |
| Summary Stats | Count, unique, nulls per column |
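One simple way to start looking at missing-data patterns is to compare the missing rate of a column across the groups of another column. The sketch below assumes rows are dictionaries and that `None` or an empty string means missing; the function name `missing_rate_by_group` and the sample columns are hypothetical. A clear difference between groups suggests the data is not MCAR, but this check alone cannot separate MAR from MNAR.

```python
from collections import defaultdict


def missing_rate_by_group(rows, column, group_by):
    """Return {group value: fraction of rows where `column` is missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row.get(group_by)
        totals[group] += 1
        value = row.get(column)
        if value is None or value == "":
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical rows: income is missing only for some "basic" customers.
rows = [
    {"plan": "basic", "income": ""},
    {"plan": "basic", "income": "41000"},
    {"plan": "premium", "income": "88000"},
    {"plan": "premium", "income": "93000"},
]
rates = missing_rate_by_group(rows, "income", "plan")
```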
## Column Distribution Analysis
For detailed analysis methodology and output format:
- See [references/eda-analysis.md](references/eda-analysis.md)
### Quick Reference
**Cardinality Levels:**
| Level | Criteria | Action |
|-------|----------|--------|
| Low | ≤10 unique | Good for categorical encoding |
| Medium | 11-100 or <1% of rows | May need encoding strategy |
| High | >100 and <50% of rows | Consider grouping/binning |
| Very High | >50% of rows | Likely identifier, exclude |
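The cardinality table can be turned into a small classifier. The criteria overlap in places (for example, 60 unique values in 100 rows is both "11-100" and ">50% of rows"), so the order of the checks below is an assumption: Low first, then Very High, then Medium, with High as the fallback. The table also does not cover exactly 50% of rows; this sketch treats that case as High. The function name `classify_cardinality` is hypothetical.

```python
def classify_cardinality(n_unique, n_rows):
    """Map a column's unique-value count to a cardinality level."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    ratio = n_unique / n_rows
    if n_unique <= 10:
        return "Low"
    if ratio > 0.5:
        # More than half of the rows are distinct: likely an identifier.
        return "Very High"
    if n_unique <= 100 or ratio < 0.01:
        return "Medium"
    # Remaining case: more than 100 unique values, at most 50% of rows.
    return "High"
```

For example, `classify_cardinality(500, 10_000)` falls through to `"High"`, while `classify_cardinality(9_000, 10_000)` is flagged as `"Very High"`.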
**Missing Data Thresholds:**
| Percentage | Assessment |
|------------|------------|
| 0% | No missing data |
| <1% | Minimal - safe to drop or impute |
| 1-5% | Some - consider imputation strategy |
| >5% | Significant - investigate pattern |
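As a rough illustration, the thresholds above can be applied like this. The function name `assess_missing` is hypothetical, and treating both 1% and 5% as part of the "Some" band is an assumption about how the boundaries are meant to be read.

```python
def assess_missing(n_missing, n_rows):
    """Return (percentage missing, assessment label) for one column."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    pct = 100 * n_missing / n_rows
    if n_missing == 0:
        label = "No missing data"
    elif pct < 1:
        label = "Minimal - safe to drop or impute"
    elif pct <= 5:
        label = "Some - consider imputation strategy"
    else:
        label = "Significant - investigate pattern"
    return pct, label
```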
**Class Imbalance:**
- >80% in top class: Imbalance detected
- >95% in top class: Extreme imbalance
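A minimal sketch of this check, assuming the target labels are available as a plain list. The function name `assess_class_balance` and the "No imbalance flagged" label for the remaining case are assumptions added for the example.

```python
from collections import Counter


def assess_class_balance(labels):
    """Return (top class, its share of rows, imbalance label)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    top_class, top_count = counts.most_common(1)[0]
    share = top_count / sum(counts.values())
    if share > 0.95:
        level = "Extreme imbalance"
    elif share > 0.80:
        level = "Imbalance detected"
    else:
        level = "No imbalance flagged"
    return top_class, share, level


# Hypothetical target column: 97 negatives and 3 positives.
top_class, share, level = assess_class_balance(["no"] * 97 + ["yes"] * 3)
```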
## Output Format
```markdown
# Column Distribution: {column_name}
- **source**: path/to/data
- **column**: column_name
## Summary
- Total rows: N
- Null/missing: N (X%)
- Unique values: N
- Cardinality: Low|Medium|High|Very High
## Distribution
| Value | Count | Percentage | Cumulative |
|-------|-------|------------|------------|
## Observations
- Auto-generated insights
```
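A report in roughly this format could be generated as sketched below. The function name `render_column_report` is hypothetical, and computing percentages over non-missing values only is an assumption. The Cardinality line and the Observations section are left out to keep the example short.

```python
from collections import Counter


def render_column_report(source, column, values):
    """Build a markdown report for one column from a list of raw values."""
    total = len(values)
    # Assumption: None and empty strings count as missing.
    present = [v for v in values if v not in (None, "")]
    n_missing = total - len(present)
    missing_pct = 100 * n_missing / total if total else 0.0
    counts = Counter(present)

    lines = [
        f"# Column Distribution: {column}",
        f"- **source**: {source}",
        f"- **column**: {column}",
        "## Summary",
        f"- Total rows: {total}",
        f"- Null/missing: {n_missing} ({missing_pct:.1f}%)",
        f"- Unique values: {len(counts)}",
        "## Distribution",
        "| Value | Count | Percentage | Cumulative |",
        "|-------|-------|------------|------------|",
    ]
    cumulative = 0
    # Most frequent values first, with a running cumulative percentage.
    for value, count in counts.most_common():
        cumulative += count
        pct = 100 * count / len(present)
        cum_pct = 100 * cumulative / len(present)
        lines.append(f"| {value} | {count} | {pct:.1f}% | {cum_pct:.1f}% |")
    return "\n".join(lines)
```

Calling `render_column_report("path/to/data", "plan", values)` returns a single string that can be written to a file so the report is kept for later comparison.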
## Best Practices
1. Start with schema inspection before deep analysis
2. Check target variable first for classification tasks
3. Missing data may not be random - investigate patterns
4. Save reports for reproducibility
Related Skills
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
geo-fundamentals
Generative Engine Optimization for AI search engines (ChatGPT, Claude, Perplexity).
geo-audit
Audit and optimize website for AI search engines like ChatGPT, Perplexity, Google AI Overviews, and Claude. Use when discussing GEO (Generative Engine Optimization), SEO for AI, llms.txt, AI crawlers, structured data for LLMs, or visibility in AI search results.
generator
Skill Generator - Creates new SKILL.md files from YAML agent definitions
generative-optimization
Expert guidance for solving optimization problems using generative models (GMM and Flow Matching). Use when users need to solve optimization, inverse problems, or find feasible solutions under constraints using probabilistic sampling approaches.
generational-agent-succession
Parallel agent swarms with generational succession. Combines agent-architect's multi-agent parallelism with automatic succession when agents degrade. Each parallel agent gets fresh context through controlled handoffs while maintaining accumulated wisdom.
generate-llms
Generate llms.txt and llms-full.txt files for AI agent consumption following the llmstxt.org standard. Use when updating site content that should be reflected in the llms files, or when building/deploying the site.
gdpr-data-handling
Implement GDPR-compliant data handling with consent management, data subject rights, and privacy by design. Use when building systems that process EU personal data, implementing privacy controls, o...
gboy-character-selector
Select characters from the G*BOY universe for your OpenCLAW agent personality.
garak
Security testing and red-teaming for LLMs using NVIDIA's garak vulnerability scanner. Use when probing AI models for jailbreaks, prompt injections, data leakage, toxic content generation, or other failure modes. Triggers on "test LLM security", "red team model", "run garak", "LLM vulnerability scan", "jailbreak testing", or "prompt injection test".
gan-ai-automation
Automate Gan AI tasks via Rube MCP (Composio). Always search tools first for current schemas.
gait-capture-runpack
Capture and verify deterministic Gait runpacks from normalized run input. Use when asked to record a run, produce run_id or runpack artifacts, generate ticket-ready proof, or validate artifact integrity before handoff.