eda

Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing patterns, or generating summary statistics for datasets.


by diegosouzapw

View on GitHub

Best use case

eda is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using eda should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/eda/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/eda/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/eda/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How eda Compares

Feature / Agent	eda	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

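What does this skill do?

It walks an AI agent through exploratory data analysis of tabular data: column distributions, data quality checks, class balance, missing-data patterns, and summary statistics.

Where can I find the source code?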

The source code is in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/data-ai/eda/SKILL.md on the main branch.

SKILL.md Source

# Exploratory Data Analysis (EDA)

Analyze tabular datasets to understand distributions, data quality, and patterns.

## When to Use

- Understanding a new dataset before modeling
- Checking data quality (missing values, outliers, duplicates)
- Analyzing target variable distribution
- Identifying class imbalance
- Generating summary statistics

## Analysis Process

1. **Connect to data** - Verify access and inspect schema
2. **Analyze target variable first** - Understand class balance
3. **Check each column** - Distribution, missing data, cardinality
4. **Document findings** - Save reports for reproducibility
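Steps 1 to 3 can be sketched in plain Python. This is a minimal illustration, not the skill's own tooling: it assumes the data arrives as CSV text and uses only the standard library (the skill does not prescribe a library or connection method). `load_rows`, `inspect_schema`, and `analysis_order` are hypothetical helper names, and the type guess is deliberately crude.

```python
import csv
import io


def load_rows(text):
    """Step 1 (connect): parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def inspect_schema(rows):
    """Step 1 (inspect schema): guess 'numeric', 'text', or 'empty' per column."""
    schema = {}
    if not rows:
        return schema
    for col in rows[0]:
        # Assumption: empty strings and None mean "missing" and are skipped.
        values = [r[col] for r in rows if r[col] not in ("", None)]
        if not values:
            schema[col] = "empty"
            continue
        try:
            for v in values:
                float(v)
            schema[col] = "numeric"
        except ValueError:
            schema[col] = "text"
    return schema


def analysis_order(columns, target=None):
    """Steps 2-3: analyze the target column first, then every other column."""
    cols = list(columns)
    if target in cols:
        cols.remove(target)
        return [target] + cols
    return cols


data = "id,age,label\n1,34,yes\n2,,no\n3,51,no\n"
rows = load_rows(data)
schema = inspect_schema(rows)
print(schema)                                   # {'id': 'numeric', 'age': 'numeric', 'label': 'text'}
print(analysis_order(schema, target="label"))   # ['label', 'id', 'age']
```

Step 4 (saving reports) is left out here; the Output Format section describes what such a report contains.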

## Available Analyses

| Analysis | Description |
|----------|-------------|
| Column Distribution | Value counts, percentages, cardinality assessment |
| Missing Data | Null counts, patterns (MCAR/MAR/MNAR) |
| Class Balance | Imbalance detection for classification targets |
| Summary Stats | Count, unique, nulls per column |
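The Summary Stats row can be illustrated with a short standard-library sketch. It assumes rows are dicts (as produced by `csv.DictReader`), reads "count" as the number of non-null values (the pandas convention), and uses a hypothetical `NULL_MARKERS` set to decide what counts as missing; adjust that set to your data source.

```python
import csv
import io

# Hypothetical set of strings treated as missing; adjust for your source.
NULL_MARKERS = {"", "NA", "null", "None"}


def is_null(value):
    """True for None or any string in NULL_MARKERS (after trimming spaces)."""
    return value is None or value.strip() in NULL_MARKERS


def summary_stats(rows):
    """Per-column count (non-null), unique (distinct non-null), and nulls."""
    stats = {}
    if not rows:
        return stats
    for col in rows[0]:
        values = [row[col] for row in rows]
        present = [v for v in values if not is_null(v)]
        stats[col] = {
            "count": len(present),
            "unique": len(set(present)),
            "nulls": len(values) - len(present),
        }
    return stats


text = "city,score\nLisbon,3\nPorto,\nLisbon,NA\n"
rows = list(csv.DictReader(io.StringIO(text)))
for col, s in summary_stats(rows).items():
    print(col, s)
# city {'count': 3, 'unique': 2, 'nulls': 0}
# score {'count': 1, 'unique': 1, 'nulls': 2}
```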

## Column Distribution Analysis

For detailed analysis methodology and output format:
- See [references/eda-analysis.md](references/eda-analysis.md)

### Quick Reference

**Cardinality Levels:**
| Level | Criteria | Action |
|-------|----------|--------|
| Low | ≤10 unique | Good for categorical encoding |
| Medium | 11-100 unique, or >100 unique but <1% of rows | May need encoding strategy |
| High | >100 and <50% of rows | Consider grouping/binning |
| Very High | >50% of rows | Likely identifier, exclude |
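One way to apply the cardinality levels in code, as a sketch under stated assumptions. The table's rows overlap, so this function checks them in a fixed order: "Low" first, then "Very High", then "Medium", with "High" as the fallback. It reads "Medium" as 11-100 unique values, or more than 100 unique values that make up under 1% of rows. The table leaves exactly 50% of rows unassigned; this sketch returns "High" in that case. `cardinality_level` is a hypothetical name.

```python
def cardinality_level(n_unique, n_rows):
    """Map a column's distinct-value count to a cardinality level.

    Assumed rule order: Low (<= 10 unique) wins first, then Very High
    (> 50% of rows), then Medium (<= 100 unique or < 1% of rows), else High.
    """
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    ratio = n_unique / n_rows
    if n_unique <= 10:
        return "Low"
    if ratio > 0.5:
        return "Very High"  # likely an identifier; consider excluding
    if n_unique <= 100 or ratio < 0.01:
        return "Medium"
    return "High"


print(cardinality_level(50, 1000))    # Medium
print(cardinality_level(900, 1000))   # Very High
```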

**Missing Data Thresholds:**
| Percentage | Assessment |
|------------|------------|
| 0% | No missing data |
| <1% | Minimal - safe to drop or impute |
| 1-5% | Some - consider imputation strategy |
| >5% | Significant - investigate pattern |
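The missing-data thresholds translate directly into a small lookup. This sketch assumes the "1-5%" band is inclusive at both ends, so exactly 5% is still "Some"; `missing_assessment` is a hypothetical name.

```python
def missing_assessment(n_missing, n_rows):
    """Return (percentage missing, assessment) using the thresholds above."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    pct = 100.0 * n_missing / n_rows
    if n_missing == 0:
        label = "No missing data"
    elif pct < 1:
        label = "Minimal - safe to drop or impute"
    elif pct <= 5:  # assumption: 1-5% is inclusive at both ends
        label = "Some - consider imputation strategy"
    else:
        label = "Significant - investigate pattern"
    return pct, label


print(missing_assessment(3, 100))   # (3.0, 'Some - consider imputation strategy')
```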

**Class Imbalance:**
- >80% in top class: Imbalance detected
- >95% in top class: Extreme imbalance
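The class-imbalance rule needs only the share of the most frequent label. A minimal sketch, assuming the target labels are a Python list with missing labels already removed; "No imbalance flagged" is a hypothetical label for the case the rule does not name.

```python
from collections import Counter


def class_balance(labels):
    """Return (top class, its share of rows, verdict) for a target column."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels to analyze")
    top_class, top_count = counts.most_common(1)[0]
    share = top_count / len(labels)
    if share > 0.95:
        verdict = "Extreme imbalance"
    elif share > 0.80:
        verdict = "Imbalance detected"
    else:
        verdict = "No imbalance flagged"
    return top_class, share, verdict


print(class_balance(["no"] * 90 + ["yes"] * 10))   # ('no', 0.9, 'Imbalance detected')
```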

## Output Format

```markdown
# Column Distribution: {column_name}

- **source**: path/to/data
- **column**: column_name

## Summary
- Total rows: N
- Null/missing: N (X%)
- Unique values: N
- Cardinality: Low|Medium|High|Very High

## Distribution
| Value | Count | Percentage | Cumulative |
|-------|-------|------------|------------|

## Observations
- Auto-generated insights
```
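A small renderer can fill this template for one column. This is a sketch with several assumptions: the column is already a Python list, `None` and empty strings are the only missing markers, percentages are computed over non-missing values, and the cardinality label is passed in rather than computed. `render_report` is a hypothetical name, and the Observations section takes caller-supplied notes instead of generating insights.

```python
from collections import Counter


def render_report(source, column, values, cardinality, observations=()):
    """Fill the Column Distribution markdown template for a single column."""
    total = len(values)
    # Assumption: None and "" are the only missing-value markers.
    present = [v for v in values if v not in (None, "")]
    nulls = total - len(present)
    null_pct = 100.0 * nulls / total if total else 0.0
    counts = Counter(present)

    lines = [
        f"# Column Distribution: {column}",
        "",
        f"- **source**: {source}",
        f"- **column**: {column}",
        "",
        "## Summary",
        f"- Total rows: {total}",
        f"- Null/missing: {nulls} ({null_pct:.1f}%)",
        f"- Unique values: {len(counts)}",
        f"- Cardinality: {cardinality}",
        "",
        "## Distribution",
        "| Value | Count | Percentage | Cumulative |",
        "|-------|-------|------------|------------|",
    ]
    # Most frequent values first; percentages are relative to non-missing values.
    cumulative = 0.0
    for value, count in counts.most_common():
        pct = 100.0 * count / len(present)
        cumulative += pct
        lines.append(f"| {value} | {count} | {pct:.1f}% | {cumulative:.1f}% |")

    lines += ["", "## Observations"]
    if observations:
        lines += [f"- {obs}" for obs in observations]
    else:
        lines.append("- (none supplied)")
    return "\n".join(lines)


print(render_report("data/grades.csv", "grade", ["a", "b", "a", None, "a"], "Low"))
```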

## Best Practices

1. Start with schema inspection before deep analysis
2. Check target variable first for classification tasks
3. Missing data may not be random - investigate patterns
4. Save reports for reproducibility
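For practice 3, one quick check is to compare a column's missing rate across groups of another column. This is a heuristic sketch, not a formal test: if the rate differs sharply between groups, the values are probably not missing completely at random (MCAR), while MNAR cannot be confirmed from the observed data alone. It assumes rows are dicts and that `None` or empty strings mark missing values; `missing_rate_by_group` is a hypothetical name.

```python
from collections import defaultdict


def missing_rate_by_group(rows, column, group_by):
    """Fraction of rows where `column` is missing, split by values of `group_by`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row.get(group_by)
        totals[group] += 1
        # Assumption: None and "" are the only missing-value markers.
        if row.get(column) in (None, ""):
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


rows = [
    {"age": "34", "channel": "web"},
    {"age": "", "channel": "phone"},
    {"age": "", "channel": "phone"},
    {"age": "29", "channel": "web"},
    {"age": "41", "channel": "phone"},
]
# "age" is never missing for web rows but often missing for phone rows,
# so its missingness depends on "channel" and is unlikely to be MCAR.
print(missing_rate_by_group(rows, "age", "channel"))
```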

Related Skills

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

geo-fundamentals

from diegosouzapw/awesome-omni-skill

Generative Engine Optimization for AI search engines (ChatGPT, Claude, Perplexity).

geo-audit

from diegosouzapw/awesome-omni-skill

Audit and optimize website for AI search engines like ChatGPT, Perplexity, Google AI Overviews, and Claude. Use when discussing GEO (Generative Engine Optimization), SEO for AI, llms.txt, AI crawlers, structured data for LLMs, or visibility in AI search results.

generator

from diegosouzapw/awesome-omni-skill

Skill Generator - Creates new SKILL.md files from agent YAML definitions

generative-optimization

from diegosouzapw/awesome-omni-skill

Expert guidance for solving optimization problems using generative models (GMM and Flow Matching). Use when users need to solve optimization, inverse problems, or find feasible solutions under constraints using probabilistic sampling approaches.

generational-agent-succession

from diegosouzapw/awesome-omni-skill

Parallel agent swarms with generational succession. Combines agent-architect's multi-agent parallelism with automatic succession when agents degrade. Each parallel agent gets fresh context through controlled handoffs while maintaining accumulated wisdom.

generate-llms

from diegosouzapw/awesome-omni-skill

Generate llms.txt and llms-full.txt files for AI agent consumption following the llmstxt.org standard. Use when updating site content that should be reflected in the llms files, or when building/deploying the site.

gdpr-data-handling

from diegosouzapw/awesome-omni-skill

Implement GDPR-compliant data handling with consent management, data subject rights, and privacy by design. Use when building systems that process EU personal data, implementing privacy controls, o...

gboy-character-selector

from diegosouzapw/awesome-omni-skill

Select characters from the G*BOY universe for your OpenCLAW agent personality.

garak

from diegosouzapw/awesome-omni-skill

Security testing and red-teaming for LLMs using NVIDIA's garak vulnerability scanner. Use when probing AI models for jailbreaks, prompt injections, data leakage, toxic content generation, or other failure modes. Triggers on "test LLM security", "red team model", "run garak", "LLM vulnerability scan", "jailbreak testing", or "prompt injection test".

gan-ai-automation

from diegosouzapw/awesome-omni-skill

Automate Gan AI tasks via Rube MCP (Composio). Always search tools first for current schemas.

gait-capture-runpack

from diegosouzapw/awesome-omni-skill

Capture and verify deterministic Gait runpacks from normalized run input. Use when asked to record a run, produce run_id or runpack artifacts, generate ticket-ready proof, or validate artifact integrity before handoff.