eda

Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing patterns, or generating summary statistics for datasets.

Best use case

eda is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using eda should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/eda/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/eda/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/eda/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How eda Compares

| Feature / Agent | eda | Standard Approach |
|-----------------|-----|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing patterns, or generating summary statistics for datasets.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Exploratory Data Analysis (EDA)

Analyze tabular datasets to understand distributions, data quality, and patterns.

## When to Use

- Understanding a new dataset before modeling
- Checking data quality (missing values, outliers, duplicates)
- Analyzing target variable distribution
- Identifying class imbalance
- Generating summary statistics

## Analysis Process

1. **Connect to data** - Verify access and inspect schema
2. **Analyze target variable first** - Understand class balance
3. **Check each column** - Distribution, missing data, cardinality
4. **Document findings** - Save reports for reproducibility
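
The first two steps can be sketched in plain Python. This is only an illustration, not the skill's own implementation: the helper names (`load_rows`, `inspect_schema`, `analysis_order`), the rough type labels, and the sample data are all hypothetical, and the skill itself does not prescribe a data source or library.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def _is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def inspect_schema(rows):
    """Return {column: 'numeric' | 'text' | 'empty'} (hypothetical labels)."""
    if not rows:
        return {}
    schema = {}
    for col in rows[0].keys():
        # Treat empty strings and None as missing when guessing the type.
        values = [r[col] for r in rows if r[col] not in ("", None)]
        if not values:
            schema[col] = "empty"
        elif all(_is_number(v) for v in values):
            schema[col] = "numeric"
        else:
            schema[col] = "text"
    return schema


def analysis_order(schema, target):
    """Put the target column first, then the other columns in file order."""
    return [target] + [c for c in schema if c != target]


# Made-up sample data, for illustration only.
SAMPLE = "age,city,churned\n34,Lisbon,no\n,Porto,yes\n51,Lisbon,no\n"
rows = load_rows(SAMPLE)
schema = inspect_schema(rows)
order = analysis_order(schema, target="churned")
```

Here `order` lists the target column ahead of the others, matching the "target variable first" step.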

## Available Analyses

| Analysis | Description |
|----------|-------------|
| Column Distribution | Value counts, percentages, cardinality assessment |
| Missing Data | Null counts, patterns (MCAR/MAR/MNAR) |
| Class Balance | Imbalance detection for classification targets |
| Summary Stats | Count, unique, nulls per column |
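
As a rough sketch of the Summary Stats row, the function below computes count, unique, and nulls per column for rows held as a list of dicts. The name `summarize` is hypothetical, and two choices are assumptions rather than anything the skill specifies: empty strings and `None` are treated as missing, and "count" means the number of non-missing values.

```python
def summarize(rows, missing=("", None)):
    """Return {column: {"count", "unique", "nulls"}} for a list of dict rows.

    Assumption: "count" is the number of non-missing values.
    """
    summary = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v not in missing]
        summary[col] = {
            "count": len(present),
            "unique": len(set(present)),
            "nulls": len(values) - len(present),
        }
    return summary
```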

## Column Distribution Analysis

For detailed analysis methodology and output format:
- See [references/eda-analysis.md](references/eda-analysis.md)

### Quick Reference

**Cardinality Levels:**
| Level | Criteria | Action |
|-------|----------|--------|
| Low | ≤10 unique | Good for categorical encoding |
| Medium | 11-100 or <1% of rows | May need encoding strategy |
| High | >100 and <50% of rows | Consider grouping/binning |
| Very High | >50% of rows | Likely identifier, exclude |
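
One possible way to turn the cardinality table into code is sketched below. The criteria overlap in places (a column can have fewer than 100 unique values and still cover more than 50% of rows), so the evaluation order here is an assumption: Low is checked first, then Very High, then Medium, and everything else falls through to High. The function name `cardinality_level` is hypothetical.

```python
def cardinality_level(unique_count, total_rows):
    """Classify a column using the thresholds in the table above.

    Assumed precedence: Low, then Very High, then Medium, else High.
    A ratio of exactly 50% is not covered by the table; here it is
    never classified as Very High.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    ratio = unique_count / total_rows
    if unique_count <= 10:
        return "Low"
    if ratio > 0.5:
        return "Very High"
    if unique_count <= 100 or ratio < 0.01:
        return "Medium"
    return "High"
```

For example, 150 unique values in 1,000 rows would be classified as High, while the same 150 values in 100,000 rows would be Medium because they make up less than 1% of rows.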

**Missing Data Thresholds:**
| Percentage | Assessment |
|------------|------------|
| 0% | No missing data |
| <1% | Minimal - safe to drop or impute |
| 1-5% | Some - consider imputation strategy |
| >5% | Significant - investigate pattern |
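
The missing-data thresholds map onto a small function like the one below. It is a sketch under the assumption that the boundary values 1% and 5% both belong to the "Some" band, as the "1-5%" row suggests; the name `missing_assessment` and the short return labels are hypothetical.

```python
def missing_assessment(null_count, total_rows):
    """Return (percentage, label) using the thresholds in the table above."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    pct = 100.0 * null_count / total_rows
    if null_count == 0:
        label = "No missing data"
    elif pct < 1:
        label = "Minimal"
    elif pct <= 5:
        label = "Some"
    else:
        label = "Significant"
    return pct, label
```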

**Class Imbalance:**
- >80% in top class: Imbalance detected
- >95% in top class: Extreme imbalance
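
A minimal sketch of the class-imbalance check, using only the standard library. The function name `class_balance` and the "No imbalance flagged" label are hypothetical; the 80% and 95% cut-offs come from the list above and are applied as strict "greater than" comparisons.

```python
from collections import Counter


def class_balance(labels):
    """Return (top_class, top_share, status) for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    top_class, top_count = counts.most_common(1)[0]
    share = top_count / total
    if share > 0.95:
        status = "Extreme imbalance"
    elif share > 0.80:
        status = "Imbalance detected"
    else:
        status = "No imbalance flagged"  # hypothetical label, not from the skill
    return top_class, share, status
```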

## Output Format

```markdown
# Column Distribution: {column_name}

- **source**: path/to/data
- **column**: column_name

## Summary
- Total rows: N
- Null/missing: N (X%)
- Unique values: N
- Cardinality: Low|Medium|High|Very High

## Distribution
| Value | Count | Percentage | Cumulative |
|-------|-------|------------|------------|

## Observations
- Auto-generated insights
```
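
A report in this format could be produced by something like the sketch below. It is illustrative only: `render_column_report` is a hypothetical name, the cardinality level and observations are passed in by the caller, and percentages in the distribution table are computed over non-missing values, which is an assumption the template does not settle.

```python
from collections import Counter


def render_column_report(source, column, values, cardinality,
                         observations=(), missing=("", None)):
    """Render one column's report as Markdown in the format shown above."""
    total = len(values)
    present = [v for v in values if v not in missing]
    nulls = total - len(present)
    null_pct = 100.0 * nulls / total if total else 0.0
    counts = Counter(present)

    lines = [
        f"# Column Distribution: {column}",
        "",
        f"- **source**: {source}",
        f"- **column**: {column}",
        "",
        "## Summary",
        f"- Total rows: {total}",
        f"- Null/missing: {nulls} ({null_pct:.1f}%)",
        f"- Unique values: {len(counts)}",
        f"- Cardinality: {cardinality}",
        "",
        "## Distribution",
        "| Value | Count | Percentage | Cumulative |",
        "|-------|-------|------------|------------|",
    ]
    cumulative = 0.0
    # Most frequent values first; percentages are over non-missing values.
    for value, count in counts.most_common():
        pct = 100.0 * count / len(present)
        cumulative += pct
        lines.append(f"| {value} | {count} | {pct:.1f}% | {cumulative:.1f}% |")

    lines += ["", "## Observations"]
    lines += [f"- {note}" for note in observations]
    return "\n".join(lines)
```

The returned string can be written to a file to cover the "Save reports for reproducibility" step.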

## Best Practices

1. Start with schema inspection before deep analysis
2. Check target variable first for classification tasks
3. Missing data may not be random - investigate patterns
4. Save reports for reproducibility
