pandas-dataframe-analyzer

Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations.

509 stars

bya5c-ai

View on GitHub Installation ↓

Best use case

pandas-dataframe-analyzer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations.

Teams using pandas-dataframe-analyzer should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/pandas-dataframe-analyzer/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/data-science-ml/skills/pandas-dataframe-analyzer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/pandas-dataframe-analyzer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How pandas-dataframe-analyzer Compares

Feature / Agent	pandas-dataframe-analyzer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# pandas-dataframe-analyzer

## Overview

Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations using pandas and profiling libraries.

## Capabilities

- Statistical profiling of DataFrames
- Missing value pattern detection
- Data type optimization suggestions
- Memory footprint analysis
- Duplicate detection and handling
- Distribution analysis and visualization
- Correlation matrix computation
- Cardinality analysis for categorical features

## Target Processes

- Exploratory Data Analysis (EDA) Pipeline
- Data Collection and Validation Pipeline
- Feature Engineering Design and Implementation

## Tools and Libraries

- pandas
- pandas-profiling / ydata-profiling
- numpy
- scipy (for statistical tests)

## Input Schema

```json
{
  "type": "object",
  "required": ["dataPath"],
  "properties": {
    "dataPath": {
      "type": "string",
      "description": "Path to the data file (CSV, Parquet, JSON)"
    },
    "sampleSize": {
      "type": "integer",
      "description": "Number of rows to sample for analysis",
      "default": 10000
    },
    "profileType": {
      "type": "string",
      "enum": ["minimal", "standard", "full"],
      "default": "standard"
    },
    "outputFormat": {
      "type": "string",
      "enum": ["json", "html", "markdown"],
      "default": "json"
    }
  }
}
```

## Output Schema

```json
{
  "type": "object",
  "required": ["summary", "columns", "recommendations"],
  "properties": {
    "summary": {
      "type": "object",
      "properties": {
        "rowCount": { "type": "integer" },
        "columnCount": { "type": "integer" },
        "memoryUsageMB": { "type": "number" },
        "duplicateRows": { "type": "integer" },
        "missingCells": { "type": "integer" },
        "missingCellsPercent": { "type": "number" }
      }
    },
    "columns": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dtype": { "type": "string" },
          "nullCount": { "type": "integer" },
          "uniqueCount": { "type": "integer" },
          "stats": { "type": "object" }
        }
      }
    },
    "recommendations": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": { "type": "string" },
          "column": { "type": "string" },
          "suggestion": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    }
  }
}
```

## Usage Example

```javascript
{
  kind: 'skill',
  title: 'Analyze training dataset',
  skill: {
    name: 'pandas-dataframe-analyzer',
    context: {
      dataPath: 'data/train.csv',
      profileType: 'full',
      outputFormat: 'json'
    }
  }
}
```

Related Skills

terraform-analyzer

509

from a5c-ai/babysitter

Specialized skill for analyzing Terraform configurations. Supports parsing, security scanning (tfsec, checkov), cost estimation (infracost), drift detection, and plan visualization across AWS, Azure, and GCP.

db-query-analyzer

509

from a5c-ai/babysitter

Analyze database query performance with execution plans and index recommendations

code-complexity-analyzer

509

from a5c-ai/babysitter

Analyze code complexity metrics including cyclomatic complexity, code smells, and technical debt

cloudformation-analyzer

509

from a5c-ai/babysitter

Validate and analyze AWS CloudFormation templates for security and best practices

semantic-code-analyzer

509

from a5c-ai/babysitter

LLM-powered semantic analysis of code diffs to detect business-logic trojans

sast-analyzer

509

from a5c-ai/babysitter

Static Application Security Testing orchestration and analysis. Execute Semgrep, Bandit, ESLint security plugins, CodeQL, and other SAST tools. Parse, prioritize, and deduplicate findings across multiple tools with remediation guidance.

crypto-analyzer

509

from a5c-ai/babysitter

Cryptographic implementation analysis and validation for encryption algorithms, key sizes, and certificate management

semver-analyzer

509

from a5c-ai/babysitter

Analyze code changes and determine semantic version bumps. Detect breaking changes automatically, suggest version bump (major/minor/patch), generate changelog entries, and validate version consistency.

api-diff-analyzer

509

from a5c-ai/babysitter

Compare API specifications to detect breaking changes. Compare OpenAPI spec versions, categorize changes by severity, generate migration guides, and block breaking changes in CI.

process-analyzer

509

from a5c-ai/babysitter

Analyze processes, identify workflows, define boundaries and scope, and map process requirements for specialization creation.

scope-logic-analyzer

509

from a5c-ai/babysitter

Test equipment integration for signal analysis (oscilloscope and logic analyzer)

protocol-analyzer

509

from a5c-ai/babysitter

Serial protocol analysis and debugging for common embedded interfaces (I2C, SPI, UART)