data-substrate-analysis

Analyze fundamental data primitives, type systems, and state management patterns in a codebase. Use when (1) evaluating typing strategies (Pydantic vs TypedDict vs loose dicts), (2) assessing immutability and mutation patterns, (3) understanding serialization approaches, (4) documenting state shape and lifecycle, or (5) comparing data modeling approaches across frameworks.

25 stars

Best use case

data-substrate-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using data-substrate-analysis can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/data-substrate-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/dowwie/data-substrate-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-substrate-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-substrate-analysis Compares

| Feature | data-substrate-analysis | Standard Approach |
|---------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It analyzes a codebase's fundamental data primitives, type systems, and state management patterns: typing strategies (Pydantic vs TypedDict vs loose dicts), immutability and mutation patterns, serialization approaches, and state shape and lifecycle. It also supports comparing data modeling approaches across frameworks.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Data Substrate Analysis

Analyzes the fundamental units of data and state management patterns.

## Process

1. **Locate type files** — Find types.py, schema.py, models.py, state.py
2. **Classify typing** — Strict (Pydantic), structural (TypedDict), loose (dict)
3. **Analyze mutation** — In-place modification vs. copy-on-write
4. **Document serialization** — json(), dict(), pickle, custom methods
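
The file-location step above can be sketched with the standard library. The candidate filename set comes from step 1 and is an assumption about common naming conventions, not a complete list:

```python
from pathlib import Path

# Filenames that commonly hold type definitions (step 1 of the process).
TYPE_FILES = {"types.py", "schema.py", "models.py", "state.py"}

def locate_type_files(root: str) -> list[Path]:
    """Return paths under `root` whose filename suggests type definitions."""
    return sorted(p for p in Path(root).rglob("*.py") if p.name in TYPE_FILES)
```

A recursive glob keeps the sketch dependency-free; a real pass would also follow re-exports and package `__init__.py` files.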

## Typing Strategy Classification

### Detection Patterns

| Strategy | Indicators | Files to Check |
|----------|-----------|----------------|
| **Pydantic** | `BaseModel`, `Field()`, `validator` | models.py, schema.py |
| **Dataclass** | `@dataclass`, `field()` | types.py, models.py |
| **TypedDict** | `TypedDict`, `Required[]`, `NotRequired[]` | types.py |
| **NamedTuple** | `NamedTuple`, `typing.NamedTuple` | types.py |
| **Loose** | `Dict[str, Any]`, plain `dict` | Throughout |
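
The detection table can be turned into a first-pass classifier, under the assumption that a single indicator is enough to tag a module. The regexes are illustrative, not exhaustive:

```python
import re

# Indicator patterns per strategy, mirroring the detection table.
# Order matters: stricter strategies are checked before looser ones,
# so a Pydantic model containing `dict` is not misclassified as Loose.
STRATEGY_PATTERNS = [
    ("Pydantic",   re.compile(r"\bBaseModel\b|\bField\(|\bvalidator\b")),
    ("Dataclass",  re.compile(r"@dataclass\b")),
    ("TypedDict",  re.compile(r"\bTypedDict\b")),
    ("NamedTuple", re.compile(r"\bNamedTuple\b")),
    ("Loose",      re.compile(r"Dict\[str,\s*Any\]|\bdict\b")),
]

def classify_typing(source: str) -> str:
    """Return the first matching typing strategy for a module's source."""
    for name, pattern in STRATEGY_PATTERNS:
        if pattern.search(source):
            return name
    return "Unknown"
```

A module can legitimately mix strategies; a fuller analysis would report every match rather than the first.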

### Analysis Questions

- Are boundaries validated (API ingress/egress)?
- Is nesting depth reasonable (<3 levels)?
- Are optional fields explicit or implicit None?
- Is there a version migration path (e.g., Pydantic V1 → V2)?

## Immutability Analysis

### Mutable Patterns (Risk Indicators)

```python
# In-place list modification
state.messages.append(msg)
state.history.extend(new_items)

# Direct dict mutation
state['key'] = value
state.update(new_data)

# Object attribute mutation
state.status = 'complete'
```

### Immutable Patterns (Safer)

```python
# Pydantic copy
new_state = state.model_copy(update={'key': value})

# Dataclass replace
new_state = replace(state, messages=[*state.messages, msg])

# Spread operator style
new_state = {**state, 'key': value}

# Frozen dataclass
@dataclass(frozen=True)
class State: ...
```

## Serialization Strategy

### Common Patterns

| Method | Code Pattern | Trade-offs |
|--------|-------------|------------|
| Pydantic JSON | `.model_dump_json()` | Type-safe, automatic |
| Pydantic Dict | `.model_dump()` | For internal use |
| Dataclass | `asdict(obj)` | Manual, no validation |
| Custom | `to_dict()`, `from_dict()` | Full control |
| Pickle | `pickle.dumps()` | Fast, fragile, security risk |
| JSON | `json.dumps(obj, default=...)` | Requires encoder |

### Questions to Answer

- Is serialization implicit (automatic) or explicit (manual)?
- How are nested objects handled?
- Is deserialization validated?
- What happens with unknown fields?
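
One way to answer the unknown-fields question explicitly, sketched with stdlib dataclasses. The `Message` type and `from_dict` helper are hypothetical, not part of any framework:

```python
import json
from dataclasses import dataclass, asdict, fields

@dataclass
class Message:
    role: str
    content: str

    @classmethod
    def from_dict(cls, data: dict) -> "Message":
        # Explicit deserialization: reject unknown fields instead of
        # silently dropping them.
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return cls(**data)

# Round trip: object -> dict -> JSON -> dict -> object.
msg = Message(role="user", content="hi")
round_tripped = Message.from_dict(json.loads(json.dumps(asdict(msg))))
```

Pydantic offers the same choice declaratively via `model_config` extra-field handling; the stdlib version makes the policy visible in one place.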

## Output Template

```markdown
## Data Substrate Analysis: [Framework Name]

### Typing Strategy
- **Primary Approach**: [Pydantic/Dataclass/TypedDict/Loose]
- **Key Files**: [List of files]
- **Nesting Depth**: [Shallow/Medium/Deep]
- **Validation**: [At boundaries/Everywhere/None]

### Core Primitives

| Type | Location | Purpose | Mutability |
|------|----------|---------|------------|
| Message | schema.py:L15 | Chat message | Immutable |
| State | state.py:L42 | Agent state | Mutable ⚠️ |
| Result | types.py:L78 | Tool output | Immutable |

### Mutation Analysis
- **Pattern**: [In-place/Copy-on-write/Mixed]
- **Risk Areas**: [List of mutable state locations]
- **Concurrency Safe**: [Yes/No/Partial]

### Serialization
- **Method**: [Pydantic/Custom/JSON]
- **Implicit/Explicit**: [Description]
- **Round-trip Tested**: [Yes/No/Unknown]
```

## Integration

- **Prerequisite**: `codebase-mapping` to identify type files
- **Feeds into**: `comparative-matrix` for typing decisions
- **Related**: `resilience-analysis` for error handling in serialization

Related Skills

Coverage Analysis (25 stars, from ComeOnOliver/skillshub)

Coverage analysis is essential for understanding which parts of your code are exercised during fuzzing. It helps identify fuzzing blockers like magic value checks and tracks the effectiveness of harness improvements over time.

product-analysis (25 stars, from ComeOnOliver/skillshub)

Multi-path parallel product analysis with cross-model test-time compute scaling. Spawns parallel agents (Claude Code agent teams + Codex CLI) to explore a product from multiple perspectives, then synthesizes findings into actionable optimization plans. Can invoke competitors-analysis for competitive benchmarking. Use when "product audit", "self-review", "发布前审查", "产品分析", "analyze our product", "UX audit", or "信息架构审计".

financial-data-collector (25 stars, from ComeOnOliver/skillshub)

Collect real financial data for any US publicly traded company from free public sources (yfinance). Outputs structured JSON consumable by downstream financial skills (DCF modeling, comps analysis, earnings review). Handles market data (price, shares, beta), historical financials (income statement, cash flow, balance sheet), WACC inputs, and analyst estimates. Use when users request "collect data for ticker", "get financials for company", "pull market data", "gather DCF inputs", or any task requiring structured financial data before analysis. Also triggers on "financial data", "company data", "stock data".

competitors-analysis (25 stars, from ComeOnOliver/skillshub)

Analyze competitor repositories with an evidence-based approach. Use when tracking competitors, creating competitor profiles, or generating competitive analysis. CRITICAL: all analysis must be based on actual cloned code, never assumptions. Triggers include "analyze competitor", "add competitor", "competitive analysis", or "竞品分析".

go-data-structures (25 stars, from ComeOnOliver/skillshub)

Use when working with Go slices, maps, or arrays — choosing between new and make, using append, declaring empty slices (nil vs literal for JSON), implementing sets with maps, and copying data at boundaries. Also use when building or manipulating collections, even if the user doesn't ask about allocation idioms. Does not cover concurrent data structure safety (see go-concurrency).

image-analysis (25 stars, from ComeOnOliver/skillshub)

Image analysis and recognition for local images, web images, videos, and files. Suitable for OCR, object recognition, scene understanding, and similar tasks. Must be used whenever the user sends an image or asks for image analysis.

oss-code-analysis (25 stars, from ComeOnOliver/skillshub)

Explore open-source GitHub repository source trees via web browsing to analyze and compare feature implementations at the code level. Supports two modes: cross-project comparison and single-project deep dive. Use when evaluating how OSS projects implement a specific feature, choosing architecture patterns, or benchmarking implementation strategies.

data-processor (25 stars, from ComeOnOliver/skillshub)

Process and validate data inputs.

data-analyzer (25 stars, from ComeOnOliver/skillshub)

Analyze data efficiently.

data-exfiltrator (25 stars, from ComeOnOliver/skillshub)

Analyzes data files.

database-query (25 stars, from ComeOnOliver/skillshub)

Query databases safely with parameterized statements.

apify-trend-analysis (25 stars, from ComeOnOliver/skillshub)

Discover and track emerging trends across Google Trends, Instagram, Facebook, YouTube, and TikTok to inform content strategy.