data-substrate-analysis

Analyze fundamental data primitives, type systems, and state management patterns in a codebase. Use when (1) evaluating typing strategies (Pydantic vs TypedDict vs loose dicts), (2) assessing immutability and mutation patterns, (3) understanding serialization approaches, (4) documenting state shape and lifecycle, or (5) comparing data modeling approaches across frameworks.

25 stars

Best use case

data-substrate-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using data-substrate-analysis can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/data-substrate-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/dowwie/data-substrate-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-substrate-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-substrate-analysis Compares

| Feature | data-substrate-analysis | Standard Approach |
|---------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It analyzes a codebase's fundamental data primitives, type systems, and state management patterns: typing strategies (Pydantic vs TypedDict vs loose dicts), immutability and mutation patterns, serialization approaches, and state shape and lifecycle. It also supports comparing data modeling approaches across frameworks.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Data Substrate Analysis

Analyzes the fundamental units of data and state management patterns.

## Process

1. **Locate type files** — Find types.py, schema.py, models.py, state.py
2. **Classify typing** — Strict (Pydantic), structural (TypedDict), loose (dict)
3. **Analyze mutation** — In-place modification vs. copy-on-write
4. **Document serialization** — json(), dict(), pickle, custom methods
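
The file-location step above can be sketched with the standard library. The candidate filename set comes from step 1 and is an assumption about common naming conventions, not a complete list:

```python
from pathlib import Path

# Filenames that commonly hold type definitions (step 1 of the process).
TYPE_FILES = {"types.py", "schema.py", "models.py", "state.py"}

def locate_type_files(root: str) -> list[Path]:
    """Return paths under `root` whose filename suggests type definitions."""
    return sorted(p for p in Path(root).rglob("*.py") if p.name in TYPE_FILES)
```

A recursive glob keeps the sketch dependency-free; a real pass would also follow re-exports and package `__init__.py` files.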

## Typing Strategy Classification

### Detection Patterns

| Strategy | Indicators | Files to Check |
|----------|-----------|----------------|
| **Pydantic** | `BaseModel`, `Field()`, `validator` | models.py, schema.py |
| **Dataclass** | `@dataclass`, `field()` | types.py, models.py |
| **TypedDict** | `TypedDict`, `Required[]`, `NotRequired[]` | types.py |
| **NamedTuple** | `NamedTuple`, `typing.NamedTuple` | types.py |
| **Loose** | `Dict[str, Any]`, plain `dict` | Throughout |
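
The detection table can be turned into a first-pass classifier, under the assumption that a single indicator is enough to tag a module. The regexes are illustrative, not exhaustive:

```python
import re

# Indicator patterns per strategy, mirroring the detection table.
# Order matters: stricter strategies are checked before looser ones,
# so a Pydantic model containing `dict` is not misclassified as Loose.
STRATEGY_PATTERNS = [
    ("Pydantic",   re.compile(r"\bBaseModel\b|\bField\(|\bvalidator\b")),
    ("Dataclass",  re.compile(r"@dataclass\b")),
    ("TypedDict",  re.compile(r"\bTypedDict\b")),
    ("NamedTuple", re.compile(r"\bNamedTuple\b")),
    ("Loose",      re.compile(r"Dict\[str,\s*Any\]|\bdict\b")),
]

def classify_typing(source: str) -> str:
    """Return the first matching typing strategy for a module's source."""
    for name, pattern in STRATEGY_PATTERNS:
        if pattern.search(source):
            return name
    return "Unknown"
```

A module can legitimately mix strategies; a fuller analysis would report every match rather than the first.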

### Analysis Questions

- Are boundaries validated (API ingress/egress)?
- Is nesting depth reasonable (<3 levels)?
- Are optional fields explicit or implicit None?
- Is there a version migration path (e.g., Pydantic V1 → V2)?

## Immutability Analysis

### Mutable Patterns (Risk Indicators)

```python
# In-place list modification
state.messages.append(msg)
state.history.extend(new_items)

# Direct dict mutation
state['key'] = value
state.update(new_data)

# Object attribute mutation
state.status = 'complete'
```

### Immutable Patterns (Safer)

```python
# Pydantic copy
new_state = state.model_copy(update={'key': value})

# Dataclass replace
new_state = replace(state, messages=[*state.messages, msg])

# Spread operator style
new_state = {**state, 'key': value}

# Frozen dataclass
@dataclass(frozen=True)
class State: ...
```

## Serialization Strategy

### Common Patterns

| Method | Code Pattern | Trade-offs |
|--------|-------------|------------|
| Pydantic JSON | `.model_dump_json()` | Type-safe, automatic |
| Pydantic Dict | `.model_dump()` | For internal use |
| Dataclass | `asdict(obj)` | Manual, no validation |
| Custom | `to_dict()`, `from_dict()` | Full control |
| Pickle | `pickle.dumps()` | Fast, fragile, security risk |
| JSON | `json.dumps(obj, default=...)` | Requires encoder |

### Questions to Answer

- Is serialization implicit (automatic) or explicit (manual)?
- How are nested objects handled?
- Is deserialization validated?
- What happens with unknown fields?
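
One way to answer the unknown-fields question explicitly, sketched with stdlib dataclasses. The `Message` type and `from_dict` helper are hypothetical, not part of any framework:

```python
import json
from dataclasses import dataclass, asdict, fields

@dataclass
class Message:
    role: str
    content: str

    @classmethod
    def from_dict(cls, data: dict) -> "Message":
        # Explicit deserialization: reject unknown fields instead of
        # silently dropping them.
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return cls(**data)

# Round trip: object -> dict -> JSON -> dict -> object.
msg = Message(role="user", content="hi")
round_tripped = Message.from_dict(json.loads(json.dumps(asdict(msg))))
```

Pydantic offers the same choice declaratively via `model_config` extra-field handling; the stdlib version makes the policy visible in one place.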

## Output Template

```markdown
## Data Substrate Analysis: [Framework Name]

### Typing Strategy
- **Primary Approach**: [Pydantic/Dataclass/TypedDict/Loose]
- **Key Files**: [List of files]
- **Nesting Depth**: [Shallow/Medium/Deep]
- **Validation**: [At boundaries/Everywhere/None]

### Core Primitives

| Type | Location | Purpose | Mutability |
|------|----------|---------|------------|
| Message | schema.py:L15 | Chat message | Immutable |
| State | state.py:L42 | Agent state | Mutable ⚠️ |
| Result | types.py:L78 | Tool output | Immutable |

### Mutation Analysis
- **Pattern**: [In-place/Copy-on-write/Mixed]
- **Risk Areas**: [List of mutable state locations]
- **Concurrency Safe**: [Yes/No/Partial]

### Serialization
- **Method**: [Pydantic/Custom/JSON]
- **Implicit/Explicit**: [Description]
- **Round-trip Tested**: [Yes/No/Unknown]
```

## Integration

- **Prerequisite**: `codebase-mapping` to identify type files
- **Feeds into**: `comparative-matrix` for typing decisions
- **Related**: `resilience-analysis` for error handling in serialization

Related Skills

Coverage Analysis (25 stars, from ComeOnOliver/skillshub)

Coverage analysis is essential for understanding which parts of your code are exercised during fuzzing. It helps identify fuzzing blockers like magic value checks and tracks the effectiveness of harness improvements over time.

product-analysis (25 stars, from ComeOnOliver/skillshub)

Multi-path parallel product analysis with cross-model test-time compute scaling. Spawns parallel agents (Claude Code agent teams + Codex CLI) to explore a product from multiple perspectives, then synthesizes findings into actionable optimization plans. Can invoke competitors-analysis for competitive benchmarking. Use when "product audit", "self-review", "发布前审查", "产品分析", "analyze our product", "UX audit", or "信息架构审计".

financial-data-collector (25 stars, from ComeOnOliver/skillshub)

Collect real financial data for any US publicly traded company from free public sources (yfinance). Outputs structured JSON consumable by downstream financial skills (DCF modeling, comps analysis, earnings review). Handles market data (price, shares, beta), historical financials (income statement, cash flow, balance sheet), WACC inputs, and analyst estimates. Use when users request "collect data for ticker", "get financials for company", "pull market data", "gather DCF inputs", or any task requiring structured financial data before analysis. Also triggers on "financial data", "company data", "stock data".

competitors-analysis (25 stars, from ComeOnOliver/skillshub)

Analyze competitor repositories with an evidence-based approach. Use when tracking competitors, creating competitor profiles, or generating competitive analysis. CRITICAL: all analysis must be based on actual cloned code, never assumptions. Triggers include "analyze competitor", "add competitor", "competitive analysis", or "竞品分析".

go-data-structures (25 stars, from ComeOnOliver/skillshub)

Use when working with Go slices, maps, or arrays — choosing between new and make, using append, declaring empty slices (nil vs literal for JSON), implementing sets with maps, and copying data at boundaries. Also use when building or manipulating collections, even if the user doesn't ask about allocation idioms. Does not cover concurrent data structure safety (see go-concurrency).

image-analysis (25 stars, from ComeOnOliver/skillshub)

Image analysis and recognition for local images, web images, videos, and files. Suitable for OCR, object recognition, scene understanding, and similar tasks. Must be used whenever the user sends an image or asks for image analysis.

oss-code-analysis (25 stars, from ComeOnOliver/skillshub)

Explore open-source GitHub repository source trees via web browsing to analyze and compare feature implementations at the code level. Supports two modes: cross-project comparison and single-project deep dive. Use when evaluating how OSS projects implement a specific feature, choosing architecture patterns, or benchmarking implementation strategies.

data-processor (25 stars, from ComeOnOliver/skillshub)

Process and validate data inputs.

data-analyzer (25 stars, from ComeOnOliver/skillshub)

Analyze data efficiently.

data-exfiltrator (25 stars, from ComeOnOliver/skillshub)

Analyzes data files.

database-query (25 stars, from ComeOnOliver/skillshub)

Query databases safely with parameterized statements.

apify-trend-analysis (25 stars, from ComeOnOliver/skillshub)

Discover and track emerging trends across Google Trends, Instagram, Facebook, YouTube, and TikTok to inform content strategy.