data-structure-protocol

Give agents persistent structural memory of a codebase — navigate dependencies, track public APIs, and understand why connections exist without re-reading the whole repo.

31,392 stars
Complexity: easy

About this skill

The Data Structure Protocol (DSP) skill gives AI coding agents persistent structural memory of a codebase. LLM agents typically lose context between tasks and spend many tokens on "orientation": working out where files live, what depends on what, and how components interact. DSP externalizes the project's architectural map into a queryable graph stored locally in a `.dsp/` directory alongside the code. Unlike human-oriented documentation or raw Abstract Syntax Tree (AST) dumps, DSP captures three things: **meaning** (why an entity exists), **boundaries** (what it imports and exposes), and **reasons** (why each connection exists). With this map available, agents can navigate complex projects, understand why connections exist, assess the impact of changes, and propose solutions without repeatedly re-reading the source, leaving more of their token budget for problem-solving and code generation.

Best use case

  • Large-scale refactoring or architectural overhauls, where agents need to understand how a change ripples across the project.
  • Automated bug fixing, where agents must quickly locate the code sections and dependencies related to a specific issue.
  • Developing new features in complex, established codebases, so that proposed code fits the existing architecture and design patterns.
  • Onboarding new AI agents or LLMs to a project by giving them a pre-built map of its structure and relationships.
  • Generating code explanations and documentation, or answering developers' questions about codebase structure and rationale.
  • Code review, where agents verify adherence to architectural principles and identify potential dependency conflicts.


Expected benefits:

  • Significantly reduced token usage and faster execution for AI agents engaged in code navigation and comprehension tasks.
  • Improved accuracy and consistency in agent-generated code, which respects existing architectural patterns and dependencies.
  • More complex and long-running coding tasks without losing context or requiring constant re-orientation.
  • Better answers to sophisticated questions about codebase structure, rationale, and interconnections.
  • More efficient code development, refactoring, and maintenance cycles powered by AI agents.

Practical example

Example input

Agent Task: Refactor the `PaymentGateway` module to integrate a new `Stripe` API client. Ensure all existing `PayPal` integrations remain functional during the transition. The agent would internally initiate a query to the DSP graph to understand `PaymentGateway`'s current architecture, dependencies (e.g., `OrderProcessingService`, `TransactionLogger`), public APIs, and interaction points with existing payment services.

Example output

Agent Action: Identified `PaymentGateway.processTransaction` as the primary entry point. DSP graph shows `CurrencyConverter` and `FraudDetectionService` as direct dependencies. Proposed changes to `payment_gateway.py` and `api_clients/paypal.py` to add `StripeClient` integration. Created a transition plan ensuring backward compatibility by introducing a feature flag for `Stripe` integration. Generated new test cases for `Stripe` client and updated existing `PayPal` tests. Plan affects 7 files and introduces 4 new helper functions. DSP allowed the agent to quickly map out the necessary changes and their impact without extensive exploration.

When to use this skill

  • When working with large, intricate codebases where the AI agent's context window limitations frequently hinder performance.
  • For any coding task requiring deep understanding of project architecture, inter-module dependencies, and API contracts.
  • To maximize token efficiency and reduce operational costs for AI agent interactions with code.
  • When an agent needs to maintain a consistent understanding of a project's structure across multiple, sequential tasks.

When not to use this skill

  • For small, single-file scripts or extremely simple projects where the overhead of generating and maintaining the `.dsp/` graph might outweigh the benefits.
  • In highly experimental or rapidly changing projects where the codebase structure is in constant flux, requiring frequent regeneration of the DSP graph.
  • When the primary AI agent task involves creative writing, general knowledge, or tasks entirely unrelated to code comprehension or generation.
  • If the target codebase is not accessible for local file system interaction (e.g., purely cloud-based, read-only external API).

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/data-structure-protocol/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/data-structure-protocol/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-structure-protocol/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-structure-protocol Compares

| Feature | data-structure-protocol | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

Give agents persistent structural memory of a codebase — navigate dependencies, track public APIs, and understand why connections exist without re-reading the whole repo.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

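The source code is on GitHub in the k-kolomeitsev/data-structure-protocol repository; the References section of the SKILL.md source below links to the specification, the CLI, and the introduction article.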


SKILL.md Source

# Data Structure Protocol (DSP)

LLM coding agents lose context between tasks. On large codebases they spend most of their tokens on "orientation" — figuring out where things live, what depends on what, and what is safe to change. DSP solves this by externalizing the project's structural map into a persistent, queryable graph stored in a `.dsp/` directory next to the code.

DSP is NOT documentation for humans and NOT an AST dump. It captures three things: **meaning** (why an entity exists), **boundaries** (what it imports and exposes), and **reasons** (why each connection exists). This is enough for an agent to navigate, refactor, and generate code without loading the entire source tree into the context window.

## When to Use
Use this skill when:
- The project has a `.dsp/` directory (DSP is already set up)
- The user asks to set up DSP, bootstrap, or map a project's structure
- Creating, modifying, or deleting code files in a DSP-tracked project (to keep the graph updated)
- Navigating project structure, understanding dependencies, or finding specific modules
- The user mentions DSP, dsp-cli, `.dsp`, or structure mapping
- Performing impact analysis before a refactor or dependency replacement

## Core Concepts

### Code = graph

DSP models the codebase as a directed graph. Nodes are **entities**, edges are **imports** and **shared/exports**.

Two entity kinds exist:
- **Object**: any "thing" that isn't a function (module/file/class/config/resource/external dependency)
- **Function**: an exported function/method/handler/pipeline

### Identity by UID, not by file path

Every entity gets a stable UID: `obj-<8hex>` for objects, `func-<8hex>` for functions. File paths are attributes that can change; UIDs survive renames, moves, and reformatting.

For entities inside a file, the UID is anchored with a comment marker in source code:

```js
// @dsp func-7f3a9c12
export function calculateTotal(items) { /* ... */ }
```

```python
# @dsp obj-e5f6a7b8
class UserService:
    ...
```
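A minimal sketch of both identity mechanics, assuming a UID is the kind prefix plus 8 random lowercase hex digits and that markers always follow the `@dsp <uid>` comment form shown above. `new_uid` and `find_markers` are hypothetical helpers for illustration; dsp-cli's own UID generator and marker scanner are not described here.

```python
import re
import secrets

# Assumed UID format: "obj-" or "func-" followed by 8 lowercase hex digits.
UID_RE = re.compile(r"\b(obj|func)-[0-9a-f]{8}\b")
# Matches "// @dsp <uid>" (JS/TS) or "# @dsp <uid>" (Python) comment markers.
MARKER_RE = re.compile(r"(?://|#)\s*@dsp\s+((?:obj|func)-[0-9a-f]{8})\b")


def new_uid(kind: str) -> str:
    """Return a fresh UID such as 'func-7f3a9c12' (hypothetical generator)."""
    if kind not in ("obj", "func"):
        raise ValueError(f"unknown entity kind: {kind}")
    return f"{kind}-{secrets.token_hex(4)}"  # 4 random bytes -> 8 hex digits


def find_markers(source: str) -> list:
    """Return (line_number, uid) for every @dsp marker in a source file."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = MARKER_RE.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


sample = """// @dsp func-7f3a9c12
export function calculateTotal(items) { /* ... */ }
"""
print(new_uid("obj"))
print(find_markers(sample))  # [(1, 'func-7f3a9c12')]
```

Since the UID lives in the comment rather than being derived from the file path, moving or renaming the file leaves the `find_markers` result unchanged, which is what lets UIDs survive renames.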

### Every connection has a "why"

When an import is recorded, DSP stores a short reason explaining *why* that dependency exists. This lives in the `exports/` reverse index of the imported entity. A dependency graph without reasons tells you *what imports what*; reasons tell you **what is safe to change and who will break**.
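Conceptually, the `exports/` reverse index is the set of import edges inverted, with each edge carrying its reason. A minimal in-memory sketch, assuming each recorded import is an `(importer, imported, why)` triple; `obj-0badf00d` and the reason strings are made up for illustration.

```python
from collections import defaultdict

# Hypothetical recorded imports: (importer_uid, imported_uid, why).
RECORDS = [
    ("obj-a1b2c3d4", "obj-deadbeef", "HTTP routing"),
    ("obj-0badf00d", "obj-deadbeef", "registers admin routes"),
    ("obj-a1b2c3d4", "obj-11223344", "deep-cloning request config"),
]


def build_exports_index(records):
    """Invert import edges: imported_uid -> {importer_uid: why}."""
    index = defaultdict(dict)
    for importer, imported, why in records:
        index[imported][importer] = why
    return index


exports = build_exports_index(RECORDS)
for importer, why in sorted(exports["obj-deadbeef"].items()):
    print(f"{importer}: {why}")
# obj-0badf00d: registers admin routes
# obj-a1b2c3d4: HTTP routing
```

Answering "who will break if `obj-deadbeef` changes, and why?" then becomes a single lookup instead of a scan of every importer's source.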

### Storage format

Each entity gets a small directory under `.dsp/`:

```
.dsp/
├── TOC                        # ordered list of all entity UIDs from root
├── obj-a1b2c3d4/
│   ├── description            # source path, kind, purpose (1-3 sentences)
│   ├── imports                # UIDs this entity depends on (one per line)
│   ├── shared                 # UIDs of public API / exported entities
│   └── exports/               # reverse index: who imports this and why
│       ├── <importer_uid>     # file content = "why" text
│       └── <shared_uid>/
│           ├── description    # what is exported
│           └── <importer_uid> # why this specific export is imported
└── func-7f3a9c12/
    ├── description
    ├── imports
    └── exports/
```

Everything is plain text. Diffable. Reviewable. No database needed.
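To make the layout concrete, here is a minimal sketch that writes and reads the files from the tree above using only `pathlib`. It illustrates the documented directory structure and is not dsp-cli's code: the `description` text format is assumed, `src/router.ts` is a hypothetical path, and the TOC, `shared`, and `<shared_uid>/` files are omitted.

```python
import tempfile
from pathlib import Path


def write_entity(dsp: Path, uid: str, description: str) -> None:
    """Create .dsp/<uid>/ with description, an empty imports file, and exports/."""
    entity = dsp / uid
    (entity / "exports").mkdir(parents=True, exist_ok=True)
    (entity / "description").write_text(description + "\n")
    (entity / "imports").touch()


def record_import(dsp: Path, importer: str, imported: str, why: str) -> None:
    """Append imported to the importer's imports; store why in the imported's exports/."""
    with (dsp / importer / "imports").open("a") as f:
        f.write(imported + "\n")  # one UID per line
    (dsp / imported / "exports" / importer).write_text(why + "\n")


def read_recipients(dsp: Path, uid: str) -> dict:
    """Return {importer_uid: why} from .dsp/<uid>/exports/<importer_uid> files."""
    return {
        p.name: p.read_text().strip()
        for p in (dsp / uid / "exports").iterdir()
        if p.is_file()  # skip <shared_uid>/ subdirectories
    }


with tempfile.TemporaryDirectory() as root:
    dsp = Path(root) / ".dsp"
    write_entity(dsp, "obj-a1b2c3d4", "src/app.ts (object): Main application entrypoint")
    write_entity(dsp, "obj-deadbeef", "src/router.ts (object): HTTP router")
    record_import(dsp, "obj-a1b2c3d4", "obj-deadbeef", "HTTP routing")
    recipients = read_recipients(dsp, "obj-deadbeef")
    importer_imports = (dsp / "obj-a1b2c3d4" / "imports").read_text().split()

print(recipients)        # {'obj-a1b2c3d4': 'HTTP routing'}
print(importer_imports)  # ['obj-deadbeef']
```

Because each fact is its own small file, recording one import touches exactly two files, which keeps `.dsp/` diffs small and easy to review.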

### Full import coverage

Every file or artifact that is imported anywhere must be represented in `.dsp` as an Object — code, images, styles, configs, JSON, wasm, everything. External dependencies (npm packages, stdlib, etc.) are recorded as `kind: external` but their internals are never analyzed.
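Deciding whether an import target becomes a local Object or a `kind: external` entity can be sketched as a small classifier. This is an assumed heuristic for JS/TS-style specifiers only, not dsp-cli's resolver: relative specifiers are treated as project files (of any type), and everything else, including `node:` built-ins, as an external package. It ignores tsconfig path aliases, bundler aliases, and Python imports.

```python
from pathlib import PurePosixPath


def classify_import(specifier: str, importer_path: str) -> tuple:
    """Classify a JS/TS import specifier as ("local", path) or ("external", package).

    Assumed heuristic: "./" and "../" specifiers are project files, which must be
    recorded as Objects whatever their type (code, CSS, JSON, images...); anything
    else is a package recorded with kind: external and never analyzed.
    """
    if specifier.startswith(("./", "../")):
        joined = PurePosixPath(importer_path).parent / specifier
        parts = []
        for part in joined.parts:  # resolve ".." lexically, without touching disk
            if part == "..":
                if parts:
                    parts.pop()
            elif part != ".":
                parts.append(part)
        return ("local", "/".join(parts))
    segments = specifier.split("/")
    # Scoped packages keep two segments: "@scope/pkg/sub" -> "@scope/pkg".
    package = "/".join(segments[:2]) if specifier.startswith("@") else segments[0]
    return ("external", package)


for spec in ["./styles/main.css", "../lib/util.js", "lodash/cloneDeep", "@nestjs/core"]:
    print(spec, "->", classify_import(spec, "src/app.ts"))
```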

## How It Works

### Initial Setup

The skill relies on a standalone Python CLI script `dsp-cli.py`. If it is missing from the project, download it:

```bash
curl -O https://raw.githubusercontent.com/k-kolomeitsev/data-structure-protocol/main/skills/data-structure-protocol/scripts/dsp-cli.py
```

Requires **Python 3.10+**. All commands use `python dsp-cli.py --root <project-root> <command>`.

### Bootstrap (initial mapping)

If `.dsp/` is empty, traverse the project from root entrypoint(s) via DFS on imports:

1. Identify root entrypoints (`package.json` main, framework entry, `main.py`, etc.)
2. Document the root file: `create-object`, `create-function` for each export, `create-shared`, `add-import` for all dependencies
3. Take the first non-external import, document it fully, descend into its imports
4. Backtrack when no unvisited local imports remain; continue until all reachable files are documented
5. External dependencies: `create-object --kind external`, add to TOC, but never descend into `node_modules`/`site-packages`/etc.
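The bootstrap steps above amount to a depth-first search with a visited set. The following is a minimal sketch, not dsp-cli's implementation: it assumes imports have already been parsed and resolved into a hypothetical `GRAPH` dict, and it only computes the order in which entities would be documented rather than calling any CLI commands.

```python
# Hypothetical, already-resolved import map: file -> [(target, is_external)].
# Producing this map (parsing imports, resolving paths) is out of scope here.
GRAPH = {
    "src/main.ts": [("src/app.ts", False), ("express", True)],
    "src/app.ts": [("src/routes.ts", False), ("src/styles.css", False)],
    "src/routes.ts": [("src/app.ts", False), ("lodash", True)],  # cycle back to app.ts
    "src/styles.css": [],
}


def bootstrap(entrypoints, graph):
    """Return (kind, name) pairs in the order a DFS bootstrap would document them."""
    visited = set()
    order = []

    def visit(path):
        visited.add(path)
        # Document this file: create-object, create-function per export, create-shared.
        order.append(("local", path))
        targets = graph.get(path, [])
        # External deps are registered once (create-object --kind external), never entered.
        for target, is_external in targets:
            if is_external and target not in visited:
                visited.add(target)
                order.append(("external", target))
        # Descend into local imports in source order; returning from visit() is the backtrack.
        for target, is_external in targets:
            if not is_external and target not in visited:
                visit(target)

    for entry in entrypoints:
        if entry not in visited:
            visit(entry)
    return order


for kind, name in bootstrap(["src/main.ts"], GRAPH):
    print(kind, name)
```

The visited set is what makes the import cycle between `src/app.ts` and `src/routes.ts` harmless. For very deep import chains, an explicit stack would avoid Python's recursion limit.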

### Workflow Rules

- **Before changing code**: Find affected entities via `search`, `find-by-source`, or `read-toc`. Read their `description` and `imports` to understand context.
- **When creating a file/module**: Call `create-object`. For each exported function — `create-function` (with `--owner`). Register exports via `create-shared`.
- **When adding an import**: Call `add-import` with a brief `why`. For external deps — first `create-object --kind external` if the entity doesn't exist.
- **When removing import/export/file**: Call `remove-import`, `remove-shared`, `remove-entity`. Cascade cleanup is automatic.
- **When renaming/moving a file**: Call `move-entity`. UID does not change.
- **Don't touch DSP** if only internal implementation changed without affecting purpose or dependencies.

### Key Commands

| Category | Commands |
|----------|----------|
| **Create** | `init`, `create-object`, `create-function`, `create-shared`, `add-import` |
| **Update** | `update-description`, `update-import-why`, `move-entity` |
| **Delete** | `remove-import`, `remove-shared`, `remove-entity` |
| **Navigate** | `get-entity`, `get-children --depth N`, `get-parents --depth N`, `get-path`, `get-recipients`, `read-toc` |
| **Search** | `search <query>`, `find-by-source <path>` |
| **Diagnostics** | `detect-cycles`, `get-orphans`, `get-stats` |
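`detect-cycles` and `get-orphans` correspond to standard graph checks. The sketch below shows one way to compute them over an `imports` map of UID to imported UIDs. It is not dsp-cli's implementation, and the orphan definition used here (an entity that nothing imports and that is not a declared root) is an assumption.

```python
from typing import Optional


def find_cycle(imports: dict) -> Optional[list]:
    """Return one import cycle as a list of UIDs (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    color = {}
    path = []

    def dfs(uid):
        color[uid] = GREY
        path.append(uid)
        for dep in imports.get(uid, []):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = dfs(dep)
                if found:
                    return found
        path.pop()
        color[uid] = BLACK
        return None

    for uid in imports:
        if color.get(uid, WHITE) == WHITE:
            found = dfs(uid)
            if found:
                return found
    return None


def find_orphans(imports: dict, roots: set) -> list:
    """Entities nothing imports, excluding declared root entrypoints (assumed definition)."""
    imported = {dep for deps in imports.values() for dep in deps}
    return sorted(uid for uid in imports if uid not in imported and uid not in roots)


# Hypothetical imports map: UID -> UIDs it imports.
imports = {
    "obj-a1b2c3d4": ["obj-deadbeef"],
    "obj-deadbeef": ["obj-0badf00d"],
    "obj-0badf00d": ["obj-deadbeef"],
    "obj-cafe0001": [],
}
print(find_cycle(imports))  # ['obj-deadbeef', 'obj-0badf00d', 'obj-deadbeef']
print(find_orphans(imports, roots={"obj-a1b2c3d4"}))  # ['obj-cafe0001']
```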

### When to Update DSP

| Code Change | DSP Action |
|---|---|
| New file/module | `create-object` + `create-function` + `create-shared` + `add-import` |
| New import added | `add-import` (+ `create-object --kind external` if new dep) |
| Import removed | `remove-import` |
| Export added | `create-shared` (+ `create-function` if new) |
| Export removed | `remove-shared` |
| File renamed/moved | `move-entity` |
| File deleted | `remove-entity` |
| Purpose changed | `update-description` |
| Internal-only change | **No DSP update needed** |
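Most rows of this table can be derived mechanically from a before/after comparison of a file's imports and exports. A minimal sketch, assuming a hypothetical `Snapshot` that records which UIDs a file imports and which UIDs it shares; it returns abstract `(command, entity, target)` tuples rather than full CLI invocations, because `add-import` also needs a `why` that only the agent can supply, and it does not emit `create-function` for brand-new exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical summary of one file: UIDs it imports and UIDs it shares (exports)."""
    imports: frozenset = frozenset()
    shared: frozenset = frozenset()


def dsp_actions(uid, before, after):
    """Map a before/after diff of one file to (command, entity, target) tuples."""
    actions = []
    for dep in sorted(after.imports - before.imports):
        actions.append(("add-import", uid, dep))
    for dep in sorted(before.imports - after.imports):
        actions.append(("remove-import", uid, dep))
    for shared in sorted(after.shared - before.shared):
        actions.append(("create-shared", uid, shared))
    for shared in sorted(before.shared - after.shared):
        actions.append(("remove-shared", uid, shared))
    return actions  # [] means an internal-only change: no DSP update needed


before = Snapshot(
    imports=frozenset({"obj-deadbeef", "obj-11223344"}),
    shared=frozenset({"func-7f3a9c12"}),
)
after = Snapshot(
    imports=frozenset({"obj-deadbeef"}),
    shared=frozenset({"func-7f3a9c12", "func-0a1b2c3d"}),
)
for action in dsp_actions("obj-a1b2c3d4", before, after):
    print(action)
```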

## Examples

### Example 1: Setting up DSP and documenting a module

```bash
python dsp-cli.py --root . init

python dsp-cli.py --root . create-object "src/app.ts" "Main application entrypoint"
# Output: obj-a1b2c3d4

python dsp-cli.py --root . create-function "src/app.ts#start" "Starts the HTTP server" --owner obj-a1b2c3d4
# Output: func-7f3a9c12

python dsp-cli.py --root . create-shared obj-a1b2c3d4 func-7f3a9c12

python dsp-cli.py --root . add-import obj-a1b2c3d4 obj-deadbeef "HTTP routing"
```

### Example 2: Navigating the graph before making changes

```bash
python dsp-cli.py --root . search "authentication"
python dsp-cli.py --root . get-entity obj-a1b2c3d4
python dsp-cli.py --root . get-children obj-a1b2c3d4 --depth 2
python dsp-cli.py --root . get-recipients obj-a1b2c3d4
python dsp-cli.py --root . get-path obj-a1b2c3d4 func-7f3a9c12
```
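The `--depth N` option bounds how far a traversal goes. A minimal breadth-first sketch, assuming "children" are the entities an entity imports and "parents" are the entities that import it (the CLI reference docs define the exact semantics); `neighbours_within` and `reverse` are hypothetical helpers, not dsp-cli commands.

```python
from collections import deque


def neighbours_within(graph, start, depth):
    """BFS: every UID reachable from start in at most `depth` hops -> hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        uid = queue.popleft()
        if seen[uid] == depth:  # depth limit reached; do not expand further
            continue
        for nxt in graph.get(uid, []):
            if nxt not in seen:  # first visit in BFS is the shortest hop count
                seen[nxt] = seen[uid] + 1
                queue.append(nxt)
    del seen[start]
    return seen


def reverse(graph):
    """Flip edges so that each UID maps to the UIDs that import it."""
    rev = {}
    for src, deps in graph.items():
        for dep in deps:
            rev.setdefault(dep, []).append(src)
    return rev


# Hypothetical imports map: UID -> UIDs it imports.
imports = {
    "obj-a1b2c3d4": ["obj-deadbeef", "obj-11223344"],
    "obj-deadbeef": ["obj-0badf00d"],
    "obj-0badf00d": ["obj-11223344"],
}
print(neighbours_within(imports, "obj-a1b2c3d4", 2))
# {'obj-deadbeef': 1, 'obj-11223344': 1, 'obj-0badf00d': 2}
print(neighbours_within(reverse(imports), "obj-11223344", 1))
# {'obj-a1b2c3d4': 1, 'obj-0badf00d': 1}
```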

### Example 3: Impact analysis before replacing a library

```bash
python dsp-cli.py --root . find-by-source "lodash"
# Output: obj-11223344

python dsp-cli.py --root . get-recipients obj-11223344
# Shows every module that imports lodash and WHY — lets you systematically replace it
```

## Best Practices

- ✅ **Do:** Update DSP immediately when creating new files, adding imports, or changing public APIs
- ✅ **Do:** Always add a meaningful `why` reason when recording an import — this is where most of DSP's value lives
- ✅ **Do:** Use `kind: external` for third-party libraries without analyzing their internals
- ✅ **Do:** Keep descriptions minimal (1-3 sentences about purpose, not implementation)
- ✅ **Do:** Treat `.dsp/` diffs like code diffs — review them, keep them accurate
- ❌ **Don't:** Touch `.dsp/` for internal-only changes that don't affect purpose or dependencies
- ❌ **Don't:** Change an entity's UID on rename/move (use `move-entity` instead)
- ❌ **Don't:** Create UIDs for every local variable or helper — only file-level Objects and public/shared entities

## Integration

This skill connects naturally to:
- **context-compression** — DSP reduces the need for compression by providing targeted retrieval instead of loading everything
- **context-optimization** — DSP is a structural optimization: agents pull minimal "context bundles" instead of raw source
- **architecture** — DSP captures architectural boundaries (imports/exports) that feed system design decisions

## References

- **Full architecture specification**: [ARCHITECTURE.md](https://github.com/k-kolomeitsev/data-structure-protocol/blob/main/ARCHITECTURE.md)
- **CLI source + reference docs**: [skills/data-structure-protocol](https://github.com/k-kolomeitsev/data-structure-protocol/tree/main/skills/data-structure-protocol)
- **Introduction article**: [article.md](https://github.com/k-kolomeitsev/data-structure-protocol/blob/main/article.md)

Related Skills (all from sickn33/antigravity-awesome-skills)

  • codex-review: Professional code review with auto CHANGELOG generation, integrated with Codex AI. Use when you want a professional code review before commits, need automatic CHANGELOG generation, or are reviewing large-scale refactoring. (Code Analysis; Claude, Codex)
  • code-review-checklist: Comprehensive checklist for conducting thorough code reviews covering functionality, security, performance, and maintainability. (Code Analysis; Claude)
  • code-refactoring-context-restore: Use when working with code refactoring context restore. (Code Analysis; Claude)
  • code-documentation-code-explain: You are a code education expert specializing in explaining complex code through clear narratives, visual diagrams, and step-by-step breakdowns. Transform difficult concepts into understandable explanations for developers at all levels. (Code Analysis; Claude)
  • c4-architecture-c4-architecture: Generate comprehensive C4 architecture documentation for an existing repository/codebase using a bottom-up analysis approach. (Code Analysis; Claude)
  • native-data-fetching: Use when implementing or debugging ANY network request, API call, or data fetching. Covers fetch API, React Query, SWR, error handling, caching, offline support, and Expo Router data loaders (useLoaderData). (API Integration; Claude)
  • llm-structured-output: Get reliable JSON, enums, and typed objects from LLMs using response_format, tool_use, and schema-constrained decoding across OpenAI, Anthropic, and Google APIs. (LLM Utilities; Claude, ChatGPT, Gemini)
  • hugging-face-datasets: Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows. (Data Management; Claude)
  • hugging-face-dataset-viewer: Query Hugging Face datasets through the Dataset Viewer API for splits, rows, search, filters, and parquet links. (Data Access & Exploration; Claude)
  • gdpr-data-handling: Practical implementation guide for GDPR-compliant data processing, consent management, and privacy controls. (Legal & Compliance; Claude)
  • fp-data-transforms: Everyday data transformations using functional patterns: arrays, objects, grouping, aggregation, and null-safe access. (Data Transformation; Claude)
  • food-database-query: Food Database Query. (Nutrition; Claude)