planning-with-files

Implements file-based planning for complex multi-step tasks. Creates task_plan.md, findings.md, and progress.md as persistent working memory. Use when starting tasks requiring >5 tool calls, multi-phase projects, research, or any work where losing track of goals and progress would be costly.

320 stars

Best use case

planning-with-files is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using planning-with-files should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/planning-with-files/SKILL.md --create-dirs "https://raw.githubusercontent.com/trailofbits/skills-curated/main/plugins/planning-with-files/skills/planning-with-files/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/planning-with-files/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How planning-with-files Compares

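| Feature / Agent | planning-with-files | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |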

Frequently Asked Questions

What does this skill do?

Implements file-based planning for complex multi-step tasks. Creates task_plan.md, findings.md, and progress.md as persistent working memory. Use when starting tasks requiring >5 tool calls, multi-phase projects, research, or any work where losing track of goals and progress would be costly.

Where can I find the source code?

The source code is on GitHub in the trailofbits/skills-curated repository, under plugins/planning-with-files.

SKILL.md Source

# Planning with Files

Use persistent markdown files as working memory on disk.

## Quick Start

1. **Create planning files** -- use `/plan` or create manually from
   [templates](references/templates.md)
2. **Create `task_plan.md`** with goal, phases, and key questions
3. **Create `findings.md`** for research and decisions
4. **Create `progress.md`** for session logging
5. **Re-read the plan before decisions** -- refreshes goals in attention
6. **Update after each phase** -- mark status, log errors
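
Steps 1 to 4 can be scripted. The following is a minimal sketch, not part of the skill itself: `init_planning_files` and the `STARTERS` contents are hypothetical stand-ins for the real templates in `references/templates.md`, which are not reproduced here.

```python
from pathlib import Path

# Hypothetical starter contents; the real templates live in references/templates.md.
STARTERS = {
    "task_plan.md": "# Task Plan\n\n## Goal\n\n## Phases\n\n## Key Questions\n",
    "findings.md": "# Findings\n",
    "progress.md": "# Progress\n",
}


def init_planning_files(project_root: str) -> list[str]:
    """Create any missing planning files and return the names that were created."""
    root = Path(project_root)
    created = []
    for name, body in STARTERS.items():
        path = root / name
        if not path.exists():  # never overwrite an existing plan
            path.write_text(body, encoding="utf-8")
            created.append(name)
    return created
```

Calling it a second time on the same directory creates nothing, so it is safe to run at the start of every session.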

## When to Use

- Multi-step tasks (3+ phases)
- Research projects requiring many searches
- Building or creating projects with multiple files
- Tasks spanning many tool calls (>10)
- Any work where losing track of goals would be costly
- Tasks that may span multiple sessions

## When NOT to Use

- Simple questions or quick lookups
- Single-file edits with obvious scope
- Tasks completable in under 5 tool calls
- Conversational exchanges without implementation

## Core Pattern

```
Context Window = RAM (volatile, limited)
Filesystem     = Disk (persistent, unlimited)

Anything important gets written to disk.
```

After many tool calls, the original goal drifts out of the attention
window. Reading `task_plan.md` brings it back. This is the single
most important pattern in file-based planning.

## File Purposes

| File | Purpose | When to Update |
|------|---------|----------------|
| `task_plan.md` | Phases, progress, decisions | After each phase completes |
| `findings.md` | Research, discoveries, decisions | After ANY discovery |
| `progress.md` | Session log, test results | Throughout the session |

All three files go in the **project root**, not the plugin directory.

## Critical Rules

### 1. Create Plan First

Never start a complex task without `task_plan.md`. This is
non-negotiable. The plan is your persistent memory.

### 2. The 2-Action Rule

After every 2 search, browse, or read operations, immediately save
key findings to `findings.md`. Multimodal content (images, browser
results, PDF contents) does not persist in context -- capture it as
text before it is lost.
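
One way to make the 2-Action Rule mechanical is a small buffer that writes to disk on every second operation. This is an illustrative sketch; `FindingsBuffer` and its bullet-list output format are assumptions, not something the skill prescribes.

```python
class FindingsBuffer:
    """Collects notes and appends them to findings.md after every 2 operations."""

    def __init__(self, path, flush_every=2):
        self.path = path
        self.flush_every = flush_every
        self.pending = []

    def record(self, note: str) -> bool:
        """Record one search/browse/read result; return True if a flush happened."""
        self.pending.append(note)
        if len(self.pending) >= self.flush_every:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Append all pending notes to the file as bullet points."""
        with open(self.path, "a", encoding="utf-8") as f:
            for note in self.pending:
                f.write(f"- {note}\n")
        self.pending.clear()
```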

### 3. Read Before Decide

Before any major decision, re-read `task_plan.md`. This pushes
goals and context back into the recent attention window, counteracting
the "lost in the middle" effect that occurs after ~50 tool calls.

```
[Original goal -- far away in context, forgotten]
...many tool calls...
[Recently read task_plan.md -- gets ATTENTION]
→ Now make the decision with goals fresh in context
```
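
As a rough sketch of the re-read step, the helper below pulls the goal back out of the plan text. It assumes the plan stores its goal under a `## Goal` heading; that heading name and the function `goal_section` are hypothetical, so adjust them to whatever your template uses.

```python
def goal_section(plan_text: str) -> str:
    """Return the text under '## Goal' up to the next '## ' heading."""
    lines = plan_text.splitlines()
    try:
        start = lines.index("## Goal") + 1
    except ValueError:
        raise ValueError("plan has no '## Goal' heading") from None
    body = []
    for line in lines[start:]:
        if line.startswith("## "):  # next section begins, goal text is complete
            break
        body.append(line)
    return "\n".join(body).strip()
```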

### 4. Update After Act

After completing any phase:

- Mark phase status: `in_progress` -> `complete`
- Log any errors encountered in the Errors table in `task_plan.md`
- Note files created or modified in `progress.md`
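
The status change can be done with a plain text substitution. This sketch assumes each phase occupies one line of `task_plan.md` that contains both the phase name and its status; the line format and the helper `complete_phase` are illustrative, not defined by the skill.

```python
def complete_phase(plan_text: str, phase: str) -> str:
    """Flip the status of `phase` from in_progress to complete."""
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        if phase in line and "in_progress" in line:
            lines[i] = line.replace("in_progress", "complete", 1)
            return "\n".join(lines) + "\n"
    # Fail loudly rather than silently leaving the plan unchanged.
    raise ValueError(f"no in_progress entry found for phase {phase!r}")
```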

### 5. Log ALL Errors

Every error goes in `task_plan.md`. Include the attempt number and
resolution. This builds knowledge and prevents repeating failures.

```markdown
## Errors Encountered
| Error | Attempt | Resolution |
|-------|---------|------------|
| FileNotFoundError | 1 | Created default config |
| API timeout | 2 | Added retry logic |
```
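
Appending a row to that table can be automated. The sketch below assumes the table sits directly under an `## Errors Encountered` heading, as in the example above; `log_error` is a hypothetical helper that works on the plan text and leaves reading and writing the file to the caller.

```python
def log_error(plan_text: str, error: str, attempt: int, resolution: str) -> str:
    """Append a row to the table under '## Errors Encountered'."""
    row = f"| {error} | {attempt} | {resolution} |"
    lines = plan_text.rstrip("\n").split("\n")
    if "## Errors Encountered" not in lines:
        # No table yet: create the section with its header rows.
        lines += [
            "",
            "## Errors Encountered",
            "| Error | Attempt | Resolution |",
            "|-------|---------|------------|",
            row,
        ]
        return "\n".join(lines) + "\n"
    # Walk past the existing table rows and insert after the last one.
    i = lines.index("## Errors Encountered") + 1
    while i < len(lines) and lines[i].startswith("|"):
        i += 1
    lines.insert(i, row)
    return "\n".join(lines) + "\n"
```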

### 6. Never Repeat Failures

If an action failed, the next action must be different. Track what
you tried and mutate the approach.

```
if action_failed:
    next_action != same_action
```
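
A runnable version of that idea, as a sketch only: `AttemptLog` is a hypothetical helper that remembers failed actions and refuses to propose them again. Actions are represented as plain strings here for simplicity.

```python
class AttemptLog:
    """Tracks failed actions so the same one is never proposed twice."""

    def __init__(self):
        self.failed = set()

    def record_failure(self, action: str) -> None:
        self.failed.add(action)

    def next_action(self, candidates):
        """Return the first candidate not already known to fail, or None."""
        for action in candidates:
            if action not in self.failed:
                return action
        return None  # every candidate has failed; time to rethink or escalate
```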

## 3-Strike Error Protocol

```
ATTEMPT 1: Diagnose & Fix
  -> Read error carefully
  -> Identify root cause
  -> Apply targeted fix

ATTEMPT 2: Alternative Approach
  -> Same error? Try a different method
  -> Different tool? Different library?
  -> NEVER repeat the exact same failing action

ATTEMPT 3: Broader Rethink
  -> Question assumptions
  -> Search for solutions
  -> Consider updating the plan

AFTER 3 FAILURES: Escalate to User
  -> Explain what you tried (with attempt log)
  -> Share the specific error
  -> Ask for guidance
```
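
The protocol amounts to a lookup from attempt number to strategy, with escalation as the fallback. A minimal sketch follows; the strategy labels and the function `strategy_for` are made up for illustration.

```python
# Hypothetical labels for the three protocol steps.
STRATEGIES = {
    1: "diagnose_and_fix",
    2: "alternative_approach",
    3: "broader_rethink",
}


def strategy_for(attempt: int) -> str:
    """Map a 1-based attempt number to the protocol step."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    # Anything beyond the third attempt goes to the user.
    return STRATEGIES.get(attempt, "escalate_to_user")
```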

## Read vs Write Decision Matrix

| Situation | Action | Reason |
|-----------|--------|--------|
| Just wrote a file | Don't read it | Content still in context |
| Viewed image/PDF | Write findings NOW | Multimodal content doesn't persist |
| Browser returned data | Write to file | Screenshots don't persist |
| Starting new phase | Read plan/findings | Re-orient if context is stale |
| Error occurred | Read relevant file | Need current state to fix |
| Resuming after gap | Read all planning files | Recover full state |

## 5-Question Reboot Test

If you can answer these from your planning files, context is solid:

| Question | Answer Source |
|----------|--------------|
| Where am I? | Current phase in `task_plan.md` |
| Where am I going? | Remaining phases |
| What's the goal? | Goal statement in plan |
| What have I learned? | `findings.md` |
| What have I done? | `progress.md` |
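
A coarse automated version of this test checks that each answer source exists and is non-empty. This is only a sketch: `unanswerable` is a hypothetical helper, and a non-empty file is a weak proxy for actually being able to answer the question.

```python
from pathlib import Path

# Question -> file that should contain the answer (from the table above).
REBOOT_SOURCES = {
    "Where am I?": "task_plan.md",
    "Where am I going?": "task_plan.md",
    "What's the goal?": "task_plan.md",
    "What have I learned?": "findings.md",
    "What have I done?": "progress.md",
}


def unanswerable(project_root: str) -> list[str]:
    """Return the questions whose source file is missing or empty."""
    root = Path(project_root)
    missing = []
    for question, name in REBOOT_SOURCES.items():
        path = root / name
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(question)
    return missing
```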

## Anti-Patterns

| Don't | Do Instead |
|-------|------------|
| State goals once and forget | Re-read plan before decisions |
| Hide errors and retry silently | Log every error to plan file |
| Stuff everything in context | Store large content in files |
| Start executing immediately | Create plan file FIRST |
| Repeat failed actions | Track attempts, mutate approach |
| Create files in plugin directory | Create files in project root |

## References

- [Templates](references/templates.md) -- starter templates for all
  three planning files
- [Principles](references/principles.md) -- context engineering
  principles behind this approach
- [Examples](references/examples.md) -- concrete examples and error
  recovery patterns

Related Skills

x-research

320
from trailofbits/skills-curated

Searches X/Twitter for real-time perspectives, dev discussions, product feedback, breaking news, and expert opinions using the X API v2. Provides search with engagement sorting, user profiles, thread fetching, watchlists, and result caching. Use when: (1) user says "x research", "search x for", "search twitter for", "what are people saying about", "what's twitter saying", "check x for", "x search", (2) user needs recent X discourse on a topic (library releases, API changes, product launches, industry events), (3) user wants to find what devs/experts/community thinks about a topic. NOT for: posting tweets or account management.

wooyun-legacy

320
from trailofbits/skills-curated

Provides web vulnerability testing methodology distilled from 88,636 real-world cases from the WooYun vulnerability database (2010-2016). Use when performing penetration testing, security audits, code reviews for security flaws, or vulnerability research. Covers SQL injection, XSS, command execution, file upload, path traversal, unauthorized access, information disclosure, and business logic flaws.

skill-extractor

320
from trailofbits/skills-curated

Extracts reusable skills from work sessions. Use when: (1) a non-obvious problem was solved worth preserving, (2) a pattern was discovered that would help future sessions, (3) a workaround or debugging technique needs documentation. Manual invocation only via /skill-extractor command - no automatic triggers or hooks.

security-awareness

320
from trailofbits/skills-curated

Teaches agents to recognize and avoid security threats during normal activity. Covers phishing detection, credential protection, domain verification, and social engineering defense. Use when building or operating agents that access email, credential vaults, web browsers, or sensitive data.

scv-scan

320
from trailofbits/skills-curated

Audits Solidity codebases for smart contract vulnerabilities using a four-phase workflow (cheatsheet loading, codebase sweep, deep validation, reporting) covering 36 vulnerability classes. Use when auditing Solidity contracts for security issues, performing smart contract vulnerability scans, or reviewing Solidity code for common exploit patterns.

react-pdf

320
from trailofbits/skills-curated

Generates PDF documents using the React-PDF library (@react-pdf/renderer) with TypeScript and JSX. Use when creating PDFs, generating reports, invoices, forms, resumes, or any document that needs flexbox layout, SVG graphics, custom fonts, or professional typesetting. Prefer over Python PDF libraries (ReportLab, fpdf2) when layout complexity matters.

openai-yeet

320
from trailofbits/skills-curated

Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`). Originally from OpenAI's curated skills catalog.

openai-spreadsheet

320
from trailofbits/skills-curated

Use when tasks involve creating, editing, analyzing, or formatting spreadsheets (`.xlsx`, `.csv`, `.tsv`) using Python (`openpyxl`, `pandas`), especially when formulas, references, and formatting need to be preserved and verified. Originally from OpenAI's curated skills catalog.

openai-sentry

320
from trailofbits/skills-curated

Use when the user asks to inspect Sentry issues or events, summarize recent production errors, or pull basic Sentry health data via the Sentry API; perform read-only queries with the bundled script and require `SENTRY_AUTH_TOKEN`. Originally from OpenAI's curated skills catalog.

openai-security-threat-model

320
from trailofbits/skills-curated

Repository-grounded threat modeling that enumerates trust boundaries, assets, attacker capabilities, abuse paths, and mitigations, and writes a concise Markdown threat model. Trigger only when the user explicitly asks to threat model a codebase or path, enumerate threats/abuse paths, or perform AppSec threat modeling. Do not trigger for general architecture summaries, code review, or non-security design work. Originally from OpenAI's curated skills catalog.

openai-security-ownership-map

320
from trailofbits/skills-curated

Analyze git repositories to build a security ownership topology (people-to-file), compute bus factor and sensitive-code ownership, and export CSV/JSON for graph databases and visualization. Trigger only when the user explicitly wants a security-oriented ownership or bus-factor analysis grounded in git history (for example: orphaned sensitive code, security maintainers, CODEOWNERS reality checks for risk, sensitive hotspots, or ownership clusters). Do not trigger for general maintainer lists or non-security ownership questions. Originally from OpenAI's curated skills catalog.

openai-security-best-practices

320
from trailofbits/skills-curated

Perform language and framework specific security best-practice reviews and suggest improvements. Trigger only when the user explicitly requests security best practices guidance, a security review/report, or secure-by-default coding help. Trigger only for supported languages (python, javascript/typescript, go). Do not trigger for general code review, debugging, or non-security tasks. Originally from OpenAI's curated skills catalog.