very-long-text-summarization

Summarizes very long texts (books, handbooks, biographies, codebases) using hierarchical multi-pass extraction with cheap model armies. Produces structured knowledge maps, not just summaries. Use when processing 50+ page documents, professional handbooks, career biographies, or any text too large for a single context window. Activate on "summarize book", "summarize handbook", "long document", "extract knowledge", "distill text", "professional biography". NOT for short text summarization (<10 pages), real-time chat summarization, or code documentation (use technical-writer).

85 stars

bycuriositech

View on GitHub Installation ↓

Best use case

very-long-text-summarization is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using very-long-text-summarization should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/very-long-text-summarization/SKILL.md --create-dirs "https://raw.githubusercontent.com/curiositech/some_claude_skills/main/.claude/skills/very-long-text-summarization/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/very-long-text-summarization/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How very-long-text-summarization Compares

Feature / Agent	very-long-text-summarization	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Very Long Text Summarization

Processes texts too large for a single context window using hierarchical multi-pass extraction with armies of cheap models. Produces structured knowledge maps, indexed summaries, and skill drafts — not just prose compression.

---

## When to Use

✅ **Use for**:
- Professional handbooks and textbooks (100-1000+ pages)
- Career biographies and memoirs (extracting expertise patterns)
- Large codebases (architecture-level understanding)
- Research paper collections (synthesizing findings across papers)
- Any text exceeding a single context window (~100K tokens)

❌ **NOT for**:
- Short documents (&lt;10 pages) — just read them directly
- Real-time conversation summarization (use auto-compact patterns)
- Code documentation generation (use `technical-writer`)
- Simple TL;DR requests (not worth the multi-pass overhead)

---

## Architecture: Three-Pass Hierarchical Extraction

```mermaid
flowchart TD
  D[Document] --> C[Chunk into segments]
  C --> P1["Pass 1: Haiku army\n(parallel extraction)"]
  P1 --> I[Intermediate summaries]
  I --> P2["Pass 2: Sonnet synthesis\n(merge + structure)"]
  P2 --> S[Structured knowledge map]
  S --> P3["Pass 3: Opus refinement\n(optional, for skill drafts)"]
  P3 --> O[Final output]
```

### Pass 1: Chunked Extraction (Haiku Army)

Split the document into overlapping chunks (~4K tokens each, 500 token overlap). Deploy one Haiku call per chunk in parallel. Each extracts:

```yaml
extraction_template:
  summary: "2-3 sentence summary of this section"
  key_claims: ["list of factual claims or assertions"]
  processes: ["any step-by-step procedures described"]
  decisions: ["any decision points or heuristics mentioned"]
  failures: ["any failures, mistakes, or anti-patterns described"]
  aha_moments: ["any insights, realizations, or conceptual breakthroughs"]
  metaphors: ["any metaphors or mental models used"]
  temporal: ["any 'things changed when...' or 'before X, after Y' patterns"]
  quotes: ["notable direct quotes worth preserving"]
  references: ["any citations, links, or cross-references"]
```

**Cost**: ~$0.001 per chunk. A 300-page book (~150K tokens) = ~38 chunks = ~$0.04 total for Pass 1.

**Parallelism**: All chunks run simultaneously. A 300-page book completes Pass 1 in ~3 seconds (wall clock), not 3 minutes.

### Pass 2: Synthesis (Sonnet)

Feed all Pass 1 extractions into one or more Sonnet calls. Sonnet merges, deduplicates, and structures the knowledge.

```yaml
synthesis_template:
  document_summary: "1-2 paragraph executive summary"
  
  knowledge_map:
    core_concepts:
      - concept: "name"
        definition: "what it means in this domain"
        relationships: ["connects to concept X because..."]
    
    processes:
      - name: "process name"
        steps: ["ordered steps"]
        decision_points: ["where choices are made"]
        common_mistakes: ["what goes wrong"]
    
    expertise_patterns:
      - pattern: "what experts do differently"
        novice_mistake: "what novices do instead"
        aha_moment: "the insight that bridges the gap"
    
    temporal_evolution:
      - period: "date range"
        paradigm: "what was believed/practiced"
        change_trigger: "what caused the shift"
    
    key_metaphors:
      - metaphor: "how practitioners think about X"
        maps_to: "the underlying structure it represents"
  
  index:
    - topic: "topic name"
      chunk_ids: [3, 7, 12]  # Which original chunks cover this
      summary: "1 sentence"
```

**Cost**: ~$0.02-0.05 depending on extraction volume. The index preserves traceability back to specific book sections.

### Pass 3: Refinement (Opus, Optional)

For skill-draft output mode: Opus takes the knowledge map and produces a SKILL.md following the skill-architect template. This is the "crystallize skill from handbook" pipeline.

**Cost**: ~$0.10. Only run when the output is a skill draft.

---

## Chunking Strategy

### Semantic Chunking (Preferred)

Split on document structure — chapter boundaries, section headings, paragraph breaks. Preserves semantic coherence within each chunk.

```python
def semantic_chunk(text: str, max_tokens: int = 4000, overlap: int = 500) -> list[str]:
    """Split text on structural boundaries with overlap."""
    # Split on headings, then merge short sections
    sections = split_on_headings(text)  # ##, ###, etc.
    
    chunks = []
    current = ""
    
    for section in sections:
        if count_tokens(current + section) > max_tokens:
            chunks.append(current)
            # Overlap: keep the last ~500 tokens
            current = get_last_n_tokens(current, overlap) + section
        else:
            current += section
    
    if current:
        chunks.append(current)
    
    return chunks
```

### Fixed-Size Chunking (Fallback)

For unstructured text without headings. Split on paragraph boundaries, targeting ~4K tokens with 500-token overlap.

### Why Overlap?

Concepts that span chunk boundaries need to appear in both chunks to be extracted. Without overlap, you lose cross-boundary knowledge.

---

## Output Modes

### Mode 1: Summary

Produces a structured summary with executive overview, key concepts, and index.

**Use for**: Quick understanding of a long document. Reading a handbook before a meeting.

### Mode 2: Knowledge Map

Produces the full knowledge map: concepts, processes, expertise patterns, temporal evolution, metaphors. Machine-readable (YAML/JSON) for downstream processing.

**Use for**: Feeding into skill creation, domain meta-skill development, or cross-document analysis.

### Mode 3: Skill Draft

Produces a SKILL.md following the skill-architect template, with the handbook's expertise encoded as decision trees, anti-patterns, and shibboleths.

**Use for**: Converting professional handbooks into Claude skills. The KE pipeline.

---

## Cost Model

| Document Size | Pages | Chunks | Pass 1 (Haiku) | Pass 2 (Sonnet) | Pass 3 (Opus) | Total |
|--------------|-------|--------|----------------|-----------------|---------------|-------|
| Article | 10 | 4 | $0.004 | $0.01 | — | $0.014 |
| Chapter | 30 | 10 | $0.01 | $0.02 | — | $0.03 |
| Handbook | 300 | 38 | $0.04 | $0.05 | $0.10 | $0.19 |
| Textbook | 800 | 100 | $0.10 | $0.10 | $0.10 | $0.30 |
| Encyclopedia | 2000+ | 250+ | $0.25 | $0.20 | $0.10 | $0.55 |

Processing time is dominated by the longest single Haiku call (~2-3s). With full parallelism, even a 2000-page text completes Pass 1 in under 5 seconds.

---

## Anti-Patterns

### Single-Pass Summarization
**Wrong**: Feed the entire document into one Opus call.
**Why**: Exceeds context window, or attention dilution produces weak extraction on such long input.
**Right**: Hierarchical multi-pass. Cheap parallel extraction → expensive synthesis.

### Summarization Without Structure
**Wrong**: Produce a 2-paragraph prose summary of a 300-page handbook.
**Why**: The structure IS the knowledge. A flat summary loses the decision trees, failure patterns, and temporal evolution that make skills valuable.
**Right**: Structured knowledge map with indexed access back to source sections.

### Skipping Overlap
**Wrong**: Chunk on hard boundaries with no overlap.
**Why**: Cross-boundary concepts get split and lost.
**Right**: 500-token overlap between chunks. Each chunk includes the tail of the previous chunk.

### Ignoring Source Traceability
**Wrong**: Produce extractions without tracking which chunk they came from.
**Why**: When a claim seems wrong, you need to verify it against the source. Without traceability, you can't.
**Right**: Every extraction carries a `chunk_id` linking back to the original text segment.

Related Skills

recovery-social-features

from curiositech/some_claude_skills

Privacy-first social features for recovery apps - sponsors, groups, messaging, friend connections. Use for sponsor/sponsee systems, meeting-based groups, peer support, safe messaging. Activate on "sponsor", "sponsee", "recovery group", "accountability partner", "sober network", "meeting group", "peer support". NOT for general social media patterns (use standard social), dating features, or public profiles.

recovery-education-writer

from curiositech/some_claude_skills

Write neuroscientific, peer-oriented drug education content that roots experiences in body/brain mechanisms. Use when creating educational articles, explaining neurological phenomena, demystifying recovery challenges, or answering "why does this happen?" questions. Activates for harm reduction content, psychoeducation, recovery science writing, and content that reduces shame through understanding.

recovery-community-moderator

from curiositech/some_claude_skills

Trauma-informed AI moderator for addiction recovery communities. Applies harm reduction principles, honors 12-step traditions, distinguishes healthy conflict from abuse, detects crisis posts. Activate on 'community moderation', 'moderate forum', 'review post', 'check content', 'crisis detection'. NOT for legal documents (use recovery-app-legal-terms), app development (use domain skills), or therapy (use jungian-psychologist).

recovery-coach-patterns

from curiositech/some_claude_skills

Follow Recovery Coach codebase patterns and conventions. Use when writing new code, components, API routes, or database queries. Activates for general development, code organization, styling, and architectural decisions in this project.

recovery-app-onboarding

from curiositech/some_claude_skills

Expert guidance for designing and implementing onboarding flows in recovery, wellness, and mental health applications. This skill should be used when building onboarding experiences, first-time user flows, feature discovery, or tutorial systems for apps serving vulnerable populations (addiction recovery, mental health, wellness). Activate on "onboarding", "first-time user", "tutorial", "feature tour", "welcome flow", "new user experience", "app introduction", "recovery app UX". NOT for general mobile UX (use mobile-ux-optimizer), marketing landing pages (use web-design-expert), or native app development (use iOS/Android skills).

recovery-app-legal-terms

from curiositech/some_claude_skills

Generate legally-sound terms of service, privacy policies, and medical disclaimers for recovery and wellness applications. Expert in HIPAA, GDPR, CCPA compliance. Activate on 'terms of service', 'privacy policy', 'legal terms', 'medical disclaimer', 'HIPAA', 'user agreement'. NOT for contract negotiation (use attorney), app development (use domain skills), or moderation (use recovery-community-moderator).

partner-text-coach

from curiositech/some_claude_skills

Real-time communication coach for navigating partner/relationship texts. Analyzes incoming messages for emotional subtext, suggests thoughtful responses, helps de-escalate conflict, and provides follow-up conversation strategies. Expert in attachment theory, nonviolent communication (NVC), Gottman research, and healthy relationship dynamics. Activate on "what should I say", "how to respond", "partner text", "relationship message", "what does this mean", "text my partner", "conversation with partner". NOT for manipulation tactics, revenge/ghosting advice, replacing couples therapy, or abusive relationships (seek professional help).

skill-coach

from curiositech/some_claude_skills

Guides creation of high-quality Agent Skills with domain expertise, anti-pattern detection, and progressive disclosure best practices. Use when creating skills, reviewing existing skills, or when users mention improving skill quality, encoding expertise, or avoiding common AI tooling mistakes. Activate on keywords: create skill, review skill, skill quality, skill best practices, skill anti-patterns. NOT for general coding advice or non-skill Claude Code features.

3d-cv-labeling-2026

from curiositech/some_claude_skills

Expert in 3D computer vision labeling tools, workflows, and AI-assisted annotation for LiDAR, point clouds, and sensor fusion. Covers SAM4D/Point-SAM, human-in-the-loop architectures, and vertical-specific training strategies. Activate on '3D labeling', 'point cloud annotation', 'LiDAR labeling', 'SAM 3D', 'SAM4D', 'sensor fusion annotation', '3D bounding box', 'semantic segmentation point cloud'. NOT for 2D image labeling (use clip-aware-embeddings), general ML training (use ml-engineer), video annotation without 3D (use computer-vision-pipeline), or VLM prompt engineering (use prompt-engineer).

wisdom-accountability-coach

from curiositech/some_claude_skills

Longitudinal memory tracking, philosophy teaching, and personal accountability with compassion. Expert in pattern recognition, Stoicism/Buddhism, and growth guidance. Activate on 'accountability', 'philosophy', 'Stoicism', 'Buddhism', 'personal growth', 'commitment tracking', 'wisdom teaching'. NOT for therapy or mental health treatment (refer to professionals), crisis intervention, or replacing professional coaching credentials.

windows-95-web-designer

from curiositech/some_claude_skills

Modern web applications with authentic Windows 95 aesthetic. Gradient title bars, Start menu paradigm, taskbar patterns, 3D beveled chrome. Extrapolates Win95 to AI chatbots, mobile UIs, responsive layouts. Activate on 'windows 95', 'win95', 'start menu', 'taskbar', 'retro desktop', '95 aesthetic', 'clippy'. NOT for Windows 3.1 (use windows-3-1-web-designer), vaporwave/synthwave, macOS, flat design.

windows-3-1-web-designer

from curiositech/some_claude_skills

Modern web applications with authentic Windows 3.1 aesthetic. Solid navy title bars, Program Manager navigation, beveled borders, single window controls. Extrapolates Win31 to AI chatbots (Cue Card paradigm), mobile UIs (pocket computing). Activate on 'windows 3.1', 'win31', 'program manager', 'retro desktop', '90s aesthetic', 'beveled'. NOT for Windows 95 (use windows-95-web-designer - has gradients, Start menu), vaporwave/synthwave, macOS, flat design.