RLM (Recursive Language Model) Skill
The RLM (Recursive Language Model) Skill enables AI agents to process extremely large contexts (10M+ tokens) by recursively chunking, processing, and aggregating results, effectively overcoming context window limitations.
About this skill
The RLM Skill empowers AI agents, particularly those with tight context windows, to tackle massive datasets that would otherwise be impossible to process. It achieves this through a structured "Load → Inspect → Chunk → Sub-Query → Aggregate" pattern: an agent first loads a large file (or several files) into RLM's memory, then inspects the content's structure without consuming context tokens. Next, it breaks the content into manageable chunks using one of several strategies (lines, characters, paragraphs), processes each chunk with sub-queries, and finally aggregates the individual results into a comprehensive output. This skill is crucial for tasks requiring deep analysis across extensive codebases, summarizing vast log files, or extracting insights from large document sets. Instead of failing on context overflow, RLM provides a robust mechanism to systematically analyze, understand, and synthesize information from inputs of virtually any size. Its utility extends beyond simple summarization to complex pattern detection, data aggregation, and structured query execution over very large contexts.
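In sketch form, the loop looks like this (tool names as documented in the SKILL.md source below; in a live session these are agent tool calls rather than plain Python functions, and `file_contents` is a stand-in for whatever large content you load):

```python
# Minimal sketch of the Load -> Inspect -> Chunk -> Sub-Query -> Aggregate loop.
# `file_contents` is assumed input; the rlm_* names are the skill's documented tools.
rlm_load_context(name="big_input", content=file_contents)               # Load
rlm_inspect_context(name="big_input")                                   # Inspect
info = rlm_chunk_context(name="big_input", strategy="lines", size=200)  # Chunk
results = rlm_sub_query_batch(                                          # Sub-Query
    context_name="big_input",
    chunk_indices=list(range(info["chunk_count"])),
    query="Summarize this chunk",
    concurrency=4,
)
# Aggregate: synthesize the per-chunk results into one final answer.
```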
Best use case
The primary use case is enabling AI agents to analyze, understand, and generate insights from extremely large files, multi-file projects, or datasets that far exceed typical context window limits. Developers benefit by using it for deep codebase analysis, security researchers for reviewing large log files, and analysts for processing extensive reports or research papers. It ensures no data is left unexamined due to size constraints, allowing for comprehensive understanding and aggregation.
Users should expect comprehensive analyses, summaries, or aggregated insights derived from very large contexts that would normally cause an AI agent to fail due to context window limitations.
Practical example
Example input
Analyze the architectural patterns across all `.py` files in this large repository and summarize their dependencies.
Example output
After processing the entire codebase, RLM identifies a prevalent MVC architecture with clear separation of concerns. Dependency analysis shows module A relying on B for data persistence and on C for external API integrations, which keeps direct coupling low. The main points of interaction are in `controllers/api_v1.py`.
When to use this skill
- Analyzing files larger than 100KB or 2000 lines.
- Processing multiple files simultaneously for combined analysis.
- Performing deep codebase architecture analysis.
- Aggregating patterns or summaries from large datasets like logs or documents.
When not to use this skill
- For small files that easily fit within the agent's context window.
- Simple, single-pass tasks that don't require chunking.
- Interactive editing tasks where direct text manipulation is needed.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/rlm/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How RLM (Recursive Language Model) Skill Compares
| Feature / Agent | RLM (Recursive Language Model) Skill | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Easy | N/A |
Frequently Asked Questions
What does this skill do?
The RLM (Recursive Language Model) Skill enables AI agents to process extremely large contexts (10M+ tokens) by recursively chunking, processing, and aggregating results, effectively overcoming context window limitations.
Which AI agents support this skill?
This skill is designed for Claude.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
AI Agent for YouTube Script Writing
Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.
SKILL.md Source
# RLM (Recursive Language Model) Skill
## Overview
The RLM pattern enables processing massive contexts (10M+ tokens) that exceed Claude's context window by recursively chunking, processing, and aggregating results. Instead of failing on large files, use RLM to break them into manageable pieces.
## When to Use RLM
Use RLM when you encounter:
- **Large files**: Any file >100KB or >2000 lines
- **Multi-file analysis**: Processing multiple files together (combined size matters)
- **Context exceeded**: User asks to analyze content that won't fit in context window
- **Aggregation tasks**: Summarizing logs, finding patterns across large datasets, counting/filtering operations
- **Deep codebase analysis**: Understanding architecture across many files
- **Document processing**: Analyzing reports, research papers, documentation sets
**Don't use RLM for:**
- Small files (<100KB)
- Single-pass tasks that fit in context
- Interactive editing (use standard tools)
## The RLM Pattern
The core workflow is: **Load → Inspect → Chunk → Sub-Query → Aggregate**
### Step 1: Load Context
```python
# Load large content into RLM memory
rlm_load_context(
    name="codebase",
    content=file_contents  # Full file content
)
```
Returns: `{name, size_bytes, size_chars, line_count, loaded: true}`
### Step 2: Inspect Context
```python
# Understand structure without loading it into the prompt
rlm_inspect_context(
    name="codebase",
    preview_chars=500  # Optional preview
)
```
Returns: Metadata + preview (first N chars)
### Step 3: Chunk Context
```python
# Break into manageable pieces
rlm_chunk_context(
    name="codebase",
    strategy="lines",  # or "chars" or "paragraphs"
    size=100  # lines per chunk (or chars if strategy="chars")
)
```
**Chunking Strategies:**
- `lines` (default): Split by line count - best for code, logs, structured data
- `chars`: Split by character count - best for prose, unstructured text
- `paragraphs`: Split by blank lines - best for documents, markdown
Returns: `{name, chunk_count, strategy, size_per_chunk}`
### Step 4: Sub-Query (Process Chunks)
**Single chunk:**
```python
rlm_sub_query(
    context_name="codebase",
    chunk_index=0,
    query="Extract all function names",
    provider="claude-sdk"  # or "ollama"
)
```
**Batch processing (parallel):**
```python
rlm_sub_query_batch(
    context_name="codebase",
    chunk_indices=[0, 1, 2, 3],
    query="Extract all function names",
    provider="claude-sdk",
    concurrency=4  # max parallel requests (cap: 8)
)
```
Returns: Array of results, one per chunk
### Step 5: Store Results (Optional)
```python
# Store intermediate results for later aggregation
rlm_store_result(
    name="function_names",
    result=sub_query_response,
    metadata={"chunk": 0}  # Optional
)
```
### Step 6: Aggregate
```python
# Retrieve all stored results
rlm_get_results(name="function_names")
```
Then synthesize final answer from all chunk results.
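In practice, "synthesize" usually means folding the stored results back into a single prompt-sized text, either directly or via another RLM pass (a minimal sketch; the join format is an assumption):

```python
# Fetch everything stored under "function_names" and fold it into one text
stored = rlm_get_results(name="function_names")
combined = "\n".join(str(r) for r in stored)
# Answer from `combined` directly, or rlm_load_context() it for another pass
```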
## Provider Options
### claude-sdk (Default - Recommended)
- Model: **Haiku 4.5** (fast, accurate, cost-effective)
- Cost: ~$0.25 per 1M input tokens
- Best for: Most tasks requiring accuracy
- Usage: `provider="claude-sdk"`
### ollama (Local, Free)
- Model: User's local Ollama instance
- Cost: Free (runs on your hardware)
- Best for: Experimentation, privacy-sensitive data, budget constraints
- Usage: `provider="ollama"`
**Choosing a Provider:**
- Default to `claude-sdk` for production tasks
- Use `ollama` when cost or privacy is the primary concern (see the sketch below)
- Haiku 4.5 is fast enough for batch processing
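A minimal selection sketch (the `sensitive` flag is a hypothetical stand-in for whatever privacy signal applies in your task; `model` is the optional override listed in the tool reference below):

```python
# Hypothetical provider selection; `sensitive` is an assumed flag,
# not part of the RLM tools themselves.
provider = "ollama" if sensitive else "claude-sdk"
result = rlm_sub_query(
    context_name="codebase",
    chunk_index=0,
    query="Extract all function names",
    provider=provider,
    # model="...",  # optional override, see the tool reference below
)
```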
## Example Workflows
### Workflow 1: Analyze Large Codebase
**Task:** Find all TODO comments across 50 Python files
```python
from pathlib import Path

# 1. Read all files and combine (tag each file so results stay attributable)
python_files = sorted(Path(".").rglob("*.py"))
all_code = "\n".join(f"### FILE: {p}\n{p.read_text()}" for p in python_files)

# 2. Load into RLM
rlm_load_context(name="codebase", content=all_code)

# 3. Inspect to understand size
metadata = rlm_inspect_context(name="codebase")
# e.g. 15,000 lines, 500KB

# 4. Chunk by lines (code is line-oriented)
chunk_info = rlm_chunk_context(
    name="codebase",
    strategy="lines",
    size=200  # 200 lines per chunk
)
num_chunks = chunk_info["chunk_count"]  # e.g. 75 chunks

# 5. Process in batches of 4 parallel sub-queries
for batch_start in range(0, num_chunks, 4):
    batch_indices = list(range(batch_start, min(batch_start + 4, num_chunks)))
    results = rlm_sub_query_batch(
        context_name="codebase",
        chunk_indices=batch_indices,
        query="Extract all TODO comments with line context",
        concurrency=4
    )
    # 6. Store results as they arrive
    for i, result in enumerate(results):
        rlm_store_result(
            name="todos",
            result=result,
            metadata={"chunk": batch_start + i}
        )

# 7. Aggregate
all_results = rlm_get_results(name="todos")
# Synthesize the final TODO list from all_results
```
### Workflow 2: Process Large Log File
**Task:** Summarize errors from 10MB log file
```python
from pathlib import Path

# 1. Load logs
logs = Path("/var/log/app.log").read_text()
rlm_load_context(name="logs", content=logs)

# 2. Filter to error lines first (shrink the data before chunking)
rlm_filter_context(
    name="logs",
    output_name="errors",
    pattern=r"ERROR|CRITICAL",
    mode="keep"
)

# 3. Chunk the filtered context by lines (logs are line-oriented)
chunk_info = rlm_chunk_context(name="errors", strategy="lines", size=100)

# 4. Batch process all error chunks
all_indices = list(range(chunk_info["chunk_count"]))
results = rlm_sub_query_batch(
    context_name="errors",
    chunk_indices=all_indices,
    query="Group errors by type and count occurrences",
    concurrency=8
)

# 5. Aggregate error summary
# Synthesize from the results array
```
### Workflow 3: Multi-Document Q&A
**Task:** Answer questions from 20 research papers
```python
# 1. Load all papers (papers: list of paper texts, assumed already read)
combined_docs = "\n\n=== DOCUMENT BREAK ===\n\n".join(papers)
rlm_load_context(name="research", content=combined_docs)

# 2. Chunk by paragraphs (prose is paragraph-oriented)
chunk_info = rlm_chunk_context(name="research", strategy="paragraphs", size=50)
chunk_count = chunk_info["chunk_count"]

# 3. Ask the question across all chunks
results = rlm_sub_query_batch(
    context_name="research",
    chunk_indices=list(range(chunk_count)),
    query=("Does this section mention climate change impacts on agriculture? "
           "If yes, summarize key points."),
    concurrency=8
)

# 4. Keep only the relevant results
relevant = [r for r in results if "yes" in str(r).lower()]

# 5. Final synthesis
# Combine the relevant excerpts into an answer
```
## Tool Reference
### rlm_load_context
Load large content into RLM memory without consuming context window.
- `name`: Identifier for this context
- `content`: Full text content to load
### rlm_inspect_context
Get metadata and preview without loading full content.
- `name`: Context identifier
- `preview_chars`: Number of characters to preview (default: 500)
### rlm_chunk_context
Split context into manageable chunks.
- `name`: Context identifier
- `strategy`: `"lines"`, `"chars"`, or `"paragraphs"`
- `size`: Chunk size (meaning depends on strategy)
### rlm_get_chunk
Retrieve specific chunk by index.
- `name`: Context identifier
- `chunk_index`: Zero-based chunk index
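For example, to spot-check a chunk boundary before committing to a full batch run (a minimal sketch):

```python
# Pull a single chunk to verify the chunking looks sensible
first_chunk = rlm_get_chunk(name="codebase", chunk_index=0)
```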
### rlm_filter_context
Filter a context with a regex, producing a new filtered context.
- `name`: Source context identifier
- `output_name`: Name for filtered context
- `pattern`: Regex pattern to match
- `mode`: `"keep"` (keep matches) or `"remove"` (remove matches)
### rlm_sub_query
Process a single chunk with a sub-LLM call.
- `context_name`: Context identifier
- `query`: Question/instruction for sub-call
- `chunk_index`: Optional specific chunk (otherwise uses whole context)
- `provider`: `"claude-sdk"` or `"ollama"`
- `model`: Optional model override
### rlm_sub_query_batch
Process multiple chunks in parallel (recommended).
- `context_name`: Context identifier
- `query`: Question/instruction for each chunk
- `chunk_indices`: Array of chunk indices to process
- `provider`: `"claude-sdk"` or `"ollama"`
- `concurrency`: Max parallel requests (default: 4, max: 8)
### rlm_store_result
Store sub-call result for later aggregation.
- `name`: Result set identifier
- `result`: Result content to store
- `metadata`: Optional metadata about result
### rlm_get_results
Retrieve all stored results for aggregation.
- `name`: Result set identifier
### rlm_list_contexts
List all loaded contexts and their metadata.
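Useful as a quick housekeeping check between steps, for example:

```python
# See which contexts are loaded and how large each one is
rlm_list_contexts()
```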
## Best Practices
### Chunking Strategy Selection
- **Code/logs/CSV**: Use `lines` (structured, line-oriented)
- **Prose/articles**: Use `paragraphs` (semantic boundaries)
- **Unstructured text**: Use `chars` (uniform distribution)
### Chunk Size Guidelines
- **Lines**: 100-500 (balance between context and granularity)
- **Chars**: 2000-10000 (roughly 500-2500 tokens)
- **Paragraphs**: 20-100 (depends on paragraph length)
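The two lists above can be folded into a small planning helper. The sketch below is hypothetical: the `kind` labels and default sizes are assumptions layered on the guidelines, not part of the RLM tools.

```python
# Hypothetical helper mapping content kind -> (strategy, size);
# the labels and defaults are assumptions based on the guidelines above.
def plan_chunks(kind: str) -> tuple[str, int]:
    if kind in ("code", "logs", "csv"):
        return ("lines", 200)       # within the 100-500 line guideline
    if kind in ("prose", "article", "markdown"):
        return ("paragraphs", 50)   # within the 20-100 paragraph guideline
    return ("chars", 5000)          # roughly 1,250 tokens per chunk

strategy, size = plan_chunks("code")
rlm_chunk_context(name="codebase", strategy=strategy, size=size)
```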
### Efficient Processing
1. **Use batch processing**: `rlm_sub_query_batch` is much faster than sequential calls
2. **Set appropriate concurrency**: 4-8 parallel requests balances speed and resource usage
3. **Filter before chunking**: Use `rlm_filter_context` to reduce data volume
4. **Inspect first**: Always check context size before chunking
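As a concrete illustration of points 1 and 4, the sketch below inspects first and only chunks when the content is actually large; the 100KB threshold mirrors the guidance earlier in this document, and the rest is an assumption.

```python
# Inspect first; only chunk when the context is actually large.
meta = rlm_inspect_context(name="data")
if meta["size_bytes"] < 100_000:  # ~100KB threshold from the guidance above
    # Small enough for one sub-query over the whole context (no chunk_index)
    answer = rlm_sub_query(context_name="data", query="Summarize key points")
else:
    info = rlm_chunk_context(name="data", strategy="lines", size=200)
    parts = rlm_sub_query_batch(
        context_name="data",
        chunk_indices=list(range(info["chunk_count"])),
        query="Summarize key points",
        concurrency=4
    )
```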
### Cost Optimization
- Use `claude-sdk` (Haiku 4.5) for most tasks - fast and cheap
- Use `ollama` for experimentation or when processing very large volumes
- Filter contexts before processing to reduce token usage
- Chunk at appropriate granularity (bigger chunks = fewer calls)
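For a rough sense of scale: at the quoted ~$0.25 per 1M input tokens, one full pass over a 10M-token context costs on the order of $2.50 in sub-query input tokens (output tokens are extra), which is why filtering before processing pays off quickly.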
## Common Patterns
### Map-Reduce Pattern
```python
# Map: process each chunk
# (all_indices = list(range(chunk_count)) after rlm_chunk_context)
results = rlm_sub_query_batch(
    context_name="data",
    chunk_indices=all_indices,
    query="Extract key information",
    concurrency=8
)
# Reduce: aggregate the per-chunk results into a final answer
final = synthesize(results)  # synthesize = your own aggregation step
```
### Filter-Process Pattern
```python
# Filter to relevant content
rlm_filter_context(
    name="all_logs",
    output_name="errors",
    pattern="ERROR",
    mode="keep"
)
# Chunk the filtered context before querying it
chunk_info = rlm_chunk_context(name="errors", strategy="lines", size=100)
all_indices = list(range(chunk_info["chunk_count"]))
# Process the filtered, chunked content
results = rlm_sub_query_batch(
    context_name="errors",
    chunk_indices=all_indices,
    query="Categorize error type"
)
```
### Hierarchical Processing Pattern
```python
# First pass: summarize each chunk
# ("docs" already loaded and chunked; all_indices covers every chunk)
summaries = rlm_sub_query_batch(
    context_name="docs",
    chunk_indices=all_indices,
    query="Summarize key points"
)
# Second pass: load the summaries as a new, much smaller context
rlm_load_context(name="summaries", content="\n".join(str(s) for s in summaries))
final = rlm_sub_query(
    context_name="summaries",
    query="Create overall summary from chunk summaries"
)
```
## Troubleshooting
### "Context too large" errors
- You're trying to process chunks that are still too big
- Solution: Reduce chunk size or filter content first
### Slow processing
- Sequential sub-queries instead of batch
- Solution: Use `rlm_sub_query_batch` with appropriate concurrency
### Poor aggregation quality
- Chunks too small (losing context)
- Solution: Increase chunk size to maintain semantic coherence
### High costs
- Using wrong provider or inefficient chunking
- Solution: Use Haiku 4.5 (`claude-sdk`) and filter before processing
## Summary
RLM unlocks massive context processing for Claude Code:
- ✅ Handle files >100KB easily
- ✅ Process multiple files together
- ✅ Parallelize for speed (batch processing)
- ✅ Cost-effective with Haiku 4.5
- ✅ Flexible chunking strategies
- ✅ Map-reduce pattern for aggregation
**Default workflow:** Load → Inspect → Chunk (lines, 200) → Batch Sub-Query (claude-sdk, concurrency=4) → Aggregate

Related Skills
dcf
Discounted cash flow valuation with sensitivity analysis
notebooklm-research
Full-autopilot AI research agent powered by Google NotebookLM (notebooklm-py v0.3.4). Ingests sources (URL, text, PDF, DOCX, YouTube, Google Drive), runs deep web research, asks cited questions, and generates 10 native artifact types (audio podcast, video, cinematic video, slide deck, report, quiz, flashcards, mind map, infographic, data table, study guide). Produces original content drafts via Claude, with optional publishing to social platforms via threads-viral-agent integration. Use this skill when the user mentions: NotebookLM, research with sources, create notebook, generate podcast from articles, turn research into content, trending topic research, research pipeline, source-based analysis, cited research answers, generate slides, generate quiz, make flashcards, deep web research, create infographic, compare sources, research report, study guide, source analysis, or knowledge synthesis.
Skill: History
Skill: Explore Data
q
Fast SQLite-based vault search using FTS5 full-text search index
nblm
This skill allows AI agents, particularly Claude Code, to directly query and manage your Google NotebookLM notebooks, providing source-grounded and citation-backed answers from Gemini.
lastXdays
Researches any given topic across Reddit, X (Twitter), and the broader web within a custom, configurable time window, synthesizing findings and generating expert-level prompts.
tavily-search
Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.
baidu-search
Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.
notebooklm
An OpenClaw Skill for the unofficial Google NotebookLM Python API. Supports content generation (podcasts, videos, slide decks, quizzes, mind maps, and more), document management, and research automation. Triggered when the user wants NotebookLM to generate audio overviews, videos, or study materials, or to manage a knowledge base.
openclaw-search
Intelligent search for agents. Multi-source retrieval with confidence scoring - web, academic, and Tavily in one unified API.
aisa-tavily
AI-optimized web search via AIsa's Tavily API proxy. Returns concise, relevant results for AI agents through AIsa's unified API gateway.