skill-scrape
PDF annotation extraction routing to scrape-agent
Best use case
skill-scrape is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using skill-scrape should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/skill-scrape/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How skill-scrape Compares
| Feature / Agent | skill-scrape | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
PDF annotation extraction routing to scrape-agent
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Scrape Skill
Thin wrapper that routes PDF annotation extraction to the `scrape-agent`, which extracts highlights, comments, notes, and other annotations from PDF files.
## Context Pointers
Reference (do not load eagerly):
- Path: `.claude/context/formats/subagent-return.md`
- Purpose: Return validation
- Load at: Subagent execution only
Note: This skill is a thin wrapper. Context is loaded by the delegated agent, not this skill.
## Trigger Conditions
This skill activates when:
### Direct Invocation
- User explicitly runs `/scrape` command
- User requests annotation extraction in conversation
### Implicit Invocation (during task implementation)
When an implementing agent encounters any of these patterns:
**Plan step language patterns**:
- "Extract annotations from [file].pdf"
- "Extract highlights from [file].pdf"
- "Extract comments from [file]"
- "Scrape notes from [file].pdf"
- "Collect annotations in [file]"
- "Export PDF comments to markdown"
- "Gather highlights and notes from [file]"
**File extension detection**:
- Source file has extension: `.pdf`
- Target mentions: "annotations", "highlights", "comments", "notes"
**Task description keywords**:
- "annotation extraction"
- "PDF scraping"
- "extract highlights"
- "collect comments"
- "gather notes from PDF"
### When NOT to Trigger
Do not invoke for:
- Non-PDF files (.docx, .html, .txt, etc.)
- General document conversion (use skill-filetypes)
- Reading PDF text content without annotation context (use skill-filetypes)
- Operations on spreadsheets or presentations (use skill-spreadsheet, skill-presentation)
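The trigger and non-trigger rules above can be approximated with a simple check: the source must be a `.pdf` file and the step must mention an annotation-related term. The following is a minimal sketch, not part of the skill itself; the helper name `should_trigger` and the term list `ANNOTATION_TERMS` are assumptions made for illustration.

```python
from pathlib import Path

# Annotation-related terms drawn from the trigger lists above (illustrative).
ANNOTATION_TERMS = ("annotation", "highlight", "comment", "note")


def should_trigger(step: str, source_path: str) -> bool:
    """Return True when the source is a .pdf and the step mentions annotations.

    Hypothetical helper: the skill does not define this function.
    """
    is_pdf = Path(source_path).suffix.lower() == ".pdf"
    text = step.lower()
    mentions_annotations = any(term in text for term in ANNOTATION_TERMS)
    return is_pdf and mentions_annotations
```

Plain substring matching is deliberately loose and can produce false positives (for example, "denote" contains "note"), so a real router would likely need stricter matching.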
---
## Execution
### 1. Input Validation
Validate inputs (only `pdf_path` is required):
- `pdf_path` - Must be provided, must exist, must be a .pdf file
- `output_path` - Optional, defaults to `{basename}_annotations.md` in same directory
- `annotation_types` - Optional array, defaults to all types
- `output_format` - Optional, defaults to "markdown"
```bash
# Validate source exists; report on stderr and stop if it does not
if [ ! -f "$pdf_path" ]; then
  echo "PDF file not found: $pdf_path" >&2
  exit 1
fi
# Validate source is a PDF (extension check is case-sensitive)
if [[ "${pdf_path##*.}" != "pdf" ]]; then
  echo "Source must be a .pdf file: $pdf_path" >&2
  exit 1
fi
# Determine output path if not provided
if [ -z "$output_path" ]; then
  source_dir=$(dirname "$pdf_path")
  source_base=$(basename "$pdf_path" .pdf)
  output_path="${source_dir}/${source_base}_annotations.md"
fi
```
### 2. Context Preparation
Prepare delegation context:
```json
{
"pdf_path": "/absolute/path/to/document.pdf",
"output_path": "/absolute/path/to/document_annotations.md",
"annotation_types": ["highlights", "comments", "notes", "bookmarks"],
"output_format": "markdown",
"metadata": {
"session_id": "sess_{timestamp}_{random}",
"delegation_depth": 1,
"delegation_path": ["orchestrator", "scrape", "skill-scrape"]
}
}
```
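One way to assemble this context programmatically is sketched below. The function name `build_context` and the length of the random suffix are assumptions; only the field names, defaults, and the `sess_{timestamp}_{random}` pattern come from the example above.

```python
import secrets
import time
from pathlib import Path

# Default set of annotation types, taken from the example context above.
ALL_TYPES = ["highlights", "comments", "notes", "bookmarks"]


def build_context(pdf_path, output_path=None, annotation_types=None,
                  output_format="markdown"):
    """Build the delegation context dict (hypothetical helper)."""
    pdf = Path(pdf_path).resolve()
    if output_path:
        out = Path(output_path).resolve()
    else:
        # Default: {basename}_annotations.md next to the source PDF.
        out = pdf.with_name(f"{pdf.stem}_annotations.md")
    return {
        "pdf_path": str(pdf),
        "output_path": str(out),
        "annotation_types": list(annotation_types or ALL_TYPES),
        "output_format": output_format,
        "metadata": {
            # Assumption: Unix timestamp plus 6 random hex characters.
            "session_id": f"sess_{int(time.time())}_{secrets.token_hex(3)}",
            "delegation_depth": 1,
            "delegation_path": ["orchestrator", "scrape", "skill-scrape"],
        },
    }
```

Resolving both paths up front keeps the context free of relative paths, which matches the absolute paths shown in the JSON example.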
### 3. Invoke Agent
**CRITICAL**: You MUST use the **Task** tool to spawn the scrape agent.
**Required Tool Invocation**:
```
Tool: Task (NOT Skill)
Parameters:
- subagent_type: "scrape-agent"
- prompt: [Include pdf_path, output_path, annotation_types, output_format, metadata]
- description: "Extract annotations from {pdf_path} to {output_path}"
```
**DO NOT** use `Skill(scrape-agent)` - this will FAIL.
Agents live in `.claude/agents/` or extension agent directories, not `.claude/skills/`.
The Skill tool can only invoke skills from `.claude/skills/`.
The agent will:
- Open the PDF and enumerate all annotation objects
- Filter by requested annotation types
- Format annotations according to output_format
- Write structured output to output_path
- Return standardized JSON result
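To illustrate the filter and format steps, here is a minimal sketch that groups annotation records by type and renders them as markdown. The record shape (`type`, `page`, `text`) and the function name `format_annotations` are assumptions; the actual scrape-agent may represent and format annotations differently.

```python
from collections import defaultdict


def format_annotations(annotations, annotation_types):
    """Filter records by type and render markdown grouped by type.

    Hypothetical helper; each record is assumed to be a dict with
    "type", "page", and "text" keys.
    """
    grouped = defaultdict(list)
    for ann in annotations:
        if ann["type"] in annotation_types:
            grouped[ann["type"]].append(ann)

    lines = []
    for ann_type in annotation_types:  # keep the caller's ordering
        if not grouped[ann_type]:
            continue
        lines.append(f"## {ann_type.capitalize()}")
        for ann in grouped[ann_type]:
            lines.append(f"- (p. {ann['page']}) {ann['text']}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```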
### 4. Return Validation
Validate return matches `subagent-return.md` schema:
- Status is one of: scraped, partial, failed
- Summary is non-empty and <100 tokens
- Artifacts array present with output file path
- Metadata contains session_id, agent_type, delegation info
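The checks above could be expressed as a small validator. This is a sketch under stated assumptions: the full schema lives in `subagent-return.md`, which is not reproduced here, so the function `validate_return` covers only the four listed checks and approximates "tokens" with whitespace-separated words.

```python
ALLOWED_STATUSES = {"scraped", "partial", "failed"}
REQUIRED_METADATA = ("session_id", "agent_type", "delegation_depth", "delegation_path")


def validate_return(result: dict) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    if result.get("status") not in ALLOWED_STATUSES:
        problems.append("status must be one of: scraped, partial, failed")

    summary = result.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        problems.append("summary must be non-empty")
    elif len(summary.split()) >= 100:
        # Assumption: word count is a rough stand-in for token count.
        problems.append("summary must be under 100 tokens")

    artifacts = result.get("artifacts")
    has_path = isinstance(artifacts, list) and any(
        isinstance(item, dict) and item.get("path") for item in artifacts
    )
    if not has_path:
        problems.append("artifacts must include an output file path")

    metadata = result.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}
    for key in REQUIRED_METADATA:
        if key not in metadata:
            problems.append(f"metadata missing {key}")

    return problems
```

Collecting every problem instead of stopping at the first one makes it easier to report a malformed return in a single message.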
### 5. Return Propagation
Return validated result to caller without modification.
---
## Return Format
See `.claude/context/formats/subagent-return.md` for full specification.
Expected successful return:
```json
{
"status": "scraped",
"summary": "Extracted 42 annotations from document.pdf: 28 highlights, 10 comments, 4 notes",
"artifacts": [
{
"type": "implementation",
"path": "/absolute/path/to/document_annotations.md",
"summary": "Extracted annotations in markdown format"
}
],
"metadata": {
"session_id": "sess_...",
"agent_type": "scrape-agent",
"delegation_depth": 2,
"delegation_path": ["orchestrator", "scrape", "skill-scrape", "scrape-agent"],
"annotation_count": 42,
"annotation_breakdown": {
"highlights": 28,
"comments": 10,
"notes": 4
}
},
"next_steps": "Review extracted annotations"
}
```
---
## Error Handling
### Input Validation Errors
Return immediately with failed status if PDF not found or not a .pdf file.
### Unsupported Format
Return failed status with clear message about supported annotation types.
### Agent Errors
Pass through the agent's error return verbatim.
### Tool Not Available
Return failed status with installation instructions for PDF annotation tools (e.g., pdfannots, PyMuPDF).
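A tool-availability check might look like the sketch below. It assumes PyMuPDF is importable as `pymupdf` or `fitz` and that pdfannots provides a `pdfannots` command on the PATH; the helper names `find_annotation_tool` and `missing_tool_message` are hypothetical, and the pip package names should be confirmed against each project's own installation notes.

```python
import importlib.util
import shutil


def find_annotation_tool():
    """Return the name of the first available tool, or None if none is found."""
    # Assumption: PyMuPDF is importable as `pymupdf` (newer) or `fitz` (older).
    for module in ("pymupdf", "fitz"):
        if importlib.util.find_spec(module) is not None:
            return "PyMuPDF"
    # Assumption: pdfannots installs a command of the same name.
    if shutil.which("pdfannots"):
        return "pdfannots"
    return None


def missing_tool_message() -> str:
    """Build the installation hint for a failed return (illustrative wording)."""
    return (
        "No PDF annotation tool found. Install one, for example: "
        "pip install pymupdf or pip install pdfannots"
    )
```

A caller would run `find_annotation_tool()` before delegating and, if it returns `None`, place `missing_tool_message()` in the summary of a `failed` return.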
---
## MUST NOT
- Run the postflight step (git commit, status update) — that is the command's responsibility
- Modify the return from scrape-agent before propagating
- Load context files eagerly — only reference them when needed
- Use the Skill tool to invoke scrape-agent (use Task tool instead)
Related Skills
skill-learn
Scan codebase for FIX:/NOTE:/TODO:/QUESTION: tags and create structured tasks with interactive selection. Invoke for /learn command.
skill-deck
Generate YC-style investor pitch decks in Typst
skill-todo
Archive completed and abandoned tasks with CHANGE_LOG.md updates and memory harvest suggestions
skill-team-research
Orchestrate multi-agent research with wave-based parallel execution. Spawns 2-4 teammates for diverse investigation angles and synthesizes findings.
skill-team-plan
Orchestrate multi-agent planning with parallel plan generation. Spawns 2-3 teammates for diverse planning approaches and synthesizes into final plan with trade-off analysis.
skill-team-implement
Orchestrate multi-agent implementation with parallel phase execution. Spawns teammates for independent phases and coordinates dependent phases. Includes debugger teammate for error recovery.
skill-status-sync
Atomically update task status across TODO.md and state.json. For standalone use only.
skill-spawn
Research blockers and spawn new tasks to overcome them, updating parent task dependencies
skill-researcher
Conduct general research using web search, documentation, and codebase exploration. Invoke for general research tasks.
skill-refresh
Manage Claude Code resources - terminate orphaned processes and clean up ~/.claude/ directory
skill-planner
Create phased implementation plans from research findings. Invoke when a task needs an implementation plan.
skill-orchestrator
Route commands to appropriate workflows based on task language and status. Invoke when executing /task, /research, /plan, /implement commands.