read large webpage or knowledge

This skill reads and organizes large knowledge bases or web pages in segments. It fetches the original content segment by segment, summarizes key points as it goes, and continuously writes them to the knowledge base, so that ingestion stays orderly, clearly structured, and traceable.

1,172 stars

by inclusionAI


Best use case

The read large webpage or knowledge skill works best when you need a repeatable AI agent workflow rather than a one-off prompt.

Teams using it can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/read_large_webpage/SKILL.md --create-dirs "https://raw.githubusercontent.com/inclusionAI/AWorld/main/examples/aworld_quick_start/cli/skills/read_large_webpage/skill.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/read_large_webpage/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How read large webpage or knowledge Compares

| Feature / Agent | read large webpage or knowledge | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

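What does this skill do?

It reads large web pages or knowledge bases segment by segment, summarizes the key points of each segment, and writes them to the knowledge base with references to their source.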

Where can I find the source code?

The source code is on GitHub in the inclusionAI/AWorld repository, under examples/aworld_quick_start/cli/skills/read_large_webpage/.

SKILL.md Source

### 🧠 Knowledge Base
- **Target Scenarios**: Reading long technical documents, research reports, policy documents, web encyclopedias, etc.
- **Core Capabilities**: Segment-based retrieval of original text, real-time summarization, and knowledge network construction.
- **Supporting Tools**: `get_knowledge_by_lines` (segment-by-segment reading), `add_knowledge` (incremental summary writing).

### 📥 Input Specification
Before starting to read, the following should be clarified:
1. The identifier of the knowledge resource to be read (e.g., URL, document ID, file path).
2. The number of lines or paragraph size to pull each time.
3. The current question or topic of focus, to maintain focus during summarization.
4. Output format requirements (paragraph summaries, bullet points, continuous records, etc.).
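
The four inputs above can be collected in one small structure before reading begins. The following is a minimal sketch, not part of the skill itself: the `ReadingRequest` class, its field names, and the sample path `docs/report.txt` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# Hypothetical container for the four inputs listed above; the field names
# are illustrative and are not defined by the skill itself.
@dataclass
class ReadingRequest:
    source: str                           # URL, document ID, or file path
    lines_per_pull: int                   # how many lines to fetch per segment
    focus: str                            # question or topic guiding the summaries
    output_format: str = "bullet points"  # e.g. paragraph summaries, bullet points

    def validate(self) -> None:
        """Raise ValueError if a required input is missing or invalid."""
        if not self.source.strip():
            raise ValueError("source must not be empty")
        if self.lines_per_pull <= 0:
            raise ValueError("lines_per_pull must be positive")
        if not self.focus.strip():
            raise ValueError("focus must not be empty")


# Example values are made up for the demonstration.
request = ReadingRequest(
    source="docs/report.txt",
    lines_per_pull=200,
    focus="What are the main findings?",
)
request.validate()
```

Validating the request up front means a missing source or a zero segment size is caught before any segment is pulled.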

### 🛠️ Processing Pipeline
1. **Locate Range**: Determine the starting line number and reading length from the user input, and record the offset when the read may need to be resumed.
2. **Segment-by-Segment Reading**: Call `get_knowledge_by_lines` to pull the original content of the specified range. If the content is too long, split it into multiple batches and record the ranges that remain unread.
3. **Real-Time Analysis**: Extract key points from each pulled segment and annotate keywords, key information, potential issues, and data.
4. **Knowledge Deposition**: Write the refined key points to the knowledge base through `add_knowledge`, together with source line numbers, timestamps, or context descriptions, so that the stored entries stay structured.
5. **Iterative Progress**: Repeat steps 2-4 until the entire text has been read or the user-defined target depth is reached, keeping a progress index so that the read can be recovered.
6. **Global Review**: At periodic checkpoints, merge the stored summaries, generate an overall context map or summary, and identify missing information.
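
Steps 1-5 of the pipeline can be sketched as a simple loop. This is an illustration under stated assumptions: the document does not give the signatures of `get_knowledge_by_lines` or `add_knowledge`, so the two functions below are in-memory stand-ins with assumed parameters, and `summarize` and `read_in_segments` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# In-memory stand-ins for the two tools. Their signatures here are
# assumptions for illustration; the real tools may differ.
DOCUMENT: List[str] = [f"line {i}" for i in range(1, 11)]
KNOWLEDGE_BASE: List[Dict[str, object]] = []


def get_knowledge_by_lines(start: int, end: int) -> List[str]:
    """Return lines start..end (1-indexed, inclusive) of the document."""
    return DOCUMENT[start - 1:end]


def add_knowledge(summary: str, source_range: Tuple[int, int]) -> int:
    """Store a summary with its source line range and return its ID."""
    KNOWLEDGE_BASE.append({"summary": summary, "range": source_range})
    return len(KNOWLEDGE_BASE) - 1


def summarize(lines: List[str]) -> str:
    """Placeholder for step 3; a real agent would extract key points here."""
    return f"{len(lines)} lines, starting with {lines[0]!r}"


def read_in_segments(total_lines: int, chunk: int, start: int = 1) -> int:
    """Run steps 2-5 and return the next unread line (the progress index)."""
    while start <= total_lines:
        end = min(start + chunk - 1, total_lines)
        lines = get_knowledge_by_lines(start, end)   # step 2: pull a segment
        if not lines:
            break
        summary = summarize(lines)                   # step 3: analyze it
        add_knowledge(summary, (start, end))         # step 4: store with source range
        start = end + 1                              # step 5: advance the progress index
    return start


next_line = read_in_segments(total_lines=len(DOCUMENT), chunk=4)
```

Because `read_in_segments` returns the next unread line, an interrupted read can be resumed by passing that value back in as `start`.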

### 🔁 Iterative Tips
- If cross-segment comparison is needed, preserve the original fragment IDs for traceability.
- For key concepts, call additional reasoning skills to verify or expand on them.
- Record unanswered questions in the summaries and prioritize them when reading resumes.

### 📤 Output Template
```
📍 Reading Progress
- Source: ...
- Range: Line ... - ...
- Remaining: ...

📝 Summary Points
- Point 1: ...
- Point 2: ...
- Point 3: ...

🧾 Stored Knowledge
- Knowledge ID: ...
- Summary: ...
- Reference: ...

⚠️ Pending Issues
- ...
```
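
The template above can be filled programmatically once a reading pass is complete. The sketch below is one possible way to do so; `render_report` and its parameters are hypothetical names, and the sample values are made up for the demonstration.

```python
from typing import List


def render_report(source: str, start: int, end: int, remaining: str,
                  points: List[str], knowledge_id: str, summary: str,
                  reference: str, pending: List[str]) -> str:
    """Fill the output template with values from one reading pass."""
    lines = [
        "📍 Reading Progress",
        f"- Source: {source}",
        f"- Range: Line {start} - {end}",
        f"- Remaining: {remaining}",
        "",
        "📝 Summary Points",
    ]
    lines += [f"- Point {i}: {p}" for i, p in enumerate(points, start=1)]
    lines += [
        "",
        "🧾 Stored Knowledge",
        f"- Knowledge ID: {knowledge_id}",
        f"- Summary: {summary}",
        f"- Reference: {reference}",
        "",
        "⚠️ Pending Issues",
    ]
    # Show an explicit marker when there are no pending issues.
    lines += [f"- {issue}" for issue in pending] or ["- None"]
    return "\n".join(lines)


report = render_report(
    source="docs/report.txt", start=1, end=200, remaining="Lines 201 - 850",
    points=["Scope of the study", "Main findings"],
    knowledge_id="kb-001", summary="Overview of sections 1-2",
    reference="Lines 1 - 200", pending=[],
)
print(report)
```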

### ✅ Output Checklist
- Is the reading range and remaining progress accurately annotated?
- Does the summary cover key information and context?
- Have key points been promptly written to the knowledge base and linked to sources?
- Have unresolved issues or parts requiring in-depth exploration been recorded?
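
The structural parts of this checklist can be verified automatically, while judging whether a summary really covers the key information still needs the agent or a reviewer. The following is a rough sketch under assumptions: `check_output` and the dictionary keys (`range`, `remaining`, `points`, `stored`, `pending`) are hypothetical and not defined by the skill.

```python
from typing import Any, Dict, List


def check_output(report: Dict[str, Any]) -> List[str]:
    """Return the checklist items the report fails; an empty list means it passes."""
    problems = []
    if not report.get("range") or report.get("remaining") is None:
        problems.append("reading range or remaining progress is missing")
    if not report.get("points"):
        problems.append("no summary points")
    stored = report.get("stored") or []
    if not stored or any(not item.get("reference") for item in stored):
        problems.append("stored knowledge is missing or lacks a source reference")
    if "pending" not in report:
        problems.append("pending issues were not recorded (use an empty list if none)")
    return problems


# Sample reports with made-up values.
good = {
    "range": (1, 200),
    "remaining": "Lines 201 - 850",
    "points": ["Scope of the study"],
    "stored": [{"id": "kb-001", "reference": "Lines 1 - 200"}],
    "pending": [],
}
incomplete = {"range": (1, 200), "points": [], "stored": [{"id": "kb-002"}]}

good_problems = check_output(good)
incomplete_problems = check_output(incomplete)
```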

Related Skills

xhs-scraper

1172

from inclusionAI/AWorld

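Xiaohongshu search scraping skill - scrapes Xiaohongshu search results through agent-browser (CDP), supporting list plus detail views and multiple output formats. Use cases: scraping note lists and body text by keyword, and generating RSS/JSON/Markdown.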

xhs-publisher

1172

from inclusionAI/AWorld

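Xiaohongshu publishing skill - automatically publishes Xiaohongshu image-and-text notes through agent-browser (CDP), supporting multi-image upload, title and body entry, and one-click publishing. Use case: automated publishing of image-and-text notes to the Xiaohongshu Creator Center.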

text2agent

1172

from inclusionAI/AWorld

Creates new agents from user requirements by generating Python implementation and mcp_config.

optimizer

1172

from inclusionAI/AWorld

Analyzes and automatically optimizes existing agents by improving system prompts and tool configuration.

media_comprehension

1172

from inclusionAI/AWorld

An intelligent assistant specialized in handling media files (images/audio/video). **Only for media file analysis**, does not handle document types.

✅ Media files that can be processed:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg
- Audio: .mp3, .wav, .m4a, .flac, .aac, .ogg
- Video: .mp4, .avi, .mov, .mkv, .webm, .flv

❌ Files that cannot be processed (please do not trigger this skill):
- Documents: .pdf, .doc, .docx, .txt, .md, .rtf
- Spreadsheets: .xlsx, .xls, .csv, .tsv
- Presentations: .pptx, .ppt, .key
- Code: .py, .js, .ts, .java, .cpp, .go, .rs
- Archives: .zip, .tar, .gz, .rar, .7z
- Executables: .exe, .bin, .app, .dmg
- Databases: .db, .sqlite, .sql
- Configuration files: .json, .xml, .yaml, .yml, .toml, .ini
- Web pages: .html, .htm, .css

**Trigger conditions**: When the user explicitly requests to analyze image/audio/video content, or when the file extension belongs to the aforementioned media types.

app_evaluator

1172

from inclusionAI/AWorld

A professional skill for App Evaluation (evaluating app's performance with score) and App Improvement (giving professional suggestions for improving the app's performance).

agent-browser

1172

from inclusionAI/AWorld

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

OpenClaw

1172

from inclusionAI/AWorld

Complete guide for OpenClaw installation, Discord configuration, and sending messages, including common issues and solutions

x-scraper

1172

from inclusionAI/AWorld

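X (Twitter) scraping skill - scrapes a specified user's posts or the home recommendation feed through agent-browser (CDP), supporting keyword filtering, tab switching, and multiple output formats. Use cases: scraping timelines by user or keyword, viewing the home recommendation feed, and generating RSS/JSON/Markdown.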

html-to-image

1172

from inclusionAI/AWorld

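HTML-to-image skill - renders an HTML file or HTML content through agent-browser and captures it as an image. Suited to generating infographics, social media images, data-visualization screenshots, and similar scenarios.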

Knowledge Management System

3891

from openclaw/skills

> Turn tribal knowledge into searchable, maintained organizational intelligence. Stop losing expertise when people leave.

Compliance & Audit Readiness Engine

3891

from openclaw/skills

Your AI compliance officer. Guides startups and scale-ups through SOC 2, ISO 27001, GDPR, HIPAA, and PCI DSS — from zero to audit-ready. No consultants needed.

Security