hn-extract
Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.
Best use case
hn-extract is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.
Teams using hn-extract should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/hn-extract/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How hn-extract Compares
| Feature / Agent | hn-extract | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# HackerNews Extract
Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.
see [Examples](https://github.com/guoqiao/skills/blob/main/hn-extract/examples)
## What it does
- Accepts an HackerNews id, url, or a saved Algolia JSON file.
- Scrapes the linked article content with `trafilatura`, cleans HTML, and formats it.
- Fetches the story metadata and comment tree from `https://hn.algolia.com/api/v1/items/<id>`.
- Outputs a readable combined markdown file with original article, threaded comments, and key metadata.
## Requirements
- `uv` installed and in PATH.
## Install
No install beyond having `uv`.
Dependencies will be installed automatically by `uv` into to a dedicated venv when run this script.
## Usage Workflow (Mandatory for Agents)
When an agent is asked to extract a HackerNews post:
1. **Run the script** with an output path: `uv run --script ${baseDir}/hn-extract.py <input> -o /tmp/hn-<id>.md`.
2. **Send ONE combined message:** Upload the file and ask the question in the *same* tool call. Use the `message` tool (`action=send`, `filePath="/tmp/hn-<id>.md"`, `message="Extraction complete. Do you want me to summarize it?"`).
3. **Do not** output the full text or a summary directly in the chat unless specifically requested.
## Usage
```bash
# run as uv script
uv run --script ${baseDir}/hn-extract.py <hn-id|hn-url|path/to/item.json> [-o path/to/output.md]
# Examples
uv run --script ${baseDir}/hn-extract.py 46861313 -o /tmp/output.md
uv run --script ${baseDir}/hn-extract.py "https://news.ycombinator.com/item?id=46861313"
uv run --script ${baseDir}/hn-extract.py data/item.json
```
- Omit `-o` to print to stdout.
- Directories for `-o` are created automatically.
## Notes
- Retries are enabled for HTTP fetches.
- Comments are indented by thread depth.
- Article fetch uses `trafilatura.fetch_url` with liberal SSL handling to make it more usable.
- Sites requires authentication or blocks scraping may still fail.Related Skills
xiaohongshu-extract
Extract metadata from Xiaohongshu (XHS) share or discovery URLs by parsing window.__INITIAL_STATE__ and returning.
gh-extract
Extract content from a GitHub url.
social-media-extractor
This skill enables Claude to extract public data from **Instagram**, **TikTok**, and **Reddit**.
google-maps-b2b-extractor
EXTRACT UNLIMITED LEADS (Emails, Phones, Websites) from Google Maps.
wechat-article-extractor-skill
Extract metadata and content from WeChat Official Account articles.
solo-you2idea-extract
Extract startup ideas from YouTube videos via solograph MCP — index, search, and analyze video transcripts.
x-extract
Extract tweet content from x.com URLs without credentials using browser automation.
brw-voice-extractor
Extract and document someone's authentic writing voice from samples.
brw-brand-voice-extractor
Extract or build a distinct brand voice profile that AI agents can use to produce on-brand content every time.
extract
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.
expanso-keyword-extract
Extract keywords and key phrases from text for SEO, tagging, and indexing".
cut-your-tokens-97percent-savings-on-session-transcripts-via-observation-extraction
Claw Compactor v6.0 — 50%+ savings through rule-based compression, dictionary encoding, session observation.