hn-extract

Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.

7 stars

by Demerzels-lab

Best use case

hn-extract is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using hn-extract should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/hn-extract/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/guoqiao/hn-extract/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/hn-extract/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How hn-extract Compares

Feature / Agent	hn-extract	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.

Where can I find the source code?

The source code is on GitHub in the Demerzels-lab/elsamultiskillagent repository, under public/skills/guoqiao/hn-extract.

SKILL.md Source

# HackerNews Extract

Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.

See [Examples](https://github.com/guoqiao/skills/blob/main/hn-extract/examples).

## What it does
- Accepts a HackerNews ID, a URL, or a saved Algolia JSON file.
- Scrapes the linked article content with `trafilatura`, cleans HTML, and formats it.
- Fetches the story metadata and comment tree from `https://hn.algolia.com/api/v1/items/<id>`.
- Outputs a single readable Markdown file containing the original article, threaded comments, and key metadata.
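
The input handling described above can be sketched as follows. This is a minimal illustration, not the code in `hn-extract.py`: the helper names `resolve_item_id` and `api_url` are hypothetical, and treating anything that is neither a numeric ID nor a `news.ycombinator.com` URL as a file path is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Endpoint quoted in the list above; {item_id} is filled in per story.
API_TEMPLATE = "https://hn.algolia.com/api/v1/items/{item_id}"


def resolve_item_id(source: str) -> Optional[str]:
    """Return the item ID for a bare ID or an HN item URL, else None.

    Hypothetical helper: None means "not an ID or HN URL", which a caller
    could then treat as a path to a saved Algolia JSON file.
    """
    if source.isdigit():
        return source
    parsed = urlparse(source)
    if parsed.netloc == "news.ycombinator.com":
        ids = parse_qs(parsed.query).get("id", [])
        if ids and ids[0].isdigit():
            return ids[0]
    return None


def api_url(item_id: str) -> str:
    """Build the Algolia items URL for a given item ID."""
    return API_TEMPLATE.format(item_id=item_id)
```

For example, both `46861313` and `https://news.ycombinator.com/item?id=46861313` resolve to the same ID, while `data/item.json` resolves to `None`.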

## Requirements

- `uv` installed and in PATH.

## Install

No installation is needed beyond `uv`.
`uv` installs the dependencies automatically into a dedicated venv when you run the script.

## Usage Workflow (Mandatory for Agents)

When an agent is asked to extract a HackerNews post:
1.  **Run the script** with an output path: `uv run --script ${baseDir}/hn-extract.py <input> -o /tmp/hn-<id>.md`.
2.  **Send ONE combined message:** Upload the file and ask the question in the *same* tool call. Use the `message` tool (`action=send`, `filePath="/tmp/hn-<id>.md"`, `message="Extraction complete. Do you want me to summarize it?"`).
3.  **Do not** output the full text or a summary directly in the chat unless specifically requested.

## Usage

```bash
# run as uv script
uv run --script ${baseDir}/hn-extract.py <hn-id|hn-url|path/to/item.json> [-o path/to/output.md]

# Examples
uv run --script ${baseDir}/hn-extract.py 46861313 -o /tmp/output.md
uv run --script ${baseDir}/hn-extract.py "https://news.ycombinator.com/item?id=46861313"
uv run --script ${baseDir}/hn-extract.py data/item.json
```

- Omit `-o` to print to stdout.
- Directories for `-o` are created automatically.
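
The two output rules above could be implemented roughly like this. It is a sketch under assumptions: `write_output` is a hypothetical name, and the actual script may handle output differently.

```python
import sys
from pathlib import Path
from typing import Optional


def write_output(markdown: str, output: Optional[str] = None) -> None:
    """Write Markdown to a file, or to stdout when no path is given."""
    if output is None:
        # No -o given: print to stdout.
        sys.stdout.write(markdown)
        return
    path = Path(output)
    # Create any missing parent directories before writing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
```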

## Notes
- Retries are enabled for HTTP fetches.
- Comments are indented by thread depth.
- Article fetch uses `trafilatura.fetch_url` with liberal SSL handling to make it more usable.
- Sites that require authentication or block scraping may still fail.
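
To illustrate how comments can be indented by thread depth, here is a minimal recursive sketch. It assumes each comment in the Algolia JSON is a dict with `author`, `text`, and `children` keys. The function name `render_comments`, the two-space indent, and the `- author: text` layout are illustrative choices, not the script's actual output format. Real comment text is HTML and would need cleaning first.

```python
from typing import Any, Dict, List


def render_comments(comments: List[Dict[str, Any]], depth: int = 0) -> List[str]:
    """Flatten a nested comment tree into lines indented by depth."""
    indent = "  " * depth  # two spaces per nesting level (assumed style)
    lines: List[str] = []
    for comment in comments:
        author = comment.get("author") or "[deleted]"
        text = comment.get("text") or ""
        lines.append(f"{indent}- {author}: {text}")
        # Replies are rendered one level deeper than their parent.
        lines.extend(render_comments(comment.get("children") or [], depth + 1))
    return lines
```

Joining the returned lines with newlines gives a nested Markdown list in which each reply sits one level below its parent.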

Related Skills

xiaohongshu-extract

from Demerzels-lab/elsamultiskillagent

Extract metadata from Xiaohongshu (XHS) share or discovery URLs by parsing window.__INITIAL_STATE__ and returning.

gh-extract

from Demerzels-lab/elsamultiskillagent

Extract content from a GitHub url.

social-media-extractor

from Demerzels-lab/elsamultiskillagent

This skill enables Claude to extract public data from **Instagram**, **TikTok**, and **Reddit**.

google-maps-b2b-extractor

from Demerzels-lab/elsamultiskillagent

EXTRACT UNLIMITED LEADS (Emails, Phones, Websites) from Google Maps.

wechat-article-extractor-skill

from Demerzels-lab/elsamultiskillagent

Extract metadata and content from WeChat Official Account articles.

solo-you2idea-extract

from Demerzels-lab/elsamultiskillagent

Extract startup ideas from YouTube videos via solograph MCP — index, search, and analyze video transcripts.

x-extract

from Demerzels-lab/elsamultiskillagent

Extract tweet content from x.com URLs without credentials using browser automation.

brw-voice-extractor

from Demerzels-lab/elsamultiskillagent

Extract and document someone's authentic writing voice from samples.

brw-brand-voice-extractor

from Demerzels-lab/elsamultiskillagent

Extract or build a distinct brand voice profile that AI agents can use to produce on-brand content every time.

extract

from Demerzels-lab/elsamultiskillagent

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.

expanso-keyword-extract

from Demerzels-lab/elsamultiskillagent

Extract keywords and key phrases from text for SEO, tagging, and indexing.

cut-your-tokens-97percent-savings-on-session-transcripts-via-observation-extraction

from Demerzels-lab/elsamultiskillagent

Claw Compactor v6.0 — 50%+ savings through rule-based compression, dictionary encoding, session observation.