Skill: rss-fetcher
Understand how email-digest-builder fetches, parses, and deduplicates RSS/Atom feeds.
Best use case
rss-fetcher is best used when you need a repeatable AI agent workflow rather than a one-off prompt.
Teams using it should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/rss-fetcher/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Skill: rss-fetcher
Understand how email-digest-builder fetches, parses, and deduplicates RSS/Atom feeds.
---
## Supported Feed Formats
| Format | Detection | Notes |
|--------|-----------|-------|
| RSS 2.0 | `<rss version="2.0">` | Most common format |
| Atom 1.0 | `<feed xmlns="http://www.w3.org/2005/Atom">` | Used by GitHub, Blogger |
| JSON Feed | `Content-Type: application/feed+json` | Newer format |
The fetcher auto-detects format from content type and root element. Parse failures are logged and increment `error_count`.
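A minimal sketch of this kind of detection, written in Python for illustration only (the project itself runs on Node.js). The function name `detect_feed_format` and its return values are assumptions, not the fetcher's actual API; the sketch checks the content type first and falls back to the XML root element.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def detect_feed_format(content_type: str, body: bytes) -> Optional[str]:
    """Hypothetical helper: return a format label, or None if unrecognized."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/feed+json":
        return "JSON Feed"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # A real fetcher would log this and increment error_count.
        return None
    if root.tag == "rss" and root.get("version") == "2.0":
        return "RSS 2.0"
    # ElementTree exposes namespaced tags as "{namespace}localname".
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "Atom 1.0"
    return None
```

Checking the root element rather than trusting the content type alone is useful because many servers deliver feeds as generic `text/xml`.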
---
## Fetch Lifecycle
```
1. Scheduler triggers fetch based on feed.fetch_interval
2. HTTP GET with 10-second timeout
3. Parse XML/JSON into normalized item list
4. For each item: compute deduplication key from guid/link
5. Strip HTML from summary fields
6. INSERT items whose (feed_id, guid) is not already in the database
7. Update feed.last_fetched, reset error_count on success
8. On error: increment error_count, log error
9. Disable feed if error_count >= 5
```
---
## Item Fields Extracted
| Field | RSS 2.0 source | Atom source |
|-------|----------------|-------------|
| guid | `<guid>` or `<link>` | `<id>` |
| title | `<title>` | `<title>` |
| url | `<link>` | `<link href>` |
| author | `<author>` or `<dc:creator>` | `<author><name>` |
| published_at | `<pubDate>` | `<published>` or `<updated>` |
| summary | `<description>` (first 500 chars, HTML stripped) | `<summary>` or `<content>` |
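The RSS 2.0 column of the table above could be applied roughly as follows. This is an illustrative Python sketch, not the project's code: `normalize_rss_item` is a hypothetical name, and it leaves the summary raw, so HTML stripping and the 500-character cap would be a separate step.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def normalize_rss_item(item: ET.Element) -> dict:
    """Hypothetical helper: map one RSS 2.0 <item> onto the normalized fields."""

    def text(tag):
        value = item.findtext(tag)
        return value.strip() if value and value.strip() else None

    link = text("link")
    published_at = None
    pub_date = text("pubDate")
    if pub_date:
        try:
            # RSS 2.0 dates use the RFC 822 format, e.g. "Tue, 16 Jan 2024 09:00:00 GMT".
            published_at = parsedate_to_datetime(pub_date).isoformat()
        except (TypeError, ValueError):
            published_at = None  # unparseable date: keep the item anyway

    return {
        "guid": text("guid") or link,  # fall back to <link> when <guid> is missing
        "title": text("title"),
        "url": link,
        "author": text("author") or text(f"{DC_NS}creator"),
        "published_at": published_at,
        "summary": text("description"),  # still raw; stripping happens later
    }
```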
---
## HTML Stripping
Feed summaries often contain HTML markup. The fetcher strips all tags using `striptags` before storing the summary. This prevents XSS when rendering items and keeps summaries clean for LLM classification.
Input: `<p>Hello <strong>world</strong></p><script>alert(1)</script>`
Output: `Hello world`
The stored `summary` is plain text, max 500 characters.
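The same idea can be sketched with Python's standard library. This is a stand-in for illustration, not the `striptags` package the fetcher uses; `clean_summary` is a hypothetical name, and dropping the contents of `<script>` and `<style>` elements is an assumption made so the sketch reproduces the example output above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores everything inside script/style elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_summary(html: str, max_chars: int = 500) -> str:
    """Hypothetical helper: plain-text summary, capped at max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join("".join(parser.parts).split())  # collapse whitespace
    return text[:max_chars]


print(clean_summary("<p>Hello <strong>world</strong></p><script>alert(1)</script>"))
# prints: Hello world
```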
---
## Deduplication
Items are deduplicated by `(feed_id, guid)`. If an item with the same guid already exists in the database for that feed, it is skipped. This means:
- Re-fetching the same feed does not create duplicate items.
- If a feed item's GUID changes (some feeds regenerate GUIDs), the item will be inserted again.
- Items are not deleted when they disappear from the feed; a fetch only adds new items.
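One way to get this behavior is a unique constraint on `(feed_id, guid)` and an insert that ignores conflicts. The sketch below uses Python's `sqlite3` purely for illustration; the table layout, column names, and `insert_new_items` are assumptions, not the project's actual schema or code.

```python
import sqlite3


def insert_new_items(conn: sqlite3.Connection, feed_id: int, items: list) -> int:
    """Hypothetical helper: insert unseen items, return how many were added."""
    added = 0
    for item in items:
        cur = conn.execute(
            # The UNIQUE (feed_id, guid) constraint makes repeats a no-op.
            "INSERT OR IGNORE INTO items (feed_id, guid, title) VALUES (?, ?, ?)",
            (feed_id, item["guid"], item.get("title")),
        )
        added += cur.rowcount  # 1 if inserted, 0 if ignored
    conn.commit()
    return added


# Assumed minimal schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "feed_id INTEGER NOT NULL, guid TEXT NOT NULL, title TEXT, "
    "UNIQUE (feed_id, guid))"
)

batch = [{"guid": "a", "title": "A"}, {"guid": "b", "title": "B"}]
print(insert_new_items(conn, 1, batch))  # first fetch: both items are new
print(insert_new_items(conn, 1, batch))  # re-fetch: nothing is added
```

Because the key includes `feed_id`, the same guid appearing in two different feeds is stored once per feed, and a regenerated guid is treated as a new item.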
---
## Fetch Scheduling
Each feed has a `fetch_interval` in minutes (minimum 5, maximum 1440). The scheduler runs a cron check every minute and triggers fetches for feeds whose `last_fetched` is older than `fetch_interval` minutes.
```
Example: feed with fetch_interval=60
- Last fetched: 2024-01-16 09:00:00
- Next fetch: 2024-01-16 10:00:00
```
On startup, all enabled feeds with `last_fetched` older than `fetch_interval` are fetched immediately.
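The due check the scheduler performs each minute can be sketched as below. This is illustrative Python with hypothetical names (`is_due`, `MIN_INTERVAL`, `MAX_INTERVAL`). Clamping out-of-range intervals and treating never-fetched feeds as due are assumptions; the real service may instead reject invalid values.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_INTERVAL = 5     # minutes
MAX_INTERVAL = 1440  # minutes (24 hours)


def is_due(last_fetched: Optional[datetime], fetch_interval: int, now: datetime) -> bool:
    """Hypothetical helper: should this feed be fetched on the current tick?"""
    # Assumption: out-of-range intervals are clamped rather than rejected.
    interval = max(MIN_INTERVAL, min(MAX_INTERVAL, fetch_interval))
    if last_fetched is None:
        return True  # assumption: a feed that was never fetched is due
    return now - last_fetched >= timedelta(minutes=interval)


last = datetime(2024, 1, 16, 9, 0, 0)
print(is_due(last, 60, datetime(2024, 1, 16, 9, 59, 0)))  # False
print(is_due(last, 60, datetime(2024, 1, 16, 10, 0, 0)))  # True
```

The same check covers the startup case: running it once for every enabled feed at boot picks up feeds that became due while the service was down.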
---
## Error Handling
| Error | Behavior |
|-------|----------|
| HTTP 4xx | Log error, increment error_count |
| HTTP 5xx | Log error, increment error_count |
| Connection timeout (>10s) | Log timeout, increment error_count |
| Parse error (invalid XML or JSON) | Log error, increment error_count |
| error_count >= 5 | Set enabled=0, send dashboard warning |
When a fetch succeeds after previous errors, `error_count` is reset to 0.
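The counting rules in the table reduce to a small state update. The following Python sketch is illustrative only; `FeedState` and `record_fetch_result` are hypothetical names, and logging and the dashboard warning are omitted.

```python
from dataclasses import dataclass

MAX_ERRORS = 5  # threshold from the table above


@dataclass
class FeedState:
    error_count: int = 0
    enabled: bool = True


def record_fetch_result(feed: FeedState, ok: bool) -> FeedState:
    """Hypothetical helper: apply the outcome of one fetch attempt."""
    if ok:
        feed.error_count = 0  # any success clears earlier failures
        return feed
    feed.error_count += 1
    if feed.error_count >= MAX_ERRORS:
        feed.enabled = False  # corresponds to enabled=0
    return feed


feed = FeedState()
for _ in range(4):
    record_fetch_result(feed, ok=False)
print(feed)  # FeedState(error_count=4, enabled=True)
```

Because a success resets the counter, only five consecutive failures disable a feed; intermittent errors never accumulate.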
---
## Common Feed URLs
| Source | URL |
|--------|-----|
| Hacker News | `https://news.ycombinator.com/rss` |
| GitHub Trending | `https://github.com/trending.atom` |
| Dev.to tag | `https://dev.to/feed/tag/{tag}` |
| Reddit r/programming | `https://www.reddit.com/r/programming.rss` |
| TechCrunch | `https://techcrunch.com/feed/` |
| The Verge | `https://www.theverge.com/rss/index.xml` |
| ArXiv CS | `https://arxiv.org/rss/cs` |
| GitHub Blog | `https://github.blog/feed/` |
---
## Validate a Feed URL
Before adding a feed, validate it manually:
```bash
# Test fetch
curl -L -s "https://news.ycombinator.com/rss" | head -20
# Should see RSS or Atom root element
# <rss version="2.0"> or <feed xmlns="...">
```
Or use the API validation endpoint:
```bash
curl -X POST http://localhost:4400/api/feeds/validate \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com/rss"}'
```
Response:
```json
{
  "valid": true,
  "format": "RSS 2.0",
  "title": "Hacker News",
  "item_count": 30,
  "site_url": "https://news.ycombinator.com"
}
```
---
## Behind a Firewall
If your feeds are on an internal network, run email-digest-builder on a host that has access to those URLs. Feed fetching uses Node.js's built-in `fetch` with a 10-second timeout.
For feeds requiring authentication (HTTP Basic Auth):
```bash
curl -X POST http://localhost:4400/api/feeds \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://internal.corp/feed.rss",
    "title": "Internal Blog",
    "auth_user": "user",
    "auth_password": "pass"
  }'
```
The auth password is stored encrypted alongside the feed.
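For reference, HTTP Basic Auth sends the credentials as a base64-encoded `user:password` pair in the `Authorization` header. The Python sketch below shows the standard encoding; how the fetcher itself builds this header from `auth_user` and `auth_password` is an assumption, and `basic_auth_header` is a hypothetical name.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Hypothetical helper: build the Authorization header for Basic Auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("user", "pass"))
# prints: {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Base64 is an encoding, not encryption, so feeds that use Basic Auth should be served over HTTPS.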
---
## Feed Item Retention
Items are stored indefinitely by default. To prune old items, use the settings API:
```bash
curl -X PATCH http://localhost:4400/api/settings \
  -H "Content-Type: application/json" \
  -d '{"item_retention_days": 90}'
```
Items older than `item_retention_days` that are not bookmarked are deleted daily at midnight.