Skill: rss-fetcher

Understand how email-digest-builder fetches, parses, and deduplicates RSS/Atom feeds.

Best use case

The rss-fetcher skill is best used when you need a repeatable AI agent workflow rather than a one-off prompt.

Teams using it should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/rss-fetcher/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/automation-productivity/email-digest-builder/skills/rss-fetcher/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/rss-fetcher/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

SKILL.md Source

# Skill: rss-fetcher

Understand how email-digest-builder fetches, parses, and deduplicates RSS/Atom feeds.

---

## Supported Feed Formats

| Format | Detection | Notes |
|--------|-----------|-------|
| RSS 2.0 | `<rss version="2.0">` | Most common format |
| Atom 1.0 | `<feed xmlns="http://www.w3.org/2005/Atom">` | Used by GitHub, Blogger |
| JSON Feed | `Content-Type: application/feed+json` | Newer format |

The fetcher auto-detects format from content type and root element. Parse failures are logged and increment `error_count`.
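
As an illustration of the detection rule, here is a minimal Python sketch. The project itself runs on Node.js, so this is not the fetcher's code. The function name `detect_format` and the returned labels are assumptions made for the example; only the content-type and root-element checks come from the table above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def detect_format(content_type: str, body: str) -> str:
    """Return 'JSON Feed', 'RSS 2.0', or 'Atom 1.0'; raise ValueError otherwise.

    Hypothetical helper: the name and return labels are not from the project.
    """
    # JSON Feed is recognized by media type; parameters such as charset are ignored.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/feed+json":
        return "JSON Feed"

    # Everything else is treated as XML and identified by its root element.
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    if root.tag == "rss" and root.get("version") == "2.0":
        return "RSS 2.0"
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "Atom 1.0"
    raise ValueError(f"unrecognized root element: {root.tag}")
```

In this sketch a `ValueError` stands in for the parse failure that the fetcher logs and counts toward `error_count`.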

---

## Fetch Lifecycle

```
1. Scheduler triggers fetch based on feed.fetch_interval
2. HTTP GET with 10-second timeout
3. Parse XML/JSON into normalized item list
4. Strip HTML from summary fields and truncate to 500 characters
5. For each item: compute deduplication key (feed_id, guid), using link when guid is missing
6. INSERT items whose (feed_id, guid) is not already in the database
7. Update feed.last_fetched, reset error_count on success
8. On error: increment error_count, log error
9. Disable feed if error_count >= 5
```

---

## Item Fields Extracted

| Field | RSS 2.0 source | Atom source |
|-------|----------------|-------------|
| guid | `<guid>` or `<link>` | `<id>` |
| title | `<title>` | `<title>` |
| url | `<link>` | `<link href>` |
| author | `<author>` or `<dc:creator>` | `<author><name>` |
| published_at | `<pubDate>` | `<published>` or `<updated>` |
| summary | `<description>` (first 500 chars, HTML stripped) | `<summary>` or `<content>` |
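
The RSS 2.0 column of the table can be sketched as a small mapping function. This is a Python illustration only (the project runs on Node.js); `parse_rss_item` is a hypothetical name, and parsing `<pubDate>` into a `datetime` is an assumption about how `published_at` might be normalized.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Clark-notation name for <dc:creator> (Dublin Core namespace).
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"


def parse_rss_item(item: ET.Element) -> dict:
    """Map one RSS 2.0 <item> onto the normalized fields (hypothetical helper)."""
    link = item.findtext("link")
    pub_date = item.findtext("pubDate")
    return {
        # <guid> is preferred; <link> is the fallback, as in the table.
        "guid": item.findtext("guid") or link,
        "title": item.findtext("title"),
        "url": link,
        "author": item.findtext("author") or item.findtext(DC_CREATOR),
        # Assumption: RFC 822 style dates are parsed rather than stored raw.
        "published_at": parsedate_to_datetime(pub_date) if pub_date else None,
        # Raw description; tag stripping and truncation are a separate step.
        "summary": item.findtext("description"),
    }
```

An Atom variant would read `<id>`, `<link href>`, `<author><name>`, and `<published>` or `<updated>` in the same way, using the Atom namespace.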

---

## HTML Stripping

Feed summaries often contain HTML markup. The fetcher strips all tags using `striptags` before storing the summary. This reduces the risk of XSS when items are rendered (the renderer should still escape its output) and keeps summaries clean for LLM classification.

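Input: `<p>Hello <strong>world</strong></p><script>alert(1)</script>`
Output: `Hello worldalert(1)`

`striptags` removes the tags but keeps the text between them, so the script body survives as inert plain text.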

The stored `summary` is plain text, max 500 characters.
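
To make the strip-then-truncate step concrete, here is a rough Python equivalent. It uses the standard library's `html.parser` as a stand-in for the `striptags` npm package, so edge-case behavior may differ from the real fetcher; `clean_summary` and `MAX_SUMMARY_CHARS` are hypothetical names.

```python
from html.parser import HTMLParser

MAX_SUMMARY_CHARS = 500  # limit stated in this section


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_summary(html: str) -> str:
    """Drop all tags, decode entities, and cap the result at 500 characters."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flushes any text still buffered by the parser
    text = "".join(collector.parts).strip()
    return text[:MAX_SUMMARY_CHARS]
```

For example, `clean_summary("<p>Hello <strong>world</strong></p>")` returns `Hello world`. This sketch truncates after stripping, which is an assumption; the source does not say which comes first.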

---

## Deduplication

Items are deduplicated by `(feed_id, guid)`. If an item with the same guid already exists for that feed, it is skipped. This means:
- Re-fetching the same feed does not create duplicate items.
- If a feed item's GUID changes (some feeds regenerate GUIDs), the item is inserted again as a new row.
- Items are not deleted when they disappear from the feed; a fetch only adds new items.
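
One common way to get this behavior is a unique constraint on `(feed_id, guid)` combined with an insert that ignores conflicts. The sketch below uses Python's `sqlite3` with an in-memory database. The table layout, column names, and `insert_new_items` are assumptions for illustration, and the source does not confirm that the project implements deduplication this way.

```python
import sqlite3


def insert_new_items(conn: sqlite3.Connection, feed_id: int, items: list) -> int:
    """Insert items not yet stored for this feed; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        # OR IGNORE skips rows that would violate UNIQUE (feed_id, guid).
        "INSERT OR IGNORE INTO items (feed_id, guid, title) VALUES (?, ?, ?)",
        [(feed_id, item["guid"], item["title"]) for item in items],
    )
    conn.commit()
    return conn.total_changes - before


# Hypothetical schema, reduced to the columns needed for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " feed_id INTEGER NOT NULL,"
    " guid TEXT NOT NULL,"
    " title TEXT,"
    " UNIQUE (feed_id, guid))"
)

batch = [{"guid": "a", "title": "A"}, {"guid": "b", "title": "B"}]
first = insert_new_items(conn, 1, batch)   # both rows are new
second = insert_new_items(conn, 1, batch)  # same feed, same guids: nothing added
```

Because the key includes `feed_id`, the same guid appearing in two different feeds is stored once per feed.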

---

## Fetch Scheduling

Each feed has a `fetch_interval` in minutes (minimum 5, maximum 1440). The scheduler runs a cron check every minute and triggers fetches for feeds whose `last_fetched` is older than `fetch_interval` minutes.

```
Example: feed with fetch_interval=60
- Last fetched: 2024-01-16 09:00:00
- Next fetch:   2024-01-16 10:00:00
```

On startup, all enabled feeds with `last_fetched` older than `fetch_interval` are fetched immediately.
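
The due check that the scheduler performs each minute can be sketched as follows. This is a Python illustration, not the project's Node.js code. `is_due` is a hypothetical name, and two behaviors are assumptions: out-of-range intervals are clamped to the 5 to 1440 minute bounds, and a feed that has never been fetched is treated as due.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_INTERVAL_MINUTES = 5
MAX_INTERVAL_MINUTES = 1440


def is_due(last_fetched: Optional[datetime], fetch_interval: int, now: datetime) -> bool:
    """Return True when a feed should be fetched at time `now`."""
    # Assumption: values outside the documented bounds are clamped, not rejected.
    interval = max(MIN_INTERVAL_MINUTES, min(MAX_INTERVAL_MINUTES, fetch_interval))
    if last_fetched is None:
        return True  # assumption: never-fetched feeds are fetched right away
    return now - last_fetched >= timedelta(minutes=interval)
```

With `fetch_interval=60` and a last fetch at 09:00:00, the check is false at 09:59:00 and true at 10:00:00, matching the example above. The same function covers the startup case, since it only compares `last_fetched` against the current time.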

---

## Error Handling

| Error | Behavior |
|-------|----------|
| HTTP 4xx | Log error, increment error_count |
| HTTP 5xx | Log error, increment error_count |
| Connection timeout (>10s) | Log timeout, increment error_count |
| Parse error (invalid XML or JSON) | Log error, increment error_count |
| error_count >= 5 | Set enabled=0, send dashboard warning |

When a fetch succeeds after previous errors, `error_count` is reset to 0.
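
The counter rules in the table amount to a small state transition, sketched here in Python. `FeedState` and `record_result` are hypothetical names; the threshold of 5 and the reset on success come from this section. Logging and the dashboard warning are left out.

```python
from dataclasses import dataclass

MAX_ERRORS = 5  # threshold from the table above


@dataclass
class FeedState:
    error_count: int = 0
    enabled: bool = True


def record_result(state: FeedState, ok: bool) -> FeedState:
    """Return the feed's state after one fetch attempt."""
    if ok:
        # Any success clears the accumulated errors.
        return FeedState(error_count=0, enabled=state.enabled)
    count = state.error_count + 1
    # The feed is disabled once the count reaches the threshold.
    return FeedState(error_count=count, enabled=state.enabled and count < MAX_ERRORS)
```

Every error type in the table (4xx, 5xx, timeout, parse error) maps to `ok=False` here, so five consecutive failures of any mix disable the feed.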

---

## Common Feed URLs

| Source | URL |
|--------|-----|
| Hacker News | `https://news.ycombinator.com/rss` |
| GitHub Trending | `https://github.com/trending.atom` |
| Dev.to tag | `https://dev.to/feed/tag/{tag}` |
| Reddit r/programming | `https://www.reddit.com/r/programming.rss` |
| TechCrunch | `https://techcrunch.com/feed/` |
| The Verge | `https://www.theverge.com/rss/index.xml` |
| arXiv CS | `https://arxiv.org/rss/cs` |
| GitHub Blog | `https://github.blog/feed/` |

---

## Validate a Feed URL

Before adding a feed, validate it manually:

```bash
# Test fetch
curl -L -s "https://news.ycombinator.com/rss" | head -20

# Should see RSS or Atom root element
# <rss version="2.0"> or <feed xmlns="...">
```

Or use the API validation endpoint:

```bash
curl -X POST http://localhost:4400/api/feeds/validate \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com/rss"}'
```

Response:
```json
{
  "valid": true,
  "format": "RSS 2.0",
  "title": "Hacker News",
  "item_count": 30,
  "site_url": "https://news.ycombinator.com"
}
```

---

## Behind a Firewall

If your feeds are on an internal network, run email-digest-builder on a host that has access to those URLs. Feed fetching uses Node.js's built-in `fetch` with a 10-second timeout.

For feeds requiring authentication (HTTP Basic Auth):

```bash
curl -X POST http://localhost:4400/api/feeds \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://internal.corp/feed.rss",
    "title": "Internal Blog",
    "auth_user": "user",
    "auth_password": "pass"
  }'
```

The auth password is stored encrypted alongside the feed.
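
For reference, HTTP Basic Auth sends the credentials as a Base64-encoded `user:password` pair in the `Authorization` header. The Python sketch below shows the standard header construction; `basic_auth_header` is a hypothetical name, and how the fetcher builds the header internally is not documented here.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build the Authorization header defined by HTTP Basic Auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example credentials from the request above.
headers = basic_auth_header("user", "pass")
# headers == {"Authorization": "Basic dXNlcjpwYXNz"}
```

Base64 is an encoding, not encryption, so Basic Auth feeds should be fetched over HTTPS.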

---

## Feed Item Retention

Items are stored indefinitely by default. To prune old items, use the settings API:

```bash
curl -X PATCH http://localhost:4400/api/settings \
  -H "Content-Type: application/json" \
  -d '{"item_retention_days": 90}'
```

Items older than `item_retention_days` that are not bookmarked are deleted daily at midnight.
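
The daily prune can be pictured as a single conditional delete. The Python sketch below uses `sqlite3`. The table and column names (`items`, `published_at`, `bookmarked`) and the choice to measure age by `published_at` are assumptions, since the source does not say which timestamp the retention rule uses.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_items(conn: sqlite3.Connection, retention_days: int, now: datetime) -> int:
    """Delete non-bookmarked items older than the retention window."""
    cutoff = (now - timedelta(days=retention_days)).isoformat(sep=" ", timespec="seconds")
    cursor = conn.execute(
        # Assumption: timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text,
        # so string comparison matches chronological order.
        "DELETE FROM items WHERE published_at < ? AND bookmarked = 0",
        (cutoff,),
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed
```

With `retention_days=90` and a run on 2024-01-16, the cutoff is 2023-10-18, and bookmarked items are kept regardless of age.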
