daily-news-report

Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.

242 stars

by aiskillstore


Best use case

daily-news-report is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. It scrapes content from a preset URL list, filters for high-quality technical information, and generates daily Markdown reports.


Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "daily-news-report" skill to help with this workflow task. Context: Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/daily-news-report/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/sickn33/daily-news-report/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/daily-news-report/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How daily-news-report Compares

| Feature / Agent | daily-news-report | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.

Where can I find the source code?

You can find the source code on GitHub in the aiskillstore/marketplace repository, under skills/sickn33/daily-news-report (the same path used by the installation command).

SKILL.md Source

# Daily News Report v3.0

> **Architecture Upgrade**: Main Agent Orchestration + SubAgent Execution + Browser Scraping + Smart Caching

## Core Architecture

```
┌───────────────────────────────────────────────────────────────────────────┐
│                         Main Agent (Orchestrator)                         │
│  Role: Scheduling, Monitoring, Evaluation, Decision, Aggregation          │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │
│   │ 1. Init     │ → │ 2. Dispatch │ → │ 3. Monitor  │ → │ 4. Evaluate │   │
│   │ Read Config │   │ Assign Tasks│   │ Collect Res │   │ Filter/Sort │   │
│   └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘   │
│          │                 │                 │                 │          │
│          ▼                 ▼                 ▼                 ▼          │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │
│   │ 5. Decision │ ← │ Enough 20?  │   │ 6. Generate │ → │ 7. Update   │   │
│   │ Cont/Stop   │   │ Y/N         │   │ Report File │   │ Cache Stats │   │
│   └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘   │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
         ↓ Dispatch                          ↑ Return Results
┌───────────────────────────────────────────────────────────────────────────┐
│                         SubAgent Execution Layer                          │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                     │
│   │ Worker A    │   │ Worker B    │   │ Browser     │                     │
│   │ (WebFetch)  │   │ (WebFetch)  │   │ (Headless)  │                     │
│   │ Tier1 Batch │   │ Tier2 Batch │   │ JS Render   │                     │
│   └─────────────┘   └─────────────┘   └─────────────┘                     │
│          ↓                 ↓                 ↓                            │
│   ┌───────────────────────────────────────────────────────────────────┐   │
│   │                     Structured Result Return                      │   │
│   │  { status, data: [...], errors: [...], metadata: {...} }          │   │
│   └───────────────────────────────────────────────────────────────────┘   │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
```

## Configuration Files

This skill uses the following configuration files:

| File | Purpose |
|------|---------|
| `sources.json` | Source configuration, priorities, scrape methods |
| `cache.json` | Cached data, historical stats, deduplication fingerprints |

## Execution Process Details

### Phase 1: Initialization

```yaml
Steps:
  1. Determine date (user argument or current date)
  2. Read sources.json for source configurations
  3. Read cache.json for historical data
  4. Create output directory NewsReport/
  5. Check if a partial report exists for today (append mode)
```
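
The initialization steps above could be sketched in Python roughly as follows. This is only an illustration: the function names `load_json` and `init_run`, the keys of the returned dictionary, and the choice to treat a missing JSON file as empty are all assumptions, not part of the skill. The report filename pattern follows the one given in Phase 7.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def load_json(path: Path) -> dict:
    """Parse a JSON file; a missing file is treated as empty (an assumption)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def init_run(base_dir: Path, run_date: Optional[str] = None) -> dict:
    """Hypothetical Phase 1 helper covering steps 1-5."""
    day = run_date or date.today().isoformat()       # 1. user argument or today
    sources = load_json(base_dir / "sources.json")   # 2. source configuration
    cache = load_json(base_dir / "cache.json")       # 3. historical data
    out_dir = base_dir / "NewsReport"
    out_dir.mkdir(parents=True, exist_ok=True)       # 4. output directory
    report_path = out_dir / f"{day}-news-report.md"
    return {
        "date": day,
        "sources": sources,
        "cache": cache,
        "report_path": report_path,
        "append_mode": report_path.exists(),         # 5. partial report present?
    }
```

Returning a single state dictionary keeps later phases independent of how the files were located.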

### Phase 2: Dispatch SubAgents

**Strategy**: Parallel dispatch, batch execution, early stopping mechanism

```yaml
Wave 1 (Parallel):
  - Worker A: Tier1 Batch A (HN, HuggingFace Papers)
  - Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)

Wait for results → Evaluate count

If < 15 high-quality items:
  Wave 2 (Parallel):
    - Worker C: Tier2 Batch A (James Clear, FS Blog)
    - Worker D: Tier2 Batch B (HackerNoon, Scott Young)

If still < 20 items:
  Wave 3 (Browser):
    - Browser Worker: ProductHunt, Latent Space (Require JS rendering)
```
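
One way to picture the wave strategy is the Python sketch below. It is not the skill's implementation: `run_waves`, the `WAVES` table, and every source ID other than `hn` and `hf_papers` are hypothetical, and `fetch_batch` stands in for whatever actually dispatches a SubAgent. The sketch assumes `fetch_batch` returns only items that already passed the quality filter, and it checks each wave's threshold independently, so a run that skips Wave 2 can still enter Wave 3.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# (batches, run_if_below): a wave runs only while fewer than `run_if_below`
# items have been collected; None means "always run".
# "hn" and "hf_papers" come from the document; the other IDs are made up.
WAVES = [
    ([["hn", "hf_papers"], ["one_useful_thing", "paul_graham"]], None),
    ([["james_clear", "fs_blog"], ["hackernoon", "scott_young"]], 15),
    ([["producthunt", "latent_space"]], 20),
]


def run_waves(fetch_batch: Callable[[list], list], waves=WAVES) -> list:
    """Run waves in order, dispatching the batches of one wave in parallel."""
    items: list = []
    for batches, run_if_below in waves:
        threshold: Optional[int] = run_if_below
        if threshold is not None and len(items) >= threshold:
            continue  # early stop: enough items, this wave is not needed
        with ThreadPoolExecutor(max_workers=len(batches)) as pool:
            for batch_items in pool.map(fetch_batch, batches):
                items.extend(batch_items)
    return items
```

Threads are used here only because the work is I/O-bound; in the actual skill the parallelism comes from dispatching several SubAgents at once.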

### Phase 3: SubAgent Task Format

Task format received by each SubAgent:

```yaml
task: fetch_and_extract
sources:
  - id: hn
    url: https://news.ycombinator.com
    extract: top_10
  - id: hf_papers
    url: https://huggingface.co/papers
    extract: top_voted

output_schema:
  items:
    - source_id: string      # Source Identifier
      title: string          # Title
      summary: string        # 2-4 sentence summary
      key_points: string[]   # Max 3 key points
      url: string            # Original URL
      keywords: string[]     # Keywords
      quality_score: 1-5     # Quality Score

constraints:
  filter: "Cutting-edge Tech/Deep Tech/Productivity/Practical Info"
  exclude: "General Science/Marketing Puff/Overly Academic/Job Posts"
  max_items_per_source: 10
  skip_on_error: true

return_format: JSON
```
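
As an illustration of how the Main Agent might check returned items against `output_schema`, here is a small Python sketch. The `NewsItem` dataclass and `parse_items` function are hypothetical names; applying `skip_on_error` to individual malformed items (rather than to whole sources) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """Mirrors the fields of output_schema.items."""
    source_id: str
    title: str
    summary: str
    url: str
    quality_score: int
    key_points: list = field(default_factory=list)
    keywords: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.quality_score <= 5:
            raise ValueError(f"quality_score must be 1-5, got {self.quality_score}")
        if len(self.key_points) > 3:
            raise ValueError("at most 3 key points are allowed")


def parse_items(raw_items: list, max_items_per_source: int = 10) -> list:
    """Validate raw dicts, drop invalid ones, and cap the count per source."""
    kept, per_source = [], {}
    for raw in raw_items:
        try:
            item = NewsItem(**raw)
        except (TypeError, ValueError):
            continue  # assumed reading of skip_on_error: drop the bad item
        seen = per_source.get(item.source_id, 0)
        if seen < max_items_per_source:
            per_source[item.source_id] = seen + 1
            kept.append(item)
    return kept
```

Missing or unexpected fields raise `TypeError` in the dataclass constructor, so they are skipped by the same `except` clause as out-of-range scores.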

### Phase 4: Main Agent Monitoring & Feedback

Main Agent Responsibilities:

```yaml
Monitoring:
  - Check SubAgent return status (success/partial/failed)
  - Count collected items
  - Record success rate per source

Feedback Loop:
  - If a SubAgent fails, decide whether to retry or skip
  - If a source fails persistently, mark as disabled
  - Dynamically adjust source selection for subsequent batches

Decision:
  - Items >= 25 AND HighQuality >= 20 → Stop scraping
  - Items < 15 → Continue to next batch
  - All batches done but < 20 → Generate with available content (Quality over Quantity)
```
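
The decision rules above can be written as a small pure function. The sketch below is one possible reading, with hypothetical names (`Action`, `decide`). The rules do not say what happens between 15 and 25 items when the stop condition is not met; the sketch assumes scraping continues as long as batches remain.

```python
from enum import Enum


class Action(Enum):
    STOP = "stop_scraping"
    CONTINUE = "continue_to_next_batch"
    GENERATE = "generate_with_available_content"


def decide(total_items: int, high_quality: int, batches_remaining: int) -> Action:
    """Apply the Phase 4 decision rules to the current counts."""
    if total_items >= 25 and high_quality >= 20:
        return Action.STOP
    if batches_remaining == 0:
        # All batches done but the target was not reached: quality over quantity.
        return Action.GENERATE
    # Covers "Items < 15" and, by assumption, the unspecified middle range.
    return Action.CONTINUE
```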

### Phase 5: Evaluation & Filtering

```yaml
Deduplication:
  - Exact URL match
  - Title similarity (>80% considered duplicate)
  - Check cache.json to avoid history duplicates

Score Calibration:
  - Unify scoring standards across SubAgents
  - Adjust weights based on source credibility
  - Bonus points for manually curated high-quality sources

Sorting:
  - Descending order by quality_score
  - Sort by source priority if scores are equal
  - Take Top 20
```
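
A rough Python sketch of the deduplication and sorting steps follows. The skill does not say how title similarity is measured; `difflib.SequenceMatcher` is used here purely as a stand-in. The names `is_duplicate` and `select_top`, the `source_priority` mapping (lower number means higher priority), and the "first occurrence wins" policy are assumptions. Score calibration is left out.

```python
from difflib import SequenceMatcher


def is_duplicate(item: dict, kept: list, seen_urls: set, threshold: float = 0.8) -> bool:
    """Exact URL match, or title similarity above the threshold."""
    if item["url"] in seen_urls:
        return True
    title = item["title"].lower()
    return any(
        SequenceMatcher(None, title, other["title"].lower()).ratio() > threshold
        for other in kept
    )


def select_top(items, cached_urls=frozenset(), source_priority=None, limit=20):
    """Deduplicate against this run and the cache, then sort and truncate."""
    source_priority = source_priority or {}
    kept, seen = [], set(cached_urls)  # cached URLs count as already seen
    for item in items:
        if not is_duplicate(item, kept, seen):
            kept.append(item)
            seen.add(item["url"])
    # Highest score first; ties broken by source priority (unknown sources last).
    kept.sort(key=lambda i: (-i["quality_score"], source_priority.get(i["source_id"], 99)))
    return kept[:limit]
```

Comparing every new title against all kept titles is quadratic, which is acceptable for the few dozen items a daily run produces.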

### Phase 6: Browser Scraping (MCP Chrome DevTools)

For pages requiring JS rendering, use a headless browser:

```yaml
Process:
  1. Call mcp__chrome-devtools__new_page to open page
  2. Call mcp__chrome-devtools__wait_for to wait for content load
  3. Call mcp__chrome-devtools__take_snapshot to get page structure
  4. Parse snapshot to extract required content
  5. Call mcp__chrome-devtools__close_page to close page

Applicable Scenarios:
  - ProductHunt (403 on WebFetch)
  - Latent Space (Substack JS rendering)
  - Other SPA applications
```

### Phase 7: Generate Report

```yaml
Output:
  - Directory: NewsReport/
  - Filename: YYYY-MM-DD-news-report.md
  - Format: Standard Markdown

Content Structure:
  - Title + Date
  - Statistical Summary (Source count, items collected)
  - 20 High-Quality Items (Template based)
  - Generation Info (Version, Timestamps)
```
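
To make the report structure concrete, here is a minimal rendering sketch in Python. It follows the field layout of the Output Template section, but the function names (`render_item`, `render_report`, `report_filename`) are hypothetical and the header is simplified (no generation time or source list).

```python
def render_item(index: int, item: dict) -> str:
    """Render one report entry in the template's bullet layout."""
    points = "\n".join(f"  {n}. {p}" for n, p in enumerate(item["key_points"], 1))
    keywords = " ".join(f"`{k}`" for k in item["keywords"])
    score = item["quality_score"]
    return (
        f"## {index}. {item['title']}\n\n"
        f"- **Summary**: {item['summary']}\n"
        f"- **Key Points**:\n{points}\n"
        f"- **Source**: [Link]({item['url']})\n"
        f"- **Keywords**: {keywords}\n"
        f"- **Score**: {'⭐' * score} ({score}/5)\n"
    )


def render_report(day: str, items: list, source_count: int, version: str = "v3.0") -> str:
    """Assemble header, entries separated by rules, and footer."""
    header = (
        f"# Daily News Report ({day})\n\n"
        f"> Curated from {source_count} sources today, "
        f"containing {len(items)} high-quality items\n"
        f"> Version: {version}\n"
    )
    body = "\n---\n\n".join(render_item(i, it) for i, it in enumerate(items, 1))
    footer = f"*Generated by Daily News Report {version}*\n"
    return f"{header}\n---\n\n{body}\n---\n\n{footer}"


def report_filename(day: str) -> str:
    """YYYY-MM-DD-news-report.md, as specified under Output."""
    return f"{day}-news-report.md"
```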

### Phase 8: Update Cache

```yaml
Update cache.json:
  - last_run: Record this run info
  - source_stats: Update stats per source
  - url_cache: Add processed URLs
  - content_hashes: Add content fingerprints
  - article_history: Record included articles
```
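
The cache update might look like the sketch below. Only the five top-level keys come from the list above; the shape of each value (lists of URLs and hashes, an `included` counter per source, SHA-256 over the summary as the content fingerprint) is an assumption made for illustration, as is the function name `update_cache`.

```python
import hashlib
from datetime import datetime, timezone


def update_cache(cache: dict, day: str, items: list) -> dict:
    """Record one finished run in the cache dictionary (mutates and returns it)."""
    cache["last_run"] = {
        "date": day,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "item_count": len(items),
    }
    stats = cache.setdefault("source_stats", {})
    urls = cache.setdefault("url_cache", [])
    hashes = cache.setdefault("content_hashes", [])
    history = cache.setdefault("article_history", [])
    for item in items:
        entry = stats.setdefault(item["source_id"], {"included": 0})
        entry["included"] += 1
        if item["url"] not in urls:
            urls.append(item["url"])
        # Assumed fingerprint: hash of the summary text.
        digest = hashlib.sha256(item["summary"].encode("utf-8")).hexdigest()
        if digest not in hashes:
            hashes.append(digest)
        history.append({"date": day, "title": item["title"], "url": item["url"]})
    return cache
```

The caller would then serialize the dictionary back to `cache.json`, for example with `json.dump`.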

## SubAgent Call Examples

### Using general-purpose Agent

Because custom agents are only discovered after a session restart, use the `general-purpose` agent and inject the worker prompt:

```
Task Call:
  subagent_type: general-purpose
  model: haiku
  prompt: |
    You are a stateless execution unit. Only do the assigned task and return structured JSON.

    Task: Scrape the following URLs and extract content

    URLs:
    - https://news.ycombinator.com (Extract Top 10)
    - https://huggingface.co/papers (Extract top voted papers)

    Output Format:
    {
      "status": "success" | "partial" | "failed",
      "data": [
        {
          "source_id": "hn",
          "title": "...",
          "summary": "...",
          "key_points": ["...", "...", "..."],
          "url": "...",
          "keywords": ["...", "..."],
          "quality_score": 4
        }
      ],
      "errors": [],
      "metadata": { "processed": 2, "failed": 0 }
    }

    Filter Criteria:
    - Keep: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
    - Exclude: General Science/Marketing Puff/Overly Academic/Job Posts

    Return JSON directly, no explanation.
```
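
Even with "Return JSON directly" in the prompt, a worker may wrap its answer in extra text, so the Main Agent benefits from a tolerant parser. The Python sketch below shows one hedged approach: `parse_worker_response` is a hypothetical helper, and normalizing every malformed reply to a `failed` envelope is an assumption rather than something the skill specifies.

```python
import json

VALID_STATUS = {"success", "partial", "failed"}


def _failed(reason: str) -> dict:
    """Build a failure envelope in the same shape as a normal reply."""
    return {"status": "failed", "data": [], "errors": [reason], "metadata": {}}


def parse_worker_response(text: str) -> dict:
    """Extract the outermost JSON object from a reply and validate its status."""
    try:
        start = text.index("{")
        end = text.rindex("}") + 1
        payload = json.loads(text[start:end])
    except ValueError:  # no braces found, or invalid JSON (JSONDecodeError)
        return _failed("unparseable response")
    if payload.get("status") not in VALID_STATUS:
        return _failed("missing or unknown status")
    # Fill optional fields so callers can rely on the full envelope.
    payload.setdefault("data", [])
    payload.setdefault("errors", [])
    payload.setdefault("metadata", {})
    return payload
```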

### Using worker Agent (Requires session restart)

```
Task Call:
  subagent_type: worker
  prompt: |
    task: fetch_and_extract
    input:
      urls:
        - https://news.ycombinator.com
        - https://huggingface.co/papers
    output_schema:
      - source_id: string
      - title: string
      - summary: string
      - key_points: string[]
      - url: string
      - keywords: string[]
      - quality_score: 1-5
    constraints:
      filter: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
      exclude: General Science/Marketing Puff/Overly Academic/Job Posts
```

## Output Template

```markdown
# Daily News Report (YYYY-MM-DD)

> Curated from N sources today, containing 20 high-quality items
> Generation Time: X min | Version: v3.0
>
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.

---

## 1. Title

- **Summary**: 2-4 sentence overview
- **Key Points**:
  1. Point one
  2. Point two
  3. Point three
- **Source**: [Link](URL)
- **Keywords**: `keyword1` `keyword2` `keyword3`
- **Score**: ⭐⭐⭐⭐⭐ (5/5)

---

## 2. Title
...

---

*Generated by Daily News Report v3.0*
*Sources: HN, HuggingFace, OneUsefulThing, ...*
```

## Constraints & Principles

1.  **Quality over Quantity**: Low-quality content does not enter the report.
2.  **Early Stop**: Stop scraping once 20 high-quality items are reached.
3.  **Parallel First**: SubAgents in the same batch execute in parallel.
4.  **Fault Tolerance**: Failure of a single source does not affect the whole process.
5.  **Cache Reuse**: Avoid re-scraping the same content.
6.  **Main Agent Control**: All decisions are made by the Main Agent.
7.  **Fallback Awareness**: Detect sub-agent availability, gracefully degrade if unavailable.

## Expected Performance

| Scenario | Expected Time | Note |
|---|---|---|
| Optimal | ~2 mins | Tier1 sufficient, no browser needed |
| Normal | ~3-4 mins | Requires Tier2 supplement |
| Browser Needed | ~5-6 mins | Includes JS rendered pages |

## Error Handling

| Error Type | Handling |
|---|---|
| SubAgent Timeout | Log error, continue to next |
| Source 403/404 | Mark disabled, update sources.json |
| Extraction Failed | Return raw content, Main Agent decides |
| Browser Crash | Skip source, log entry |
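
The table above can be read as a simple dispatch on error type. The following Python sketch is illustrative only: the error-type strings, the `enabled` flag on a source record, and the returned action names are all hypothetical, and persisting the change to `sources.json` is left to the caller.

```python
def handle_error(error_type: str, source: dict, log: list) -> str:
    """Apply the error-handling table to one source; return the follow-up action."""
    name = source["id"]
    if error_type == "subagent_timeout":
        log.append(f"{name}: SubAgent timeout")
        return "continue"                  # log error, continue to next
    if error_type in ("http_403", "http_404"):
        source["enabled"] = False          # caller writes this back to sources.json
        log.append(f"{name}: disabled after {error_type}")
        return "skip"
    if error_type == "extraction_failed":
        log.append(f"{name}: extraction failed, raw content returned")
        return "main_agent_decides"
    if error_type == "browser_crash":
        log.append(f"{name}: browser crash")
        return "skip"
    raise ValueError(f"unknown error type: {error_type}")
```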

## Compatibility & Fallback

To ensure usability across different Agent environments, the following checks must be performed:

1.  **Environment Check**:
    -   During Phase 1 initialization, attempt to detect whether the `worker` sub-agent exists.
    -   If it does not exist (or the plugin is not installed), automatically switch to **Serial Execution Mode**.

2.  **Serial Execution Mode**:
    -   Do not use parallel dispatch blocks.
    -   Main Agent executes scraping tasks for each source sequentially.
    -   Slower, but guarantees basic functionality.

3.  **User Alert**:
    -   MUST include a clear warning in the generated report header indicating the current degraded mode.
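
The fallback logic could be sketched in Python as below. How the availability of the `worker` sub-agent is detected depends on the host environment, so the sketch simply takes it as a boolean; `fetch_all` and `fetch_batch` are hypothetical names, and the warning text is copied from the Output Template.

```python
from concurrent.futures import ThreadPoolExecutor

DEGRADED_WARNING = (
    "> **Warning**: Sub-agent 'worker' not detected. Running in generic mode "
    "(Serial Execution). Performance might be degraded."
)


def fetch_all(batches: list, fetch_batch, worker_available: bool):
    """Return (items, header_lines); header_lines carries the degraded-mode alert."""
    if worker_available:
        # Parallel mode: one task per batch, results kept in batch order.
        with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
            results = list(pool.map(fetch_batch, batches))
        header_lines = []
    else:
        # Serial Execution Mode: slower, but needs no sub-agent support.
        results = [fetch_batch(batch) for batch in batches]
        header_lines = [DEGRADED_WARNING]
    items = [item for batch_items in results for item in batch_items]
    return items, header_lines
```

Returning the warning alongside the items makes it hard for the report generator to forget the mandatory alert.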

Related Skills

zaker-news-search

242

from aiskillstore/marketplace

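Keyword-based news search over ZAKER's authoritative news database, with support for a specified time range (within 30 days). Use when the user asks about searching news, news on a particular event, person, or keyword, looking up news, news retrieval, related news, or news from a given time period.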

zaker-hot-news

242

from aiskillstore/marketplace

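Fetches the latest headlines and trending stories that ZAKER aggregates from authoritative media. Use when the user asks about news, headlines, the latest or today's news, trending or breaking news, major domestic and international events, what has happened recently, or what's new.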

zaker-category-news

242

from aiskillstore/marketplace

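Fetches ZAKER's trending news by industry category (entertainment, technology, finance, and so on). Use when the user asks about tech news, finance news, sports news, entertainment news, industry news, internet industry updates, automotive news, domestic news, international news, or military news.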

daily-ai-news

242

from aiskillstore/marketplace

Aggregates and summarizes the latest AI news from multiple sources including AI news websites and web search. Provides concise news briefs with direct links to original articles. Activates when user asks for 'today's AI news', 'AI updates', 'latest AI developments', or mentions wanting a 'daily AI briefing'.

weekly-report

242

from aiskillstore/marketplace

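Helps the user put together a weekly report that presents the value and scope of their work in a complete, logical structure. Triggers when the user says "写周报" (write a weekly report), "周报" (weekly report), "梳理周报" (organize the weekly report), or "整理工作" (organize work).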

daily-meeting-update

242

from aiskillstore/marketplace

Interactive daily standup/meeting update generator. Use when user says 'daily', 'standup', 'scrum update', 'status update', 'what did I do yesterday', 'prepare for meeting', 'morning update', or 'team sync'. Pulls activity from GitHub, Jira, and Claude Code session history. Conducts 4-question interview (yesterday, today, blockers, discussion topics) and generates formatted Markdown update.

lark-workflow-standup-report

242

from aiskillstore/marketplace

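Schedule and to-do summary: orchestrates calendar +agenda and task +get-my-tasks to generate a summary of the schedule and unfinished tasks for a specified date. Useful for reviewing plans for today, tomorrow, or this week.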

newsletter-curation

242

from aiskillstore/marketplace

Newsletter curation with content sourcing, editorial structure, and subscriber growth strategies. Covers issue formatting, link roundups, commentary style, and sending cadence. Use for: email newsletters, link roundups, weekly digests, curated content, creator newsletters. Triggers: newsletter, email newsletter, newsletter curation, weekly digest, link roundup, curated newsletter, newsletter writing, newsletter format, subscriber growth, newsletter strategy, content curation, newsletter template

reporting-sprints

242

from aiskillstore/marketplace

Use this skill when you need to report on a sprint

reporting-issues

242

from aiskillstore/marketplace

Use this skill when you need to report on a troubleshooting session

ghe-report

242

from aiskillstore/marketplace

Generate detailed workflow reports with metrics, health assessments, and epic-specific analysis for GitHub Elements. Covers throughput, cycle times, compliance status, and thread history.

report-writing

242

from aiskillstore/marketplace

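Writes a detailed report document after a task is completed. Use when documenting change history, impact analysis, and verification results. File naming convention: YYYY-MM-DD-<title>-report.md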