reddit-scraper

Scrape and search Reddit posts using Apify. Use when you need to find Reddit discussions, track competitor mentions, monitor product feedback, discover pain points, or analyze subreddit content. Supports keyword filtering, time-based searches, and subreddit-specific queries.

381 stars

by gooseworks-ai


Best use case

reddit-scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using reddit-scraper can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/reddit-scraper/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/capabilities/reddit-scraper/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/reddit-scraper/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How reddit-scraper Compares

Feature / Agent	reddit-scraper	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

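What does this skill do?

It scrapes and searches Reddit posts through Apify, with keyword filtering, time-based searches, and subreddit-specific queries.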

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Reddit Scraper

Scrape Reddit posts and comments using the Apify `parseforge/reddit-posts-scraper` actor.

## Quick Start

Requires `APIFY_API_TOKEN` env var (or `--token` flag). Install dependency: `pip install requests`.

```bash
# Top posts from r/growthhacking in last week
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit growthhacking --days 7 --sort top --time week

# Hot posts from multiple subreddits
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit "growthhacking,gtmengineering" --days 7 --sort hot

# Keyword-filtered competitor tracking
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit LLMDevs \
  --keywords "Langfuse,Arize,Langsmith" \
  --days 30

# Human-readable summary table
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit growthhacking --days 7 --output summary
```

## How the Script Works

1. Builds full Reddit URLs for each subreddit (e.g. `https://www.reddit.com/r/growthhacking/top/?t=week`)
2. Calls the Apify `parseforge/reddit-posts-scraper` actor via REST API
3. Polls until the run completes, then fetches the dataset
4. Applies client-side keyword and date filtering
5. Sorts by score (descending) and outputs JSON or summary
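
The local steps above (1, 4, and 5) can be sketched in a few pure functions. This is a minimal illustration, not the code in `search_reddit.py`: the function names are hypothetical, and the case-insensitive substring match over `title` and `selfText` is an assumption about how the keyword filter behaves.

```python
from datetime import datetime, timedelta, timezone


def build_subreddit_url(subreddit, sort="top", time_window="week"):
    """Step 1: build a full listing URL. The time window is only added for `top`."""
    url = f"https://www.reddit.com/r/{subreddit}/{sort}/"
    if sort == "top":
        url += f"?t={time_window}"
    return url


def parse_created_at(value):
    """Parse an ISO 8601 timestamp such as 2026-02-18T12:00:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_posts(posts, keywords=(), days=30, now=None):
    """Step 4: keep posts from the last `days` days that match any keyword (OR logic)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    lowered = [k.lower() for k in keywords]
    kept = []
    for post in posts:
        if parse_created_at(post["createdAt"]) < cutoff:
            continue  # too old
        # Assumption: keywords are matched against title and body text.
        text = f"{post.get('title', '')} {post.get('selfText', '')}".lower()
        if lowered and not any(k in text for k in lowered):
            continue  # no keyword matched
        kept.append(post)
    return kept


def sort_by_score(posts):
    """Step 5: highest score first."""
    return sorted(posts, key=lambda p: p.get("score", 0), reverse=True)
```

Passing `now` explicitly keeps the date filter deterministic, which makes the filter easy to test without depending on the wall clock.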

## CLI Reference

| Flag | Default | Description |
|------|---------|-------------|
| `--subreddit` | *required* | Subreddit name(s), comma-separated |
| `--keywords` | none | Keywords to filter (comma-separated, OR logic) |
| `--days` | 30 | Only include posts from the last N days |
| `--max-posts` | 50 | Max posts to scrape per subreddit |
| `--sort` | top | Sort: `hot`, `top`, `new`, `rising` |
| `--time` | week | Time window for `top` sort: `hour`, `day`, `week`, `month`, `year`, `all` |
| `--output` | json | Output format: `json` or `summary` |
| `--token` | env var | Apify token (prefer `APIFY_API_TOKEN` env var) |
| `--timeout` | 300 | Max seconds to wait for the Apify run |
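
For readers who want to see how the table maps onto code, here is a hedged `argparse` sketch that reproduces the flags and defaults listed above. It is an assumption about how such a parser could look, not a copy of the script's own parser; `build_parser` and `split_csv` are hypothetical names.

```python
import argparse
import os


def build_parser():
    """Build a parser whose flags and defaults mirror the CLI reference table."""
    parser = argparse.ArgumentParser(description="Search Reddit via Apify (sketch)")
    parser.add_argument("--subreddit", required=True,
                        help="Subreddit name(s), comma-separated")
    parser.add_argument("--keywords", default=None,
                        help="Keywords to filter (comma-separated, OR logic)")
    parser.add_argument("--days", type=int, default=30,
                        help="Only include posts from the last N days")
    parser.add_argument("--max-posts", type=int, default=50,
                        help="Max posts to scrape per subreddit")
    parser.add_argument("--sort", default="top",
                        choices=["hot", "top", "new", "rising"])
    parser.add_argument("--time", default="week",
                        choices=["hour", "day", "week", "month", "year", "all"],
                        help="Time window for the top sort")
    parser.add_argument("--output", default="json", choices=["json", "summary"])
    # Falls back to the environment variable when --token is not given.
    parser.add_argument("--token", default=os.environ.get("APIFY_API_TOKEN"))
    parser.add_argument("--timeout", type=int, default=300,
                        help="Max seconds to wait for the Apify run")
    return parser


def split_csv(value):
    """Split a comma-separated flag value into a clean list."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]
```

A call such as `build_parser().parse_args(["--subreddit", "growthhacking,gtmengineering"])` yields the defaults from the table for every other flag.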

## Tips for Small Subreddits

Small or low-traffic subreddits (e.g. `r/gtmengineering`) may return zero posts with `--sort hot` because the hot feed is nearly empty. Use `--sort top --time week` (or `month`) instead — this scrapes the top-ranked posts over the time window and reliably returns results.
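
If you wrap the scraper in your own code, the tip above could be automated with a small fallback. This is only a sketch: `scrape` is a hypothetical callable taking `(subreddit, sort, time_window)` and returning a list of posts, and the script itself is not known to retry this way.

```python
def scrape_with_fallback(scrape, subreddit, sort="hot", fallback_time="week"):
    """Try the requested sort first; if it returns nothing, retry with top posts."""
    posts = scrape(subreddit, sort, None)
    if posts or sort == "top":
        return posts
    # An empty hot/new/rising feed is common on small subreddits.
    return scrape(subreddit, "top", fallback_time)
```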

## Direct API Usage

If calling the Apify API directly (e.g. via curl), note these **required fields**:

```json
{
  "startUrls": [{"url": "https://www.reddit.com/r/growthhacking/top/?t=week"}],
  "maxPostCount": 50,
  "scrollTimeout": 40,
  "searchType": "posts",
  "proxyConfiguration": {"useApifyProxy": true}
}
```

Key differences from other Apify actors:
- Uses `startUrls` with **full Reddit URLs** (not a `searches` array)
- `proxyConfiguration` is **required** — omitting it causes an error
- Sort/time are controlled via the **URL path** (e.g. `/top/?t=week`), not separate input fields
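
The input body and the poll-until-done loop can be sketched as follows. The helper names are hypothetical, and the status strings are assumed to follow Apify's run status names; check both against the Apify API documentation before relying on them. The HTTP call is deliberately left out: `get_status` is any callable you supply that returns the run's current status.

```python
import time

# Assumption: statuses after which an Apify run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def build_actor_input(urls, max_posts=50):
    """Build the actor input with the fields shown in the JSON example above."""
    return {
        "startUrls": [{"url": url} for url in urls],
        "maxPostCount": max_posts,
        "scrollTimeout": 40,
        "searchType": "posts",
        "proxyConfiguration": {"useApifyProxy": True},  # required by this actor
    }


def wait_for_run(get_status, timeout=300, interval=5,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the run reaches a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout} seconds")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use the defaults are enough.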

## Common Workflows

### 1. Competitor Tracking

```bash
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit "LLMDevs,MachineLearning,LocalLLaMA" \
  --keywords "Langfuse,Arize,Weights & Biases,Langsmith,Braintrust" \
  --days 30 --sort top --time month
```

### 2. Pain Point Discovery

```bash
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit LLMDevs \
  --keywords "frustrating,difficult,hard to,wish there was,better way" \
  --days 30
```

### 3. Brand Monitoring

```bash
python3 skills/reddit-scraper/scripts/search_reddit.py \
  --subreddit "LLMDevs,MachineLearning" \
  --keywords "YourProductName" \
  --days 7 --sort new
```

## Important: Always Include Post URLs

When presenting Reddit results to the user, **always include the original post URL** for every post. This is critical for allowing users to read the full discussion, comments, and context. Never return a summary table without links.

## Output Format

Posts are returned as a JSON array sorted by score (descending). Each post has the following fields:

```json
{
  "id": "abc123",
  "title": "Post title",
  "author": "username",
  "subreddit": "growthhacking",
  "score": 42,
  "numComments": 15,
  "createdAt": "2026-02-18T12:00:00.000Z",
  "selfText": "Post body...",
  "url": "https://reddit.com/r/..."
}
```
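
Given posts in this shape, a results table that always carries the post URL might be rendered like this. It is a minimal sketch with a hypothetical `render_summary` helper; it is not the script's own `--output summary` format, which is not shown here.

```python
def render_summary(posts):
    """Render posts as a markdown table that always includes each post's URL."""
    lines = [
        "| Score | Comments | Subreddit | Title | URL |",
        "|-------|----------|-----------|-------|-----|",
    ]
    for post in posts:
        # Escape pipes so a title cannot break the table layout.
        title = post.get("title", "").replace("|", "\\|")
        lines.append(
            f"| {post.get('score', 0)} | {post.get('numComments', 0)} "
            f"| r/{post.get('subreddit', '')} | {title} | {post.get('url', '')} |"
        )
    return "\n".join(lines)
```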

## Configuration

See `references/apify-config.md` for detailed API configuration, token setup, and rate limits.

Related Skills

All of the following are from gooseworks-ai/goose-skills (381 stars).

web-archive-scraper
Search the Wayback Machine for archived versions of websites. Extract cached pages, customer lists, testimonials, and partner directories from sites that have changed or gone offline. Uses the free CDX API; no API key needed.

twitter-scraper
Search and scrape Twitter/X posts using Apify. Use when you need to find tweets, track brand mentions, monitor competitors on Twitter, or analyze Twitter discussions. Uses Twitter native search syntax (since:/until:) for reliable date filtering.

review-scraper
Scrape product reviews from G2, Capterra, and Trustpilot using Apify. Single script with platform dispatch. Use when you need to monitor competitor reviews, track product sentiment, or gather customer feedback from review sites.

product-hunt-scraper
Scrape Product Hunt trending products using Apify. Use when you need to discover new product launches, track competitors on Product Hunt, or monitor the startup ecosystem for relevant launches.

meta-ad-scraper
Scrape competitor ads from Meta's Ad Library (Facebook, Instagram, Messenger, Threads, WhatsApp). Search by company name, Facebook Page URL, or keyword. Returns ad creatives, spend estimates, reach, impressions, and campaign details. Use for competitive ad research, messaging analysis, and creative inspiration.

linkedin-profile-post-scraper
Scrape recent posts from LinkedIn profiles using Apify. Use when you need to monitor what specific people are posting on LinkedIn, track founder/exec activity, or gather LinkedIn content for competitive intelligence.

linkedin-job-scraper
Scrapes LinkedIn job postings using the JobSpy library (python-jobspy). Use this skill whenever the user wants to find jobs on LinkedIn, search for open roles, pull job listings, build a job pipeline, source job targets for GTM research, or monitor hiring signals. Even if the user just says "find me some jobs" or "what roles is [company] hiring for", use this skill. It runs a local Python script that outputs a CSV of job postings with title, company, location, salary, job type, description, and direct URLs.

hacker-news-scraper
Search Hacker News stories and comments using the free Algolia API. No Apify token needed. Use when you need to find HN discussions, track mentions, discover Show HN launches, or monitor tech community sentiment.

google-ad-scraper
Scrape competitor ads from Google's Ads Transparency Center (Search, YouTube, Display, Gmail). Search by company name, domain, or advertiser ID. Returns ad creatives, formats, targeting regions, and campaign details. Use for competitive ad research and messaging analysis.

conference-speaker-scraper
Extract speaker names, titles, companies, and bios from conference websites. Supports direct HTML scraping and Apify web scraper fallback for JS-heavy sites. Use for pre-event research and outreach targeting.

blog-scraper
Scrape blog posts via RSS feeds (free, no API key) with Apify fallback for JS-heavy sites. Use when you need to monitor competitor blogs, track industry content, or aggregate blog posts by keyword.

signal-detection-pipeline
Detect buying signals from multiple sources, qualify leads, and generate outreach context.