skroller

Automated social media content collection and analysis across platforms (Twitter/X, Instagram, TikTok, Reddit, LinkedIn, YouTube, Product Hunt, Medium, GitHub, Pinterest). Use when you need to: (1) scrape public posts programmatically, (2) analyze content by keywords or filters, (3) monitor brand mentions or trends for research, (4) curate content for personal analysis, (5) archive publicly available information, or (6) generate digests from scraped feeds. Always comply with platform ToS and applicable privacy laws.

3,891 stars
Complexity: medium

About this skill

Skroller is an AI agent skill for comprehensive social media content collection and analysis. It lets users programmatically gather publicly available posts from a wide array of platforms, including Twitter/X, Instagram, TikTok, Reddit, LinkedIn, YouTube, Product Hunt, Medium, GitHub, and Pinterest, streamlining data acquisition that would otherwise require manual effort or custom scripting.

The skill provides robust data extraction, pulling essential information such as post text, timestamps, engagement metrics, and author details. It features intelligent filtering by keywords, hashtags, date ranges, and engagement thresholds, along with deduplication to keep collected content unique across sessions. Results can be exported in formats like JSON, CSV, or Markdown, or sent directly to note-taking applications.

Skroller is ideal for market research, trend monitoring, brand mention tracking, and content curation. It also integrates critical compliance safeguards, guiding users to respect platform Terms of Service, `robots.txt` rules, rate limits, and privacy regulations like GDPR and CCPA, making it a powerful yet responsible tool for social intelligence work.

Best use case

The primary use case for Skroller is to automate the collection and analysis of public social media content for research, trend monitoring, and brand tracking. It benefits market researchers, social media strategists, brand managers, and content curators who need to efficiently gather and analyze large volumes of public data to inform their strategies, monitor online sentiment, or discover emerging topics without manual scraping.

The result is a structured collection of public social media posts, filtered and analyzed according to your criteria, ready for further insight or reporting in the chosen export format.

Practical example

Example input

Scrape the latest 100 public posts mentioning 'AI ethics' on Twitter and Reddit, filter for posts with more than 10 likes, and export them as a CSV.

Example output

platform,author,timestamp,text,engagement,keywords
Twitter,user_a,2023-10-26 10:05:32,"AI ethics is paramount for future tech.",15,AI ethics
Reddit,user_b,2023-10-26 09:45:11,"Discussing the latest in responsible AI development.",22,AI ethics
Twitter,user_c,2023-10-26 11:12:01,"New paper on bias in algorithms and AI ethics.",12,AI ethics

When to use this skill

  • When you need to programmatically scrape public social media posts from multiple platforms.
  • To analyze social media content using keywords, filters, or date ranges for specific insights.
  • For monitoring brand mentions, industry trends, or competitive intelligence across social media.
  • To curate and archive publicly available information for personal or research analysis.

When not to use this skill

  • To scrape private or authenticated content that is not publicly available.
  • For purposes of spamming, harassment, data resale, or platform manipulation.
  • When you cannot comply with platform Terms of Service, `robots.txt` directives, or privacy laws.
  • To bypass authentication mechanisms or exceed platform-specific rate limits.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/skroller/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/10oss/skroller/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/skroller/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How skroller Compares

| Feature | skroller | Standard Approach |
|---------|----------|-------------------|
| Platform Support | 10 platforms | Limited / varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |

Frequently Asked Questions

What does this skill do?

Skroller automates the collection and analysis of public social media content across ten platforms (Twitter/X, Instagram, TikTok, Reddit, LinkedIn, YouTube, Product Hunt, Medium, GitHub, and Pinterest). Use it to scrape public posts programmatically, analyze content by keywords or filters, monitor brand mentions or trends for research, curate content for personal analysis, archive publicly available information, or generate digests from scraped feeds, always in compliance with platform ToS and applicable privacy laws.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Skroller - Social Media Content Collection

Automate the collection and analysis of publicly available social media content. This skill handles content collection, filtering, and export with compliance safeguards.

## ⚖️ Legal & Compliance Requirements

**Before using this skill:**
1. **Review Platform ToS** - Each platform has different rules about automated access
2. **Check robots.txt** - Respect disallowed paths
3. **Rate Limiting** - Stay within platform rate limits to avoid service disruption
4. **Privacy Laws** - GDPR, CCPA, and other regulations apply to personal data
5. **Permitted Use** - Research, personal analysis, competitive intelligence (where allowed)
6. **Prohibited Use** - Spam, harassment, data resale, manipulation, bypassing auth

**Data Protection:**
- Anonymize personal data when storing (see the sketch after this list)
- Honor deletion requests (GDPR Art. 17)
- Limit retention to necessary periods
- Document lawful basis for processing
- Do not scrape sensitive personal data
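
As one way to satisfy the anonymization point above, here is a minimal sketch using Node's built-in `crypto` module to pseudonymize author handles before storage. The field name (`author`) and the environment-provided salt are illustrative assumptions, not part of skroller itself.

```js
// Pseudonymize author handles before persisting scraped posts.
// NOTE: illustrative sketch — the "author" field and SKROLLER_SALT
// env var are assumptions, not skroller's actual implementation.
const crypto = require("crypto");

function anonymizeAuthors(posts, salt = process.env.SKROLLER_SALT || "") {
  return posts.map((post) => ({
    ...post,
    // Replace the handle with a salted hash so the same author can still
    // be correlated across posts without storing the real identity.
    author: crypto
      .createHash("sha256")
      .update(salt + post.author)
      .digest("hex")
      .slice(0, 16),
  }));
}
```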

## Core Capabilities

- **Content collection** - Gather publicly available posts from platforms
- **Data extraction** - Pull text, timestamps, engagement metrics, author info
- **Smart filtering** - Filter by keywords, hashtags, date ranges, engagement thresholds
- **Deduplication** - Track seen posts to avoid duplicates across sessions (see the sketch after this list)
- **Export formats** - JSON, CSV, Markdown, or direct to note apps (Bear/Obsidian)
- **Rate limiting** - Respect platform constraints and avoid service disruption
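
A minimal sketch of how cross-session deduplication could work, assuming a local JSON file of previously seen post IDs; the file name and shape are illustrative, not skroller's actual persistence format.

```js
// Cross-session deduplication sketch. The seen-ID store
// (.skroller-seen.json) is an assumed format, not skroller's own.
const fs = require("fs");

const SEEN_FILE = ".skroller-seen.json";

function loadSeen() {
  try {
    return new Set(JSON.parse(fs.readFileSync(SEEN_FILE, "utf8")));
  } catch {
    return new Set(); // first run: nothing seen yet
  }
}

function dedupe(posts) {
  const seen = loadSeen();
  const fresh = posts.filter((p) => !seen.has(p.id));
  fresh.forEach((p) => seen.add(p.id));
  fs.writeFileSync(SEEN_FILE, JSON.stringify([...seen]));
  return fresh;
}
```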

## Platform Support

| Platform | Approach | Notes |
|----------|----------|-------|
| Twitter/X | Browser automation | Use Playwright; handle login if needed |
| Instagram | Browser automation | Rate-limited; use sparingly |
| TikTok | Browser automation | Heavy JS; may need longer waits |
| Reddit | API + browser | Prefer API where possible |
| LinkedIn | Browser automation | Login required for most content |
| YouTube | API + browser | Comments via browser, videos via API |
| Product Hunt | Browser automation | Product discovery, launches |
| Medium | Browser automation | Articles, blog posts |
| GitHub | Browser automation | Issues, discussions, repos |
| Pinterest | Browser automation | Visual content, pins |
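
Since most platforms in the table rely on browser automation, here is a bare-bones sketch of the Playwright scroll-and-extract loop. The `article` selector and link extraction are placeholder assumptions; real per-platform selectors live in `assets/selector-cheatsheet.md`.

```js
// Scroll-and-extract sketch with Playwright. The "article" selector is
// a placeholder assumption — real selectors vary by platform.
const { chromium } = require("playwright");

async function scrollAndExtract(url, limit = 50, delayMs = 1500) {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "domcontentloaded" });

  const posts = new Map(); // keyed by post URL to drop in-run duplicates
  for (let i = 0; i < 40 && posts.size < limit; i++) {
    const batch = await page.$$eval("article", (nodes) =>
      nodes.map((n) => ({
        text: n.innerText,
        url: n.querySelector("a")?.href ?? null,
      }))
    );
    for (const p of batch) if (p.url) posts.set(p.url, p);
    await page.mouse.wheel(0, 2000); // trigger the next lazy-loaded batch
    await page.waitForTimeout(delayMs); // throttle to respect rate limits
  }
  await browser.close();
  return [...posts.values()].slice(0, limit);
}
```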

## Quick Start

### Basic Scroll and Extract

```bash
# Scroll Twitter feed, extract 50 posts about "AI"
node scripts/skroller.js --platform twitter --query "AI" --limit 50 --output posts.json
```

### Monitor Brand Mentions

```bash
# Monitor Reddit for brand mentions, export to Markdown
node scripts/skroller.js --platform reddit --query "mybrand" --format markdown --output mentions.md
```

### Competitive Research

```bash
# Scroll competitor Instagram, capture top posts by engagement
node scripts/skroller.js --platform instagram --profile @competitor --min-likes 1000 --output competitor-posts.json
```

## Scripts

### `scripts/skroller.js` - Main scrolling engine

JavaScript script using Playwright for browser automation.

```bash
# Using npm scripts (recommended)
npm run scroll -- --platform twitter --query "AI" --limit 50 --output posts.json

# Direct execution
node scripts/skroller.js --platform twitter --query "AI" --limit 50 --output posts.json
```

**Options:**
- `--platform` - Target platform (required): twitter, reddit, instagram, tiktok, linkedin, youtube, producthunt, medium, github, pinterest
- `--query` - Search keyword/hashtag
- `--profile` - Specific profile to scroll
- `--limit` - Max posts to scrape (default: 50)
- `--min-likes` - Filter by minimum engagement
- `--format` - Output format: json, csv, markdown (default: json)
- `--output` - Output file path
- `--screenshot` - Capture screenshots for debugging
- `--dedupe` - Skip previously seen posts

### `scripts/feed-digest.js` - Generate digests

Creates summary digests from exported post data.

```bash
npm run digest -- --input posts.json --output digest.md
# or: node scripts/feed-digest.js --input posts.json --output digest.md
```

### `scripts/export-to-notes.js` - Unified note app exporter

Exports scraped posts to multiple note applications with a single command.

**Supported apps:** Bear, Obsidian, Notion, Apple Notes, Evernote, OneNote, Google Keep, Roam Research, Logseq, Anytype

```bash
# Using npm scripts (recommended)
npm run export -- --input posts.json --app obsidian --vault ~/Documents/Obsidian

# Direct execution
node scripts/export-to-notes.js --input posts.json --app bear --tags "ai,research"
node scripts/export-to-notes.js --input posts.json --app notion --api-key $NOTION_TOKEN
node scripts/export-to-notes.js --input posts.json --app apple-notes
node scripts/export-to-notes.js --input posts.json --app evernote --output export.enex
node scripts/export-to-notes.js --input posts.json --app one-note --access-token $MS_TOKEN
node scripts/export-to-notes.js --input posts.json --app keep --output keep.html
node scripts/export-to-notes.js --input posts.json --app roam --output roam.md
node scripts/export-to-notes.js --input posts.json --app logseq --vault ~/Documents/Logseq
node scripts/export-to-notes.js --input posts.json --app anytype --output anytype.md
node scripts/export-to-notes.js --input posts.json --app obsidian --dry-run
```

**Configuration:** Set defaults in `.skroller-config.json`:
```json
{
  "export": {
    "defaultApp": "obsidian",
    "vault": "~/Documents/Obsidian",
    "folder": "Skroller",
    "notionDatabaseId": "<db-id>"
  }
}
```

**Requirements by app:**
- **Bear:** grizzly CLI (`go install github.com/tylerwince/grizzly/cmd/grizzly@latest`)
- **Obsidian:** Vault path
- **Notion:** API key (create at notion.so)
- **Apple Notes:** macOS with Notes app
- **Evernote:** Manual ENEX import
- **OneNote:** Microsoft Graph access token
- **Google Keep:** Manual HTML import
- **Roam Research:** Markdown import (drag the Markdown file into Roam)
- **Logseq:** Vault path (writes to pages/ folder)
- **Anytype:** Markdown import (use Anytype Import feature)
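
For the file-based targets (Obsidian, Logseq), the export amounts to writing one Markdown note per post into the vault. A sketch under that assumption; the front-matter fields and filename scheme are illustrative, not the exporter's actual format.

```js
// File-based export sketch (Obsidian/Logseq style): one Markdown note
// per post. Front-matter fields and filenames are assumptions.
const fs = require("fs");
const path = require("path");

function exportToVault(posts, vault, folder = "Skroller") {
  const dir = path.join(vault, folder);
  fs.mkdirSync(dir, { recursive: true });
  for (const post of posts) {
    const body = [
      "---",
      `platform: ${post.platform}`,
      `author: ${post.author}`,
      `date: ${post.timestamp}`,
      "---",
      "",
      post.text,
      "",
      post.url ? `[View post](${post.url})` : "",
    ].join("\n");
    // Post IDs are assumed unique per platform, so this filename is stable.
    fs.writeFileSync(path.join(dir, `${post.platform}-${post.id}.md`), body);
  }
}
```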

## Configuration

### `.skroller-config.json`

Store default settings:

```json
{
  "defaultLimit": 50,
  "scrollDelayMs": 1500,
  "userAgent": "Mozilla/5.0 ...",
  "platforms": {
    "twitter": {
      "loginRequired": false,
      "rateLimit": "100 requests/hour"
    },
    "instagram": {
      "loginRequired": true,
      "rateLimit": "50 requests/hour"
    }
  },
  "export": {
    "defaultFormat": "json",
    "includeImages": true,
    "includeMetrics": true
  }
}
```
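
CLI flags presumably override these stored defaults; a sketch of that precedence, assuming the config file sits in the working directory:

```js
// Config-precedence sketch: CLI flags win over .skroller-config.json,
// which wins over built-in defaults. The load location is an assumption.
const fs = require("fs");

function loadConfig(cliArgs = {}) {
  const builtIn = { defaultLimit: 50, scrollDelayMs: 1500 };
  let file = {};
  try {
    file = JSON.parse(fs.readFileSync(".skroller-config.json", "utf8"));
  } catch {
    // no config file: fall back to built-in defaults
  }
  return { ...builtIn, ...file, ...cliArgs };
}
```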

### Authentication

Some platforms require login. Store credentials securely:

```bash
# For platforms requiring auth, set environment variables
export SKROLLR_TWITTER_COOKIE="<auth cookie>"
export SKROLLR_INSTAGRAM_USER="<username>"
export SKROLLR_INSTAGRAM_PASS="<password>"
```

**Security note:** Never commit auth files. Use `.env` with `.gitignore`.
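
A sketch of how an auth cookie from the environment might be attached to a Playwright browser context; the cookie name (`auth_token`) and domain are assumptions for illustration, not skroller's actual auth flow.

```js
// Attach a session cookie from the environment to a Playwright context.
// Cookie name and domain below are illustrative assumptions.
const { chromium } = require("playwright");

async function authedContext() {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  if (process.env.SKROLLR_TWITTER_COOKIE) {
    await context.addCookies([
      {
        name: "auth_token", // assumed cookie name
        value: process.env.SKROLLR_TWITTER_COOKIE,
        domain: ".x.com", // assumed domain
        path: "/",
      },
    ]);
  }
  return context;
}
```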

## Output Structure

### JSON Output (default)

```json
{
  "platform": "twitter",
  "query": "AI",
  "scrapedAt": "2026-03-14T20:30:00Z",
  "posts": [
    {
      "id": "1234567890",
      "author": "@username",
      "text": "Post content here...",
      "timestamp": "2026-03-14T18:00:00Z",
      "likes": 150,
      "retweets": 42,
      "replies": 12,
      "url": "https://twitter.com/...",
      "media": ["image1.jpg"],
      "hashtags": ["#AI", "#ML"]
    }
  ]
}
```

### Markdown Output

```markdown
## Twitter Posts: "AI" (2026-03-14)

### @username - 150 likes
Post content here...

[View post](https://twitter.com/...)

---
```
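
The Markdown output is a straightforward rendering of the JSON structure above; a minimal sketch of the mapping:

```js
// Render the JSON output structure into the Markdown layout shown above.
// Assumes posts carry the fields from the JSON example (author, likes,
// text, url) and that scrapedAt is an ISO timestamp.
function toMarkdown(result) {
  const date = result.scrapedAt.slice(0, 10); // YYYY-MM-DD
  const lines = [`## ${result.platform} Posts: "${result.query}" (${date})`, ""];
  for (const post of result.posts) {
    lines.push(
      `### ${post.author} - ${post.likes} likes`,
      post.text,
      "",
      `[View post](${post.url})`,
      "",
      "---",
      ""
    );
  }
  return lines.join("\n");
}
```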

## Filtering Strategies

### Keyword Matching
- Exact match: `"exact phrase"`
- Boolean: `AI AND (startup OR venture)`
- Exclude: `AI -crypto`

### Engagement Thresholds
- Filter low-quality: `--min-likes 100`
- Viral content: `--min-shares 500`

### Time Windows
- Recent only: `--date-from 2026-03-14`
- Historical: `--date-from 2026-01-01 --date-to 2026-01-31`
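
A sketch of how these three filter types might compose over posts shaped like the JSON output above. The boolean-query handling is deliberately simplified to space-separated include terms and `-term` excludes; full `AND`/`OR` parsing is out of scope here.

```js
// Compose keyword, engagement, and date filters. Query parsing is
// simplified (space-separated terms, "-term" excludes) — an assumption.
function filterPosts(posts, { query = "", minLikes = 0, from, to } = {}) {
  const terms = query.split(/\s+/).filter(Boolean);
  const include = terms.filter((t) => !t.startsWith("-"));
  const exclude = terms.filter((t) => t.startsWith("-")).map((t) => t.slice(1));

  return posts.filter((p) => {
    const text = p.text.toLowerCase();
    if (!include.every((t) => text.includes(t.toLowerCase()))) return false;
    if (exclude.some((t) => text.includes(t.toLowerCase()))) return false;
    if (p.likes < minLikes) return false;
    const ts = new Date(p.timestamp);
    if (from && ts < new Date(from)) return false;
    if (to && ts > new Date(to)) return false;
    return true;
  });
}
```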

## Best Practices

### Rate Limiting
- Add delays between scrolls: `--delay 2000`
- Respect platform limits (see config)
- Use proxies for high-volume scraping
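
Adding random jitter on top of the base delay (also suggested under Troubleshooting) makes scroll timing less mechanical; a minimal sketch, with the 50% jitter ratio an arbitrary choice:

```js
// Sleep for the base delay plus up to 50% random jitter, so scroll
// timing is less mechanical. The jitter ratio is an assumption.
function jitteredDelay(baseMs = 2000) {
  const ms = baseMs + Math.random() * baseMs * 0.5;
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```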

### Content Quality
- Filter by engagement to find signal
- Dedupe across sessions
- Export with timestamps for freshness tracking

### Ethics & Compliance
- Check platform ToS before scraping
- Don't scrape personal data at scale
- Use for research/curation, not spam

## Troubleshooting

### Scroll stops early
- Increase `--limit` or check for login requirements
- Some platforms detect automation; add random delays

### Missing content
- Some platforms lazy-load; increase scroll delay
- Try `--screenshot` to debug what's visible

### Rate limited
- Reduce frequency; use `--delay`
- Check platform-specific rate limits in config

## Integration with Other Skills

- **bear-notes**: Export via `export-to-notes.js --app bear`
- **obsidian**: Export via `export-to-notes.js --app obsidian`
- **notion**: Export via `export-to-notes.js --app notion`
- **github**: Create issues from scraped feedback/mentions

```bash
# Scroll and export to Obsidian in one command
node scripts/skroller.js --platform twitter --query "AI" --limit 20 --output ai.json && \
  node scripts/export-to-notes.js --input ai.json --app obsidian --vault ~/Documents/Obsidian

# Scroll and export to Notion
node scripts/skroller.js --platform reddit --query "startups" --limit 30 --output startups.json && \
  node scripts/export-to-notes.js --input startups.json --app notion --api-key $NOTION_TOKEN

# Use npm scripts for cleaner commands
npm run scroll -- --platform twitter --query "tech" --limit 25 --output tech.json
npm run export -- --input tech.json --app bear --tags "tech,research"
```

## See Also

- `references/platform-details.md` - Platform-specific selectors and quirks
- `references/rate-limits.md` - Rate limit guidelines per platform
- `assets/selector-cheatsheet.md` - CSS selectors for each platform

Related Skills

tavily-search

3891
from openclaw/skills

Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.

Data & Research

baidu-search

3891
from openclaw/skills

Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.

Data & Research

notebooklm

3891
from openclaw/skills

An OpenClaw Skill for the unofficial Google NotebookLM Python API. Supports content generation (podcasts, videos, slides, quizzes, mind maps, etc.), document management, and research automation. Triggers when the user needs NotebookLM to generate audio overviews, videos, or study materials, or to manage a knowledge base.

Data & Research

openclaw-search

3891
from openclaw/skills

Intelligent search for agents. Multi-source retrieval with confidence scoring - web, academic, and Tavily in one unified API.

Data & Research

aisa-tavily

3891
from openclaw/skills

AI-optimized web search via AIsa's Tavily API proxy. Returns concise, relevant results for AI agents through AIsa's unified API gateway.

Data & Research

Market Sizing — TAM/SAM/SOM Calculator

3891
from openclaw/skills

Build defensible market sizing for any product, pitch deck, or business case. Top-down and bottom-up methodologies combined.

Data & Research

Data Analyst — AfrexAI ⚡📊

3891
from openclaw/skills

**Transform raw data into decisions. Not just charts — answers.**

Data & Research

Competitor Monitor

3891
from openclaw/skills

Tracks and analyzes competitor moves — pricing changes, feature launches, hiring, and positioning shifts

Data & Research

afrexai-competitive-intel

3891
from openclaw/skills

Complete competitive intelligence system — market mapping, product teardowns, pricing intel, win/loss analysis, battlecards, and strategic monitoring. Goes far beyond SEO to cover the full business landscape.

Data & Research

trending-news-aggregator

3891
from openclaw/skills

Intelligent trending-news aggregator: automatically scrapes trending news from multiple platforms, analyzes trends with AI, and supports scheduled delivery and popularity scoring. Core features: daily aggregation of trending topics across platforms (Weibo, Zhihu, Baidu, etc.), smart categorization (tech, finance, society, international, etc.), a popularity-scoring algorithm, incremental detection (flagging newly added topics), and AI trend analysis.

Data & Research

search-cluster

3891
from openclaw/skills

Aggregated search aggregator using Google CSE, GNews RSS, Wikipedia, Reddit, and Scrapling.

Data & Research

data-analysis-partner

3891
from openclaw/skills

Intelligent data-analysis skill: takes CSV/Excel files plus your analysis requirements as input and outputs a self-contained HTML report with interactive ECharts charts.

Data & Research