Twitter/X Profile Scraper

A browser-based Twitter/X profile discovery and scraping tool.

3,891 stars

Best use case

Twitter/X Profile Scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

A browser-based Twitter/X profile discovery and scraping tool.

Teams using Twitter/X Profile Scraper should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/x-twitter-scraper/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/arulmozhiv/x-twitter-scraper/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/x-twitter-scraper/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Twitter/X Profile Scraper Compares

Feature / Agent	Twitter/X Profile Scraper	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

A browser-based Twitter/X profile discovery and scraping tool.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

SKILL.md Source

# Twitter/X Profile Scraper


A browser-based Twitter/X profile discovery and scraping tool.

> Part of **[ScrapeClaw](https://www.scrapeclaw.cc/)** — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.

```yaml
---
name: twitter-scraper
description: Discover and scrape Twitter/X public profiles from your browser.
emoji: 🐦
version: 1.0.2
author: influenza
tags:
  - twitter
  - x
  - scraping
  - social-media
  - profile-discovery
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium

    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---
```

## Overview

This skill provides a two-phase Twitter/X scraping system:

1. **Profile Discovery** — Find Twitter accounts via Google Custom Search API or DuckDuckGo
2. **Browser Scraping** — Scrape public profiles using Playwright with anti-detection (no login required)

## Features

- 🔍  - Discover Twitter/X profiles by location and category
- 🌐  - Full browser simulation for accurate scraping
- 🛡️  - Browser fingerprinting, human behavior simulation, and stealth scripts
- 📊  - Profile info, followers, tweets, engagement data, and media
- 💾  - JSON/CSV export with downloaded thumbnails
- 🔄  - Resume interrupted scraping sessions
- ⚡  - Auto-skip private accounts, low-follower profiles, suspended users
- 🌍  - Built-in residential proxy support with 4 providers

#### Getting Google API Credentials (Optional)

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select existing
3. Enable "Custom Search API"
4. Create API credentials → API Key
5. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/)
6. Create a search engine with `x.com` and `twitter.com` as the sites to search
7. Copy the Search Engine ID

If not configured, discovery falls back to DuckDuckGo (no API key needed).

## Usage

### Agent Tool Interface

For OpenClaw agent integration, the skill provides JSON output:

```bash
# Discover Twitter profiles (returns JSON)
discover --location "Miami" --category "tech" --output json

# Discover profiles in a specific category (returns JSON)
discover --location "New York" --category "crypto" --output json

# Scrape single profile (returns JSON)
scrape --username elonmusk --output json

# Scrape from a queue file
scrape data/queue/Miami_tech_20260220_120000.json
```

## Output Data

### Profile Data Structure

```json
{
  "username": "elonmusk",
  "display_name": "Elon Musk",
  "bio": "...",
  "followers": 180000000,
  "following": 800,
  "tweets_count": 45000,
  "is_verified": true,
  "profile_pic_url": "https://...",
  "profile_pic_local": "thumbnails/elonmusk/profile_abc123.jpg",
  "user_location": "Mars & Earth",
  "join_date": "June 2009",
  "website": "https://x.ai",
  "influencer_tier": "mega",
  "category": "tech",
  "scrape_location": "New York",
  "scraped_at": "2026-02-17T12:00:00",
  "recent_tweets": [
    {
      "id": "1234567890",
      "text": "Tweet content...",
      "timestamp": "2026-02-17T10:30:00.000Z",
      "likes": 50000,
      "retweets": 12000,
      "replies": 3000,
      "views": "5.2M",
      "media_urls": ["https://..."],
      "media_local": ["thumbnails/elonmusk/tweet_media_0_def456.jpg"],
      "is_retweet": false,
      "is_reply": false,
      "url": "https://x.com/elonmusk/status/1234567890"
    }
  ]
}
```

### Queue File Structure

```json
{
  "location": "New York",
  "category": "tech",
  "total": 15,
  "usernames": ["user1", "user2", "..."],
  "completed": ["user1"],
  "failed": {"user3": "not_found"},
  "current_index": 2,
  "created_at": "2026-02-17T12:00:00",
  "source": "google_api"
}
```

### Influencer Tiers

| Tier  | Followers Range     |
|-------|---------------------|
| nano  | < 1,000             |
| micro | 1,000 - 10,000      |
| mid   | 10,000 - 100,000    |
| macro | 100,000 - 1M        |
| mega  | > 1,000,000         |

### File Outputs

- **Queue files**: `data/queue/{location}_{category}_{timestamp}.json`
- **Scraped data**: `data/output/{username}.json`
- **Thumbnails**: `thumbnails/{username}/profile_*.jpg`, `thumbnails/{username}/tweet_media_*.jpg`
- **Export files**: `data/export_{timestamp}.json`, `data/export_{timestamp}.csv`

## Configuration

Edit `config/scraper_config.json`:

```json
{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 500,
    "max_tweets": 20,
    "download_thumbnails": true,
    "max_thumbnails": 6,
    "delay_between_profiles": [4, 8],
    "timeout": 60000
  },
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["tech", "politics", "sports", "entertainment", "news", "crypto"]
}
```

## Filters Applied

The scraper automatically filters out:

- ❌ Suspended or deactivated accounts
- ❌ Protected (private) accounts
- ❌ Profiles with < 500 followers (configurable)
- ❌ Non-existent usernames
- ❌ Already scraped entries (deduplication)

## Anti-Detection

The scraper uses multiple anti-detection techniques:

- **Browser fingerprinting** — 4 rotating fingerprint profiles (viewport, user agent, timezone, WebGL, etc.)
- **Stealth JavaScript** — Hides `navigator.webdriver`, spoofs plugins/languages/hardware, canvas noise, fake `chrome` object
- **Human behavior simulation** — Random delays, mouse movements, scrolling patterns
- **Network randomization** — Variable timing between requests
- **Login wall handling** — Automatically dismisses Twitter's login prompts and overlays

## Troubleshooting

### No Profiles Discovered

- Check Google API key and quota
- Verify Search Engine ID is configured for x.com and twitter.com
- Try different location/category combinations
- If Google fails, DuckDuckGo fallback is used automatically

### Rate Limiting

- Reduce scraping speed (increase delays in config)
- Run during off-peak hours
- **Use a residential proxy** (see below)

### Login Wall Issues

- The scraper automatically dismisses login prompts
- If content is blocked, try running with `--headless` disabled to debug visually

---

## 🌐 Residential Proxy Support

### Why Use a Residential Proxy?

Running a scraper at scale **without** a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:

| Advantage | Description |
|-----------|-------------|
| **Avoid IP Bans** | Residential IPs look like real household users, not data-center bots. Twitter/X is far less likely to flag them. |
| **Automatic IP Rotation** | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| **Geo-Targeting** | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| **Sticky Sessions** | Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session. |
| **Higher Success Rate** | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on Twitter/X. |
| **Long-Running Scrapes** | Scrape thousands of profiles over hours or days without interruption. |
| **Concurrent Scraping** | Run multiple browser instances across different IPs simultaneously. |

### Recommended Proxy Providers

We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

| Provider | Best For | Sign Up |
|----------|----------|---------|
| **Bright Data** | World's largest network, 72M+ IPs, enterprise-grade | 👉 [**Get Bright Data**](https://get.brightdata.com/o1kpd2da8iv4) |
| **IProyal** | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 [**Get IProyal**](https://iproyal.com/?r=ScrapeClaw) |
| **Storm Proxies** | Fast & reliable, developer-friendly API, competitive pricing | 👉 [**Get Storm Proxies**](https://stormproxies.com/clients/aff/go/scrapeclaw) |
| **NetNut** | ISP-grade network, 52M+ IPs, direct connectivity | 👉 [**Get NetNut**](https://netnut.io?ref=mwrlzwv) |



### Setup Steps

#### 1. Get Your Proxy Credentials

Sign up with any provider above, then grab:
- **Username** (from your provider dashboard)
- **Password** (from your provider dashboard)
- **Host** and **Port** are pre-configured per provider (or use custom)

#### 2. Configure via Environment Variables

```bash
export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us             # optional: two-letter country code
export PROXY_STICKY=true            # optional: keep same IP per session
```

#### 3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the `provider` name:

| Provider | Host | Port |
|----------|------|------|
| Bright Data | `brd.superproxy.io` | `22225` |
| IProyal | `proxy.iproyal.com` | `12321` |
| Storm Proxies | `rotating.stormproxies.com` | `9999` |
| NetNut | `gw-resi.netnut.io` | `5959` |

Override with `PROXY_HOST` / `PROXY_PORT` env vars if your plan uses a different gateway.

#### 4. Custom Proxy Provider

For any other proxy service, set provider to `custom` and supply host/port manually:

```json
{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}
```

### Running the Scraper with Proxy

Once configured, the scraper picks up the proxy automatically — no extra flags needed:

```bash
# Discover and scrape as usual — proxy is applied automatically
python main.py discover --location "Miami" --category "tech"
python main.py scrape --username elonmusk

# The log will confirm proxy is active:
# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
# INFO - Browser using proxy: brightdata → brd.superproxy.io:22225
```

### Using the Proxy Manager Programmatically

```python
from proxy_manager import ProxyManager

# From config (auto-reads config/scraper_config.json)
pm = ProxyManager.from_config()

# From environment variables
pm = ProxyManager.from_env()

# Manual construction
pm = ProxyManager(
    provider="brightdata",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True
)

# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

# For requests / aiohttp
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

# Force new IP (rotates session ID)
pm.rotate_session()

# Debug info
print(pm.info())
```

### Best Practices for Long-Running Scrapes

1. **Use sticky sessions** — Twitter requires consistent IPs during a browsing session. Set `"sticky": true`.
2. **Target the right country** — Set `"country": "us"` (or your target region) so Twitter serves content in the expected locale.
3. **Combine with existing anti-detection** — This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
4. **Rotate sessions between batches** — Call `pm.rotate_session()` between large batches of profiles to get a fresh IP.
5. **Use delays** — Even with proxies, respect `delay_between_profiles` in config (default 4-8s) to avoid aggressive patterns.
6. **Monitor your proxy dashboard** — All providers have dashboards showing bandwidth usage and success rates.

## Notes

- **No login required** — Only scrapes publicly visible content
- **Checkpoint/resume** — Queue files track progress; interrupted scrapes can be resumed with `--resume`
- **Rate limiting** — Waits 60s on rate limit, stops on daily limit detection
- **Twitter selectors** — Uses `data-testid` attributes (stable across UI changes) with fallbacks to `aria-label` and structural selectors

Related Skills

Twitter Command Center (Search + Post)

3891

from openclaw/skills

Searches and reads X (Twitter): profiles, timelines, mentions, followers, tweet search, trends, lists, communities, and Spaces. Publishes posts after the user completes OAuth in the browser. Use when the user asks about Twitter/X data, social listening, or posting without sharing account passwords.

Social Media

news-hot-scraper

3891

from openclaw/skills

This skill should be used when users need to scrape hot news topics from Chinese platforms (微博、知乎、B站、抖音、今日头条、腾讯新闻、澎湃新闻), generate summaries, and cite sources. It supports both API-based and direct scraping methods, and offers both extractive and abstractive summarization techniques.

Data & Research

x-twitter-poster

3891

from openclaw/skills

X (Twitter) 发推 Skill。使用 Playwright 连接用户已登录的 Chrome 浏览器，自动填写并发送推文。适用场景： - 用户要求"发推"、"发一条推"、"发推文"、"发推特" - 用户要求"帮我发一条关于 XX 的推" - 用户要求"发一条推文，内容是..." 核心能力： - 连接用户 Chrome 浏览器（CDP 模式） - 继承用户的登录状态 - 跨平台支持：Mac (Meta+Enter) / Windows/Linux (Control+Enter) ⚠️ 安全要求： - 仅在信任代码时启用 CDP 端口 - 建议使用单独 Chrome 账号测试 - 可配置 X_USERNAME 环境变量

Content & Documentation

X/Twitter Research Skill

3891

from openclaw/skills

Research trending topics, ideas, and conversations on X (Twitter) using twitterapi.io.

Data & Research

twittershots

3891

from openclaw/skills

Generate high-quality screenshots of Twitter/X posts using the TwitterShots API. Use when the user wants to: capture a tweet as an image, screenshot a tweet, generate tweet image, convert tweet to PNG/SVG/HTML, create tweet screenshot for social media (Instagram, TikTok), or mentions "TwitterShots", "tweet screenshot", "capture tweet", "tweet image". Triggers on tweet URLs (twitter.com/*/status/* or x.com/*/status/*) or tweet IDs. Default to format=png and theme=light without asking follow-up questions; if the user explicitly provides format and/or theme, use the user-provided values.

social-media-content-scraper-pro

3891

from openclaw/skills

Social Media Content Bulk Scraper, extract articles/posts from WeChat, Instagram, TikTok, YouTube, export to Markdown/HTML with full metadata. $0.005 USDT per use.

hinge-profile-optimizer

3891

from openclaw/skills

Comprehensive, research-backed Hinge dating profile optimization. Use when someone wants to improve their Hinge profile, audit an existing profile, write better prompts/captions, select and order photos strategically, or understand why they're not getting quality matches. This is the thorough process (~45 mins) - discovery interview, honest market math, photo strategy, copy creation, settings cleanup, and implementation support. Grounded in peer-reviewed behavioral research, platform data, and signaling theory.

twitter-api-alternative

3891

from openclaw/skills

Twitter API Alternative — Search 1B+ tweets with natural language queries, boolean filters, and one-click CSV exports (up to 64K rows). Look up profiles, find users by topic, and track conversations. No developer account needed, no complex OAuth setup — 2-minute setup via Xpoz MCP.

YouTube Channel Scraper

3891

from openclaw/skills

A browser-based YouTube channel discovery and scraping tool.

TikTok Profile Scraper

3891

from openclaw/skills

A browser-based TikTok profile discovery and scraping tool.

Instagram Profile Scraper

3891

from openclaw/skills

A browser-based Instagram profile discovery and scraping tool.

Facebook Page & Group Scraper

3891

from openclaw/skills

> Part of **[ScrapeClaw](https://www.scrapeclaw.cc/)** — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.