TikTok Profile Scraper

A browser-based TikTok profile discovery and scraping tool.

3,891 stars

Best use case

TikTok Profile Scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using TikTok Profile Scraper should expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tiktok-scraper-2/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/arulmozhiv/tiktok-scraper-2/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/tiktok-scraper-2/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How TikTok Profile Scraper Compares

| Feature / Agent | TikTok Profile Scraper | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

A browser-based TikTok profile discovery and scraping tool.

Where can I find the source code?

The source is in the openclaw/skills repository on GitHub, under skills/arulmozhiv/tiktok-scraper-2 (the same path used in the install command above).


SKILL.md Source

# TikTok Profile Scraper

A browser-based TikTok profile discovery and scraping tool.

> Part of **[ScrapeClaw](https://www.scrapeclaw.cc/)** — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, TikTok, and Facebook built with Python & Playwright, no API keys required.

```yaml
---
name: tiktok-scraper
description: Discover and scrape TikTok profiles from your browser.
emoji: 🎵
version: 1.0.0
author: influenza
tags:
  - tiktok
  - scraping
  - social-media
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium

    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---
```
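Before the first run it can be worth confirming that the binaries listed under `requires.bins` are reachable. A minimal preflight sketch (the `missing_bins` helper is hypothetical, not part of the skill). Note that a Playwright-managed Chromium normally lives in Playwright's own browser cache rather than on `PATH`, so a missing `chromium` here is a prompt to check your setup, not necessarily a blocker:

```python
import shutil
from typing import Sequence

# Executables listed under metadata.clawdbot.requires.bins in the frontmatter above.
REQUIRED_BINS = ("python3", "chromium")


def missing_bins(required: Sequence[str] = REQUIRED_BINS) -> list[str]:
    """Return the required executables that shutil.which cannot find on PATH."""
    return [name for name in required if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_bins()
    print("missing:", ", ".join(missing) if missing else "none")
```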

## Overview

This skill provides a two-phase TikTok scraping system:

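1. **Profile Discovery**: find TikTok profiles for a given location and category (using Google Custom Search when configured) and save them to a queue file.
2. **Browser Scraping**: open each queued profile in a real Chromium browser via Playwright and extract profile info, stats, and thumbnails.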

## Features

- 🔍 Discover TikTok profiles by location and category
- 🌐 Full browser simulation for accurate scraping
- 🛡️ Browser fingerprinting, human behavior simulation, and stealth scripts
- 📊 Profile info, stats, video thumbnails, and engagement data
- 💾 JSON/CSV export with downloaded thumbnails
- 🔄 Resume interrupted scraping sessions
- ⚡ Auto-skip private accounts, low-follower accounts, and empty profiles
- 🌍 Built-in residential proxy support with 4 providers



#### Getting Google API Credentials (Optional)

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable "Custom Search API"
4. Create API credentials → API Key
5. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/)
6. Create a search engine with `tiktok.com` as the site to search
7. Copy the Search Engine ID
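With the API key and Search Engine ID in hand, discovery essentially comes down to a query plus URL parsing. The skill's own query wording isn't documented here, so this is a sketch under assumptions: it targets the public Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1` with the `key`, `cx`, `q`, and `start` parameters), and `build_search_url` and `extract_usernames` are hypothetical helper names:

```python
import re
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# Matches profile URLs like https://www.tiktok.com/@example_creator and keeps only
# the handle from deeper links such as /@example_creator/video/123.
PROFILE_RE = re.compile(r"^https?://(?:www\.)?tiktok\.com/@([A-Za-z0-9_.]+)")


def build_search_url(api_key: str, cx: str, location: str, category: str, start: int = 1) -> str:
    """Build a Custom Search request URL (the query wording is an assumption)."""
    params = {"key": api_key, "cx": cx, "q": f"{location} {category}", "start": start}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def extract_usernames(response: dict) -> list[str]:
    """Return unique TikTok handles, in result order, from a Custom Search response."""
    handles: list[str] = []
    for item in response.get("items", []):
        match = PROFILE_RE.match(item.get("link", ""))
        if match and match.group(1) not in handles:
            handles.append(match.group(1))
    return handles


sample = {"items": [
    {"link": "https://www.tiktok.com/@example_creator"},
    {"link": "https://www.tiktok.com/@example_creator/video/123"},
    {"link": "https://www.tiktok.com/tag/dance"},
]}
print(extract_usernames(sample))  # ['example_creator']
```

Non-profile results such as tag pages are dropped, and duplicate handles from video links collapse into one queue entry.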

## Usage

### Agent Tool Interface

For OpenClaw agent integration, the skill provides JSON output:

```bash
# Discover profiles (returns JSON)
discover --location "Miami" --category "dance" --output json

# Scrape single profile (returns JSON)
scrape --username charlidamelio --output json
```
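Because the output is plain JSON, a calling script can run the command and parse stdout. A sketch, assuming the `main.py` entry point used in the proxy section of this file accepts `--output json` and prints a single JSON document; `run_tool` is a hypothetical wrapper, and the runner is injectable so the parsing can be exercised without launching a browser:

```python
import json
import subprocess
import sys
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[Sequence[str]], str]


def subprocess_runner(argv: Sequence[str]) -> str:
    """Run the CLI for real; raises CalledProcessError on a non-zero exit code."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def run_tool(command: str, runner: Runner = subprocess_runner, **flags: str) -> object:
    """Build `main.py <command> --flag value ... --output json` and parse its stdout."""
    argv = [sys.executable, "main.py", command]
    for name, value in flags.items():
        argv += [f"--{name}", value]
    argv += ["--output", "json"]
    return json.loads(runner(argv))


# Usage, assuming main.py is in the current directory:
# profile = run_tool("scrape", username="charlidamelio")
# found = run_tool("discover", location="Miami", category="dance")
```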

## Output Data

### Profile Data Structure

```json
{
  "username": "example_creator",
  "full_name": "Example Creator",
  "nickname": "Example",
  "bio": "Dance creator | NYC 💃",
  "bio_link": "https://example.com",
  "followers": 250000,
  "following": 800,
  "likes": 5000000,
  "videos_count": 120,
  "is_verified": false,
  "is_private": false,
  "influencer_tier": "macro",
  "category": "dance",
  "location": "New York",
  "profile_url": "https://www.tiktok.com/@example_creator",
  "profile_pic_local": "thumbnails/example_creator/profile_abc123.jpg",
  "content_thumbnails": [
    "thumbnails/example_creator/content_1_def456.jpg",
    "thumbnails/example_creator/content_2_ghi789.jpg"
  ],
  "video_views": [
    {"display": "1.2M", "count": 1200000},
    {"display": "500K", "count": 500000}
  ],
  "scrape_timestamp": "2026-03-02T14:30:00"
}
```
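Each `video_views` entry pairs the string TikTok displays with a numeric `count`. The skill's own parser isn't shown; as a sketch, converting abbreviated display strings to integers could look like this, assuming only `K`, `M`, and `B` suffixes occur (`parse_display_count` is a hypothetical name):

```python
from decimal import Decimal

# Multipliers for the abbreviated suffixes shown on counts (assumed set).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_display_count(display: str) -> int:
    """Convert strings like '1.2M', '500K', or '1,234' to an integer count."""
    text = display.strip().upper().replace(",", "")
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    # Decimal keeps the multiplication exact; with floats, int() could truncate
    # a product that lands just below a whole number.
    return int(Decimal(text) * multiplier)


views = [{"display": "1.2M", "count": 1200000}, {"display": "500K", "count": 500000}]
print([parse_display_count(v["display"]) == v["count"] for v in views])  # [True, True]
```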

### Influencer Tiers

| Tier  | Follower Range    |
|-------|-------------------|
| nano  | < 1,000           |
| micro | 1,000 - 10,000    |
| mid   | 10,000 - 100,000  |
| macro | 100,000 - 1M      |
| mega  | > 1,000,000       |
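Adjacent ranges in the table share their endpoints (10,000 appears in both `micro` and `mid`), so any implementation has to pick a side. This sketch assumes each lower bound is inclusive and that exactly 1,000,000 stays `macro`, since `mega` is listed as strictly greater; `influencer_tier` is a hypothetical name:

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to a tier, treating each lower bound as inclusive."""
    if followers < 1_000:
        return "nano"
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid"
    if followers <= 1_000_000:  # the table lists mega as strictly > 1,000,000
        return "macro"
    return "mega"


print(influencer_tier(250_000))  # macro, matching the sample profile above
```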

### File Outputs

- **Queue files**: `data/queue/{location}_{category}_{timestamp}.json`
- **Scraped data**: `data/output/{username}.json`
- **Thumbnails**: `thumbnails/{username}/profile_*.jpg`, `thumbnails/{username}/content_*.jpg`
- **Export files**: `data/export_{timestamp}.json`, `data/export_{timestamp}.csv`
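The export step flattens the per-profile JSON files into one table. The skill's exporter isn't shown; this sketch assumes one JSON object per file in `data/output/`, a `YYYYmmdd_HHMMSS` timestamp, and that list-valued fields such as `content_thumbnails` and `video_views` are stored as JSON text in their CSV cells (`export_csv` is hypothetical):

```python
import csv
import json
from datetime import datetime
from pathlib import Path


def export_csv(output_dir: Path, export_dir: Path) -> Path:
    """Combine output_dir/*.json into export_dir/export_{timestamp}.csv."""
    rows = [json.loads(p.read_text(encoding="utf-8")) for p in sorted(output_dir.glob("*.json"))]
    # Union of keys across profiles, in first-seen order, so no field is dropped.
    fieldnames: list[str] = []
    for row in rows:
        fieldnames += [k for k in row if k not in fieldnames]
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = export_dir / f"export_{stamp}.csv"
    with target.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)  # missing keys become ""
        writer.writeheader()
        for row in rows:
            # Lists and dicts don't fit in a CSV cell, so store them as JSON text.
            writer.writerow({k: (json.dumps(v) if isinstance(v, (list, dict)) else v)
                             for k, v in row.items()})
    return target


# Usage: export_csv(Path("data/output"), Path("data"))
```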

## Configuration

Edit `config/scraper_config.json`:

```json
{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 1000,
    "download_thumbnails": true,
    "max_thumbnails": 6
  },
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["fashion", "beauty", "fitness", "food", "travel", "tech", "comedy", "dance", "music", "gaming"]
}
```
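Scripts that wrap the scraper may want to read this file with fallbacks, so a partially edited config still has every key. A sketch, assuming missing keys should take the sample values above; `DEFAULTS`, `load_config`, and `config_problems` are hypothetical helpers rather than the skill's own loader, and the two checks are illustrative:

```python
import json
from pathlib import Path

# Defaults mirror the sample config above (cities/categories omitted for brevity).
DEFAULTS = {
    "proxy": {"enabled": False, "provider": "brightdata", "country": "",
              "sticky": True, "sticky_ttl_minutes": 10},
    "google_search": {"enabled": True, "api_key": "", "search_engine_id": "",
                      "queries_per_location": 3},
    "scraper": {"headless": False, "min_followers": 1000,
                "download_thumbnails": True, "max_thumbnails": 6},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively to nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path = Path("config/scraper_config.json")) -> dict:
    """Load the user's config (if present) on top of DEFAULTS."""
    user = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return deep_merge(DEFAULTS, user)


def config_problems(config: dict) -> list[str]:
    """List settings likely to make a run fail."""
    problems = []
    gs = config["google_search"]
    if gs["enabled"] and not (gs["api_key"] and gs["search_engine_id"]):
        problems.append("google_search is enabled but api_key or search_engine_id is empty")
    proxy = config["proxy"]
    if proxy["enabled"] and proxy["provider"] == "custom" and not proxy.get("host"):
        problems.append("custom proxy provider needs a host and port")
    return problems
```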



## Filters Applied

The scraper automatically filters out:

- ❌ Private accounts
- ❌ Accounts with < 1,000 followers (configurable)
- ❌ Accounts with no videos
- ❌ Non-existent/removed accounts
- ❌ Already scraped accounts (deduplication)
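Taken together, these filters amount to one skip check per profile. A sketch against the profile fields documented above, assuming a removed or non-existent account comes back as `None` and that `min_followers` is taken from `scraper.min_followers` in the config; `skip_reason` is a hypothetical name:

```python
from typing import Optional


def skip_reason(profile: Optional[dict], already_scraped: set[str],
                min_followers: int = 1000) -> Optional[str]:
    """Return why a profile should be skipped, or None to keep it."""
    if profile is None:
        return "not found or removed"
    if profile["username"] in already_scraped:
        return "already scraped"
    if profile.get("is_private"):
        return "private account"
    if profile.get("followers", 0) < min_followers:
        return f"fewer than {min_followers} followers"
    if profile.get("videos_count", 0) == 0:
        return "no videos"
    return None
```

Returning a reason string rather than a bare boolean makes it easy to log why each account was dropped.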

## Troubleshooting

### No Profiles Discovered

- Check Google API key and quota
- Verify Search Engine ID is configured for tiktok.com
- Try different location/category combinations

### Rate Limiting

- Reduce scraping speed by increasing the delay between profiles (`delay_between_profiles` in config)
- Run during off-peak hours
- **Use a residential proxy** (see below)
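When increasing delays, adding some randomness and backing off after failures avoids a fixed, bot-like rhythm. A sketch, assuming a base delay such as the `delay_between_profiles` setting mentioned under Best Practices; `next_delay`, `polite_sleep`, the ±30% jitter, and the 300-second cap are illustrative choices, not the skill's values:

```python
import random
import time
from typing import Optional


def next_delay(base_seconds: float, consecutive_failures: int = 0,
               cap_seconds: float = 300.0, rng: Optional[random.Random] = None) -> float:
    """Delay before the next profile: base * 2**failures, capped, then ±30% jitter."""
    rng = rng or random.Random()
    delay = min(base_seconds * (2 ** consecutive_failures), cap_seconds)
    return delay * rng.uniform(0.7, 1.3)


def polite_sleep(base_seconds: float, consecutive_failures: int = 0) -> None:
    """Sleep for next_delay(); reset consecutive_failures to 0 after a success."""
    time.sleep(next_delay(base_seconds, consecutive_failures))
```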

### CAPTCHA / Bot Detection

- TikTok has aggressive bot detection — residential proxies are strongly recommended
- The built-in anti-detection handles fingerprinting and stealth automatically
- If you see CAPTCHAs, run with `"headless": false` in config (as in the sample config) so the browser window is visible, and solve them manually

---

## 🌐 Residential Proxy Support

### Why Use a Residential Proxy?

Running a scraper at scale **without** a residential proxy is likely to get your IP blocked quickly. Here's why proxies matter for long-running scrapes:

| Advantage | Description |
|-----------|-------------|
| **Avoid IP Bans** | Residential IPs look like real household users, not data-center bots. TikTok is far less likely to flag them. |
| **Automatic IP Rotation** | Each request (or session) gets a fresh IP, so rate limits don't accumulate on a single address. |
| **Geo-Targeting** | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| **Sticky Sessions** | Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session. |
| **Higher Success Rate** | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on TikTok. |
| **Long-Running Scrapes** | Scrape thousands of profiles over hours or days without interruption. |
| **Concurrent Scraping** | Run multiple browser instances across different IPs simultaneously. |

### Recommended Proxy Providers

We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

| Provider | Best For | Sign Up |
|----------|----------|---------|
| **Bright Data** | World's largest network, 72M+ IPs, enterprise-grade | 👉 [**Get Bright Data**](https://get.brightdata.com/o1kpd2da8iv4) |
| **IPRoyal** | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 [**Get IPRoyal**](https://iproyal.com/?r=ScrapeClaw) |
| **Storm Proxies** | Fast & reliable, developer-friendly API, competitive pricing | 👉 [**Get Storm Proxies**](https://stormproxies.com/clients/aff/go/scrapeclaw) |
| **NetNut** | ISP-grade network, 52M+ IPs, direct connectivity | 👉 [**Get NetNut**](https://netnut.io?ref=mwrlzwv) |



### Setup Steps

#### 1. Get Your Proxy Credentials

Sign up with any provider above, then grab:
- **Username** (from your provider dashboard)
- **Password** (from your provider dashboard)
- **Host** and **Port** are pre-configured for each provider; override them with `PROXY_HOST`/`PROXY_PORT` or use the `custom` provider

#### 2. Configure via Environment Variables

```bash
export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us             # optional: two-letter country code
export PROXY_STICKY=true            # optional: keep same IP per session
```

#### 3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the `provider` name:

| Provider | Host | Port |
|----------|------|------|
| Bright Data | `brd.superproxy.io` | `22225` |
| IPRoyal | `proxy.iproyal.com` | `12321` |
| Storm Proxies | `rotating.stormproxies.com` | `9999` |
| NetNut | `gw-resi.netnut.io` | `5959` |

Override with `PROXY_HOST` / `PROXY_PORT` env vars if your plan uses a different gateway.
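To show how the environment variables and the defaults table fit together, here is a sketch. It is not `ProxyManager.from_env()` itself, whose code isn't shown; the function name `proxy_settings_from_env`, the truthy-string parsing, and the `PROXY_STICKY` default of true (chosen to match the sample config) are assumptions:

```python
import os
from typing import Mapping

# Gateway defaults copied from the table above.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def _flag(value: str) -> bool:
    """Interpret common truthy strings such as 'true' or '1' (assumed convention)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def proxy_settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Read PROXY_* variables; PROXY_HOST / PROXY_PORT override the provider defaults."""
    provider = env.get("PROXY_PROVIDER", "brightdata").strip().lower()
    host, port = PROVIDER_DEFAULTS.get(provider, (None, None))
    host = env.get("PROXY_HOST", host)
    if "PROXY_PORT" in env:
        port = int(env["PROXY_PORT"])
    enabled = _flag(env.get("PROXY_ENABLED", "false"))
    if enabled and (host is None or port is None):
        raise ValueError(f"provider {provider!r} has no default gateway; "
                         "set PROXY_HOST and PROXY_PORT")
    return {
        "enabled": enabled,
        "provider": provider,
        "host": host,
        "port": port,
        "username": env.get("PROXY_USERNAME", ""),
        "password": env.get("PROXY_PASSWORD", ""),
        "country": env.get("PROXY_COUNTRY", "").strip().lower(),
        "sticky": _flag(env.get("PROXY_STICKY", "true")),
    }
```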

#### 4. Custom Proxy Provider

For any other proxy service, set provider to `custom` and supply host/port manually:

```json
{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}
```

### Running the Scraper with Proxy

Once configured, the scraper picks up the proxy automatically — no extra flags needed:

```bash
# Discover and scrape as usual — proxy is applied automatically
python main.py discover --location "Miami" --category "dance"
python main.py scrape --username charlidamelio

# The log will confirm proxy is active:
# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
```

### Using the Proxy Manager Programmatically

```python
from proxy_manager import ProxyManager

# Pick one of the following three ways to build a ProxyManager.
# From config (auto-reads config/scraper_config.json)
pm = ProxyManager.from_config()

# From environment variables
pm = ProxyManager.from_env()

# Manual construction
pm = ProxyManager(
    provider="brightdata",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True
)

# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://brd.superproxy.io:22225", "username": "your_user-country-us-session-abc123", "password": "your_pass"}

# For requests (aiohttp instead takes a single proxy URL via its proxy= argument)
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

# Force new IP (rotates session ID)
pm.rotate_session()

# Debug info
print(pm.info())
```
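The commented outputs above suggest that the sticky-session username is the account username with country and session tags appended. Gateway username formats actually differ by provider (check your provider's documentation), so this sketch only reproduces the shape shown in the example. `SessionProxy` and its methods are hypothetical and are not the `ProxyManager` API:

```python
import secrets
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class SessionProxy:
    host: str
    port: int
    username: str
    password: str
    country: str = ""
    sticky: bool = True
    session_id: str = field(default_factory=lambda: secrets.token_hex(4))

    def gateway_username(self) -> str:
        # Shape taken from the example output: user-country-us-session-abc123
        parts = [self.username]
        if self.country:
            parts += ["country", self.country]
        if self.sticky:
            parts += ["session", self.session_id]
        return "-".join(parts)

    def rotate_session(self) -> None:
        # A new session tag typically makes a sticky gateway assign a new exit IP.
        self.session_id = secrets.token_hex(4)

    def playwright_proxy(self) -> dict:
        return {"server": f"http://{self.host}:{self.port}",
                "username": self.gateway_username(), "password": self.password}

    def requests_proxies(self) -> dict:
        # Percent-encode credentials so characters like '@' don't break the URL.
        user = quote(self.gateway_username(), safe="")
        url = f"http://{user}:{quote(self.password, safe='')}@{self.host}:{self.port}"
        return {"http": url, "https": url}
```

Rotating only swaps the session tag; whether that actually yields a new IP depends on the provider's sticky-session semantics.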

### Best Practices for Long-Running Scrapes

1. **Use sticky sessions** — TikTok requires consistent IPs during a browsing session. Set `"sticky": true`.
2. **Target the right country** — Set `"country": "us"` (or your target region) so TikTok serves content in the expected locale.
3. **Combine with existing anti-detection** — This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
4. **Rotate sessions between batches** — Call `pm.rotate_session()` between large batches of profiles to get a fresh IP.
5. **Use delays** — Even with proxies, respect `delay_between_profiles` in config to avoid aggressive patterns.
6. **Monitor your proxy dashboard** — All providers have dashboards showing bandwidth usage and success rates.
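Practices 4 and 5 combine naturally into a batch loop. A sketch, assuming a `scrape_profile(username)` callable and any object with a `rotate_session()` method, such as the `ProxyManager` shown earlier; `scrape_in_batches`, the batch size of 25, and the 5-second delay are hypothetical placeholders to tune:

```python
import time
from typing import Callable, Iterable, Protocol


class Rotatable(Protocol):
    def rotate_session(self) -> None: ...


def scrape_in_batches(usernames: Iterable[str], scrape_profile: Callable[[str], dict],
                      pm: Rotatable, batch_size: int = 25, delay_seconds: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Scrape profiles, pausing between them and rotating the proxy session between batches."""
    results: list[dict] = []
    for i, username in enumerate(usernames):
        if i and i % batch_size == 0:
            pm.rotate_session()  # fresh sticky session (new IP) for the next batch
        results.append(scrape_profile(username))
        sleep(delay_seconds)  # keep a gap between profiles even behind a proxy
    return results
```

Injecting `sleep` keeps the loop testable; in real runs the default `time.sleep` applies.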

Related Skills

TikTok B2B Lead-Generation Script Generator

3891
from openclaw/skills

## Skill Description

Content & Documentation

news-hot-scraper

3891
from openclaw/skills

This skill should be used when users need to scrape hot news topics from Chinese platforms (Weibo, Zhihu, Bilibili, Douyin, Toutiao, Tencent News, The Paper), generate summaries, and cite sources. It supports both API-based and direct scraping methods, and offers both extractive and abstractive summarization techniques.

Data & Research

tiktok-app-marketing

3891
from openclaw/skills

Automate TikTok slideshow marketing for any app or product. Researches competitors, generates AI images, adds text overlays, posts via Postiz, tracks analytics, and iterates on what works. Use when setting up TikTok marketing automation, creating slideshow posts, analyzing post performance, optimizing app marketing funnels, or when a user mentions TikTok growth, slideshow ads, or social media marketing for their app. Covers competitor research (browser-based), image generation, text overlays, TikTok posting (Postiz API), cross-posting to Instagram/YouTube/Threads, analytics tracking, hook testing, CTA optimization, conversion tracking with RevenueCat, and a full feedback loop that adjusts hooks and CTAs based on views vs conversions.

social-media-content-scraper-pro

3891
from openclaw/skills

Social Media Content Bulk Scraper, extract articles/posts from WeChat, Instagram, TikTok, YouTube, export to Markdown/HTML with full metadata. $0.005 USDT per use.

hinge-profile-optimizer

3891
from openclaw/skills

Comprehensive, research-backed Hinge dating profile optimization. Use when someone wants to improve their Hinge profile, audit an existing profile, write better prompts/captions, select and order photos strategically, or understand why they're not getting quality matches. This is the thorough process (~45 mins) - discovery interview, honest market math, photo strategy, copy creation, settings cleanup, and implementation support. Grounded in peer-reviewed behavioral research, platform data, and signaling theory.

tiktok-slideshow

3891
from openclaw/skills

Creates TikTok image carousels (slideshows with text overlays on photos) via the ViralBaby API. Use when the user wants to: create TikTok slideshows or carousels, find/search for background images for social media content, post or upload slideshow content to TikTok, edit slide text, or manage image collections for content creation. Do NOT use for: general TikTok account management, TikTok analytics or metrics, video editing or video creation (this is for photo slideshows only), non-TikTok social media platforms, or any task unrelated to creating visual slideshow content for TikTok.

YouTube Channel Scraper

3891
from openclaw/skills

A browser-based YouTube channel discovery and scraping tool.

Twitter/X Profile Scraper

3891
from openclaw/skills

A browser-based Twitter/X profile discovery and scraping tool.

Instagram Profile Scraper

3891
from openclaw/skills

A browser-based Instagram profile discovery and scraping tool.

Facebook Page & Group Scraper

3891
from openclaw/skills

> Part of **[ScrapeClaw](https://www.scrapeclaw.cc/)** — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.

grok-scraper

3891
from openclaw/skills

Execute queries to Grok AI via Playwright browser automation without requiring an X API KEY. Use when the user wants to "ask Grok", search X for real-time info, or specifically requests to use Grok for free without API billing.

tiktok-trend-slayer

3891
from openclaw/skills

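TikTok Product Hunter - Automatically monitors TikTok's product and creator leaderboards, uses AI to surface fast-growing best-sellers, and generates product-selection and creator-matching strategies. Use this skill when the user needs TikTok product-selection analysis, best-seller discovery, creator matching, or trend monitoring.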