Apify

Social media scraping, business data, e-commerce via Apify actors — with auto-update workflow for actor catalog. USE WHEN Twitter, Instagram, LinkedIn, TikTok, YouTube, Facebook, Google Maps, Amazon scraping, Apify, update Apify actors, social media scraping, lead generation, web scraper.

Best use case

Apify is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Apify should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/Apify/SKILL.md --create-dirs "https://raw.githubusercontent.com/danielmiessler/Personal_AI_Infrastructure/main/Packs/Scraping/src/Apify/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/Apify/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Apify Compares

| Feature / Agent | Apify | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Social media scraping, business data, e-commerce via Apify actors — with auto-update workflow for actor catalog. USE WHEN Twitter, Instagram, LinkedIn, TikTok, YouTube, Facebook, Google Maps, Amazon scraping, Apify, update Apify actors, social media scraping, lead generation, web scraper.

Where can I find the source code?

The source is in the danielmiessler/Personal_AI_Infrastructure repository on GitHub, at Packs/Scraping/src/Apify/SKILL.md (the same file the curl command under Installation downloads).

SKILL.md Source

## Customization

**Before executing, check for user customizations at:**
`~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/Apify/`

If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.
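A minimal sketch of that check in TypeScript, assuming Node.js and that every plain file in the directory should be read as UTF-8 text. `loadCustomizations` is a hypothetical helper, not part of the skill; it returns `null` when the directory is missing so the caller can fall back to the skill defaults.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helper: returns { fileName: contents } for the customization
// directory, or null when it does not exist (meaning: use skill defaults).
function loadCustomizations(
  dir: string = path.join(os.homedir(), '.claude/PAI/USER/SKILLCUSTOMIZATIONS/Apify')
): Record<string, string> | null {
  if (!fs.existsSync(dir) || !fs.statSync(dir).isDirectory()) return null

  const files: Record<string, string> = {}
  for (const name of fs.readdirSync(dir).sort()) {
    const full = path.join(dir, name)
    // Only plain files are loaded; subdirectories are ignored in this sketch
    if (fs.statSync(full).isFile()) files[name] = fs.readFileSync(full, 'utf8')
  }
  return files
}

const custom = loadCustomizations()
if (custom === null) {
  console.log('No customizations found, using skill defaults')
} else if (custom['PREFERENCES.md'] !== undefined) {
  console.log('Applying PREFERENCES.md overrides')
}
```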


## 🚨 MANDATORY: Voice Notification (REQUIRED BEFORE ANY ACTION)

**You MUST send this notification BEFORE doing anything else when this skill is invoked.**

1. **Send voice notification**:
   ```bash
   curl -s -X POST http://localhost:8888/notify \
     -H "Content-Type: application/json" \
     -d '{"message": "Running the WORKFLOWNAME workflow in the Apify skill to ACTION"}' \
     > /dev/null 2>&1 &
   ```

2. **Output text notification**:
   ```
   Running the **WorkflowName** workflow in the **Apify** skill to ACTION...
   ```

**This is not optional. Execute this curl command immediately upon skill invocation.**
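If the skill is driven from TypeScript rather than the shell, the same best-effort notification can be sent with `fetch` (built into Node.js 18+). This is a sketch, not part of the skill: `buildNotification` and `sendVoiceNotification` are hypothetical names, and the endpoint and JSON payload are copied from the curl command above.

```typescript
// Hypothetical helper: builds the same message text as the curl example
function buildNotification(workflow: string, action: string): string {
  return `Running the ${workflow} workflow in the Apify skill to ${action}`
}

// Fire-and-forget POST, mirroring `curl ... > /dev/null 2>&1 &`:
// the promise is not awaited, and errors (e.g. the voice server is not
// running) are swallowed so they never block the skill.
function sendVoiceNotification(
  message: string,
  url: string = 'http://localhost:8888/notify'
): void {
  fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message })
  }).catch(() => {
    // ignore: the notification is best-effort
  })
}
```

Like the curl version, it does not wait for the server: call `sendVoiceNotification(buildNotification(workflow, action))` first, then continue with the workflow.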

# Apify - Social Media & Web Scraping

Direct TypeScript access to popular Apify actors, with up to ~99% token savings.

## 🔌 File-Based MCP

This skill is a **file-based MCP** - a code-first API wrapper that replaces token-heavy MCP protocol calls.

**Why file-based?** Filtering data in code BEFORE returning it to the model context cuts token usage sharply (about 99% in the Instagram example under Token Savings Calculator).

**Architecture:** See `~/.claude/PAI/DOCUMENTATION/FileBasedMCPs.md`

## 🎯 Overview

Direct TypeScript access to the most popular Apify actors, without MCP overhead. Filter and transform data in code BEFORE it reaches the model context.

## 📊 Available Actors

### Social Media (5 platforms)
- **Instagram** (145k users, 4.60★) - Profiles, posts, hashtags, comments
- **LinkedIn** (26k users, 4.10★) - Profiles, jobs, posts
- **TikTok** (90k users, 4.61★) - Profiles, videos, hashtags, comments
- **YouTube** (40k users, 4.40★) - Channels, videos, comments, search
- **Facebook** (35k users, 4.56★) - Posts, groups, comments

### Business & Lead Generation
- **Google Maps** (198k users, 4.76★) - **HIGHEST VALUE!**
  - Search businesses, extract contacts, reviews, images
  - Perfect for lead generation

### E-commerce
- **Amazon** (8k users, 4.97★) - Products, reviews, pricing

### Web Scraping
- **Web Scraper** (94k users, 4.39★) - General-purpose, works with ANY website

## 🚀 Quick Start

### Basic Usage Pattern

```typescript
import { scrapeInstagramProfile } from 'actors'

// 1. Call the actor wrapper
const profile = await scrapeInstagramProfile({
  username: 'target_username',
  maxPosts: 50
})

// 2. Filter in code - BEFORE data reaches model!
const viral = profile.latestPosts?.filter(p => p.likesCount > 10000)

// 3. Only filtered results reach model context
console.log(viral) // ~10 posts instead of 50
```

## 📚 Examples by Use Case

### Social Media Monitoring

**Instagram - Track engagement:**
```typescript
import { scrapeInstagramProfile } from 'actors'

// Get profile with recent posts
const profile = await scrapeInstagramProfile({
  username: 'competitor',
  maxPosts: 100
})

// Filter in code - only high-performing posts from last 30 days
const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
const topRecent = profile.latestPosts
  ?.filter(p =>
    new Date(p.timestamp).getTime() > thirtyDaysAgo &&
    p.likesCount > 5000
  )
  .sort((a, b) => b.likesCount - a.likesCount)
  .slice(0, 10)

// Only 10 posts reach model instead of 100!
```

**LinkedIn - Job search:**
```typescript
import { searchLinkedInJobs } from 'actors'

const jobs = await searchLinkedInJobs({
  keywords: 'AI engineer',
  location: 'San Francisco',
  remote: true,
  maxResults: 200
})

// Filter in code - only senior roles with more than 50 applicants
const topJobs = jobs.filter(j =>
  j.seniority?.includes('Senior') &&
  parseInt(j.applicants || '0', 10) > 50
)
```
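The filter above parses `applicants` with `parseInt`. If the actor returns that field as text such as "Over 200 applicants" (an assumption; check the real output), `parseInt` yields `NaN`, the comparison is `false`, and the job is silently dropped. A more tolerant sketch, with hypothetical helper names `parseCount` and `seniorPopularJobs`:

```typescript
// Hypothetical helper: extracts the first integer from a count field that may
// be a number, a numeric string ("57"), or text like "Over 200 applicants".
// Thousands separators are removed first; returns 0 when no digits are found.
function parseCount(value: string | number | null | undefined): number {
  if (typeof value === 'number') return Number.isFinite(value) ? value : 0
  const match = (value ?? '').replace(/,/g, '').match(/\d+/)
  return match ? parseInt(match[0], 10) : 0
}

// Field names as in the LinkedIn example above
type Job = { seniority?: string; applicants?: string | number }

function seniorPopularJobs(jobs: Job[], minApplicants = 50): Job[] {
  return jobs.filter(j =>
    j.seniority?.includes('Senior') && parseCount(j.applicants) > minApplicants
  )
}
```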

**TikTok - Trend analysis:**
```typescript
import { scrapeTikTokHashtag } from 'actors'

const videos = await scrapeTikTokHashtag({
  hashtag: 'ai',
  maxResults: 500
})

// Filter in code - only viral content
const viral = videos
  .filter(v => v.playCount > 1000000)
  .sort((a, b) => b.playCount - a.playCount)
  .slice(0, 20)
```

### Lead Generation (Business Intelligence)

**Google Maps - Local business leads:**
```typescript
import { searchGoogleMaps } from 'actors'

// Search with contact info extraction
const places = await searchGoogleMaps({
  query: 'restaurants in Austin',
  maxResults: 500,
  includeReviews: true,
  maxReviewsPerPlace: 20,
  scrapeContactInfo: true // Extracts emails from websites!
})

// Filter in code - only highly-rated with email/phone
const qualifiedLeads = places
  .filter(p =>
    p.rating >= 4.5 &&
    p.reviewsCount >= 100 &&
    (p.email || p.phone)
  )
  .map(p => ({
    name: p.name,
    rating: p.rating,
    reviews: p.reviewsCount,
    email: p.email,
    phone: p.phone,
    website: p.website,
    address: p.address
  }))

// Export leads - only qualified results!
console.log(`Found ${qualifiedLeads.length} qualified leads`)
```
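To actually export `qualifiedLeads`, a small CSV serializer is enough. This sketch quotes fields following RFC 4180 (wrap a field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes). `Lead`, `csvField`, and `leadsToCsv` are hypothetical names; the column list mirrors the `.map` above.

```typescript
// Hypothetical lead shape, matching the fields mapped in the example above
type Lead = {
  name: string
  rating: number
  reviews: number
  email?: string
  phone?: string
  website?: string
  address?: string
}

// Quote one CSV field per RFC 4180; missing values become empty fields
function csvField(value: unknown): string {
  const s = value === undefined || value === null ? '' : String(value)
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

function leadsToCsv(leads: Lead[]): string {
  const columns: (keyof Lead)[] = ['name', 'rating', 'reviews', 'email', 'phone', 'website', 'address']
  const rows = leads.map(l => columns.map(c => csvField(l[c])).join(','))
  return [columns.join(','), ...rows].join('\n')
}
```

Write the result with Node's `fs.writeFileSync('leads.csv', leadsToCsv(qualifiedLeads))`.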

**Google Maps - Review sentiment analysis:**
```typescript
import { scrapeGoogleMapsReviews } from 'actors'

const reviews = await scrapeGoogleMapsReviews({
  placeUrl: 'https://maps.google.com/maps?cid=12345',
  maxResults: 1000
})

// Filter in code - low-rated reviews from the last 30 days with real text
const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
const recentNegative = reviews.filter(r =>
  r.rating <= 2 &&
  new Date(r.publishedAtDate).getTime() > thirtyDaysAgo &&
  (r.text ?? '').length > 50 // text may be missing on rating-only reviews
)

// Identify common complaints
const complaints = recentNegative.map(r => r.text)
```
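A rough first pass at "identify common complaints", done in code so only the summary reaches the model: count how many negative reviews mention each word. This is keyword counting, not real sentiment analysis; `topTerms` and the stop-word list are hypothetical and deliberately small.

```typescript
// Small, hypothetical stop-word list; extend it for real use
const STOP_WORDS = new Set([
  'the', 'and', 'was', 'for', 'with', 'that', 'this',
  'were', 'they', 'our', 'not', 'but', 'are', 'had', 'have'
])

// Returns the `limit` most frequent words as [word, reviewCount] pairs,
// ties broken alphabetically
function topTerms(texts: string[], limit = 10): [string, number][] {
  const counts = new Map<string, number>()
  for (const text of texts) {
    // Count each word at most once per review, so one long rant doesn't dominate
    const words = new Set(text.toLowerCase().match(/[a-z']+/g) ?? [])
    for (const w of words) {
      if (w.length < 3 || STOP_WORDS.has(w)) continue
      counts.set(w, (counts.get(w) ?? 0) + 1)
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
}
```

With the example above, `topTerms(complaints.filter(Boolean))` returns a short ranked list instead of hundreds of raw review texts.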

### E-commerce & Competitive Intelligence

**Amazon - Price monitoring:**
```typescript
import { scrapeAmazonProduct } from 'actors'

const product = await scrapeAmazonProduct({
  productUrl: 'https://www.amazon.com/dp/B08L5VT894',
  includeReviews: true,
  maxReviews: 200
})

// Filter in code - only negative reviews from the last 7 days
const weekAgo = Date.now() - (7 * 24 * 60 * 60 * 1000)
const recentNegative = product.reviews?.filter(r =>
  r.rating <= 2 &&
  new Date(r.date).getTime() > weekAgo
) ?? []

console.log(`Price: $${product.price}`)
console.log(`Rating: ${product.rating}/5`)
console.log(`Recent issues: ${recentNegative.length} complaints`)
```
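For ongoing price monitoring, each run's price can be compared with the previous run's. A sketch assuming you persist one snapshot per product yourself; the `PriceSnapshot` shape and `detectPriceChange` are hypothetical, and `price` is assumed to be a number, as the `$${product.price}` output above implies.

```typescript
// Hypothetical snapshot you store between runs
type PriceSnapshot = { asin: string; price: number; checkedAt: string }

// Report a change only when it exceeds `thresholdPct` percent of the old price;
// returns null when there is no usable previous price or the change is small
function detectPriceChange(
  previous: PriceSnapshot | undefined,
  current: PriceSnapshot,
  thresholdPct = 5
): string | null {
  if (!previous || previous.price <= 0) return null
  const changePct = ((current.price - previous.price) / previous.price) * 100
  if (Math.abs(changePct) < thresholdPct) return null
  const direction = changePct < 0 ? 'dropped' : 'rose'
  return `${current.asin} ${direction} ${Math.abs(changePct).toFixed(1)}%: ` +
    `$${previous.price.toFixed(2)} -> $${current.price.toFixed(2)}`
}
```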

### Custom Web Scraping

**Any Website - Custom extraction:**
```typescript
import { scrapeWebsite } from 'actors'

const products = await scrapeWebsite({
  startUrls: ['https://example.com/products'],
  linkSelector: 'a.product-link',
  maxPagesPerCrawl: 100,
  pageFunction: `
    async function pageFunction(context) {
      // Apify's Web Scraper exposes jQuery as context.jQuery (jQuery injection enabled)
      const { request, jQuery: $ } = context

      return {
        url: request.url,
        title: $('h1.product-title').text().trim(),
        price: $('span.price').text().trim(),
        inStock: $('.in-stock').length > 0,
        description: $('.description').text().trim()
      }
    }
  `
})

// Filter in code - only available products under $100
// (strip currency symbols and thousands separators first, e.g. "$1,299.00")
const affordable = products.filter(p =>
  p.inStock &&
  parseFloat(p.price.replace(/[^0-9.]/g, '')) < 100
)
```

## 🎨 Advanced Patterns

### Pattern 1: Multi-Platform Social Listening

```typescript
import {
  scrapeInstagramHashtag,
  scrapeTikTokHashtag,
  searchYouTube
} from 'actors'

// Run all platforms in parallel
const [instagramPosts, tiktokVideos, youtubeVideos] = await Promise.all([
  scrapeInstagramHashtag({ hashtag: 'ai', maxResults: 100 }),
  scrapeTikTokHashtag({ hashtag: 'ai', maxResults: 100 }),
  searchYouTube({ query: '#ai', maxResults: 100 })
])

// Combine and filter - only viral content across all platforms
const allViral = [
  ...instagramPosts.filter(p => p.likesCount > 10000),
  ...tiktokVideos.filter(v => v.playCount > 100000),
  ...youtubeVideos.filter(v => v.viewsCount > 50000)
]

console.log(`Found ${allViral.length} viral posts across 3 platforms`)
```
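Likes, plays, and views are not on the same scale, so a merged list is easier to rank after mapping each item to a common shape and scoring it relative to its own platform's threshold. A sketch reusing the field names and thresholds from the example above; `ViralItem`, `normalize`, `rankViral`, and the `url` field are assumptions, not part of the wrappers.

```typescript
type Platform = 'instagram' | 'tiktok' | 'youtube'
type ViralItem = { platform: Platform; url?: string; engagement: number }

// Per-platform thresholds, matching the filters in the example above
const THRESHOLDS: Record<Platform, number> = { instagram: 10000, tiktok: 100000, youtube: 50000 }

// Map each platform's items onto one shape (`url` is an assumed output field)
function normalize(
  instagram: { likesCount: number; url?: string }[],
  tiktok: { playCount: number; url?: string }[],
  youtube: { viewsCount: number; url?: string }[]
): ViralItem[] {
  return [
    ...instagram.map(p => ({ platform: 'instagram' as const, url: p.url, engagement: p.likesCount })),
    ...tiktok.map(v => ({ platform: 'tiktok' as const, url: v.url, engagement: v.playCount })),
    ...youtube.map(v => ({ platform: 'youtube' as const, url: v.url, engagement: v.viewsCount }))
  ]
}

// Keep items above their platform's threshold, ranked by how far they exceed it
function rankViral(items: ViralItem[]): ViralItem[] {
  const score = (i: ViralItem) => i.engagement / THRESHOLDS[i.platform]
  return items.filter(i => score(i) > 1).sort((a, b) => score(b) - score(a))
}
```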

### Pattern 2: Lead Enrichment Pipeline

```typescript
import { searchGoogleMaps, scrapeLinkedInProfile } from 'actors'

// 1. Find businesses on Google Maps
const restaurants = await searchGoogleMaps({
  query: 'restaurants in SF',
  maxResults: 100,
  scrapeContactInfo: true
})

// 2. Filter for qualified leads
const qualified = restaurants.filter(r =>
  r.rating >= 4.5 &&
  r.email &&
  r.reviewsCount >= 50
)

// 3. Enrich with LinkedIn data (if available)
const enriched = await Promise.all(
  qualified.map(async (restaurant) => {
    // Try to find LinkedIn company page
    // ... additional enrichment logic
    return restaurant
  })
)
```
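`Promise.all(qualified.map(...))` starts every enrichment at once. If each step launches an actor run, a long lead list may exceed the concurrent-run limit of your Apify plan (check your plan's limits). A sketch of a concurrency-limited map; `mapWithConcurrency` is a hypothetical helper, not an Apify API.

```typescript
// Like Promise.all(items.map(fn)), but runs at most `limit` calls at a time.
// Results keep the order of `items`.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0

  // Each worker claims the next unprocessed index until none remain.
  // No locking is needed: `next++` runs synchronously on JavaScript's single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i], i)
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

In the pipeline above this would replace the `Promise.all` call, e.g. `await mapWithConcurrency(qualified, 5, async (restaurant) => restaurant)` with the real enrichment logic inside.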

### Pattern 3: Competitive Analysis Dashboard

```typescript
import {
  scrapeInstagramProfile,
  scrapeYouTubeChannel,
  scrapeTikTokProfile
} from 'actors'

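// Mean of a list of numbers (0 for an empty list)
const average = (xs: number[]): number =>
  xs.length ? xs.reduce((sum, x) => sum + x, 0) / xs.length : 0

// Engagement rate as average likes per post divided by followers, in percent.
// One simple definition (an assumption); adjust to your own metric.
function calculateEngagement(profile: {
  followersCount?: number
  latestPosts?: { likesCount: number }[]
}): number {
  const followers = profile.followersCount ?? 0
  if (followers === 0) return 0
  return (average(profile.latestPosts?.map(p => p.likesCount) ?? []) / followers) * 100
}

async function analyzeCompetitor(username: string) {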
  // Gather data from all platforms
  const [instagram, youtube, tiktok] = await Promise.all([
    scrapeInstagramProfile({ username, maxPosts: 30 }),
    scrapeYouTubeChannel({ channelUrl: `https://youtube.com/@${username}`, maxVideos: 30 }),
    scrapeTikTokProfile({ username, maxVideos: 30 })
  ])

  // Calculate engagement metrics in code
  return {
    username,
    instagram: {
      followers: instagram.followersCount,
      avgLikes: average(instagram.latestPosts?.map(p => p.likesCount) || []),
      engagementRate: calculateEngagement(instagram)
    },
    youtube: {
      subscribers: youtube.subscribersCount,
      avgViews: average(youtube.videos?.map(v => v.viewsCount) || [])
    },
    tiktok: {
      followers: tiktok.followersCount,
      avgPlays: average(tiktok.videos?.map(v => v.playCount) || [])
    }
  }
}
```

## 💰 Token Savings Calculator

**Example: Instagram profile with 100 posts**

**MCP Approach:**
```
1. search-actors → 1,000 tokens
2. call-actor → 1,000 tokens
3. get-actor-output → 50,000 tokens (100 unfiltered posts)
TOTAL: ~52,000 tokens
```

**File-Based Approach:**
```typescript
const profile = await scrapeInstagramProfile({
  username: 'user',
  maxPosts: 100
})

// Filter in code - only top 10 posts
const top = profile.latestPosts
  ?.sort((a, b) => b.likesCount - a.likesCount)
  .slice(0, 10)

// TOTAL: ~500 tokens (only 10 filtered posts reach model)
```

**Savings: 99% reduction (52,000 → 500 tokens)**
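The percentage is (unfiltered - filtered) / unfiltered. A tiny check of the arithmetic using the token counts from the example above; `tokenSavingsPct` is a hypothetical helper.

```typescript
// Percentage of tokens saved when only the filtered output reaches the model
function tokenSavingsPct(unfilteredTokens: number, filteredTokens: number): number {
  return (1 - filteredTokens / unfilteredTokens) * 100
}

// MCP approach: search-actors + call-actor + get-actor-output
const mcpTotal = 1000 + 1000 + 50000
console.log(mcpTotal)                                  // 52000
console.log(tokenSavingsPct(mcpTotal, 500).toFixed(1)) // 99.0
```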

## 🔧 Actor Reference

### Social Media

#### Instagram
- `scrapeInstagramProfile(input)` - Profile + posts
- `scrapeInstagramPosts(input)` - Posts from user
- `scrapeInstagramHashtag(input)` - Posts by hashtag
- `scrapeInstagramComments(input)` - Comments on post

#### LinkedIn
- `scrapeLinkedInProfile(input)` - Profile + experience + email
- `searchLinkedInJobs(input)` - Job listings
- `scrapeLinkedInPosts(input)` - Posts from profile/company

#### TikTok
- `scrapeTikTokProfile(input)` - Profile + videos
- `scrapeTikTokHashtag(input)` - Videos by hashtag
- `scrapeTikTokComments(input)` - Comments on video

#### YouTube
- `scrapeYouTubeChannel(input)` - Channel + videos
- `searchYouTube(input)` - Search videos
- `scrapeYouTubeComments(input)` - Comments on video

#### Facebook
- `scrapeFacebookPosts(input)` - Posts from pages
- `scrapeFacebookGroups(input)` - Group posts
- `scrapeFacebookComments(input)` - Post comments

### Business & Lead Generation

#### Google Maps
- `searchGoogleMaps(input)` - Search places (with contact extraction!)
- `scrapeGoogleMapsPlace(input)` - Single place details
- `scrapeGoogleMapsReviews(input)` - Place reviews

### E-commerce

#### Amazon
- `scrapeAmazonProduct(input)` - Product details + reviews
- `scrapeAmazonReviews(input)` - Product reviews only

### Web Scraping

#### General Web
- `scrapeWebsite(input)` - Custom multi-page crawling
- `scrapePage(url, pageFunction)` - Single page extraction

## ⚙️ Configuration

**Environment Variables:**
```bash
# Required - Get from https://console.apify.com/account/integrations
APIFY_TOKEN=apify_api_xxxxx...
```

**Actor Run Options:**
```typescript
{
  memory: 2048,    // MB: 128, 256, 512, 1024, 2048, 4096, 8192
  timeout: 300,    // seconds
  build: 'latest'  // or specific build number
}
```
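The wrappers' internals are not shown in this document. A minimal sketch of what one could look like, assuming the official `apify-client` npm package, whose `client.actor(id).call(input, options)` waits for the run and whose `client.dataset(id).listItems()` returns `{ items }`. `runActor` and the structural `ActorClientLike` type are hypothetical, introduced so the example runs without network access; the actor ID and input in the usage comment are assumptions too.

```typescript
// Structural type covering the two apify-client calls used below;
// an ApifyClient instance from 'apify-client' should satisfy it.
interface ActorClientLike {
  actor(id: string): {
    call(input?: unknown, options?: RunOptions): Promise<{ defaultDatasetId: string }>
  }
  dataset(id: string): {
    listItems(): Promise<{ items: unknown[] }>
  }
}

// Same options as the Actor Run Options above
type RunOptions = { memory?: number; timeout?: number; build?: string }

// Hypothetical generic wrapper: start an actor, wait for it to finish,
// and return the items from its default dataset.
async function runActor<T>(
  client: ActorClientLike,
  actorId: string,
  input: Record<string, unknown>,
  options: RunOptions = { memory: 2048, timeout: 300, build: 'latest' }
): Promise<T[]> {
  const run = await client.actor(actorId).call(input, options)
  const { items } = await client.dataset(run.defaultDatasetId).listItems()
  return items as T[]
}

// Real usage (requires `npm install apify-client` and APIFY_TOKEN).
// Actor ID and input field are assumptions; check the actor's input schema.
//   import { ApifyClient } from 'apify-client'
//   const client = new ApifyClient({ token: process.env.APIFY_TOKEN })
//   const items = await runActor(client, 'apify/instagram-profile-scraper', { usernames: ['nasa'] })
```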

## 🎯 When to Use This vs MCP

**Use File-Based (this skill):**
- ✅ Need to filter large datasets (>100 results)
- ✅ Want to transform/aggregate data in code
- ✅ Multiple sequential operations
- ✅ Control flow (loops, conditionals)
- ✅ Maximum token efficiency

**Use MCP:**
- ❌ Simple single operations with small results (<10 items)
- ❌ One-off exploratory queries
- ❌ Don't want to write code

## 🔗 Links

- Apify Platform: https://apify.com
- Actor Store: https://apify.com/store
- API Docs: https://docs.apify.com/api/v2

---

**Remember: Filter data in code BEFORE returning to model context. This is where the 99% token savings happen!**

Related Skills

Utilities

from danielmiessler/Personal_AI_Infrastructure

Developer utilities and tools — CLI generation, skill scaffolding, agent delegation, system upgrades, evals, documents, parsing, audio editing, Fabric patterns, Cloudflare infrastructure, browser automation, meta-prompting, and aphorisms. USE WHEN create CLI, build CLI, command-line tool, wrap API, add command, upgrade tier, TypeScript CLI, create skill, new skill, scaffold skill, validate skill, update skill, fix skill structure, canonicalize skill, parallel execution, agent teams, delegate, workstreams, swarm, upgrade, improve system, system upgrade, check Anthropic, algorithm upgrade, mine reflections, find sources, research upgrade, PAI upgrade, eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, view results, document, process file, create document, convert format, extract text, PDF, DOCX, XLSX, PPTX, Word, Excel, spreadsheet, PowerPoint, presentation, slides, consulting report, large PDF, merge PDF, fill form, tracked changes, redlining, parse, extract, URL, transcript, entities, JSON, batch, YouTube, article, newsletter, Twitter, browser extension, collision detection, detect content type, extract article, extract newsletter, extract YouTube, extract PDF, parse content, clean audio, edit audio, remove filler words, clean podcast, remove ums, cut dead air, polish audio, transcribe, analyze audio, audio pipeline, fabric, fabric pattern, run fabric, update patterns, sync fabric, summarize, threat model pattern, Cloudflare, worker, deploy, Pages, MCP server, wrangler, DNS, KV, R2, D1, Vectorize, browser, screenshot, debug web, verify UI, troubleshoot frontend, automate browser, browse website, review stories, run stories, web automation, meta-prompting, template generation, prompt optimization, programmatic prompt, render template, validate template, prompt engineering, aphorism, quote, saying, find quote, research thinker, newsletter quotes, add aphorism, search aphorisms.

ContentAnalysis

from danielmiessler/Personal_AI_Infrastructure

Content extraction and analysis — wisdom extraction from videos, podcasts, articles, and YouTube. USE WHEN extract wisdom, content analysis, analyze content, insight report, analyze video, analyze podcast, extract insights, key takeaways, what did I miss, extract from YouTube.

WriteStory

from danielmiessler/Personal_AI_Infrastructure

Layered fiction writing system using Will Storr's storytelling science and rhetorical figures. USE WHEN write story, fiction, novel, short story, book, chapter, story bible, character arc, plot outline, creative writing, worldbuilding, narrative, mystery writing, dialogue, prose, series planning.

USMetrics

from danielmiessler/Personal_AI_Infrastructure

US economic indicators. USE WHEN GDP, inflation, unemployment, economic metrics, gas prices. SkillSearch('usmetrics') for docs.

Sales

from danielmiessler/Personal_AI_Infrastructure

Sales workflows. USE WHEN sales, proposal, pricing. SkillSearch('sales') for docs.

PAI

from danielmiessler/Personal_AI_Infrastructure

Personal AI Infrastructure core. The authoritative reference for how PAI works.

VoiceServer

from danielmiessler/Personal_AI_Infrastructure

Voice server management. USE WHEN voice server, TTS server, voice notification, prosody.

THEALGORITHM

from danielmiessler/Personal_AI_Infrastructure

Universal execution engine using scientific method to achieve ideal state. USE WHEN complex tasks, multi-step work, "run the algorithm", "use the algorithm", OR any non-trivial request that benefits from structured execution with ISC (Ideal State Criteria) tracking.

System

from danielmiessler/Personal_AI_Infrastructure

System maintenance with three core operations - integrity check (find/fix broken references), document session (current transcript), document recent (catch-up since last update). Plus security workflows. USE WHEN integrity check, audit system, document session, document this session, document today, document recent, catch up docs, what's undocumented, check for secrets, security scan, privacy check, OR asking about past work ("we just worked on", "remember when we").

CORE

from danielmiessler/Personal_AI_Infrastructure

Personal AI Infrastructure core. AUTO-LOADS at session start. The authoritative reference for how the PAI system works, how to use it, and all system-level configuration. USE WHEN any session begins, user asks about the system, identity, configuration, workflows, security, or any other question about how the PAI system operates.

thinking

from danielmiessler/Personal_AI_Infrastructure

Multi-mode analytical and creative thinking — first principles decomposition, iterative depth analysis, creative brainstorming, multi-agent council debates, adversarial red teaming, world threat modeling, and scientific hypothesis testing. USE WHEN first principles, decompose, deconstruct, reconstruct, challenge assumptions, iterative depth, multi-angle, deep exploration, be creative, brainstorm, divergent ideas, tree of thoughts, maximum creativity, technical creativity, idea generation, domain specific, council, debate, perspectives, quick consensus, red team, critique, stress test, adversarial validation, parallel analysis, devil's advocate, threat model, world model, future analysis, test idea, test investment, update models, view models, time horizon, think about, figure out, experiment, iterate, science, hypothesis, define goal, design experiment, quick diagnosis, structured investigation, full cycle.

telos

from danielmiessler/Personal_AI_Infrastructure

Life OS and project analysis. USE WHEN TELOS, life goals, projects, dependencies, books, movies. SkillSearch('telos') for docs.