apify-skill

Run web scrapers and extract data from websites and social media platforms using Apify actors. Supports Instagram, TikTok, Twitter/X, LinkedIn, Facebook, YouTube, Google Search, and general web crawling.

100 stars

bytrohitg

View on GitHub Installation ↓

Best use case

apify-skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using apify-skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/apify-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/trohitg/MachinaOS/main/server/skills/web_agent/apify-skill/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/apify-skill/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How apify-skill Compares

Feature / Agent	apify-skill	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Apify Web Scraping Skill

Run pre-built web scrapers (Actors) to extract data from websites and social media platforms.

## How It Works

This skill provides instructions for the **Apify Actor** tool node. Connect the **Apify Actor** node to Zeenie's `input-tools` handle to enable web scraping capabilities.

## apify_run_actor Tool

Run any Apify actor and retrieve scraped results.

### Schema Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| actor_id | string | Yes | Actor ID from Apify Store (e.g., `apify/instagram-scraper`) |
| input_json | string | No | Actor input as JSON string (default: `{}`) |
| max_results | integer | No | Maximum items to return (default: 100) |

### Popular Actors

| Platform | Actor ID | Use Case |
|----------|----------|----------|
| Instagram | `apify/instagram-scraper` | Profiles, posts, hashtags, comments |
| TikTok | `clockworks/tiktok-scraper` | Videos, profiles, trends, hashtags |
| Twitter/X | `apidojo/tweet-scraper` | Tweets, profiles, search results |
| LinkedIn | `curious_coder/linkedin-profile-scraper` | Profiles, companies |
| Facebook | `apify/facebook-posts-scraper` | Posts, pages, groups |
| YouTube | `apify/youtube-scraper` | Videos, channels, comments |
| Google Search | `apify/google-search-scraper` | SERP results, organic listings |
| Google Maps | `apify/google-maps-scraper` | Places, reviews, business info |
| Web Crawler | `apify/website-content-crawler` | Any website content |
| Web Scraper | `apify/web-scraper` | Custom scraping with selectors |

### Response Format

```json
{
  "run_id": "abc123xyz",
  "actor_id": "apify/instagram-scraper",
  "status": "SUCCEEDED",
  "items": [
    { "id": "123", "text": "Post content...", "likes": 500 }
  ],
  "item_count": 50,
  "dataset_id": "dataset-id-here",
  "compute_units": 0.5,
  "started_at": "2026-02-23T10:00:00Z",
  "finished_at": "2026-02-23T10:02:30Z"
}
```

### Examples

**Scrape Instagram profile:**
```json
{
  "actor_id": "apify/instagram-scraper",
  "input_json": "{\"directUrls\": [\"https://instagram.com/natgeo\"], \"resultsLimit\": 50}",
  "max_results": 50
}
```

**Search TikTok hashtag:**
```json
{
  "actor_id": "clockworks/tiktok-scraper",
  "input_json": "{\"hashtags\": [\"trending\", \"fyp\"], \"resultsPerPage\": 30}",
  "max_results": 30
}
```

**Search Twitter/X:**
```json
{
  "actor_id": "apidojo/tweet-scraper",
  "input_json": "{\"searchTerms\": [\"AI automation\"], \"maxItems\": 100}",
  "max_results": 100
}
```

**Google Search:**
```json
{
  "actor_id": "apify/google-search-scraper",
  "input_json": "{\"queries\": \"best AI tools 2026\", \"maxPagesPerQuery\": 3}",
  "max_results": 30
}
```

**Crawl website content:**
```json
{
  "actor_id": "apify/website-content-crawler",
  "input_json": "{\"startUrls\": [{\"url\": \"https://docs.example.com\"}], \"maxCrawlDepth\": 2, \"maxCrawlPages\": 50}",
  "max_results": 50
}
```

**Scrape Google Maps places:**
```json
{
  "actor_id": "apify/google-maps-scraper",
  "input_json": "{\"searchStringsArray\": [\"restaurants in San Francisco\"], \"maxCrawledPlaces\": 20}",
  "max_results": 20
}
```

## Common Input Parameters by Actor

### Instagram Scraper
| Parameter | Type | Description |
|-----------|------|-------------|
| directUrls | array | Instagram URLs to scrape |
| resultsLimit | number | Max results per URL (1-1000) |
| maxComments | number | Comments per post (0-100) |
| searchType | string | Type of search: `hashtag`, `user`, `place` |

### TikTok Scraper
| Parameter | Type | Description |
|-----------|------|-------------|
| profiles | array | TikTok usernames to scrape |
| hashtags | array | Hashtags to scrape (without #) |
| resultsPerPage | number | Results per request |

### Twitter/X Scraper
| Parameter | Type | Description |
|-----------|------|-------------|
| searchTerms | array | Search queries or hashtags |
| twitterHandles | array | Usernames (without @) |
| maxItems | number | Maximum tweets to fetch |

### Google Search Scraper
| Parameter | Type | Description |
|-----------|------|-------------|
| queries | string | Search query |
| maxPagesPerQuery | number | Pages to scrape (10 results/page) |
| languageCode | string | Language filter (e.g., `en`) |
| countryCode | string | Country filter (e.g., `us`) |

### Website Content Crawler
| Parameter | Type | Description |
|-----------|------|-------------|
| startUrls | array | URLs to crawl from (objects with `url` key) |
| maxCrawlDepth | number | Link depth (0 = start URLs only) |
| maxCrawlPages | number | Total pages limit |

## Error Responses

**Actor not found:**
```json
{
  "success": false,
  "error": "Actor 'invalid/actor-id' not found"
}
```

**Timeout:**
```json
{
  "success": false,
  "error": "Actor run timed out after 300 seconds"
}
```

**Run failed:**
```json
{
  "success": false,
  "error": "Actor run failed with status FAILED"
}
```

## Run Status Values

| Status | Meaning |
|--------|---------|
| SUCCEEDED | Run completed successfully |
| FAILED | Run encountered an error |
| ABORTED | Run was manually stopped |
| TIMED-OUT | Run exceeded timeout limit |
| RUNNING | Run still in progress |

## Use Cases

| Use Case | Actor | Description |
|----------|-------|-------------|
| Social listening | tweet-scraper | Monitor brand mentions |
| Competitor research | instagram-scraper | Analyze competitor posts |
| Lead generation | linkedin-profile-scraper | Extract business contacts |
| SEO research | google-search-scraper | Analyze SERP rankings |
| Content aggregation | website-content-crawler | Collect articles from sites |
| Price monitoring | web-scraper | Track product prices |
| Review analysis | google-maps-scraper | Gather customer reviews |

## Guidelines

1. **Actor IDs**: Use format `username/actor-name` from Apify Store
2. **Input JSON**: Must be valid JSON string with actor-specific parameters
3. **Timeouts**: Default 300 seconds, configurable on the node
4. **Max Results**: Limits items returned (not items scraped)
5. **Memory**: Higher memory = faster execution, higher cost

## Pricing Notes

Apify uses compute units (CU) based on memory and duration:
- CU = Memory (GB) x Duration (hours)
- Free tier: $5/month credits
- Check actor pricing on Apify Store before large scrapes

## Setup Requirements

1. Connect the **Apify Actor** node to Zeenie's `input-tools` handle
2. Configure your Apify API token in Credentials Modal
3. Select an actor or enter custom actor ID
4. Provide actor-specific input as JSON

## Security Notes

1. Respect website terms of service
2. Use reasonable scraping rates to avoid IP bans
3. Don't scrape personal data without consent
4. Store scraped data securely
5. Comply with GDPR and data protection laws

Related Skills

serper-search-skill

100

from trohitg/MachinaOS

Search the web using Serper API for Google-powered search results including web, news, images, and places.

proxy-config-skill

100

from trohitg/MachinaOS

Configure residential proxy providers and make proxied HTTP requests with geo-targeting.

perplexity-search-skill

100

from trohitg/MachinaOS

Search the web using Perplexity Sonar AI for synthesized answers with citations, related questions, and optional images.

http-request-skill

100

from trohitg/MachinaOS

Make HTTP requests to external APIs and web services. Supports GET, POST, PUT, DELETE, PATCH methods with headers and JSON body.

duckduckgo-search-skill

100

from trohitg/MachinaOS

Search the web using DuckDuckGo for free, privacy-focused results with no API key required.

crawlee-scraper-skill

100

from trohitg/MachinaOS

Read and extract content from any web page URL.

browser-skill

100

from trohitg/MachinaOS

Interactive browser automation - navigate, click, type, fill forms, take screenshots, get accessibility snapshots. Supports system Chrome/Edge via auto-detection.

brave-search-skill

100

from trohitg/MachinaOS

Search the web using Brave Search API for privacy-focused, independent search results with no tracking.

nearby-places-skill

100

from trohitg/MachinaOS

Search for nearby places like restaurants, cafes, stores, and services using Google Places API. Find places by type and location.

shell-skill

100

from trohitg/MachinaOS

Execute short-lived shell commands in a sandboxed environment. No PATH access -- use process_manager for npm/python/node commands.

process-manager-skill

100

from trohitg/MachinaOS

Start, stop, and manage long-running processes with full system PATH. Use for npm, python, node, dev servers, watchers, build tools. Destructive file commands blocked.

powershell-skill

100

from trohitg/MachinaOS

Windows PowerShell commands and patterns for process management, file operations, and system administration.