shot-scraper

Automated web scraping using shot-scraper (Playwright-based CLI) via GitHub Actions to extract structured data from websites and export to JSON/SQLite for Datasette. Use when users need to periodically scrape web data, set up automated data collection workflows, extract structured information from pages using JavaScript execution, or create data pipelines that run on schedules.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

shot-scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using shot-scraper should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/shot-scraper/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/shot-scraper/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/shot-scraper/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How shot-scraper Compares

Feature / Agent	shot-scraper	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# shot-scraper

## Quick Start

Extract data by executing JavaScript on a webpage and save as JSON:

```bash
# Install
pip install shot-scraper sqlite-utils
shot-scraper install

# Scrape data
shot-scraper javascript https://example.com/news "({
  articles: Array.from(document.querySelectorAll('article')).map(a => ({
    title: a.querySelector('h2')?.innerText,
    link: a.querySelector('a')?.href,
    date: a.querySelector('.date')?.innerText
  })),
  timestamp: new Date().toISOString()
})" -o data.json

# Import to SQLite (Datasette format)
sqlite-utils insert data.db articles data.json --pk=link --alter
```

## Core Workflow

1. **Write JavaScript extractor** - Returns JSON object or array
2. **Run via GitHub Actions** - Scheduled or on-demand
3. **Import to SQLite** - Use `sqlite-utils` for Datasette compatibility
4. **Commit results** - Track data changes over time

## JavaScript Execution

### Basic extraction
```bash
shot-scraper javascript URL "document.title"
shot-scraper javascript URL -i script.js -o output.json
```

### Return JSON objects
```javascript
// Wrap object literals in parentheses
({
  title: document.title,
  items: Array.from(document.querySelectorAll('.item')).map(i => i.innerText)
})
```

### Handle dynamic content
```javascript
new Promise(done => {
  setTimeout(() => {
    done({ data: document.querySelector('.dynamic')?.innerText });
  }, 2000);
});
```

### Common options
- `--wait MILLISECONDS` - Wait before executing
- `--timeout MILLISECONDS` - Max execution time
- `--auth FILE` - Authentication context
- `--user-agent STRING` - Custom user agent
- `--log-console` - Show console.log output

## GitHub Actions Integration

Basic scheduled scraper workflow:

```yaml
name: Scrape Website
on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 6 AM
  workflow_dispatch:

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      
      - name: Install tools
        run: |
          pip install shot-scraper sqlite-utils
          shot-scraper install
      
      - name: Scrape data
        run: |
          shot-scraper javascript https://example.com \
            -i scrape.js \
            -o data/output-$(date +%Y%m%d).json
      
      - name: Import to database
        run: |
          sqlite-utils insert data/scraper.db records \
            data/output-*.json \
            --pk=id --alter
      
      - name: Commit results
        run: |
          git config user.name "Bot"
          git config user.email "bot@example.com"
          git add data/
          git commit -m "Update $(date +%Y-%m-%d)" || exit 0
          git push
```

## Data Export Patterns

### Single record per run
```javascript
({ timestamp: new Date().toISOString(), total: document.querySelectorAll('.item').length })
```
```bash
sqlite-utils insert data.db snapshots output.json --alter
```

### Multiple records per run
```javascript
Array.from(document.querySelectorAll('.product')).map(p => ({
  id: p.dataset.id,
  name: p.querySelector('.name')?.innerText,
  price: parseFloat(p.querySelector('.price')?.innerText.replace(/[^0-9.]/g, '')),
  timestamp: new Date().toISOString()
}))
```
```bash
sqlite-utils insert data.db products output.json --pk=id --replace
```

### With nested data
```bash
# Flatten nested objects
sqlite-utils insert data.db items output.json --flatten --alter
```

## Common Patterns

**Extract tables:**
```javascript
Array.from(document.querySelectorAll('table tr')).map(row => {
  const cells = row.querySelectorAll('td');
  return { col1: cells[0]?.innerText, col2: cells[1]?.innerText };
})
```

**Parse prices/numbers:**
```javascript
parseFloat(text.replace(/[^0-9.]/g, ''))
```

**Wait for element:**
```javascript
new Promise(done => {
  const check = setInterval(() => {
    const el = document.querySelector('.target');
    if (el) { clearInterval(check); done({ data: el.innerText }); }
  }, 100);
  setTimeout(() => { clearInterval(check); done({ error: 'timeout' }); }, 10000);
});
```

## sqlite-utils Commands

```bash
# Insert with primary key
sqlite-utils insert db.db table data.json --pk=id

# Replace existing records
sqlite-utils insert db.db table data.json --pk=id --replace

# Auto-alter schema to fit new data
sqlite-utils insert db.db table data.json --alter

# Create indexes for Datasette
sqlite-utils create-index db.db table column_name

# Query data
sqlite-utils query db.db "SELECT * FROM table" --csv
```

## Advanced Topics

**Authentication:** Store auth.json as GitHub secret, write to file in workflow:
```yaml
env:
  AUTH_JSON: ${{ secrets.AUTH_JSON }}
run: |
  echo "$AUTH_JSON" > auth.json
  shot-scraper javascript URL -a auth.json -i script.js -o data.json
```

**Error handling:** Add retry logic to workflow (see [workflows.md](references/workflows.md))

**Multiple pages:** Process URL list (see [examples.md](references/examples.md))

**Screenshots:** Take alongside data for verification:
```bash
shot-scraper URL -o screenshot.png
shot-scraper javascript URL -i script.js -o data.json
```

## References

- **[examples.md](references/examples.md)** - Complete workflow examples (price tracking, monitoring, error handling)
- **[scraping-patterns.md](references/scraping-patterns.md)** - JavaScript patterns for common scraping scenarios
- **[workflows.md](references/workflows.md)** - Advanced GitHub Actions patterns

## External Resources

- shot-scraper: https://github.com/simonw/shot-scraper
- sqlite-utils: https://sqlite-utils.datasette.io/
- Datasette: https://datasette.io/

Related Skills

screenshots

from diegosouzapw/awesome-omni-skill

Generate marketing screenshots of your app using Playwright. Use when the user wants to create screenshots for Product Hunt, social media, landing pages, or documentation.

screenshot

from diegosouzapw/awesome-omni-skill

Capture screenshots of application windows

firecrawl-scraper

from diegosouzapw/awesome-omni-skill

Extract property data from real estate listing URLs using Firecrawl AI. Use when scraping Zillow, Redfin, Realtor.com, or any property listing site. Returns structured data ready for video generation.

update-screenshots

from diegosouzapw/awesome-omni-skill

Download screenshot baselines from the latest CI run and commit them. Use when asked to update, accept, or refresh component screenshot baselines from CI, or after the screenshot-test GitHub Action reports differences. This skill should be run as a subagent.

alma-scraper

from diegosouzapw/awesome-omni-skill

Intelligent scraper for Australian youth justice sources. Discovers, extracts, and learns from government, Indigenous, research, and media sources.

android-screenshot-automation

from diegosouzapw/awesome-omni-skill

Setup automated screenshot capture for Play Store using Fastlane Screengrab

x-twitter-scraper

from diegosouzapw/awesome-omni-skill

X API & Twitter scraper skill for AI coding agents. Builds integrations with the Xquik REST API, MCP server & webhooks: tweet search, user lookup, follower extraction, engagement metrics, giveaway contest draws, trending topics, account monitoring, reply/retweet/quote extraction, community & Space data, mutual follow checks. Works with Claude Code, Cursor, Codex, Copilot, Windsurf & 40+ agents.

apify-ultimate-scraper

from diegosouzapw/awesome-omni-skill

Universal AI-powered web scraper for any platform. Scrape data from Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, Google Trends, Booking.com, and TripAdvisor. Use for lead gener...

agentuity-cli-cloud-sandbox-snapshot-tag

from diegosouzapw/awesome-omni-skill

Add or update a tag on a snapshot. Requires authentication. Use for Agentuity cloud platform operations

agentuity-cli-cloud-sandbox-snapshot-get

from diegosouzapw/awesome-omni-skill

Get snapshot details. Requires authentication. Use for Agentuity cloud platform operations

agentuity-cli-cloud-sandbox-snapshot-create

from diegosouzapw/awesome-omni-skill

Create a snapshot from a sandbox. Requires authentication. Use for Agentuity cloud platform operations

agentuity-cli-cloud-sandbox-snapshot-delete

from diegosouzapw/awesome-omni-skill

Delete a snapshot. Requires authentication. Use for Agentuity cloud platform operations