gallery-scraper

Bulk download images from login-protected gallery websites using an attached browser session. Use when asked to scrape, download, or save images from authenticated gallery pages, extract full-size images from thumbnails, or batch download from multi-page galleries.

224 stars

by jdrhyne


Best use case

gallery-scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gallery-scraper should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/gallery-scraper/SKILL.md --create-dirs "https://raw.githubusercontent.com/jdrhyne/agent-skills/main/clawdbot/gallery-scraper/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/gallery-scraper/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How gallery-scraper Compares

Feature / Agent	gallery-scraper	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

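What does this skill do?

It bulk downloads images from login-protected gallery websites through an attached browser session, including extracting full-size images from thumbnails and batch downloading from multi-page galleries.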

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Gallery Scraper

Bulk download images from authenticated gallery websites via browser relay.

## Safety Boundaries

- Do not access gallery sites or user accounts that the user has not explicitly attached and authorized.
- Do not download beyond the selected gallery, profile, or page range without confirmation.
- Do not store cookies, tokens, or hidden form values in local output files.
- Do not keep retrying blocked downloads indefinitely; surface rate limits or auth failures instead.

## Prerequisites

- User must have Chrome with the OpenClaw Browser Relay extension installed
- User must be logged into the target site
- User must attach the browser tab (click relay toolbar button, badge ON)

## Workflow

### 1. Attach Browser Tab

Ask user to:
1. Log into the gallery site in Chrome
2. Navigate to the target gallery/profile page
3. Click the OpenClaw Browser Relay toolbar button (badge shows ON)

### 2. Discover Image URL Pattern

Many gallery sites store the full-size URL in a data attribute on the thumbnail element. Common selectors to try:

```javascript
// Extract via browser evaluate
() => {
  // Try common patterns
  const patterns = [
    'img[data-max]',           // data-max attribute
    'img[data-src]',           // lazy-load pattern
    'img[data-full]',          // full-size pattern
    'a[data-lightbox] img',    // lightbox galleries
    '.gallery-item img'        // generic gallery
  ];
  
  for (const sel of patterns) {
    const imgs = document.querySelectorAll(sel);
    if (imgs.length > 0) {
      return {
        selector: sel,
        count: imgs.length,
        sample: imgs[0].outerHTML.substring(0, 200)
      };
    }
  }
  return null;
}
```

### 3. Extract Full-Size URLs

Once the pattern is identified, extract all URLs:

```javascript
// For data-max pattern (common)
() => Array.from(document.querySelectorAll('img[data-max]'))
  .map(img => img.dataset.max)

// For thumbnail→full conversion (replace path segment)
() => Array.from(document.querySelectorAll('.gallery img'))
  .map(img => img.src.replace('/thumb/', '/full/'))
```
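
The list returned by the browser can contain relative paths, blanks, and duplicates. Below is a minimal Python sketch for cleaning it up before it is written to the URL list used for the bulk download. It assumes the extracted list has been copied into Python; `normalize_urls` and the sample values are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin

def normalize_urls(raw_urls, page_url):
    """Resolve relative URLs against the page, drop blanks and duplicates, keep order."""
    seen = set()
    result = []
    for raw in raw_urls:
        if not raw or not raw.strip():
            continue
        absolute = urljoin(page_url, raw.strip())
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

if __name__ == "__main__":
    # Hypothetical values standing in for the output of the browser evaluate call
    extracted = ["/full/a.jpg", "https://cdn.site.example/full/b.jpg", "", "/full/a.jpg"]
    for url in normalize_urls(extracted, "https://site.example/galleries/view/123"):
        print(url)
```

Redirecting the printed output to a file gives one URL per line, which is the shape the download loops expect.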

### 4. Handle Pagination

Check for multiple pages:

```javascript
() => {
  const pagination = document.querySelectorAll('.pagination a, [class*="page"] a');
  return Array.from(pagination).map(a => ({text: a.textContent, href: a.href}));
}
```

Navigate to each page and repeat step 3 to collect its URLs.
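
The pagination query tends to return duplicates and non-page links such as "Next". The Python sketch below reduces the `{text, href}` list to an ordered set of pages to visit. It assumes page links are labelled with their page number, which will not hold on every site; `page_urls` is a hypothetical helper.

```python
def page_urls(links, current_url):
    """Return unique page URLs in numeric order, with the current page included."""
    pages = {}
    for link in links:
        text = (link.get("text") or "").strip()
        href = link.get("href")
        # Keep only links whose label is a page number; this skips "Next", "Prev", etc.
        if href and text.isdigit():
            pages.setdefault(int(text), href)
    ordered = [href for _, href in sorted(pages.items())]
    # The current page is often rendered without a link, so add it explicitly
    if current_url not in ordered:
        ordered.insert(0, current_url)
    return ordered
```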

### 4b. Batch scrape multiple galleries (iframe trick)

When you need multiple galleries quickly and can’t automate CDP, you can load each gallery in a hidden iframe and extract `data-max` URLs:

```javascript
async () => {
  // Galleries must be same-origin with the attached tab; otherwise contentDocument is null
  const urls = [
    'https://site.example/galleries/view/123',
    'https://site.example/galleries/view/456'
  ];
  const results = [];
  for (const url of urls) {
    const id = url.split('/').pop();
    const iframe = document.createElement('iframe');
    iframe.style.position = 'fixed';
    iframe.style.left = '-9999px';
    iframe.style.width = '800px';
    iframe.style.height = '600px';
    iframe.src = url;
    document.body.appendChild(iframe);
    try {
      await new Promise((resolve, reject) => {
        const t = setTimeout(() => reject(new Error('timeout load')), 20000);
        iframe.onload = () => { clearTimeout(t); resolve(); };
      });
      const doc = iframe.contentDocument;
      if (!doc) throw new Error('cross-origin iframe, cannot read document');
      const start = Date.now();
      let imgs = [];
      // Poll for up to 20 s because images may be rendered after the load event
      while (Date.now() - start < 20000) {
        imgs = Array.from(doc.querySelectorAll('img[data-max]')).map(i => i.dataset.max);
        if (imgs.length) break;
        await new Promise(r => setTimeout(r, 500));
      }
      results.push({ id, urls: imgs });
    } catch (err) {
      // Record the failure and continue with the next gallery
      results.push({ id, urls: [], error: err.message });
    } finally {
      // Always remove the iframe, even after a timeout
      iframe.remove();
    }
  }
  return results;
}
```

### 5. Check CDN Access

Test whether the CDN requires authentication or only a Referer header:

```bash
# Test direct access
curl -I "CDN_URL" 2>/dev/null | head -3

# Test with Referer
curl -I -H "Referer: https://SITE_DOMAIN/" "CDN_URL" 2>/dev/null | head -3
```
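
One way to read the two status codes from these checks is sketched below in Python. The strategy labels and the `choose_download_strategy` name are hypothetical. The mapping assumes that a redirect (3xx) points to a login page, which is common but not guaranteed.

```python
def choose_download_strategy(status_direct, status_with_referer):
    """Map the two HEAD status codes to what the bulk download needs."""
    def ok(status):
        # Only 2xx counts as reachable; a 3xx is assumed to lead to a login page
        return 200 <= status < 300

    if 429 in (status_direct, status_with_referer):
        return "rate-limited"      # slow down before drawing any other conclusion
    if ok(status_direct):
        return "direct"            # no extra headers needed
    if ok(status_with_referer):
        return "referer"           # send the Referer header with every request
    return "needs-session"         # the CDN wants the logged-in session
```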

### 6. Bulk Download

Collect the URLs into a text file (`urls.txt`, one URL per line), then download in parallel:

```bash
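# Create output directory
mkdir -p ~/Downloads/gallery_name

# Remember where urls.txt lives before changing directory
# (assumes urls.txt is in the directory you start from)
url_list="$PWD/urls.txt"

# Download with Referer header, up to 8 at a time (wait -n requires bash 4.3+)
cd ~/Downloads/gallery_name
while IFS= read -r url; do
  [ -n "$url" ] || continue
  # Strip any query string so the filename is just the image name
  filename=$(basename "${url%%\?*}")
  # -f makes curl fail on HTTP errors instead of saving the error page
  curl -sf -H "Referer: https://SITE_DOMAIN/" -o "$filename" "$url" &
  [ "$(jobs -r | wc -l)" -ge 8 ] && wait -n
done < "$url_list"
wait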
```

**Python ThreadPool fallback (avoids shell quoting + wait -n issues):**

```python
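import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse

import requests

outdir = os.path.expanduser('~/Downloads/gallery_name')
os.makedirs(outdir, exist_ok=True)
headers = {'Referer': 'https://SITE_DOMAIN/', 'User-Agent': 'Mozilla/5.0'}

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

def download(url):
    # Name the file after the URL path, ignoring any query string
    filename = os.path.join(outdir, os.path.basename(urlparse(url).path))
    # Skip files that were already downloaded
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    with open(filename, 'wb') as out:
        out.write(r.content)

with ThreadPoolExecutor(max_workers=8) as ex:
    futures = {ex.submit(download, url): url for url in urls}
    # Report failures instead of silently dropping them
    for fut in as_completed(futures):
        if fut.exception() is not None:
            print(f'FAILED {futures[fut]}: {fut.exception()}')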
```
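
A request can succeed at the HTTP level and still save an HTML login or error page under an image filename. The sketch below checks the first bytes of each downloaded file against well-known image signatures. `looks_like_image` and `find_bad_files` are hypothetical helpers, and only JPEG, PNG, GIF, and WebP are covered.

```python
import os

def looks_like_image(data):
    """Return True if the bytes start with a JPEG, PNG, GIF, or WebP signature."""
    if data.startswith(b"\xff\xd8\xff"):                # JPEG
        return True
    if data.startswith(b"\x89PNG\r\n\x1a\n"):           # PNG
        return True
    if data.startswith((b"GIF87a", b"GIF89a")):         # GIF
        return True
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":   # WebP
        return True
    return False

def find_bad_files(directory):
    """List files in the directory whose first bytes are not an image signature."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            if not looks_like_image(f.read(12)):
                bad.append(name)
    return bad
```

Running `find_bad_files` on the output directory after a batch gives a short list of files worth re-checking against the auth and Referer notes in Troubleshooting.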

## Handling Lock Buttons

Some galleries have "lock" buttons to reveal hidden content. Look for:

```javascript
// Find lock/unlock buttons
() => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], ' +
    'button[title*="lock"], .premium-unlock'
  );
  return Array.from(locks).map(el => ({
    tag: el.tagName,
    class: el.className,
    text: el.innerText?.substring(0, 30)
  }));
}
```

Click each lock button before extracting URLs.

## Output Organization

Optionally organize by gallery:

```bash
# Derive a gallery-specific folder name from the selected URL
mkdir -p "gallery_<id>"
```
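
A sketch of deriving that folder name in Python, assuming the gallery id is the last path segment of the gallery URL (as in `galleries/view/123`); `gallery_folder` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

def gallery_folder(gallery_url):
    """Build a filesystem-safe folder name such as 'gallery_123' from a gallery URL."""
    # Use only the path so query strings and fragments do not leak into the name
    path = urlparse(gallery_url).path.rstrip("/")
    gallery_id = path.rsplit("/", 1)[-1] or "unknown"
    # Replace anything outside letters, digits, dash, and underscore
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "_", gallery_id)
    return f"gallery_{safe_id}"
```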

## Troubleshooting

- **403 Forbidden**: Add Referer header or extract cookies from browser
- **Rate limited**: Reduce parallel downloads, add delays
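- **Missing images**: Check for JavaScript-loaded (lazy-loaded) content; the page may need to be scrolled via script before all images appear
- **Login required for CDN**: Read session cookies via `document.cookie` and pass them to the download request in memory only, never to output files; `HttpOnly` cookies are not visible to page scripts, so this may not be sufficient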
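
For rate limits, a bounded retry with growing delays keeps the download polite without retrying forever, in line with the safety boundaries. This is a sketch only: it assumes a `fetch` callable that returns an HTTP status code, and `fetch`, `max_attempts`, `base_delay`, and `sleep` are hypothetical parameters.

```python
import time

def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it stops returning 429, giving up after max_attempts."""
    for attempt in range(max_attempts):
        status = fetch(url)
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff: the delay doubles after each rate-limited attempt
            sleep(base_delay * 2 ** attempt)
    # Still rate limited: surface it to the user instead of retrying indefinitely
    raise RuntimeError(f"rate limited after {max_attempts} attempts: {url}")
```

Passing `sleep` as a parameter makes the delay easy to stub out in tests; in normal use the default `time.sleep` applies.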
