image-scraper

Scrape and download all images from a given URL. Takes a URL, extracts image URLs from the page, and downloads them. Uses python3/curl as primary method, falls back to browser automation if needed. Use when user provides a URL and wants to download images from that page.

33 stars

Best use case

image-scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using image-scraper should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/image-scraper/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/image-scraper/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/image-scraper/SKILL.md inside your project (see the Python sketch after these steps)
  3. Restart your AI agent — it will auto-discover the skill
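
If curl is not available, the same two steps can be done with a short Python sketch (an illustration, assuming a project-level install and the raw GitHub URL from the command above):

```python
# Minimal sketch of steps 1-2: fetch SKILL.md and place it in the project's
# .claude/skills directory. URL and path mirror the instructions above.
import os
import urllib.request

dest = ".claude/skills/image-scraper/SKILL.md"
os.makedirs(os.path.dirname(dest), exist_ok=True)
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/image-scraper/SKILL.md",
    dest,
)
print(f"Downloaded skill to {dest}")
```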

How image-scraper Compares

| Feature / Agent         | image-scraper | Standard Approach |
|-------------------------|---------------|-------------------|
| Platform Support        | Not specified | Limited / Varies  |
| Context Awareness       | High          | Baseline          |
| Installation Complexity | Unknown       | N/A               |

Frequently Asked Questions

What does this skill do?

It takes a URL, extracts the image URLs from that page, and downloads them locally. python3/curl is the primary method, with browser automation as a fallback for JavaScript-rendered pages.

Where can I find the source code?

The source code lives in the aAAaqwq/AGI-Super-Team repository on GitHub, the same repository referenced by the installation command above.

SKILL.md Source

# Image Scraper

Scrape all images from a given URL and download them locally.

## Method 1: Python3 (Primary - Zero Dependency)

```python
#!/usr/bin/env python3
"""Download all images from a URL."""
import sys
import os
import urllib.request
from urllib.parse import urljoin
from html.parser import HTMLParser

class ImageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []
    def handle_starttag(self, tag, attrs):
        # Collect candidates from <img src> and <source src/srcset> tags
        if tag not in ('img', 'source'):
            return
        for attr, val in attrs:
            if attr in ('src', 'srcset') and val:
                # srcset may list several candidates; keep the first URL
                candidate = val.split(',')[0].split()
                if candidate:
                    self.images.append(candidate[0])

def scrape_images(url, output_dir="images"):
    os.makedirs(output_dir, exist_ok=True)
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=15) as resp:
        html = resp.read().decode('utf-8', errors='ignore')
    parser = ImageParser()
    parser.feed(html)
    # Resolve relative/protocol-relative URLs against the page, then deduplicate
    seen = set()
    urls = []
    for img in parser.images:
        img = urljoin(url, img)
        if img.startswith('http') and img not in seen:
            seen.add(img)
            urls.append(img)
    print(f"Found {len(urls)} images")
    for i, img_url in enumerate(urls):
        try:
            ext = os.path.splitext(img_url.split('?')[0])[1] or '.jpg'
            fname = f"{output_dir}/img_{i:03d}{ext}"
            urllib.request.urlretrieve(img_url, fname)
            print(f"  [{i+1}] {fname}")
        except Exception as e:
            print(f"  [{i+1}] FAILED: {e}")
    return urls

if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else input("URL: ")
    scrape_images(url)
```

**Usage:**
```bash
python3 /path/to/image-scraper.py "https://example.com/article"
```

## Method 2: Curl + Grep (Minimal)

```bash
# Extract image URLs and download (grep -P requires GNU grep)
mkdir -p images
curl -sL "URL" | grep -oP 'https?://[^"]+\.(jpg|jpeg|png|webp|gif)' | sort -u | head -20 | while read -r url; do
  curl -sL "$url" -o "images/$(echo "$url" | md5sum | cut -d' ' -f1).${url##*.}"
done
```

## Method 3: Browser Automation (Fallback)

Use OpenClaw's browser tool when the page is JavaScript-rendered or Method 1 fails.

```bash
# 1. Open page in browser
browser(action=open, url="URL")

# 2. Get page content and extract images via JavaScript
browser(action=act, targetId="TAB_ID", request={
  "kind": "evaluate",
  "fn": "() => Array.from(document.querySelectorAll('img')).map(img => img.src)"
})

# 3. Download each image with curl
```
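
Step 3 above is left as a note; one option (an assumption, not prescribed by the skill) is to reuse Method 1's stdlib downloader on the URL list returned by the evaluate call instead of curl:

```python
# Minimal sketch: download a list of image URLs gathered from the browser's
# evaluate call, reusing Method 1's naming scheme. `urls` is assumed to be
# the array of src values returned by the evaluate step.
import os
import urllib.request

def download_images(urls, output_dir="images"):
    os.makedirs(output_dir, exist_ok=True)
    for i, img_url in enumerate(urls):
        ext = os.path.splitext(img_url.split('?')[0])[1] or '.jpg'
        fname = f"{output_dir}/img_{i:03d}{ext}"
        urllib.request.urlretrieve(img_url, fname)
        print(f"  [{i+1}] {fname}")
```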

## Decision Flow

1. **Try Method 1** (python3) first — handles most static pages
2. **If 403/blocked**: Try adding headers (`Referer`, `Accept`); see the sketch after this list
3. **If JS-rendered or paywalled**: Use Method 3 (browser)
4. **Always** print the downloaded file paths
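
For step 2, a minimal sketch of the retry with extra headers, reusing the stdlib approach from Method 1 (the exact header values are assumptions; many hotlink-protected sites expect a same-origin `Referer`):

```python
# Sketch for step 2: retry the page fetch with Referer/Accept headers when a
# plain request is answered with 403.
import urllib.request

def fetch_html(url):
    req = urllib.request.Request(url, headers={
        'User-Agent': 'Mozilla/5.0',
        'Referer': url,
        'Accept': 'text/html,application/xhtml+xml,image/*;q=0.9,*/*;q=0.8',
    })
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.read().decode('utf-8', errors='ignore')
```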

## Output

- Images saved to `./images/` by default
- Named `img_000.jpg`, `img_001.png`, etc.
- Report: "Downloaded N images to images/"

## Notes

- Only downloads images from the given URL, not the full site
- Tracking pixels and tiny icons (width/height < 50px) can optionally be filtered out; the main script does not do this (see the sketch after this list)
- robots.txt is not checked or enforced
- For Twitter/X: the browser method may be needed due to JS rendering
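
A possible shape for that optional tiny-icon filter is sketched below; it is illustrative only and relies on the page declaring `width`/`height` attributes:

```python
# Hypothetical helper: skip <img> tags whose declared width/height attributes
# fall below a threshold. Images without usable size attributes are kept.
from html.parser import HTMLParser

class SizedImageParser(HTMLParser):
    def __init__(self, min_size=50):
        super().__init__()
        self.min_size = min_size
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        a = dict(attrs)
        if not a.get('src'):
            return
        try:
            w = int(a.get('width', self.min_size))
            h = int(a.get('height', self.min_size))
        except (TypeError, ValueError):
            # Non-numeric values like "100%": keep the image
            w = h = self.min_size
        if w >= self.min_size and h >= self.min_size:
            self.images.append(a['src'])
```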

Related Skills

telegram-scraper-run

33
from aAAaqwq/AGI-Super-Team

Automatic Telegram scraping

relay-image-gen

33
from aAAaqwq/AGI-Super-Team

Multi-provider image generation with automatic priority fallback

image-enhancer

33
from aAAaqwq/AGI-Super-Team

Improves the quality of images, especially screenshots, by enhancing resolution, sharpness, and clarity. Perfect for preparing images for presentations, documentation, or social media posts.

baoyu-xhs-images

33
from aAAaqwq/AGI-Super-Team

Generates Xiaohongshu (Little Red Book) infographic series with 10 visual styles and 8 layouts. Breaks content into 1-10 cartoon-style images optimized for XHS engagement. Use when user mentions "小红书图片", "XHS images", "RedNote infographics", "小红书种草", or wants social media infographics for Chinese platforms.

apify-ultimate-scraper

33
from aAAaqwq/AGI-Super-Team

Universal AI-powered web scraper for any platform. Scrape data from Instagram, Facebook, TikTok, YouTube, LinkedIn, X/Twitter, Google Maps, Google Search, Google Trends, Reddit, Airbnb, Yelp, and 15+ more platforms. Use for lead generation, brand monitoring, competitor analysis, influencer discovery, trend research, content analytics, audience analysis, review analysis, SEO intelligence, recruitment, or any data extraction task.

zimage-skill

33
from aAAaqwq/AGI-Super-Team

Generate images using ModelScope Z-Image-Turbo API. Use when user asks to generate, create, or make images, pictures, or illustrations.

wemp-operator

33
from aAAaqwq/AGI-Super-Team

> Full-featured WeChat Official Account (微信公众号) operations: an API wrapper for drafts, publishing, comments, users, media assets, broadcast messaging, statistics, menus, and QR codes


zsxq-smart-publish

33
from aAAaqwq/AGI-Super-Team

Publish and manage content on 知识星球 (zsxq.com). Supports talk posts, Q&A, long articles, file sharing, digest/bookmark, homework tasks, and tag management. Use when publishing content to 知识星球, creating/editing posts, uploading files/images/audio, managing digests, batch publishing, or formatting content for 知识星球.

zoom-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.

zoho-crm-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.

ziliu-publisher

33
from aAAaqwq/AGI-Super-Team

Ziliu (字流): an AI-driven multi-platform content distribution tool. Write once, auto-adapt the layout, and publish with one click to 16+ platforms (WeChat Official Accounts, Zhihu, Xiaohongshu, Bilibili, Douyin, Weibo, X, and more). Use when the user needs multi-platform publishing, content layout, or format adaptation. Trigger words: 字流, ziliu, 多平台发布 (multi-platform publishing), 一键分发 (one-click distribution), 内容分发 (content distribution), 排版发布 (formatted publishing).

zhihu-post-skill

33
from aAAaqwq/AGI-Super-Team

> Zhihu article publishing: automated content creation and publishing for the Zhihu platform