px-asset-extract

Extracts individual assets like text, illustrations, and icons from images (slides, posters, infographics) as transparent PNGs with a JSON manifest, using pure classical computer vision.

6 stars
Complexity: medium

About this skill

This skill, `px-asset-extract`, decomposes complex visual documents such as slides, posters, infographics, and diagrams into their constituent assets. It runs a pipeline of classical computer vision techniques (PIL + numpy): background detection, foreground mask generation, connected component analysis, heuristic classification, and anti-aliased alpha extraction. Each detected element is segmented, classified into one of ten categories (text, illustration, icon, graphic, line, dot, diagram, diagram_network, shadow, or element), and cropped as a transparent PNG.

The skill is useful wherever visual components must be isolated and classified for further use: design repurposing, content analysis, or building datasets from visual documents. It runs efficiently on CPU, typically completing in 2-6 seconds, and can restrict extraction to certain asset types or exclude others. It can also be paired with a visual grounding model by passing pre-computed regions, enabling targeted extraction driven by descriptive queries. Because it uses no large ML models, it is fast and simple to deploy. Alongside the PNGs, it produces a JSON manifest describing each extracted asset, which supports automated workflows and data management.

Best use case

The primary use case for `px-asset-extract` is to automatically segment and extract various elements (like text, illustrations, icons, diagrams) from visual documents such as slides, posters, infographics, or diagrams. Designers, researchers, and developers working with visual content can benefit most by obtaining clean, transparent PNGs of each individual element, making it easier to repurpose, analyze, or integrate these components into other projects or systems.

The user should expect a collection of individual transparent PNG image files, each containing a segmented asset from the original image, along with a JSON manifest detailing the classification and properties of each extracted asset.

Practical example

Example input

extract assets from this image

Example output

A compressed archive containing `asset_1_text.png`, `asset_2_illustration.png`, `asset_3_icon.png`, and a `manifest.json` file listing details for all extracted elements.

When to use this skill

  • To extract all individual elements from a slide, poster, or infographic.
  • To obtain only specific types of assets, such as illustrations or icons, while skipping text or other elements.
  • When you need to decompose images with multiple visual components into individual transparent PNGs.
  • When a user has an image and explicitly requests individual transparent PNGs of each element or asset.

When not to use this skill

  • For general background removal from a single, dominant subject photo (e.g., a product image).
  • To extract specific objects solely based on a natural language description, without an external visual grounding model or pre-computed regions.

How px-asset-extract Compares

| Feature / Agent | px-asset-extract | Standard Approach |
|-----------------|------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |

Frequently Asked Questions

What does this skill do?

Extracts individual assets like text, illustrations, and icons from images (slides, posters, infographics) as transparent PNGs with a JSON manifest, using pure classical computer vision.

How difficult is it to install?

The installation complexity is rated as medium. Installation instructions are in the SKILL.md source below.

Where can I find the source code?

The source code is hosted on GitHub at `https://github.com/JadeLiu-tech/px-asset-extract` (the repository cloned in the Installation section below).

SKILL.md Source

# px-asset-extract: Image Asset Extraction

## What It Does

Decomposes images into individual transparent PNG assets with classification and a JSON manifest.
The full pipeline runs in 2-6 seconds on CPU with zero ML models:

1. **Background detection** — median color from image borders
2. **Foreground mask** — Euclidean color distance thresholding
3. **Character bridging** — dilation connects letters into words
4. **Connected components** — union-find with 8-connectivity
5. **Classification** — heuristic typing into 10 categories
6. **Text-line merging** — groups word fragments into text lines
7. **Alpha extraction** — anti-aliased transparent cropping
8. **Deduplication** — removes overlapping and oversized segments

## When to Use This

| Scenario | Use px-asset-extract? |
|----------|----------------------|
| Extract all elements from a slide/poster | Yes — this is the primary use case |
| Get only illustrations, skip text | Yes — use `--types illustration` or `--exclude-types text` |
| Extract specific objects by description | Use with `--regions` + a grounding model (e.g., Florence-2) |
| Remove background from a single photo | No — use a background removal model instead |
| Segment a photo scene | No — use SAM/FastSAM for photographic content |
| Image has textured/photographic background | Limited — works best on clean/solid backgrounds |

## Installation

```bash
git clone https://github.com/JadeLiu-tech/px-asset-extract.git
cd px-asset-extract
pip install .
```

## Usage

### CLI

```bash
# Basic extraction
px-extract <image> -o <output_dir>

# Only extract illustrations and icons
px-extract <image> -o <output_dir> --types illustration,icon

# Extract everything except text and dots
px-extract <image> -o <output_dir> --exclude-types text,dot,line

# Extract from pre-computed bounding boxes (e.g. from px-ground)
px-extract <image> -o <output_dir> --regions regions.json

# Segment only — output JSON, no PNGs
px-extract <image> --segments-only

# Batch processing
px-extract images/*.png -o output/ --batch

# JSON output to stdout
px-extract <image> -o <output_dir> --json --quiet
```

### Python API

```python
from px_asset_extract import extract_assets, load_regions

# Full extraction
result = extract_assets("slide.png", output_dir="assets/")
for asset in result.assets:
    print(f"{asset.id}: {asset.label} at ({asset.bbox.x}, {asset.bbox.y}) -> {asset.file_path}")

# Type filtering
result = extract_assets("slide.png", output_dir="icons/", types=["illustration", "icon"])
result = extract_assets("slide.png", output_dir="graphics/", exclude_types=["text", "line", "dot"])

# Pre-computed regions (from grounding model output)
regions = load_regions("grounded.json")
result = extract_assets("slide.png", output_dir="targeted/", regions=regions)

# Combine regions + type filter
result = extract_assets("slide.png", output_dir="charts/", regions=regions, types=["chart"])
```

## CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `-o`, `--output` | `assets` | Output directory |
| `--bg-threshold` | `22.0` | Background color distance (lower = more sensitive) |
| `--min-area` | `60` | Minimum segment area in pixels |
| `--dilation` | `2` | Character gap bridging passes |
| `--padding` | `10` | Extra pixels around each asset |
| `--max-coverage` | `0.5` | Max fraction of image a segment can cover |
| `--types` | | Only extract these types (comma-separated) |
| `--exclude-types` | | Skip these types (comma-separated) |
| `--regions` | | JSON file with bounding boxes (skips segmentation) |
| `--segments-only` | | Output segment JSON without extracting PNGs |
| `--no-visualization` | | Skip visualization image |
| `--batch` | | Create subdirectories per image |
| `--json` | | Output results as JSON to stdout |
| `--quiet` | | Suppress progress messages |

## Output

Each run produces:
- `asset_NNN_<type>.png` — individual transparent PNGs
- `manifest.json` — positions, types, and metadata for all assets
- `visualization.png` — input image with color-coded bounding boxes

### Manifest format

```json
{
  "source_image": "slide.png",
  "source_size": {"width": 1920, "height": 1080},
  "background_color": [255, 255, 255],
  "num_assets": 44,
  "assets": [
    {
      "id": "asset_000_illustration",
      "label": "illustration",
      "file": "asset_000_illustration.png",
      "position": {"x": 100, "y": 50, "width": 400, "height": 300},
      "pixel_area": 120000
    }
  ]
}
```
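
Because the manifest is plain JSON, downstream tooling can consume it with the stdlib alone. A sketch (`load_manifest` is an illustrative helper, not part of the package):

```python
import json

def load_manifest(path, types=None):
    """Read a manifest.json and optionally keep only assets of the given labels."""
    with open(path) as f:
        manifest = json.load(f)
    assets = manifest["assets"]
    if types is not None:
        assets = [a for a in assets if a["label"] in types]
    return assets
```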

### Regions JSON format (for --regions)

```json
[
  {"x": 100, "y": 50, "width": 400, "height": 300, "label": "chart"},
  {"x1": 600, "y1": 100, "x2": 800, "y2": 300, "label": "logo"}
]
```

Also supports `{"regions": [...]}` wrapper. Label defaults to `"region"` if omitted.
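
Since both box formats are accepted, any loader has to normalize them to a single representation. A sketch of that normalization (illustrative, not the package's `load_regions` itself):

```python
def normalize_region(r: dict) -> dict:
    """Convert {x1, y1, x2, y2} or {x, y, width, height} boxes to x/y/width/height."""
    if "x1" in r:
        out = {"x": r["x1"], "y": r["y1"],
               "width": r["x2"] - r["x1"], "height": r["y2"] - r["y1"]}
    else:
        out = {k: r[k] for k in ("x", "y", "width", "height")}
    out["label"] = r.get("label", "region")  # default label per the note above
    return out
```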

## Asset Types

| Type | Detection Logic |
|------|----------------|
| `text` | dark_ratio > 0.4, uniform ink color |
| `illustration` | Large (>1% image area), colorful |
| `icon` | Small (<3000px area, <60px max dimension) |
| `graphic` | Medium-sized, colored |
| `line` | Thin (min dimension <=5px, extreme aspect ratio) |
| `dot` | Very small (<150px area, <20px dimension) |
| `diagram` | Low fill ratio (<0.25) |
| `diagram_network` | Spans >80% of image, very low fill |
| `shadow` | Bright (>200), low contrast, low saturation |
| `element` | Catch-all for unclassified objects |
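
The thresholds in the table suggest a decision cascade along these lines (the ordering and function name are illustrative assumptions; the package's actual rules may differ):

```python
def classify(area: int, w: int, h: int, image_area: int,
             dark_ratio: float, fill_ratio: float) -> str:
    """Heuristic typing using the thresholds from the table above."""
    if min(w, h) <= 5 and max(w, h) / max(min(w, h), 1) > 10:
        return "line"          # thin with an extreme aspect ratio
    if area < 150 and max(w, h) < 20:
        return "dot"           # very small
    if dark_ratio > 0.4:
        return "text"          # mostly dark ink
    if area < 3000 and max(w, h) < 60:
        return "icon"          # small
    if area > 0.01 * image_area:
        return "illustration"  # large, typically colorful
    if fill_ratio < 0.25:
        return "diagram"       # sparse fill
    return "element"           # catch-all
```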

## Performance

| Image type | Assets | Time |
|-----------|--------|------|
| Presentation slide | 22-44 | 2-6s |
| Poster | 11 | 3.9s |
| Scientific diagram | 43 | 4.2s |
| Technical diagram | 42 | 4.5s |
| Data chart | 26 | 4.8s |

## Dependencies

Only `Pillow` and `numpy`. Optional `opencv-python` for better alpha edges.
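
An optional dependency like this fits the usual guarded-import pattern (a sketch; `smooth_alpha` is an illustrative name, not the package's API):

```python
try:
    import cv2
    HAS_CV2 = True
except ImportError:
    HAS_CV2 = False

def smooth_alpha(alpha):
    """Feather alpha-channel edges with OpenCV when available; pass through otherwise."""
    if HAS_CV2:
        return cv2.GaussianBlur(alpha, (3, 3), 0)
    return alpha
```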

Related Skills

alphashop-image

3891
from openclaw/skills

AlphaShop (遨虾) image-processing API toolkit. Supports 11 endpoints: image translation, image translation PRO, HD image upscaling, subject cutout, image element detection, smart element removal, image cropping, virtual try-on (create + query), and model skin swap (create + query). Trigger scenarios: translate an image, translate image text, upscale an image, HD upscale, cutout, background removal, detect watermarks/logos/text, remove watermarks, remove promotional clutter, crop an image, virtual try-on, AI try-on, model skin swap, swap the model, AlphaShop images, Aoxia image processing.

Image Processing & Analysis

keyword-extractor

31392
from sickn33/antigravity-awesome-skills

Extracts up to 50 highly relevant SEO keywords from text. Use when user wants to generate or extract keywords for given text.

Text Analysis · Claude

bdistill-knowledge-extraction

31392
from sickn33/antigravity-awesome-skills

Extract structured domain knowledge from AI models in-session or from local open-source models via Ollama. No API key needed.

AI Research & Knowledge Management · Claude · Cursor · Codex

security-requirement-extraction

31392
from sickn33/antigravity-awesome-skills

Derive security requirements from threat models and business context. Use when translating threats into actionable requirements, creating security user stories, or building security test cases.

java-refactoring-extract-method

28865
from github/awesome-copilot

Refactoring using Extract Methods in Java Language

screenshot-feature-extractor

24269
from davila7/claude-code-templates

Analyze product screenshots to extract feature lists and generate development task checklists. Use when: (1) Analyzing competitor product screenshots for feature extraction, (2) Generating PRD/task lists from UI designs, (3) Batch analyzing multiple app screens, (4) Conducting competitive analysis from visual references.

competitive-ads-extractor

24269
from davila7/claude-code-templates

Extracts and analyzes competitors' ads from ad libraries (Facebook, LinkedIn, etc.) to understand what messaging, problems, and creative approaches are working. Helps inspire and improve your own ad campaigns.

extract

14855
from pbakaus/impeccable

Extract and consolidate reusable components, design tokens, and patterns into your design system. Identifies opportunities for systematic reuse and enriches your component library. Use when the user asks to create components, refactor repeated UI patterns, build a design system, or extract tokens.

ExtractWisdom

11146
from danielmiessler/Personal_AI_Infrastructure

Content-adaptive wisdom extraction — detects what domains exist in content and builds custom sections (not static IDEAS/QUOTES). Produces tailored insight reports from videos, podcasts, articles. USE WHEN extract wisdom, analyze video, analyze podcast, extract insights, what's interesting, extract from YouTube, what did I miss, key takeaways.

create-an-asset

10671
from anthropics/knowledge-work-plugins

Generate tailored sales assets (landing pages, decks, one-pagers, workflow demos) from your deal context. Describe your prospect, audience, and goal — get a polished, branded asset ready to share with customers.

data-context-extractor

10671
from anthropics/knowledge-work-plugins

Generate or improve a company-specific data analysis skill by extracting tribal knowledge from analysts. BOOTSTRAP MODE - Triggers: "Create a data context skill", "Set up data analysis for our warehouse", "Help me create a skill for our database", "Generate a data skill for [company]" → Discovers schemas, asks key questions, generates initial skill with reference files ITERATION MODE - Triggers: "Add context about [domain]", "The skill needs more info about [topic]", "Update the data skill with [metrics/tables/terminology]", "Improve the [domain] reference" → Loads existing skill, asks targeted questions, appends/updates reference files Use when data analysts want Claude to understand their company's specific data warehouse, terminology, metrics definitions, and common query patterns.

extract

9947
from alirezarezvani/claude-skills

Turn a proven pattern or debugging solution into a standalone reusable skill with SKILL.md, reference docs, and examples.