modly-image-to-3d

Desktop app that generates 3D models from images using local AI running entirely on your GPU

3,823 stars

byopenclaw

View on GitHub Installation ↓

Best use case

modly-image-to-3d is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Desktop app that generates 3D models from images using local AI running entirely on your GPU

Teams using modly-image-to-3d should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/modly-image-to-3d/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/adisinghstudent/modly-image-to-3d/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/modly-image-to-3d/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How modly-image-to-3d Compares

Feature / Agent	modly-image-to-3d	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Desktop app that generates 3D models from images using local AI running entirely on your GPU

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Modly Image-to-3D Skill

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Modly is a local, open-source desktop application (Windows/Linux) that converts photos into 3D mesh models using AI models running entirely on your GPU — no cloud, no API keys required.

---

## Architecture Overview

```
modly/
├── src/                    # Electron + TypeScript frontend
│   ├── main/               # Electron main process
│   ├── renderer/           # React UI (renderer process)
│   └── preload/            # IPC bridge
├── api/                    # Python FastAPI backend
│   ├── generator.py        # Core generation logic
│   └── requirements.txt
├── resources/
│   └── icons/
├── launcher.bat            # Windows quick-start
├── launcher.sh             # Linux quick-start
└── package.json
```

The app runs as an Electron shell over a local Python FastAPI server. Extensions are GitHub repos with a `manifest.json` + `generator.py` that plug into the extension system.

---

## Installation

### Quick start (no build required)

```bash
# Windows
launcher.bat

# Linux
chmod +x launcher.sh
./launcher.sh
```

### Development setup

```bash
# 1. Clone
git clone https://github.com/lightningpixel/modly
cd modly

# 2. Install JS dependencies
npm install

# 3. Set up Python backend
cd api
python -m venv .venv

# Activate (Windows)
.venv\Scripts\activate

# Activate (Linux/macOS)
source .venv/bin/activate

pip install -r requirements.txt
cd ..

# 4. Run dev mode (starts Electron + Python backend)
npm run dev
```

### Production build

```bash
# Build installers for current platform
npm run build

# Output goes to dist/
```

---

## Key npm Scripts

```bash
npm run dev        # Start app in development mode (hot reload)
npm run build      # Package app for distribution
npm run lint       # Run ESLint
npm run typecheck  # TypeScript type checking
```

---

## Extension System

Extensions are GitHub repositories containing:
- `manifest.json` — metadata and model variants
- `generator.py` — generation logic implementing the Modly extension interface

### manifest.json structure

```json
{
  "name": "My 3D Extension",
  "id": "my-extension-id",
  "description": "Generates 3D models using XYZ model",
  "version": "1.0.0",
  "author": "Your Name",
  "repository": "https://github.com/yourname/my-modly-extension",
  "variants": [
    {
      "id": "model-small",
      "name": "Small (faster)",
      "description": "Lighter variant for faster generation",
      "size_gb": 4.2,
      "vram_gb": 6,
      "files": [
        {
          "url": "https://huggingface.co/yourorg/yourmodel/resolve/main/weights.safetensors",
          "filename": "weights.safetensors",
          "sha256": "abc123..."
        }
      ]
    }
  ]
}
```

### generator.py interface

```python
# api/extensions/<extension-id>/generator.py
# Required interface every extension must implement

import sys
import json
from pathlib import Path

def generate(
    image_path: str,
    output_path: str,
    variant_id: str,
    models_dir: str,
    **kwargs
) -> dict:
    """
    Required entry point for all Modly extensions.
    
    Args:
        image_path:  Path to input image file
        output_path: Path where output .glb/.obj should be saved
        variant_id:  Which model variant to use
        models_dir:  Directory where downloaded model weights live
    
    Returns:
        dict with keys:
            success (bool)
            output_file (str) — path to generated mesh
            error (str, optional)
    """
    try:
        # Load your model weights
        weights = Path(models_dir) / variant_id / "weights.safetensors"
        
        # Run your inference
        mesh = run_inference(str(weights), image_path)
        
        # Save output
        mesh.export(output_path)
        
        return {
            "success": True,
            "output_file": output_path
        }
    except Exception as e:
        return {
            "success": False,
            "error": str(e)
        }
```

### Installing an extension (UI flow)

1. Open Modly → go to **Models** page
2. Click **Install from GitHub**
3. Paste the HTTPS URL, e.g. `https://github.com/lightningpixel/modly-hunyuan3d-mini-extension`
4. After install, click **Download** on the desired model variant
5. Select the installed model and upload an image to generate

### Official Extensions

| Extension | Model |
|-----------|-------|
| [modly-hunyuan3d-mini-extension](https://github.com/lightningpixel/modly-hunyuan3d-mini-extension) | Hunyuan3D 2 Mini |

---

## Python Backend API (FastAPI)

The backend runs locally. Key endpoints used by the Electron frontend:

```python
# Typical backend route patterns (api/main.py or similar)

# GET /extensions         — list installed extensions
# GET /extensions/{id}    — get extension details + variants
# POST /extensions/install — install extension from GitHub URL
# POST /generate          — trigger 3D generation
# GET /generate/status    — poll generation progress
# GET /models             — list downloaded model variants
# POST /models/download   — download a model variant
```

### Calling the backend from Electron (IPC pattern)

```typescript
// src/preload/index.ts — exposing backend calls to renderer
import { contextBridge, ipcRenderer } from 'electron'

contextBridge.exposeInMainWorld('modly', {
  generate: (imagePath: string, extensionId: string, variantId: string) =>
    ipcRenderer.invoke('generate', { imagePath, extensionId, variantId }),

  installExtension: (repoUrl: string) =>
    ipcRenderer.invoke('install-extension', { repoUrl }),

  listExtensions: () =>
    ipcRenderer.invoke('list-extensions'),
})
```

```typescript
// src/main/ipc-handlers.ts — main process handling
import { ipcMain } from 'electron'

ipcMain.handle('generate', async (_event, { imagePath, extensionId, variantId }) => {
  const response = await fetch('http://localhost:PORT/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ image_path: imagePath, extension_id: extensionId, variant_id: variantId }),
  })
  return response.json()
})
```

```typescript
// src/renderer/components/GenerateButton.tsx — UI usage
declare global {
  interface Window {
    modly: {
      generate: (imagePath: string, extensionId: string, variantId: string) => Promise<{ success: boolean; output_file?: string; error?: string }>
      installExtension: (repoUrl: string) => Promise<{ success: boolean }>
      listExtensions: () => Promise<Extension[]>
    }
  }
}

async function handleGenerate(imagePath: string) {
  const result = await window.modly.generate(
    imagePath,
    'modly-hunyuan3d-mini-extension',
    'hunyuan3d-mini-turbo'
  )

  if (result.success) {
    console.log('Mesh saved to:', result.output_file)
  } else {
    console.error('Generation failed:', result.error)
  }
}
```

---

## Writing a Custom Extension

### Minimal extension repository structure

```
my-modly-extension/
├── manifest.json
└── generator.py
```

### Example: wrapping a HuggingFace diffusion model

```python
# generator.py
import torch
from PIL import Image
from pathlib import Path

def generate(image_path, output_path, variant_id, models_dir, **kwargs):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    weights_dir = Path(models_dir) / variant_id

    try:
        # Load model (example pattern)
        from your_model_lib import ImageTo3DPipeline
        
        pipe = ImageTo3DPipeline.from_pretrained(
            str(weights_dir),
            torch_dtype=torch.float16
        ).to(device)

        image = Image.open(image_path).convert("RGB")
        
        with torch.no_grad():
            mesh = pipe(image).mesh

        mesh.export(output_path)

        return {"success": True, "output_file": output_path}

    except Exception as e:
        return {"success": False, "error": str(e)}
```

---

## Configuration & Environment

Modly runs fully locally — no environment variables or API keys needed. GPU/CUDA is auto-detected by PyTorch in extensions.

Relevant configuration lives in:

```
package.json          # Electron app metadata, build targets
api/requirements.txt  # Python dependencies for backend
```

If you need to configure the backend port or extension directory, check the Electron main process config (typically `src/main/index.ts`) for constants like `API_PORT` or `EXTENSIONS_DIR`.

---

## Common Patterns

### Check if CUDA is available in an extension

```python
import torch

def get_device():
    if torch.cuda.is_available():
        print(f"Using GPU: {torch.cuda.get_device_name(0)}")
        return "cuda"
    print("No GPU found, falling back to CPU (slow)")
    return "cpu"
```

### Progress reporting from generator.py

```python
import sys
import json

def report_progress(percent: int, message: str):
    """Write progress to stdout so Modly can display it."""
    print(json.dumps({"progress": percent, "message": message}), flush=True)

def generate(image_path, output_path, variant_id, models_dir, **kwargs):
    report_progress(0, "Loading model...")
    # ... load model ...
    report_progress(30, "Processing image...")
    # ... inference ...
    report_progress(90, "Exporting mesh...")
    # ... export ...
    report_progress(100, "Done")
    return {"success": True, "output_file": output_path}
```

### Adding a new page in the renderer (React)

```typescript
// src/renderer/pages/MyPage.tsx
import React, { useEffect, useState } from 'react'

interface Extension {
  id: string
  name: string
  description: string
}

export default function MyPage() {
  const [extensions, setExtensions] = useState<Extension[]>([])

  useEffect(() => {
    window.modly.listExtensions().then(setExtensions)
  }, [])

  return (
    <div>
      <h1>Installed Extensions</h1>
      {extensions.map(ext => (
        <div key={ext.id}>
          <h2>{ext.name}</h2>
          <p>{ext.description}</p>
        </div>
      ))}
    </div>
  )
}
```

---

## Troubleshooting

| Problem | Fix |
|---------|-----|
| `npm run dev` — Python backend not starting | Ensure venv is set up: `cd api && python -m venv .venv && pip install -r requirements.txt` |
| CUDA out of memory | Use a smaller model variant or close other GPU processes |
| Extension install fails | Verify the GitHub URL is HTTPS and the repo contains `manifest.json` at root |
| Generation hangs | Check that your GPU drivers and CUDA toolkit match the PyTorch version in `requirements.txt` |
| App won't launch on Linux | Make `launcher.sh` executable: `chmod +x launcher.sh` |
| Model download stalls | Check disk space; large models (4–10 GB) need adequate free space |
| `torch` not found in extension | Ensure PyTorch is in `api/requirements.txt`, not just the extension's own deps |

### Verifying GPU is detected

```bash
cd api
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU')"
```

---

## Resources

- **Homepage**: https://modly3d.app
- **Releases**: https://github.com/lightningpixel/modly/releases/latest
- **Official extension**: https://github.com/lightningpixel/modly-hunyuan3d-mini-extension
- **Discord**: https://discord.gg/FjzjRgweVk
- **License**: MIT (attribution required — credit Modly + Lightning Pixel in forks)

Related Skills

alphashop-image

3891

from openclaw/skills

AlphaShop（遨虾）图像处理 API 工具集。支持11个接口：图片翻译、图片翻译PRO、图片高清放大、图片主题抠图、图片元素识别、图片元素智能消除、图像裁剪、虚拟试衣（创建+查询）、模特换肤（创建+查询）。触发场景：图片翻译、翻译图片文字、放大图片、高清放大、抠图、去背景、检测水印/Logo/文字、消除水印、去牛皮癣、裁剪图片、虚拟试衣、AI试衣、模特换肤、换模特、AlphaShop图像、遨虾图片处理。

Image Processing & Analysis

image-gen

3891

from openclaw/skills

Generate AI images from text prompts. Triggers on: "生成图片", "画一张", "AI图", "generate image", "配图", "create picture", "draw", "visualize", "generate an image".

Content & Documentation

minimax-imagegen

3891

from openclaw/skills

Expert image generation skill using MiniMax image-01. Use this skill ANY TIME the user asks to create, generate, make, or produce an image, visual, graphic, banner, illustration, icon, screenshot mockup, hero image, thumbnail, social media asset, app icon, website visual, or any other image — even if they just say "make me a picture of X." This skill should also trigger when the user asks to improve or iterate on a previous image prompt, or when image output would enhance a task (e.g., "I need a hero image for my blog post"). Covers all use cases: website assets for tonyreviewsthings.com and tonysimons.dev, app/software media, marketing visuals, social media content, UI mockups, character/portrait generation, and general creative requests.

image-to-editable-ppt-slide

3891

from openclaw/skills

Rebuild one or more reference images as visually matching editable PowerPoint slides using native shapes, text, fills, and layout instead of a flat screenshot. Use when the user wants an image, flowchart, infographic, dashboard, process diagram, or designed slide converted into an editable PPT/PPTX deck that stays editable and closely matches the source.

openrouter-image-generation

3891

from openclaw/skills

Generate or edit images through OpenRouter's multimodal image generation endpoint (`/api/v1/chat/completions`) using OpenRouter-compatible image models. Use for text-to-image or image-to-image requests when the user wants OpenRouter, `OPENROUTER_API_KEY`, model overrides, or provider-specific `image_config` options.

save-article-with-images

3891

from openclaw/skills

Save web articles locally with images. Automatically downloads images, generates Markdown, and converts to PDF. Supports WeChat Official Account articles via subagent isolation. Triggers: save article, save this article, download article, clip article, wechat article.

blog-image-claw-skill

3891

from openclaw/skills

Generate ai blog image generator images with AI via the Neta AI image generation API (free trial at neta.art/open).

image-review

3891

from openclaw/skills

用户说评价、改进、优化图片时触发。

generate-image

3891

from openclaw/skills

用户请求画图时触发。

modelscope-image-gen

3891

from openclaw/skills

通过魔搭社区(ModelScope) API 生成图片。先使用 --list-models 查看可用模型，然后根据用户需求由 AI 生成专业的提示词，最后调用 API 生成图片。支持 Kolors、Stable Diffusion XL、FLUX 等多种文生图模型。当用户需要使用魔搭社区、ModelScope 或中文 AI 模型生成图片时使用此技能。

keevx-image-to-video

3891

from openclaw/skills

Use the Keevx API to convert images to videos. Supports multiple models (V/KL), various resolutions (720p/1080p/4K), and audio generation. Use this skill when the user needs to: (1) Convert images to video (2) Generate video with Keevx (3) Create and query image-to-video tasks (4) Batch image-to-video conversion. Keywords: image to video, Keevx, video generation.

keevx-image-generate

3891

from openclaw/skills

Use the Keevx API to generate images from prompts and reference images. Supports standard and professional modes, multiple quality levels (1K/2K/4K), various aspect ratios, and batch generation. Use this skill when the user needs to: (1) Generate images from text prompts (2) Create AI images with reference images (3) Batch image generation (4) Query image generation task status. Keywords: image generate, Keevx, AI image, text to image.