vision-llm skill

Patterns and reference for using vision LLMs to convert screenshots to code within the screenshot-to-code system.

by heldernoid

Best use case

The vision-llm skill is best used when you need a repeatable AI agent workflow rather than a one-off prompt.

Teams using the vision-llm skill can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vision-llm/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/automation-productivity/screenshot-to-code/skills/vision-llm/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/vision-llm/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

Where can I find the source code?

The source is in the heldernoid/agentic-build-templates repository on GitHub, at projects/automation-productivity/screenshot-to-code/skills/vision-llm/SKILL.md.

SKILL.md Source

# vision-llm skill

Patterns and reference for using vision LLMs to convert screenshots to code within the screenshot-to-code system.

## Supported models

| Model | Provider | Notes |
|---|---|---|
| `gpt-4o` | OpenAI | Recommended. Best quality, fast. |
| `gpt-4-turbo` | OpenAI | Slower and more expensive per token than `gpt-4o`. |
| `claude-3-5-sonnet-20241022` | Anthropic | Excellent for complex UIs. |
| `claude-3-opus-20240229` | Anthropic | Highest quality, slowest. |
| Any vision-capable model | Custom | Must accept OpenAI-compatible `/chat/completions`. |

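The system uses the OpenAI `/chat/completions` request format. Anthropic models are reached either through an OpenAI-compatible proxy or through `@anthropic-ai/sdk`. The Anthropic SDK does not accept the same message structure: the system prompt moves to a top-level `system` parameter, and each `image_url` part becomes an `image` block with a base64 `source`.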
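If you call Anthropic directly through `@anthropic-ai/sdk`, the OpenAI-style messages have to be reshaped first. The sketch below shows one way to do it. `toAnthropicRequest` and the minimal types are hypothetical names, not part of the project; the helper only handles the `[systemMessage, visionMessage]` shape described in this document with a base64 `data:` URL, and the default `maxTokens` of 4096 is an arbitrary assumption.

```typescript
// Minimal OpenAI-style shapes used by this project (see "Message structure").
type OpenAIPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: string } };

type OpenAIMessage =
  | { role: "system"; content: string }
  | { role: "user"; content: OpenAIPart[] };

// Subset of Anthropic Messages API content blocks.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } };

interface AnthropicRequest {
  model: string;
  max_tokens: number;
  system?: string;
  messages: { role: "user"; content: AnthropicBlock[] }[];
}

// Hypothetical helper: converts [systemMessage, visionMessage] into
// parameters for anthropic.messages.create().
export function toAnthropicRequest(
  model: string,
  messages: OpenAIMessage[],
  maxTokens = 4096 // assumption: large enough for full-page code
): AnthropicRequest {
  let system: string | undefined;
  const out: AnthropicRequest["messages"] = [];

  for (const msg of messages) {
    if (msg.role === "system") {
      // Anthropic takes the system prompt as a top-level field, not a message.
      system = msg.content;
      continue;
    }
    const content = msg.content.map((part): AnthropicBlock => {
      if (part.type === "text") return { type: "text", text: part.text };
      // Split "data:image/jpeg;base64,<data>" into media type and payload.
      const m = /^data:([^;,]+);base64,([\s\S]*)$/.exec(part.image_url.url);
      if (!m) throw new Error("Only base64 data: URLs are supported");
      return { type: "image", source: { type: "base64", media_type: m[1], data: m[2] } };
    });
    out.push({ role: "user", content });
  }

  return { model, max_tokens: maxTokens, system, messages: out };
}
```

The `detail` hint is dropped during conversion because the Anthropic API has no equivalent setting.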

## System prompt

The default system prompt is stored in settings and editable by the user. The production default:

```
You are an expert frontend developer. Convert the provided UI screenshot into clean, well-structured code. Output only the code block, no explanations or markdown fences around the block (the block itself is fine). Produce idiomatic code for the target framework.
```

For each framework the system prompt is augmented with:

- **html-css**: "Output a single self-contained HTML file with a <style> block. Use CSS custom properties for colors and spacing."
- **react-tailwind**: "Output a single .tsx file. Use functional components, TypeScript prop types, and Tailwind CSS classes. Do not import external component libraries."
- **vue**: "Output a single Vue 3 SFC (.vue) with <script setup lang='ts'>, <template>, and <style scoped>."
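A minimal sketch of that augmentation, assuming each framework addition is appended to the base prompt after a blank line. `Framework`, `FRAMEWORK_INSTRUCTIONS`, and `buildSystemPrompt` are hypothetical names; the instruction strings are copied from the list above.

```typescript
export type Framework = "html-css" | "react-tailwind" | "vue";

// Per-framework additions, copied from the list above.
const FRAMEWORK_INSTRUCTIONS: Record<Framework, string> = {
  "html-css":
    "Output a single self-contained HTML file with a <style> block. Use CSS custom properties for colors and spacing.",
  "react-tailwind":
    "Output a single .tsx file. Use functional components, TypeScript prop types, and Tailwind CSS classes. Do not import external component libraries.",
  vue: "Output a single Vue 3 SFC (.vue) with <script setup lang='ts'>, <template>, and <style scoped>.",
};

// basePrompt is the user-editable default stored in settings.
export function buildSystemPrompt(basePrompt: string, framework: Framework): string {
  return `${basePrompt.trim()}\n\n${FRAMEWORK_INSTRUCTIONS[framework]}`;
}
```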

## Message structure

```typescript
interface VisionMessage {
  role: "user";
  content: [
    {
      type: "text";
      text: string; // framework-specific prompt + optional user instructions
    },
    {
      type: "image_url";
      image_url: {
        url: string; // "data:image/jpeg;base64,<base64>"
        detail: "high"; // always "high" for best code accuracy
      };
    }
  ];
}
```

The messages array sent to the LLM is always `[systemMessage, visionMessage]`.
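The sketch below assembles that two-message array from a preprocessed JPEG buffer, assuming a Node.js runtime (`Buffer`). `buildVisionMessages` and its parameters are hypothetical; the message shapes follow the interface above.

```typescript
type SystemMessage = { role: "system"; content: string };

interface VisionMessage {
  role: "user";
  content: [
    { type: "text"; text: string },
    { type: "image_url"; image_url: { url: string; detail: "high" } }
  ];
}

// Hypothetical helper: builds [systemMessage, visionMessage] for /chat/completions.
export function buildVisionMessages(
  systemPrompt: string,
  userText: string,
  jpeg: Buffer
): [SystemMessage, VisionMessage] {
  // Inline the image as a base64 data URL; the mime type matches preprocessImage's output.
  const url = `data:image/jpeg;base64,${jpeg.toString("base64")}`;
  return [
    { role: "system", content: systemPrompt },
    {
      role: "user",
      content: [
        { type: "text", text: userText },
        { type: "image_url", image_url: { url, detail: "high" } },
      ],
    },
  ];
}
```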

## Image preprocessing with sharp

Before sending to the LLM, images are preprocessed in `src/lib/image.ts`:

```typescript
import sharp from 'sharp';

export async function preprocessImage(
  buffer: Buffer,
  opts: { maxWidth?: number; maxHeight?: number; quality?: number } = {}
): Promise<{ data: Buffer; mime: 'image/jpeg'; originalSize: number; processedSize: number }> {
  const { maxWidth = 1920, maxHeight = 1080, quality = 85 } = opts;
  const originalSize = buffer.length;
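  const data = await sharp(buffer)
    // Downscale to fit within maxWidth x maxHeight, keeping aspect ratio; never upscale.
    .resize(maxWidth, maxHeight, { fit: 'inside', withoutEnlargement: true })
    // JPEG has no alpha channel: composite transparent areas onto white
    // so they don't come out black.
    .flatten({ background: '#ffffff' })
    .jpeg({ quality })
    .toBuffer();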
  return { data, mime: 'image/jpeg', originalSize, processedSize: data.length };
}
```

Always set `detail: "high"` in the `image_url` object. With `low`, OpenAI models receive only a 512x512 downscaled copy of the image, which significantly degrades code quality.

## Iteration prompt

When the user iterates, the existing generated code is included in the message:

```typescript
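// `iteration`, `framework`, `existingCode`, and `userPrompt` come from the current
// request; `frameworkToLang` returns the fence tag for the framework.
const iterationUserText = `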
Here is the current version of the code (v${iteration.version}):

\`\`\`${frameworkToLang(framework)}
${existingCode}
\`\`\`

The user wants the following change:
${userPrompt}

Output the complete updated code. Do not add explanations.
`.trim();
```

The same image is re-attached so the LLM can reference the original design.

## Code extraction

The LLM output is expected to contain a single fenced code block. Extraction:

```typescript
export function extractCodeBlock(text: string): string | null {
  // Match the first fenced block: ``` followed by an optional info string
  // (e.g. "tsx" or "vue"), tolerating \r\n line endings.
  const match = text.match(/```[^\n]*\r?\n([\s\S]+?)```/);
  if (match) return match[1].trim();
  // Fallback: if no fence, check if output starts with a tag or import
  const trimmed = text.trim();
  if (trimmed.startsWith('<') || trimmed.startsWith('import ') || trimmed.startsWith('export ')) {
    return trimmed;
  }
  return null;
}
```

If extraction returns `null`, the conversion is marked `error` with reason `no_code_extracted`.

## Framework language tags

| Framework | Code fence language |
|---|---|
| `html-css` | `html` |
| `react-tailwind` | `tsx` |
| `vue` | `vue` |
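The iteration prompt calls `frameworkToLang`, whose implementation is not shown here. A minimal sketch consistent with this table (the `Framework` type and `FENCE_LANG` map are hypothetical names) might be:

```typescript
type Framework = "html-css" | "react-tailwind" | "vue";

// Fence language tag for each supported framework (mirrors the table above).
const FENCE_LANG: Record<Framework, string> = {
  "html-css": "html",
  "react-tailwind": "tsx",
  vue: "vue",
};

export function frameworkToLang(framework: Framework): string {
  return FENCE_LANG[framework];
}
```

Keying a `Record` on the `Framework` union means the compiler flags any framework added later without a fence tag.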

## Token cost estimation

Approximate token usage per conversion (1920x1080 JPEG at 85% quality):

| Component | Tokens (approx) |
|---|---|
| System prompt | 80-120 |
| User text prompt | 40-80 |
| Image (high detail) | 765-1105 (OpenAI models; depends on image dimensions, not content) |
| Generated output (html-css) | 800-2000 |
| Generated output (react-tailwind) | 1000-3000 |

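Total per conversion: approximately 2000-5000 tokens. Iterations cost more: the image is re-attached and billed again, and the previous version of the code is sent as input on top of the new output.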
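The image row follows OpenAI's published tile formula for `detail: "high"`: the image is scaled to fit within 2048x2048, then scaled so its shortest side is 768px, and each 512px tile costs 170 tokens on top of a fixed 85. A sketch, assuming the formula applies to `gpt-4o` and `gpt-4-turbo` and that images already smaller than those bounds are not upscaled (not confirmed here); Anthropic models count image tokens differently. `estimateHighDetailImageTokens` is a hypothetical name.

```typescript
// Estimate input tokens for one image sent with detail: "high" (OpenAI tile formula).
export function estimateHighDetailImageTokens(width: number, height: number): number {
  // 1. Fit inside a 2048x2048 square (downscale only).
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;

  // 2. Scale so the shortest side is 768px (assumed: downscale only).
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;

  // 3. Count 512x512 tiles: 170 tokens each, plus a fixed 85.
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}
```

A 1920x1080 screenshot scales to about 1365x768, which covers 3x2 tiles, so `estimateHighDetailImageTokens(1920, 1080)` returns 1105, the top of the range in the table; a 1024x1024 image gives 765.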

## Streaming

The API uses streaming (`stream: true`) from the LLM to provide real-time status updates. The server buffers the stream internally and writes the final code to the database once complete. The client polls `GET /convert/:id` for status.

For iteration, the same streaming approach applies - status transitions from `generating` to `done` once the full response is buffered and code is extracted.
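One way to buffer the stream server-side, sketched against the chunk shape the OpenAI SDK yields when `stream: true` is set (`choices[0].delta.content`). The minimal `ChatChunk` type, `bufferStream`, and the `onProgress` callback (for example, to record progress the polling endpoint can report) are hypothetical.

```typescript
// Minimal subset of an OpenAI streaming chunk.
interface ChatChunk {
  choices: { delta: { content?: string | null }; finish_reason?: string | null }[];
}

// Accumulates streamed deltas into the full response text.
export async function bufferStream(
  stream: AsyncIterable<ChatChunk>,
  onProgress?: (charsSoFar: number) => void
): Promise<{ text: string; finishReason: string | null }> {
  let text = "";
  let finishReason: string | null = null;
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    // Some chunks (e.g. a final usage-only chunk) have an empty choices array.
    if (!choice) continue;
    text += choice.delta.content ?? "";
    if (choice.finish_reason) finishReason = choice.finish_reason;
    onProgress?.(text.length);
  }
  return { text, finishReason };
}
```

With the OpenAI SDK, the stream returned by `client.chat.completions.create({ ..., stream: true })` can be passed straight in. A `finishReason` of `"length"` means the output hit `max_tokens`; otherwise the buffered text goes to `extractCodeBlock`.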

## Adding a custom OpenAI-compatible provider

Set `S2C_BASE_URL` in `.env` to the provider's base URL, e.g.:

```
S2C_BASE_URL=https://openrouter.ai/api/v1
S2C_OPENAI_KEY=sk-or-...
S2C_VISION_MODEL=openai/gpt-4o
```

The system uses the standard OpenAI SDK with `baseURL` override. Any provider that supports the `/chat/completions` endpoint with `image_url` content parts will work.
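A minimal sketch of reading those variables. `loadProviderConfig` is a hypothetical helper, and the `gpt-4o` fallback is an assumption based on the recommendation above. `baseURL` is left undefined when `S2C_BASE_URL` is unset, so the SDK falls back to its default endpoint.

```typescript
export interface ProviderConfig {
  apiKey: string;
  baseURL?: string;
  model: string;
}

// Reads S2C_* variables; throws early if the API key is missing.
export function loadProviderConfig(
  env: Record<string, string | undefined> = process.env
): ProviderConfig {
  const apiKey = env.S2C_OPENAI_KEY;
  if (!apiKey) throw new Error("S2C_OPENAI_KEY is not set");
  return {
    apiKey,
    // An empty string is treated as unset.
    baseURL: env.S2C_BASE_URL || undefined,
    model: env.S2C_VISION_MODEL || "gpt-4o", // assumed default
  };
}
```

The result maps onto the OpenAI SDK constructor as `new OpenAI({ apiKey: cfg.apiKey, baseURL: cfg.baseURL })`, with `cfg.model` sent as `model` on each request.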

## Common LLM prompt improvements

Add these to the optional instructions field to improve output:

| Goal | Instruction |
|---|---|
| TypeScript types | "Add TypeScript prop types for all components and data." |
| Accessibility | "Include aria-labels on interactive elements and semantic HTML." |
| Mock data | "Include realistic mock data arrays for tables and lists." |
| Hover states | "Add Tailwind hover: and focus: states for all interactive elements." |
| Dark mode | "Support dark mode using Tailwind dark: variants." |
| Responsive | "Make the layout responsive with Tailwind sm:/md:/lg: breakpoints." |
| No Tailwind | "Use plain CSS with BEM class names instead of Tailwind." |
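The message structure describes the text part as the framework-specific prompt plus optional user instructions. A sketch of that composition, assuming instructions are appended as a bulleted block after the prompt. `buildUserText` and the `Additional instructions:` label are hypothetical.

```typescript
// Hypothetical: combine the framework prompt with zero or more optional instructions.
export function buildUserText(frameworkPrompt: string, instructions: string[] = []): string {
  // Drop blank entries so an empty instructions field adds nothing.
  const extra = instructions.map((s) => s.trim()).filter((s) => s.length > 0);
  if (extra.length === 0) return frameworkPrompt.trim();
  return [
    frameworkPrompt.trim(),
    "",
    "Additional instructions:",
    ...extra.map((s) => `- ${s}`),
  ].join("\n");
}
```

The rows that mention Tailwind (hover states, dark mode, responsive) only make sense for the `react-tailwind` target.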

## Troubleshooting model output quality

| Symptom | Cause | Fix |
|---|---|---|
| Generic placeholder code | Image too small or low contrast | Increase upload resolution, use PNG |
| Wrong colors | JPEG compression artifacts | Upload PNG to avoid compressing twice, or pass `quality: 95` to `preprocessImage` |
| Missing sections | Long UI; output truncated by the model or by `max_tokens` | Add "output the complete full page code, do not truncate" to the prompt, and raise `max_tokens` if the response ended with `finish_reason: "length"` |
| Tailwind classes not applied | Model used arbitrary values | Add "use standard Tailwind utility classes only, no arbitrary values" to the prompt |
| Vue Options API instead of Composition API | Default model behavior | Add "use Vue 3 Composition API with <script setup>" to the prompt |
