capture-screen

Programmatic screenshot capture on macOS. Find window IDs with Swift CGWindowListCopyWindowInfo, control application windows via AppleScript (zoom, scroll, select), and capture with screencapture. Use when automating screenshots, capturing application windows for documentation, or building multi-shot visual workflows.

25 stars

Best use case

capture-screen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Programmatic screenshot capture on macOS. Find window IDs with Swift CGWindowListCopyWindowInfo, control application windows via AppleScript (zoom, scroll, select), and capture with screencapture. Use when automating screenshots, capturing application windows for documentation, or building multi-shot visual workflows.

Teams using capture-screen should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/capture-screen/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/daymade/claude-code-skills/capture-screen/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/capture-screen/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How capture-screen Compares

Feature / Agentcapture-screenStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Programmatic screenshot capture on macOS. Find window IDs with Swift CGWindowListCopyWindowInfo, control application windows via AppleScript (zoom, scroll, select), and capture with screencapture. Use when automating screenshots, capturing application windows for documentation, or building multi-shot visual workflows.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Capture Screen

Programmatic screenshot capture on macOS: find windows, control views, capture images.

## Quick Start

```bash
# Find Excel window ID
swift scripts/get_window_id.swift Excel

# Capture that window (replace 12345 with actual WID)
screencapture -x -l 12345 output.png
```

## Overview

Three-step workflow:

```
1. Find Window  →  Swift CGWindowListCopyWindowInfo  →  get numeric Window ID
2. Control View  →  AppleScript (osascript)           →  zoom, scroll, select
3. Capture       →  screencapture -l <WID>            →  PNG/JPEG output
```

## Step 1: Get Window ID (Swift)

Use Swift with CoreGraphics to enumerate windows. This is the **only reliable method** on macOS.

### Quick inline execution

```bash
swift -e '
import CoreGraphics
let keyword = "Excel"
let list = CGWindowListCopyWindowInfo(.optionOnScreenOnly, kCGNullWindowID) as? [[String: Any]] ?? []
for w in list {
    let owner = w[kCGWindowOwnerName as String] as? String ?? ""
    let name = w[kCGWindowName as String] as? String ?? ""
    let wid = w[kCGWindowNumber as String] as? Int ?? 0
    if owner.localizedCaseInsensitiveContains(keyword) || name.localizedCaseInsensitiveContains(keyword) {
        print("WID=\(wid) | App=\(owner) | Title=\(name)")
    }
}
'
```

### Using the bundled script

```bash
swift scripts/get_window_id.swift Excel
swift scripts/get_window_id.swift Chrome
swift scripts/get_window_id.swift          # List all windows
```

Output format: `WID=12345 | App=Microsoft Excel | Title=workbook.xlsx`

Parse the WID number for use with `screencapture -l`.

## Step 2: Control Window (AppleScript)

Verified commands for controlling application windows before capture.

### Microsoft Excel (full AppleScript support)

```bash
# Activate (bring to front)
osascript -e 'tell application "Microsoft Excel" to activate'

# Set zoom level (percentage)
osascript -e 'tell application "Microsoft Excel"
    set zoom of active window to 120
end tell'

# Scroll to specific row
osascript -e 'tell application "Microsoft Excel"
    set scroll row of active window to 45
end tell'

# Scroll to specific column
osascript -e 'tell application "Microsoft Excel"
    set scroll column of active window to 3
end tell'

# Select a cell range
osascript -e 'tell application "Microsoft Excel"
    select range "A1" of active sheet
end tell'

# Select a specific sheet
osascript -e 'tell application "Microsoft Excel"
    activate object sheet "DCF" of active workbook
end tell'

# Open a file
osascript -e 'tell application "Microsoft Excel"
    open POSIX file "/path/to/file.xlsx"
end tell'
```

### Any application (basic control)

```bash
# Activate any app
osascript -e 'tell application "Google Chrome" to activate'

# Bring specific window to front (by index)
osascript -e 'tell application "System Events"
    tell process "Google Chrome"
        perform action "AXRaise" of window 1
    end tell
end tell'
```

### Timing and Timeout

Always add `sleep 1` after AppleScript commands before capturing, to allow UI rendering to complete.

**IMPORTANT**: `osascript` hangs indefinitely if the target application is not running or not responding. Always wrap with `timeout`:

```bash
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
```

## Step 3: Capture (screencapture)

```bash
# Capture specific window by ID
screencapture -l <WID> output.png

# Silent capture (no camera shutter sound)
screencapture -x -l <WID> output.png

# Capture as JPEG
screencapture -l <WID> -t jpg output.jpg

# Capture with delay (seconds)
screencapture -l <WID> -T 2 output.png

# Capture a screen region (interactive)
screencapture -R x,y,width,height output.png
```

### Retina displays

On Retina Macs, `screencapture` outputs 2x resolution by default (e.g., a 2032x1238 window produces a 4064x2476 PNG). This is normal. To get 1x resolution, resize after capture:

```bash
sips --resampleWidth 2032 output.png --out output_1x.png
```

### Verify capture

```bash
# Check file was created and has content
ls -la output.png
file output.png    # Should show "PNG image data, ..."
```

## Multi-Shot Workflow

Complete example: capture multiple sections of an Excel workbook.

```bash
# 1. Open file and activate Excel
osascript -e 'tell application "Microsoft Excel"
    open POSIX file "/path/to/model.xlsx"
    activate
end tell'
sleep 2

# 2. Set up view
osascript -e 'tell application "Microsoft Excel"
    set zoom of active window to 130
    activate object sheet "Summary" of active workbook
end tell'
sleep 1

# 3. Get window ID
#    IMPORTANT: Always re-fetch before capturing. CGWindowID is invalidated
#    when an app restarts or a window is closed and reopened.
WID=$(swift -e '
import CoreGraphics
let list = CGWindowListCopyWindowInfo(.optionOnScreenOnly, kCGNullWindowID) as? [[String: Any]] ?? []
for w in list {
    let owner = w[kCGWindowOwnerName as String] as? String ?? ""
    let wid = w[kCGWindowNumber as String] as? Int ?? 0
    if owner == "Microsoft Excel" { print(wid); break }
}
')
echo "Window ID: $WID"

# 4. Capture Section A (top of sheet)
osascript -e 'tell application "Microsoft Excel"
    set scroll row of active window to 1
end tell'
sleep 1
screencapture -x -l $WID section_a.png

# 5. Capture Section B (further down)
osascript -e 'tell application "Microsoft Excel"
    set scroll row of active window to 45
end tell'
sleep 1
screencapture -x -l $WID section_b.png

# 6. Switch sheet and capture
osascript -e 'tell application "Microsoft Excel"
    activate object sheet "DCF" of active workbook
    set scroll row of active window to 1
end tell'
sleep 1
screencapture -x -l $WID dcf_overview.png
```

## Failed Approaches (DO NOT USE)

These methods were tested and confirmed to fail on macOS:

| Method | Error | Why It Fails |
|--------|-------|-------------|
| `System Events` → `id of window` | Error -1728 | System Events cannot access window IDs in the format screencapture needs |
| Python `import Quartz` (PyObjC) | `ModuleNotFoundError` | PyObjC not installed in system Python; don't attempt to install it — use Swift instead |
| `osascript` window id | Wrong format | Returns AppleScript window index, not CGWindowID needed by `screencapture -l` |

## Supported Applications

| Application | Window ID | AppleScript Control | Notes |
|------------|-----------|-------------------|-------|
| Microsoft Excel | Swift | Full (zoom, scroll, select, activate sheet) | Best supported |
| Google Chrome | Swift | Basic (activate, window management) | No scroll/zoom via AppleScript |
| Any macOS app | Swift | Basic (activate via `tell application`) | screencapture works universally |

AppleScript control depth varies by application. Excel has the richest AppleScript dictionary. For apps with limited AppleScript, use keyboard simulation via `System Events` as a fallback.

Related Skills

monarch-screen-setup

25
from ComeOnOliver/skillshub

Organizes screens and popups in a Defold game using Monarch screen manager. Use when creating new screens, popups, or setting up navigation between them.

screenshots

25
from ComeOnOliver/skillshub

Generate marketing screenshots of your app using Playwright. Use when the user wants to create screenshots for Product Hunt, social media, landing pages, or documentation.

screen-reader-testing

25
from ComeOnOliver/skillshub

Test web applications with screen readers including VoiceOver, NVDA, and JAWS. Use when validating screen reader compatibility, debugging accessibility issues, or ensuring assistive technology support.

app-store-screenshots

25
from ComeOnOliver/skillshub

App Store and Google Play screenshot creation with exact platform specs. Covers iOS/Android dimensions, gallery ordering, device mockups, and preview videos. Use for: app store optimization, ASO, app screenshots, app preview, play store listing. Triggers: app store screenshots, aso, app store optimization, play store screenshots, app preview, app listing, ios screenshots, android screenshots, app store images, app mockup, device mockup, app gallery, store listing

notion-knowledge-capture

25
from ComeOnOliver/skillshub

Capture conversations and decisions into structured Notion pages; use when turning chats/notes into wiki entries, how-tos, decisions, or FAQs with proper linking.

browser-testing-with-screenshots

25
from ComeOnOliver/skillshub

Use when testing web applications with visual verification - automates Chrome browser interactions, element selection, and screenshot capture for confirming UI functionality

screenfull-fullscreen-api

25
from ComeOnOliver/skillshub

Implement fullscreen functionality using screenfull library for cross-browser fullscreen support in ng-events project

Blender Motion Capture

25
from ComeOnOliver/skillshub

## Overview

Applicant Screening

25
from ComeOnOliver/skillshub

## Overview

screenshot-optimization

25
from ComeOnOliver/skillshub

When the user wants to design, optimize, or evaluate App Store screenshots and preview videos. Also use when the user mentions "screenshots", "app preview", "product page design", "screenshot design", "creative assets", or "what should my screenshots show". For A/B testing screenshots, see ab-test-store-listing. For full ASO audit, see aso-audit.

screenshotone-automation

25
from ComeOnOliver/skillshub

Automate Screenshotone tasks via Rube MCP (Composio). Always search tools first for current schemas.

screenshot-fyi-automation

25
from ComeOnOliver/skillshub

Automate Screenshot Fyi tasks via Rube MCP (Composio). Always search tools first for current schemas.