desktop-control
Advanced desktop automation with mouse, keyboard, and screen control, plus access to 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.
Best use case
desktop-control is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using desktop-control should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/desktop-controls/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
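The manual steps can also be scripted. The following is a minimal sketch, assuming SKILL.md has already been downloaded to a local path; `install_skill` is a hypothetical helper name, not something the skill ships, and the destination is the path listed in the steps.

```bash
# Hypothetical helper: copy a downloaded SKILL.md into a project's skill folder.
# Usage: install_skill PATH_TO_SKILL_MD [PROJECT_DIR]   (PROJECT_DIR defaults to ".")
install_skill() {
  local src="$1" project="${2:-.}"
  local dest="$project/.claude/skills/desktop-controls"
  # Create the skill directory if needed, then place the file under its expected name.
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

For example, `install_skill ~/Downloads/SKILL.md .` run from the project root, followed by restarting the agent.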
How desktop-control Compares
| Feature / Agent | desktop-control | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It provides advanced desktop automation with mouse, keyboard, and screen control, plus access to 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
Best AI Agents for Marketing
A curated list of the best AI agents and skills for marketing teams focused on SEO, content systems, outreach, and campaign execution.
AI Agent for YouTube Script Writing
Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.
SKILL.md Source
# SkillBoss
One API key, 50+ models across providers (Bedrock, OpenAI, Vertex, ElevenLabs, Replicate, Minimax, and more). Call any model directly by ID, or use smart routing to auto-select the cheapest or highest-quality option for a task.
**Base URL:** `https://api.heybossai.com/v1`
**Auth:** `-H "Authorization: Bearer $SKILLBOSS_API_KEY"`
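Since every POST request in this document repeats the same base URL and headers, a small wrapper can cut the repetition. This is only a sketch: `skillboss_post` is a hypothetical function name, and the check for a missing key is an added convenience rather than documented API behavior.

```bash
# Hypothetical wrapper: POST a JSON body to a path under the base URL.
# Usage: skillboss_post /run '{"model": "mm/img", "inputs": {"prompt": "A sunset"}}'
skillboss_post() {
  # Fail early with a clear message if the API key is missing.
  if [ -z "${SKILLBOSS_API_KEY:-}" ]; then
    echo "SKILLBOSS_API_KEY is not set" >&2
    return 1
  fi
  curl -s -X POST "https://api.heybossai.com/v1$1" \
    -H "Authorization: Bearer $SKILLBOSS_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

The first argument is the endpoint path (for example `/run` or `/chat/completions`) and the second is the JSON body as a single string.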
## List Models
```bash
curl -s https://api.heybossai.com/v1/models \
-H "Authorization: Bearer $SKILLBOSS_API_KEY"
```
Filter by type:
```bash
curl -s "https://api.heybossai.com/v1/models?types=image" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY"
```
Get full docs for specific models:
```bash
curl -s "https://api.heybossai.com/v1/models?ids=mm/img,bedrock/claude-4-5-sonnet" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY"
```
Types: `chat`, `image`, `video`, `tts`, `stt`, `music`, `search`, `scraper`, `email`, `storage`, `ppt`, `embedding`
## Chat
```bash
curl -s -X POST https://api.heybossai.com/v1/chat/completions \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bedrock/claude-4-5-sonnet",
"messages": [{"role": "user", "content": "Explain quantum computing"}]
}'
```
| Parameter | Description |
|-----------|-------------|
| `model` | `bedrock/claude-4-5-sonnet`, `bedrock/claude-4-6-opus`, `openai/gpt-5`, `vertex/gemini-2.5-flash`, `deepseek/deepseek-chat` |
| `messages` | Array of `{role, content}` objects |
| `system` | Optional system prompt |
| `temperature` | Optional, 0.0–1.0 |
| `max_tokens` | Optional, max output tokens |
Response: `choices[0].message.content`
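The optional parameters and the response path can be combined into two small helpers. This is a sketch under assumptions: `chat_payload` and `chat_reply` are hypothetical names, `system` is assumed to be a top-level field because the table lists it next to `model` and `messages`, and `jq` is assumed to be installed.

```bash
# Hypothetical helper: print a JSON body for /v1/chat/completions.
# Usage: chat_payload MODEL PROMPT [SYSTEM_PROMPT]
chat_payload() {
  jq -n -c --arg model "$1" --arg prompt "$2" --arg system "${3:-}" '
    {model: $model, messages: [{role: "user", content: $prompt}]}
    # Add the system prompt only when one was given (top-level placement is an assumption).
    + (if $system == "" then {} else {system: $system} end)'
}

# Hypothetical helper: read a response body on stdin and print the assistant text.
chat_reply() {
  jq -r '.choices[0].message.content'
}
```

A request could then look like `curl -s -X POST https://api.heybossai.com/v1/chat/completions -H "Authorization: Bearer $SKILLBOSS_API_KEY" -H "Content-Type: application/json" -d "$(chat_payload bedrock/claude-4-5-sonnet 'Explain quantum computing' 'Answer briefly')" | chat_reply`.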
## Image Generation
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "mm/img",
"inputs": {"prompt": "A sunset over mountains"}
}'
```
Save to file:
```bash
URL=$(curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "mm/img", "inputs": {"prompt": "A sunset over mountains"}}' \
| jq -r '.image_url // .result.image_url // .data[0] // .generated_images[0]')
curl -sL "$URL" -o sunset.png
```
| Parameter | Description |
|-----------|-------------|
| `model` | `mm/img`, `replicate/black-forest-labs/flux-2-pro`, `replicate/black-forest-labs/flux-1.1-pro-ultra`, `vertex/gemini-2.5-flash-image-preview`, `vertex/gemini-3-pro-image-preview` |
| `inputs.prompt` | Text description of the image |
| `inputs.size` | Optional, e.g. `"1024*768"` |
| `inputs.aspect_ratio` | Optional, e.g. `"16:9"` |
Response: `image_url`, `data[0]`, or `generated_images[0]`
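Because the result can arrive under different keys depending on the model, it may help to check them in one place. The sketch below assumes each listed field holds a URL string, which the source does not confirm for `data[0]` and `generated_images[0]`; `image_url_from_response` is a hypothetical name, and `.result.image_url` is included only because the save-to-file example also checks it.

```bash
# Hypothetical helper: read an image response on stdin and print the first
# result field that is present; prints nothing if none of them is.
image_url_from_response() {
  jq -r '.image_url // .result.image_url // .data[0] // .generated_images[0] // empty'
}
```

An empty result is a signal to inspect the raw response, since the model may return a shape not covered here.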
## Video Generation
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "mm/t2v",
"inputs": {"prompt": "A cat playing with yarn"}
}'
```
Image-to-video:
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "mm/i2v",
"inputs": {"prompt": "Zoom in slowly", "image": "https://example.com/photo.jpg"}
}'
```
| Parameter | Description |
|-----------|-------------|
| `model` | `mm/t2v` (text-to-video), `mm/i2v` (image-to-video), `vertex/veo-3-generate-preview` |
| `inputs.prompt` | Text description |
| `inputs.image` | Image URL (for i2v) |
| `inputs.duration` | Optional, seconds |
Response: `video_url`
## Text-to-Speech
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "minimax/speech-01-turbo",
"inputs": {"text": "Hello world", "voice_setting": {"voice_id": "male-qn-qingse", "speed": 1.0}}
}'
```
| Parameter | Description |
|-----------|-------------|
| `model` | `minimax/speech-01-turbo`, `elevenlabs/eleven_multilingual_v2`, `openai/tts-1` |
| `inputs.text` | Text to speak |
| `inputs.voice` | Voice name (e.g. `alloy`, `nova`, `shimmer`) for OpenAI |
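| `inputs.voice_id` | Voice ID for ElevenLabs |
| `inputs.voice_setting` | Voice settings object for Minimax, e.g. `{"voice_id": "male-qn-qingse", "speed": 1.0}` |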
Response: `audio_url` or binary audio data
## Speech-to-Text
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/whisper-1",
"inputs": {"audio_data": "BASE64_AUDIO", "filename": "recording.mp3"}
}'
```
Response: `text`
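The `BASE64_AUDIO` placeholder has to be produced from a real file. One way to do that is sketched below, assuming `base64` and `jq` are available; `stt_payload` is a hypothetical helper name, and the encoded audio is streamed through stdin so that long recordings do not hit command-line length limits.

```bash
# Hypothetical helper: print a speech-to-text request body for an audio file.
# Usage: stt_payload recording.mp3
stt_payload() {
  local file="$1"
  # Encode the file, drop line wraps, and let jq read the result as one raw string.
  base64 < "$file" | tr -d '\n' \
    | jq -R -s -c --arg name "$(basename "$file")" \
        '{model: "openai/whisper-1", inputs: {audio_data: ., filename: $name}}'
}
```

The body can be piped straight into the request with `stt_payload recording.mp3 | curl -s -X POST https://api.heybossai.com/v1/run -H "Authorization: Bearer $SKILLBOSS_API_KEY" -H "Content-Type: application/json" -d @-`.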
## Music Generation
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/elevenlabs/music",
"inputs": {"prompt": "upbeat electronic", "duration": 30}
}'
```
| Parameter | Description |
|-----------|-------------|
| `model` | `replicate/elevenlabs/music`, `replicate/meta/musicgen`, `replicate/google/lyria-2` |
| `inputs.prompt` | Music description |
| `inputs.duration` | Duration in seconds |
Response: `audio_url`
## Background Removal
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/remove-bg",
"inputs": {"image": "https://example.com/photo.jpg"}
}'
```
Response: `image_url` or `data[0]`
## Document Processing
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "reducto/parse",
"inputs": {"document_url": "https://example.com/file.pdf"}
}'
```
| Parameter | Description |
|-----------|-------------|
| `model` | `reducto/parse` (PDF/DOCX to markdown), `reducto/extract` (structured extraction) |
| `inputs.document_url` | URL of the document |
| `inputs.instructions` | For extract: `{"schema": {...}}` |
## Web Search
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "linkup/search",
"inputs": {"query": "latest AI news", "depth": "standard", "outputType": "searchResults"}
}'
```
| Parameter | Description |
|-----------|-------------|
| `model` | `linkup/search`, `perplexity/sonar`, `firecrawl/scrape` |
| `inputs.query` | Search query |
| `inputs.depth` | `standard` or `deep` |
| `inputs.outputType` | `searchResults`, `sourcedAnswer`, `structured` |
## Email
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "email/send",
"inputs": {"to": "user@example.com", "subject": "Hello", "html": "<p>Hi</p>"}
}'
```
## SMS Verification
Send OTP:
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "prelude/verify-send",
"inputs": {"target": {"type": "phone_number", "value": "+1234567890"}}
}'
```
Verify OTP:
```bash
curl -s -X POST https://api.heybossai.com/v1/run \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "prelude/verify-check",
"inputs": {"target": {"type": "phone_number", "value": "+1234567890"}, "code": "123456"}
}'
```
## Smart Mode (auto-select best model)
List task types:
```bash
curl -s -X POST https://api.heybossai.com/v1/pilot \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"discover": true}'
```
Run a task:
```bash
curl -s -X POST https://api.heybossai.com/v1/pilot \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"type": "image",
"inputs": {"prompt": "A sunset over mountains"}
}'
```
## Available Models (50+)
| Category | Models | Details |
|----------|--------|---------|
| Chat | 25+ models — Claude, GPT, Gemini, DeepSeek, Qwen, HuggingFace | `chat-models.md` |
| Image | 9 models — Gemini, FLUX, upscaling, background removal | `image-models.md` |
| Video | 3 models — Veo, text-to-video, image-to-video | `video-models.md` |
| Audio | 11 models — TTS, STT, music generation | `audio-models.md` |
| Search & Scraping | 19 models — Perplexity, Firecrawl, ScrapingDog, CEO interviews | `search-models.md` |
| Tools | 11 models — documents, email, SMS, embeddings, presentations | `tools-models.md` |
Notes:
- Get SKILLBOSS_API_KEY at https://www.skillboss.co
- Use the models endpoint to discover all available models live
- Use smart mode (pilot) to auto-select the best model for any task
Related Skills
Pest Control Operations Agent
You are an expert pest control business operations advisor. Help operators with licensing, EPA/FIFRA compliance, pricing, route optimization, seasonal planning, technician management, and growth strategy.
Export Compliance & Trade Controls
Analyze products, destinations, and end-users against US export control regulations (EAR, ITAR, OFAC sanctions). Generate classification recommendations, license requirements, and compliance checklists.
ecovacs-robot-control
Control Ecovacs/DEEBOT robot vacuums via the Ecovacs IoT API. Use when the user wants to control a robot vacuum, check battery, start/stop/pause cleaning, return to dock, check clean status, set suction/water level, manage schedules, check consumables, or control auto-empty station. Covers all mainstream Ecovacs protocols including clean_V2, charge, getBattery, getCleanInfo_V2, getStats, getSpeed/setSpeed, getWaterInfo/setWaterInfo, getWorkMode/setWorkMode, getLifeSpan, getAutoEmpty/setAutoEmpty, getCachedMapInfo, getMapSet, getSched_V2/setSched_V2.
desktop-monitor-widget
Desktop monitoring floating widget: displays system resource status in real time.
opencode-acp-control
Control OpenCode directly via the Agent Client Protocol (ACP). Start sessions, send prompts, resume conversations, and manage OpenCode updates.
clawphone-wechat-control
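Handle the WeChat conversation list, open chats, send messages, deal with in-app pop-ups, and troubleshoot chat-page failures. Use when the user asks to view WeChat messages, reply to contacts, forward messages, or deal with the chat input box or failed sends. Always confirm which WeChat page is currently open first, then proceed according to the chat scenario, verifying each step.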
clawphone-phone-control
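Use the phone-control MCP to perceive and operate the phone interface. Suitable for reading the current phone state, opening apps, handling pop-ups, tapping controls, entering text, and troubleshooting phone automation failures. Read the interface state first; for coordinate-based taps, decide from the current screenshot each time, and never treat past coordinates as general rules.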
controld
Manage Control D DNS filtering service via API. Use for DNS profile management, device configuration, custom blocking rules, service filtering, analytics settings, and network diagnostics. Triggers when user mentions Control D, DNS filtering, DNS blocking, device DNS setup, or managing DNS profiles.
desktop-sandbox
A desktop sandbox lets OpenClaw run as natively as on a real OS, ensuring full functionality with safe isolation. Run OpenClaw without breaking your PC.
desktop-agent-ops
Execute cross-platform desktop tasks through a packaged desktop automation skill that guides the main agent to observe the screen, focus apps and windows, call helper scripts for screenshots and input actions, verify each step, clean up task context, and only escalate to multi-agent collaboration when tasks become clearly multi-window or multi-app. Use when the user wants desktop GUI control, native app operation, window focus, screenshots, click and type flows, or cross-platform desktop workflows on macOS, Windows, or Linux.
control-ikea-lightbulb
Control IKEA/TP-Link Kasa smart bulbs (set on/off, brightness, and color). Use when you want to programmatically control a local smart bulb by IP on the LAN.
intiface-control
Control 750+ BLE intimate devices (Lovense, Kiiroo, We-Vibe, Satisfyer, etc.) from natural language via Intiface Central and buttplug-mcp. Works on macOS, Windows, and Linux. No protocol reverse-engineering required.