type4me-macos-voice-input
macOS voice input tool with local and cloud ASR engines, LLM text optimization, and fully local storage, built in Swift.
Best use case
type4me-macos-voice-input is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using type4me-macos-voice-input can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/type4me-macos-voice-input/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Type4Me macOS Voice Input
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
Type4Me is a macOS voice input tool that captures audio via global hotkey, transcribes it using local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines, optionally post-processes text via LLM, and injects the result into any app. All credentials and history are stored locally — no telemetry, no cloud sync.
## Architecture Overview
```
Type4Me/
├── ASR/ # ASR engine abstraction
│ ├── ASRProvider.swift # Provider enum + protocols
│ ├── ASRProviderRegistry.swift # Plugin registry
│ ├── Providers/ # Per-vendor config files
│ ├── SherpaASRClient.swift # Local streaming ASR
│ ├── SherpaOfflineASRClient.swift
│ ├── VolcASRClient.swift # Volcengine streaming ASR
│ └── DeepgramASRClient.swift # Deepgram streaming ASR
├── Bridge/ # SherpaOnnx C API Swift bridge
├── Audio/ # Audio capture
├── Session/ # Core state machine: record→ASR→inject
├── Input/ # Global hotkey management
├── Services/ # Credentials, hotwords, model manager
├── Protocol/ # Volcengine WebSocket codec
└── UI/ # SwiftUI (FloatingBar + Settings)
```
## Installation
### Prerequisites
```bash
# Xcode Command Line Tools
xcode-select --install
# CMake (for local ASR engine)
brew install cmake
```
### Build & Deploy from Source
```bash
git clone https://github.com/joewongjc/type4me.git
cd type4me
# Step 1: Compile SherpaOnnx local engine (~5 min, one-time)
bash scripts/build-sherpa.sh
# Step 2: Build, bundle, sign, install to /Applications, and launch
bash scripts/deploy.sh
```
### Download Pre-built App
Download `Type4Me-v1.2.3.dmg` from releases (cloud ASR only, no local engine):
```
https://github.com/joewongjc/type4me/releases/tag/v1.2.3
```
If macOS blocks the app:
```bash
xattr -d com.apple.quarantine /Applications/Type4Me.app
```
### Download Local ASR Models
```bash
mkdir -p ~/Library/Application\ Support/Type4Me/Models
# The commands below assume the chosen archive is already in ~/Downloads.
# The options are alternatives; extract only the one you want.
# Option A: Lightweight ~20MB
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
# Option B: Balanced ~236MB (recommended)
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
# Option C: Bilingual Chinese+English ~1GB
tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
```
Expected structure for Paraformer model:
```
~/Library/Application Support/Type4Me/Models/
└── sherpa-onnx-streaming-paraformer-bilingual-zh-en/
    ├── encoder.int8.onnx
    ├── decoder.int8.onnx
    └── tokens.txt
```
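The layout above can be checked programmatically before pointing the app at a model. The following is a minimal sketch that uses only Foundation; `missingModelFiles` and `requiredModelFiles` are hypothetical names for illustration and are not part of Type4Me.

```swift
import Foundation

/// File names the Paraformer layout above lists as required.
let requiredModelFiles = ["encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt"]

/// Returns the required file names that are missing from `directory`.
/// Hypothetical helper for illustration, not a Type4Me API.
func missingModelFiles(in directory: URL,
                       required: [String] = requiredModelFiles) -> [String] {
    required.filter { name in
        let path = directory.appendingPathComponent(name).path
        return !FileManager.default.fileExists(atPath: path)
    }
}

// Usage: check the bilingual Paraformer model under Application Support.
let modelsRoot = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Application Support/Type4Me/Models")
let paraformerDir = modelsRoot
    .appendingPathComponent("sherpa-onnx-streaming-paraformer-bilingual-zh-en")
let missing = missingModelFiles(in: paraformerDir)
print(missing.isEmpty ? "Model files present"
                      : "Missing: \(missing.joined(separator: ", "))")
```

An empty result only means the three files exist; it does not verify that they are valid ONNX models.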
## Key Protocols
### SpeechRecognizer Protocol
Every ASR client must implement this protocol:
```swift
import AVFoundation  // AVAudioPCMBuffer

protocol SpeechRecognizer: AnyObject {
    /// Start a new recognition session
    func startRecognition() async throws

    /// Feed raw PCM audio data
    func appendAudio(_ buffer: AVAudioPCMBuffer) async

    /// Stop and get final result
    func stopRecognition() async throws -> String

    /// Cancel without result
    func cancelRecognition() async

    /// Streaming partial results (optional)
    var partialResultHandler: ((String) -> Void)? { get set }
}
```
### ASRProviderConfig Protocol
Each vendor's credential definition:
```swift
protocol ASRProviderConfig {
    /// Unique identifier string
    static var providerID: String { get }

    /// Display name in Settings UI
    static var displayName: String { get }

    /// Credential fields shown in Settings
    static var credentialFields: [CredentialField] { get }

    /// Validate credentials before use
    static func validate(_ credentials: [String: String]) -> Bool

    /// Create the recognizer instance
    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer
}
```
## Adding a New ASR Provider
### Step 1: Create Provider Config
Create `Type4Me/ASR/Providers/OpenAIWhisperProvider.swift`:
```swift
import Foundation

struct OpenAIWhisperProvider: ASRProviderConfig {
    static let providerID = "openai_whisper"
    static let displayName = "OpenAI Whisper"

    static let credentialFields: [CredentialField] = [
        CredentialField(
            key: "api_key",
            label: "API Key",
            placeholder: "sk-...",
            isSecret: true
        ),
        CredentialField(
            key: "model",
            label: "Model",
            placeholder: "whisper-1",
            isSecret: false
        )
    ]

    static func validate(_ credentials: [String: String]) -> Bool {
        guard let apiKey = credentials["api_key"], !apiKey.isEmpty else {
            return false
        }
        return apiKey.hasPrefix("sk-")
    }

    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer {
        guard let apiKey = credentials["api_key"] else {
            throw ASRError.missingCredential("api_key")
        }
        let model = credentials["model"] ?? "whisper-1"
        return OpenAIWhisperASRClient(apiKey: apiKey, model: model, config: config)
    }
}
```
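The `validate` method above checks only the API key. A provider with several required fields might prefer a generic check that reports which values are blank. The sketch below is one way to do that; the `CredentialField` struct here is a stand-in whose shape is inferred from the initializer calls in this document, and `missingCredentialKeys` is a hypothetical helper, so both may differ from the real Type4Me types.

```swift
import Foundation

// Stand-in for Type4Me's CredentialField. The shape is inferred from the
// provider example in this document and may not match the real type.
struct CredentialField {
    let key: String
    let label: String
    let placeholder: String
    let isSecret: Bool
}

/// Hypothetical helper: returns the keys of fields whose value is absent
/// or contains only whitespace.
func missingCredentialKeys(fields: [CredentialField],
                           credentials: [String: String]) -> [String] {
    fields.compactMap { field -> String? in
        let value = credentials[field.key]?
            .trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
        return value.isEmpty ? field.key : nil
    }
}

// Usage: the model field is present but blank, so it is reported.
let demoFields = [
    CredentialField(key: "api_key", label: "API Key", placeholder: "sk-...", isSecret: true),
    CredentialField(key: "model", label: "Model", placeholder: "whisper-1", isSecret: false),
]
let blankKeys = missingCredentialKeys(
    fields: demoFields,
    credentials: ["api_key": "sk-placeholder", "model": "  "]
)
print(blankKeys)
```

Whether an optional field such as `model` should count as missing is a design choice; a real provider could pass only its required fields to the helper.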
### Step 2: Implement the ASR Client
Create `Type4Me/ASR/OpenAIWhisperASRClient.swift`:
```swift
import Foundation
import AVFoundation

final class OpenAIWhisperASRClient: SpeechRecognizer {
    var partialResultHandler: ((String) -> Void)?

    private let apiKey: String
    private let model: String
    private let config: RecognitionConfig
    private var audioData = Data()
    /// Sample rate of the captured audio; 16 kHz is a placeholder until the first buffer arrives
    private var sampleRate: Double = 16_000

    init(apiKey: String, model: String, config: RecognitionConfig) {
        self.apiKey = apiKey
        self.model = model
        self.config = config
    }

    func startRecognition() async throws {
        audioData = Data()
    }

    func appendAudio(_ buffer: AVAudioPCMBuffer) async {
        // Use the first channel only, so the upload is mono
        guard let channelData = buffer.floatChannelData?[0] else { return }
        sampleRate = buffer.format.sampleRate
        let samples = UnsafeBufferPointer(start: channelData, count: Int(buffer.frameLength))
        // Convert Float32 PCM to 16-bit PCM, clamping to [-1, 1] before scaling
        let int16Samples = samples.map { sample -> Int16 in
            Int16(max(-1.0, min(1.0, sample)) * 32767)
        }
        int16Samples.withUnsafeBytes { audioData.append(contentsOf: $0) }
    }

    func stopRecognition() async throws -> String {
        // Build multipart form request to Whisper API
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

        let boundary = UUID().uuidString
        request.setValue("multipart/form-data; boundary=\(boundary)",
                         forHTTPHeaderField: "Content-Type")

        var body = Data()
        // Audio file part: the endpoint expects a container format such as WAV,
        // not headerless PCM, so the samples are wrapped in a WAV header
        body.append(Data("--\(boundary)\r\n".utf8))
        body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n".utf8))
        body.append(Data("Content-Type: audio/wav\r\n\r\n".utf8))
        body.append(makeWAV())
        body.append(Data("\r\n".utf8))
        // Model part
        body.append(Data("--\(boundary)\r\n".utf8))
        body.append(Data("Content-Disposition: form-data; name=\"model\"\r\n\r\n".utf8))
        body.append(Data("\(model)\r\n".utf8))
        body.append(Data("--\(boundary)--\r\n".utf8))
        request.httpBody = body

        let (data, response) = try await URLSession.shared.data(for: request)
        guard let httpResponse = response as? HTTPURLResponse,
              httpResponse.statusCode == 200 else {
            throw ASRError.networkError("Whisper API returned error")
        }
        let result = try JSONDecoder().decode(WhisperResponse.self, from: data)
        return result.text
    }

    func cancelRecognition() async {
        audioData = Data()
    }

    /// Wrap the accumulated 16-bit mono PCM in a minimal 44-byte WAV header
    private func makeWAV() -> Data {
        func le<T: FixedWidthInteger>(_ value: T) -> Data {
            withUnsafeBytes(of: value.littleEndian) { Data($0) }
        }
        let rate = UInt32(sampleRate)
        var wav = Data("RIFF".utf8)
        wav.append(le(UInt32(36 + audioData.count)))  // remaining chunk size
        wav.append(Data("WAVEfmt ".utf8))
        wav.append(le(UInt32(16)))                    // fmt chunk size
        wav.append(le(UInt16(1)))                     // format tag: PCM
        wav.append(le(UInt16(1)))                     // channels: mono
        wav.append(le(rate))                          // sample rate
        wav.append(le(rate * 2))                      // byte rate (mono, 2 bytes per sample)
        wav.append(le(UInt16(2)))                     // block align
        wav.append(le(UInt16(16)))                    // bits per sample
        wav.append(Data("data".utf8))
        wav.append(le(UInt32(audioData.count)))
        wav.append(audioData)
        return wav
    }
}

private struct WhisperResponse: Codable {
    let text: String
}
```
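The multipart body in `stopRecognition()` is assembled by hand, which is easy to get wrong when more form fields are added. As a sketch, the same construction can be factored into a small value type. `MultipartBody` and its method names are hypothetical and not part of Type4Me or Foundation; the file name, content type, and bytes in the usage are placeholders.

```swift
import Foundation

/// Hypothetical helper that assembles a multipart/form-data body.
struct MultipartBody {
    let boundary: String
    private(set) var data = Data()

    init(boundary: String = UUID().uuidString) {
        self.boundary = boundary
    }

    /// Value for the request's Content-Type header.
    var contentTypeHeader: String { "multipart/form-data; boundary=\(boundary)" }

    /// Append a plain text field.
    mutating func addField(name: String, value: String) {
        data.append(Data("--\(boundary)\r\n".utf8))
        data.append(Data("Content-Disposition: form-data; name=\"\(name)\"\r\n\r\n".utf8))
        data.append(Data("\(value)\r\n".utf8))
    }

    /// Append a binary file part.
    mutating func addFile(name: String, filename: String,
                          contentType: String, contents: Data) {
        data.append(Data("--\(boundary)\r\n".utf8))
        data.append(Data("Content-Disposition: form-data; name=\"\(name)\"; filename=\"\(filename)\"\r\n".utf8))
        data.append(Data("Content-Type: \(contentType)\r\n\r\n".utf8))
        data.append(contents)
        data.append(Data("\r\n".utf8))
    }

    /// Return the body with the closing boundary appended.
    func finalized() -> Data {
        var result = data
        result.append(Data("--\(boundary)--\r\n".utf8))
        return result
    }
}

// Usage with a fixed boundary and placeholder audio bytes.
var form = MultipartBody(boundary: "demo-boundary")
form.addField(name: "model", value: "whisper-1")
form.addFile(name: "file", filename: "audio.wav",
             contentType: "audio/wav", contents: Data([0x00, 0x01]))
let formBytes = form.finalized()
print(form.contentTypeHeader, formBytes.count)
```

Keeping `finalized()` non-mutating means the closing boundary cannot be appended twice by accident.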
### Step 3: Register the Provider
In `Type4Me/ASR/ASRProviderRegistry.swift`, add to the `all` array:
```swift
struct ASRProviderRegistry {
    static let all: [any ASRProviderConfig.Type] = [
        SherpaParaformerProvider.self,
        VolcengineProvider.self,
        DeepgramProvider.self,
        OpenAIWhisperProvider.self,  // ← Add your provider here
    ]
}
```
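Because the registry is an array of metatypes, looking up a provider by `providerID` and checking for duplicate identifiers are both short operations. The sketch below uses a trimmed-down stand-in protocol (`ProviderInfo`) and two demo providers so that it compiles on its own; the helper functions are hypothetical and do not describe how Type4Me resolves providers internally.

```swift
import Foundation

// Trimmed-down stand-in for ASRProviderConfig with only the members used here.
protocol ProviderInfo {
    static var providerID: String { get }
    static var displayName: String { get }
}

struct DemoLocalProvider: ProviderInfo {
    static let providerID = "demo_local"
    static let displayName = "Demo Local"
}

struct DemoCloudProvider: ProviderInfo {
    static let providerID = "demo_cloud"
    static let displayName = "Demo Cloud"
}

let demoRegistry: [any ProviderInfo.Type] = [
    DemoLocalProvider.self,
    DemoCloudProvider.self,
]

/// Find a provider type by its identifier (hypothetical helper).
func provider(withID id: String,
              in providers: [any ProviderInfo.Type]) -> (any ProviderInfo.Type)? {
    providers.first(where: { $0.providerID == id })
}

/// Return identifiers that occur more than once, sorted alphabetically.
/// Duplicates would make lookup by ID ambiguous.
func duplicateProviderIDs(in providers: [any ProviderInfo.Type]) -> [String] {
    var counts: [String: Int] = [:]
    for providerType in providers {
        counts[providerType.providerID, default: 0] += 1
    }
    return counts.filter { $0.value > 1 }.keys.sorted()
}

// Usage
if let found = provider(withID: "demo_cloud", in: demoRegistry) {
    print(found.displayName)
}
print(duplicateProviderIDs(in: demoRegistry))
```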
## Credentials Storage
Credentials are stored at `~/Library/Application Support/Type4Me/credentials.json` with permissions `0600`. Never hardcode secrets — always load via `CredentialStore`:
```swift
// Reading credentials
let store = CredentialStore.shared
let apiKey = store.get(providerID: "openai_whisper", key: "api_key")
// Writing credentials
store.set(providerID: "openai_whisper", key: "api_key", value: userInputKey)
// Checking if configured
let isConfigured = store.isConfigured(providerID: "openai_whisper",
                                      fields: OpenAIWhisperProvider.credentialFields)
```
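To illustrate what owner-only storage means in practice, here is a minimal sketch that writes a JSON file and sets its mode to `0600` with Foundation. It is not the `CredentialStore` implementation: the nested dictionary layout (provider ID, then key, then value), the function names, and the demo file name are all assumptions.

```swift
import Foundation

/// Hypothetical sketch: persist credentials as JSON, then restrict the file
/// to read/write for the owner only.
func saveCredentials(_ credentials: [String: [String: String]], to url: URL) throws {
    let data = try JSONEncoder().encode(credentials)
    try data.write(to: url, options: .atomic)
    // 0o600 is owner read/write, no access for group or others
    try FileManager.default.setAttributes(
        [.posixPermissions: NSNumber(value: 0o600)],
        ofItemAtPath: url.path
    )
}

/// Load the credentials dictionary written by `saveCredentials`.
func loadCredentials(from url: URL) throws -> [String: [String: String]] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([String: [String: String]].self, from: data)
}

// Usage: write to a temporary file and read the permission bits back.
let demoURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("credentials-demo.json")
try saveCredentials(["openai_whisper": ["api_key": "sk-placeholder"]], to: demoURL)
let demoAttributes = try FileManager.default.attributesOfItem(atPath: demoURL.path)
let demoMode = (demoAttributes[.posixPermissions] as? NSNumber)?.intValue ?? 0
print(String(demoMode, radix: 8))
```

In this sketch the file briefly exists with default permissions between the write and the `setAttributes` call; a stricter version would create the file with the restricted mode from the start.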
## Custom Processing Modes with Prompt Variables
Processing modes use LLM post-processing with three context variables:
| Variable | Value |
|---|---|
| `{text}` | Recognized speech text |
| `{selected}` | Text selected in active app at record start |
| `{clipboard}` | Clipboard content at record start |
Example custom mode prompts:
```swift
// Translate selection using voice command
let translatePrompt = """
The user selected this text: {selected}
Voice command: {text}
Execute the command on the selected text. Output only the result.
"""
// Code review via voice
let codeReviewPrompt = """
Code to review:
{clipboard}
Review instruction: {text}
Provide focused feedback addressing the instruction.
"""
// Email reply drafting
let emailPrompt = """
Original email: {selected}
My reply intent (spoken): {text}
Write a professional email reply. Output only the email body.
"""
```
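How the three variables get filled in is not shown in this document. A minimal sketch of one approach is below: it scans the template once and replaces only the three known placeholders, so placeholder-like text inside the substituted values is never expanded a second time. `PromptContext` and `renderPrompt` are hypothetical names.

```swift
import Foundation

/// Context captured for one recording; the name and shape are illustrative.
struct PromptContext {
    var text: String
    var selected: String
    var clipboard: String
}

/// Replace {text}, {selected}, and {clipboard} in a single left-to-right pass.
/// Unknown placeholders and stray braces are copied through unchanged.
func renderPrompt(_ template: String, context: PromptContext) -> String {
    let values = [
        "text": context.text,
        "selected": context.selected,
        "clipboard": context.clipboard,
    ]
    var output = ""
    var rest = Substring(template)
    while let open = rest.firstIndex(of: "{") {
        output.append(contentsOf: rest[..<open])
        let afterOpen = rest.index(after: open)
        if let close = rest[afterOpen...].firstIndex(of: "}"),
           let value = values[String(rest[afterOpen..<close])] {
            // Known placeholder: emit its value and skip past the closing brace
            output.append(contentsOf: value)
            rest = rest[rest.index(after: close)...]
        } else {
            // Not a known placeholder: keep the brace literally and move on
            output.append("{")
            rest = rest[afterOpen...]
        }
    }
    output.append(contentsOf: rest)
    return output
}

// Usage
let demoContext = PromptContext(text: "translate to French",
                                selected: "Good morning",
                                clipboard: "")
print(renderPrompt("Selected: {selected}\nCommand: {text}", context: demoContext))
```

A chain of `replacingOccurrences(of:with:)` calls would be shorter, but a spoken phrase that happens to contain `{clipboard}` would then be expanded by a later replacement.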
## Built-in Processing Modes
```swift
enum ProcessingMode {
    case fast                    // Direct ASR output, no added post-processing latency
    case performance             // Dual-channel: streaming + offline refinement
    case englishTranslation      // Chinese speech → English text
    case promptOptimize          // Raw prompt → optimized prompt via LLM
    case command                 // Voice command + selected/clipboard context → LLM action
    case custom(prompt: String)  // User-defined prompt template
}
```
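A session needs to know whether a mode involves an LLM call before it decides what to do with the ASR text. The sketch below adds a hypothetical `usesLLM` property to a local copy of the enum. The mapping is an assumption read off the comments above (only `fast` and `performance` are described without an LLM step) and may not match the real implementation.

```swift
// Local copy of the enum so the sketch compiles on its own.
enum ProcessingMode {
    case fast
    case performance
    case englishTranslation
    case promptOptimize
    case command
    case custom(prompt: String)
}

extension ProcessingMode {
    /// Hypothetical property: whether the mode is assumed to need an LLM call.
    var usesLLM: Bool {
        switch self {
        case .fast, .performance:
            return false
        case .englishTranslation, .promptOptimize, .command, .custom:
            return true
        }
    }
}

// Usage
let demoModes: [ProcessingMode] = [.fast, .command, .custom(prompt: "Summarize: {text}")]
print(demoModes.map { $0.usesLLM })
```

The exhaustive `switch` has no `default` case, so adding a new mode forces a decision about whether it needs the LLM.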
## Session State Machine
The core recording flow in `Session/`:
```
[Idle]
→ hotkey pressed → [Recording] → audio streams to ASR client
→ hotkey released/pressed again → [Processing]
→ ASR returns text → [LLM Post-processing] (if mode requires)
→ [Injecting] → text injected into active app
→ [Idle]
```
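The flow above can be modeled as a pure transition function over a state enum and an event enum. This is a sketch only: the type names, event names, and the `needsLLM` flag are hypothetical and are not taken from the `Session/` sources.

```swift
/// States from the flow diagram; names are illustrative.
enum SessionState: Equatable {
    case idle, recording, processing, postProcessing, injecting
}

/// Events that drive the session; names are illustrative.
enum SessionEvent {
    case hotkeyPressed
    case hotkeyReleased
    case asrFinished(needsLLM: Bool)
    case llmFinished
    case injected
}

/// Returns the next state, or nil when the event is not valid in `state`.
func nextState(from state: SessionState, on event: SessionEvent) -> SessionState? {
    switch (state, event) {
    case (.idle, .hotkeyPressed):
        return .recording
    case (.recording, .hotkeyReleased), (.recording, .hotkeyPressed):
        // Both hold-to-talk (release) and toggle (second press) end recording
        return .processing
    case (.processing, .asrFinished(let needsLLM)):
        return needsLLM ? .postProcessing : .injecting
    case (.postProcessing, .llmFinished):
        return .injecting
    case (.injecting, .injected):
        return .idle
    default:
        return nil
    }
}

// Usage: walk one full session that includes LLM post-processing.
var sessionState = SessionState.idle
let demoEvents: [SessionEvent] = [
    .hotkeyPressed, .hotkeyReleased, .asrFinished(needsLLM: true), .llmFinished, .injected,
]
for event in demoEvents {
    guard let next = nextState(from: sessionState, on: event) else { break }
    sessionState = next
}
print(sessionState)
```

Returning `nil` for invalid transitions lets the caller ignore stray events, such as a hotkey press while text is being injected, instead of corrupting the state.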
## Updating After Source Changes
```bash
cd type4me
git pull
bash scripts/deploy.sh
# SherpaOnnx does NOT need recompiling unless engine version changed
```
## Troubleshooting
### App won't open (security warning)
```bash
xattr -d com.apple.quarantine /Applications/Type4Me.app
```
### Local model not recognized in Settings
Verify the directory structure exactly matches:
```bash
ls ~/Library/Application\ Support/Type4Me/Models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/
# Must show: encoder.int8.onnx decoder.int8.onnx tokens.txt
```
### SherpaOnnx build fails
```bash
# Ensure cmake is installed
brew install cmake
# Clean and retry
rm -rf Frameworks/
bash scripts/build-sherpa.sh
```
### New ASR provider not appearing in Settings
- Confirm the provider type is added to `ASRProviderRegistry.all`
- Ensure `providerID` is unique across all providers
- Clean build: `swift package clean && bash scripts/deploy.sh`
### Audio not captured / no floating bar
- Grant microphone permission: System Settings → Privacy & Security → Microphone → Type4Me ✓
- Grant Accessibility permission for text injection: System Settings → Privacy & Security → Accessibility → Type4Me ✓
### Credentials not saving
```bash
# Check file exists and has correct permissions
ls -la ~/Library/Application\ Support/Type4Me/credentials.json
# Should show: -rw------- (0600)
# Fix permissions if needed:
chmod 0600 ~/Library/Application\ Support/Type4Me/credentials.json
```
### Export history to CSV
Open Settings → History → select date range → Export CSV. The SQLite database is at:
```bash
# Database path: ~/Library/Application\ Support/Type4Me/history.db
# Direct query:
sqlite3 ~/Library/Application\ Support/Type4Me/history.db \
  "SELECT datetime(timestamp,'unixepoch'), text FROM records ORDER BY timestamp DESC LIMIT 20;"
```
## System Requirements
- macOS 14.0 (Sonoma) or later
- Apple Silicon (M1/M2/M3/M4) recommended for local ASR inference
- Xcode Command Line Tools + CMake for source builds
- Internet connection only needed for cloud ASR providers