voice-system-expert

Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.

242 stars

by aiskillstore


Best use case

voice-system-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt, specifically when working with Quetrex's voice interface, the OpenAI Realtime API, WebRTC, or echo cancellation. It knows Quetrex's specific voice architecture decisions and patterns, and it exists to prevent changes that break the working voice system.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "voice-system-expert" skill to help with this workflow task. Context: Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/voice-system-expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/barnhardt-enterprises-inc/voice-system-expert/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/voice-system-expert/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How voice-system-expert Compares

Feature / Agent	voice-system-expert	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

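What does this skill do?

It gives an AI agent Quetrex's voice architecture decisions and patterns (OpenAI Realtime API, WebRTC, browser echo cancellation) so that changes to voice-related code do not break the working voice system.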

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Quetrex Voice System Expert

## CRITICAL: Read This First

Quetrex's voice system architecture is **extensively documented and battle-tested**. Before making ANY changes to voice-related code, you MUST read:

1. **docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md** (definitive architectural decision)
2. **docs/architecture/VOICE-SYSTEM.md** (technical implementation)
3. **docs/features/voice-interface.md** (user-facing features)

**Implementation location:** `src/lib/openai-realtime.ts`

## Core Architecture Decision

### ALWAYS-ON MICROPHONE + BROWSER AEC

This is **Decision 4** from VOICE-SYSTEM.md and the definitive approach documented in ADR-001.

```typescript
// ✅ CORRECT: Always-on microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,  // CRITICAL - browser handles echo cancellation
    noiseSuppression: true,
    autoGainControl: true,
  },
})

// Microphone track stays ENABLED throughout conversation
// Browser's native AEC prevents feedback loops
// Server-side VAD (Voice Activity Detection) handles turn detection
```
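
Because turn detection is delegated to the server, the client mostly has to tell the Realtime API which VAD mode to use. The following is a minimal sketch, assuming the `session.update` event with a `turn_detection` object of type `server_vad`; the exact field layout differs between Realtime API versions, and the helper name and numeric values are hypothetical, not Quetrex's actual settings.

```typescript
// Hypothetical helper: builds the JSON event that asks the server to run VAD.
// Field names are an assumption based on the Realtime API's `session.update`
// event; verify them against the API version in use.
interface ServerVadOptions {
  threshold: number // 0..1, higher means less sensitive
  prefixPaddingMs: number // audio kept before detected speech
  silenceDurationMs: number // silence that ends the user's turn
}

function buildServerVadUpdate(opts: ServerVadOptions): string {
  if (opts.threshold < 0 || opts.threshold > 1) {
    throw new RangeError('threshold must be between 0 and 1')
  }
  return JSON.stringify({
    type: 'session.update',
    session: {
      turn_detection: {
        type: 'server_vad',
        threshold: opts.threshold,
        prefix_padding_ms: opts.prefixPaddingMs,
        silence_duration_ms: opts.silenceDurationMs,
      },
    },
  })
}

// Possible usage once the WebRTC data channel is open:
// dataChannel.send(buildServerVadUpdate({ threshold: 0.5, prefixPaddingMs: 300, silenceDurationMs: 500 }))
```

The point of the sketch is that turn-taking is tuned through server settings, not by enabling or disabling the microphone track on the client.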

## How It Works

### Audio Pipeline
```
User speaks
    ↓
Microphone (always enabled, echoCancellation: true)
    ↓
WebRTC → OpenAI Realtime API
    ↓
Server-side VAD detects speech
    ↓
OpenAI processes and responds
    ↓
Audio response via WebRTC
    ↓
HTMLAudioElement playback (stays in browser pipeline)
    ↓
Browser AEC compares mic input + speaker output
    ↓
Echo automatically canceled (no feedback loop)
```
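
The two directions of this pipeline can be sketched as a small wiring function. This is an illustration under assumptions, not the code in `src/lib/openai-realtime.ts`: `wireVoicePipeline` and `PipelineDeps` are hypothetical names, and the callbacks stand in for the browser objects (`RTCPeerConnection`, `MediaStream`, `HTMLAudioElement`) so the wiring can be exercised without a DOM.

```typescript
// Sketch only: callbacks replace browser objects so the wiring is testable.
interface PipelineDeps<Track, Stream> {
  micStream: Stream
  micTracks: readonly Track[]
  sendTrack: (track: Track, stream: Stream) => void // e.g. pc.addTrack
  onRemoteStream: (handler: (remote: Stream) => void) => void // e.g. pc.ontrack
  playInBrowser: (remote: Stream) => void // e.g. audioEl.srcObject = remote
}

function wireVoicePipeline<Track, Stream>(deps: PipelineDeps<Track, Stream>): void {
  if (deps.micTracks.length === 0) {
    throw new Error('No microphone track to send')
  }
  // Uplink: microphone audio goes to the peer connection and is never toggled.
  for (const track of deps.micTracks) {
    deps.sendTrack(track, deps.micStream)
  }
  // Downlink: the response is played by the browser itself, which keeps the
  // speaker output inside the pipeline that browser AEC observes.
  deps.onRemoteStream((remote) => deps.playInBrowser(remote))
}

// In the browser this might be wired as (pc, micStream, audioEl assumed to exist):
// wireVoicePipeline({
//   micStream,
//   micTracks: micStream.getAudioTracks(),
//   sendTrack: (t, s) => { pc.addTrack(t, s) },
//   onRemoteStream: (h) => { pc.ontrack = (e) => h(e.streams[0]) },
//   playInBrowser: (s) => { audioEl.srcObject = s; void audioEl.play() },
// })
```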

### Why This Works

**Browser Echo Cancellation Requirements:**
1. Both microphone input AND speaker output must be in browser's audio graph
2. Audio must flow through browser's WebRTC stack
3. No manual intervention needed (browser handles it)

**This is the industry standard:**
- ChatGPT voice mode
- Google Meet
- Zoom
- Discord
- Microsoft Teams

They ALL use always-on microphone + browser AEC.

## DO NOT DO THESE THINGS

### ❌ DON'T: Toggle Microphone Track

```typescript
// ❌ WRONG - This breaks echo cancellation
async function pauseRecording() {
  microphone.enabled = false // DON'T DO THIS
}

async function resumeRecording() {
  microphone.enabled = true // DON'T DO THIS
}
```

**Why this fails:**
- Not industry standard
- Causes audio glitches
- Can break echo cancellation
- Adds artificial delays
- More complex state management
- No real benefit

### ❌ DON'T: Route Audio Outside Browser

```typescript
// ❌ WRONG - AudioWorklet bypass to native audio
const workletNode = new AudioWorkletNode(audioContext, 'bypass-processor')
workletNode.port.postMessage({ cmd: 'route-to-native' })
```

**Why this fails:**
- Breaks browser AEC (audio leaves browser pipeline)
- Causes echo/feedback loops
- Requires complex native audio handling
- Platform-specific implementations
- Already tried and failed (see abandoned-approaches/)

### ❌ DON'T: Implement Custom Echo Cancellation

```typescript
// ❌ WRONG - Custom AEC implementation
class CustomEchoCanceller {
  cancelEcho(input: AudioBuffer, output: AudioBuffer) {
    // Complex DSP code...
  }
}
```

**Why this fails:**
- Reinventing the wheel
- Browser AEC is battle-tested by billions of users
- Extremely complex to implement correctly
- Requires deep DSP knowledge
- Performance issues
- Platform-specific tuning needed

### ❌ DON'T: Add Artificial Delays

```typescript
// ❌ WRONG - Delays for echo prevention
await new Promise(resolve => setTimeout(resolve, 500))
await playAudio()
await new Promise(resolve => setTimeout(resolve, 500))
resumeRecording()
```

**Why this fails:**
- Not necessary (browser AEC handles it)
- Degrades user experience
- Adds latency
- Still doesn't prevent echo if implementation is wrong

## What You CAN Change

### ✅ DO: Adjust Audio Settings

```typescript
// ✅ Can tweak these settings
const constraints = {
  audio: {
    echoCancellation: true,      // MUST be true
    noiseSuppression: true,       // Can adjust
    autoGainControl: true,        // Can adjust
    sampleRate: 24000,            // Can change; a bare value is a preference, the browser may pick another rate
    channelCount: 1,              // Mono is fine for voice
  },
}
```
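
One way to keep these settings tunable while protecting the one that must not change is a small builder. The sketch below is an assumption about how that could look; `buildVoiceConstraints`, `VoiceAudioOptions`, and `DEFAULT_VOICE_AUDIO` are hypothetical names, not existing Quetrex code.

```typescript
// Hypothetical helper: tunable options can be overridden, echoCancellation cannot.
interface VoiceAudioOptions {
  noiseSuppression: boolean
  autoGainControl: boolean
  sampleRate: number
  channelCount: number
}

const DEFAULT_VOICE_AUDIO: VoiceAudioOptions = {
  noiseSuppression: true,
  autoGainControl: true,
  sampleRate: 24000,
  channelCount: 1,
}

function buildVoiceConstraints(overrides: Partial<VoiceAudioOptions> = {}) {
  return {
    audio: {
      ...DEFAULT_VOICE_AUDIO,
      ...overrides,
      // Applied last so that no override can switch it off at runtime.
      echoCancellation: true,
    },
  }
}

// Possible usage:
// const stream = await navigator.mediaDevices.getUserMedia(buildVoiceConstraints({ sampleRate: 48000 }))
```

The type of `overrides` leaves out `echoCancellation`, and the spread order covers untyped callers as well.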

### ✅ DO: Handle Connection States

```typescript
// ✅ Can manage connection lifecycle
async function connectVoice() {
  // Setup WebRTC connection
  // Start streaming
  // Handle connection events
}

async function disconnectVoice() {
  // Clean up WebRTC connection
  // Stop streaming
  // Release microphone
}
```
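
The connection lifecycle can be made explicit with a small transition table. This is a sketch under assumptions: the state names, `nextVoiceState`, and `releaseMicrophone` are hypothetical and not taken from the Quetrex codebase. There is deliberately no "paused" or "muted" state, which matches the always-on microphone rule.

```typescript
type VoiceState = 'idle' | 'connecting' | 'connected' | 'failed'

// Allowed transitions; anything else is treated as a programming error.
const ALLOWED_VOICE_TRANSITIONS: Record<VoiceState, readonly VoiceState[]> = {
  idle: ['connecting'],
  connecting: ['connected', 'failed', 'idle'],
  connected: ['idle', 'failed'],
  failed: ['connecting', 'idle'],
}

function nextVoiceState(current: VoiceState, target: VoiceState): VoiceState {
  if (ALLOWED_VOICE_TRANSITIONS[current].indexOf(target) === -1) {
    throw new Error(`Invalid voice state transition: ${current} -> ${target}`)
  }
  return target
}

// On disconnect the microphone is released by stopping its tracks
// (MediaStreamTrack.stop() in the browser), not by disabling them.
function releaseMicrophone(tracks: readonly { stop(): void }[]): void {
  for (const track of tracks) {
    track.stop()
  }
}
```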

### ✅ DO: Add UI Feedback

```typescript
// ✅ Can add visual indicators
function onVoiceActivity(active: boolean) {
  if (active) {
    // Show "listening" indicator
    // Animate microphone icon
  } else {
    // Show "idle" indicator
  }
}
```
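
Since VAD runs on the server, the `active` flag can be derived from server events instead of local audio analysis. The sketch below assumes the Realtime API's `input_audio_buffer.speech_started` and `input_audio_buffer.speech_stopped` events arrive as JSON strings on the data channel; `activityFromServerEvent` is a hypothetical helper name.

```typescript
type VoiceActivity = 'listening' | 'idle'

// Returns the UI state implied by a server event, or null if the event
// is unrelated to voice activity or cannot be parsed.
function activityFromServerEvent(raw: string): VoiceActivity | null {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return null
  }
  const eventType = (parsed as { type?: unknown }).type
  if (eventType === 'input_audio_buffer.speech_started') return 'listening'
  if (eventType === 'input_audio_buffer.speech_stopped') return 'idle'
  return null
}

// Possible usage with the handler above:
// dataChannel.addEventListener('message', (e) => {
//   const activity = activityFromServerEvent(e.data)
//   if (activity !== null) onVoiceActivity(activity === 'listening')
// })
```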

### ✅ DO: Handle Errors

```typescript
// ✅ Can improve error handling
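try {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true }, // keep AEC on here as well
  })
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading `.name`
  if (error instanceof DOMException && error.name === 'NotAllowedError') {
    // Show permission request UI
  } else if (error instanceof DOMException && error.name === 'NotFoundError') {
    // Show "no microphone found" error
  }
}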
```
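
The branches above can be pulled into a pure function that maps an error to a user-facing message, which keeps the UI code small and easy to test. This is a sketch: `describeMicError` and the message texts are hypothetical, while the error names are the standard ones `getUserMedia` rejects with.

```typescript
// Maps a getUserMedia rejection to a message for the user.
function describeMicError(error: unknown): string {
  const errorName =
    typeof error === 'object' && error !== null && 'name' in error
      ? String((error as { name: unknown }).name)
      : ''
  switch (errorName) {
    case 'NotAllowedError':
      return 'Microphone access was denied. Allow it in the browser settings and try again.'
    case 'NotFoundError':
      return 'No microphone was found. Connect one and try again.'
    case 'NotReadableError':
      return 'The microphone could not be read. Another application may be using it.'
    case 'OverconstrainedError':
      return 'The microphone does not support the requested audio settings.'
    default:
      return 'The microphone could not be started.'
  }
}
```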

## Implementation Location

**Primary file:** `src/lib/openai-realtime.ts`

**Key functions:**
- `setupMediaStream()` - Initializes microphone with correct constraints
- `connectToOpenAI()` - Establishes WebRTC connection
- `handleAudioResponse()` - Plays AI responses via HTMLAudioElement

**DO NOT modify:**
- Microphone enable/disable logic (should stay always-on)
- Echo cancellation settings (must be true)
- Audio routing (must stay in browser pipeline)

**CAN modify:**
- UI feedback and visual indicators
- Error handling and user messages
- Connection retry logic
- Audio quality settings (sample rate, etc.)
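
Connection retry logic is one of the areas that is safe to change. A common approach is capped exponential backoff; the sketch below is an assumption about how it could be done, with a hypothetical function name and illustrative default values. These delays apply only to reconnect attempts and are unrelated to the artificial playback delays ruled out above.

```typescript
// Delay before reconnect attempt number `attempt` (0-based): base * 2^attempt, capped.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new RangeError('attempt must be a non-negative integer')
  }
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

// With the defaults, attempts 0, 1, 2, 3, 4 wait 500, 1000, 2000, 4000, 8000 ms,
// and every later attempt waits 8000 ms.
```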

## Why This Architecture Was Chosen

From ADR-001:

**Option A: Trust Industry Pattern (CHOSEN)**
- ✅ Used by ChatGPT, Google Meet, Zoom, Discord
- ✅ Browser vendors optimize for this pattern
- ✅ Works across all platforms (macOS, Windows, Linux, mobile)
- ✅ No custom implementation needed
- ✅ Proven reliability

**Options Rejected:**
- ❌ Manual microphone toggling (not industry standard)
- ❌ AudioWorklet native bypass (breaks AEC)
- ❌ Custom echo cancellation (reinventing wheel)
- ❌ Native Rust WebRTC (2-4 weeks work for no benefit)

## Platform Context: Web Application

**IMPORTANT:** Quetrex is now a **pure web application**, not a Tauri desktop app.

**Why this matters for voice:**
- Browser echo cancellation works perfectly in all browsers
- No WKWebView limitations (that was the Tauri problem)
- Universal compatibility (Chrome, Safari, Firefox, Edge)
- No platform-specific audio handling needed
- Just works™️

**The WKWebView Problem (Historical):**
Quetrex was originally a Tauri desktop app. On macOS, Tauri uses WKWebView, which has a bug: **WebRTC audio playback doesn't work**. This forced us to try workarounds (AudioWorklet bypass, native audio routing), which all broke echo cancellation.

**Solution:** Convert to web application. Now echo cancellation works perfectly everywhere.

## Testing Voice Changes

If you must modify voice code:

1. **Test on real browsers** (not just dev tools)
2. **Test the echo scenario:**
   - Turn up speaker volume
   - Start voice conversation
   - Verify no feedback loop/echo
3. **Test across platforms:**
   - Chrome (most common)
   - Safari (macOS, iOS)
   - Firefox
   - Edge
4. **Test edge cases:**
   - Poor network connection
   - Microphone permission denied
   - Mid-conversation disconnect

## When to Consult Documentation

**Before any voice changes, read:**
1. `docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md`
2. `docs/architecture/VOICE-SYSTEM.md`
3. `docs/development/abandoned-approaches/`

**If you see mentions of:**
- Tauri → Ignore (Quetrex is web app now)
- WKWebView → Ignore (not relevant anymore)
- AudioWorklet bypass → Don't implement (already tried, failed)
- Manual mic toggling → Don't implement (not industry standard)

## Summary

**The Golden Rule:** Trust browser echo cancellation. Always-on microphone + `echoCancellation: true` + audio in browser pipeline = Perfect echo cancellation.

**DO:**
- Keep microphone enabled throughout conversation
- Use `echoCancellation: true`
- Keep audio in browser (HTMLAudioElement)
- Let OpenAI Realtime API handle VAD
- Trust the industry pattern

**DON'T:**
- Toggle microphone track
- Route audio outside browser
- Implement custom echo cancellation
- Add artificial delays
- Try to "improve" what already works

**If in doubt:** Read ADR-001 and VOICE-SYSTEM.md. The architecture is thoroughly documented for a reason.
