ClaudeChatGPTGeminiCursorGitHub CopilotDeepSeekAiderContinueVoice AI & Speech Processing

azure-ai-voicelive-ts

Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.

31,392 stars
Complexity: medium

About this skill

This skill leverages the Azure AI Voice Live SDK, providing JavaScript and TypeScript capabilities for building advanced real-time voice AI applications. It establishes bidirectional WebSocket communication with Azure AI services, allowing for dynamic interactions essential for voice assistants, live transcription, and command processing. Agents can utilize this SDK to generate code and integrate real-time voice capabilities into Node.js or modern browser environments. This allows applications to leverage Azure's sophisticated speech-to-text, text-to-speech, and language understanding services. As part of the 'antigravity-awesome-skills' collection, it aims to extend AI agents' abilities to interact with external services, specifically focusing on real-time voice processing and generation.

Best use case

Developing or integrating real-time voice assistants, building applications for live transcription, enabling voice-controlled interfaces, or creating interactive spoken language experiences powered by Azure AI.

Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.

Successful generation and execution of JavaScript/TypeScript code that establishes a real-time, bidirectional voice connection with Azure AI services. This enables features like live speech recognition, voice synthesis, and command processing within a designated application. The agent should be able to integrate and utilize Azure's voice capabilities effectively.

Practical example

Example input

Instruction to agent: 'Write a Node.js script that connects to Azure AI Voice Live, continuously transcribes incoming audio from a microphone, and outputs the text in real-time to the console. Also, ensure it can speak back a simple 'Hello from Azure!' greeting.'

Example output

The agent generates and presents a working JavaScript/TypeScript code snippet using `@azure/ai-voicelive` for Node.js. It includes instructions for setting up `AZURE_VOICELIVE_ENDPOINT` and handling authentication. The code would demonstrate connecting to the Azure service, sending simulated or actual audio, receiving real-time transcription events, and playing back synthesized speech.

```typescript
// Example code snippet generated by agent
import { VoiceLiveClient, AudioConfig } from "@azure/ai-voicelive";
import { SpeechConfig, SpeechSynthesizer } from "microsoft-cognitiveservices-speech-sdk";

// Assuming environment variables are set:
// AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
// AZURE_SPEECH_KEY=YOUR_SPEECH_KEY
// AZURE_SPEECH_REGION=YOUR_SPEECH_REGION

const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT;
const speechKey = process.env.AZURE_SPEECH_KEY;
const speechRegion = process.env.AZURE_SPEECH_REGION;

async function runVoiceLiveDemo() {
  if (!endpoint || !speechKey || !speechRegion) {
    console.error("Please set AZURE_VOICELIVE_ENDPOINT, AZURE_SPEECH_KEY, and AZURE_SPEECH_REGION environment variables.");
    return;
  }

  const audioConfig = AudioConfig.fromDefaultMicrophoneInput(); // Or from a stream
  const client = new VoiceLiveClient(endpoint, { audioConfig });

  client.on("connectionStatusChanged", (status) => {
    console.log(`Connection status: ${status.status}`);
  });

  client.on("speechRecognized", (event) => {
    if (event.result.text) {
      console.log(`Recognized: "${event.result.text}"`);
      if (event.result.text.toLowerCase().includes("hello azure")) {
        sayHelloFromAzure();
      }
    }
  });

  client.on("error", (error) => {
    console.error("Voice Live Error:", error);
  });

  await client.startLiveRecognition();
  console.log("Connected to Azure AI Voice Live. Start speaking...");

  function sayHelloFromAzure() {
    const speechConfig = SpeechConfig.fromSubscription(speechKey!, speechRegion!);
    const synthesizer = new SpeechSynthesizer(speechConfig);

    synthesizer.speakTextAsync("Hello from Azure! How can I assist you?",
      result => {
        if (result.reason === 1) { // 1 is ResultReason.SynthesizingAudioCompleted
          console.log("Spoke: Hello from Azure!");
        } else {
          console.error("Speech synthesis canceled, " + result.errorDetails);
        }
        synthesizer.close();
      }, error => {
        console.error("Speech synthesis error:", error);
        synthesizer.close();
      });
  }

  // Keep the process alive, or stop based on application logic
  // For demonstration, let's stop after 60 seconds
  setTimeout(async () => {
    await client.stopLiveRecognition();
    console.log("Live recognition stopped.");
  }, 60000);
}

runVoiceLiveDemo();
```

When to use this skill

  • When an AI agent needs to process or generate audio in real-time, establish a continuous voice interaction with a user, or integrate Azure AI's advanced speech capabilities into an application or service. Ideal for scenarios requiring low-latency voice communication and dynamic responses within JavaScript/TypeScript environments.

When not to use this skill

  • Not suitable for offline voice processing, simple one-shot speech-to-text tasks where real-time bidirectional communication is not required, or when an agent requires a pre-packaged API call rather than writing code using an SDK. Not applicable for non-voice related tasks or when Azure AI is not the desired speech provider.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/azure-ai-voicelive-ts/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/azure-ai-voicelive-ts/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/azure-ai-voicelive-ts/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How azure-ai-voicelive-ts Compares

Feature / Agentazure-ai-voicelive-tsStandard Approach
Platform SupportClaude, ChatGPT, Gemini, Cursor, GitHub Copilot, DeepSeek, Aider, ContinueLimited / Varies
Context Awareness High Baseline
Installation ComplexitymediumN/A

Frequently Asked Questions

What does this skill do?

Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.

Which AI agents support this skill?

This skill is designed for Claude, ChatGPT, Gemini, Cursor, GitHub Copilot, DeepSeek, Aider, Continue.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# @azure/ai-voicelive (JavaScript/TypeScript)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.

## Installation

```bash
npm install @azure/ai-voicelive @azure/identity
# TypeScript users
npm install @types/node
```

**Current Version**: 1.0.0-beta.3

**Supported Environments**:
- Node.js LTS versions (20+)
- Modern browsers (Chrome, Firefox, Safari, Edge)

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
# Optional: Logging
AZURE_LOG_LEVEL=info
```

## Authentication

### Microsoft Entra ID (Recommended)

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

const client = new VoiceLiveClient(endpoint, credential);
```

### API Key

```typescript
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");

const client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession()      → Configure session options
    ├── subscribe()          → Event handlers (Azure SDK pattern)
    ├── sendAudio()          → Stream audio input
    ├── addConversationItem() → Add messages/function outputs
    └── sendEvent()          → Send raw protocol events
```

## Quick Start

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;

// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

// Subscribe to events
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle streaming audio output
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseTextDelta: async (event, context) => {
    // Handle streaming text
    process.stdout.write(event.delta);
  },
  onInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
});

// Send audio from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
```

## Session Configuration

```typescript
await session.updateSession({
  // Modalities
  modalities: ["audio", "text"],
  
  // System instructions
  instructions: "You are a customer service representative.",
  
  // Voice selection
  voice: {
    type: "azure-standard",  // or "azure-custom", "openai"
    name: "en-US-AvaNeural",
  },
  
  // Turn detection (VAD)
  turnDetection: {
    type: "server_vad",      // or "azure_semantic_vad"
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  
  // Audio formats
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
  
  // Tools (function calling)
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" }
        },
        required: ["location"]
      }
    }
  ],
  toolChoice: "auto",
});
```

## Event Handling (Azure SDK Pattern)

The SDK uses a subscription-based event handling pattern:

```typescript
const subscription = session.subscribe({
  // Connection lifecycle
  onConnected: async (args, context) => {
    console.log("Connected:", args.connectionId);
  },
  onDisconnected: async (args, context) => {
    console.log("Disconnected:", args.code, args.reason);
  },
  onError: async (args, context) => {
    console.error("Error:", args.error.message);
  },
  
  // Session events
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onSessionUpdated: async (event, context) => {
    console.log("Session updated");
  },
  
  // Audio input events (VAD)
  onInputAudioBufferSpeechStarted: async (event, context) => {
    console.log("Speech started at:", event.audioStartMs);
  },
  onInputAudioBufferSpeechStopped: async (event, context) => {
    console.log("Speech stopped at:", event.audioEndMs);
  },
  
  // Transcription events
  onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
  onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  
  // Response events
  onResponseCreated: async (event, context) => {
    console.log("Response started");
  },
  onResponseDone: async (event, context) => {
    console.log("Response complete");
  },
  
  // Streaming text
  onResponseTextDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  onResponseTextDone: async (event, context) => {
    console.log("\n--- Text complete ---");
  },
  
  // Streaming audio
  onResponseAudioDelta: async (event, context) => {
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseAudioDone: async (event, context) => {
    console.log("Audio complete");
  },
  
  // Audio transcript (what assistant said)
  onResponseAudioTranscriptDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  
  // Function calling
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const result = await getWeather(args.location);
      
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(result),
      });
      
      await session.sendEvent({ type: "response.create" });
    }
  },
  
  // Catch-all for debugging
  onServerEvent: async (event, context) => {
    console.log("Event:", event.type);
  },
});

// Clean up when done
await subscription.close();
```

## Function Calling

```typescript
// Define tools in session config
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "Help users with weather information.",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City and state or country",
          },
        },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await fetchWeather(args.location);
      
      // Send function result
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });
      
      // Trigger response generation
      await session.sendEvent({ type: "response.create" });
    }
  },
});
```

## Voice Options

| Voice Type | Config | Example |
|------------|--------|---------|
| Azure Standard | `{ type: "azure-standard", name: "..." }` | `"en-US-AvaNeural"` |
| Azure Custom | `{ type: "azure-custom", name: "...", endpointId: "..." }` | Custom voice endpoint |
| Azure Personal | `{ type: "azure-personal", speakerProfileId: "..." }` | Personal voice clone |
| OpenAI | `{ type: "openai", name: "..." }` | `"alloy"`, `"echo"`, `"shimmer"` |

## Supported Models

| Model | Description | Use Case |
|-------|-------------|----------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi multimodal | Cost-effective applications |

## Turn Detection Options

```typescript
// Server VAD (default)
turnDetection: {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
}

// Azure Semantic VAD (smarter detection)
turnDetection: {
  type: "azure_semantic_vad",
}

// Azure Semantic VAD (English optimized)
turnDetection: {
  type: "azure_semantic_vad_en",
}

// Azure Semantic VAD (Multilingual)
turnDetection: {
  type: "azure_semantic_vad_multilingual",
}
```

## Audio Formats

| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionHandlers` | Event handler interface |
| `VoiceLiveSubscription` | Active event subscription |
| `ConnectionContext` | Context for connection events |
| `SessionContext` | Context for session events |
| `ServerEventUnion` | Union of all server events |

## Error Handling

```typescript
import {
  VoiceLiveError,
  VoiceLiveConnectionError,
  VoiceLiveAuthenticationError,
  VoiceLiveProtocolError,
} from "@azure/ai-voicelive";

const subscription = session.subscribe({
  onError: async (args, context) => {
    const { error } = args;
    
    if (error instanceof VoiceLiveConnectionError) {
      console.error("Connection error:", error.message);
    } else if (error instanceof VoiceLiveAuthenticationError) {
      console.error("Auth error:", error.message);
    } else if (error instanceof VoiceLiveProtocolError) {
      console.error("Protocol error:", error.message);
    }
  },
  
  onServerError: async (event, context) => {
    console.error("Server error:", event.error?.message);
  },
});
```

## Logging

```typescript
import { setLogLevel } from "@azure/logger";

// Enable verbose logging
setLogLevel("info");

// Or via environment variable
// AZURE_LOG_LEVEL=info
```

## Browser Usage

```typescript
// Browser requires bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";

// Use browser-compatible credential
const credential = new InteractiveBrowserCredential({
  clientId: "your-client-id",
  tenantId: "your-tenant-id",
});

const client = new VoiceLiveClient(endpoint, credential);

// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });

// Process audio and send to session
// ... (see samples for full implementation)
```

## Best Practices

1. **Always use `DefaultAzureCredential`** — Never hardcode API keys
2. **Set both modalities** — Include `["text", "audio"]` for voice assistants
3. **Use Azure Semantic VAD** — Better turn detection than basic server VAD
4. **Handle all error types** — Connection, auth, and protocol errors
5. **Clean up subscriptions** — Call `subscription.close()` when done
6. **Use appropriate audio format** — PCM16 at 24kHz for best quality

## Reference Links

| Resource | URL |
|----------|-----|
| npm Package | https://www.npmjs.com/package/@azure/ai-voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples |
| API Reference | https://learn.microsoft.com/javascript/api/@azure/ai-voicelive |

## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

Related Skills

azure-ai-voicelive-py

31392
from sickn33/antigravity-awesome-skills

Build real-time voice AI applications with bidirectional WebSocket communication.

Voice AI & Speech ProcessingClaudeChatGPTGemini

azure-ai-voicelive-java

31392
from sickn33/antigravity-awesome-skills

Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.

Voice AI & Speech ProcessingClaudeGitHub CopilotAider

azure-ai-voicelive-dotnet

31392
from sickn33/antigravity-awesome-skills

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

Voice AI & Speech ProcessingClaudeChatGPTGemini

microsoft-azure-webjobs-extensions-authentication-events-dotnet

31392
from sickn33/antigravity-awesome-skills

Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.

Identity Management / Authentication & AuthorizationClaude

azure-web-pubsub-ts

31392
from sickn33/antigravity-awesome-skills

Real-time messaging with WebSocket connections and pub/sub patterns.

Messaging & CommunicationClaude

azure-storage-queue-ts

31392
from sickn33/antigravity-awesome-skills

Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.

Cloud IntegrationClaude

azure-storage-queue-py

31392
from sickn33/antigravity-awesome-skills

Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.

Cloud IntegrationClaude

azure-storage-file-share-ts

31392
from sickn33/antigravity-awesome-skills

Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.

Cloud Storage ManagementClaude

azure-storage-file-share-py

31392
from sickn33/antigravity-awesome-skills

Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.

Cloud Storage ManagementClaude

azure-storage-file-datalake-py

31392
from sickn33/antigravity-awesome-skills

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.

Cloud Storage ManagementClaude

azure-storage-blob-ts

31392
from sickn33/antigravity-awesome-skills

Azure Blob Storage JavaScript/TypeScript SDK (@azure/storage-blob) for blob operations. Use for uploading, downloading, listing, and managing blobs and containers.

Cloud Storage ManagementClaude

azure-storage-blob-rust

31392
from sickn33/antigravity-awesome-skills

Azure Blob Storage SDK for Rust. Use for uploading, downloading, and managing blobs and containers.

Cloud Storage ManagementClaude