azure-ai-voicelive-ts
Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.
About this skill
This skill leverages the Azure AI Voice Live SDK to provide JavaScript and TypeScript capabilities for building real-time voice AI applications. The SDK establishes bidirectional WebSocket communication with Azure AI services, enabling the dynamic interactions needed for voice assistants, live transcription, and command processing. Agents can use this skill to generate code that integrates real-time voice capabilities into Node.js or modern browser environments, giving applications access to Azure's speech-to-text, text-to-speech, and language understanding services. As part of the 'antigravity-awesome-skills' collection, it extends AI agents' ability to interact with external services, with a focus on real-time voice processing and generation.
Best use case
Developing or integrating real-time voice assistants, building applications for live transcription, enabling voice-controlled interfaces, or creating interactive spoken language experiences powered by Azure AI.
A successful outcome is generated JavaScript/TypeScript code that, when executed, establishes a real-time, bidirectional voice connection with Azure AI services, enabling live speech recognition, voice synthesis, and command processing within the target application.
Practical example
Example input
Instruction to agent: "Write a Node.js script that connects to Azure AI Voice Live, continuously transcribes incoming audio from a microphone, and outputs the text in real time to the console. Also, ensure it can speak back a simple 'Hello from Azure!' greeting."
Example output
The agent generates and presents a working JavaScript/TypeScript code snippet using `@azure/ai-voicelive` for Node.js, along with instructions for setting `AZURE_VOICELIVE_ENDPOINT` and handling authentication. The code demonstrates connecting to the Azure service, sending simulated or live audio, receiving real-time transcription events, and playing back synthesized speech.
```typescript
// Example code snippet generated by agent. This is an illustrative sketch that
// follows the session-based API documented in the SKILL.md source further down
// this page. Microphone capture and audio playback are application-specific
// and only stubbed here.
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

// Assuming environment variables are set:
// AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT;

// App-specific playback stub; replace with real audio output handling.
function playAudioChunk(_chunk: unknown): void {
  // e.g. write PCM16 audio to the default output device
}

async function runVoiceLiveDemo() {
  if (!endpoint) {
    console.error("Please set the AZURE_VOICELIVE_ENDPOINT environment variable.");
    return;
  }

  const credential = new DefaultAzureCredential();
  const client = new VoiceLiveClient(endpoint, credential);
  const session = await client.startSession("gpt-4o-mini-realtime-preview");

  // Configure the session: transcribe incoming speech and greet the user by voice.
  await session.updateSession({
    modalities: ["text", "audio"],
    instructions: "Greet the user with 'Hello from Azure!' and then assist them.",
    voice: { type: "azure-standard", name: "en-US-AvaNeural" },
    turnDetection: { type: "server_vad", threshold: 0.5, prefixPaddingMs: 300, silenceDurationMs: 500 },
    inputAudioFormat: "pcm16",
    outputAudioFormat: "pcm16",
  });

  const subscription = session.subscribe({
    onConnected: async (args) => {
      console.log(`Connected (${args.connectionId}). Start speaking...`);
    },
    // Real-time transcription of the user's microphone audio.
    onConversationItemInputAudioTranscriptionCompleted: async (event) => {
      console.log(`Recognized: "${event.transcript}"`);
    },
    // Streaming synthesized speech from the assistant.
    onResponseAudioDelta: async (event) => {
      playAudioChunk(event.delta);
    },
    onError: async (args) => {
      console.error("Voice Live Error:", args.error.message);
    },
  });

  // Wire your microphone capture to this function (PCM16, 24 kHz).
  function sendAudioChunk(audioBuffer: ArrayBuffer) {
    session.sendAudio(audioBuffer);
  }

  // Keep the process alive, or stop based on application logic.
  // For demonstration, stop after 60 seconds.
  setTimeout(async () => {
    await subscription.close();
    console.log("Live session stopped.");
  }, 60000);
}

runVoiceLiveDemo();
```
When to use this skill
- When an AI agent needs to process or generate audio in real time, establish a continuous voice interaction with a user, or integrate Azure AI's advanced speech capabilities into an application or service. Ideal for scenarios requiring low-latency voice communication and dynamic responses within JavaScript/TypeScript environments.
When not to use this skill
- Not suitable for offline voice processing, simple one-shot speech-to-text tasks where real-time bidirectional communication is not required, or when an agent requires a pre-packaged API call rather than writing code using an SDK. Not applicable for non-voice related tasks or when Azure AI is not the desired speech provider.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/azure-ai-voicelive-ts/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How azure-ai-voicelive-ts Compares
| Feature / Agent | azure-ai-voicelive-ts | Standard Approach |
|---|---|---|
| Platform Support | Claude, ChatGPT, Gemini, Cursor, GitHub Copilot, DeepSeek, Aider, Continue | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |
Frequently Asked Questions
What does this skill do?
Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.
Which AI agents support this skill?
This skill is designed for Claude, ChatGPT, Gemini, Cursor, GitHub Copilot, DeepSeek, Aider, Continue.
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
SKILL.md Source
# @azure/ai-voicelive (JavaScript/TypeScript)
Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.
## Installation
```bash
npm install @azure/ai-voicelive @azure/identity
# TypeScript users
npm install --save-dev @types/node
```
**Current Version**: 1.0.0-beta.3
**Supported Environments**:
- Node.js LTS versions (20+)
- Modern browsers (Chrome, Firefox, Safari, Edge)
## Environment Variables
```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
# Optional: Logging
AZURE_LOG_LEVEL=info
```
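As a small sketch (not from the SDK docs), an application might read these variables and pick a credential accordingly, assuming the client constructor accepts either credential type as the Authentication examples below suggest:
```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT;
if (!endpoint) {
  throw new Error("AZURE_VOICELIVE_ENDPOINT is not set.");
}

// Prefer Entra ID; fall back to an API key only if one is provided.
const apiKey = process.env.AZURE_VOICELIVE_API_KEY;
const credential = apiKey ? new AzureKeyCredential(apiKey) : new DefaultAzureCredential();

const client = new VoiceLiveClient(endpoint, credential);
```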
## Authentication
### Microsoft Entra ID (Recommended)
```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";
const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
```
### API Key
```typescript
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");
const client = new VoiceLiveClient(endpoint, credential);
```
## Client Hierarchy
```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession() → Configure session options
    ├── subscribe() → Event handlers (Azure SDK pattern)
    ├── sendAudio() → Stream audio input
    ├── addConversationItem() → Add messages/function outputs
    └── sendEvent() → Send raw protocol events
```
## Quick Start
```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";
const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;
// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");
// Configure session
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

// Subscribe to events
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle streaming audio output
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseTextDelta: async (event, context) => {
    // Handle streaming text
    process.stdout.write(event.delta);
  },
  onInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
});

// Send audio from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
```
## Session Configuration
```typescript
await session.updateSession({
  // Modalities
  modalities: ["audio", "text"],

  // System instructions
  instructions: "You are a customer service representative.",

  // Voice selection
  voice: {
    type: "azure-standard", // or "azure-custom", "openai"
    name: "en-US-AvaNeural",
  },

  // Turn detection (VAD)
  turnDetection: {
    type: "server_vad", // or "azure_semantic_vad"
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },

  // Audio formats
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",

  // Tools (function calling)
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" }
        },
        required: ["location"]
      }
    }
  ],
  toolChoice: "auto",
});
```
## Event Handling (Azure SDK Pattern)
The SDK uses a subscription-based event handling pattern:
```typescript
const subscription = session.subscribe({
  // Connection lifecycle
  onConnected: async (args, context) => {
    console.log("Connected:", args.connectionId);
  },
  onDisconnected: async (args, context) => {
    console.log("Disconnected:", args.code, args.reason);
  },
  onError: async (args, context) => {
    console.error("Error:", args.error.message);
  },

  // Session events
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onSessionUpdated: async (event, context) => {
    console.log("Session updated");
  },

  // Audio input events (VAD)
  onInputAudioBufferSpeechStarted: async (event, context) => {
    console.log("Speech started at:", event.audioStartMs);
  },
  onInputAudioBufferSpeechStopped: async (event, context) => {
    console.log("Speech stopped at:", event.audioEndMs);
  },

  // Transcription events
  onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
  onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },

  // Response events
  onResponseCreated: async (event, context) => {
    console.log("Response started");
  },
  onResponseDone: async (event, context) => {
    console.log("Response complete");
  },

  // Streaming text
  onResponseTextDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  onResponseTextDone: async (event, context) => {
    console.log("\n--- Text complete ---");
  },

  // Streaming audio
  onResponseAudioDelta: async (event, context) => {
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseAudioDone: async (event, context) => {
    console.log("Audio complete");
  },

  // Audio transcript (what assistant said)
  onResponseAudioTranscriptDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },

  // Function calling
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const result = await getWeather(args.location);
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(result),
      });
      await session.sendEvent({ type: "response.create" });
    }
  },

  // Catch-all for debugging
  onServerEvent: async (event, context) => {
    console.log("Event:", event.type);
  },
});
// Clean up when done
await subscription.close();
```
## Function Calling
```typescript
// Define tools in session config
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "Help users with weather information.",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City and state or country",
          },
        },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await fetchWeather(args.location);

      // Send function result
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });

      // Trigger response generation
      await session.sendEvent({ type: "response.create" });
    }
  },
});
```
## Voice Options
| Voice Type | Config | Example |
|------------|--------|---------|
| Azure Standard | `{ type: "azure-standard", name: "..." }` | `"en-US-AvaNeural"` |
| Azure Custom | `{ type: "azure-custom", name: "...", endpointId: "..." }` | Custom voice endpoint |
| Azure Personal | `{ type: "azure-personal", speakerProfileId: "..." }` | Personal voice clone |
| OpenAI | `{ type: "openai", name: "..." }` | `"alloy"`, `"echo"`, `"shimmer"` |
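For illustration, here is how these voice shapes plug into `updateSession` (a sketch; the custom voice name and endpoint ID are placeholders):
```typescript
// Azure standard neural voice
await session.updateSession({
  voice: { type: "azure-standard", name: "en-US-AvaNeural" },
});

// Custom neural voice deployed to your own endpoint (placeholder IDs)
await session.updateSession({
  voice: { type: "azure-custom", name: "my-custom-voice", endpointId: "<custom-voice-endpoint-id>" },
});

// OpenAI voice
await session.updateSession({
  voice: { type: "openai", name: "alloy" },
});
```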
## Supported Models
| Model | Description | Use Case |
|-------|-------------|----------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi multimodal | Cost-effective applications |
## Turn Detection Options
```typescript
// Server VAD (default)
turnDetection: {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
}

// Azure Semantic VAD (smarter detection)
turnDetection: {
  type: "azure_semantic_vad",
}

// Azure Semantic VAD (English optimized)
turnDetection: {
  type: "azure_semantic_vad_en",
}

// Azure Semantic VAD (Multilingual)
turnDetection: {
  type: "azure_semantic_vad_multilingual",
}
```
## Audio Formats
| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |
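As a sketch, the table's formats map onto the documented `updateSession` fields, for example:
```typescript
// Telephony scenario: 8 kHz G.711 µ-law in and out
await session.updateSession({
  inputAudioFormat: "g711_ulaw",
  outputAudioFormat: "g711_ulaw",
});

// Voice assistant scenario: 16 kHz PCM input, default 24 kHz PCM output
await session.updateSession({
  inputAudioFormat: "pcm16-16000hz",
  outputAudioFormat: "pcm16",
});
```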
## Key Types Reference
| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionHandlers` | Event handler interface |
| `VoiceLiveSubscription` | Active event subscription |
| `ConnectionContext` | Context for connection events |
| `SessionContext` | Context for session events |
| `ServerEventUnion` | Union of all server events |
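A short sketch of how these types fit together, assuming they are exported from the package as listed above:
```typescript
import type {
  VoiceLiveSession,
  VoiceLiveSessionHandlers,
  VoiceLiveSubscription,
} from "@azure/ai-voicelive";

// Defining handlers as a standalone, typed object keeps them reusable and testable.
const handlers: VoiceLiveSessionHandlers = {
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onResponseTextDelta: async (event) => {
    process.stdout.write(event.delta);
  },
};

function attach(session: VoiceLiveSession): VoiceLiveSubscription {
  return session.subscribe(handlers);
}
```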
## Error Handling
```typescript
import {
  VoiceLiveError,
  VoiceLiveConnectionError,
  VoiceLiveAuthenticationError,
  VoiceLiveProtocolError,
} from "@azure/ai-voicelive";

const subscription = session.subscribe({
  onError: async (args, context) => {
    const { error } = args;
    if (error instanceof VoiceLiveConnectionError) {
      console.error("Connection error:", error.message);
    } else if (error instanceof VoiceLiveAuthenticationError) {
      console.error("Auth error:", error.message);
    } else if (error instanceof VoiceLiveProtocolError) {
      console.error("Protocol error:", error.message);
    }
  },
  onServerError: async (event, context) => {
    console.error("Server error:", event.error?.message);
  },
});
```
## Logging
```typescript
import { setLogLevel } from "@azure/logger";
// Enable verbose logging
setLogLevel("info");
// Or via environment variable
// AZURE_LOG_LEVEL=info
```
## Browser Usage
```typescript
// Browser requires bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";
// Use browser-compatible credential
const credential = new InteractiveBrowserCredential({
  clientId: "your-client-id",
  tenantId: "your-tenant-id",
});
const client = new VoiceLiveClient(endpoint, credential);
// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });
// Process audio and send to session
// ... (see samples for full implementation)
```
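Continuing from the snippet above, the audio wiring itself is application code. A minimal sketch (assuming a session obtained via `client.startSession(...)`; production code should use an AudioWorklet instead of the deprecated ScriptProcessorNode) converts Float32 microphone samples to 16-bit PCM and forwards them to the session:
```typescript
const session = await client.startSession("gpt-4o-mini-realtime-preview");

const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  // Convert Float32 samples in [-1, 1] to the 16-bit PCM expected by "pcm16".
  const pcm16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  session.sendAudio(pcm16.buffer);
};

source.connect(processor);
processor.connect(audioContext.destination);
```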
## Best Practices
1. **Always use `DefaultAzureCredential`** — Never hardcode API keys
2. **Set both modalities** — Include `["text", "audio"]` for voice assistants
3. **Use Azure Semantic VAD** — Better turn detection than basic server VAD
4. **Handle all error types** — Connection, auth, and protocol errors
5. **Clean up subscriptions** — Call `subscription.close()` when done
6. **Use appropriate audio format** — PCM16 at 24kHz for best quality
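A compact sketch that combines these practices (the model name and instructions are placeholders):
```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const client = new VoiceLiveClient(process.env.AZURE_VOICELIVE_ENDPOINT!, new DefaultAzureCredential()); // 1: Entra ID
const session = await client.startSession("gpt-4o-mini-realtime-preview");

await session.updateSession({
  modalities: ["text", "audio"],                 // 2: both modalities
  instructions: "You are a helpful voice assistant.",
  turnDetection: { type: "azure_semantic_vad" }, // 3: semantic VAD
  inputAudioFormat: "pcm16",                     // 6: PCM16 at 24 kHz
  outputAudioFormat: "pcm16",
});

const subscription = session.subscribe({
  onError: async (args) => console.error(args.error.message), // 4: handle errors
});

try {
  // ... stream audio and handle responses ...
} finally {
  await subscription.close();                    // 5: clean up the subscription
}
```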
## Reference Links
| Resource | URL |
|----------|-----|
| npm Package | https://www.npmjs.com/package/@azure/ai-voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples |
| API Reference | https://learn.microsoft.com/javascript/api/@azure/ai-voicelive |
## When to Use
This skill applies whenever the workflow or actions described in the overview need to be executed.
Related Skills
azure-ai-voicelive-py
Build real-time voice AI applications with bidirectional WebSocket communication.
azure-ai-voicelive-java
Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.
azure-ai-voicelive-dotnet
Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.
microsoft-azure-webjobs-extensions-authentication-events-dotnet
Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.
azure-web-pubsub-ts
Real-time messaging with WebSocket connections and pub/sub patterns.
azure-storage-queue-ts
Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.
azure-storage-queue-py
Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.
azure-storage-file-share-ts
Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.
azure-storage-file-share-py
Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.
azure-storage-file-datalake-py
Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
azure-storage-blob-ts
Azure Blob Storage JavaScript/TypeScript SDK (@azure/storage-blob) for blob operations. Use for uploading, downloading, listing, and managing blobs and containers.
azure-storage-blob-rust
Azure Blob Storage SDK for Rust. Use for uploading, downloading, and managing blobs and containers.