azure-ai-voicelive-ts

Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots in Node.js or browser environments. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant TypeScript", "bidirectional audio", "speech-to-speech JavaScript".

242 stars

byaiskillstore

View on GitHub Installation ↓

Best use case

azure-ai-voicelive-ts is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots in Node.js or browser environments. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant TypeScript", "bidirectional audio", "speech-to-speech JavaScript".

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "azure-ai-voicelive-ts" skill to help with this workflow task. Context: Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots in Node.js or browser environments. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant TypeScript", "bidirectional audio", "speech-to-speech JavaScript".

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/azure-ai-voicelive-ts/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/sickn33/azure-ai-voicelive-ts/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/azure-ai-voicelive-ts/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How azure-ai-voicelive-ts Compares

Feature / Agent	azure-ai-voicelive-ts	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# @azure/ai-voicelive (JavaScript/TypeScript)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.

## Installation

```bash
npm install @azure/ai-voicelive @azure/identity
# TypeScript users
npm install @types/node
```

**Current Version**: 1.0.0-beta.3

**Supported Environments**:
- Node.js LTS versions (20+)
- Modern browsers (Chrome, Firefox, Safari, Edge)

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
# Optional: Logging
AZURE_LOG_LEVEL=info
```

## Authentication

### Microsoft Entra ID (Recommended)

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

const client = new VoiceLiveClient(endpoint, credential);
```

### API Key

```typescript
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");

const client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession()      → Configure session options
    ├── subscribe()          → Event handlers (Azure SDK pattern)
    ├── sendAudio()          → Stream audio input
    ├── addConversationItem() → Add messages/function outputs
    └── sendEvent()          → Send raw protocol events
```

## Quick Start

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;

// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

// Subscribe to events
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle streaming audio output
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseTextDelta: async (event, context) => {
    // Handle streaming text
    process.stdout.write(event.delta);
  },
  onInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
});

// Send audio from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
```

## Session Configuration

```typescript
await session.updateSession({
  // Modalities
  modalities: ["audio", "text"],
  
  // System instructions
  instructions: "You are a customer service representative.",
  
  // Voice selection
  voice: {
    type: "azure-standard",  // or "azure-custom", "openai"
    name: "en-US-AvaNeural",
  },
  
  // Turn detection (VAD)
  turnDetection: {
    type: "server_vad",      // or "azure_semantic_vad"
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  
  // Audio formats
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
  
  // Tools (function calling)
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" }
        },
        required: ["location"]
      }
    }
  ],
  toolChoice: "auto",
});
```

## Event Handling (Azure SDK Pattern)

The SDK uses a subscription-based event handling pattern:

```typescript
const subscription = session.subscribe({
  // Connection lifecycle
  onConnected: async (args, context) => {
    console.log("Connected:", args.connectionId);
  },
  onDisconnected: async (args, context) => {
    console.log("Disconnected:", args.code, args.reason);
  },
  onError: async (args, context) => {
    console.error("Error:", args.error.message);
  },
  
  // Session events
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onSessionUpdated: async (event, context) => {
    console.log("Session updated");
  },
  
  // Audio input events (VAD)
  onInputAudioBufferSpeechStarted: async (event, context) => {
    console.log("Speech started at:", event.audioStartMs);
  },
  onInputAudioBufferSpeechStopped: async (event, context) => {
    console.log("Speech stopped at:", event.audioEndMs);
  },
  
  // Transcription events
  onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
  onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  
  // Response events
  onResponseCreated: async (event, context) => {
    console.log("Response started");
  },
  onResponseDone: async (event, context) => {
    console.log("Response complete");
  },
  
  // Streaming text
  onResponseTextDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  onResponseTextDone: async (event, context) => {
    console.log("\n--- Text complete ---");
  },
  
  // Streaming audio
  onResponseAudioDelta: async (event, context) => {
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseAudioDone: async (event, context) => {
    console.log("Audio complete");
  },
  
  // Audio transcript (what assistant said)
  onResponseAudioTranscriptDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  
  // Function calling
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const result = await getWeather(args.location);
      
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(result),
      });
      
      await session.sendEvent({ type: "response.create" });
    }
  },
  
  // Catch-all for debugging
  onServerEvent: async (event, context) => {
    console.log("Event:", event.type);
  },
});

// Clean up when done
await subscription.close();
```

## Function Calling

```typescript
// Define tools in session config
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "Help users with weather information.",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City and state or country",
          },
        },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await fetchWeather(args.location);
      
      // Send function result
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });
      
      // Trigger response generation
      await session.sendEvent({ type: "response.create" });
    }
  },
});
```

## Voice Options

| Voice Type | Config | Example |
|------------|--------|---------|
| Azure Standard | `{ type: "azure-standard", name: "..." }` | `"en-US-AvaNeural"` |
| Azure Custom | `{ type: "azure-custom", name: "...", endpointId: "..." }` | Custom voice endpoint |
| Azure Personal | `{ type: "azure-personal", speakerProfileId: "..." }` | Personal voice clone |
| OpenAI | `{ type: "openai", name: "..." }` | `"alloy"`, `"echo"`, `"shimmer"` |

## Supported Models

| Model | Description | Use Case |
|-------|-------------|----------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi multimodal | Cost-effective applications |

## Turn Detection Options

```typescript
// Server VAD (default)
turnDetection: {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
}

// Azure Semantic VAD (smarter detection)
turnDetection: {
  type: "azure_semantic_vad",
}

// Azure Semantic VAD (English optimized)
turnDetection: {
  type: "azure_semantic_vad_en",
}

// Azure Semantic VAD (Multilingual)
turnDetection: {
  type: "azure_semantic_vad_multilingual",
}
```

## Audio Formats

| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionHandlers` | Event handler interface |
| `VoiceLiveSubscription` | Active event subscription |
| `ConnectionContext` | Context for connection events |
| `SessionContext` | Context for session events |
| `ServerEventUnion` | Union of all server events |

## Error Handling

```typescript
import {
  VoiceLiveError,
  VoiceLiveConnectionError,
  VoiceLiveAuthenticationError,
  VoiceLiveProtocolError,
} from "@azure/ai-voicelive";

const subscription = session.subscribe({
  onError: async (args, context) => {
    const { error } = args;
    
    if (error instanceof VoiceLiveConnectionError) {
      console.error("Connection error:", error.message);
    } else if (error instanceof VoiceLiveAuthenticationError) {
      console.error("Auth error:", error.message);
    } else if (error instanceof VoiceLiveProtocolError) {
      console.error("Protocol error:", error.message);
    }
  },
  
  onServerError: async (event, context) => {
    console.error("Server error:", event.error?.message);
  },
});
```

## Logging

```typescript
import { setLogLevel } from "@azure/logger";

// Enable verbose logging
setLogLevel("info");

// Or via environment variable
// AZURE_LOG_LEVEL=info
```

## Browser Usage

```typescript
// Browser requires bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";

// Use browser-compatible credential
const credential = new InteractiveBrowserCredential({
  clientId: "your-client-id",
  tenantId: "your-tenant-id",
});

const client = new VoiceLiveClient(endpoint, credential);

// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });

// Process audio and send to session
// ... (see samples for full implementation)
```

## Best Practices

1. **Always use `DefaultAzureCredential`** — Never hardcode API keys
2. **Set both modalities** — Include `["text", "audio"]` for voice assistants
3. **Use Azure Semantic VAD** — Better turn detection than basic server VAD
4. **Handle all error types** — Connection, auth, and protocol errors
5. **Clean up subscriptions** — Call `subscription.close()` when done
6. **Use appropriate audio format** — PCM16 at 24kHz for best quality

## Reference Links

| Resource | URL |
|----------|-----|
| npm Package | https://www.npmjs.com/package/@azure/ai-voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples |
| API Reference | https://learn.microsoft.com/javascript/api/@azure/ai-voicelive |

Related Skills

azure-quotas

242

from aiskillstore/marketplace

Check/manage Azure quotas and usage across providers. For deployment planning, capacity validation, region selection. WHEN: "check quotas", "service limits", "current usage", "request quota increase", "quota exceeded", "validate capacity", "regional availability", "provisioning limits", "vCPU limit", "how many vCPUs available in my subscription".

DevOps & Infrastructure

microsoft-azure-webjobs-extensions-authentication-events-dotnet

242

from aiskillstore/marketplace

Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions. Use for token enrichment, custom claims, attribute collection, and OTP customization in Entra ID. Triggers: "Authentication Events", "WebJobsAuthenticationEventsTrigger", "OnTokenIssuanceStart", "OnAttributeCollectionStart", "custom claims", "token enrichment", "Entra custom extension", "authentication extension".

azure-web-pubsub-ts

242

from aiskillstore/marketplace

Build real-time messaging applications using Azure Web PubSub SDKs for JavaScript (@azure/web-pubsub, @azure/web-pubsub-client). Use when implementing WebSocket-based real-time features, pub/sub messaging, group chat, or live notifications.

azure-storage-queue-ts

242

from aiskillstore/marketplace

Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues. Supports visibility timeout, message encoding, and batch operations. Triggers: "queue storage", "@azure/storage-queue", "QueueServiceClient", "QueueClient", "send message", "receive message", "dequeue", "visibility timeout".

azure-storage-queue-py

242

from aiskillstore/marketplace

Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing. Triggers: "queue storage", "QueueServiceClient", "QueueClient", "message queue", "dequeue".

azure-storage-file-share-ts

242

from aiskillstore/marketplace

Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations. Use for creating shares, managing directories, uploading/downloading files, and handling file metadata. Supports Azure Files SMB protocol scenarios. Triggers: "file share", "@azure/storage-file-share", "ShareServiceClient", "ShareClient", "SMB", "Azure Files".

azure-storage-file-share-py

242

from aiskillstore/marketplace

Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud. Triggers: "azure-storage-file-share", "ShareServiceClient", "ShareClient", "file share", "SMB".

azure-storage-file-datalake-py

242

from aiskillstore/marketplace

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations. Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".

azure-storage-blob-ts

242

from aiskillstore/marketplace

Azure Blob Storage JavaScript/TypeScript SDK (@azure/storage-blob) for blob operations. Use for uploading, downloading, listing, and managing blobs and containers. Supports block blobs, append blobs, page blobs, SAS tokens, and streaming. Triggers: "blob storage", "@azure/storage-blob", "BlobServiceClient", "ContainerClient", "upload blob", "download blob", "SAS token", "block blob".

azure-storage-blob-rust

242

from aiskillstore/marketplace

Azure Blob Storage SDK for Rust. Use for uploading, downloading, and managing blobs and containers. Triggers: "blob storage rust", "BlobClient rust", "upload blob rust", "download blob rust", "container rust".

azure-storage-blob-py

242

from aiskillstore/marketplace

Azure Blob Storage SDK for Python. Use for uploading, downloading, listing blobs, managing containers, and blob lifecycle. Triggers: "blob storage", "BlobServiceClient", "ContainerClient", "BlobClient", "upload blob", "download blob".

azure-storage-blob-java

242

from aiskillstore/marketplace

Build blob storage applications with Azure Storage Blob SDK for Java. Use when uploading, downloading, or managing files in Azure Blob Storage, working with containers, or implementing streaming data operations.