azure-ai-voicelive-dotnet
Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.
About this skill
This skill gives AI agents the foundational knowledge and installation instructions for integrating the Azure AI Voice Live SDK into .NET applications. It enables agents to build real-time voice AI solutions that use bidirectional WebSocket communication for natural, low-latency interactions. Agents learn to set up the required NuGet packages (`Azure.AI.VoiceLive`, `Azure.Identity`, and `NAudio` for audio handling), configure Azure service endpoints, specify AI models (e.g., `gpt-4o-realtime-preview`), and select voices (e.g., `en-US-AvaNeural`). This supports building highly responsive voice assistants, interactive agents, and other voice-enabled applications in the .NET ecosystem, particularly for AI agents capable of code generation and execution.
Best use case
- Developing custom real-time voice assistants that require low-latency, bidirectional audio streaming.
- Building interactive voice-enabled applications with dynamic conversational flows in a .NET environment.
- Creating .NET-based prototypes or production systems that integrate Azure AI's advanced voice capabilities for transcription, synthesis, and interaction.
- Enabling agents to generate and integrate code snippets for real-time voice processing within a development workflow.
The expected outcome is a functional .NET application or a successfully integrated code module that uses Azure AI Voice Live for real-time bidirectional voice communication, capable of understanding spoken input and generating natural-sounding speech output. This could range from a proof-of-concept voice assistant to a component within a larger system.
Practical example
Example input
Develop a .NET console application that uses Azure AI Voice Live to create a real-time English-speaking voice assistant. It should listen for user input and respond using the 'en-US-AvaNeural' voice, connecting to a specified Azure AI endpoint and using 'gpt-4o-realtime-preview' for language understanding. Provide the necessary `dotnet` commands for setup and a basic C# code structure.
Example output
{"status": "success", "message": "Generated .NET project structure and setup instructions for a real-time voice assistant.", "project_name": "VoiceAssistantApp", "setup_commands": ["mkdir VoiceAssistantApp", "cd VoiceAssistantApp", "dotnet new console", "dotnet add package Azure.AI.VoiceLive", "dotnet add package Azure.Identity", "dotnet add package NAudio"], "environment_variables_to_set": {"AZURE_VOICELIVE_ENDPOINT": "https://<your_resource>.services.ai.azure.com/", "AZURE_VOICELIVE_MODEL": "gpt-4o-realtime-preview", "AZURE_VOICELIVE_VOICE": "en-US-AvaNeural"}, "sample_code_snippet_description": "Basic C# code demonstrating real-time voice capture, Azure AI Voice Live client initialization, and bidirectional communication. (Actual C# code would be provided here.)"}
When to use this skill
- When the AI agent is tasked with building or integrating a .NET application that requires sophisticated, real-time voice AI features.
- When an agent needs to leverage Azure AI services for high-quality voice input and output with minimal latency.
- For projects requiring custom voice models or specific language/voice configurations that Azure AI Voice Live supports.
- When the AI agent operates within a development environment capable of executing .NET commands and managing NuGet packages.
When not to use this skill
- When the AI agent needs to perform a simple, one-off voice recognition or text-to-speech task without building a full application.
- When the target environment is not .NET or cannot accommodate installing SDKs and compiling code.
- When the task requires an immediate, pre-built voice API call rather than the development of a new application.
- For agents that do not have code generation or execution capabilities in a .NET context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/azure-ai-voicelive-dotnet/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How azure-ai-voicelive-dotnet Compares
| Feature / Agent | azure-ai-voicelive-dotnet | Standard Approach |
|---|---|---|
| Platform Support | Claude, ChatGPT, Gemini, GitHub Copilot, Cursor, DeepSeek, Windsurf, Cline, Roo Code, OpenCode, Aider, Continue | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |
Frequently Asked Questions
What does this skill do?
Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.
Which AI agents support this skill?
This skill is designed for Claude, ChatGPT, Gemini, GitHub Copilot, Cursor, DeepSeek, Windsurf, Cline, Roo Code, OpenCode, Aider, Continue.
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
# Azure.AI.VoiceLive (.NET)
Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.
## Installation
```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio # For audio capture/playback
```
**Current Versions**: Stable v1.0.0, Preview v1.1.0-beta.1
## Environment Variables
```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```
## Authentication
### Microsoft Entra ID (Recommended)
```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```
**Required Role**: `Cognitive Services User` (assign in Azure Portal → Access control)
### API Key
```csharp
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```
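The two credential paths can be combined. A minimal sketch that picks the credential at startup, assuming the environment variable names listed above:

```csharp
using Azure;
using Azure.Identity;
using Azure.AI.VoiceLive;

// Use an API key if one is configured; otherwise fall back to Microsoft Entra ID (recommended).
var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_VOICELIVE_ENDPOINT is not set"));
string? apiKey = Environment.GetEnvironmentVariable("AZURE_VOICELIVE_API_KEY");

VoiceLiveClient client = string.IsNullOrEmpty(apiKey)
    ? new VoiceLiveClient(endpoint, new DefaultAzureCredential())
    : new VoiceLiveClient(endpoint, new AzureKeyCredential(apiKey));
```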
## Client Hierarchy
```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```
## Core Workflow
### 1. Start Session and Configure
```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;
var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());
var model = "gpt-4o-mini-realtime-preview";
// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);
// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);
await session.ConfigureSessionAsync(sessionOptions);
```
### 2. Process Events
```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or other audio library
            break;
        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;
        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see Function Calling section)
            break;
        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;
        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```
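The `audioDelta` case above leaves playback to the caller. A minimal playback sketch using NAudio, assuming the 24 kHz, 16-bit mono PCM output format configured in step 1:

```csharp
using NAudio.Wave;

// Streaming playback buffer; 24 kHz, 16-bit, mono matches OutputAudioFormat.Pcm16.
var playbackBuffer = new BufferedWaveProvider(new WaveFormat(24000, 16, 1))
{
    BufferDuration = TimeSpan.FromSeconds(30),    // headroom for long responses
    DiscardOnBufferOverflow = true
};

using var waveOut = new WaveOutEvent();
waveOut.Init(playbackBuffer);
waveOut.Play();

// In the audio delta case of the event loop above:
// byte[] audioData = audioDelta.Delta.ToArray();
// playbackBuffer.AddSamples(audioData, 0, audioData.Length);
```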
### 3. Send User Message
```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```
### 4. Function Calling
```csharp
// Requires: using System.Text.Json; (for JsonSerializer below)

// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
    {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state or country"
            }
        },
        "required": ["location"]
    }
    """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";

        // Call external service
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";

        // Send response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```
## Voice Options
| Voice Type | Class | Example |
|------------|-------|---------|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | `AzureStandardVoice` | `"en-US-Ava:DragonHDLatestNeural"` |
| Azure Custom | `AzureCustomVoice` | Custom voice with endpoint ID |
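A voice is selected through `VoiceLiveSessionOptions.Voice`. A brief sketch based on the table above; the commented-out `AzureCustomVoice` constructor parameters are an assumption, so verify them against the API reference linked below:

```csharp
// Standard neural voice (as used in the core workflow above)
sessionOptions.Voice = new AzureStandardVoice("en-US-AvaNeural");

// HD voices also use AzureStandardVoice, just with the HD voice name from the table
sessionOptions.Voice = new AzureStandardVoice("en-US-Ava:DragonHDLatestNeural");

// Custom voice — the (voice name, endpoint ID) parameters shown here are an assumption;
// check the AzureCustomVoice constructor in the API reference.
// sessionOptions.Voice = new AzureCustomVoice("my-custom-voice", "<custom-voice-endpoint-id>");
```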
## Supported Models
| Model | Description |
|-------|-------------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| `phi4-mm-realtime` | Cost-effective multimodal |
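Any of these model identifiers can be passed to `StartSessionAsync`. A small sketch, assuming the `AZURE_VOICELIVE_MODEL` environment variable defined earlier and the `client` created in the Authentication section:

```csharp
// Read the model from configuration, defaulting to the lightweight realtime model.
string model = Environment.GetEnvironmentVariable("AZURE_VOICELIVE_MODEL")
    ?? "gpt-4o-mini-realtime-preview";

using VoiceLiveSession session = await client.StartSessionAsync(model);
```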
## Key Types Reference
| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |
## Best Practices
1. **Always set both modalities** — Include `Text` and `Audio` for voice assistants
2. **Use `AzureSemanticVadTurnDetection`** — Provides natural conversation flow
3. **Configure appropriate silence duration** — 500ms typical to avoid premature cutoffs
4. **Use `using` statement** — Ensures proper session disposal
5. **Handle all event types** — Check for errors, audio, text, and function calls
6. **Use DefaultAzureCredential** — Never hardcode API keys
## Error Handling
```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error, can ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```
## Audio Configuration
- **Input Format**: `InputAudioFormat.Pcm16` (16-bit PCM)
- **Output Format**: `OutputAudioFormat.Pcm16`
- **Sample Rate**: 24kHz recommended
- **Channels**: Mono (see the capture sketch below)
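A minimal microphone-capture sketch with NAudio that matches these settings and streams chunks to the `session` from the core workflow. Passing raw PCM bytes to `SendAudioAsync` is an assumption based on the client hierarchy above; confirm the parameter type in the API reference:

```csharp
using NAudio.Wave;

// Capture 16-bit PCM, 24 kHz, mono to match InputAudioFormat.Pcm16.
var waveIn = new WaveInEvent
{
    WaveFormat = new WaveFormat(24000, 16, 1),
    BufferMilliseconds = 100   // ~100 ms chunks keep round-trip latency low
};

waveIn.DataAvailable += async (_, e) =>
{
    // Copy only the recorded bytes and forward them to the live session.
    byte[] chunk = new byte[e.BytesRecorded];
    Buffer.BlockCopy(e.Buffer, 0, chunk, 0, e.BytesRecorded);
    await session.SendAudioAsync(chunk);   // parameter type assumed; see the API reference
};

waveIn.StartRecording();
// Call waveIn.StopRecording() when the conversation ends.
```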
## Related SDKs
| SDK | Purpose | Install |
|-----|---------|---------|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |
## Reference Links
| Resource | URL |
|----------|-----|
| NuGet Package | https://www.nuget.org/packages/Azure.AI.VoiceLive |
| API Reference | https://learn.microsoft.com/dotnet/api/azure.ai.voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.VoiceLive |
| Quickstart | https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-quickstart |
## When to Use
This skill applies whenever you need to execute the workflow or actions described in the overview.
Related Skills
azure-ai-voicelive-ts
Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.
azure-ai-voicelive-py
Build real-time voice AI applications with bidirectional WebSocket communication.
azure-ai-voicelive-java
Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.
microsoft-azure-webjobs-extensions-authentication-events-dotnet
Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.
dotnet-backend
Build ASP.NET Core 8+ backend services with EF Core, auth, background jobs, and production API patterns.
dotnet-backend-patterns
Master C#/.NET patterns for building production-grade APIs, MCP servers, and enterprise backends with modern best practices (2024/2025).
dotnet-architect
Expert .NET backend architect specializing in C#, ASP.NET Core, Entity Framework, Dapper, and enterprise application patterns.
azure-web-pubsub-ts
Real-time messaging with WebSocket connections and pub/sub patterns.
azure-storage-queue-ts
Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.
azure-storage-queue-py
Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.
azure-storage-file-share-ts
Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.
azure-storage-file-share-py
Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.