azure-ai-voicelive-dotnet

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

16 stars

Best use case

azure-ai-voicelive-dotnet is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

Teams using azure-ai-voicelive-dotnet should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/azure-ai-voicelive-dotnet/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/azure-ai-voicelive-dotnet/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/azure-ai-voicelive-dotnet/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How azure-ai-voicelive-dotnet Compares

Feature / Agent	azure-ai-voicelive-dotnet	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Azure.AI.VoiceLive (.NET)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.

## Installation

```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio                    # For audio capture/playback
```

**Current Versions**: Stable v1.0.0, Preview v1.1.0-beta.1

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication

### Microsoft Entra ID (Recommended)

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

**Required Role**: `Cognitive Services User` (assign in Azure Portal → Access control)

### API Key

```csharp
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

## Core Workflow

### 1. Start Session and Configure

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());

var model = "gpt-4o-mini-realtime-preview";

// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```

### 2. Process Events

```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or other audio library
            break;
            
        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;
            
        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see Function Calling section)
            break;
            
        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;
            
        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

### 3. Send User Message

```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

### 4. Function Calling

```csharp
// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
        {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state or country"
                }
            },
            "required": ["location"]
        }
        """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";
        
        // Call external service
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";
        
        // Send response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```

## Voice Options

| Voice Type | Class | Example |
|------------|-------|---------|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | `AzureStandardVoice` | `"en-US-Ava:DragonHDLatestNeural"` |
| Azure Custom | `AzureCustomVoice` | Custom voice with endpoint ID |

## Supported Models

| Model | Description |
|-------|-------------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| `phi4-mm-realtime` | Cost-effective multimodal |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |

## Best Practices

1. **Always set both modalities** — Include `Text` and `Audio` for voice assistants
2. **Use `AzureSemanticVadTurnDetection`** — Provides natural conversation flow
3. **Configure appropriate silence duration** — 500ms typical to avoid premature cutoffs
4. **Use `using` statement** — Ensures proper session disposal
5. **Handle all event types** — Check for errors, audio, text, and function calls
6. **Use DefaultAzureCredential** — Never hardcode API keys

## Error Handling

```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error, can ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```

## Audio Configuration

- **Input Format**: `InputAudioFormat.Pcm16` (16-bit PCM)
- **Output Format**: `OutputAudioFormat.Pcm16`
- **Sample Rate**: 24kHz recommended
- **Channels**: Mono

## Related SDKs

| SDK | Purpose | Install |
|-----|---------|---------|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |

## Reference Links

| Resource | URL |
|----------|-----|
| NuGet Package | https://www.nuget.org/packages/Azure.AI.VoiceLive |
| API Reference | https://learn.microsoft.com/dotnet/api/azure.ai.voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.VoiceLive |
| Quickstart | https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-quickstart |

## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

Related Skills

dotnet-wpf

from diegosouzapw/awesome-omni-skill

.NET WPF component and application patterns Triggers on: **/*.xaml, **/*.cs

azure-storage-file-datalake-py

from diegosouzapw/awesome-omni-skill

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.

azure-ai-vision-imageanalysis-java

from diegosouzapw/awesome-omni-skill

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

azure-ai-contentunderstanding-py

from diegosouzapw/awesome-omni-skill

Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.

azure-ai-contentsafety-ts

from diegosouzapw/awesome-omni-skill

Analyze text and images for harmful content using Azure AI Content Safety (@azure-rest/ai-content-safety). Use when moderating user-generated content, detecting hate speech, violence, sexual conten...

azure-ai-contentsafety-py

from diegosouzapw/awesome-omni-skill

Azure AI Content Safety SDK for Python. Use for detecting harmful content in text and images with multi-severity classification.

azure-ai-contentsafety-java

from diegosouzapw/awesome-omni-skill

Build content moderation applications with Azure AI Content Safety SDK for Java. Use when implementing text/image analysis, blocklist management, or harm detection for hate, violence, sexual conten...

dotnet-gha-patterns

from diegosouzapw/awesome-omni-skill

Composes GitHub Actions workflows. Reusable workflows, composite actions, matrix, caching.

azure-communication-callautomation-java

from diegosouzapw/awesome-omni-skill

Build call automation workflows with Azure Communication Services Call Automation Java SDK. Use when implementing IVR systems, call routing, call recording, DTMF recognition, text-to-speech, or AI-...

azure-ai-transcription-py

from diegosouzapw/awesome-omni-skill

Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization.

senior-dotnet

from diegosouzapw/awesome-omni-skill

World-class C# and .NET development skill for enterprise applications, ASP.NET Core web APIs, and cloud-native systems. Expertise in .NET 8, Entity Framework Core, ASP.NET Core Identity, Blazor, and microservices architecture. Includes project scaffolding, dependency management, security implementation, and performance optimization.

microsoft-azure-webjobs-extensions-authentication-events-dotnet

from diegosouzapw/awesome-omni-skill

Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.