ClaudeChatGPTGeminiGitHub CopilotCursorDeepSeekWindsurfClineRoo CodeOpenCodeAiderContinueVoice AI & Speech Processing

azure-ai-voicelive-dotnet

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

31,392 stars
Complexity: medium

About this skill

This skill provides AI agents with the foundational knowledge and installation instructions for integrating the Azure AI Voice Live SDK into .NET applications. It enables agents to construct sophisticated real-time voice AI solutions that utilize bidirectional WebSocket communication for natural, low-latency interactions. Agents can learn to set up the necessary NuGet packages (`Azure.AI.VoiceLive`, `Azure.Identity`, `NAudio` for audio handling), configure Azure service endpoints, specify AI models (e.g., `gpt-4o-realtime-preview`), and select voices (e.g., `en-US-AvaNeural`). This facilitates the creation of highly responsive voice assistants, interactive agents, and other voice-enabled applications within the .NET ecosystem, particularly for AI agents capable of code generation and execution.

Best use case

Developing custom real-time voice assistants that require low-latency, bidirectional audio streaming. Building interactive voice-enabled applications with dynamic conversational flows in a .NET environment. Creating .NET-based prototypes or production systems that integrate Azure AI's advanced voice capabilities for transcription, synthesis, and interaction. Enabling agents to generate and integrate code snippets for real-time voice processing within a development workflow.

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

A functional .NET application or a successfully integrated code module that utilizes Azure AI Voice Live for real-time bidirectional voice communication, capable of understanding spoken input and generating natural-sounding speech output. This could range from a proof-of-concept voice assistant to a component within a larger system.

Practical example

Example input

Develop a .NET console application that uses Azure AI Voice Live to create a real-time English-speaking voice assistant. It should listen for user input and respond using the 'en-US-AvaNeural' voice, connecting to a specified Azure AI endpoint and using 'gpt-4o-realtime-preview' for language understanding. Provide the necessary `dotnet` commands for setup and a basic C# code structure.

Example output

{"status": "success", "message": "Generated .NET project structure and setup instructions for a real-time voice assistant.", "project_name": "VoiceAssistantApp", "setup_commands": ["mkdir VoiceAssistantApp", "cd VoiceAssistantApp", "dotnet new console", "dotnet add package Azure.AI.VoiceLive", "dotnet add package Azure.Identity", "dotnet add package NAudio"], "environment_variables_to_set": {"AZURE_VOICELIVE_ENDPOINT": "https://<your_resource>.services.ai.azure.com/", "AZURE_VOICELIVE_MODEL": "gpt-4o-realtime-preview", "AZURE_VOICELIVE_VOICE": "en-US-AvaNeural"}, "sample_code_snippet_description": "Basic C# code demonstrating real-time voice capture, Azure AI Voice Live client initialization, and bidirectional communication. (Actual C# code would be provided here.)"}

When to use this skill

  • When the AI agent is tasked with building or integrating a .NET application that requires sophisticated, real-time voice AI features.
  • When an agent needs to leverage Azure AI services for high-quality voice input and output with minimal latency.
  • For projects requiring custom voice models or specific language/voice configurations that Azure AI Voice Live supports.
  • When the AI agent operates within a development environment capable of executing .NET commands and managing NuGet packages.

When not to use this skill

  • When the AI agent needs to perform a simple, one-off voice recognition or text-to-speech task without building a full application.
  • When the target environment is not .NET or cannot accommodate installing SDKs and compiling code.
  • When the task requires an immediate, pre-built voice API call rather than the development of a new application.
  • For agents that do not have code generation or execution capabilities in a .NET context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/azure-ai-voicelive-dotnet/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/azure-ai-voicelive-dotnet/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/azure-ai-voicelive-dotnet/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How azure-ai-voicelive-dotnet Compares

Feature / Agentazure-ai-voicelive-dotnetStandard Approach
Platform SupportClaude, ChatGPT, Gemini, GitHub Copilot, Cursor, DeepSeek, Windsurf, Cline, Roo Code, OpenCode, Aider, ContinueLimited / Varies
Context Awareness High Baseline
Installation ComplexitymediumN/A

Frequently Asked Questions

What does this skill do?

Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.

Which AI agents support this skill?

This skill is designed for Claude, ChatGPT, Gemini, GitHub Copilot, Cursor, DeepSeek, Windsurf, Cline, Roo Code, OpenCode, Aider, Continue.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Azure.AI.VoiceLive (.NET)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.

## Installation

```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio                    # For audio capture/playback
```

**Current Versions**: Stable v1.0.0, Preview v1.1.0-beta.1

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication

### Microsoft Entra ID (Recommended)

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

**Required Role**: `Cognitive Services User` (assign in Azure Portal → Access control)

### API Key

```csharp
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

## Core Workflow

### 1. Start Session and Configure

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());

var model = "gpt-4o-mini-realtime-preview";

// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```

### 2. Process Events

```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or other audio library
            break;
            
        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;
            
        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see Function Calling section)
            break;
            
        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;
            
        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

### 3. Send User Message

```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

### 4. Function Calling

```csharp
// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
        {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state or country"
                }
            },
            "required": ["location"]
        }
        """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";
        
        // Call external service
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";
        
        // Send response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```

## Voice Options

| Voice Type | Class | Example |
|------------|-------|---------|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | `AzureStandardVoice` | `"en-US-Ava:DragonHDLatestNeural"` |
| Azure Custom | `AzureCustomVoice` | Custom voice with endpoint ID |

## Supported Models

| Model | Description |
|-------|-------------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| `phi4-mm-realtime` | Cost-effective multimodal |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |

## Best Practices

1. **Always set both modalities** — Include `Text` and `Audio` for voice assistants
2. **Use `AzureSemanticVadTurnDetection`** — Provides natural conversation flow
3. **Configure appropriate silence duration** — 500ms typical to avoid premature cutoffs
4. **Use `using` statement** — Ensures proper session disposal
5. **Handle all event types** — Check for errors, audio, text, and function calls
6. **Use DefaultAzureCredential** — Never hardcode API keys

## Error Handling

```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error, can ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```

## Audio Configuration

- **Input Format**: `InputAudioFormat.Pcm16` (16-bit PCM)
- **Output Format**: `OutputAudioFormat.Pcm16`
- **Sample Rate**: 24kHz recommended
- **Channels**: Mono

## Related SDKs

| SDK | Purpose | Install |
|-----|---------|---------|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |

## Reference Links

| Resource | URL |
|----------|-----|
| NuGet Package | https://www.nuget.org/packages/Azure.AI.VoiceLive |
| API Reference | https://learn.microsoft.com/dotnet/api/azure.ai.voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.VoiceLive |
| Quickstart | https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-quickstart |

## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

Related Skills

azure-ai-voicelive-ts

31392
from sickn33/antigravity-awesome-skills

Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.

Voice AI & Speech ProcessingClaudeChatGPTGemini

azure-ai-voicelive-py

31392
from sickn33/antigravity-awesome-skills

Build real-time voice AI applications with bidirectional WebSocket communication.

Voice AI & Speech ProcessingClaudeChatGPTGemini

azure-ai-voicelive-java

31392
from sickn33/antigravity-awesome-skills

Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.

Voice AI & Speech ProcessingClaudeGitHub CopilotAider

microsoft-azure-webjobs-extensions-authentication-events-dotnet

31392
from sickn33/antigravity-awesome-skills

Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.

Identity Management / Authentication & AuthorizationClaude

dotnet-backend

31392
from sickn33/antigravity-awesome-skills

Build ASP.NET Core 8+ backend services with EF Core, auth, background jobs, and production API patterns.

Code GenerationClaude

dotnet-backend-patterns

31392
from sickn33/antigravity-awesome-skills

Master C#/.NET patterns for building production-grade APIs, MCP servers, and enterprise backends with modern best practices (2024/2025).

Software DevelopmentClaude

dotnet-architect

31392
from sickn33/antigravity-awesome-skills

Expert .NET backend architect specializing in C#, ASP.NET Core, Entity Framework, Dapper, and enterprise application patterns.

Software DevelopmentClaude

azure-web-pubsub-ts

31392
from sickn33/antigravity-awesome-skills

Real-time messaging with WebSocket connections and pub/sub patterns.

Messaging & CommunicationClaude

azure-storage-queue-ts

31392
from sickn33/antigravity-awesome-skills

Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.

Cloud IntegrationClaude

azure-storage-queue-py

31392
from sickn33/antigravity-awesome-skills

Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.

Cloud IntegrationClaude

azure-storage-file-share-ts

31392
from sickn33/antigravity-awesome-skills

Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.

Cloud Storage ManagementClaude

azure-storage-file-share-py

31392
from sickn33/antigravity-awesome-skills

Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.

Cloud Storage ManagementClaude