azure-ai-voicelive-java
Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.
About this skill
This skill integrates the Azure AI VoiceLive SDK for Java, which supports real-time, bidirectional voice conversations with AI assistants. The SDK streams audio over a WebSocket connection for low-latency exchange, so an agent can take spoken input and return spoken responses within the same session. It suits voice-driven applications, interactive virtual assistants, and other conversational interfaces. The integration requires a Java development environment and valid Azure AI Speech service credentials.
Best use case
Building AI assistants capable of engaging in fluid, natural voice conversations; developing interactive voice response (IVR) systems powered by AI; creating voice-controlled applications; enhancing AI agent capabilities with live speech interaction for customer support, educational tools, or virtual companions.
An AI assistant capable of initiating and sustaining natural, real-time spoken dialogues, interpreting user voice commands and queries, and generating contextually appropriate, natural-sounding spoken responses through the Azure AI VoiceLive service.
Practical example
Example input
User: "Help me integrate Azure AI VoiceLive for real-time voice interaction in my Java application. I need the dependencies and environment variable setup to get started."
Example output
To integrate Azure AI VoiceLive for Java, add the following Maven dependency to your `pom.xml` (for Gradle, declare the same coordinates in `build.gradle`):
```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
```
Set these environment variables:
```bash
AZURE_VOICELIVE_ENDPOINT=https://<your-azure-ai-resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
```
Then, you can initialize the client in your Java code:
```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
String endpoint = System.getenv("AZURE_VOICELIVE_ENDPOINT");
String apiKey = System.getenv("AZURE_VOICELIVE_API_KEY");
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(endpoint)
    .credential(new AzureKeyCredential(apiKey))
    .buildAsyncClient();
// Use 'client' to start sessions, send audio, and receive AI responses.
```
This client enables your application to handle real-time voice input and output with Azure AI.
When to use this skill
- When your AI agent or application needs to process and respond to spoken language in real time
- When developing voice-first applications with Azure AI services
- When low-latency, bidirectional voice communication is critical for an engaging user experience
When not to use this skill
- When text-based communication is sufficient for the application's needs
- When real-time voice interaction is not a requirement
- When operating in environments where Java development or Azure AI services are not feasible or preferred
- For simple, one-off speech-to-text or text-to-speech tasks that do not require full conversational context
Installation
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/azure-ai-voicelive-java/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How azure-ai-voicelive-java Compares
| Feature / Agent | azure-ai-voicelive-java | Standard Approach |
|---|---|---|
| Platform Support | Claude, GitHub Copilot, Aider, Continue | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | medium | N/A |
Frequently Asked Questions
What does this skill do?
It integrates the Azure AI VoiceLive SDK for Java, which enables real-time, bidirectional voice conversations with AI assistants over WebSocket.
Which AI agents support this skill?
This skill is designed for Claude, GitHub Copilot, Aider, Continue.
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Azure AI VoiceLive SDK for Java
Real-time, bidirectional voice conversations with AI assistants using WebSocket technology.
## Installation
```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
```
## Environment Variables
```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
```
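A missing or empty variable otherwise tends to surface later as a confusing connection failure, so it can help to validate both values at startup. The following is a minimal sketch in plain Java; the `VoiceLiveSettings` class and its `require` method are hypothetical helpers for illustration and are not part of the VoiceLive SDK.

```java
import java.util.Map;

/** Hypothetical helper: fails fast when a required setting is missing. */
final class VoiceLiveSettings {
    private VoiceLiveSettings() {
    }

    /** Returns the trimmed value for name, or throws if it is absent or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value.trim();
    }
}
```

In application code this could be called as `VoiceLiveSettings.require(System.getenv(), "AZURE_VOICELIVE_ENDPOINT")` before building the client.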
## Authentication
### API Key
```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_VOICELIVE_API_KEY")))
    .buildAsyncClient();
```
### DefaultAzureCredential (Recommended)
```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.identity.DefaultAzureCredentialBuilder;
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
```
## Key Concepts
| Concept | Description |
|---------|-------------|
| `VoiceLiveAsyncClient` | Main entry point for voice sessions |
| `VoiceLiveSessionAsyncClient` | Active WebSocket connection for streaming |
| `VoiceLiveSessionOptions` | Configuration for session behavior |
### Audio Requirements
- **Sample Rate**: 24kHz (24000 Hz)
- **Bit Depth**: 16-bit PCM
- **Channels**: Mono (1 channel)
- **Format**: Signed PCM, little-endian
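
The requirements above translate directly into buffer sizes. As a rough sketch, the class below expresses the same format with the standard Java Sound `AudioFormat` type and computes how many bytes a given duration of audio occupies. `VoiceLiveAudioFormat` and its methods are hypothetical names used for illustration; they are not part of the VoiceLive SDK.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper: size arithmetic for 24 kHz, 16-bit, mono PCM. */
final class VoiceLiveAudioFormat {
    static final int SAMPLE_RATE_HZ = 24_000;
    static final int BITS_PER_SAMPLE = 16;
    static final int CHANNELS = 1;

    private VoiceLiveAudioFormat() {
    }

    /** Java Sound description of signed, little-endian, 16-bit mono PCM at 24 kHz. */
    static AudioFormat pcm16Mono24k() {
        // Arguments: sample rate, bits per sample, channels, signed, bigEndian.
        return new AudioFormat(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS, true, false);
    }

    /** Bytes needed to hold durationMs milliseconds of audio in this format. */
    static int bytesForMillis(int durationMs) {
        int bytesPerSample = BITS_PER_SAMPLE / 8;
        return SAMPLE_RATE_HZ * CHANNELS * bytesPerSample * durationMs / 1000;
    }
}
```

With these numbers, one second of audio is 48,000 bytes and a 20 ms chunk is 960 bytes.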
## Core Workflow
### 1. Start Session
```java
import com.azure.ai.voicelive.VoiceLiveSessionAsyncClient;
import reactor.core.publisher.Mono;

// Keep the returned session; the following steps send and receive through it.
VoiceLiveSessionAsyncClient session = client.startSession("gpt-4o-realtime-preview")
    .flatMap(s -> {
        System.out.println("Session started");
        // Subscribe to events
        s.receiveEvents()
            .subscribe(
                event -> System.out.println("Event: " + event.getType()),
                error -> System.err.println("Error: " + error.getMessage())
            );
        return Mono.just(s);
    })
    .block();
```
### 2. Configure Session Options
```java
import com.azure.ai.voicelive.models.*;
import com.azure.core.util.BinaryData;
import java.util.Arrays;

// Server-side voice activity detection (VAD) decides when a user turn starts and ends.
ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
    .setThreshold(0.5)             // Sensitivity (0.0-1.0)
    .setPrefixPaddingMs(300)       // Audio kept before detected speech
    .setSilenceDurationMs(500)     // Silence that ends the turn
    .setInterruptResponse(true)    // Allow the user to interrupt a response
    .setAutoTruncate(true)
    .setCreateResponse(true);

AudioInputTranscriptionOptions transcription = new AudioInputTranscriptionOptions(
    AudioInputTranscriptionOptionsModel.WHISPER_1);

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setInstructions("You are a helpful AI voice assistant.")
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
    .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
    .setInputAudioFormat(InputAudioFormat.PCM16)
    .setOutputAudioFormat(OutputAudioFormat.PCM16)
    .setInputAudioSamplingRate(24000)
    .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
    .setInputAudioEchoCancellation(new AudioEchoCancellation())
    .setInputAudioTranscription(transcription)
    .setTurnDetection(turnDetection);

// Send the configuration to the active session from step 1
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
session.sendEvent(updateEvent).subscribe();
```
### 3. Send Audio Input
```java
byte[] audioData = readAudioChunk(); // Your PCM16 audio data
session.sendInputAudio(BinaryData.fromBytes(audioData)).subscribe();
```
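The snippet above leaves `readAudioChunk()` to the application. As one possible way to prepare its data, the sketch below encodes 16-bit samples as signed little-endian bytes and splits a PCM buffer into fixed-size chunks. `Pcm16Chunker` and its methods are hypothetical helpers written in plain Java; they are not part of the VoiceLive SDK, and the chunk size is an assumption for the caller to choose.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: prepares PCM16 data for streaming in chunks. */
final class Pcm16Chunker {
    private Pcm16Chunker() {
    }

    /** Encodes 16-bit samples as signed little-endian bytes (2 bytes per sample). */
    static byte[] toLittleEndianBytes(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (short sample : samples) {
            buffer.putShort(sample);
        }
        return buffer.array();
    }

    /** Splits PCM bytes into chunks of at most chunkSize bytes, preserving order. */
    static List<byte[]> split(byte[] pcm, int chunkSize) {
        if (chunkSize <= 0 || chunkSize % 2 != 0) {
            // An odd size would cut a 16-bit sample in half.
            throw new IllegalArgumentException("chunkSize must be a positive even number");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += chunkSize) {
            int end = Math.min(pcm.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(pcm, offset, end));
        }
        return chunks;
    }
}
```

Each resulting chunk could then be passed to `session.sendInputAudio(BinaryData.fromBytes(chunk))` as shown above.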
### 4. Handle Events
```java
session.receiveEvents().subscribe(event -> {
    ServerEventType eventType = event.getType();
    if (ServerEventType.SESSION_CREATED.equals(eventType)) {
        System.out.println("Session created");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
        System.out.println("User started speaking");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
        System.out.println("User stopped speaking");
    } else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
        if (event instanceof SessionUpdateResponseAudioDelta) {
            SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
            playAudioChunk(audioEvent.getDelta());
        }
    } else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
        System.out.println("Response complete");
    } else if (ServerEventType.ERROR.equals(eventType)) {
        if (event instanceof SessionUpdateError) {
            SessionUpdateError errorEvent = (SessionUpdateError) event;
            System.err.println("Error: " + errorEvent.getError().getMessage());
        }
    }
});
```
## Voice Configuration
### OpenAI Voices
```java
// Available: ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
```
### Azure Voices
```java
// Azure Standard Voice
options.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));
// Azure Custom Voice
options.setVoice(BinaryData.fromObject(new AzureCustomVoice("myVoice", "endpointId")));
// Azure Personal Voice
options.setVoice(BinaryData.fromObject(
    new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
```
## Function Calling
```java
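// parametersSchema is a caller-defined object (for example a Map) holding the
// JSON Schema for the function arguments; define it before this snippet.
VoiceLiveFunctionDefinition weatherFunction = new VoiceLiveFunctionDefinition("get_weather")
    .setDescription("Get current weather for a location")
    .setParameters(BinaryData.fromObject(parametersSchema));

// Register the function as a tool the model may call during the session.
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setTools(Arrays.asList(weatherFunction))
    .setInstructions("You have access to weather information.");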
```
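The `parametersSchema` value used above is not defined in the snippet. One way to supply it, assuming a single required `location` string argument (an assumption made for illustration, not something the source specifies), is to build the JSON Schema as nested maps. `WeatherToolSchema` is a hypothetical helper class in plain Java.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: JSON Schema for the get_weather arguments as nested maps. */
final class WeatherToolSchema {
    private WeatherToolSchema() {
    }

    static Map<String, Object> parametersSchema() {
        // Assumed argument: a single required "location" string.
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("type", "string");
        location.put("description", "City name, for example Seattle");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("location", location);

        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);
        schema.put("required", List.of("location"));
        return schema;
    }
}
```

The result could then be assigned with `Map<String, Object> parametersSchema = WeatherToolSchema.parametersSchema();` before the function definition is created.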
## Best Practices
1. **Use the async client**: VoiceLive requires reactive patterns
2. **Configure turn detection** for natural conversation flow
3. **Enable noise reduction** for better speech recognition
4. **Handle interruptions** gracefully with `setInterruptResponse(true)`
5. **Use Whisper transcription** for input audio transcription
6. **Close sessions** properly when the conversation ends
## Error Handling
```java
import reactor.core.publisher.Flux;

session.receiveEvents()
    // Log the failure before deciding how to recover
    .doOnError(error -> System.err.println("Connection error: " + error.getMessage()))
    .onErrorResume(error -> {
        // Attempt reconnection or cleanup
        return Flux.empty();
    })
    .subscribe();
```
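If the application chooses to reconnect inside `onErrorResume`, spacing out the attempts avoids hammering the service. The sketch below computes a capped exponential delay per attempt. This policy is an assumption for illustration, not something the SDK prescribes, and `ReconnectBackoff` is a hypothetical helper in plain Java.

```java
import java.time.Duration;

/** Hypothetical helper: capped exponential backoff between reconnection attempts. */
final class ReconnectBackoff {
    private ReconnectBackoff() {
    }

    /** Returns base * 2^(attempt - 1), capped at max. Attempts are numbered from 1. */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        // Cap the exponent so the multiplication cannot overflow for realistic inputs.
        int shift = Math.min(attempt - 1, 20);
        long millis = base.toMillis() * (1L << shift);
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }
}
```

For example, with a 500 ms base and a 30 s cap, the first three attempts wait 500 ms, 1 s, and 2 s, and later attempts level off at 30 s.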
## Reference Links
| Resource | URL |
|----------|-----|
| GitHub Source | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive/src/samples |
## When to Use
Use this skill when you need to carry out the workflow or actions described in the overview.
Related Skills
azure-ai-voicelive-ts
Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication.
azure-ai-voicelive-py
Build real-time voice AI applications with bidirectional WebSocket communication.
azure-ai-voicelive-dotnet
Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication.
n8n-code-javascript
Write JavaScript code in n8n Code nodes. Use when writing JavaScript in n8n, using $input/$json/$node syntax, making HTTP requests with $helpers, working with dates using DateTime, troubleshooting Code node errors, or choosing between Code node modes.
microsoft-azure-webjobs-extensions-authentication-events-dotnet
Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.
javascript-typescript-typescript-scaffold
You are a TypeScript project architecture expert specializing in scaffolding production-ready Node.js and frontend applications. Generate complete project structures with modern tooling (pnpm, Vite, N
javascript-pro
Master modern JavaScript with ES6+, async patterns, and Node.js APIs. Handles promises, event loops, and browser/Node compatibility.
javascript-mastery
33+ essential JavaScript concepts every developer should know, inspired by [33-js-concepts](https://github.com/leonardomso/33-js-concepts).
java-pro
Master Java 21+ with modern features like virtual threads, pattern matching, and Spring Boot 3.x. Expert in the latest Java ecosystem including GraalVM, Project Loom, and cloud-native patterns.
azure-web-pubsub-ts
Real-time messaging with WebSocket connections and pub/sub patterns.
azure-storage-queue-ts
Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.
azure-storage-queue-py
Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.