discord-voice

Real-time voice conversations in Discord voice channels with Claude AI

3,891 stars

Best use case

discord-voice is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Real-time voice conversations in Discord voice channels with Claude AI

Teams using discord-voice should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/discord-voice/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/avatarneil/discord-voice/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/discord-voice/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How discord-voice Compares

Feature / Agentdiscord-voiceStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Real-time voice conversations in Discord voice channels with Claude AI

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Discord Voice Plugin for Clawdbot

Real-time voice conversations in Discord voice channels. Join a voice channel, speak, and have your words transcribed, processed by Claude, and spoken back.

## Features

- **Join/Leave Voice Channels**: Via slash commands, CLI, or agent tool
- **Voice Activity Detection (VAD)**: Automatically detects when users are speaking
- **Speech-to-Text**: Whisper API (OpenAI), Deepgram, or Local Whisper (Offline)
- **Streaming STT**: Real-time transcription with Deepgram WebSocket (~1s latency reduction)
- **Agent Integration**: Transcribed speech is routed through the Clawdbot agent
- **Text-to-Speech**: OpenAI TTS, ElevenLabs, or Kokoro (Local/Offline)
- **Audio Playback**: Responses are spoken back in the voice channel
- **Barge-in Support**: Stops speaking immediately when user starts talking
- **Auto-reconnect**: Automatic heartbeat monitoring and reconnection on disconnect

## Requirements

- Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
- API keys for STT and TTS providers
- System dependencies for voice:
  - `ffmpeg` (audio processing)
  - Native build tools for `@discordjs/opus` and `sodium-native`

## Installation

### 1. Install System Dependencies

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg build-essential python3

# Fedora/RHEL
sudo dnf install ffmpeg gcc-c++ make python3

# macOS
brew install ffmpeg
```

### 2. Install via ClawdHub

```bash
clawdhub install discord-voice
```

Or manually:

```bash
cd ~/.clawdbot/extensions
git clone <repository-url> discord-voice
cd discord-voice
npm install
```

### 3. Configure in clawdbot.json

```json5
{
  plugins: {
    entries: {
      "discord-voice": {
        enabled: true,
        config: {
          sttProvider: "local-whisper",
          ttsProvider: "openai",
          ttsVoice: "nova",
          vadSensitivity: "medium",
          allowedUsers: [], // Empty = allow all users
          silenceThresholdMs: 1500,
          maxRecordingMs: 30000,
          openai: {
            apiKey: "sk-...", // Or use OPENAI_API_KEY env var
          },
        },
      },
    },
  },
}
```

### 4. Discord Bot Setup

Ensure your Discord bot has these permissions:

- **Connect** - Join voice channels
- **Speak** - Play audio
- **Use Voice Activity** - Detect when users speak

Add these to your bot's OAuth2 URL or configure in Discord Developer Portal.

## Configuration

| Option                | Type     | Default           | Description                                     |
| --------------------- | -------- | ----------------- | ----------------------------------------------- |
| `enabled`             | boolean  | `true`            | Enable/disable the plugin                       |
| `sttProvider`         | string   | `"local-whisper"` | `"whisper"`, `"deepgram"`, or `"local-whisper"` |
| `streamingSTT`        | boolean  | `true`            | Use streaming STT (Deepgram only, ~1s faster)   |
| `ttsProvider`         | string   | `"openai"`        | `"openai"` or `"elevenlabs"`                    |
| `ttsVoice`            | string   | `"nova"`          | Voice ID for TTS                                |
| `vadSensitivity`      | string   | `"medium"`        | `"low"`, `"medium"`, or `"high"`                |
| `bargeIn`             | boolean  | `true`            | Stop speaking when user talks                   |
| `allowedUsers`        | string[] | `[]`              | User IDs allowed (empty = all)                  |
| `silenceThresholdMs`  | number   | `1500`            | Silence before processing (ms)                  |
| `maxRecordingMs`      | number   | `30000`           | Max recording length (ms)                       |
| `heartbeatIntervalMs` | number   | `30000`           | Connection health check interval                |
| `autoJoinChannel`     | string   | `undefined`       | Channel ID to auto-join on startup              |

### Provider Configuration

#### OpenAI (Whisper + TTS)

```json5
{
  openai: {
    apiKey: "sk-...",
    whisperModel: "whisper-1",
    ttsModel: "tts-1",
  },
}
```

#### ElevenLabs (TTS only)

```json5
{
  elevenlabs: {
    apiKey: "...",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
    modelId: "eleven_multilingual_v2",
  },
}
```

#### Deepgram (STT only)

```json5
{
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```

## Usage

### Slash Commands (Discord)

Once registered with Discord, use these commands:

- `/discord_voice join <channel>` - Join a voice channel
- `/discord_voice leave` - Leave the current voice channel
- `/discord_voice status` - Show voice connection status

### CLI Commands

```bash
# Join a voice channel
clawdbot discord_voice join <channelId>

# Leave voice
clawdbot discord_voice leave --guild <guildId>

# Check status
clawdbot discord_voice status
```

### Agent Tool

The agent can use the `discord_voice` tool:

```
Join voice channel 1234567890
```

The tool supports actions:

- `join` - Join a voice channel (requires channelId)
- `leave` - Leave voice channel
- `speak` - Speak text in the voice channel
- `status` - Get current voice status

## How It Works

1. **Join**: Bot joins the specified voice channel
2. **Listen**: VAD detects when users start/stop speaking
3. **Record**: Audio is buffered while user speaks
4. **Transcribe**: On silence, audio is sent to STT provider
5. **Process**: Transcribed text is sent to Clawdbot agent
6. **Synthesize**: Agent response is converted to audio via TTS
7. **Play**: Audio is played back in the voice channel

## Streaming STT (Deepgram)

When using Deepgram as your STT provider, streaming mode is enabled by default. This provides:

- **~1 second faster** end-to-end latency
- **Real-time feedback** with interim transcription results
- **Automatic keep-alive** to prevent connection timeouts
- **Fallback** to batch transcription if streaming fails

To use streaming STT:

```json5
{
  sttProvider: "deepgram",
  streamingSTT: true, // default
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```

## Barge-in Support

When enabled (default), the bot will immediately stop speaking if a user starts talking. This creates a more natural conversational flow where you can interrupt the bot.

To disable (let the bot finish speaking):

```json5
{
  bargeIn: false,
}
```

## Auto-reconnect

The plugin includes automatic connection health monitoring:

- **Heartbeat checks** every 30 seconds (configurable)
- **Auto-reconnect** on disconnect with exponential backoff
- **Max 3 attempts** before giving up

If the connection drops, you'll see logs like:

```
[discord-voice] Disconnected from voice channel
[discord-voice] Reconnection attempt 1/3
[discord-voice] Reconnected successfully
```

## VAD Sensitivity

- **low**: Picks up quiet speech, may trigger on background noise
- **medium**: Balanced (recommended)
- **high**: Requires louder, clearer speech

## Troubleshooting

### "Discord client not available"

Ensure the Discord channel is configured and the bot is connected before using voice.

### Opus/Sodium build errors

Install build tools:

```bash
npm install -g node-gyp
npm rebuild @discordjs/opus sodium-native
```

### No audio heard

1. Check bot has Connect + Speak permissions
2. Check bot isn't server muted
3. Verify TTS API key is valid

### Transcription not working

1. Check STT API key is valid
2. Check audio is being recorded (see debug logs)
3. Try adjusting VAD sensitivity

### Enable debug logging

```bash
DEBUG=discord-voice clawdbot gateway start
```

## Environment Variables

| Variable             | Description                    |
| -------------------- | ------------------------------ |
| `DISCORD_TOKEN`      | Discord bot token (required)   |
| `OPENAI_API_KEY`     | OpenAI API key (Whisper + TTS) |
| `ELEVENLABS_API_KEY` | ElevenLabs API key             |
| `DEEPGRAM_API_KEY`   | Deepgram API key               |

## Limitations

- Only one voice channel per guild at a time
- Maximum recording length: 30 seconds (configurable)
- Requires stable network for real-time audio
- TTS output may have slight delay due to synthesis

## License

MIT

Related Skills

Invoice Generator

3891
from openclaw/skills

Creates professional invoices in markdown and HTML

Workflow & Productivity

brand-voice-generator

3891
from openclaw/skills

Creates consistent brand voice guidelines and content. Generates copy that matches your brand personality across all channels. Perfect for startups building their identity.

Content & Documentation

invoice-ocr

3891
from openclaw/skills

发票 OCR 识别技能。扫描文件夹中的发票文件(PDF/图片),调用阿里云 OCR API 识别发票信息并导出到 Excel 表格。支持 17+ 种发票类型(增值税发票、火车票、出租车票、机票行程单、定额发票、机动车销售发票、过路过桥费发票等)。使用场景:(1) 用户提到"发票识别"、"发票统计"、"发票整理"、"发票汇总" (2) 用户需要批量处理发票 (3) 用户提到阿里云 OCR 识别发票。**重要:首次使用必须先配置阿里云凭证,主动向用户索要 AccessKey ID 和 AccessKey Secret,或引导用户运行 --config 命令自行配置。**

Workflow & Productivity

Bland AI — Voice Calling Skill

3891
from openclaw/skills

Make and manage AI-powered phone calls via the Bland AI API.

Workflow & Productivity

afrexai-invoice-engine

3880
from openclaw/skills

Generate, manage, and track professional invoices with payment terms, recurring billing, overdue automation, and financial reporting. Use when creating invoices, tracking payments, managing clients, or reviewing revenue.

Workflow & Productivity

voice-tts

3891
from openclaw/skills

语音输入(Whisper ASR)+ 语音输出(Edge TTS)技能,支持 agent 专属音色,可调用 send_voice_reply.mjs 发送 Telegram 语音消息。

amber-voice-assistant

3891
from openclaw/skills

AI phone assistant and virtual receptionist for OpenClaw. Answers inbound phone calls, screens callers, makes outbound phone calls, and books appointments — all over Twilio + OpenAI Realtime voice. Full telephone workflow: phone call screening, live call transcripts, CRM contact memory, calendar integration. Ideal for anyone who wants an AI to answer their phone, handle call screening, or make phone calls autonomously. Includes interactive setup wizard, live call dashboard, and human-in-the-loop escalation. Also ships as a Claude Desktop MCP plugin — dial phone numbers, check call history, query CRM, and manage calendar directly from Claude Desktop.

feishu-voice-assistant

3891
from openclaw/skills

Sends voice messages (audio) to Feishu chats using Duby TTS.

invoice-chaser

3891
from openclaw/skills

Automated invoice follow-up sequences that escalate from friendly to firm. Track unpaid invoices, send timed reminder emails with escalating tone, log payment interactions, and generate AR aging reports. Your agent handles the awkward conversations so you don't have to — preserving cash flow and client relationships while you focus on actual work. Configure invoice tracking, email templates per stage (friendly → firm → final notice), timing rules, and let your agent chase payments 24/7. Use when adding invoices, running payment chases, checking status, or generating accounts receivable reports.

discord-admin-elite

3891
from openclaw/skills

Build, harden, and scale elite Discord servers with a practical admin playbook: security baseline, role/permission architecture, onboarding, moderation ops, engagement systems, and analytics-driven iteration. Use when designing a new server, auditing an existing one, fixing chaos, or preparing for growth.

voiceclaw

3891
from openclaw/skills

Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper TTS. Requires whisper, piper, and ffmpeg pre-installed on the system. All inference runs on-device — no network calls, no cloud APIs, no API keys. Use when an agent receives a voice/audio message and should respond in both voice and text, or when any text response should be synthesized and sent as audio. Triggers on: voice messages, audio attachments, respond in voice, send as audio, speak this, voiceclaw.

anvevoice

3891
from openclaw/skills

Add AI voice assistants to your website. Engage visitors with natural voice conversations, capture leads, automate support, and boost conversions.