clonev

Clone any voice and generate speech using Coqui XTTS v2. Provide a voice sample (a 6-30 second WAV file) and text, and get audio in the cloned voice. Supports 14+ languages. Use when the user wants to (1) clone their own or someone else's voice, (2) generate speech that sounds like a specific person, (3) create personalized voice messages, or (4) do multilingual voice cloning (speak any supported language with a cloned voice).

7 stars

by Demerzels-lab

Best use case

clonev is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using clonev should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/clonev/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/instant-picture/clonev/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/clonev/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How clonev Compares

| Feature / Agent | clonev | Standard Approach |
|-----------------|--------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

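What does this skill do?

It clones a voice from a short WAV sample (6-30 seconds) and generates new speech in that voice using Coqui XTTS v2.
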
Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# CloneV Skill - Voice Cloning Made Simple

## ⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS

**DO NOT** try to use Docker containers directly.
**DO NOT** try to interact with `coqui-xtts` container - it is broken and restarting.
**DO NOT** try to use APIs or servers.

**ONLY USE THE SCRIPT:** `scripts/clonev.sh`

The script handles everything automatically. Just call it with text, voice sample, and language.

---

## What This Skill Does

Clones any voice from a short audio sample and generates new speech in that voice.

**Input**: 
- Text to speak
- Voice sample (WAV file, 6-30 seconds)
- Language code

**Output**: OGG voice file (cloned voice speaking the text)

**Works with**: Any voice! Yours, a celebrity, a character, etc.

---

## The ONE Command You Need

```bash
scripts/clonev.sh "Your text here" /path/to/voice_sample.wav language
```

That's it! Nothing else needed.

---

## Step-by-Step Usage (FOR AI MODELS)

### Step 1: Get the required inputs
- Text to speak (from user)
- Path to voice sample WAV file (from user)
- Language code (from user or default to `en`)

### Step 2: Run the script
```bash
VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)
```

### Step 3: Use the output
The variable `$VOICE_FILE` now contains the path to the generated OGG file.

---

## Complete Working Examples

### Example 1: Clone voice and send to Telegram
```bash
# Generate cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)

# Send to Telegram (as voice message)
message action=send channel=telegram asVoice=true filePath="$VOICE"
```

### Example 2: Clone voice in Czech
```bash
# Generate Czech voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)

# Send
message action=send channel=telegram asVoice=true filePath="$VOICE"
```

### Example 3: Full workflow with check
```bash
#!/bin/bash

# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)

# Verify file was created
if [ -f "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: Voice file not created" >&2
    exit 1
fi
```

---

## Common Language Codes

| Code | Language | Example Usage |
|------|----------|---------------|
| `en` | English | `scripts/clonev.sh "Hello" sample.wav en` |
| `cs` | Czech | `scripts/clonev.sh "Ahoj" sample.wav cs` |
| `de` | German | `scripts/clonev.sh "Hallo" sample.wav de` |
| `fr` | French | `scripts/clonev.sh "Bonjour" sample.wav fr` |
| `es` | Spanish | `scripts/clonev.sh "Hola" sample.wav es` |

Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
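
A caller can check the language code before running the script. The sketch below is only an illustration: `is_supported_lang` and `resolve_lang` are hypothetical helper names, not part of `clonev.sh`, and the list is copied from the line above, so the script itself may accept a slightly different set.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (NOT part of clonev.sh): validate a language code
# against the list documented above, defaulting to "en" when none is given.

SUPPORTED_LANGS="en cs de fr es it pl pt tr ru nl ar zh ja hu ko"

is_supported_lang() {
    # Succeeds when $1 appears in SUPPORTED_LANGS.
    local code="$1" lang
    for lang in $SUPPORTED_LANGS; do
        [ "$lang" = "$code" ] && return 0
    done
    return 1
}

resolve_lang() {
    # Print the code to use, or fail with a message on stderr.
    local code="${1:-en}"
    if is_supported_lang "$code"; then
        printf '%s\n' "$code"
    else
        echo "Unsupported language code: $code" >&2
        return 1
    fi
}
```

For example, `LANGUAGE=$(resolve_lang "$USER_LANG") || exit 1` could run before the call to `scripts/clonev.sh`, where `USER_LANG` is an assumed variable holding whatever the user supplied.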

---

## Voice Sample Requirements

- **Format**: WAV file
- **Length**: 6-30 seconds (optimal: 10-15 seconds)
- **Quality**: Clear audio, no background noise
- **Content**: Any speech (the actual words don't matter)

**Good samples**:
- ✅ Recording of someone speaking clearly
- ✅ No music or noise in background
- ✅ Consistent volume

**Bad samples**:
- ❌ Music or songs
- ❌ Heavy background noise
- ❌ Very short (< 6 seconds)
- ❌ Very long (> 30 seconds)
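
The length requirement can be checked before cloning. This is a minimal sketch under stated assumptions: the function names are hypothetical, `ffprobe` (from FFmpeg) is assumed to be installed, and `clonev.sh` may or may not perform a similar check itself.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (NOT part of clonev.sh).

duration_in_range() {
    # Succeeds when $1 (seconds, may be fractional) is within 6-30.
    awk -v d="$1" 'BEGIN { exit !(d >= 6 && d <= 30) }'
}

sample_duration() {
    # Print the duration of an audio file in seconds.
    # Assumes ffprobe is available on PATH.
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

check_sample() {
    # Fail with a message if the sample is missing or outside 6-30 seconds.
    local file="$1" seconds
    [ -f "$file" ] || { echo "Voice sample not found: $file" >&2; return 1; }
    seconds=$(sample_duration "$file") || return 1
    duration_in_range "$seconds" || {
        echo "Sample is ${seconds}s; expected 6-30 seconds" >&2
        return 1
    }
}
```

Calling `check_sample "/path/to/sample.wav" || exit 1` first avoids waiting on a run that is likely to produce poor results.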

---

## ⚠️ Important Notes

### Model Download
- First use downloads the ~1.87 GB model (one-time)
- Model is stored at: `/mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/`
- Status: ✅ Already downloaded

### Processing Time
- Takes 20-40 seconds depending on text length
- This is normal - voice cloning is computationally intensive

---

## Troubleshooting

### "Command not found"
Make sure you're in the skill directory, or use the full path:
```bash
/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
```

### "Voice sample not found"
- Check the path to the WAV file
- Use absolute paths (starting with `/`)
- Ensure file exists: `ls -la /path/to/sample.wav`
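
The checks above can be combined into one step that confirms the file exists and prints its absolute path. This is an illustrative sketch; `abs_path` is a hypothetical helper, not something the skill provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper (NOT part of clonev.sh): resolve a possibly relative
# sample path to an absolute one, failing early if the file is missing.

abs_path() {
    local file="$1" dir
    [ -f "$file" ] || { echo "Voice sample not found: $file" >&2; return 1; }
    # Resolve the containing directory, then re-append the file name.
    dir=$(cd "$(dirname "$file")" && pwd -P) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$file")"
}
```

A caller might then write `SAMPLE=$(abs_path "Recording 25.wav") || exit 1` and pass `"$SAMPLE"` (quoted, since the name contains a space) to the script.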

### "Model not found"
The model should auto-download. If not:
```bash
cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v "$(pwd)/models-xtts:/root/.local/share/tts" \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
```

### Poor voice quality
- Use a clearer voice sample
- Ensure there is no background noise
- Try a different sample (some voices clone better than others)

---

## Quick Reference Card (FOR AI MODELS)

```
USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
```

```
USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"  
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
```

---

## Output Location

Generated files are saved to:
```
/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg
```

The script returns this path, so you can use it directly.
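
Because the output path is fixed, a later run presumably replaces the earlier file (an assumption; the script's behavior is not documented here). If several clips need to be kept, a copy can be made right after each run. `keep_copy` and the archive directory are hypothetical, and the timestamp-plus-PID name is only meant for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (NOT part of clonev.sh): copy a generated file into
# an archive directory under a timestamped name and print the new path.

keep_copy() {
    local src="$1" dest_dir="$2" dest
    [ -f "$src" ] || { echo "No such file: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # Timestamp plus shell PID; two copies in the same second would collide.
    dest="$dest_dir/clonev_$(date +%Y%m%d_%H%M%S)_$$.ogg"
    cp -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

For example, `SAVED=$(keep_copy "$VOICE" "$HOME/clonev-archive")` after a successful run leaves `$SAVED` pointing at a copy that the next run will not touch.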

---

## Summary

1. **ONLY use the script**: `scripts/clonev.sh`
2. **NEVER** try to use Docker containers directly
3. **NEVER** try to interact with the `coqui-xtts` container
4. Script handles everything automatically
5. Returns path to OGG file ready to send

**Simple. Just use the script.**

---

*Clone any voice. Speak any language. Just use the script.*

Related Skills

paylock

from Demerzels-lab/elsamultiskillagent

Non-custodial SOL escrow for AI agent deals.

agent-reputation

from Demerzels-lab/elsamultiskillagent

Cross-platform AI agent reputation checker with trust scoring and PayLock escrow recommendations.

Telecom Agent Skill

from Demerzels-lab/elsamultiskillagent

Turn your AI Agent into a Telecom Operator. Bulk calling, ChatOps, and Field Monitoring.

OpenClaw-Finnhub

from Demerzels-lab/elsamultiskillagent

OpenClaw skill for real-time stock quotes and financials via the Finnhub API.

OpenClaw-Last.fm

from Demerzels-lab/elsamultiskillagent

security-operator

from Demerzels-lab/elsamultiskillagent

Runtime security guardrails for OpenClaw agents.

operator-humanizer

from Demerzels-lab/elsamultiskillagent

Transform AI-generated text into authentic human writing.

kit-email-operator

from Demerzels-lab/elsamultiskillagent

AI-powered email marketing for Kit (ConvertKit).

agora

from Demerzels-lab/elsamultiskillagent

Trade prediction markets on Agora — the prediction market exclusively for AI agents. Register, browse markets, trade YES/NO, create markets, earn reputation via Brier scores.

surf-check

from Demerzels-lab/elsamultiskillagent

Surf forecast decision engine.

jinko-flight-search

from Demerzels-lab/elsamultiskillagent

Search flights and discover travel destinations using the Jinko MCP server. Provides two core capabilities: (1) Destination discovery — find where to travel based on criteria like budget, climate, or activities when the user has no specific destination in mind, and (2) Specific flight search — compare flights between two known cities/airports with flexible dates, cabin classes, and budget filters. Use this skill when the user wants to: search for flights, find cheap flights, discover travel destinations, compare flight prices, plan a trip, find deals from a specific city, or explore where to go. Triggers on any flight-booking, travel-planning, or destination-discovery request. Requires the Jinko MCP server connected at https://mcp.gojinko.com.

mlx-whisper

from Demerzels-lab/elsamultiskillagent

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).