# Voices and Skills Guide
## Quick Start
### Switching Voices
1. Place your voice file as `C:\AI\localvoicemode\voice_references\default.wav`
2. Restart the app. It loads the new voice on startup.
### Switching Skills/Personalities
1. Edit `C:\AI\localvoicemode\v2\ml-server\.env`
2. Add: `DEFAULT_SKILL=hermione-companion`
3. Restart the app
---
## Voice Requirements
Pocket TTS clones voices from reference audio. For best results:
| Requirement | Optimal | Acceptable |
|-------------|---------|------------|
| **Duration** | 10 seconds | 5-30 seconds |
| **Format** | WAV (16-bit PCM) | WAV (any bit depth) |
| **Sample Rate** | 24000 Hz | 16000-48000 Hz |
| **Channels** | Mono | Stereo (will be converted) |
| **Content** | Clear speech | Minimal background noise |
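
The table can be checked mechanically before a recording is used. The sketch below is not part of the app; `check_reference_wav` is a hypothetical helper that reads the file header with Python's standard `wave` module and reports anything outside the ranges listed above. It cannot judge the **Content** row (speech clarity, background noise), and `wave` raises `wave.Error` for files that are not PCM WAV.

```python
import wave


def check_reference_wav(path):
    """Return a list of warnings for a reference WAV; an empty list means it fits the table."""
    warnings = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        duration = wav.getnframes() / float(sample_rate)

    # Ranges are taken from the "Acceptable" and "Optimal" columns of the table.
    if not 5 <= duration <= 30:
        warnings.append(f"duration {duration:.1f} s is outside 5-30 s")
    if not 16000 <= sample_rate <= 48000:
        warnings.append(f"sample rate {sample_rate} Hz is outside 16000-48000 Hz")
    if channels != 1:
        warnings.append(f"{channels} channels; mono is optimal")
    if bits != 16:
        warnings.append(f"{bits}-bit audio; 16-bit PCM is optimal")
    return warnings
```

Calling `check_reference_wav(r"C:\AI\localvoicemode\voice_references\default.wav")` returns an empty list when the file sits inside every range.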
### Recording Tips
1. Record in a quiet environment
2. Speak naturally (avoid reading robotically)
3. Include varied intonation (questions, statements)
4. Keep consistent microphone distance
5. Use [ai-coustics](https://ai-coustics.com/) to enhance recordings (optional)
### Voice File Locations
Priority order (first found is used):
1. `voice_references/default.wav` - Default voice override
2. `skills/<skill-id>/reference.wav` - Per-skill voice
3. `voice_references/<skill-id>.wav` - Global skill voice
4. Built-in voices: `alba`, `marius`, `javert`, `fantine`
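
The lookup order above can be sketched as a small function. This illustrates the documented priority and is not the app's actual code: `resolve_voice` and its `fallback` parameter are hypothetical, and which built-in voice the app picks when no file is found is an assumption (`alba` here).

```python
from pathlib import Path

# Built-in voice names listed in the priority order above.
BUILT_IN_VOICES = ("alba", "marius", "javert", "fantine")


def resolve_voice(root, skill_id, fallback="alba"):
    """Return the first existing voice file in priority order, else a built-in voice name."""
    root = Path(root)
    candidates = [
        root / "voice_references" / "default.wav",      # 1. default voice override
        root / "skills" / skill_id / "reference.wav",   # 2. per-skill voice
        root / "voice_references" / f"{skill_id}.wav",  # 3. global skill voice
    ]
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    # 4. No file found: fall back to a built-in voice (the choice is an assumption).
    return fallback
```

Because `default.wav` is checked first, it takes precedence over any per-skill `reference.wav` for as long as it exists.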
### Swapping Voices
**Method 1: Replace default.wav**
```
C:\AI\localvoicemode\voice_references\default.wav
```
**Method 2: Rename your file**
```
# Rename your recording and place it in voice_references\
my_voice.wav -> default.wav
```
**Method 3: Per-skill voice** (used only when `voice_references/default.wav` is absent, per the priority order above)
```
C:\AI\localvoicemode\skills\hermione-companion\reference.wav
```
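
Methods 1 and 2 amount to copying a recording over `default.wav`. A minimal sketch of that step follows, assuming you want a backup of the previous voice; `install_default_voice` and the `default.wav.bak` backup name are hypothetical and not something the app provides.

```python
import shutil
from pathlib import Path


def install_default_voice(source, voice_dir):
    """Copy `source` to <voice_dir>/default.wav, backing up any existing default first."""
    source = Path(source)
    voice_dir = Path(voice_dir)
    if source.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file, got {source.name}")
    voice_dir.mkdir(parents=True, exist_ok=True)
    target = voice_dir / "default.wav"
    if target.exists():
        # Keep the previous voice so the swap can be undone by hand.
        shutil.copy2(target, voice_dir / "default.wav.bak")
    shutil.copy2(source, target)
    return target
```

For example, `install_default_voice("my_voice.wav", r"C:\AI\localvoicemode\voice_references")` performs the swap; the app still needs a restart afterwards.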
---
## Skills/Personalities
Skills define character personalities, system prompts, and optional voice files.
### Available Skills
| Skill ID | Character | Description |
|----------|-----------|-------------|
| `assistant-default` | Default Assistant | General-purpose voice assistant |
| `hermione-companion` | Hermione Granger | Roleplay companion at the Hog's Head pub |
### Skill Structure
```
skills/<skill-id>/
├── SKILL.md # Character definition (required)
├── reference.wav # Custom voice (optional)
├── avatar.png # Character image (optional)
└── references/ # Lore/knowledge files (optional)
```
### Creating a New Skill
1. Create folder: `skills/my-character/`
2. Create `SKILL.md`:
```yaml
---
id: my-character
name: My Character
display_name: "My Character"
description: A custom character
voice: reference.wav
metadata:
  setting: "Where the character is"
  greeting: "Hello! How can I help you?"
---
# My Character
## System Prompt
You are My Character. [Full personality description here...]
```
3. Add a voice file to the skill folder (optional): `skills/my-character/reference.wav`
4. Restart and select the skill in Settings
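
Steps 1 and 2 can be scripted. The sketch below is a hypothetical scaffold, not a tool shipped with the app: `create_skill` and `SKILL_TEMPLATE` are illustrative names, and the template assumes `setting` and `greeting` nest under `metadata`. It leaves out the `voice:` line because no `reference.wav` is created; add `voice: reference.wav` once the file is in place.

```python
from pathlib import Path

# Minimal SKILL.md body modeled on the example above (the metadata nesting is an assumption).
SKILL_TEMPLATE = """---
id: {skill_id}
name: {name}
display_name: "{name}"
description: {description}
metadata:
  setting: "Where the character is"
  greeting: "Hello! How can I help you?"
---
# {name}
## System Prompt
You are {name}. [Full personality description here...]
"""


def create_skill(skills_dir, skill_id, name, description="A custom character"):
    """Create skills/<skill_id>/SKILL.md and return its path; fail if the folder exists."""
    skill_dir = Path(skills_dir) / skill_id
    # exist_ok=False raises FileExistsError so an existing skill is never overwritten.
    skill_dir.mkdir(parents=True, exist_ok=False)
    skill_file = skill_dir / "SKILL.md"
    content = SKILL_TEMPLATE.format(skill_id=skill_id, name=name, description=description)
    skill_file.write_text(content, encoding="utf-8")
    return skill_file
```

Running `create_skill("skills", "my-character", "My Character")` produces the folder and file from steps 1 and 2; the system prompt still has to be written by hand.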
### SKILL.md Format
The file uses YAML frontmatter + Markdown body:
```yaml
---
id: skill-id # Unique identifier
name: Character Name # Display name
display_name: "Name" # Optional display override
description: Brief desc # For skill listing
voice: reference.wav # Voice file (optional)
avatar: avatar.png # Avatar image (optional)
metadata:
  setting: "Scene description"
  greeting: "First message"
  personality_traits:
    - Trait 1
    - Trait 2
  speech_patterns:
    - "Tends to say..."
---
# Character Name
## System Prompt
Full character instructions here...
```
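
To show how the two parts separate, here is a minimal sketch that splits a `SKILL.md` string at its `---` delimiters and reads the top-level scalar fields. Both function names are hypothetical, and the field reader is a deliberate simplification: it skips nested blocks such as `metadata` and mishandles values that contain ` #` inside quotes. A real YAML parser (for example PyYAML's `yaml.safe_load`) is the better choice for anything beyond a quick check; how the app itself parses the file is not documented here.

```python
def split_skill_md(text):
    """Split SKILL.md text into (frontmatter, body); raise ValueError if delimiters are missing."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return frontmatter, body
    raise ValueError("closing '---' line not found")


def top_level_fields(frontmatter):
    """Collect unindented `key: value` scalars; nested blocks and list items are skipped."""
    fields = {}
    for line in frontmatter.splitlines():
        # Skip blanks, indented (nested) lines, list items, and comment lines.
        if not line or line[0] in " \t-#" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Drop a trailing "# comment" and surrounding double quotes.
        value = value.split(" #", 1)[0].strip().strip('"')
        if value:
            fields[key.strip()] = value
    return fields
```

A missing or misspelled delimiter raises `ValueError`, which makes this a quick first check when a skill fails to load.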
---
## Configuration
### Environment Variables (.env)
Edit `C:\AI\localvoicemode\v2\ml-server\.env`:
```bash
# LLM Provider
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=deepseek/deepseek-chat-v3-0324
# Default skill to load
DEFAULT_SKILL=assistant-default
# Force LLM provider: lm_studio, openrouter, openai
# VOICE_PROVIDER=openrouter
```
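
To sanity-check the file before restarting, the `KEY=VALUE` format can be read with a few lines of Python. This is a minimal sketch: `parse_env` is a hypothetical helper, the app's own loader is not documented here and may treat quoting or inline comments differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines; blank lines and lines starting with '#' are ignored."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

Reading the file with `parse_env(open(path, encoding="utf-8").read())` and looking up `DEFAULT_SKILL` shows which skill the server will be asked to load. Commented-out entries such as `# VOICE_PROVIDER=openrouter` are ignored.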
### System Prompt Override
To use a custom system prompt without creating a skill:
1. Edit `.env`
2. Add: `SYSTEM_PROMPT="You are a helpful assistant..."`
---
## Troubleshooting
### Voice not changing
- Ensure file is named exactly `default.wav`
- Check file format (must be WAV)
- Restart the ML server
### Skill not loading
- Check SKILL.md YAML syntax
- Verify the skill folder exists at `skills/<skill-id>/` and contains `SKILL.md`
- Check ML server logs for errors
### Voice quality issues
- Use higher quality source audio
- Use about 10 seconds of speech (5-30 seconds is acceptable)
- Remove background noise
- Try different reference audio