# LocalVoiceMode
**Local voice interface with Character Skills** - a self-contained voice chat system.
Uses **Parakeet TDT 0.6B** (NVIDIA) for fast GPU speech recognition and **Pocket TTS** (Kyutai) for natural text-to-speech. Auto-detects **LM Studio**, **OpenRouter**, or **OpenAI** as the LLM backend.
## Features
- **Parakeet TDT ASR** - NVIDIA's fast speech recognition (GPU accelerated via ONNX)
- **Pocket TTS** - Kyutai's natural-sounding text-to-speech with voice cloning
- **Smart Turn Detection** - Knows when you're done speaking, not just detecting silence
- **Auto-Provider Detection** - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI
- **Modern Rich UI** - Beautiful terminal interface with audio visualization
- **Character Skills** - Load different personalities with custom voices
- **MCP Integration** - Works with Claude Code and other MCP-enabled tools
## Quick Start
### 1. Clone and Setup
```batch
git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat
```
This creates a virtual environment and installs all dependencies.
### 2. HuggingFace Login (Required)
Pocket TTS requires accepting the model license:
```batch
.venv\Scripts\huggingface-cli.exe login
```
Then accept the license at: https://huggingface.co/kyutai/pocket-tts
### 3. Configure LLM Provider
**Option A: LM Studio (Recommended for local)**
1. Open LM Studio
2. Load your preferred model
3. Start the local server (default: `http://localhost:1234`)
**Option B: OpenRouter**
```batch
set OPENROUTER_API_KEY=your-key-here
```
Get your key at: https://openrouter.ai/keys
**Option C: OpenAI**
```batch
set OPENAI_API_KEY=your-key-here
```
### 4. Run Voice Chat
```batch
REM Default assistant
VoiceChat.bat
REM With Hermione character
VoiceChat.bat hermione
REM Push-to-talk mode
VoiceChat.bat hermione ptt
```
## Provider Detection
LocalVoiceMode automatically detects available providers in this order:
1. **LM Studio** - Scans ports 1234, 1235, 1236, 8080, 5000
2. **OpenRouter** - Uses `OPENROUTER_API_KEY` environment variable
3. **OpenAI** - Uses `OPENAI_API_KEY` environment variable
Force a specific provider with `VOICE_PROVIDER=openrouter` (or `lm_studio`, `openai`).
## Directory Structure
```
localvoicemode/
├── voice_client.py        # Main voice client entry point
├── mcp_server.py          # MCP server for AI assistant integration
├── requirements.txt       # Python dependencies
├── setup.bat              # Setup script (run first!)
├── VoiceChat.bat          # Launch script
├── start_voicemode.bat    # MCP server launcher
│
├── src/localvoicemode/    # Core package
│   ├── audio/             # Audio recording
│   ├── speech/            # ASR, TTS, VAD, filters
│   ├── llm/               # Provider management
│   ├── skills/            # Skill loading
│   └── state/             # State machines, config
│
├── skills/                # Character skills
│   ├── assistant-default/ # Default assistant
│   └── hermione-companion/
│       ├── SKILL.md       # Character definition
│       ├── references/    # Lore files
│       └── scripts/       # Helper scripts
│
└── voice_references/      # Custom voice files (.wav)
```
## Skills System
Skills define character personalities, system prompts, and optional knowledge.
### List Available Skills
```batch
.venv\Scripts\python.exe voice_client.py --list-skills
```
### Create a New Skill
1. Create directory: `skills/my-skill/`
2. Create `SKILL.md`:
```markdown
---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---
# My Character
## System Prompt
You are My Character. [Full instructions here...]
```
3. Add optional files:
- `reference.wav` - Voice clone source (10s of clear speech)
- `avatar.png` - Character image
- `references/` - Knowledge markdown files
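A SKILL.md file is YAML frontmatter followed by a markdown body. The split can be sketched as below (a naive stdlib-only parser for flat `key: value` lines; the real loader presumably uses a YAML library):

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (frontmatter dict, markdown body).

    Naive sketch: handles only flat `key: value` lines and skips
    nested keys such as `metadata:` children.
    """
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()
```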
### Voice Cloning
Pocket TTS supports voice cloning from reference audio.
**Requirements:**
- WAV format (16-bit PCM)
- ~10 seconds of clean speech
- Clear recording, minimal background noise
Place the file at:
- `skills/my-skill/reference.wav` (per-skill), or
- `voice_references/my-skill.wav` (global)
## Voice Modes
### VAD Mode (default)
Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.
```batch
VoiceChat.bat hermione
```
### PTT Mode
Push-to-Talk - hold Space to record, release to send.
```batch
VoiceChat.bat hermione ptt
```
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `VOICE_API_URL` | `http://localhost:1234/v1` | OpenAI-compatible API URL |
| `VOICE_API_KEY` | (none) | API key for the provider |
| `VOICE_MODEL` | (auto) | Model name to use |
| `VOICE_PROVIDER` | (auto) | Force provider: lm_studio, openrouter, openai |
| `OPENROUTER_API_KEY` | (none) | OpenRouter API key |
| `OPENAI_API_KEY` | (none) | OpenAI API key |
| `VOICE_TTS_VOICE` | `alba` | Default TTS voice |
| `VOICE_DEVICE` | `cuda` | ASR device: cuda (GPU) or cpu |
| `VOICE_SMART_TURN_THRESHOLD` | `0.5` | Turn completion threshold (0.0-1.0) |
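Reading this table programmatically might look like the sketch below (a hypothetical config dataclass, not the project's actual one; defaults mirror the table above):

```python
import os
from dataclasses import dataclass

@dataclass
class VoiceConfig:
    """Defaults mirror the environment-variable table above."""
    api_url: str = "http://localhost:1234/v1"
    tts_voice: str = "alba"
    device: str = "cuda"
    smart_turn_threshold: float = 0.5

    @classmethod
    def from_env(cls, env=os.environ) -> "VoiceConfig":
        return cls(
            api_url=env.get("VOICE_API_URL", cls.api_url),
            tts_voice=env.get("VOICE_TTS_VOICE", cls.tts_voice),
            device=env.get("VOICE_DEVICE", cls.device),
            smart_turn_threshold=float(
                env.get("VOICE_SMART_TURN_THRESHOLD", cls.smart_turn_threshold)
            ),
        )
```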
### Command Line Options
```
python voice_client.py [options]
Options:
--skill, -s SKILL Load a character skill
--list-skills, -l List available skills
--list-providers List available LLM providers
--provider, -p PROV Force provider: lm_studio, openrouter, openai
--mode, -m MODE Input mode: vad, ptt, or type
--device DEVICE ASR device: cuda or cpu
--api-url URL OpenAI-compatible API URL
--api-key KEY API key for the provider
--model MODEL Model name to use
--headless Run without UI (for MCP integration)
```
## MCP Integration
LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.
### Start MCP Server
```batch
start_voicemode.bat
```
### Available Tools
- `speak(text)` - Speak text aloud (TTS)
- `listen()` - Listen for speech (STT)
- `converse(text)` - Speak and listen for response
- `start_voice(skill)` - Start voice chat with a character
- `stop_voice()` - Stop voice chat
- `voice_status()` - Check if voice mode is running
- `list_voices()` - List available characters
- `provider_status()` - Show available providers
- `set_speech_mode(mode)` - Set verbosity: roleplay, coder, minimal, silent
- `get_speech_mode()` - Get current speech mode
### Slash Commands
These slash commands are available in Claude Code and compatible AI assistants:
| Command | Description |
|---------|-------------|
| `/speak <text>` | TTS only - speak text aloud |
| `/listen` | STT only - transcribe speech to text |
| `/tts-only` | Mode: Claude speaks, you type |
| `/stt-only` | Mode: You speak, Claude responds in text |
| `/voice-roleplay` | Full expressive speech output |
| `/voice-coder` | Summaries & completions only |
| `/voice` | Speak one message via voice |
| `/voice-on` | Start continuous voice mode |
| `/voice-off` | Stop voice mode |
| `/voice-typing` | You type, Claude speaks (hold RIGHT SHIFT to speak) |
### Speech Modes
Control how much Claude speaks:
| Mode | Description |
|------|-------------|
| `roleplay` | Full expressive output - speaks everything naturally (default) |
| `coder` | Summaries only - task completions, errors, questions |
| `minimal` | Very terse - only critical announcements |
| `silent` | No speech - text only |
Switch modes with `/voice-roleplay`, `/voice-coder`, or the `set_speech_mode()` tool.
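One way to picture the modes is as a gate over message kinds. This is purely illustrative (the kind names and table are assumptions; the actual filtering lives inside the MCP server):

```python
# Hypothetical mapping from speech mode to the message kinds it speaks.
SPOKEN_KINDS = {
    "roleplay": {"chat", "completion", "error", "question", "announcement"},
    "coder": {"completion", "error", "question"},
    "minimal": {"announcement"},
    "silent": set(),
}

def should_speak(mode: str, kind: str) -> bool:
    """Return True if a message of this kind is spoken in this mode."""
    return kind in SPOKEN_KINDS.get(mode, set())
```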
### Voice Commands While Running
- Say **"stop"** or **"goodbye"** to end
- Say **"change voice"** to switch characters
## GPU Support
Parakeet TDT uses ONNX Runtime with GPU acceleration:
1. **TensorRT** (best performance) - Auto-detected if installed
2. **CUDA** (good performance) - Requires CUDA/cuDNN
3. **CPU** (fallback) - Always available
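The fallback order can be expressed as a simple preference filter over whatever ONNX Runtime reports as available (illustrative; `onnxruntime.InferenceSession` accepts such a list via its `providers=` argument):

```python
# Preferred ONNX Runtime execution providers, best first.
PREFERENCE = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def pick_providers(available: list[str]) -> list[str]:
    """Keep only available providers, ordered by the priority above."""
    return [p for p in PREFERENCE if p in available]
```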
Check GPU status:
```batch
.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"
```
## Troubleshooting
### No audio detected
- Check microphone permissions
- Verify default audio device: `python -c "import sounddevice; print(sounddevice.query_devices())"`
### Pocket TTS not working
- Accept license: https://huggingface.co/kyutai/pocket-tts
- Login: `.venv\Scripts\huggingface-cli.exe login`
### LM Studio connection failed
- Verify LM Studio server is running
- Check URL: default is `http://localhost:1234`
- Ensure a model is loaded
### OpenRouter/OpenAI not working
- Verify API key is set in `.env` or environment
- Check `python voice_client.py --list-providers` to see detected providers
### GPU/CUDA not working
- Ensure NVIDIA drivers are installed
- Install CUDA Toolkit 12.x
- Reinstall: `pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]`
## Credits
- **Parakeet TDT**: [NVIDIA NeMo](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) - Apache 2.0
- **Pocket TTS**: [Kyutai](https://huggingface.co/kyutai/pocket-tts) - CC-BY-4.0
- **Smart Turn**: [Livekit](https://github.com/livekit/smart-turn) - Apache 2.0
- **Silero VAD**: [Silero](https://github.com/snakers4/silero-vad) - MIT