# LocalVoiceMode v2
Hybrid voice interface with Rust GUI and Python ML backend.
## Architecture
```
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ voicemode-ui (Rust) │◄───────►│ ml-server (Python) │
│ - egui + wgpu GUI │ JSON-RPC│ - TTS/ASR/SER models │
│ - Bloom shader effects │ │ - LLM client │
│ - Audio I/O (cpal) │ │ - Skill loader │
│ - VAD (Silero ONNX) │ │ - Model registry │
└─────────────────────────────┘ └─────────────────────────────┘
```
## Features
- **Lightsaber Waveform Visualizer** - GPU bloom shaders with customizable colors
- **Model Hot-Swap** - Plugin system for TTS/ASR/SER models
- **Emotion Tracking** - SenseVoice SER feeds emotion to LLM context
- **Hybrid App** - Full settings GUI + minimal overlay mode
## Color Palette
| Name | Hex | Use |
|------|-----|-----|
| Lapis Lazuli | `#2659A5` | Primary - listening |
| Mace Windu Purple | `#8033CC` | Accent - processing |
| Emerald | `#33B24D` | Success - speaking |
| Burnt Orange | `#CC661A` | Warning |
| Sith Red | `#E61A1A` | Error |
## Quick Start
### Prerequisites
- Rust 1.75+ with cargo
- Python 3.11+ with uv or pip
- NVIDIA GPU (optional, for CUDA acceleration)
### Build & Run
```bash
# Terminal 1: Start ML server
cd ml-server
uv venv && uv pip install -e .
python -m ml_server
# Terminal 2: Start GUI
cd voicemode-ui
cargo run --release
```
## Project Structure
```
v2/
├── voicemode-ui/ # Rust GUI application
│ ├── src/
│ │ ├── ui/ # egui components + shaders
│ │ ├── audio/ # cpal capture/playback
│ │ ├── ipc/ # JSON-RPC client
│ │ └── state/ # App state management
│ └── assets/ # Textures, shaders
│
├── ml-server/ # Python ML backend
│ └── ml_server/
│ ├── models/ # TTS, ASR, SER adapters
│ ├── registry/ # Model plugin system
│ ├── ipc/ # JSON-RPC server
│ ├── llm/ # LLM client
│ └── skills/ # Character skills
│
├── PROMPT.md # Autonomous agent instructions
└── CLAUDE.md # Claude Code instructions
```
## Supported Models
### TTS (Text-to-Speech)
- [x] Pocket TTS (default)
- [ ] IndexTTS2 (planned)
- [ ] Qwen3-TTS (planned)
### ASR (Speech Recognition)
- [x] Parakeet TDT 0.6B v3
### SER (Speech Emotion Recognition)
- [ ] SenseVoice-Small (planned)
## License
MIT