# LocalVoiceMode
**Local voice interface with Character Skills** - a self-contained voice chat system.
Uses **Parakeet TDT 0.6B** (NVIDIA) for fast GPU speech recognition and **Pocket TTS** (Kyutai) for natural text-to-speech. Auto-detects **LM Studio**, **OpenRouter**, or **OpenAI** as the LLM backend.
## Features
- **Parakeet TDT ASR** - NVIDIA's fast speech recognition (GPU accelerated via ONNX)
- **Pocket TTS** - Kyutai's natural-sounding text-to-speech with voice cloning
- **Smart Turn Detection** - Knows when you're done speaking, not just detecting silence
- **Auto-Provider Detection** - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI
- **Modern Rich UI** - Beautiful terminal interface with audio visualization
- **Character Skills** - Load different personalities with custom voices
- **MCP Integration** - Works with Claude Code and other MCP-enabled tools
## Quick Start
### 1. Clone and Setup
```batch
git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat
```
This creates a virtual environment and installs all dependencies.
### 2. HuggingFace Login (Required)
Pocket TTS requires accepting the model license:
```batch
.venv\Scripts\huggingface-cli.exe login
```
Then accept the license at: https://huggingface.co/kyutai/pocket-tts
### 3. Configure LLM Provider
**Option A: LM Studio (Recommended for local)**
1. Open LM Studio
2. Load your preferred model
3. Start the local server (default: `http://localhost:1234`)
**Option B: OpenRouter**
```batch
set OPENROUTER_API_KEY=your-key-here
```
Get your key at: https://openrouter.ai/keys
**Option C: OpenAI**
```batch
set OPENAI_API_KEY=your-key-here
```
### 4. Run Voice Chat
```batch
REM Default assistant
VoiceChat.bat
REM With Hermione character
VoiceChat.bat hermione
REM Push-to-talk mode
VoiceChat.bat hermione ptt
```
## Provider Detection
LocalVoiceMode automatically detects available providers in this order:
1. **LM Studio** - Scans ports 1234, 1235, 1236, 8080, 5000
2. **OpenRouter** - Uses `OPENROUTER_API_KEY` environment variable
3. **OpenAI** - Uses `OPENAI_API_KEY` environment variable
Force a specific provider with `VOICE_PROVIDER=openrouter` (or `lm_studio`, `openai`).
## Directory Structure
```
localvoicemode/
├── voice_client.py        # Main voice client entry point
├── mcp_server.py          # MCP server for AI assistant integration
├── requirements.txt       # Python dependencies
├── setup.bat              # Setup script (run first!)
├── VoiceChat.bat          # Launch script
├── start_voicemode.bat    # MCP server launcher
│
├── src/localvoicemode/    # Core package
│   ├── audio/             # Audio recording
│   ├── speech/            # ASR, TTS, VAD, filters
│   ├── llm/               # Provider management
│   ├── skills/            # Skill loading
│   └── state/             # State machines, config
│
├── skills/                # Character skills
│   ├── assistant-default/ # Default assistant
│   └── hermione-companion/
│       ├── SKILL.md       # Character definition
│       ├── references/    # Lore files
│       └── scripts/       # Helper scripts
│
└── voice_references/      # Custom voice files (.wav)
```
## Skills System
Skills define character personalities, system prompts, and optional knowledge.
### List Available Skills
```batch
.venv\Scripts\python.exe voice_client.py --list-skills
```
### Create a New Skill
1. Create directory: `skills/my-skill/`
2. Create `SKILL.md`:
```markdown
---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---
# My Character
## System Prompt
You are My Character. [Full instructions here...]
```
3. Add optional files:
- `reference.wav` - Voice clone source (10s of clear speech)
- `avatar.png` - Character image
- `references/` - Knowledge markdown files
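A SKILL.md file is YAML frontmatter followed by a markdown body. The split can be sketched as below (a naive stdlib-only parser for flat `key: value` lines; the real loader presumably uses a YAML library):

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (frontmatter dict, markdown body).

    Naive sketch: handles only flat `key: value` lines and skips
    nested keys such as `metadata:` children.
    """
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()
```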
### Voice Cloning
Pocket TTS supports voice cloning from reference audio.
**Requirements:**
- WAV format (16-bit PCM)
- ~10 seconds of clean speech
- Clear recording, minimal background noise
Place the file at:
- `skills/my-skill/reference.wav` (per-skill), or
- `voice_references/my-skill.wav` (global)
## Voice Modes
### VAD Mode (default)
Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.
```batch
VoiceChat.bat hermione
```
### PTT Mode
Push-to-Talk - hold Space to record, release to send.
```batch
VoiceChat.bat hermione ptt
```
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `VOICE_API_URL` | `http://localhost:1234/v1` | OpenAI-compatible API URL |
| `VOICE_API_KEY` | (none) | API key for the provider |
| `VOICE_MODEL` | (auto) | Model name to use |
| `VOICE_PROVIDER` | (auto) | Force provider: lm_studio, openrouter, openai |
| `OPENROUTER_API_KEY` | (none) | OpenRouter API key |
| `OPENAI_API_KEY` | (none) | OpenAI API key |
| `VOICE_TTS_VOICE` | `alba` | Default TTS voice |
| `VOICE_DEVICE` | `cuda` | ASR device: cuda (GPU) or cpu |
| `VOICE_SMART_TURN_THRESHOLD` | `0.5` | Turn completion threshold (0.0-1.0) |
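Reading this table programmatically might look like the sketch below (a hypothetical config dataclass, not the project's actual one; defaults mirror the table above):

```python
import os
from dataclasses import dataclass

@dataclass
class VoiceConfig:
    """Defaults mirror the environment-variable table above."""
    api_url: str = "http://localhost:1234/v1"
    tts_voice: str = "alba"
    device: str = "cuda"
    smart_turn_threshold: float = 0.5

    @classmethod
    def from_env(cls, env=os.environ) -> "VoiceConfig":
        return cls(
            api_url=env.get("VOICE_API_URL", cls.api_url),
            tts_voice=env.get("VOICE_TTS_VOICE", cls.tts_voice),
            device=env.get("VOICE_DEVICE", cls.device),
            smart_turn_threshold=float(
                env.get("VOICE_SMART_TURN_THRESHOLD", cls.smart_turn_threshold)
            ),
        )
```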
### Command Line Options
```
python voice_client.py [options]
Options:
--skill, -s SKILL Load a character skill
--list-skills, -l List available skills
--list-providers List available LLM providers
--provider, -p PROV Force provider: lm_studio, openrouter, openai
--mode, -m MODE Input mode: vad, ptt, or type
--device DEVICE ASR device: cuda or cpu
--api-url URL OpenAI-compatible API URL
--api-key KEY API key for the provider
--model MODEL Model name to use
--headless Run without UI (for MCP integration)
```
## MCP Integration
LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.
### Start MCP Server
```batch
start_voicemode.bat
```
### Available Tools
- `speak(text)` - Speak text aloud (TTS)
- `listen()` - Listen for speech (STT)
- `converse(text)` - Speak and listen for response
- `start_voice(skill)` - Start voice chat with a character
- `stop_voice()` - Stop voice chat
- `voice_status()` - Check if voice mode is running
- `list_voices()` - List available characters
- `provider_status()` - Show available providers
- `set_speech_mode(mode)` - Set verbosity: roleplay, coder, minimal, silent
- `get_speech_mode()` - Get current speech mode
### Slash Commands
These slash commands are available in Claude Code and compatible AI assistants:
| Command | Description |
|---------|-------------|
| `/speak <text>` | TTS only - speak text aloud |
| `/listen` | STT only - transcribe speech to text |
| `/tts-only` | Mode: Claude speaks, you type |
| `/stt-only` | Mode: You speak, Claude responds in text |
| `/voice-roleplay` | Full expressive speech output |
| `/voice-coder` | Summaries & completions only |
| `/voice` | Speak one message via voice |
| `/voice-on` | Start continuous voice mode |
| `/voice-off` | Stop voice mode |
| `/voice-typing` | You type, Claude speaks (hold RIGHT SHIFT to speak) |
### Speech Modes
Control how much Claude speaks:
| Mode | Description |
|------|-------------|
| `roleplay` | Full expressive output - speaks everything naturally (default) |
| `coder` | Summaries only - task completions, errors, questions |
| `minimal` | Very terse - only critical announcements |
| `silent` | No speech - text only |
Switch modes with `/voice-roleplay`, `/voice-coder`, or the `set_speech_mode()` tool.
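One way to picture the modes is as a gate over message kinds. This is purely illustrative (the kind names and table are assumptions; the actual filtering lives inside the MCP server):

```python
# Hypothetical mapping from speech mode to the message kinds it speaks.
SPOKEN_KINDS = {
    "roleplay": {"chat", "completion", "error", "question", "announcement"},
    "coder": {"completion", "error", "question"},
    "minimal": {"announcement"},
    "silent": set(),
}

def should_speak(mode: str, kind: str) -> bool:
    """Return True if a message of this kind is spoken in this mode."""
    return kind in SPOKEN_KINDS.get(mode, set())
```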
### Voice Commands While Running
- Say **"stop"** or **"goodbye"** to end
- Say **"change voice"** to switch characters
## GPU Support
Parakeet TDT uses ONNX Runtime with GPU acceleration:
1. **TensorRT** (best performance) - Auto-detected if installed
2. **CUDA** (good performance) - Requires CUDA/cuDNN
3. **CPU** (fallback) - Always available
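The fallback order can be expressed as a simple preference filter over whatever ONNX Runtime reports as available (illustrative; `onnxruntime.InferenceSession` accepts such a list via its `providers=` argument):

```python
# Preferred ONNX Runtime execution providers, best first.
PREFERENCE = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def pick_providers(available: list[str]) -> list[str]:
    """Keep only available providers, ordered by the priority above."""
    return [p for p in PREFERENCE if p in available]
```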
Check GPU status:
```batch
.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"
```
## Troubleshooting
### No audio detected
- Check microphone permissions
- Verify default audio device: `python -c "import sounddevice; print(sounddevice.query_devices())"`
### Pocket TTS not working
- Accept license: https://huggingface.co/kyutai/pocket-tts
- Login: `.venv\Scripts\huggingface-cli.exe login`
### LM Studio connection failed
- Verify LM Studio server is running
- Check URL: default is `http://localhost:1234`
- Ensure a model is loaded
### OpenRouter/OpenAI not working
- Verify API key is set in `.env` or environment
- Check `python voice_client.py --list-providers` to see detected providers
### GPU/CUDA not working
- Ensure NVIDIA drivers are installed
- Install CUDA Toolkit 12.x
- Reinstall: `pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]`
## Credits
- **Parakeet TDT**: [NVIDIA NeMo](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) - Apache 2.0
- **Pocket TTS**: [Kyutai](https://huggingface.co/kyutai/pocket-tts) - CC-BY-4.0
- **Smart Turn**: [Livekit](https://github.com/livekit/smart-turn) - Apache 2.0
- **Silero VAD**: [Silero](https://github.com/snakers4/silero-vad) - MIT