LocalVoiceMode
Local voice interface with Character Skills - Self-contained voice chat system.
Uses Parakeet TDT 0.6B (NVIDIA) for fast GPU speech recognition and Pocket TTS (Kyutai) for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.
Features
Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)
Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning
Smart Turn Detection - Knows when you're done speaking, not just detecting silence
Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI
Modern Rich UI - Beautiful terminal interface with audio visualization
Character Skills - Load different personalities with custom voices
MCP Integration - Works with Claude Code and other MCP-enabled tools
Quick Start
1. Clone and Setup
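A typical sequence looks like the following (the repository URL is a placeholder, and a requirements.txt at the repo root is assumed):

```bash
git clone https://github.com/<you>/LocalVoiceMode.git   # placeholder URL
cd LocalVoiceMode
python -m venv .venv
.venv\Scripts\activate           # Windows (this README uses Windows paths); use source .venv/bin/activate elsewhere
pip install -r requirements.txt  # assumes a requirements.txt at the repo root
```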
This creates a virtual environment and installs all dependencies.
2. HuggingFace Login (Required)
Pocket TTS requires accepting the model license:
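Log in with the Hugging Face CLI inside the project's virtual environment (the same binary referenced under Troubleshooting):

```bash
.venv\Scripts\huggingface-cli.exe login   # Windows; plain huggingface-cli login elsewhere
```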
Then accept the license at: https://huggingface.co/kyutai/pocket-tts
3. Configure LLM Provider
Option A: LM Studio (Recommended for local)
Open LM Studio
Load your preferred model
Start the local server (default: http://localhost:1234)
Option B: OpenRouter
Get your key at: https://openrouter.ai/keys
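Then expose the key via the environment variable named under Provider Detection below, for example in a .env file:

```bash
OPENROUTER_API_KEY=<your-key>
```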
Option C: OpenAI
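Set your key the same way, e.g. in .env:

```bash
OPENAI_API_KEY=<your-key>
```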
4. Run Voice Chat
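Given the client script referenced under Troubleshooting, a plain launch is presumably:

```bash
python voice_client.py
```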
Provider Detection
LocalVoiceMode automatically detects available providers in this order:
LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000
OpenRouter - Uses the OPENROUTER_API_KEY environment variable
OpenAI - Uses the OPENAI_API_KEY environment variable
Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
Directory Structure
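A partial layout, reconstructed only from the paths this README mentions elsewhere:

```
LocalVoiceMode/
├── voice_client.py        # main voice chat client
├── skills/                # character skill definitions
│   └── my-skill/
│       ├── SKILL.md       # personality and system prompt
│       ├── reference.wav  # optional voice clone source
│       ├── avatar.png     # optional character image
│       └── references/    # optional knowledge markdown files
├── voice_references/      # global voice clone sources (my-skill.wav)
└── .venv/                 # virtual environment created during setup
```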
Skills System
Skills define character personalities, system prompts, and optional knowledge.
List Available Skills
Create a New Skill
Create directory: skills/my-skill/
Create SKILL.md (an illustrative sketch follows this list)
Add optional files:
reference.wav - Voice clone source (10s of clear speech)
avatar.png - Character image
references/ - Knowledge markdown files
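The real SKILL.md schema isn't reproduced in this README, so the sketch below uses assumed field names; treat the bundled skills as the authoritative reference:

```markdown
---
name: my-skill                      # assumed metadata field
description: A concise study buddy  # assumed metadata field
---

You are a patient, upbeat study companion.
Keep spoken replies short and conversational, and stay in character.
```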
Voice Cloning
Pocket TTS supports voice cloning from reference audio.
Requirements:
WAV format (16-bit PCM)
~10 seconds of clean speech
Clear recording, minimal background noise
Place the file at:
skills/my-skill/reference.wav (per-skill), or
voice_references/my-skill.wav (global)
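If your source clip isn't already 16-bit PCM WAV, a short Python sketch can convert and trim it (this uses the third-party soundfile package, which may not be among this project's dependencies):

```python
import soundfile as sf

# Load the source clip (any format soundfile can decode).
data, sr = sf.read("raw_clip.wav")

# Keep roughly the first 10 seconds of audio, per the requirements above.
data = data[: 10 * sr]

# Write a 16-bit PCM WAV where the skill expects its reference voice.
sf.write("skills/my-skill/reference.wav", data, sr, subtype="PCM_16")
```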
Voice Modes
VAD Mode (default)
Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.
PTT Mode
Push-to-Talk - hold Space to record, release to send.
Configuration
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| | | OpenAI-compatible API URL |
| | (none) | API key for the provider |
| | (auto) | Model name to use |
| VOICE_PROVIDER | (auto) | Force provider: lm_studio, openrouter, openai |
| OPENROUTER_API_KEY | (none) | OpenRouter API key |
| OPENAI_API_KEY | (none) | OpenAI API key |
| | | Default TTS voice |
| | | ASR device: cuda (GPU) or cpu |
| | | Turn completion threshold (0.0-1.0) |
Command Line Options
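The only flag confirmed elsewhere in this README is --list-providers; an argparse-style --help (an assumption) should print the full set:

```bash
python voice_client.py --list-providers   # show detected LLM providers
python voice_client.py --help             # full option list (assumed standard flag)
```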
MCP Integration
LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.
Start MCP Server
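As a sketch only: assuming the MCP entry point is a script named mcp_server.py (a hypothetical name), registering it with Claude Code could look like:

```bash
# "mcp_server.py" is an assumed entry-point name; check the repository for the real one
claude mcp add localvoicemode -- python mcp_server.py
```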
Available Tools
speak(text) - Speak text aloud (TTS)
listen() - Listen for speech (STT)
converse(text) - Speak and listen for response
start_voice(skill) - Start voice chat with a character
stop_voice() - Stop voice chat
voice_status() - Check if voice mode is running
list_voices() - List available characters
provider_status() - Show available providers
set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent
get_speech_mode() - Get current speech mode
Slash Commands
These slash commands are available in Claude Code and compatible AI assistants:
| Command | Description |
| --- | --- |
| | TTS only - speak text aloud |
| | STT only - transcribe speech to text |
| | Mode: Claude speaks, you type |
| | Mode: You speak, Claude responds in text |
| /voice-roleplay | Full expressive speech output |
| /voice-coder | Summaries & completions only |
| | Speak one message via voice |
| | Start continuous voice mode |
| | Stop voice mode |
| | You type, Claude speaks (hold RIGHT SHIFT to speak) |
Speech Modes
Control how much Claude speaks:
| Mode | Description |
| --- | --- |
| roleplay | Full expressive output - speaks everything naturally (default) |
| coder | Summaries only - task completions, errors, questions |
| minimal | Very terse - only critical announcements |
| silent | No speech - text only |
Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.
Voice Commands While Running
Say "stop" or "goodbye" to end
Say "change voice" to switch characters
GPU Support
Parakeet TDT uses ONNX Runtime with GPU acceleration:
TensorRT (best performance) - Auto-detected if installed
CUDA (good performance) - Requires CUDA/cuDNN
CPU (fallback) - Always available
Check GPU status:
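One generic way to see which ONNX Runtime execution providers are available:

```bash
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```

Seeing TensorrtExecutionProvider or CUDAExecutionProvider in the output confirms GPU acceleration is possible; CPUExecutionProvider alone means the fallback will be used.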
Troubleshooting
No audio detected
Check microphone permissions
Verify default audio device:
python -c "import sounddevice; print(sounddevice.query_devices())"
Pocket TTS not working
Accept license: https://huggingface.co/kyutai/pocket-tts
Login:
.venv\Scripts\huggingface-cli.exe login
LM Studio connection failed
Verify LM Studio server is running
Check URL: default is http://localhost:1234
Ensure a model is loaded
OpenRouter/OpenAI not working
Verify API key is set in .env or the environment
Run python voice_client.py --list-providers to see detected providers
GPU/CUDA not working
Ensure NVIDIA drivers are installed
Install CUDA Toolkit 12.x
Reinstall:
pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]
Credits
Parakeet TDT: NVIDIA NeMo - Apache 2.0
Pocket TTS: Kyutai - CC-BY-4.0
Smart Turn: Livekit - Apache 2.0
Silero VAD: Silero - MIT