LocalVoiceMode
Local voice interface with Character Skills - a self-contained voice chat system.
Uses Parakeet TDT 0.6B (NVIDIA) for fast GPU speech recognition and Pocket TTS (Kyutai) for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.
Features
Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)
Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning
Smart Turn Detection - Knows when you're done speaking, not just detecting silence
Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI
Modern Rich UI - Beautiful terminal interface with audio visualization
Character Skills - Load different personalities with custom voices
MCP Integration - Works with Claude Code and other MCP-enabled tools
Quick Start
1. Clone and Setup
git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat
This creates a virtual environment and installs all dependencies.
2. HuggingFace Login (Required)
Pocket TTS requires accepting the model license:
.venv\Scripts\huggingface-cli.exe login
Then accept the license at: https://huggingface.co/kyutai/pocket-tts
3. Configure LLM Provider
Option A: LM Studio (Recommended for local)
Open LM Studio
Load your preferred model
Start the local server (default: http://localhost:1234)
Option B: OpenRouter
set OPENROUTER_API_KEY=your-key-here
Get your key at: https://openrouter.ai/keys
Option C: OpenAI
set OPENAI_API_KEY=your-key-here
4. Run Voice Chat
REM Default assistant
VoiceChat.bat
REM With Hermione character
VoiceChat.bat hermione
REM Push-to-talk mode
VoiceChat.bat hermione ptt
Provider Detection
LocalVoiceMode automatically detects available providers in this order:
LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000
OpenRouter - Uses the OPENROUTER_API_KEY environment variable
OpenAI - Uses the OPENAI_API_KEY environment variable
Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
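As a rough illustration of that detection order, the sketch below probes the same LM Studio ports and then falls back to the API-key environment variables. It assumes LM Studio's OpenAI-compatible server answers on /v1/models; it is not LocalVoiceMode's actual detection code.

import os
import urllib.request

def guess_provider():
    """Illustrative only: mirror the documented detection order."""
    # 1. LM Studio - probe the same ports LocalVoiceMode scans
    for port in (1234, 1235, 1236, 8080, 5000):
        try:
            urllib.request.urlopen(f"http://localhost:{port}/v1/models", timeout=1)
            return "lm_studio", f"http://localhost:{port}"
        except OSError:
            continue
    # 2. OpenRouter, then 3. OpenAI, via environment variables
    if os.getenv("OPENROUTER_API_KEY"):
        return "openrouter", "https://openrouter.ai/api/v1"
    if os.getenv("OPENAI_API_KEY"):
        return "openai", "https://api.openai.com/v1"
    return None, None

print(guess_provider())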
Directory Structure
localvoicemode/
├── voice_client.py # Main voice client entry point
├── mcp_server.py # MCP server for AI assistant integration
├── requirements.txt # Python dependencies
├── setup.bat # Setup script (run first!)
├── VoiceChat.bat # Launch script
├── start_voicemode.bat # MCP server launcher
│
├── src/localvoicemode/ # Core package
│ ├── audio/ # Audio recording
│ ├── speech/ # ASR, TTS, VAD, filters
│ ├── llm/ # Provider management
│ ├── skills/ # Skill loading
│ └── state/ # State machines, config
│
├── skills/ # Character skills
│ ├── assistant-default/ # Default assistant
│ └── hermione-companion/
│ ├── SKILL.md # Character definition
│ ├── references/ # Lore files
│ └── scripts/ # Helper scripts
│
└── voice_references/ # Custom voice files (.wav)
Skills System
Skills define character personalities, system prompts, and optional knowledge.
List Available Skills
.venv\Scripts\python.exe voice_client.py --list-skills
Create a New Skill
Create directory: skills/my-skill/
Create SKILL.md:
---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---
# My Character
## System Prompt
You are My Character. [Full instructions here...]
Add optional files:
reference.wav - Voice clone source (10s of clear speech)
avatar.png - Character image
references/ - Knowledge markdown files
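For illustration only, here is a minimal sketch of how a SKILL.md in the format above could be split into YAML frontmatter and a markdown body. It assumes PyYAML is installed; it is not LocalVoiceMode's actual skill loader.

from pathlib import Path
import yaml  # pip install pyyaml

def load_skill(skill_dir):
    """Split SKILL.md into YAML frontmatter and the markdown body."""
    text = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    _, frontmatter, body = text.split("---", 2)  # expects the --- markers shown above
    meta = yaml.safe_load(frontmatter)
    return meta, body.strip()

meta, body = load_skill("skills/my-skill")
print(meta["display_name"], "-", meta["metadata"]["greeting"])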
Voice Cloning
Pocket TTS supports voice cloning from reference audio.
Requirements:
WAV format (16-bit PCM)
~10 seconds of clean speech
Clear recording, minimal background noise
Place the file at:
skills/my-skill/reference.wav (per-skill), or
voice_references/my-skill.wav (global)
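If your reference audio is not already in that format, a helper along these lines can convert it. This assumes the soundfile package is available; it downmixes to mono, trims to about 10 seconds, and writes 16-bit PCM.

import soundfile as sf  # pip install soundfile

def make_reference(src, dst="skills/my-skill/reference.wav", seconds=10):
    """Convert an audio file to ~10 s of 16-bit PCM mono WAV."""
    data, rate = sf.read(src)
    if data.ndim > 1:                       # stereo -> mono
        data = data.mean(axis=1)
    data = data[: int(seconds * rate)]      # keep the first ~10 seconds
    sf.write(dst, data, rate, subtype="PCM_16")

make_reference("my_recording.wav")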
Voice Modes
VAD Mode (default)
Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.
VoiceChat.bat hermione
PTT Mode
Push-to-Talk - hold Space to record, release to send.
VoiceChat.bat hermione ptt
Configuration
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| | | OpenAI-compatible API URL |
| | (none) | API key for the provider |
| | (auto) | Model name to use |
| VOICE_PROVIDER | (auto) | Force provider: lm_studio, openrouter, openai |
| OPENROUTER_API_KEY | (none) | OpenRouter API key |
| OPENAI_API_KEY | (none) | OpenAI API key |
| | | Default TTS voice |
| | | ASR device: cuda (GPU) or cpu |
| | | Turn completion threshold (0.0-1.0) |
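For example, the provider-related variables can be kept in a .env file (referenced under Troubleshooting below) instead of the shell environment; only variables named elsewhere in this README are shown here:

# .env (example)
VOICE_PROVIDER=openrouter
OPENROUTER_API_KEY=your-key-here
# OPENAI_API_KEY=your-key-here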
Command Line Options
python voice_client.py [options]
Options:
--skill, -s SKILL Load a character skill
--list-skills, -l List available skills
--list-providers List available LLM providers
--provider, -p PROV Force provider: lm_studio, openrouter, openai
--mode, -m MODE Input mode: vad, ptt, or type
--device DEVICE ASR device: cuda or cpu
--api-url URL OpenAI-compatible API URL
--api-key KEY API key for the provider
--model MODEL Model name to use
--headless Run without UI (for MCP integration)
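For example, combining several of these options (paths assume the virtual environment created by setup.bat):

.venv\Scripts\python.exe voice_client.py --skill hermione --mode ptt --device cuda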
MCP Integration
LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.
Start MCP Server
start_voicemode.bat
Available Tools
speak(text) - Speak text aloud (TTS)
listen() - Listen for speech (STT)
converse(text) - Speak and listen for response
start_voice(skill) - Start voice chat with a character
stop_voice() - Stop voice chat
voice_status() - Check if voice mode is running
list_voices() - List available characters
provider_status() - Show available providers
set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent
get_speech_mode() - Get current speech mode
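As a rough sketch of driving these tools programmatically, the example below uses the MCP Python SDK's stdio client. It assumes the mcp package is installed and that the server can be launched over stdio via start_voicemode.bat; this is illustrative, not part of LocalVoiceMode.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the LocalVoiceMode MCP server over stdio (assumed entry point)
    params = StdioServerParameters(command="start_voicemode.bat")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            await session.call_tool("speak", arguments={"text": "Hello from MCP"})

asyncio.run(main())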
Slash Commands
These slash commands are available in Claude Code and compatible AI assistants:
| Command | Description |
|---------|-------------|
| | TTS only - speak text aloud |
| | STT only - transcribe speech to text |
| | Mode: Claude speaks, you type |
| | Mode: You speak, Claude responds in text |
| /voice-roleplay | Full expressive speech output |
| /voice-coder | Summaries & completions only |
| | Speak one message via voice |
| | Start continuous voice mode |
| | Stop voice mode |
| | You type, Claude speaks (hold RIGHT SHIFT to speak) |
Speech Modes
Control how much Claude speaks:
| Mode | Description |
|------|-------------|
| roleplay | Full expressive output - speaks everything naturally (default) |
| coder | Summaries only - task completions, errors, questions |
| minimal | Very terse - only critical announcements |
| silent | No speech - text only |
Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.
Voice Commands While Running
Say "stop" or "goodbye" to end
Say "change voice" to switch characters
GPU Support
Parakeet TDT uses ONNX Runtime with GPU acceleration:
TensorRT (best performance) - Auto-detected if installed
CUDA (good performance) - Requires CUDA/cuDNN
CPU (fallback) - Always available
Check GPU status:
.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"
Troubleshooting
No audio detected
Check microphone permissions
Verify default audio device:
python -c "import sounddevice; print(sounddevice.query_devices())"
Pocket TTS not working
Accept license: https://huggingface.co/kyutai/pocket-tts
Login:
.venv\Scripts\huggingface-cli.exe login
LM Studio connection failed
Verify LM Studio server is running
Check URL: default is http://localhost:1234
Ensure a model is loaded
OpenRouter/OpenAI not working
Verify API key is set in .env or environment
Run python voice_client.py --list-providers to see detected providers
GPU/CUDA not working
Ensure NVIDIA drivers are installed
Install CUDA Toolkit 12.x
Reinstall:
pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]
Credits
Parakeet TDT: NVIDIA NeMo - Apache 2.0
Pocket TTS: Kyutai - CC-BY-4.0
Smart Turn: Livekit - Apache 2.0
Silero VAD: Silero - MIT