LocalVoiceMode
Local voice interface with Character Skills - Self-contained voice chat system.
Uses Parakeet TDT 0.6B (NVIDIA) for fast GPU speech recognition and Pocket TTS (Kyutai) for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.
Features
Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)
Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning
Smart Turn Detection - Knows when you're done speaking, not just detecting silence
Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI
Modern Rich UI - Beautiful terminal interface with audio visualization
Character Skills - Load different personalities with custom voices
MCP Integration - Works with Claude Code and other MCP-enabled tools
Quick Start
1. Clone and Setup
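A typical sequence looks like the following (the repository URL is a placeholder, and a requirements.txt at the repo root is assumed):

```bash
git clone https://github.com/<you>/LocalVoiceMode.git   # placeholder URL
cd LocalVoiceMode
python -m venv .venv
.venv\Scripts\activate           # Windows (this README uses Windows paths); use source .venv/bin/activate elsewhere
pip install -r requirements.txt  # assumes a requirements.txt at the repo root
```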
This creates a virtual environment and installs all dependencies.
2. HuggingFace Login (Required)
Pocket TTS requires accepting the model license:
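Log in with the Hugging Face CLI inside the project's virtual environment (the same binary referenced under Troubleshooting):

```bash
.venv\Scripts\huggingface-cli.exe login   # Windows; plain huggingface-cli login elsewhere
```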
Then accept the license at: https://huggingface.co/kyutai/pocket-tts
3. Configure LLM Provider
Option A: LM Studio (Recommended for local)
Open LM Studio
Load your preferred model
Start the local server (default: http://localhost:1234)
Option B: OpenRouter
Get your key at: https://openrouter.ai/keys
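Then expose the key via the environment variable named under Provider Detection below, for example in a .env file:

```bash
OPENROUTER_API_KEY=<your-key>
```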
Option C: OpenAI
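Set your key the same way, e.g. in .env:

```bash
OPENAI_API_KEY=<your-key>
```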
4. Run Voice Chat
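Given the client script referenced under Troubleshooting, a plain launch is presumably:

```bash
python voice_client.py
```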
Provider Detection
LocalVoiceMode automatically detects available providers in this order:
LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000
OpenRouter - Uses the OPENROUTER_API_KEY environment variable
OpenAI - Uses the OPENAI_API_KEY environment variable
Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
Directory Structure
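A partial layout, reconstructed only from the paths this README mentions elsewhere:

```
LocalVoiceMode/
├── voice_client.py        # main voice chat client
├── skills/                # character skill definitions
│   └── my-skill/
│       ├── SKILL.md       # personality and system prompt
│       ├── reference.wav  # optional voice clone source
│       ├── avatar.png     # optional character image
│       └── references/    # optional knowledge markdown files
├── voice_references/      # global voice clone sources (my-skill.wav)
└── .venv/                 # virtual environment created during setup
```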
Skills System
Skills define character personalities, system prompts, and optional knowledge.
List Available Skills
Create a New Skill
Create directory: skills/my-skill/
Create SKILL.md (an illustrative sketch follows this list)
Add optional files:
reference.wav - Voice clone source (10s of clear speech)
avatar.png - Character image
references/ - Knowledge markdown files
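The real SKILL.md schema isn't reproduced in this README, so the sketch below uses assumed field names; treat the bundled skills as the authoritative reference:

```markdown
---
name: my-skill                      # assumed metadata field
description: A concise study buddy  # assumed metadata field
---

You are a patient, upbeat study companion.
Keep spoken replies short and conversational, and stay in character.
```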
Voice Cloning
Pocket TTS supports voice cloning from reference audio.
Requirements:
WAV format (16-bit PCM)
~10 seconds of clean speech
Clear recording, minimal background noise
Place the file at:
skills/my-skill/reference.wav (per-skill), or
voice_references/my-skill.wav (global)
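If your source clip isn't already 16-bit PCM WAV, a short Python sketch can convert and trim it (this uses the third-party soundfile package, which may not be among this project's dependencies):

```python
import soundfile as sf

# Load the source clip (any format soundfile can decode).
data, sr = sf.read("raw_clip.wav")

# Keep roughly the first 10 seconds of audio, per the requirements above.
data = data[: 10 * sr]

# Write a 16-bit PCM WAV where the skill expects its reference voice.
sf.write("skills/my-skill/reference.wav", data, sr, subtype="PCM_16")
```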
Voice Modes
VAD Mode (default)
Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.
PTT Mode
Push-to-Talk - hold Space to record, release to send.
Configuration
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| | | OpenAI-compatible API URL |
| | (none) | API key for the provider |
| | (auto) | Model name to use |
| VOICE_PROVIDER | (auto) | Force provider: lm_studio, openrouter, openai |
| OPENROUTER_API_KEY | (none) | OpenRouter API key |
| OPENAI_API_KEY | (none) | OpenAI API key |
| | | Default TTS voice |
| | | ASR device: cuda (GPU) or cpu |
| | | Turn completion threshold (0.0-1.0) |
Command Line Options
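The only flag confirmed elsewhere in this README is --list-providers; an argparse-style --help (an assumption) should print the full set:

```bash
python voice_client.py --list-providers   # show detected LLM providers
python voice_client.py --help             # full option list (assumed standard flag)
```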
MCP Integration
LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.
Start MCP Server
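As a sketch only: assuming the MCP entry point is a script named mcp_server.py (a hypothetical name), registering it with Claude Code could look like:

```bash
# "mcp_server.py" is an assumed entry-point name; check the repository for the real one
claude mcp add localvoicemode -- python mcp_server.py
```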
Available Tools
speak(text) - Speak text aloud (TTS)
listen() - Listen for speech (STT)
converse(text) - Speak and listen for response
start_voice(skill) - Start voice chat with a character
stop_voice() - Stop voice chat
voice_status() - Check if voice mode is running
list_voices() - List available characters
provider_status() - Show available providers
set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent
get_speech_mode() - Get current speech mode
Slash Commands
These slash commands are available in Claude Code and compatible AI assistants:
| Command | Description |
| --- | --- |
| | TTS only - speak text aloud |
| | STT only - transcribe speech to text |
| | Mode: Claude speaks, you type |
| | Mode: You speak, Claude responds in text |
| /voice-roleplay | Full expressive speech output |
| /voice-coder | Summaries & completions only |
| | Speak one message via voice |
| | Start continuous voice mode |
| | Stop voice mode |
| | You type, Claude speaks (hold RIGHT SHIFT to speak) |
Speech Modes
Control how much Claude speaks:
| Mode | Description |
| --- | --- |
| roleplay | Full expressive output - speaks everything naturally (default) |
| coder | Summaries only - task completions, errors, questions |
| minimal | Very terse - only critical announcements |
| silent | No speech - text only |
Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.
Voice Commands While Running
Say "stop" or "goodbye" to end
Say "change voice" to switch characters
GPU Support
Parakeet TDT uses ONNX Runtime with GPU acceleration:
TensorRT (best performance) - Auto-detected if installed
CUDA (good performance) - Requires CUDA/cuDNN
CPU (fallback) - Always available
Check GPU status:
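One generic way to see which ONNX Runtime execution providers are available:

```bash
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```

Seeing TensorrtExecutionProvider or CUDAExecutionProvider in the output confirms GPU acceleration is possible; CPUExecutionProvider alone means the fallback will be used.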
Troubleshooting
No audio detected
Check microphone permissions
Verify default audio device:
python -c "import sounddevice; print(sounddevice.query_devices())"
Pocket TTS not working
Accept license: https://huggingface.co/kyutai/pocket-tts
Login:
.venv\Scripts\huggingface-cli.exe login
LM Studio connection failed
Verify LM Studio server is running
Check URL: default is http://localhost:1234
Ensure a model is loaded
OpenRouter/OpenAI not working
Verify API key is set in .env or the environment
Run python voice_client.py --list-providers to see detected providers
GPU/CUDA not working
Ensure NVIDIA drivers are installed
Install CUDA Toolkit 12.x
Reinstall:
pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]
Credits
Parakeet TDT: NVIDIA NeMo - Apache 2.0
Pocket TTS: Kyutai - CC-BY-4.0
Smart Turn: Livekit - Apache 2.0
Silero VAD: Silero - MIT