LocalVoiceMode

A self-contained, local voice chat system with Character Skills.

Uses NVIDIA's Parakeet TDT 0.6B for fast GPU speech recognition and Kyutai's Pocket TTS for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.

Features

  • Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)

  • Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning

  • Smart Turn Detection - Knows when you're done speaking, not just detecting silence

  • Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI

  • Modern Rich UI - Beautiful terminal interface with audio visualization

  • Character Skills - Load different personalities with custom voices

  • MCP Integration - Works with Claude Code and other MCP-enabled tools

Quick Start

1. Clone and Setup

git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat

This creates a virtual environment and installs all dependencies.

2. HuggingFace Login (Required)

Pocket TTS requires accepting the model license:

.venv\Scripts\huggingface-cli.exe login

Then accept the license at: https://huggingface.co/kyutai/pocket-tts

3. Configure LLM Provider

Option A: LM Studio (Recommended for local)

  1. Open LM Studio

  2. Load your preferred model

  3. Start the local server (default: http://localhost:1234)

Option B: OpenRouter

set OPENROUTER_API_KEY=your-key-here

Get your key at: https://openrouter.ai/keys

Option C: OpenAI

set OPENAI_API_KEY=your-key-here

4. Run Voice Chat

REM Default assistant
VoiceChat.bat

REM With Hermione character
VoiceChat.bat hermione

REM Push-to-talk mode
VoiceChat.bat hermione ptt

Provider Detection

LocalVoiceMode automatically detects available providers in this order:

  1. LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000

  2. OpenRouter - Uses OPENROUTER_API_KEY environment variable

  3. OpenAI - Uses OPENAI_API_KEY environment variable

Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
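
A minimal sketch of this detection order (a hypothetical helper for illustration only; the actual logic lives in src/localvoicemode/llm/):

import os
import socket

LM_STUDIO_PORTS = [1234, 1235, 1236, 8080, 5000]

def detect_provider():
    # An explicit VOICE_PROVIDER always wins.
    forced = os.environ.get("VOICE_PROVIDER")
    if forced:
        return forced
    # 1. LM Studio: probe the known local ports.
    for port in LM_STUDIO_PORTS:
        with socket.socket() as s:
            s.settimeout(0.2)
            if s.connect_ex(("localhost", port)) == 0:
                return "lm_studio"
    # 2./3. Fall back to whichever API key is set.
    if os.environ.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    return None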

Directory Structure

localvoicemode/
├── voice_client.py          # Main voice client entry point
├── mcp_server.py            # MCP server for AI assistant integration
├── requirements.txt         # Python dependencies
├── setup.bat                # Setup script (run first!)
├── VoiceChat.bat            # Launch script
├── start_voicemode.bat      # MCP server launcher
│
├── src/localvoicemode/      # Core package
│   ├── audio/               # Audio recording
│   ├── speech/              # ASR, TTS, VAD, filters
│   ├── llm/                 # Provider management
│   ├── skills/              # Skill loading
│   └── state/               # State machines, config
│
├── skills/                  # Character skills
│   ├── assistant-default/   # Default assistant
│   └── hermione-companion/
│       ├── SKILL.md         # Character definition
│       ├── references/      # Lore files
│       └── scripts/         # Helper scripts
│
└── voice_references/        # Custom voice files (.wav)

Skills System

Skills define character personalities, system prompts, and optional knowledge.

List Available Skills

.venv\Scripts\python.exe voice_client.py --list-skills

Create a New Skill

  1. Create directory: skills/my-skill/

  2. Create SKILL.md:

---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---

# My Character

## System Prompt

You are My Character.
[Full instructions here...]

  3. Add optional files:

    • reference.wav - Voice clone source (10s of clear speech)

    • avatar.png - Character image

    • references/ - Knowledge markdown files
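
The front matter between the --- markers is what the skill loader reads. A rough sketch of how such a file could be parsed (hypothetical; the real loader is in src/localvoicemode/skills/):

from pathlib import Path
import yaml  # pip install pyyaml

def load_skill(skill_dir):
    text = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    # Split "---\n<front matter>\n---\n<markdown body>" into its parts.
    _, front, body = text.split("---", 2)
    meta = yaml.safe_load(front)
    return meta, body

meta, body = load_skill("skills/my-skill")
print(meta["metadata"]["greeting"])  # "Hello! How can I help?"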

Voice Cloning

Pocket TTS supports voice cloning from reference audio.

Requirements:

  • WAV format (16-bit PCM)

  • ~10 seconds of clean speech

  • Clear recording, minimal background noise

Place the file at:

  • skills/my-skill/reference.wav (per-skill), or

  • voice_references/my-skill.wav (global)
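
To verify a reference file meets these requirements before launching, a quick stdlib-only check (the path below is the per-skill location from above):

import wave

def check_reference(path):
    with wave.open(path, "rb") as w:
        # Sample width 2 bytes = 16-bit PCM, as required.
        assert w.getsampwidth() == 2, "expected 16-bit PCM"
        seconds = w.getnframes() / w.getframerate()
        print(f"{path}: {seconds:.1f}s, {w.getframerate()} Hz, "
              f"{w.getnchannels()} channel(s)")

check_reference("skills/my-skill/reference.wav")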

Voice Modes

VAD Mode (default)

Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.

VoiceChat.bat hermione
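
The difference from plain silence detection: a silence-only endpointer cuts you off during any pause, while Smart Turn also scores whether the utterance sounds finished. A simplified illustration (hypothetical pseudocode, not the project's implementation in src/localvoicemode/speech/; the threshold corresponds to VOICE_SMART_TURN_THRESHOLD):

def turn_is_over(silence_ms, completion_prob, threshold=0.5):
    # Plain VAD would return True as soon as silence_ms passed a limit.
    if silence_ms < 300:
        return False  # still speaking, or just a short pause
    # Smart Turn additionally requires the turn-completion model to agree.
    return completion_prob >= threshold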

PTT Mode

Push-to-Talk - hold Space to record, release to send.

VoiceChat.bat hermione ptt

Configuration

Environment Variables

Variable                     Default                    Description
VOICE_API_URL                http://localhost:1234/v1   OpenAI-compatible API URL
VOICE_API_KEY                (none)                     API key for the provider
VOICE_MODEL                  (auto)                     Model name to use
VOICE_PROVIDER               (auto)                     Force provider: lm_studio, openrouter, openai
OPENROUTER_API_KEY           (none)                     OpenRouter API key
OPENAI_API_KEY               (none)                     OpenAI API key
VOICE_TTS_VOICE              alba                       Default TTS voice
VOICE_DEVICE                 cuda                       ASR device: cuda (GPU) or cpu
VOICE_SMART_TURN_THRESHOLD   0.5                        Turn completion threshold (0.0-1.0)
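
These variables compose as plain environment lookups with the defaults from the table; a hypothetical illustration of the resolution (not the project's actual config code):

import os

config = {
    "api_url": os.environ.get("VOICE_API_URL", "http://localhost:1234/v1"),
    "api_key": os.environ.get("VOICE_API_KEY"),
    "model": os.environ.get("VOICE_MODEL"),        # auto-detected if unset
    "provider": os.environ.get("VOICE_PROVIDER"),  # auto-detected if unset
    "tts_voice": os.environ.get("VOICE_TTS_VOICE", "alba"),
    "device": os.environ.get("VOICE_DEVICE", "cuda"),
    "turn_threshold": float(os.environ.get("VOICE_SMART_TURN_THRESHOLD", "0.5")),
}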

Command Line Options

python voice_client.py [options]

Options:
  --skill, -s SKILL      Load a character skill
  --list-skills, -l      List available skills
  --list-providers       List available LLM providers
  --provider, -p PROV    Force provider: lm_studio, openrouter, openai
  --mode, -m MODE        Input mode: vad, ptt, or type
  --device DEVICE        ASR device: cuda or cpu
  --api-url URL          OpenAI-compatible API URL
  --api-key KEY          API key for the provider
  --model MODEL          Model name to use
  --headless             Run without UI (for MCP integration)

MCP Integration

LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.

Start MCP Server

start_voicemode.bat

Available Tools

  • speak(text) - Speak text aloud (TTS)

  • listen() - Listen for speech (STT)

  • converse(text) - Speak and listen for response

  • start_voice(skill) - Start voice chat with a character

  • stop_voice() - Stop voice chat

  • voice_status() - Check if voice mode is running

  • list_voices() - List available characters

  • provider_status() - Show available providers

  • set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent

  • get_speech_mode() - Get current speech mode
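
Outside of Claude Code, these tools can be exercised with the official MCP Python SDK (pip install mcp). A minimal sketch, assuming the server speaks stdio and can be launched with python mcp_server.py (on Windows, start_voicemode.bat wraps the server startup):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Call the speak tool listed above.
            result = await session.call_tool("speak", {"text": "Hello there!"})
            print(result)

asyncio.run(main())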

Slash Commands

These slash commands are available in Claude Code and compatible AI assistants:

Command            Description
/speak <text>      TTS only - speak text aloud
/listen            STT only - transcribe speech to text
/tts-only          Mode: Claude speaks, you type
/stt-only          Mode: you speak, Claude responds in text
/voice-roleplay    Full expressive speech output
/voice-coder       Summaries & completions only
/voice             Speak one message via voice
/voice-on          Start continuous voice mode
/voice-off         Stop voice mode
/voice-typing      You type, Claude speaks (hold RIGHT SHIFT to speak)

Speech Modes

Control how much Claude speaks:

Mode       Description
roleplay   Full expressive output - speaks everything naturally (default)
coder      Summaries only - task completions, errors, questions
minimal    Very terse - only critical announcements
silent     No speech - text only

Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.

Voice Commands While Running

  • Say "stop" or "goodbye" to end

  • Say "change voice" to switch characters

GPU Support

Parakeet TDT uses ONNX Runtime with GPU acceleration:

  1. TensorRT (best performance) - Auto-detected if installed

  2. CUDA (good performance) - Requires CUDA/cuDNN

  3. CPU (fallback) - Always available

Check GPU status:

.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"
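
onnxruntime accepts an ordered provider list and falls back down it, matching the priority above. A small sketch using only documented onnxruntime calls:

import onnxruntime as ort

preferred = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
             "CPUExecutionProvider"]
available = [p for p in preferred if p in ort.get_available_providers()]
print("Best available:", available[0])
# e.g. ort.InferenceSession("model.onnx", providers=available)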

Troubleshooting

No audio detected

  • Check microphone permissions

  • Verify default audio device: python -c "import sounddevice; print(sounddevice.query_devices())"

Pocket TTS not working

  • Make sure you're logged in to HuggingFace: .venv\Scripts\huggingface-cli.exe login

  • Confirm you've accepted the model license at https://huggingface.co/kyutai/pocket-tts

LM Studio connection failed

  • Verify LM Studio server is running

  • Check URL: default is http://localhost:1234

  • Ensure a model is loaded

OpenRouter/OpenAI not working

  • Verify API key is set in .env or environment

  • Run python voice_client.py --list-providers to see which providers were detected

GPU/CUDA not working

  • Ensure NVIDIA drivers are installed

  • Install CUDA Toolkit 12.x

  • Reinstall: pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]

Credits

  • Parakeet TDT 0.6B - NVIDIA (speech recognition)

  • Pocket TTS - Kyutai (text-to-speech)
