LocalVoiceMode

A self-contained local voice chat system with Character Skills.

Uses Parakeet TDT 0.6B (NVIDIA) for fast GPU speech recognition and Pocket TTS (Kyutai) for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.

Features

  • Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)

  • Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning

  • Smart Turn Detection - Knows when you're done speaking, not just detecting silence

  • Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI

  • Modern Rich UI - Beautiful terminal interface with audio visualization

  • Character Skills - Load different personalities with custom voices

  • MCP Integration - Works with Claude Code and other MCP-enabled tools

Quick Start

1. Clone and Setup

git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat

This creates a virtual environment and installs all dependencies.

2. HuggingFace Login (Required)

Pocket TTS requires accepting the model license:

.venv\Scripts\huggingface-cli.exe login

Then accept the license at: https://huggingface.co/kyutai/pocket-tts

3. Configure LLM Provider

Option A: LM Studio (Recommended for local use)

  1. Open LM Studio

  2. Load your preferred model

  3. Start the local server (default: http://localhost:1234)

Option B: OpenRouter

set OPENROUTER_API_KEY=your-key-here

Get your key at: https://openrouter.ai/keys

Option C: OpenAI

set OPENAI_API_KEY=your-key-here

4. Run Voice Chat

REM Default assistant
VoiceChat.bat

REM With Hermione character
VoiceChat.bat hermione

REM Push-to-talk mode
VoiceChat.bat hermione ptt

Provider Detection

LocalVoiceMode automatically detects available providers in this order:

  1. LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000

  2. OpenRouter - Uses OPENROUTER_API_KEY environment variable

  3. OpenAI - Uses OPENAI_API_KEY environment variable

Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
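
The detection order above can be sketched as a small selector. This is a minimal sketch, not the actual implementation; the LM Studio port scan is stubbed out as a parameter:

```python
LM_STUDIO_PORTS = [1234, 1235, 1236, 8080, 5000]

def detect_provider(env, lm_studio_port=None):
    """Pick a provider using the documented priority order.

    `lm_studio_port` stands in for the real port scan: pass the first
    port where an LM Studio server answered, or None if none did.
    """
    forced = env.get("VOICE_PROVIDER")
    if forced:                      # explicit override wins
        return forced
    if lm_studio_port in LM_STUDIO_PORTS:
        return "lm_studio"          # 1. local LM Studio server found
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"         # 2. OpenRouter key present
    if env.get("OPENAI_API_KEY"):
        return "openai"             # 3. OpenAI key present
    return None                     # nothing usable detected
```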

Directory Structure

localvoicemode/
├── voice_client.py        # Main voice client entry point
├── mcp_server.py          # MCP server for AI assistant integration
├── requirements.txt       # Python dependencies
├── setup.bat              # Setup script (run first!)
├── VoiceChat.bat          # Launch script
├── start_voicemode.bat    # MCP server launcher
│
├── src/localvoicemode/    # Core package
│   ├── audio/             # Audio recording
│   ├── speech/            # ASR, TTS, VAD, filters
│   ├── llm/               # Provider management
│   ├── skills/            # Skill loading
│   └── state/             # State machines, config
│
├── skills/                # Character skills
│   ├── assistant-default/ # Default assistant
│   └── hermione-companion/
│       ├── SKILL.md       # Character definition
│       ├── references/    # Lore files
│       └── scripts/       # Helper scripts
│
└── voice_references/      # Custom voice files (.wav)

Skills System

Skills define character personalities, system prompts, and optional knowledge.

List Available Skills

.venv\Scripts\python.exe voice_client.py --list-skills

Create a New Skill

  1. Create directory: skills/my-skill/

  2. Create SKILL.md:

---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---

# My Character

## System Prompt

You are My Character. [Full instructions here...]

  3. Add optional files:

    • reference.wav - Voice clone source (10s of clear speech)

    • avatar.png - Character image

    • references/ - Knowledge markdown files
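
For illustration, the frontmatter block in SKILL.md can be split from the body like this. A minimal sketch that only handles flat `key: value` lines; the real loader in src/localvoicemode/skills/ may use a full YAML parser:

```python
def parse_skill_md(text):
    """Split a SKILL.md file into frontmatter metadata and markdown body."""
    meta, body = {}, text
    if text.startswith("---"):
        # "---" delimits the frontmatter: split off everything between
        # the first two markers, leave the rest as the body.
        _, front, body = text.split("---", 2)
        for line in front.strip().splitlines():
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()
```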

Voice Cloning

Pocket TTS supports voice cloning from reference audio.

Requirements:

  • WAV format (16-bit PCM)

  • ~10 seconds of clean speech

  • Clear recording, minimal background noise

Place the file at:

  • skills/my-skill/reference.wav (per-skill), or

  • voice_references/my-skill.wav (global)

Voice Modes

VAD Mode (default)

Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.

VoiceChat.bat hermione

PTT Mode

Push-to-Talk - hold Space to record, release to send.

VoiceChat.bat hermione ptt

Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| VOICE_API_URL | http://localhost:1234/v1 | OpenAI-compatible API URL |
| VOICE_API_KEY | (none) | API key for the provider |
| VOICE_MODEL | (auto) | Model name to use |
| VOICE_PROVIDER | (auto) | Force provider: lm_studio, openrouter, openai |
| OPENROUTER_API_KEY | (none) | OpenRouter API key |
| OPENAI_API_KEY | (none) | OpenAI API key |
| VOICE_TTS_VOICE | alba | Default TTS voice |
| VOICE_DEVICE | cuda | ASR device: cuda (GPU) or cpu |
| VOICE_SMART_TURN_THRESHOLD | 0.5 | Turn completion threshold (0.0-1.0) |
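
Resolving these variables against their defaults amounts to a lookup with fallbacks. A minimal sketch (the real client may resolve more settings than shown here):

```python
import os

# Defaults mirror the table above.
DEFAULTS = {
    "VOICE_API_URL": "http://localhost:1234/v1",
    "VOICE_TTS_VOICE": "alba",
    "VOICE_DEVICE": "cuda",
    "VOICE_SMART_TURN_THRESHOLD": "0.5",
}

def load_config(env=os.environ):
    """Resolve voice settings from environment variables, falling back
    to the documented defaults; the threshold is coerced to a float."""
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    cfg["VOICE_SMART_TURN_THRESHOLD"] = float(cfg["VOICE_SMART_TURN_THRESHOLD"])
    return cfg
```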

Command Line Options

python voice_client.py [options]

Options:
  --skill, -s SKILL      Load a character skill
  --list-skills, -l      List available skills
  --list-providers       List available LLM providers
  --provider, -p PROV    Force provider: lm_studio, openrouter, openai
  --mode, -m MODE        Input mode: vad, ptt, or type
  --device DEVICE        ASR device: cuda or cpu
  --api-url URL          OpenAI-compatible API URL
  --api-key KEY          API key for the provider
  --model MODEL          Model name to use
  --headless             Run without UI (for MCP integration)

MCP Integration

LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.

Start MCP Server

start_voicemode.bat

Available Tools

  • speak(text) - Speak text aloud (TTS)

  • listen() - Listen for speech (STT)

  • converse(text) - Speak and listen for response

  • start_voice(skill) - Start voice chat with a character

  • stop_voice() - Stop voice chat

  • voice_status() - Check if voice mode is running

  • list_voices() - List available characters

  • provider_status() - Show available providers

  • set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent

  • get_speech_mode() - Get current speech mode
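
To show the shape of these tools, here is a toy dispatcher with the same names and stubbed bodies. The real server registers them through an MCP SDK; this is purely illustrative:

```python
def make_tool_registry():
    """Build a name -> handler map for a few of the tools listed above."""
    state = {"running": False, "mode": "roleplay"}

    def start_voice(skill="assistant-default"):
        state["running"] = True
        return f"started {skill}"

    def stop_voice():
        state["running"] = False
        return "stopped"

    def voice_status():
        return "running" if state["running"] else "stopped"

    def set_speech_mode(mode):
        if mode not in {"roleplay", "coder", "minimal", "silent"}:
            raise ValueError(f"unknown mode: {mode}")
        state["mode"] = mode
        return mode

    def get_speech_mode():
        return state["mode"]

    handlers = (start_voice, stop_voice, voice_status,
                set_speech_mode, get_speech_mode)
    return {f.__name__: f for f in handlers}
```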

Slash Commands

These slash commands are available in Claude Code and compatible AI assistants:

| Command | Description |
|---------|-------------|
| /speak <text> | TTS only - speak text aloud |
| /listen | STT only - transcribe speech to text |
| /tts-only | Mode: Claude speaks, you type |
| /stt-only | Mode: You speak, Claude responds in text |
| /voice-roleplay | Full expressive speech output |
| /voice-coder | Summaries & completions only |
| /voice | Speak one message via voice |
| /voice-on | Start continuous voice mode |
| /voice-off | Stop voice mode |
| /voice-typing | You type, Claude speaks (hold RIGHT SHIFT to speak) |

Speech Modes

Control how much Claude speaks:

| Mode | Description |
|------|-------------|
| roleplay | Full expressive output - speaks everything naturally (default) |
| coder | Summaries only - task completions, errors, questions |
| minimal | Very terse - only critical announcements |
| silent | No speech - text only |

Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.
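
The modes above behave like a filter over message kinds. A toy version of that filter; the kind names and the exact mapping are illustrative assumptions, not the client's actual categories:

```python
# Which message kinds each speech mode will vocalize (illustrative mapping).
SPEAKABLE = {
    "roleplay": {"chat", "summary", "error", "question", "critical"},
    "coder":    {"summary", "error", "question", "critical"},
    "minimal":  {"critical"},
    "silent":   set(),
}

def should_speak(mode, kind):
    """Return True if a message of `kind` should be spoken aloud in `mode`."""
    return kind in SPEAKABLE.get(mode, set())
```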

Voice Commands While Running

  • Say "stop" or "goodbye" to end

  • Say "change voice" to switch characters

GPU Support

Parakeet TDT uses ONNX Runtime with GPU acceleration:

  1. TensorRT (best performance) - Auto-detected if installed

  2. CUDA (good performance) - Requires CUDA/cuDNN

  3. CPU (fallback) - Always available

Check GPU status:

.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"
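
The TensorRT > CUDA > CPU fallback above corresponds to ordering ONNX Runtime's execution providers by preference. A small sketch:

```python
# Preference order from the list above: TensorRT, then CUDA, then CPU.
PREFERENCE = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def pick_providers(available):
    """Order the available ONNX Runtime execution providers by preference,
    so a session falls through TensorRT -> CUDA -> CPU automatically."""
    return [p for p in PREFERENCE if p in available]
```

The result can be passed to `onnxruntime.InferenceSession(model_path, providers=pick_providers(ort.get_available_providers()))`.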

Troubleshooting

No audio detected

  • Check microphone permissions

  • Verify default audio device: python -c "import sounddevice; print(sounddevice.query_devices())"

Pocket TTS not working

  • Make sure you are logged in: .venv\Scripts\huggingface-cli.exe login

  • Confirm you have accepted the model license at https://huggingface.co/kyutai/pocket-tts

LM Studio connection failed

  • Verify LM Studio server is running

  • Check URL: default is http://localhost:1234

  • Ensure a model is loaded

OpenRouter/OpenAI not working

  • Verify API key is set in .env or environment

  • Run python voice_client.py --list-providers to see which providers are detected

GPU/CUDA not working

  • Ensure NVIDIA drivers are installed

  • Install CUDA Toolkit 12.x

  • Reinstall: pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]

Credits

  • Parakeet TDT 0.6B - NVIDIA speech recognition model

  • Pocket TTS - Kyutai text-to-speech model
