LocalVoiceMode

Local voice interface with character skills - a self-contained voice chat system.

Uses Parakeet TDT 0.6B (NVIDIA) for fast GPU speech recognition and Pocket TTS (Kyutai) for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.

Features

  • Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)

  • Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning

  • Smart Turn Detection - Knows when you're done speaking, not just detecting silence

  • Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI

  • Modern Rich UI - Beautiful terminal interface with audio visualization

  • Character Skills - Load different personalities with custom voices

  • MCP Integration - Works with Claude Code and other MCP-enabled tools

Quick Start

1. Clone and Setup

git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat

This creates a virtual environment and installs all dependencies.

2. HuggingFace Login (Required)

Pocket TTS requires accepting the model license:

.venv\Scripts\huggingface-cli.exe login

Then accept the license at: https://huggingface.co/kyutai/pocket-tts

3. Configure LLM Provider

Option A: LM Studio (Recommended for local)

  1. Open LM Studio

  2. Load your preferred model

  3. Start the local server (default: http://localhost:1234)

Option B: OpenRouter

set OPENROUTER_API_KEY=your-key-here

Get your key at: https://openrouter.ai/keys

Option C: OpenAI

set OPENAI_API_KEY=your-key-here

4. Run Voice Chat

REM Default assistant
VoiceChat.bat

REM With Hermione character
VoiceChat.bat hermione

REM Push-to-talk mode
VoiceChat.bat hermione ptt

Provider Detection

LocalVoiceMode automatically detects available providers in this order:

  1. LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000

  2. OpenRouter - Uses OPENROUTER_API_KEY environment variable

  3. OpenAI - Uses OPENAI_API_KEY environment variable

Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
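
The detection order above can be sketched as follows. This is a minimal illustration of the documented behavior: `detect_provider`, `port_open`, and the exact scan logic are assumptions, not the project's actual code.

```python
import socket

# Ports the README says are scanned for a running LM Studio server.
LM_STUDIO_PORTS = [1234, 1235, 1236, 8080, 5000]

def port_open(host, port, timeout=0.25):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def detect_provider(env, ports=LM_STUDIO_PORTS):
    """Pick a provider using the documented priority order."""
    forced = env.get("VOICE_PROVIDER")
    if forced:
        return forced                      # explicit override wins
    for port in ports:
        if port_open("localhost", port):
            return "lm_studio"             # 1. local LM Studio server
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"                # 2. OpenRouter key present
    if env.get("OPENAI_API_KEY"):
        return "openai"                    # 3. OpenAI key present
    return None                            # nothing available
```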

Directory Structure

localvoicemode/
├── voice_client.py        # Main voice client entry point
├── mcp_server.py          # MCP server for AI assistant integration
├── requirements.txt       # Python dependencies
├── setup.bat              # Setup script (run first!)
├── VoiceChat.bat          # Launch script
├── start_voicemode.bat    # MCP server launcher
│
├── src/localvoicemode/    # Core package
│   ├── audio/             # Audio recording
│   ├── speech/            # ASR, TTS, VAD, filters
│   ├── llm/               # Provider management
│   ├── skills/            # Skill loading
│   └── state/             # State machines, config
│
├── skills/                # Character skills
│   ├── assistant-default/ # Default assistant
│   └── hermione-companion/
│       ├── SKILL.md       # Character definition
│       ├── references/    # Lore files
│       └── scripts/       # Helper scripts
│
└── voice_references/      # Custom voice files (.wav)

Skills System

Skills define character personalities, system prompts, and optional knowledge.

List Available Skills

.venv\Scripts\python.exe voice_client.py --list-skills

Create a New Skill

  1. Create directory: skills/my-skill/

  2. Create SKILL.md:

---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---

# My Character

## System Prompt

You are My Character. [Full instructions here...]

  3. Add optional files:

    • reference.wav - Voice clone source (10s of clear speech)

    • avatar.png - Character image

    • references/ - Knowledge markdown files
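
The frontmatter/body split shown in the SKILL.md example could be parsed with a sketch like this. Field names follow the example above; the real loader in src/localvoicemode/skills/ may work differently.

```python
def parse_skill(text):
    """Split SKILL.md into (frontmatter fields, markdown body)."""
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}, text                    # no frontmatter block
    fields = {}
    for line in parts[1].splitlines():
        # Top-level "key: value" pairs only; nested keys (e.g. under
        # metadata:) are indented and skipped in this simple sketch.
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields, parts[2].strip()
```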

Voice Cloning

Pocket TTS supports voice cloning from reference audio.

Requirements:

  • WAV format (16-bit PCM)

  • ~10 seconds of clean speech

  • Clear recording, minimal background noise

Place the file at:

  • skills/my-skill/reference.wav (per-skill), or

  • voice_references/my-skill.wav (global)
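
A quick stdlib check that a reference clip meets these requirements; `check_reference` is a hypothetical helper (not part of the project) and the duration bounds around ~10 seconds are illustrative.

```python
import wave

def check_reference(path):
    """Return a list of problems with a voice-clone reference file."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:        # 2 bytes per sample == 16-bit PCM
            problems.append("not 16-bit PCM")
        duration = wav.getnframes() / wav.getframerate()
        if not 5.0 <= duration <= 20.0:    # aim for ~10s of clean speech
            problems.append("duration %.1fs (aim for ~10s)" % duration)
    return problems
```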

Voice Modes

VAD Mode (default)

Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.

VoiceChat.bat hermione

PTT Mode

Push-to-Talk - hold Space to record, release to send.

VoiceChat.bat hermione ptt

Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICE_API_URL` | `http://localhost:1234/v1` | OpenAI-compatible API URL |
| `VOICE_API_KEY` | (none) | API key for the provider |
| `VOICE_MODEL` | (auto) | Model name to use |
| `VOICE_PROVIDER` | (auto) | Force provider: `lm_studio`, `openrouter`, `openai` |
| `OPENROUTER_API_KEY` | (none) | OpenRouter API key |
| `OPENAI_API_KEY` | (none) | OpenAI API key |
| `VOICE_TTS_VOICE` | `alba` | Default TTS voice |
| `VOICE_DEVICE` | `cuda` | ASR device: `cuda` (GPU) or `cpu` |
| `VOICE_SMART_TURN_THRESHOLD` | `0.5` | Turn completion threshold (0.0-1.0) |
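
The defaults in the table can be resolved from the environment roughly like this. This is an illustrative sketch; the project's actual config loader may differ.

```python
import os

def voice_config(env=None):
    """Resolve the documented settings, falling back to table defaults."""
    env = os.environ if env is None else env
    return {
        "api_url": env.get("VOICE_API_URL", "http://localhost:1234/v1"),
        "api_key": env.get("VOICE_API_KEY"),            # (none) by default
        "device": env.get("VOICE_DEVICE", "cuda"),
        "tts_voice": env.get("VOICE_TTS_VOICE", "alba"),
        "smart_turn_threshold": float(env.get("VOICE_SMART_TURN_THRESHOLD", "0.5")),
    }
```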

Command Line Options

python voice_client.py [options]

Options:
  --skill, -s SKILL      Load a character skill
  --list-skills, -l      List available skills
  --list-providers       List available LLM providers
  --provider, -p PROV    Force provider: lm_studio, openrouter, openai
  --mode, -m MODE        Input mode: vad, ptt, or type
  --device DEVICE        ASR device: cuda or cpu
  --api-url URL          OpenAI-compatible API URL
  --api-key KEY          API key for the provider
  --model MODEL          Model name to use
  --headless             Run without UI (for MCP integration)

MCP Integration

LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.

Start MCP Server

start_voicemode.bat

Available Tools

  • speak(text) - Speak text aloud (TTS)

  • listen() - Listen for speech (STT)

  • converse(text) - Speak and listen for response

  • start_voice(skill) - Start voice chat with a character

  • stop_voice() - Stop voice chat

  • voice_status() - Check if voice mode is running

  • list_voices() - List available characters

  • provider_status() - Show available providers

  • set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent

  • get_speech_mode() - Get current speech mode
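
For example, a `speak` call arrives at the server as a standard MCP `tools/call` JSON-RPC request. The framing below follows the MCP specification; transport details are omitted and the example text is arbitrary.

```python
import json

# JSON-RPC 2.0 envelope for invoking the speak tool listed above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "speak", "arguments": {"text": "Build finished."}},
}
payload = json.dumps(request)  # serialized message sent over the transport
```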

Slash Commands

These slash commands are available in Claude Code and compatible AI assistants:

| Command | Description |
|---------|-------------|
| `/speak <text>` | TTS only - speak text aloud |
| `/listen` | STT only - transcribe speech to text |
| `/tts-only` | Mode: Claude speaks, you type |
| `/stt-only` | Mode: you speak, Claude responds in text |
| `/voice-roleplay` | Full expressive speech output |
| `/voice-coder` | Summaries & completions only |
| `/voice` | Speak one message via voice |
| `/voice-on` | Start continuous voice mode |
| `/voice-off` | Stop voice mode |
| `/voice-typing` | You type, Claude speaks (hold RIGHT SHIFT to speak) |

Speech Modes

Control how much Claude speaks:

| Mode | Description |
|------|-------------|
| `roleplay` | Full expressive output - speaks everything naturally (default) |
| `coder` | Summaries only - task completions, errors, questions |
| `minimal` | Very terse - only critical announcements |
| `silent` | No speech - text only |

Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.

Voice Commands While Running

  • Say "stop" or "goodbye" to end

  • Say "change voice" to switch characters

GPU Support

Parakeet TDT uses ONNX Runtime with GPU acceleration:

  1. TensorRT (best performance) - Auto-detected if installed

  2. CUDA (good performance) - Requires CUDA/cuDNN

  3. CPU (fallback) - Always available

Check GPU status:

.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"
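
The fallback order above amounts to picking the first available ONNX Runtime execution provider from a preference list. This is an illustrative sketch, not the project's code; the provider names are ONNX Runtime's standard identifiers.

```python
# Preference order mirroring the README: TensorRT, then CUDA, then CPU.
PREFERENCE = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def pick_provider(available):
    """Choose the best execution provider from ort.get_available_providers()."""
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPUExecutionProvider"          # CPU is always a safe fallback
```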

Troubleshooting

No audio detected

  • Check microphone permissions

  • Verify default audio device: python -c "import sounddevice; print(sounddevice.query_devices())"

Pocket TTS not working

  • Ensure you are logged in: .venv\Scripts\huggingface-cli.exe login

  • Confirm you accepted the model license at https://huggingface.co/kyutai/pocket-tts

LM Studio connection failed

  • Verify LM Studio server is running

  • Check URL: default is http://localhost:1234

  • Ensure a model is loaded

OpenRouter/OpenAI not working

  • Verify API key is set in .env or environment

  • Check python voice_client.py --list-providers to see detected providers

GPU/CUDA not working

  • Ensure NVIDIA drivers are installed

  • Install CUDA Toolkit 12.x

  • Reinstall: pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]
