Voice Mode

CLAUDE.md•4.64 KiB

# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Voice Interaction Load the voicemode skill for voice conversation support: `/voicemode:voicemode` ## Project Overview VoiceMode is a Python package that provides voice interaction capabilities for AI assistants through the Model Context Protocol (MCP). It enables natural voice conversations with Claude Code and other AI coding assistants by integrating speech-to-text (STT) and text-to-speech (TTS) services. ## Key Commands ### Development & Testing ```bash # Install in development mode with dependencies make dev-install # Run all unit tests make test # Or directly: uv run pytest tests/ -v --tb=short # Run specific test uv run pytest tests/test_voice_mode.py -v # Clean build artifacts make clean ``` ### Building & Publishing ```bash # Build Python package make build-package # Build development version (auto-versioned) make build-dev # Test package installation make test-package # Release workflow (bumps version, tags, pushes) make release ``` ### Documentation ```bash # Serve docs locally at http://localhost:8000 make docs-serve # Build documentation site make docs-build # Check docs for errors (strict mode) make docs-check ``` ## Architecture Overview ### Core Components 1. **MCP Server (`voice_mode/server.py`)** - FastMCP-based server providing voice tools via stdio transport - Auto-imports all tools, prompts, and resources - Handles FFmpeg availability checks and logging setup 2. **Tool System (`voice_mode/tools/`)** - **converse.py**: Primary voice conversation tool with TTS/STT integration - **service.py**: Unified service management for Whisper/Kokoro - **providers.py**: Provider discovery and registry management - **devices.py**: Audio device detection and management - Services subdirectory contains install/uninstall tools for Whisper and Kokoro - See [Tool Loading Architecture](docs/reference/tool-loading-architecture.md) for internal details 3. **Provider System (`voice_mode/providers.py`)** - Dynamic discovery of OpenAI-compatible TTS/STT endpoints - Health checking and failover support - Maintains registry of available voice services 4. **Configuration (`voice_mode/config.py`)** - Environment-based configuration with sensible defaults - Support for voice preference files (project/user level) - Audio format configuration (PCM, MP3, WAV, FLAC, AAC, Opus) 5. **Resources (`voice_mode/resources/`)** - MCP resources exposed for client access - Statistics, configuration, changelog, and version information - Whisper model management ### Service Architecture The project supports multiple voice service backends: - **OpenAI API**: Cloud-based TTS/STT (requires API key) - **Whisper.cpp**: Local speech-to-text service - **Kokoro**: Local text-to-speech with multiple voices Services can be installed and managed through MCP tools, with automatic service discovery and health checking. ### Key Design Patterns 1. **OpenAI API Compatibility**: All voice services expose OpenAI-compatible endpoints, enabling transparent switching between providers 2. **Dynamic Tool Discovery**: Tools are auto-imported from the tools directory structure 3. **Failover Support**: Automatic fallback between services based on availability 4. **Local Microphone Transport**: Direct audio capture via PyAudio for voice interactions 5. **Audio Format Negotiation**: Automatic format validation against provider capabilities ## Development Notes - The project uses `uv` for package management (not pip directly) - Python 3.10+ is required - FFmpeg is required for audio processing - The project follows a modular architecture with FastMCP patterns - Service installation tools handle platform-specific setup (launchd on macOS, systemd on Linux) - Event logging and conversation logging are available for debugging - WebRTC VAD is used for silence detection when available ## Testing - Unit tests: `tests/` - run with `make test` - Manual tests: `tests/manual/` - require user interaction ## Logging Logs are stored in `~/.voicemode/`: - `logs/conversations/` - Voice exchange history (JSONL) - `logs/events/` - Operational events and errors - `audio/` - Saved TTS/STT audio files - `voicemode.env` - User configuration ## See Also - **[skills/voicemode/SKILL.md](skills/voicemode/SKILL.md)** - Voice interaction usage and MCP tools - **[docs/tutorials/getting-started.md](docs/tutorials/getting-started.md)** - Installation guide - **[docs/guides/configuration.md](docs/guides/configuration.md)** - Configuration reference - **[docs/concepts/architecture.md](docs/concepts/architecture.md)** - Detailed architecture

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

CLAUDE.md•4.64 KiB