Skip to main content
Glama

Voice Mode

by mbailey
code-reading-guide.md6.15 kB
# VoiceMode Converse Code Path Guide ## Overview This guide traces the code path from running `voicemode converse` through all the relevant files. ## Code Path ### 1. Entry Point: Package Configuration **File:** `pyproject.toml` **Lines:** 100-102 ```toml [project.scripts] voice-mode = "voice_mode.cli:voice_mode" voicemode = "voice_mode.cli:voice_mode" ``` Both commands map to the `voice_mode()` function in the CLI module. ### 2. CLI Entry Function **File:** `voice_mode/cli.py` **Function:** `voice_mode()` **Lines:** 74-76 ```python def voice_mode() -> None: """Entry point for voicemode command - starts the MCP server or runs subcommands.""" voice_mode_main_cli() ``` This calls the main CLI group. ### 3. Main CLI Group **File:** `voice_mode/cli.py` **Function:** `voice_mode_main_cli()` **Lines:** 40-72 This is a Click command group that handles all subcommands. Without a subcommand, it starts the MCP server. With subcommands, it routes to the appropriate handler. ### 4. Converse Subcommand **File:** `voice_mode/cli.py` **Decorator:** `@voice_mode_main_cli.command()` **Function:** `converse()` **Lines:** 1650-1815 Key parameters: - `--message, -m`: Initial message to speak - `--wait/--no-wait`: Whether to wait for response - `--duration, -d`: Listen duration - `--voice`: TTS voice to use - `--tts-provider`: OpenAI or Kokoro - `--continuous, -c`: Continuous conversation mode The function creates an async wrapper (line 1690) that calls the actual tool implementation. ### 5. Converse Tool Implementation **File:** `voice_mode/tools/converse.py` **Decorator:** `@mcp.tool()` **Function:** `converse()` **Lines:** 1194-1195 (and beyond) This is the main MCP tool that handles: - Voice synthesis (TTS) - Voice recognition (STT) - Provider selection - Audio playback and recording - Conversation logging Key imports: - Line 61-70: Core functions from `voice_mode.core` - Line 60: Provider registry from `voice_mode.provider_discovery` - Line 28: Conversation logger ### 6. Core Audio Functions **File:** `voice_mode/core.py` Key functions: - `get_openai_clients()` - Line 131: Initialize API clients - `text_to_speech()` - Line 163: Convert text to audio - `play_chime_start()` - Line 612: Play start chime - `play_chime_end()` - Line 644: Play end chime - `generate_chime()` - Line 528: Generate chime audio - `cleanup()` - Line 676: Cleanup resources ### 7. Provider Discovery **File:** `voice_mode/provider_discovery.py` Discovers and manages TTS/STT providers: - OpenAI API (cloud) - Whisper.cpp (local STT) - Kokoro (local TTS) - LiveKit (real-time communication) The provider registry is imported at line 60 of converse.py and used to select the best available provider. ### 8. Configuration **File:** `voice_mode/config.py` Loaded by converse.py (lines 29-58), provides: - Audio settings (sample rate, channels) - Service endpoints - Debug flags - Audio feedback settings - Silence detection parameters --- ## Demonstration Script ### Setup ```bash # Terminal control pattern: Use show tool with neovim panes # After each step, use look tool to confirm the pane is showing correctly ``` ### Step 1: Show Entry Point ```bash # Show pyproject.toml focusing on scripts section show pyproject.toml look # Confirm it's visible # Highlight lines 100-102 in neovim ``` ### Step 2: Show CLI Entry Functions ```bash # Show the voice_mode() entry point show voice_mode/cli.py:74-76 look # Confirm visible # Show the main CLI group show voice_mode/cli.py:40-72 look # Confirm visible ``` ### Step 3: Show Converse Command ```bash # Show the converse subcommand definition show voice_mode/cli.py:1650-1700 look # Confirm visible # Search for the async wrapper # Navigate to line 1690 to show how it calls the tool ``` ### Step 4: Show Tool Implementation ```bash # Open the converse tool file show voice_mode/tools/converse.py:1194 look # Confirm visible # Show the key imports show voice_mode/tools/converse.py:1-100 look # Confirm visible ``` ### Step 5: Show Core Functions ```bash # Show the core module with TTS/STT functions show voice_mode/core.py look # Confirm visible # Highlight text_to_speech function # Search for: /^async def text_to_speech ``` ### Step 6: Show Provider Discovery ```bash # Show provider discovery system show voice_mode/provider_discovery.py look # Confirm visible ``` --- ## Voice Demonstration Script While showing each file, use voice to narrate: ```python converse("Let's trace the voicemode converse command. First, the entry point is in pyproject.toml.", wait_for_response=False) # Then show pyproject.toml converse("When you run voicemode converse, it calls voice_mode function in cli.py", wait_for_response=False) # Then show cli.py entry converse("This routes to the converse subcommand, which is a Click command", wait_for_response=False) # Then show converse function converse("The CLI wraps the actual MCP tool, which is in tools/converse.py", wait_for_response=False) # Then show tools/converse.py converse("The tool uses core functions for text-to-speech and speech-to-text", wait_for_response=False) # Then show core.py converse("Provider discovery finds available TTS and STT services", wait_for_response=False) # Then show provider_discovery.py ``` --- ## Summary Flow ``` User runs: voicemode converse ↓ pyproject.toml (scripts) → voice_mode.cli:voice_mode ↓ cli.py:voice_mode() → voice_mode_main_cli() ↓ cli.py:converse() command (Click subcommand) ↓ tools/converse.py:converse() (MCP tool) ↓ Parallel operations: - core.py:text_to_speech() → Speak the message - Recording → Listen for response - core.py:play_chime_start/end() → Audio feedback - provider_discovery.py → Select TTS/STT providers ↓ Return transcribed response to user ``` ## Key Concepts 1. **Click CLI Framework**: Handles command parsing and routing 2. **FastMCP Tools**: MCP tool decorator makes functions available to AI 3. **Provider Registry**: Dynamic discovery of local/cloud services 4. **Async/Await**: All voice operations are asynchronous 5. **Configuration**: Environment-based with sensible defaults

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server