Skip to main content
Glama

Voice Mode

by mbailey
CHANGELOG.md80.3 kB
# Changelog All notable changes to VoiceMode (formerly voice-mcp) will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [Unreleased] ## [5.0.3] - 2025-10-05 ### Fixed - **Whisper Model Default** - Consolidated default Whisper model to single constant - Fixed inconsistency where CLI defaulted to `large-v2` while tool used `base` - All code now references `DEFAULT_WHISPER_MODEL = "base"` from config.py - DRY principle applied across CLI and tool implementations ## [5.0.2] - 2025-10-05 ### Fixed - **CLI Error Messages** - Display meaningful OpenAI error messages in CLI commands - Improved error reporting throughout CLI interface - Better user feedback when OpenAI API calls fail ## [5.0.1] - 2025-10-04 ### Fixed - **Version Command** - Fixed NameError caused by undefined `current_version` variable ## [5.0.0] - 2025-10-04 ### Added - **Pre-built Core ML Model Support** - Major improvement for Apple Silicon users - Downloads pre-built Core ML models from Hugging Face instead of building locally - Eliminates need for full Xcode installation (saves ~15GB disk space) - Significantly faster whisper model installation (minutes vs hours) - Automatic Core ML support detection and installation - Progress indicators for all model downloads - **OpenAI Error Handling** - Clear, actionable error messages - Dedicated error parser for OpenAI API failures - User-friendly messages for quota exceeded, invalid API key, and rate limits - Helpful suggestions and fallback options displayed - Improved error reporting throughout TTS/STT pipeline - **Whisper CLI Improvements** - Unified `whisper model` command as getter/setter for active model - Auto-restart whisper service when changing models - Better command organization with service groups - Enhanced progress bars and help text formatting - Default model download during whisper installation ### Changed - **Version Display** - Now shows "VoiceMode" name and git status - **CLI Structure** - Reorganized whisper commands for better UX - **Installation** - Automatic Core ML support without manual configuration ### Fixed - **WSL Detection** - Improved detection to handle WSLInterop-late - **Error Messages** - Integration of OpenAIErrorParser for clear user feedback - **CLI Help** - Removed redundant subcommands list from whisper service help - **Progress Bars** - Improved formatting and reliability ### Removed - **install_torch Parameter** - No longer needed with automatic Core ML support - **Legacy Health Check Code** - Cleaned up leftover code from provider system refactoring ### Internal - **Project Structure** - Cleaned up project root and reorganized files - **Documentation** - Updated Core ML documentation with new download strategy ## [4.8.0] - 2025-10-03 ### Added - **CI Mode for Installer** - GitHub Actions integration for automated testing - New `--ci` flag for non-interactive installer operation - Automatic dependency installation and configuration - Status display showing installed dependencies with pass/fail indicators - Comprehensive system information collection for debugging - **Enhanced Installer UX** - Detailed TTS error reporting with provider-specific failure information - Better error handling and user feedback throughout installation - Improved shell-specific source command display - Flexible working directory selection - Status checking for all dependencies before installation ### Fixed - **Platform Compatibility** - Added Debian/Ubuntu support to installer alongside Fedora - Simplified Python version checking (UV manages Python installation) - Corrected echo escape sequence handling across different shells - Made pip optional since UV handles package management ### Documentation - **Core ML Improvements** - Enhanced Core ML specification with pre-built model download strategy - Simplified voice selection guide by removing redundant examples - Clarified ~/claude directory usage (sandbox for testing, not projects) ## [4.7.1] - 2025-09-23 ### Fixed - **CLI Commands** - Fix broken whisper model install command import path after services refactoring ### Changed - **Testing** - Move comprehensive help test from manual to automated testing for CI coverage ## [4.7.0] - 2025-09-22 ### Fixed - **Voice Interaction** - Distinguish STT connection errors from genuine "no speech detected" scenarios (#62) - Display detailed error messages showing attempted endpoints and specific failures - Fix misleading "[no speech detected]" messages when services are unavailable - Simplify STT function architecture from 3 layers to 2 - Clean up "Unclosed client session" asyncio warning on startup ### Added - **Testing** - Add comprehensive test coverage for STT error scenarios - Test connection failures, authentication errors, and fallback behavior ### Documentation - Add troubleshooting guide index with diagnostic flowchart ## [4.6.0] - 2025-09-21 ### Added - **Streamlined Installation Experience** - Add OpenAI API key setup with browser integration for quick start - Add microphone detection before voice test - Improve confirmation prompts with smart defaults ([Y/n] for essential, [y/N] for optional) - Position OpenAI as recommended quick-start path (~3 minute setup) - Consolidate system dependency prompts into single confirmation - **Documentation** - Add comprehensive tool loading architecture documentation - Improve README clarity and simplify getting started - Add reference documentation for internal systems ### Changed - **Installation Flow** - Remove local services prompt from initial setup (moved to post-install docs) - Focus installation on getting users to Claude Code quickly - Update Claude Code configuration to use `uvx --refresh` for latest version - Use long flags in install script for better clarity - **Code Organization** - Refactor services directory structure - flatten tools directory hierarchy - Move service tools up one level (services/whisper/install.py → whisper_install.py) - Simplify tool loading logic for uniform subdirectory handling - Move version utilities from tools to utils module ### Fixed - **macOS Compatibility** - Fix bash completion "nosort: invalid option name" error on bash 3.2 - Remove timeout command usage (not available on macOS by default) - Correct microphone device counting (filter input devices only) - Ensure voice test works with API keys from config files - Simplify voice test command for better compatibility - **Tool Loading** - Handle sound_fonts subdirectory with underscore in name correctly - Fix installer tests after services directory restructuring ## [4.5.0] - 2025-09-18 ### Added - **Enhanced STT Logging** - Add comprehensive logging for speech-to-text operations - Log provider selection and fallback attempts - Include transcription details and provider info in logs - **Configuration Management** - Add `voicemode config edit` command for easy configuration file editing - Support custom editor selection via --editor flag - Automatically open configuration file in default editor - **Tool Environment Variables** - Replace VOICEMODE_TOOLS with VOICEMODE_TOOLS_ENABLED and VOICEMODE_TOOLS_DISABLED - Allow fine-grained control over tool availability - Support comma-separated lists for enabling/disabling specific tools ### Changed - **Provider Selection Architecture** - Consolidate dual provider selection systems into single simple failover approach - Remove SIMPLE_FAILOVER configuration - simple failover is now the only mode - Simplify get_tts_config and get_stt_config to use direct configuration - Eliminate ~400 lines of unused provider registry selection logic - Provider registry now only stores endpoint info without complex selection ### Fixed - Disable OpenAI client retries for local endpoints to avoid delays - Fix logger name consistency (voicemode vs voice-mode) for STT logging - Prevent test_installers from killing running voice services during tests - Update tests to work with refactored provider system - Resolve test failures related to new environment variables ## [4.4.0] - 2025-09-10 ### Added - **MCP Registry Support** - Add server.json configuration for MCP registry publication - Add mcp-name field to README for PyPI package validation - Integrate MCP registry publishing into CI/CD workflow - Support DNS-based namespace authentication (com.failmode/voicemode) - Update Makefile to sync server.json version during releases - **Cloudflare Worker for voicemode.sh** - Serve install script via custom domain - Smart user-agent detection for CLI vs browser - Cached script delivery with fallback - **Selective Tool Loading** - Reduce token usage by loading tools on demand - Implement smart tool filtering based on context - Add tool loading configuration options - **Documentation Improvements** - Complete documentation reorganization - Add tutorials, guides, and reference sections - Improve getting-started guide with clear paths - Add universal installer as primary quick start - Archive outdated documentation - **Three Bears Agent Support** - Add baby-bear, mama-bear, and papa-bear agent configurations - Integrate with sound fonts for agent-specific audio feedback ### Changed - Consolidate PyPI and MCP Registry publishing workflows - Update branch references from 'main' to 'master' - Improve Cloudflare Worker error handling and caching - Rename hook to hooks, stdin-receiver to receiver ### Fixed - Fix broken documentation links after refactor - Restore minimal claude command group for hook support - Fix Claude settings.json path configuration ## [4.3.2] - 2025-09-03 ### Fixed - Add missing pyyaml dependency to pyproject.toml - Remove macOS-only restriction from package - Add Claude hooks configuration to repository settings ## [4.3.1] - 2025-09-03 ### Fixed - Minor bug fixes and improvements ## [4.3.0] - 2025-09-03 ### Added - Sound fonts with MP3 support and Three Bears sounds integration ## [4.2.0] - 2025-09-03 ### Added - **🧠 Think Out Loud Mode - AI Reasoning Made Audible** - Revolutionary feature that transforms AI's internal thinking into spoken performances - Extracts and voices Claude's reasoning blocks using multiple personas - Herman's Head / Inside Out style multi-voice performances for different reasoning types - Theditor agent that orchestrates thinking performances with distinct voices - Makes AI decision-making transparent and engaging through voice - **🔊 Sound Fonts Integration - Audio Feedback for Every Action** - Play custom sounds for tool operations, errors, and completions - Filesystem-based sound font system with automatic discovery - Claude Code integration via receiver for hook-based audio - CLI command `play-sound` with theme, action, and sound selection - Enhances user experience with auditory feedback during operations - MP3 support added for 90% file size reduction over WAV - Recursive directory copying for complete sound font structure - Three Bears sound fonts for baby-bear, mama-bear, and papa-bear agents - Sound fonts disabled by default (VOICEMODE_SOUNDFONTS_ENABLED=false) - **🎭 Claude Code Deep Integration** - Extract and analyze Claude's conversation logs in real-time - Access Claude's internal thinking blocks for transparency - CLI commands for message extraction with multiple output formats - Automatic context detection for Claude Code sessions - Foundation for advanced AI introspection features ### Changed - **Enhanced Message Extraction** - Generic and flexible extraction supporting full conversations - Multiple output formats: full messages, text only, or thinking only - Better filtering by message type (user/assistant) - Improved integration with voice mode tools ### Removed - **Redundant get_claude_thinking MCP tool** - Consolidated into more powerful get_claude_messages tool ### Documentation - **Comprehensive Think Out Loud Documentation** - Agent specifications for theditor - Claude orchestration instructions - Voice persona mapping guide - Integration patterns and examples ## [4.1.0] - 2025-09-01 ### Added - **Pronunciation middleware for TTS/STT text processing** - Configurable pronunciation rules system that processes text before TTS and after STT - Regex-based text substitution rules with YAML configuration - Separate TTS and STT rule sets for bidirectional corrections - Privacy support - rules can be marked private to hide from LLM tool listings - Default rules for common patterns (3M, PoE, GbE, etc.) - Full CLI interface for managing pronunciation rules - MCP tool for LLM-based rule management with `pronounce` tool - Integrated into converse tool for automatic text processing - New configuration file: `voice_mode/data/default_pronunciation.yaml` ## [4.0.1] - 2025-09-01 ### Removed - Removed `whisperx` optional dependency to fix PyPI upload compatibility - The dependency was specified as a Git URL which is not allowed for PyPI packages - WhisperX functionality was recently added and not essential for core features ## [4.0.0] - 2025-08-31 ### BREAKING CHANGES - **Unified voice configuration system** - **BREAKING**: Replaced `.voices.txt` files with unified `.voicemode.env` configuration - Changed environment variable from `VOICEMODE_TTS_VOICES` to `VOICEMODE_VOICES` for simplicity - Implemented cascading configuration: env vars > project configs > global config - Added directory tree walking for project-specific configuration discovery - Supports runtime configuration reloading via MCP tools - **Migration Required**: Users must migrate from `.voices.txt` to `.voicemode.env` with `VOICEMODE_VOICES=voice1,voice2` format ### Added - **Comprehensive test coverage reporting system** - Integration with pytest-cov for coverage measurement - HTML coverage reports generated in htmlcov/ directory - Coverage badges and metrics for monitoring code quality - Automated coverage reporting in CI/CD pipeline - **Word-level timestamps for transcription** - Enhanced transcription output with word-level timing information - Support for SubRip (SRT) format output with precise word timestamps - New transcription CLI command for processing audio files - Comprehensive transcription backend supporting multiple formats - Word timing data available for improved accessibility and analysis - **Enhanced voice selection guide** - Comprehensive documentation for voice selection across different providers - Clear migration instructions from old `.voices.txt` system ### Removed - **Legacy voice preference system** - Removed 578 lines of old `voice_preferences.py` system - Eliminated unreliable `.voices.txt` file parsing - Removed associated test files for deprecated voice preference system ## [3.34.3] - 2025-08-26 ### Changed - **Service management code cleanup** - Removed references to non-existent `start.sh` script in Kokoro service discovery - Improved Kokoro start script detection by checking for GPU/CPU specific scripts only - Cleaned up code paths for better maintainability ## [2.34.2] - 2025-08-26 ## [2.34.1] - 2025-08-26 ### Fixed - **Whisper service enable command** - Fixed `voicemode whisper enable` using incorrect template variable names - Changed from WHISPER_BIN/MODEL_FILE to START_SCRIPT_PATH/INSTALL_DIR to match plist template - Correctly locates start-whisper-server.sh script in whisper install directory - Fixes KeyError 'START_SCRIPT_PATH' when enabling whisper service after installation ### Changed - **Installer reliability** - Added `--force` flag to `uv tool install --upgrade` command - Ensures voicemode is fully reinstalled even if already present - Prevents stale installations when package structure changes - Improves update reliability when running install.sh multiple times ## [2.34.0] - 2025-08-26 ### Changed - **Installer improvements** - Refactored installer to use permanent `uv tool install --upgrade` instead of `uvx --refresh` - Added `uv tool update-shell` for automatic PATH configuration - Improved shell completion detection with smart fallbacks - Added Homebrew zsh completion directory support on macOS - Implemented XDG-compliant paths for bash completions - Removed fish shell support to simplify maintenance (bash/zsh only) - Zsh completions now correctly use underscore prefix (_voicemode) - MCP configuration now uses plain `voice-mode` command for better performance ### Fixed - **Service file updates on reinstall** - Fixed whisper and kokoro installers to always update service files (plist/systemd) even when service is already installed - Ensures paths are properly expanded (no `~` symbols) in service files - Fixes issue where broken service files with unexpanded paths would remain broken after running install.sh - Service files now updated to latest templates on every install, ensuring users always get working configurations ## [2.33.4] - 2025-08-26 ### Fixed - **CoreML support restoration** - Re-enabled CoreML acceleration after fixing plist template path issues - Improved CoreML compilation flags handling in whisper installer ## [2.33.3] - 2025-08-26 ### Changed - Version bump for testing installer fixes ## [2.33.2] - 2025-08-26 ### Fixed - **Whisper service installation** - Corrected plist template path in whisper installer - Fixed CoreML support compilation flags (disabled then re-enabled after testing) - Removed duplicate inline plist fallback to prevent template divergence ## [2.33.1] - 2025-08-26 ### Fixed - **Whisper service LaunchAgent fixes** - Fixed LaunchAgent plist to call start-whisper-server.sh script instead of binary directly - Script provides dynamic model selection via VOICEMODE_WHISPER_MODEL environment variable - Added proper command-line arguments (--inference-path, --threads) missing from direct binary call - Resolved Signal 15 (SIGTERM) restart loop caused by missing parameters ### Changed - **Service configuration templates** - Refactored Whisper installer to use plist template file instead of inline generation - Template approach improves maintainability and makes configuration easier to find - Removed duplicate inline plist fallback to prevent template/code divergence - Templates are packaged with distribution ensuring availability ## [2.33.0] - 2025-08-26 ### Fixed - **CoreML acceleration improvements** - Re-enabled CoreML acceleration in installer after fixing template loading issues - Fixed CoreML conversion with dedicated Python environment to avoid dependency conflicts - Improved CoreML setup to handle PyTorch dependency management properly - Disabled misleading CoreML prompt temporarily while fixing PyTorch installation - **Whisper service improvements** - Implemented unified Whisper startup script for Mac and Linux - Fixed Whisper service to respect VOICEMODE_WHISPER_MODEL setting properly - Changed default Whisper model from large-v2 to base for faster initial setup - **Installer script stability** - Fixed script exit after Whisper installation when CoreML setup CLI check fails - Properly handle check_voice_mode_cli failures in setup_coreml_acceleration - Installer now continues with Kokoro and LiveKit even if CoreML setup encounters issues - Fixed installer exit issue after Whisper when checking for voicemode CLI - **Documentation corrections** - Removed mention of response_duration from converse prompt to avoid confusion ### Changed - **Web documentation improvements** - Updated Quick Start to use `curl -O && bash install.sh` for proper interactive prompts - Clarified OpenAI API key is optional and serves as backup when local services unavailable - Added comprehensive list of what the installer automatically configures - Changed example to use `claude converse` instead of interactive prompt - Updated README to use `/voicemode:converse` for consistent voice usage - **Configuration updates** - Added voicemode MCP to Claude Code configuration for easier integration ## [2.32.0] - 2025-08-25 ### Added - **Safe shell completions in installer** - Re-enabled shell completion setup with runtime command availability checks - Completions only activate if `voicemode` command is found in PATH - Prevents shell startup errors when command is not available - Supports bash, zsh, and fish shells with safe fallback behavior - **Interactive PyTorch installation prompt for Whisper** - Added `--install-torch` flag to CLI for explicit PyTorch installation - Interactive prompt when Core ML acceleration requires PyTorch (~2.5GB) - Clear user choice between Core ML acceleration and standard Metal performance ### Changed - **CLI consistency improvements** - Replaced all user-facing "voice-mode" references with "voicemode" - Updated shell completion environment variables from `_VOICE_MODE_COMPLETE` to `_VOICEMODE_COMPLETE` - Removed redundant `completion` command, keeping only the superior `completions` command - Simplified command help text and examples for consistency - Logger name remains "voice-mode" for backward compatibility ### Fixed - **Installer script reliability** - Fixed false positive failure detection when Whisper shows "Make clean" warning - Improved service installation success detection - Fixed incorrect `whisper model-install` command (should be `whisper model install`) - Removed non-existent `--auto-confirm` flag from CoreML installation - **Clean CLI output** - Replaced deprecated `proc.connections()` with `proc.net_connections()` to eliminate warnings - Suppressed httpx INFO logging in CLI commands for cleaner output - All warnings and debug info still available with `--debug` flag ## [2.30.0] - 2025-08-25 ### Added - **Intelligent update command** - Automatic detection of installation method (UV tool, UV pip, or standard pip) - Uses appropriate update strategy based on installation type - Seamless updates regardless of how Voice Mode was installed ## [2.29.0] - 2025-08-25 ### Added - **CoreML acceleration support for Whisper on Apple Silicon** - Added optional dependency group 'coreml' with PyTorch and CoreMLTools - Enhanced whisper_model_install tool with install_torch and auto_confirm parameters - Automatic detection of Apple Silicon Macs with CoreML acceleration offer - User-friendly confirmation prompts for large (~2.5GB) PyTorch download - Graceful fallback to Metal acceleration if CoreML requirements not met - Clear instructions for enabling CoreML later if initially skipped - **Beautiful installer experience** - Added Voice Mode ASCII art in Claude Code orange color - Enhanced preamble with clear value proposition and privacy messaging - Early system detection with special recognition for Apple Silicon - Professional presentation with centered text and visual hierarchy ### Fixed - **Improved converse tool documentation** - Simplified listen_duration parameter documentation - Removed confusing duration recommendations that led to unnecessary overrides - Clarified that silence detection handles timing well with sensible defaults - Reduces cognitive load and prevents token waste from explicit duration settings ## [2.28.3] - 2025-08-24 ### Fixed - **Parameter type handling for MCP tools** - Fixed vad_aggressiveness parameter to accept string values from LLMs - Fixed port parameters in kokoro_install and livekit_install - Fixed lines parameter in service management tool - All numeric parameters now properly convert strings to integers - Addresses systemic issue where Claude Code MCP client passes strings - **Installer script uvx command corrections** - Fixed MCP configuration to use correct command `uvx voice-mode` (without --refresh) - Installer now always refreshes to latest version at start - Removed unnecessary --refresh flags from runtime commands - Updated user-facing command examples to show correct usage ## [2.28.2] - 2025-08-24 ### Added - **Configurable audio feedback pip delays** - Added VOICEMODE_PIP_LEADING_SILENCE and VOICEMODE_PIP_TRAILING_SILENCE environment variables - Allows customization of silence before and after audio feedback chimes - Configurable via converse tool parameters pip_leading_silence and pip_trailing_silence - Helps prevent audio cutoff on Bluetooth devices and other audio systems with delay ### Fixed - **Audio feedback for Bluetooth devices** - Added silence buffer before chimes to prevent Bluetooth audio cutoff - Improved compatibility with devices that have audio activation delay - Better audio feedback experience across different output devices ## [2.28.1] - 2025-08-24 ### Added - **Standardized project naming as VoiceMode MCP** - Consistent branding across all documentation and code - Updated project descriptions and metadata - Renamed internal references from "voice-mode" to "VoiceMode MCP" - Maintains backward compatibility with existing installations ### Fixed - **CoreML fallback for whisper.cpp on Apple Silicon** - Added proper error handling when CoreML models fail to load - Automatically falls back to CPU processing if CoreML initialization fails - Prevents whisper-server crashes on systems with CoreML issues - Improves reliability on various macOS configurations ## [2.28.0] - 2025-08-23 ### Added - **Comprehensive CLI help support** - Added `-h` and `--help` options to all CLI commands and subcommands - Consistent help functionality across all command groups (kokoro, whisper, livekit, config, etc.) - Help options available for both groups and individual commands - Improved user experience with quick access to command documentation - **Core ML support for whisper.cpp installation** - Whisper install now uses CMake instead of Make for better control - Automatically enables Core ML support on Apple Silicon Macs - Provides ~3x faster encoding performance with Core ML acceleration - Core ML models automatically converted during installation - Falls back gracefully if Core ML conversion fails - **Enhanced whisper status command** - Shows whisper.cpp version information - Displays Core ML support status (enabled/disabled) - Shows if Core ML model is active for current model - Reports GPU acceleration type (Metal/CUDA) - Helper utility in `whisper_version.py` for capability detection - **Audio conversion optimization for local whisper** - Automatically detects truly local whisper (not SSH-forwarded) - Skips WAV to MP3 conversion for local whisper, sending WAV directly - Adds timing measurements for audio format conversion - Logs conversion time at INFO level for performance monitoring - Significantly reduces STT processing time for local deployments - **Whisper model benchmark command** - New `whisper model benchmark` CLI command - Compares performance across multiple models - Shows load time, encode time, and total processing time - Calculates real-time factor for each model - Fixed timing output by removing --no-prints flag - Helps users choose optimal model for speed/accuracy tradeoffs - Provides personalized recommendations based on results ### Fixed - **MCP server configuration** - Fixed .mcp.json to use `uv run voicemode` for local development - Removed hardcoded paths for better portability - Works correctly with project-local development version - **Whisper model management** - Fixed model active command to properly update configuration - Fixed naming conflict in model install CLI command - Benchmark now correctly shows timing information - Core ML conversion errors are now properly reported and handled ## [2.27.0] - 2025-08-20 ### Added - **CLI version and update commands** - New `voice-mode version` command to display current version - New `voice-mode update` command to upgrade to latest version - Comprehensive bats tests for version and update functionality - Automatic version detection from package metadata - **Shell completion support for CLI** - New `voice-mode completion` command group with bash, zsh, and fish subcommands - Automatic tab completion for all commands, options, and arguments - Install.sh automatically configures shell completions during setup - Native Click completion mechanism for dynamic suggestions - **Parallel operations documentation** - Documented `wait_for_response=False` pattern in converse tool - Enables speaking while performing other operations simultaneously - Creates more natural conversations by eliminating dead air - Marked as RECOMMENDED pattern with clear usage examples - **Comprehensive Whisper model management system** - New `whisper models` CLI command to list all available models with status - `whisper model active` command to get/set the active model - `whisper model install` and `whisper model remove` commands - Model registry with complete size/hash metadata for all Whisper models - Color-coded output showing installed/available models (green=installed, yellow=selected) - Support for English-only models and all multilingual variants - Automatic Core ML conversion on Apple Silicon for improved performance - Shell completion support for all model management commands - **MCP tools for model management** - `list_models` tool to list all available Whisper models with status - Enhanced `download_model` tool with registry validation - Force download option to re-download corrupted models - Skip Core ML option for testing - Parity between CLI and MCP interfaces - **Infrastructure improvements** - Centralized model registry in `whisper/models.py` with all model metadata - Model categorization: tiny, base, small, medium, large, turbo - Size information for all models (39MB to 3.1GB) - SHA256 hashes for integrity verification - Shared download logic extracted to helpers module - Dynamic Click-based shell completions replacing static files - Comprehensive test suite for model management ### Changed - **Configuration file naming** - Renamed `.voicemode.env` to `voicemode.env` (removed leading dot) - Added backwards compatibility to check for old filename - Shows deprecation warning when old filename is used - Updated all documentation to reference new filename - Updated systemd service templates - Replaced static shell completions with Click-generated dynamic completions - Shell completion files now generated from CLI structure - Whisper model downloads now use centralized registry for validation - Model status checks now verify both file existence and selection ### Fixed - **macOS installation improvements** - Added coreutils dependency for timeout command support - Fixed duplicate launchctl load in service installers - Improved zsh PATH configuration by sourcing profile after UV/npm additions - Skip sudo prompts on macOS to prevent installation issues - **Test suite fixes** - Fixed deprecation warning appearing in help output - Renamed deprecated `.voicemode.env` to `voicemode.env` to fix test failures - Whisper model management now properly uses voicemode.env configuration file - Test suite updated for all API changes and return value structures - Resolved all CI test failures related to service status and diagnostics ### Removed - Old static shell completion files - SERVICE_COMMANDS.md (replaced by integrated CLI commands) - Shell aliases file (functionality moved to Click commands) ## [2.26.0] - 2025-08-18 ### Added - **CLI converse command** - Direct voice conversations from the command line - New `voice-mode converse` command for testing voice interactions - Supports all MCP tool options (voice, speed, audio format, etc.) - Continuous conversation mode with `--continuous` flag - Useful for testing TTS/STT services without MCP client - Full control over voice parameters and silence detection ### Changed - **Gitignore update** - Added `*.prof` files to gitignore for profiling output ## [2.25.1] - 2025-08-18 ### Fixed - **WSL2 detection display** - Fixed incorrect WSL2 label on non-WSL systems - Parameter expansion bug was showing "(WSL2)" on all Linux systems - Now properly checks if IS_WSL equals "true" before adding WSL2 label - Fixes false positive on Fedora and other Linux distributions - Complements previous detection fix from 2025-08-17 ## [git push origin -v2.25.0] - 2025-08-18 ## [2.25.0] - 2025-08-18 ### Fixed - **uvx command refresh flag** - Add --refresh flag to all uvx commands in installer - Ensures latest version is always fetched when running voice-mode commands - Fixes issues with cached old versions being used - Applies to service installation, uninstallation, and status commands - **Performance optimization** - Significantly improved help command performance - Lazy load heavy imports (numpy, scipy, webrtcvad) only when needed - Help command now runs 10x faster (from ~1.5s to ~0.15s) - Faster MCP server startup time for better user experience - **Config path expansion** - Fixed tilde expansion for user home directories - Configuration paths now properly expand `~` to user home directory - Fixes issues with paths like `~/Models/kokoro` not being found - Added comprehensive tests for path expansion functionality - **Frontend imports** - Corrected import statements to use single module - Fixed import errors in livekit frontend commands - All frontend commands now properly import from frontend module ## [2.24.0] - 2025-08-16 ### Added - **Enhanced Voice Activity Detection** - Improved silence detection behavior - VAD now waits indefinitely for speech before starting silence detection - No more timeouts when user hasn't started speaking yet - Silent recordings are not sent to STT, reducing API costs and preventing hallucinations - Returns "No speech detected" message instead of processing silence - Significantly improves user experience for voice interactions - **VAD debugging mode** - Comprehensive debugging for Voice Activity Detection - New `VOICEMODE_VAD_DEBUG` environment variable enables detailed VAD logging - Shows real-time speech detection decisions, state transitions, and timing - Helps diagnose issues where recording stops before speech or cuts off early - Added test script `scripts/test-vad-enhancement.py` for VAD testing - Documented in `docs/vad-debugging.md` with common issues and solutions ## [2.23.0] - 2025-08-16 ### Added - **`skip_tts` parameter** - Dynamic control over text-to-speech in converse tool - Add optional `skip_tts` parameter to override global `VOICEMODE_SKIP_TTS` setting - When `True`: Skip TTS for faster text-only responses - When `False`: Always use TTS regardless of environment setting - When `None` (default): Follow `VOICEMODE_SKIP_TTS` environment variable - Enables LLM to intelligently choose between voice and text-only responses - **`VOICEMODE_SKIP_TTS` environment variable** - Global TTS skip configuration - Set to `true` for permanent text-only mode (faster responses) - Can be overridden per-call with `skip_tts` parameter - Useful for rapid development iterations or when voice isn't needed ### Fixed - **Service status detection** - Correctly identify SSH-forwarded vs locally running services - SSH processes listening on service ports are now recognized as port forwards - Status command now shows 🔄 for forwarded services vs ✅ for local services - Prevents confusion about where services are actually running ## [2.22.3] - 2025-08-16 ### Fixed - **Service auto-enable error** - Fix 'FunctionTool' object is not callable - Changed whisper and kokoro installers to use `enable_service` function instead of MCP tool - Services can now be properly auto-enabled after installation - **Whisper build errors** - Remove obsolete make server command - whisper-server is now built as part of the main build target - Removed unnecessary build step that was causing errors - **Build output verbosity** - Suppress cmake/make output unless debugging - Build output is now captured and only shown on errors - Use VOICEMODE_DEBUG=true to see full build output - Significantly cleaner installation experience ## [2.22.2] - 2025-08-16 ### Fixed - **CLI deprecation warnings** - Suppress known warnings for cleaner output - Hide audioop, pkg_resources, and psutil deprecation warnings by default - Warnings can be shown with `VOICEMODE_DEBUG=true` or `--debug` flag - Improves user experience when running CLI commands and MCP server - Applies to both direct CLI usage and MCP server invocations ## [2.22.1] - 2025-08-16 ### Fixed - **Package size reduction** - Exclude unnecessary files from wheel distribution - Exclude `__pycache__`, `node_modules`, `.next/cache` directories - Exclude test files, logs, and build artifacts - Remove overly broad shared-data section that included entire frontend - Significantly reduces installed package size - **Install.sh service detection** - Fix service command availability check - Handle Python deprecation warnings that were causing false negatives - Check for actual help output content instead of just exit code - Services now install correctly when warnings are present - Add `--help` and `--debug` flags for better troubleshooting - Support `DEBUG=true` environment variable ## [2.22.0] - 2025-08-16 ### Added - **LiveKit service integration** - Complete support for LiveKit as a managed service - Install/uninstall LiveKit server with `voice-mode livekit install/uninstall` - Service management commands: `start/stop/status/restart/enable/disable/logs` - Frontend management for LiveKit Voice Assistant UI - Configurable host/port settings for frontend - SSL configuration examples and documentation - Production-ready frontend build support - Bash completions for all new commands - **Service installation in install.sh** - Automated service setup during installation - Offers to install Whisper, Kokoro, and LiveKit services - Quick mode (Y) installs all services automatically - Selective mode (s) allows choosing individual services - Uses `uvx voice-mode` for robust operation on fresh systems - Cross-platform support for Linux and macOS - **Install.sh automated testing** - Comprehensive test suite (temporarily skipped) - Unit tests for individual bash functions - Functional tests with environment mocking - Integration tests for complete installation flows - Foundation for future testing improvements - **Documentation improvements** - YubiKey touch detector setup guide - LiveKit SSL configuration examples - Install.sh robustness analysis - Service installation feature documentation ### Fixed - **LiveKit local development** - Added dummy API key support for local services - **Frontend dependency handling** - Improved error messages and dependency resolution - **Service enable command** - Resolved frontend service enable command issues - **LiveKit WebSocket URL** - Hardcoded to wss://x1:8443 for reliable connections ### Changed - **Whisper default port** - Updated from 2000 to 2022 in shell aliases - **Install.sh robustness** - Always use `uvx voice-mode` for consistency - **Test infrastructure** - Skip failing tests temporarily to maintain green CI ## [2.21.1] - 2025-08-13 - Late update to changelog for release 2.21.0 ## [2.21.0] - 2025-08-13 ### Added - **CLI service commands** - New subcommands for managing Whisper and Kokoro services - `voice-mode service whisper start/stop/status/restart/enable/disable/logs` - `voice-mode service kokoro start/stop/status/restart/enable/disable/logs` - `voice-mode service whisper update-service-files` - Update systemd/launchd service files - `voice-mode service kokoro update-service-files` - Update systemd/launchd service files - Unified interface for controlling both STT and TTS services - Direct access to service management without needing MCP client - Consistent behavior across Linux (systemd) and macOS (launchd) ## [2.20.1] - 2025-08-11 ### Fixed - **Speed parameter validation error** - Fixed MCP validation error when passing speed parameter as string - Added type conversion from string to float for speed parameter - Now properly handles speed values passed by MCP clients (e.g., via uvx) - Added comprehensive validation and error messages for invalid speed values ## [2.20.0] - 2025-08-10 ### Added - **VAD aggressiveness control** - New `vad_aggressiveness` parameter in converse tool for controlling Voice Activity Detection sensitivity (0-3) - 0 = least aggressive filtering (more permissive), 3 = most aggressive (strict) - Allows adapting to different environments: quiet rooms (0-1) vs noisy environments (2-3) - Also configurable via VOICEMODE_VAD_AGGRESSIVENESS environment variable ### Changed - **Improved VAD documentation** - Clarified that aggressiveness controls how strictly VAD filters out non-speech - Updated examples to better demonstrate appropriate use cases - Fixed configuration documentation that had backwards descriptions ## [2.19.0] - 2025-08-10 ### Added - **MCP prompt command: /release-notes** - New command to display recent changelog entries directly in Claude Code - Shows 5 most recent versions by default (configurable with parameter) - Parses and formats CHANGELOG.md for easy reading - Inspired by Claude Code's own /release-notes feature - Includes comprehensive test coverage ### Fixed - Release notes prompt now handles empty string parameters correctly - Command works properly with both source and installed packages - Changelog is now accessible as an MCP resource when package is installed ### Changed - Release notes output format now matches Claude Code's clean, minimal style - Removed decorative headers and footers for cleaner terminal output - Release notes displayed in chronological order (oldest first) ## [2.18.0] - 2025-08-10 ### Added - **TTS speed control** - New `speed` parameter in converse tool for controlling speech rate (0.25 to 4.0) - Currently supported by Kokoro TTS and any OpenAI-compatible services that support speed - Examples: 0.5 = half speed, 2.0 = double speed - Based on user request in issue #15 ## [2.17.3] - 2025-08-07 ### Fixed - **STT audio saving with simple failover** - Fixed critical bug where STT audio files were not being saved when VOICEMODE_SIMPLE_FAILOVER=true - Simple failover now properly saves audio recordings before processing - Ensures consistent audio archival behavior across all failover modes ### Changed - **Post-release improvements from 2.17.2** - Improved installer output formatting when Voice Mode is already configured - Fixed installer test that was failing due to complex mock setup - GitHub release notes now feature universal installer as primary installation method - Manual installation methods (pip, claude mcp) moved to subsection ## [2.17.2] - 2025-07-29 ### Added - **Universal installer script** for automatic setup - Single command installation: `curl -O https://getvoicemode.com/install.sh && bash install.sh` - Cross-platform support: Linux (Ubuntu/Fedora), macOS, and Windows WSL - Automatic dependency installation (Node.js, audio libraries, etc.) - Claude Code installation and Voice Mode MCP configuration - WSL2-specific audio setup and troubleshooting guidance - Symlink in project root for easy access: `./install.sh` - **Centralized GPU detection utility** - Unified GPU detection across platforms (Metal, CUDA, ROCm) - Automatic selection of GPU vs CPU scripts for services - Intelligent fallback when specific scripts are missing - **Service robustness improvements** - Systemd services now use `Restart=on-failure` instead of `Restart=always` - Added `RestartPreventExitStatus=127` to prevent restart loops when executables are missing - Services fail cleanly when installation directory is moved or deleted ### Changed - **Installation improvements** - Installer now detects `ffmpeg` by command presence on Fedora (handles RPM Fusion installs) - Fixed installer exiting early due to `dnf check-update` exit codes - Automatic fallback for older Claude Code versions without `--scope` flag support - Better handling of existing Voice Mode configurations - **Documentation updates** - README now emphasizes Claude Code and AI code editors as primary audience - Quick Start section leads with automatic installer - Clarified that OpenAI API key is optional (local services available) - Added clear explanation of free, open-source alternatives ### Fixed - **Configuration directory creation** - Removed redundant `config/` directory creation (config stored in `voicemode.env`) - Fixed issue where services created unexpected directory structures - **Claude Code compatibility** - Fixed `claude mcp list` and `claude mcp add` for versions without `--scope` support - Installer now works with all Claude Code versions ### Developer - Added comprehensive tests for GPU detection and missing script scenarios - Updated service file versions to 1.1.1 with improved systemd configuration ## [2.17.1] - 2025-07-29 ## [2.17.0] - 2025-07-29 ### 🎯 Major Voice Activity Detection Improvement - **Dramatically improved silence detection accuracy** by fixing audio resampling bug - VAD was receiving corrupted audio due to improper downsampling from 24kHz to 16kHz - Now uses proper signal resampling instead of simple truncation - Results in significantly better speech/silence detection and fewer false positives - Background noise (fans, traffic) is now properly filtered out ### Added - **Automatic configuration file loading** - Voice Mode now creates `~/.voicemode/voicemode.env` on first run - Template file includes all available settings with documentation - Environment variables always take precedence over file settings - Secure file permissions (0600) automatically set - **Enhanced conversation logs with provider tracking (Schema v3)** - Added `provider_url` and `provider_type` fields to track which endpoint handled requests - Added detailed timing metrics: TTFA, generation time, playback time, transcription time - Moved conversation logs to `logs/conversations/` subdirectory for better organization - **Year/month directory structure for audio files** - Audio files now saved in `audio/YYYY/MM/` structure to prevent flat directory issues - Automatic date extraction from filenames - Backward compatible with existing flat structure - Development version detection with git info for better debugging - Service file update capability for systemd/launchd services - Complete service health check implementation - Services can report readiness status - Better failover decisions based on actual service health - Enhanced service management with unified `service` tool - Auto-installation of cmake dependency for whisper.cpp on macOS - Simple failover mode for provider selection - New `VOICEMODE_SIMPLE_FAILOVER` environment variable (default: true) - Try each endpoint in order without health checks - Immediate failover on connection refused errors - No performance penalty as connection failures are instant ### Changed - **Major project structure reorganization** - Moved external dependencies to `vendor/` directory (industry standard naming) - `bin/livekit-admin-mcp` → `vendor/livekit-admin-mcp/` - `livekit/` → `vendor/livekit-voice-assistant/` - Moved development docs to `docs/development/` - `DUAL_PACKAGE_NAMES.md`, `TESTING-CHECKLIST.md` - `insights/` → `docs/development/insights/` - Moved test files from root to `tests/` directory - Merged `config-examples/` into `docs/integrations/` - Removed directories moved to shadow repository: `assets/`, `notebooks/`, `npm-voicemode/` - Cleaned up root directory following Python best practices - Moved `testing/` to `docs/testing/` for manual testing procedures - Removed backup files (`.mcp.json-20250728`) and temporary files (`COMMIT-MESSAGE.txt`) - Removed outdated `.env.example` from root (superseded by auto-generated config) - Renamed `conversation.py` to `converse.py` for consistency - Reorganized service directory structure for better maintainability - Consolidated service management prompts and documentation - Separated MCP templates and data from resources - **Improved configuration documentation** - Added comprehensive configuration reference guide - Documented auto-generation of `~/.voicemode/voicemode.env` - Clarified that environment variables take precedence over file settings - **Configuration management now uses user-level config only** - Removed project-level `voicemode.env` support for security - All configuration stored in `~/.voicemode/voicemode.env` ### Fixed - Fixed `wait_for_response=false` being ignored in converse tool - Added proper string-to-boolean conversion for MCP parameters - Fixed critical TTS failover issue when Kokoro is stopped - Simple failover now maps Kokoro voices to OpenAI-compatible voices - af_sky → nova, af_alloy → alloy, etc. - Prevents "Failed to speak message" errors when primary TTS is unavailable - Fixed provider failover issues when local services are stopped - Previously, stopping a service (like Kokoro) would cause "All TTS providers failed" errors - Now correctly fails over to next available endpoint (e.g., OpenAI) - Removed unnecessary soundfile dependency from simple failover - Fixed various import issues with TTS_VOICES and TTS_MODELS - **Fixed whisper model download tool** - Now correctly checks `~/.voicemode/services/whisper/` installation path - Support for both new and legacy whisper installation locations - Fixed model type validation ## [2.16.0] - 2025-07-28 [YANKED] ### Added - Version management for whisper.cpp and kokoro-fastapi services - Track installed versions and support upgrades - Install only tagged releases by default (not latest commits) - `version` parameter added to install tools - Version info displayed in service status - Uninstall tools for whisper and kokoro services - Clean removal of services and optionally models/data - Automatic service stop before uninstall - Migration helpers for old service naming conventions - Automatically migrate from old naming (e.g., whisper-server) to new (whisper) - Built into installers, no separate tool needed ### Changed - Reorganized helper functions from tools/ to utils/ - `voice_mode/tools/services/common.py` → `voice_mode/utils/services/common.py` - `voice_mode/tools/services/*/helpers.py` → `voice_mode/utils/services/*_helpers.py` - `voice_mode/tools/services/version_helpers.py` → `voice_mode/utils/version_helpers.py` - Helper functions are no longer exposed as MCP tools ### Fixed - Fixed test failures caused by version management implementation - Fixed version parsing bug with mixed int/string comparisons - Updated test mocks to use correct import paths - Made subprocess mocks command-specific to avoid breaking git commands - Fixed unified service test failures - Added missing subprocess.run and Path.exists mocks - Fixed incomplete Popen mock setup ## [2.15.0] - 2025-07-23 ## [2.14.0] - 2025-07-20 ### Added - New `VOICEMODE_DEFAULT_LISTEN_DURATION` environment variable to customize default listening time (defaults to 120s) ### Changed - Changed default listen_duration parameter from 45s to 120s for more natural conversations - Updated recommended listen durations in converse tool documentation - Normal conversational responses: 20s → 30s - Open-ended questions: 30s → 60s - Detailed explanations: 100s → 120s (now the default) - Stories or long explanations: 300s (unchanged) - Provides more generous time for users giving longer responses - **BREAKING**: Voice preference files renamed from `voices.txt` to `.voices.txt` - Now uses hidden files to avoid cluttering project directories - Affects both standalone files and those in `.voicemode` directories ### Fixed - Improved error messages when OpenAI API key is missing to provide helpful guidance - Now explicitly mentions need to set OPENAI_API_KEY or use local services - Differentiates between missing API key and other connection failures ## [2.13.0] - 2025-07-14 ### Added - Unified CLI system with shared exchanges library - New `voicemode` command with multiple subcommands (`show`, `tell`, `diagnose`, etc.) - Centralized exchange processing library for consistent behavior across scripts - Modular command architecture for easy extension - Enhanced exchange logging capabilities: - STT entries logged even when no speech is detected for better debugging - Real-time logging of both TTS and STT exchanges - Split timing metrics between STT and TTS for accurate performance attribution - Additional metadata in exchange logs for better analysis ### Changed - Migrated all standalone scripts to use the unified CLI system - Scripts now accessible as subcommands of the main `voicemode` command - Consistent argument parsing and help documentation across all commands - Improved code reuse through shared libraries ### Fixed - Fixed undefined `audio_path` variable in STT logging - Fixed incorrect hardcoded audio format in STT logs (now uses actual format) ## [2.12.0] - 2025-07-06 ### Fixed - Fixed TypeError in `refresh_provider_registry` tool that prevented TTS service detection (#6) - Changed incorrect `url=` parameter to `base_url=` when creating EndpointInfo objects - Added unit tests to prevent regression ## [2.11.0] - 2025-07-06 ### Added - Password protection for LiveKit voice assistant frontend - Prevents unauthorized access to voice conversation interface - Configurable via `LIVEKIT_ACCESS_PASSWORD` environment variable - Includes `.env.local.example` template with secure defaults - Password validation on API endpoint before token generation - Prominent language support guidance in conversation tool - Clear language-specific voice recommendations for 8 languages - Mandatory voice selection for non-English text - Warning about American accent when using default voices - Examples for Spanish, French, Italian, Portuguese, Chinese, Japanese, and Hindi ### Changed - Updated convention paths from `.conventions/` to `docs/conventions/` in CLAUDE.md - Enhanced language voice selection documentation with explicit requirements ### Documentation - Added Spanish voice conversation example demonstrating language-specific voice selection - Added blind community outreach contacts and resources for accessibility collaboration - Updated LiveKit frontend README with password protection instructions ## [2.10.0] - 2025-07-06 ### Added - All 67 Kokoro TTS voices now available for local text-to-speech - Complete set of high-quality voices across multiple accents and languages - Voices include various English accents (American, British, Australian, Indian, Nigerian, Scottish) - Multiple voices per accent for variety (e.g., 9 American female, 13 American male voices) - Support for international English speakers - Automatically available when Kokoro TTS service is running - Voice preference files support for project and user-level voice settings - Supports both standalone `voices.txt` and `.voicemode/voices.txt` files - Automatic discovery by walking up directory tree from current working directory - User-level fallback to `~/voices.txt` or `~/.voicemode/voices.txt` - Standalone files take precedence over .voicemode directory files - Simple text format with one voice name per line - Comments and empty lines supported - Preferences take priority over environment variables - New `check_audio_dependencies` MCP tool for diagnosing audio system setup - Checks for required system packages on Linux/WSL - Verifies PulseAudio status - Provides platform-specific installation commands - Helpful for troubleshooting audio initialization errors - Enhanced audio error handling with helpful diagnostics - Detects missing system packages and suggests installation commands - WSL-specific guidance for audio setup - Better error messages when audio recording fails ### Fixed - Mock voice preferences in provider selection tests to prevent test pollution - Skip conversation browser playback test when Flask is not installed ### Documentation - Updated Roo Code integration guide with comprehensive MCP interface instructions - Added visual guide to MCP settings and troubleshooting section - Added comprehensive Voice Preferences section to configuration documentation - Updated README with voice preference file examples - Updated Ubuntu/Debian installation instructions to include all required audio packages (pulseaudio, libasound2-plugins) - Added WSL2-specific note in README pointing to detailed troubleshooting guide ## [2.9.0] - 2025-07-03 ### Added - Version logging on server startup for better debugging and support ### Fixed - Cleaned up debug output by removing duplicate print statements - Suppressed known upstream deprecation warnings from dependencies: - pydub SyntaxWarnings for invalid escape sequences - audioop deprecation (already handled with audioop-lts for Python 3.13+) - pkg_resources deprecation in webrtcvad - Converted debug print statements to proper logger calls ## [2.8.0] - 2025-07-03 ### Changed - Changed default `min_listen_duration` from 1.0 to 2.0 seconds to provide more time for users to think before responding ## [2.7.1] - 2025-07-03 ### Changed - Changed default `min_listen_duration` from 0.0 to 1.0 seconds to prevent premature cutoffs ## [2.7.1] - 2025-07-03 ### Fixed - Fixed failing test for stdio restoration on recording error - Added Flask to project dependencies for conversation browser script ## [2.7.0] - 2025-07-03 ### Added - Minimum listen duration control for voice responses - New `min_listen_duration` parameter in `converse()` tool (default: 0.0) - Prevents silence detection from stopping recording before minimum duration - Useful for preventing premature cutoffs when users need thinking time - Works alongside existing `listen_duration` (max) parameter - Validates that min_listen_duration <= listen_duration - Examples: - Complex questions: 2-3 seconds minimum - Open-ended prompts: 3-5 seconds minimum - Quick responses: 0.5-1 second minimum ## [2.6.0] - 2025-06-30 ### Changed - Updated Discord link to new community server - Increased default listen duration to 45 seconds for better user experience - Fixed config import issue in conversation tool - Improved FFmpeg detection for MCP mode ### Added - Screencast preparation materials including title cards - Initial screencast planning documentation ## [2.5.1] - 2025-06-28 ## [2.5.0] - 2025-06-28 ### Added - Automatic silence detection for voice recording - Uses WebRTC VAD (Voice Activity Detection) to detect when user stops speaking - Automatically stops recording after configurable silence threshold (default: 1000ms) - Significantly reduces latency for short responses (e.g., "yes" now takes ~1s instead of 20s) - Configurable via environment variables: - `VOICEMODE_ENABLE_SILENCE_DETECTION` - Enable/disable feature (default: true) - `VOICEMODE_VAD_AGGRESSIVENESS` - VAD sensitivity 0-3 (default: 2) - `VOICEMODE_SILENCE_THRESHOLD_MS` - Silence duration before stopping (default: 1000) - `VOICEMODE_MIN_RECORDING_DURATION` - Minimum recording time (default: 0.5s) - Added `disable_vad` parameter to converse() for per-interaction control - Automatic fallback to fixed-duration recording if VAD unavailable or errors occur - Comprehensive test suite and manual testing tools - Full documentation in docs/silence-detection.md - Voice-first provider selection algorithm - TTS providers are now selected based on voice availability rather than base URL order - Ensures Kokoro is automatically selected when af_sky voice is preferred - Added provider_type field to EndpointInfo for clearer provider identification - Improved model selection to respect provider capabilities - Comprehensive test coverage for voice-first selection logic - Configurable initial silence grace period - New `VOICEMODE_INITIAL_SILENCE_GRACE_PERIOD` environment variable (default: 4.0s) - Prevents premature cutoff when users need time to think before speaking - Gives users more time to start speaking before VAD stops recording - Trace-level debug logging - Enabled with `VOICEMODE_DEBUG=trace` environment variable - Includes httpx and openai library debug output - Writes to `~/.voicemode/logs/debug/voicemode_debug_YYYY-MM-DD.log` - Helps diagnose provider connection issues ### Fixed - Fixed WebRTC VAD sample rate compatibility issue - VAD requires 8kHz, 16kHz, or 32kHz but voice_mode uses 24kHz - Implemented proper sample extraction for VAD processing - Silence detection now works correctly with 24kHz audio - Added automatic STT (Speech-to-Text) failover mechanism - STT now automatically tries all configured endpoints when one fails - Matches the existing TTS failover behavior for consistency - Prevents complete STT failure when primary endpoint has connection issues - Implemented optimistic endpoint initialization - All endpoints now assumed healthy at startup instead of pre-checked - Endpoints only marked unhealthy when they actually fail during use - Prevents false negatives from overly strict health checks - Added optimistic mode to refresh_provider_registry tool (default: True) - Fixed EndpointInfo attribute naming bug - Renamed 'url' to 'base_url' for consistency across codebase - Fixed AttributeError that was preventing STT failover from working - Fixed Kokoro TTS not being selected despite being available - Provider registry now initializes with known Kokoro voices - Enables automatic Kokoro selection when af_sky is preferred - Prevented microphone indicator flickering on macOS - Changed from start/stop recording for each interaction to continuous stream - Microphone stays active during voice session preventing UI flicker - More responsive recording start times ### Changed - Replaced all localhost URLs with 127.0.0.1 for better IPv6 compatibility - Prevents issues with SSH port forwarding on dual-stack systems - Affects TTS, STT, and LiveKit default URLs throughout codebase ### Removed - Cleaned up temporary and development files - Removed unused debug scripts and test files - Removed obsolete documentation and analysis files ### Planned - In-memory buffer for conversation timing metrics - Track full conversation lifecycle including Claude response times - Maintain recent interaction history without persistent storage - Enable better performance analysis and debugging - Sentence-based TTS streaming - Send first sentence to TTS immediately while rest is being generated - Significant reduction in time to first audio (TTFA) - More responsive conversation experience ## [2.4.1] - 2025-06-25 ## [2.4.0] - 2025-06-25 ### Added - Unified event logging system for tracking voice interaction events - JSONL format for easy parsing and analysis - Automatic daily log rotation - Thread-safe async file writing - Session-based event grouping - Configurable via `VOICEMODE_EVENT_LOG_ENABLED` and `VOICEMODE_EVENT_LOG_DIR` - Event types tracked: - TTS events: request, start, first audio, playback start/end, errors - Recording events: start, end, saved - STT events: request, start, complete, no speech, errors - System events: session start/end, transport switches, provider switches - Automatic timing metric calculation from event timestamps - Integration with conversation flow for accurate performance tracking - Provider management tools for voice-mode - `refresh_provider_registry` tool to manually update health checks - `get_provider_details` tool to inspect specific endpoints - Support for filtering by service type (tts/stt) or specific URL - Automatic TTS failover support in conversation tools - Systematic failover through all configured endpoints - Failed endpoints marked as unhealthy for automatic exclusion - Better error tracking and debugging information ### Changed - TTS provider selection algorithm now uses URL-priority based selection - Iterates through TTS_BASE_URLS in preference order - Supports both voice and model preference matching - More predictable provider selection behavior - Default TTS configuration updated for local-first experience - Kokoro (127.0.0.1:8880) prioritized over OpenAI - Default voices: af_sky, alloy (available on both providers) - Model preference order: gpt-4o-mini-tts, tts-1-hd, tts-1 - Voice parameter selection guidelines added to CLAUDE.md - Encourages auto-selection over manual specification - Clear examples of when to specify parameters ### Fixed - Negative response time calculation in conversation metrics - Response time now correctly measured from end of recording - Event-based timing provides more accurate measurements ### Removed - VOICE_ALLOW_EMOTIONS environment variable (emotional TTS now automatic with gpt-4o-mini-tts) ## [2.3.0] - 2025-06-23 ### Added - Comprehensive uv/uvx documentation (`docs/uv.md`) - Installation and version management guide - Development setup instructions - Integration with Claude Desktop - Documentation section in README with organized links to all guides - WSL2 microphone troubleshooting guide and diagnostic script - Test script for direct STT verification ### Fixed - STT audio format now defaults to MP3 when base format is PCM, fixing OpenAI Whisper compatibility - OpenAI Whisper API doesn't support PCM format for uploads - Automatic fallback ensures STT continues to work with default configuration ### Changed - Simplified audio feedback configuration to boolean AUDIO_FEEDBACK_ENABLED - Removed voice feedback functionality, keeping only chime feedback - Updated provider base URL specification to use comma-separated lists - PCM remains the default format for TTS streaming (best performance) - Standardized audio sample rate to 24kHz across codebase (was 44.1kHz) - Updated SAMPLE_RATE configuration constant - Replaced all hardcoded sample rate values with config constant - Aligned test mocks with new standard rate - Ensures consistency between OpenAI and Kokoro TTS providers ## [2.2.0] - 2025-06-22 ### Added - Configurable audio format support with PCM as the default for TTS streaming - Environment variables for audio format configuration: - `VOICEMODE_AUDIO_FORMAT` - Primary format (default: pcm) - `VOICEMODE_TTS_AUDIO_FORMAT` - TTS-specific override (default: pcm) - `VOICEMODE_STT_AUDIO_FORMAT` - STT-specific override - Support for multiple audio formats: pcm, mp3, wav, flac, aac, opus - Format-specific quality settings: - `VOICEMODE_OPUS_BITRATE` (default: 32000) - `VOICEMODE_MP3_BITRATE` (default: 64k) - `VOICEMODE_AAC_BITRATE` (default: 64k) - Automatic format validation based on provider capabilities - Provider-aware format fallback logic - Test suite for audio format configuration - Streaming audio playback infrastructure: - `VOICEMODE_STREAMING_ENABLED` (default: true) - `VOICEMODE_STREAM_CHUNK_SIZE` (default: 4096) - `VOICEMODE_STREAM_BUFFER_MS` (default: 150) - `VOICEMODE_STREAM_MAX_BUFFER` (default: 2.0) - TTFA (Time To First Audio) metric in timing output - Per-request audio format override via `audio_format` parameter in conversation tools - **Live Statistics Dashboard**: Comprehensive conversation performance tracking - Real-time performance metrics (TTFA, TTS generation, STT processing, total turnaround) - Session statistics (interaction counts, success rates, provider usage) - MCP tools: `voice_statistics`, `voice_statistics_summary`, `voice_statistics_recent`, `voice_statistics_reset`, `voice_statistics_export` - MCP resources: `voice://statistics/{type}`, `voice://statistics/summary/{format}`, `voice://statistics/export/{timestamp}` - Automatic integration with conversation tools - no manual tracking required - Thread-safe statistics collection across concurrent operations - Memory-efficient storage (maintains last 1000 interactions) ### Changed - **BREAKING**: All `VOICE_MODE_` environment variables renamed to `VOICEMODE_` - `VOICE_MODE_DEBUG` → `VOICEMODE_DEBUG` - `VOICE_MODE_SAVE_AUDIO` → `VOICEMODE_SAVE_AUDIO` - `VOICE_MODE_AUDIO_FEEDBACK` → `VOICEMODE_AUDIO_FEEDBACK` - `VOICE_MODE_FEEDBACK_VOICE` → `VOICEMODE_FEEDBACK_VOICE` - `VOICE_MODE_FEEDBACK_MODEL` → `VOICEMODE_FEEDBACK_MODEL` - `VOICE_MODE_FEEDBACK_STYLE` → `VOICEMODE_FEEDBACK_STYLE` - `VOICE_MODE_PREFER_LOCAL` → `VOICEMODE_PREFER_LOCAL` - `VOICE_MODE_AUTO_START_KOKORO` → `VOICEMODE_AUTO_START_KOKORO` - Also renamed non-prefixed variables to use `VOICEMODE_` prefix: - `VOICE_ALLOW_EMOTIONS` → `VOICEMODE_ALLOW_EMOTIONS` - `VOICE_EMOTION_AUTO_UPGRADE` → `VOICEMODE_EMOTION_AUTO_UPGRADE` - Default audio format changed from MP3 to PCM for zero-latency TTS streaming - Audio format is now validated against provider capabilities before use - Dynamic audio loading based on format instead of hardcoded MP3 - Centralized all configuration in `voice_mcp/config.py` to eliminate duplication - Logger names updated from "voice-mcp" to "voicemode" - Debug directory paths updated: - `~/voice-mcp_recordings/` → `~/voicemode_recordings/` - `~/voice-mcp_audio/` → `~/voicemode_audio/` ### Benefits - Zero-latency TTS streaming with PCM format - Best real-time performance for voice conversations - Universal compatibility with all audio systems - Maintains backward compatibility with compressed formats - Cleaner, consistent environment variable naming ### Known Issues - OpenAI TTS with Opus format produces poor audio quality - NOT recommended for streaming - Use PCM (default) or MP3 for TTS instead - Opus still works well for STT uploads and file storage ## [2.1.3] - 2025-06-20 ## [2.1.2] - 2025-06-20 ## [2.1.1] - 2025-06-20 ### Fixed - Fixed `voice_status` tool error where `get_provider_display_status` was called with incorrect arguments - Updated `.mcp.json` to use local package installation with `--refresh` flag ### Added - Audio feedback chimes for recording start/stop (inspired by PR #1 from @jtuffin) - New `VOICE_MODE_AUDIO_FEEDBACK` configuration with options: `chime` (default), `voice`, `both`, `none` - Backward compatibility for boolean audio feedback values ### Changed - Replaced all references from `voice-mcp` to `voice-mode` throughout documentation - Updated MCP configuration examples to use `uvx` instead of outdated `./mcp-servers/` directory - Removed hardcoded version from `server_new.py` - Changed default listen duration to 15 seconds (from 10s/20s) in all voice conversation functions for better balance - Audio feedback now defaults to chimes instead of voice for faster, less intrusive feedback ## [2.1.0] - 2025-06-20 ## [2.0.3] - 2025-06-20 ## [2.0.2] - 2025-06-20 ## [2.0.1] - 2025-06-20 ### Changed - Consolidated package structure from three to two pyproject.toml files - Removed unpublishable `voicemode` package configuration - Made `voice-mode` the primary package (in pyproject.toml) - Moved `voice-mcp` to secondary configuration (pyproject-voice-mcp.toml) ### Added - Documentation for local development with uvx (`docs/local-development-uvx.md`) ## [2.0.0] - 2025-06-20 ### 🎉 Major Project Rebrand: VoiceMCP → VoiceMode We're excited to announce that **voice-mcp** has been rebranded to **VoiceMode**! This change reflects our vision for the project's future. While MCP (Model Context Protocol) describes the underlying technology, VoiceMode better captures what this tool actually delivers - a seamless voice interaction mode for AI assistants. #### Why the Change? - **Clarity**: VoiceMode immediately communicates the tool's purpose - **Timelessness**: The name isn't tied to a specific protocol that may evolve - **Simplicity**: Easier to remember and more intuitive for users #### What's Changed? - Primary command renamed from `voice-mcp` to `voicemode` - GitHub repository moved to `mbailey/voicemode` - Primary PyPI package is now `voice-mode` (hyphenated due to naming restrictions) - Legacy `voice-mcp` package maintained for backward compatibility - Documentation and branding updated throughout - Simplified package structure to dual-package configuration #### Backward Compatibility - The `voice-mcp` command remains available for existing users - Both `voice-mode` and `voice-mcp` packages available on PyPI - All packages provide the `voicemode` command ### Changed - Consolidated package configuration to two pyproject.toml files - Made `voice-mode` the primary package with VoiceMode branding - Updated package descriptions to reflect the rebrand ### Added - Local development documentation for uvx usage ## [0.1.30] - 2025-06-19 ### Added - Audio feedback with whispered responses by default - Configurable audio feedback style (whisper or shout) via VOICE_MODE_FEEDBACK_STYLE environment variable - Support for overriding audio feedback settings per conversation ## [0.1.29] - 2025-06-17 ### Changed - Refactored MCP prompt names to use kebab-case convention (kokoro-start, kokoro-stop, kokoro-status, voice-status) - Renamed Kokoro tool functions to follow consistent naming pattern (start_kokoro → kokoro_start, stop_kokoro → kokoro_stop) ## [0.1.28] - 2025-06-17 ### Added - MCP prompts for Kokoro TTS management: - `kokoro-start` - Start the local Kokoro TTS service - `kokoro-stop` - Stop the local Kokoro TTS service - `kokoro-status` - Check the status of Kokoro service - `voice-status` - Check comprehensive status of all voice services - Instructions in CLAUDE.md for AI assistants on when to use Kokoro tools ## [0.1.27] - 2025-06-17 ### Added - Voice chat prompt/command (`/voice-mcp:converse`) for interactive voice conversations - Automatic local provider preference with VOICE_MODE_PREFER_LOCAL environment variable - Documentation improvements with better organization and cross-linking ### Changed - Renamed voice_chat prompt to converse for clarity - Simplified voice_chat prompt to take no arguments ## [0.1.26] - 2025-06-17 ### Fixed - Added missing voice_mode() function to cli.py for voice-mode command ## [0.1.25] - 2025-06-17 ### Added - Build tooling improvements for dual package maintenance ### Fixed - Missing psutil dependency in voice-mode package ## [0.1.24] - 2025-06-17 ### Fixed - Improved signal handling for proper Ctrl-C shutdown - First Ctrl-C attempts graceful shutdown - Second Ctrl-C forces immediate exit ## [0.1.23] - 2025-06-17 ### Added - Provider registry system MVP for managing TTS/STT providers - Dynamic provider discovery and registration - Automatic availability checking - Feature-based provider filtering - Dual package name support (voice-mcp and voice-mode) - Both commands now available in voice-mode package - Maintains backward compatibility - Service management tools for Kokoro TTS: - `start_kokoro` - Start the Kokoro TTS service using uvx - `stop_kokoro` - Stop the running Kokoro service - `kokoro_status` - Check service status with CPU/memory usage - Automatic cleanup of services on server shutdown - psutil dependency for process monitoring - `list_tts_voices` tool to list all available TTS voices by provider - Shows OpenAI standard and enhanced voices with characteristics - Lists Kokoro voices with descriptions - Includes usage examples and emotional speech guidance - Checks API/service availability for each provider ### Changed - Default TTS voices updated: alloy for OpenAI, af_sky for Kokoro ## [0.1.22] - 2025-06-16 ### Added - Local STT/TTS configuration support in .mcp.json - Split TTS metrics into generation and playback components for better performance insights - Tracks TTS generation time (API call) separately from playback time - Displays metrics as tts_gen, tts_play, and tts_total ### Changed - Modified text_to_speech() to return (success, metrics) tuple - Updated all tests to handle new TTS return format ## [0.1.21] - 2025-06-16 ### Added - VOICE_MODE_SAVE_AUDIO environment variable to save all TTS/STT audio files - Audio files saved to ~/voice-mcp_audio/ with timestamps - Improved voice selection documentation and guidance ### Changed - Voice parameter changed from Literal to str for flexibility in voice selection ## [0.1.19] - 2025-06-15 ### Added - TTS provider selection parameter to converse function ("openai" or "kokoro") - Auto-detection of TTS provider based on voice selection - Support for multiple TTS endpoints with provider-specific clients ## [0.1.18] - 2025-06-15 ### Changed - Removed mcp-neovim-server from .mcp.json configuration ## [0.1.17] - 2025-06-15 ### Changed - Minor version bump (no functional changes) ## [0.1.16] - 2025-06-15 ## [0.1.16] - 2025-06-15 ### Added - Voice parameter to converse function for dynamic TTS voice selection - Support for Kokoro voices: af_sky, af_sarah, am_adam, af_nicole, am_michael - Python 3.13 support with conditional audioop-lts dependency ### Fixed - BrokenResourceError when concurrent voice operations interfere with MCP stdio communication - Enhanced sounddevice stderr redirection workaround to prevent stdio corruption - Added concurrency lock to serialize audio operations and prevent race conditions - Protected stdio file descriptors during audio recording and playback operations - Added anyio.BrokenResourceError to exception handling for MCP disconnections - Configure pytest to exclude manual test scripts from CI builds ## [0.1.15] - 2025-06-14 ### Fixed - Removed load_dotenv call that was causing import error ## [0.1.14] - 2025-06-14 ### Fixed - Updated GitHub workflows for new project structure ## [0.1.13] - 2025-06-14 ### Added - Performance timing in voice responses showing TTS, recording, and STT durations - Local STT/TTS documentation for Whisper.cpp and Kokoro - CONTRIBUTING.md with development setup instructions - CHANGELOG.md for tracking changes ### Changed - Refactored from python-package subdirectory to top-level Python package - Moved MCP server symlinks from mcp-servers/ to bin/ directory - Updated wrapper script to properly resolve symlinks for venv detection - Improved signal handlers to prevent premature exit - Configure build to only include essential files in package ### Fixed - Audio playback dimension mismatch when adding silence buffer - MCP server connection persistence (was disconnecting after each request) - Event loop cleanup errors on shutdown - Wrapper script path resolution for symlinked execution - Critical syntax errors in voice-mcp script ### Removed - Unused python-dotenv dependency - Temporary test files (test_audio.py, test_minimal_mcp.py) - Redundant test dependencies in pyproject.toml - All container/Docker support ## [0.1.12] - 2025-06-14 ### Added - Kokoro TTS support with configuration examples - Export examples in .env.example for various setups - Centralized version management and automatic PyPI publishing ### Changed - Simplified project structure with top-level package ## [0.1.11] - 2025-06-13 ### Added - Initial voice-mcp implementation - OpenAI-compatible STT/TTS support - LiveKit integration for room-based voice communication - MCP tool interface with converse, listen_for_speech, check_room_status, and check_audio_devices - Debug mode with audio recording capabilities - Support for multiple transport methods (local microphone and LiveKit) ## [0.1.0 - 0.1.10] - 2025-06-13 ### Added - Initial development and iteration of voice-mcp - Basic MCP server structure - OpenAI API integration for STT/TTS - Audio recording and playback functionality - Configuration via environment variables

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server