Speech MCP

# Speech MCP Changes

## Latest Changes

### 7. Kokoro TTS Integration

- Added support for Kokoro TTS, a high-quality neural text-to-speech engine
- Created a modular TTS adapter system to support multiple TTS engines
- Added a Kokoro adapter that falls back to pyttsx3 if Kokoro is not available
- Added pip optional dependencies for easy installation: `pip install speech-mcp[kokoro]`
- Added an installation script for Kokoro and its dependencies
- Updated documentation with information about Kokoro TTS
- Enhanced the UI to use Kokoro when available
- Added support for multiple voice styles and languages

### 6. Freezing and State Management Issues

- Fixed issue where `start_conversation()` would freeze indefinitely
- Added file system permission testing to verify transcription file can be created
- Added fallback transcription mechanism when UI process fails to create transcription file
- Reduced timeout from 10 minutes to 30 seconds for better responsiveness
- Added emergency transcription when timeout occurs
- Added more detailed debug output to console
- Added psutil dependency for better process management
- Enhanced state management to avoid getting stuck in listening state
- Added force reset of speech state at the beginning of `start_conversation()`
- Modified error handling to return error messages instead of raising exceptions
- Updated documentation with troubleshooting information

### 5. Migrated to faster-whisper

- Replaced openai-whisper with faster-whisper for improved performance
- Updated transcription processing to work with faster-whisper's API
- Updated documentation to reflect the new dependency
- Configured faster-whisper to use CPU with int8 quantization for better compatibility

## Previous Fixes

### 1. Listen Function Timeout Issue

- Increased the timeout from 60 seconds to 10 minutes
- Added progress messages during listening to show that the system is still waiting
- Reduced silence detection threshold from 0.01 to 0.005 to make it less sensitive
- Increased maximum silence duration from 1.5 to 2.0 seconds before ending recording

### 2. Speak Function Not Producing Audio

- Added pyttsx3 as a dependency for text-to-speech functionality
- Implemented actual speech synthesis using pyttsx3 instead of just simulating speech with delays
- Added fallback to simulation if text-to-speech fails

### 3. UI Not Opening

- Enhanced error handling and logging in the UI startup process
- Added detailed logging of UI process output to help diagnose issues
- Increased the startup wait time from 1 to 2 seconds
- Added checks to verify if the UI process is still running after startup
- Improved process termination handling for existing UI processes
- **Fixed path issue:** Changed UI process startup to use Python module import (`python -m speech_mcp.ui`) instead of direct file path
- **Fixed log output:** Improved log output formatting to clean up error messages
- **Added minimal UI:** Implemented a simple status window that shows when the system is listening or speaking

### 4. Simplified API

- Reduced the API to just two main functions:
  - `start_conversation()`: Launches the UI and immediately starts listening
  - `reply(text)`: Speaks the provided text and then listens for a response
- Removed separate `start_voice_mode()`, `listen()`, and `speak()` functions
- Simplified the workflow for voice conversations

## How to Test

1. Make sure all dependencies are installed:

   ```
   source .venv/bin/activate
   uv pip install -e .
   ```

2. (Optional) Install Kokoro TTS:

   ```
   python scripts/install_kokoro.py
   ```

3. Run the speech-mcp server:

   ```
   speech-mcp
   ```

4. Start a conversation:

   ```
   user_input = start_conversation()
   ```

5. Reply to the user and get their response:

   ```
   user_response = reply("This is a test of the speech synthesis system")
   ```

## Troubleshooting

If you encounter issues:

1. Check the log files:
   - `/Users/mnovich/Development/speech-mcp/src/speech_mcp/speech-mcp.log`
   - `/Users/mnovich/Development/speech-mcp/src/speech_mcp/speech-mcp-server.log`
   - `/Users/mnovich/Development/speech-mcp/src/speech_mcp/speech-mcp-ui.log`

2. Make sure the UI process is running:

   ```
   ps aux | grep speech_mcp
   ```

3. If the UI is not running, check for error messages in the logs and try restarting the server.

4. If the extension seems stuck, try deleting or resetting the state file:

   ```
   echo '{"ui_active": false, "listening": false, "speaking": false, "last_transcript": "", "last_response": ""}' > src/speech_mcp/speech_state.json
   ```

5. Use the direct command instead of `uv run speech-mcp`:

   ```
   goose session --with-extension "speech-mcp"
   ```
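
The state reset from troubleshooting step 4 can also be done from Python, which avoids shell quoting problems. The following is a minimal sketch, not part of speech-mcp itself: the names `reset_state` and `DEFAULT_STATE` are hypothetical, and the keys and default values are copied from the `echo` command above.

```python
import json
from pathlib import Path

# Default state, copied from the troubleshooting command above.
# DEFAULT_STATE and reset_state are illustrative names, not speech-mcp APIs.
DEFAULT_STATE = {
    "ui_active": False,
    "listening": False,
    "speaking": False,
    "last_transcript": "",
    "last_response": "",
}


def reset_state(path):
    """Overwrite the state file at `path` with the default state.

    Returns the state as read back from disk, so the caller can
    confirm that the file was actually written.
    """
    state_path = Path(path)
    state_path.write_text(json.dumps(DEFAULT_STATE), encoding="utf-8")
    return json.loads(state_path.read_text(encoding="utf-8"))
```

From the repository root, this would be called as `reset_state("src/speech_mcp/speech_state.json")`, using the same relative path as the shell command.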