Speech MCP
by Kvadratni
# Speech MCP Changes
## Latest Changes
### 7. Kokoro TTS Integration
- Added support for Kokoro TTS, a high-quality neural text-to-speech engine
- Created a modular TTS adapter system to support multiple TTS engines
- Added a Kokoro adapter that falls back to pyttsx3 if Kokoro is not available
- Added pip optional dependencies for easy installation: `pip install speech-mcp[kokoro]`
- Added an installation script for Kokoro and its dependencies
- Updated documentation with information about Kokoro TTS
- Enhanced the UI to use Kokoro when available
- Added support for multiple voice styles and languages
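
The adapter-with-fallback idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class names, the `required_module` attribute, the `pick_adapter` helper, and the assumption that Kokoro is importable as `kokoro` are all hypothetical.

```python
import importlib.util
from typing import Iterable, Optional


class TTSAdapter:
    """Hypothetical base class: one subclass per TTS engine."""

    name = "base"
    required_module = ""  # import name of the engine's Python package

    def is_available(self) -> bool:
        # Available when the backing package can be found on the import path.
        if not self.required_module:
            return False
        return importlib.util.find_spec(self.required_module) is not None

    def speak(self, text: str) -> None:
        raise NotImplementedError


class KokoroAdapter(TTSAdapter):
    name = "kokoro"
    required_module = "kokoro"  # assumed import name, not confirmed by the source


class Pyttsx3Adapter(TTSAdapter):
    name = "pyttsx3"
    required_module = "pyttsx3"


def pick_adapter(adapters: Iterable[TTSAdapter]) -> Optional[TTSAdapter]:
    """Return the first available adapter in preference order, else None."""
    for adapter in adapters:
        if adapter.is_available():
            return adapter
    return None
```

Calling `pick_adapter([KokoroAdapter(), Pyttsx3Adapter()])` would prefer Kokoro and fall back to pyttsx3 when Kokoro is not installed.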
### 6. Freezing and State Management Issues
- Fixed issue where `start_conversation()` would freeze indefinitely
- Added file system permission testing to verify that the transcription file can be created
- Added a fallback transcription mechanism for when the UI process fails to create the transcription file
- Reduced timeout from 10 minutes to 30 seconds for better responsiveness
- Added emergency transcription when timeout occurs
- Added more detailed debug output to console
- Added psutil dependency for better process management
- Enhanced state management to avoid getting stuck in listening state
- Added force reset of speech state at the beginning of `start_conversation()`
- Modified error handling to return error messages instead of raising exceptions
- Updated documentation with troubleshooting information
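
A rough sketch of three of the ideas above: testing that the transcription file can be created, waiting with a bounded timeout, and returning an error message instead of raising. The function names, the polling approach, and the error text are assumptions made for illustration; the source does not show how the server implements these steps.

```python
import os
import tempfile
import time


def can_write(directory: str) -> bool:
    """Check that a file can be created in `directory` by actually creating one."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def wait_for_transcription(path: str, timeout: float = 30.0, poll: float = 0.1) -> str:
    """Poll for the transcription file; return its text or an error message."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read().strip()
        time.sleep(poll)
    # Report the failure as a value instead of raising an exception.
    return f"Error: no transcription after {timeout:g} seconds"
```

A real implementation would also need to guard against reading a file that the writer has not finished writing, which this sketch ignores.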
### 5. Migrated to faster-whisper
- Replaced openai-whisper with faster-whisper for improved performance
- Updated transcription processing to work with faster-whisper's API
- Updated documentation to reflect the new dependency
- Configured faster-whisper to use CPU with int8 quantization for better compatibility
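
For reference, a CPU/int8 setup with faster-whisper typically looks like the sketch below. The helper names and the `"base"` model size are assumptions; the source does not say which model size the project loads.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the `.text` of each segment into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_file(audio_path: str, model_size: str = "base") -> str:
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    # CPU inference with int8 quantization, matching the configuration above.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)
```

Because the segments are produced lazily, the transcription work happens while `join_segments` iterates over them.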
## Previous Fixes
### 1. Listen Function Timeout Issue
- Increased the timeout from 60 seconds to 10 minutes
- Added progress messages during listening to show that the system is still waiting
- Reduced the silence detection threshold from 0.01 to 0.005 so that quieter speech is less likely to be classified as silence
- Increased maximum silence duration from 1.5 to 2.0 seconds before ending recording
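
To illustrate how the two silence parameters interact, here is a small sketch. It assumes the threshold is compared against the RMS amplitude of each audio chunk (samples scaled to the range -1.0 to 1.0) and that recording stops after enough consecutive silent chunks; the source does not confirm either detail, and the function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square amplitude of a chunk of samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def chunks_until_silence(
    chunks: Iterable[Sequence[float]],
    chunk_seconds: float,
    threshold: float = 0.005,
    max_silence: float = 2.0,
) -> int:
    """Return how many chunks are consumed before recording would stop."""
    silent_for = 0.0
    count = 0
    for chunk in chunks:
        count += 1
        if rms(chunk) < threshold:
            silent_for += chunk_seconds
            if silent_for >= max_silence:
                break  # enough continuous silence: end the recording
        else:
            silent_for = 0.0  # any sound resets the silence timer
    return count
```

With these definitions, a chunk whose RMS is 0.007 counts as silence under the old threshold of 0.01 but as sound under the new threshold of 0.005.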
### 2. Speak Function Not Producing Audio
- Added pyttsx3 as a dependency for text-to-speech functionality
- Implemented actual speech synthesis using pyttsx3 instead of just simulating speech with delays
- Added fallback to simulation if text-to-speech fails
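
A minimal sketch of speech synthesis with a simulated fallback, using the standard pyttsx3 calls (`init`, `say`, `runAndWait`). The `speak` signature, the injectable `engine_factory`, and the per-word delay are assumptions for illustration, not the project's code.

```python
import time
from typing import Callable, Optional


def _default_engine():
    import pyttsx3  # raises ImportError when the package is not installed

    return pyttsx3.init()


def speak(
    text: str,
    engine_factory: Optional[Callable] = None,
    seconds_per_word: float = 0.3,
) -> str:
    """Speak `text`; return "spoken" or "simulated" depending on the path taken."""
    factory = engine_factory or _default_engine
    try:
        engine = factory()
        engine.say(text)
        engine.runAndWait()  # blocks until the queued utterance has finished
        return "spoken"
    except Exception:
        # Fallback: approximate the speaking time with a delay.
        time.sleep(seconds_per_word * len(text.split()))
        return "simulated"
```

Passing the engine factory as a parameter keeps the fallback path easy to exercise without audio hardware.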
### 3. UI Not Opening
- Enhanced error handling and logging in the UI startup process
- Added detailed logging of UI process output to help diagnose issues
- Increased the startup wait time from 1 to 2 seconds
- Added checks to verify if the UI process is still running after startup
- Improved process termination handling for existing UI processes
- **Fixed path issue:** Changed UI process startup to use Python module import (`python -m speech_mcp.ui`) instead of direct file path
- **Fixed log output:** Improved log output formatting to clean up error messages
- **Added minimal UI:** Implemented a simple status window that shows when the system is listening or speaking
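
The startup changes above (module-based launch, a startup wait, and a liveness check) could look roughly like this. The `launch_ui` helper and its return convention are hypothetical; only the `python -m speech_mcp.ui` invocation and the 2-second wait come from the notes above.

```python
import subprocess
import sys
from typing import List, Optional


def launch_ui(argv: Optional[List[str]] = None, startup_wait: float = 2.0):
    """Start the UI process and report whether it survived the startup wait."""
    if argv is None:
        # Run the UI as a module so the launch does not depend on a file path.
        argv = [sys.executable, "-m", "speech_mcp.ui"]
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        proc.wait(timeout=startup_wait)
    except subprocess.TimeoutExpired:
        return proc, ""  # still running after the startup wait
    # The process exited early: collect its output for the log.
    output = proc.stdout.read() if proc.stdout else ""
    return None, f"UI exited with code {proc.returncode}: {output.strip()}"
```

A long-running UI that writes a lot of output would need its pipe drained (for example by a reader thread), which this sketch leaves out.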
### 4. Simplified API
- Reduced the API to just two main functions:
  - `start_conversation()`: Launches the UI and immediately starts listening
  - `reply(text)`: Speaks the provided text and then listens for a response
- Removed separate `start_voice_mode()`, `listen()`, and `speak()` functions
- Simplified the workflow for voice conversations
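
With only two functions, a conversation reduces to a simple loop. The sketch below takes `start_conversation` and `reply` as parameters so it runs with stand-ins; the `run_conversation` helper, the `respond` callback, the turn limit, and the "goodbye" stop word are assumptions for illustration.

```python
from typing import Callable, List


def run_conversation(
    start_conversation: Callable[[], str],
    reply: Callable[[str], str],
    respond: Callable[[str], str],
    max_turns: int = 5,
) -> List[str]:
    """Drive a voice conversation and return everything the user said."""
    heard = [start_conversation()]  # launches the UI and listens once
    for _ in range(max_turns):
        if heard[-1].strip().lower() in {"", "goodbye"}:
            break  # nothing heard, or the user ended the conversation
        # Speak the answer, then listen for the next user utterance.
        heard.append(reply(respond(heard[-1])))
    return heard
```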
## How to Test
1. Make sure all dependencies are installed: `source .venv/bin/activate`, then `uv pip install -e .`
2. (Optional) Install Kokoro TTS: `python scripts/install_kokoro.py`
3. Run the speech-mcp server, for example with `uv run speech-mcp` (the command referenced under Troubleshooting).
4. Start a conversation: `user_input = start_conversation()`
5. Reply to the user and get their response: `user_response = reply("This is a test of the speech synthesis system")`
## Troubleshooting
If you encounter issues:
1. Check the log files (paths relative to the repository root):
   - `src/speech_mcp/speech-mcp.log`
   - `src/speech_mcp/speech-mcp-server.log`
   - `src/speech_mcp/speech-mcp-ui.log`
2. Make sure the UI process is running: `ps aux | grep speech_mcp`
3. If the UI is not running, check for error messages in the logs and try restarting the server.
4. If the extension seems stuck, try deleting or resetting the state file: `echo '{"ui_active": false, "listening": false, "speaking": false, "last_transcript": "", "last_response": ""}' > src/speech_mcp/speech_state.json`
5. Launch the extension directly through Goose instead of using `uv run speech-mcp`: `goose session --with-extension "speech-mcp"`
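
The state reset from step 4 can also be done from Python, which avoids shell quoting problems. This is a sketch: the `reset_state` helper is hypothetical, while the field names and default values are taken from the state file contents shown in step 4.

```python
import json
from pathlib import Path

# Default values copied from the reset command in step 4.
DEFAULT_STATE = {
    "ui_active": False,
    "listening": False,
    "speaking": False,
    "last_transcript": "",
    "last_response": "",
}


def reset_state(path: str = "src/speech_mcp/speech_state.json") -> dict:
    """Overwrite the state file with defaults and return what was written."""
    state_path = Path(path)
    state_path.write_text(json.dumps(DEFAULT_STATE), encoding="utf-8")
    # Read the file back so the caller sees exactly what is on disk.
    return json.loads(state_path.read_text(encoding="utf-8"))
```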