# Kokoro Text-to-Speech Setup

Kokoro is a high-quality local text-to-speech service that provides natural-sounding voices in multiple languages. It offers an OpenAI-compatible API that VoiceMode can use as an alternative to cloud-based TTS services.

## Quick Start

```bash
# Install kokoro service
voice-mode kokoro install

# Start the service
voice-mode kokoro start

# Check status
voice-mode kokoro status
```

Default endpoint: `http://127.0.0.1:8880/v1`

## Installation Methods

### Automatic Installation (Recommended)

VoiceMode includes an installation tool that handles everything:

```bash
# Install kokoro with default settings
voice-mode kokoro install

# Or using Claude Code
claude converse "Please install kokoro-fastapi"
```

This will:

- Clone the kokoro-fastapi repository to `~/.voicemode/kokoro-fastapi`
- Install UV package manager if needed
- Set up automatic startup (systemd on Linux, launchd on macOS)
- Start the service on port 8880
- Download models automatically on first use

### Manual Installation

#### Prerequisites

```bash
# Ensure Python 3.8+ is installed
python3 --version

# Install uv (the uvx command ships with it)
pip install uv
```

#### Download and Run

```bash
# Create models directory
mkdir -p ~/Models/kokoro

# Run kokoro-fastapi with uvx
uvx kokoro-fastapi[cpu] serve \
  --host 127.0.0.1 \
  --port 8880 \
  --models-dir ~/Models/kokoro
```

Models download automatically from [Hugging Face](https://huggingface.co/hexgrad/Kokoro-82M) on first use.

## Available Voices

### English Voices

- **American Female**: `af_sky` (default), `af_sarah`
- **American Male**: `am_adam`, `am_michael`
- **British Female**: `bf_emma`, `bf_isabella`
- **British Male**: `bm_george`, `bm_lewis`

### International Voices

- **Spanish**: `ef_dora` (female), `em_alex` (male)
- **French**: `ff_siwis` (female), `fm_gabriel` (male)
- **Italian**: `if_sara` (female), `im_nicola` (male)
- **Portuguese**: `pf_dora` (female), `pm_alex` (male)
- **Chinese**: `zf_xiaobei` (female), `zm_yunjian` (male)
- **Japanese**: `jf_alpha` (female), `jm_kumo` (male)
- **Hindi**: `hf_alpha` (female), `hm_omega` (male)

## Service Configuration

### Environment Variables

Configure in `~/.voicemode/voicemode.env`:

```bash
VOICEMODE_KOKORO_PORT=8880
VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro
VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro
VOICEMODE_KOKORO_DEFAULT_VOICE=af_sky
```

### Service Management

#### macOS (LaunchAgent)

```bash
# Start/stop service
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro.plist

# Enable/disable at startup
launchctl load -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload -w ~/Library/LaunchAgents/com.voicemode.kokoro.plist

# Check status
launchctl list | grep kokoro
```

#### Linux (Systemd)

```bash
# Start/stop service
systemctl --user start kokoro
systemctl --user stop kokoro

# Enable/disable at startup
systemctl --user enable kokoro
systemctl --user disable kokoro

# Check status and logs
systemctl --user status kokoro
journalctl --user -u kokoro -f
```
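Once the service is running, it helps to confirm that it actually synthesizes audio before wiring it into VoiceMode. The snippet below is a minimal smoke test, assuming kokoro-fastapi exposes the standard OpenAI-compatible `/v1/audio/speech` route and accepts `kokoro` as the model name; if your build rejects the request, check the service's own API documentation for the exact route and model identifier.

```bash
# Smoke test: ask Kokoro to synthesize a short phrase and save it as MP3.
# The /v1/audio/speech route and the "kokoro" model name are assumptions
# based on the OpenAI-compatible API; adjust them if your build differs.
curl -s -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kokoro",
        "input": "Kokoro is up and running.",
        "voice": "af_sky",
        "response_format": "mp3"
      }' \
  --output kokoro-test.mp3

# Play the result with your platform's audio player, e.g.:
# macOS:  afplay kokoro-test.mp3
# Linux:  mpv kokoro-test.mp3
```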
## Integration with VoiceMode

VoiceMode automatically detects Kokoro when available:

1. **First**: Checks for Kokoro on `http://127.0.0.1:8880/v1`
2. **Fallback**: Uses OpenAI API (requires `OPENAI_API_KEY`)

### Custom Configuration

To use a different endpoint or specify a voice:

```bash
export TTS_BASE_URL=http://127.0.0.1:8880/v1
export TTS_VOICE=af_sky  # Optional: specify voice
```

Or in MCP configuration:

```json
"voice-mode": {
  "env": {
    "TTS_BASE_URL": "http://127.0.0.1:8880/v1",
    "TTS_VOICE": "af_sky"
  }
}
```

## Fully Local Setup

For completely offline voice processing, combine Kokoro with Whisper:

```bash
export TTS_BASE_URL=http://127.0.0.1:8880/v1   # Kokoro for TTS
export STT_BASE_URL=http://127.0.0.1:2022/v1   # Whisper for STT
export TTS_VOICE=af_sky                        # Kokoro voice
```

## Service Files

### macOS LaunchAgent

Create `~/Library/LaunchAgents/com.voicemode.kokoro.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.voicemode.kokoro</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/uvx</string>
        <string>kokoro-fastapi[cpu]</string>
        <string>serve</string>
        <string>--host</string>
        <string>127.0.0.1</string>
        <string>--port</string>
        <string>8880</string>
        <string>--models-dir</string>
        <string>/Users/YOUR_USERNAME/Models/kokoro</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>
```

### Linux Systemd Service

Create `~/.config/systemd/user/kokoro.service`:

```ini
[Unit]
Description=Kokoro Text-to-Speech Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/uvx kokoro-fastapi[cpu] serve \
    --host 127.0.0.1 \
    --port 8880 \
    --models-dir %h/Models/kokoro
Restart=always
RestartSec=10
Environment="PATH=/usr/local/bin:/usr/bin:/bin"

[Install]
WantedBy=default.target
```

## Performance

Kokoro runs locally on your machine:

- **Generation time**: 1-3 seconds for short phrases
- **CPU usage**: Moderate, depends on text length
- **Memory**: ~500MB-1GB depending on loaded models
- **Disk space**: ~300MB per language model

For better performance:

- Use the CPU version for most systems: `kokoro-fastapi[cpu]`
- A GPU version is available for CUDA-enabled systems
- Put the cache directory on an SSD for faster access

## Troubleshooting

### Service Won't Start

- Check if port 8880 is already in use: `lsof -i :8880`
- Verify uvx is installed: `which uvx`
- Check Python version: `python3 --version` (requires 3.8+)

### Models Not Found

- Ensure the models directory exists and has correct permissions
- Models download automatically on the first request
- Manual download: https://huggingface.co/hexgrad/Kokoro-82M

### Voice Not Working

- Verify the service is running: `curl http://127.0.0.1:8880/v1/models`
- Check logs for errors (see service management commands)
- Try a different voice to rule out model issues

### Performance Issues

- Ensure adequate CPU resources are available
- Consider using a smaller text chunk size
- Check disk I/O if models are on slow storage

## File Locations

- **Models**: `~/Models/kokoro/` or `~/.voicemode/services/kokoro/models/`
- **Cache**: `~/.voicemode/cache/kokoro/`
- **Service Files**:
  - macOS: `~/Library/LaunchAgents/com.voicemode.kokoro.plist`
  - Linux: `~/.config/systemd/user/kokoro.service`
- **Installation**: `~/.voicemode/kokoro-fastapi/`
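The individual checks from the troubleshooting and file-location sections above can be run together as a quick diagnostic. This is a rough sketch, assuming the default port (8880) and the directory layout described in this guide:

```bash
#!/usr/bin/env bash
# Quick Kokoro health check, combining the troubleshooting steps above.
# Assumes the default port and the file locations listed in this guide.

echo "== Port 8880 =="
lsof -i :8880 || echo "Nothing listening on port 8880 (service may be stopped)"

echo "== uvx =="
which uvx || echo "uvx not found on PATH"

echo "== Python =="
python3 --version

echo "== API =="
curl -sf http://127.0.0.1:8880/v1/models || echo "No response from the Kokoro API"
echo

echo "== Models directory =="
ls -lh ~/Models/kokoro 2>/dev/null \
  || ls -lh ~/.voicemode/services/kokoro/models 2>/dev/null \
  || echo "No models directory yet (models download on first request)"
```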
