# voice-mcp
A local MCP server that provides voice tools for Claude Code - speech-to-text using Whisper and text-to-speech using Supertonic.
## Features
### Speech-to-Text (Whisper)
- **listen_and_confirm** - Record speech, transcribe with Whisper, return transcript for confirmation
- **listen_for_yes_no** - Quick yes/no detection for binary decisions
- Local transcription using [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (no API calls)
- Automatic silence detection to stop recording
- Audio beeps to indicate recording start/end
### Text-to-Speech (Supertonic)
- **speak** - Speak text aloud to the user
- Local synthesis using [Supertonic](https://github.com/supertone-inc/supertonic) (no API calls)
- Fast on-device generation (66M parameters)
### Combined Tools
- **speak_and_listen** - Speak then listen for a full response (reduces round trips)
- **speak_and_confirm** - Speak then listen for yes/no (reduces round trips)
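For orientation, here is a rough sketch of how tools like these can be registered with the Python MCP SDK (FastMCP). It is illustrative only, not the actual voice-mcp source, and the helper functions are hypothetical stand-ins for the Supertonic and Whisper pieces:
```python
# Illustrative sketch only - not the actual voice-mcp implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("voice")

def play_tts(text: str) -> None:
    print(f"[tts] {text}")  # placeholder for Supertonic synthesis + playback

def record_and_transcribe() -> str:
    return "placeholder transcript"  # placeholder for mic capture + Whisper

@mcp.tool()
def speak(text: str) -> str:
    """Speak text aloud to the user."""
    play_tts(text)
    return "spoken"

@mcp.tool()
def speak_and_listen(text: str) -> str:
    """Speak a prompt, then record and transcribe the reply in one round trip."""
    play_tts(text)
    return record_and_transcribe()

if __name__ == "__main__":
    mcp.run()  # serves over stdio, which is how Claude Code connects
```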
## Requirements
- Python 3.10+
- [uv](https://docs.astral.sh/uv/) package manager
- A microphone (for speech-to-text)
- Speakers/headphones (for text-to-speech)
## Installation
1. Clone the repository:
```bash
git clone https://github.com/jochiang/voice-mcp.git
cd voice-mcp
```
2. Install dependencies:
```bash
uv sync
```
3. Add to your Claude Code MCP settings (`.mcp.json` in your project or `~/.claude/settings.json`):
```json
{
  "mcpServers": {
    "voice": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/voice-mcp", "voice-mcp"]
    }
  }
}
```
4. Restart Claude Code to load the MCP server.
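To check that the server starts before wiring it into Claude Code, you can run the same command the config above uses. An MCP server communicates over stdio, so it will simply wait for input (press Ctrl+C to exit):
```bash
uv run --directory /path/to/voice-mcp voice-mcp
```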
## Usage
### Voice Input
Trigger voice input by saying something like:
- "let me explain verbally"
- "I'll tell you verbally"
Claude will call `listen_and_confirm`: you'll hear a beep, speak your response, and hear another beep when recording stops. Claude will then repeat the transcript back for confirmation.
For yes/no questions, Claude can use `listen_for_yes_no` which interprets your response as "yes", "no", or "unclear".
### Voice Output
Ask Claude to speak responses:
- "say that out loud"
- "read that to me"
Claude will call `speak` to synthesize and play the audio through your speakers.
### Voice Conversations
For back-and-forth voice conversations, Claude can use the combined tools:
- `speak_and_listen` - Ask a question and wait for a full answer
- `speak_and_confirm` - Ask a yes/no question and get confirmation
These reduce latency by combining speak + listen in a single tool call.
### Customizing Speech Behavior
The tool descriptions include default guidance for how Claude speaks. To customize this behavior, add instructions to your `CLAUDE.md` file. Examples:
```markdown
# Voice preferences
- When speaking, be brief and conversational
- Describe code changes at a high level, don't read syntax
- Summarize URLs instead of spelling them out
```
You can also encourage other styles, such as more verbose explanations or a different tone.
## Notes
- **First-run downloads**: Models download automatically on first use - Whisper small (~460MB) and Supertonic (~260MB). To warm the cache ahead of time, see the sketch after this list
- **Silence detection**: Recording stops after 2.5 seconds of silence
- **Platform**: Developed on Windows, should work on macOS/Linux
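If you prefer to download the Whisper model before the first voice interaction, instantiating it once populates the cache. A minimal sketch, matching the default settings in `src/voice_mcp/transcribe.py` (the Supertonic model downloads separately on first TTS use):
```python
# Warm the Whisper cache ahead of time (~460MB for the small model).
from faster_whisper import WhisperModel

WhisperModel("small", device="cpu", compute_type="int8")
```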
## Troubleshooting
**No audio output from TTS:**
- Some DACs require stereo output - the speak tool outputs stereo by default
- Check your default audio output device
**Recording stops too quickly:**
- The silence threshold may be too sensitive for your microphone
- Adjust `SILENCE_THRESHOLD` in `src/voice_mcp/audio.py` (default: 0.01)
- Increase `SILENCE_DURATION_S` if you need more pause time between phrases (default: 2.5 seconds)
**Recording doesn't stop fast enough:**
- Decrease `SILENCE_DURATION_S` in `src/voice_mcp/audio.py` for a quicker cutoff (the sketch below shows how the two settings interact)
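The two settings interact roughly as follows. This is an illustrative sketch, not the actual `audio.py` code, and it assumes a `sounddevice`-based recording loop:
```python
# Illustrative sketch of threshold-based silence detection.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000          # Whisper expects 16 kHz mono audio
SILENCE_THRESHOLD = 0.01     # RMS level below which a block counts as silence
SILENCE_DURATION_S = 2.5     # stop after this much consecutive silence

def record_until_silence() -> np.ndarray:
    """Record from the default microphone until SILENCE_DURATION_S of quiet."""
    chunk_s = 0.1  # read audio in 100 ms blocks
    chunks, silent_time = [], 0.0
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
        while silent_time < SILENCE_DURATION_S:
            block, _ = stream.read(int(SAMPLE_RATE * chunk_s))
            chunks.append(block)
            rms = float(np.sqrt(np.mean(block ** 2)))
            # A quiet block extends the silence timer; any loud block resets it.
            silent_time = silent_time + chunk_s if rms < SILENCE_THRESHOLD else 0.0
    return np.concatenate(chunks)
```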
## Configuration
### Whisper (Speech-to-Text)
The Whisper model defaults to `small` running on CPU. To change this, edit `src/voice_mcp/transcribe.py`:
```python
# Model options: tiny, base, small, medium, large-v3
_model = WhisperModel("small", device="cpu", compute_type="int8")
```
For GPU acceleration, change `device="cpu"` to `device="cuda"` (requires cuDNN).
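For example, a CUDA setup typically pairs with `float16` (a sketch; pick the compute type that matches your hardware):
```python
# GPU variant - requires an NVIDIA GPU with CUDA and cuDNN installed
_model = WhisperModel("small", device="cuda", compute_type="float16")
```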
### Supertonic (Text-to-Speech)
The default voice is `M1`. Available voices can be found at the [Supertonic voice gallery](https://supertone-inc.github.io/supertonic-py/voices/).
## License
MIT