Speech Processing
Voice interaction and speech processing capabilities, including speech-to-text conversion, audio commands, and voice generation.
MCP Servers
- Local MCP voice coach that provides English pronunciation, grammar, and fluency feedback from microphone recordings. It supports free-form voice conversation, focused practice drills, phoneme-level feedback, prosody hints, and learner-profile guidance. Ratings: license A, quality A, maintenance A. License: MIT.
- ElevenLabs MCP Server (official): A Model Context Protocol (MCP) server that lets AI clients use ElevenLabs' Text to Speech and audio processing APIs for speech generation, voice cloning, audio transcription, and other audio-related tasks. Ratings: license A, quality A, maintenance B. License: MIT.
- @vocea.app/mcp-server (official): Enables AI agents to generate speech, transcribe audio, and manage voices via the Vocea API. Ratings: license A, quality A, maintenance C. License: MIT, Apache 2.0.
- ContextPulse (official): Lets AI assistants understand what you're working on (current screen content, recent dictation, clipboard, and saved notes), running entirely on your own machine with nothing sent to the cloud. Ratings: license A, quality A, maintenance F. License: AGPL 3.0.
- A Model Context Protocol server for FlowSpeech text-to-speech. It lets MCP-compatible clients generate human-like audio with context-aware emotion control, pause control, multi-speaker dialogue, and 30+ available voices. Ratings: license A, quality A, maintenance C. License: MIT.
- AI-powered speech tools by Brainiall: pronunciation assessment with phoneme-level feedback, speech-to-text with language detection, and text-to-speech with multiple voices. Ratings: license A, quality A, maintenance B. License: MIT.
- MCP server for automated conversational phone calls using Asterisk with speech-to-speech capabilities, letting users hold phone conversations as easily as writing a prompt. Ratings: license A, quality A, maintenance C. License: MIT.
- An MCP (Model Context Protocol) server that integrates Fish Audio's Text-to-Speech API with LLMs such as Claude, enabling natural-language-driven speech synthesis. Ratings: license A, quality B, maintenance F. License: MIT.
- A Model Context Protocol server that provides text-to-speech for AI agents using Microsoft Edge's text-to-speech technology, supporting multiple voices, languages, and voice customization. Ratings: license A, quality B, maintenance C. License: MIT.
- Provides accurate meeting transcription with speaker diarization and multilingual support, letting users submit audio URLs, poll transcription status, retrieve transcripts, and generate summaries via MCP tools in their IDE. Ratings: license A, quality A, maintenance C. License: MIT.
- Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools covering Phoenix replicas, video generation, personas, lipsync, and text-to-speech. Ratings: license A, quality B, maintenance C. License: MIT.
- A server that lets Claude 3.7 and other AI agents access VOICEVOX-compatible speech synthesis engines (AivisSpeech, VOICEVOX, COEIROINK) through the Model Context Protocol. Ratings: license A, quality B, maintenance F. License: MIT.
- A Model Context Protocol server that integrates with AivisSpeech so AI assistants can convert text to natural-sounding Japanese speech with customizable voice parameters. Ratings: license A, quality B, maintenance C. License: Apache 2.0.
- An MCP server that gives LLMs access to the NijiVoice API for text-to-speech generation, including fetching available voice actors and checking credit balance. Ratings: license A, quality B, maintenance C. License: MIT.
- A Model Context Protocol server that lets AI models generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API. Ratings: license A, quality A, maintenance C. License: The Unlicense.
- A Windows-native MCP server that lets Claude Desktop transcribe audio files locally using whisper.cpp, with no internet connection required. Ratings: license A, quality A, maintenance A. License: Unlicense - libtelnet variant.
- Official MCP server for the Vocametrix voice analysis API. Gives AI assistants direct access to clinical voice metrics (AVQI, DSI, jitter/shimmer, CPP), pronunciation assessment, speech transcription, prosody similarity, and AI-powered therapy planning, through more than 40 endpoints for SLPs (speech-language pathologists), voice researchers, and healthtech developers. Ratings: license A, quality A, maintenance B. License: MIT.
- Enables interaction with ElevenLabs Text-to-Speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. Ratings: license A, quality A, maintenance not rated.
- An MCP server that lets LLMs generate spoken audio from text using OpenAI's Text-to-Speech API, supporting various voices, models, and audio formats. Ratings: license A, quality A, maintenance C. License: MIT.
- Bitcoin-powered AI tools via Lightning Network micropayments (L402): image generation, text generation, video, music, speech, 3D models, file conversion, and SMS, with no signup or API keys required. Ratings: license A, quality A, maintenance C. License: MIT.
- Enables natural-language-driven speech synthesis using Fish Audio's Text-to-Speech API, supporting multiple voices, streaming, and flexible configuration. Ratings: license A, quality B, maintenance C. License: MIT.
- Provides voice notifications using Grok's text-to-speech API to alert users when Claude Code completes tasks, with support for both local and remote server configurations. Ratings: license A, quality B, maintenance C. License: MIT.
- Provides transcript processing for Claude, featuring natural formatting, contextual repair, and summarization powered by Deep Thinking LLMs. Ratings: license A, quality B, maintenance C. License: MIT.
- A text-to-speech MCP server that lets AI assistants speak using the VOICEVOX engine, with support for multi-character conversations. It features queue management, low-latency streaming via FFplay, and cross-platform playback on Windows, macOS, and Linux. Ratings: license A, quality A, maintenance D. License: ISC.
- A cross-platform MCP server that lets Claude speak using Microsoft Edge TTS, with over 300 voices across 50+ languages. It requires no API keys and allows customization of speech rate, volume, and pitch. Ratings: license A, quality A, maintenance C. License: MIT.
- Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats. Ratings: license A, quality A, maintenance C. License: MIT.
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models, with support for multiple audio formats, file management, and parallel processing. Ratings: license A, quality A, maintenance D. License: MIT.
- An MCP server that makes AI agents speak a brief summary of every response aloud using TTS. Ratings: license A, quality A, maintenance C. License: GPL 3.0.
- A Node.js server that lets AI assistants use Bouyomi-chan's text-to-speech functionality through the Model Context Protocol (MCP), reading text aloud with adjustable parameters. Ratings: license A, quality B, maintenance C. License: MIT.
MCP Connectors
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
AI voice agents for SMB websites, built fully autonomously in 2–3 minutes. 23 MCP tools. EU, GDPR.
OCR, transcription, file extraction, and image generation for AI agents via MCP.
An MCP server that fetches video transcripts and subtitles, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, Vimeo, Facebook, Bilibili, VK, Dailymotion, and Reddit. Falls back to Whisper to transcribe the audio when subtitles are unavailable.
Give AI agents real phone numbers, messages, and voice calls via MCP.
Voice AI assistant builder for websites. Create, deploy, and analyze AI voice bots that understand natural speech, navigate pages, fill forms, and respond in 50+ languages. Includes knowledge base training, visitor intelligence, and conversation analytics.
AI-powered calorie tracking with photo recognition, barcode scanning, and voice logging
Access your Cosmonote audio notes, transcriptions, summaries, and action items.
Create and manage AI voice agents, real-time conversations, and analytics with eigi.ai
Free hosted API serving 10 professional AI voice clones powered by ElevenLabs. Browse, search, and get platform-ready configurations for voice integration across 29 platforms. Endpoints include voice listing, search by keyword/language/use-case, natural language recommendations with platform-specific configs, audio previews, and OpenAPI documentation. Zero authentication required, zero integration fee.
Read AI preliminary phone screens — your roles, candidate scores, transcripts, and analytics.
Give your AI a face, a voice, and a personality. 3D avatars with custom personas.
Voice-led, FSRS-scheduled flashcards from YouTube, PDFs, web, or text. Auto-graded quizzes.
Search recordings, summarize meetings, create clips, and automate workflows from your AI assistant.
Transcribe YouTube via Whisper. Summaries, chapters, semantic-search across your corpus.
YouTube video search with transcript extraction as first-class output.