Speech Processing
Voice interaction and speech processing capabilities, including speech-to-text conversion, audio commands, and voice generation.
MCP Servers

ContextPulse (official)
Lets AI assistants understand what you're working on (current screen content, recent dictation, clipboard, and saved notes) while running entirely on your own machine, with nothing sent to the cloud. Ratings: license A, quality A, maintenance F. License: AGPL 3.0.
- Local MCP voice coach that provides English pronunciation, grammar, and fluency feedback from microphone recordings. It supports free-form voice conversation, focused practice drills, phoneme-level feedback, prosody hints, and learner-profile guidance. Ratings: license A, quality A, maintenance A. License: MIT.

ElevenLabs MCP Server (official)
An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. Ratings: license A, quality A, maintenance C. License: MIT.
@vocea.app/mcp-server (official)
Enables AI agents to generate speech, transcribe audio, and manage voices via the Vocea API. Ratings: license A, quality A, maintenance C. License: MIT.
- An MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options. Ratings: license A, quality B, maintenance C. License: MIT.
- A server that enables Claude 3.7 and other AI agents to access VOICEVOX-compatible speech synthesis engines (AivisSpeech, VOICEVOX, COEIROINK) through the Model Context Protocol. Ratings: license A, quality B, maintenance F. License: MIT.
- Official MCP server for the Vocametrix voice analysis API. Gives AI assistants direct access to clinical voice metrics (AVQI, DSI, jitter/shimmer, CPP), pronunciation assessment, speech transcription, prosody similarity, and AI-powered therapy planning. More than 40 endpoints for SLPs, voice researchers, and healthtech developers. Ratings: license A, quality A, maintenance A. License: MIT.
- Enables AI assistants to transcribe audio files from URLs or local paths using AssemblyAI's services, with support for speaker diarization, language detection, and asynchronous job management through a standardized MCP interface. Ratings: license A, quality A, maintenance D. License: MIT.
- An MCP server that enables LLMs to access the NijiVoice API for text-to-speech generation, supporting features like fetching available voice actors and checking credit balance. Ratings: license A, quality B, maintenance C. License: MIT.
- An MCP server that enables LLMs to generate spoken audio from text using OpenAI's Text-to-Speech API, supporting various voices, models, and audio formats. Ratings: license A, quality A, maintenance C. License: MIT.
- AI-powered speech tools by Brainiall: pronunciation assessment with phoneme-level feedback, speech-to-text with language detection, and text-to-speech with multiple voices. Ratings: license A, quality A, maintenance B. License: MIT.
- A Model Context Protocol server that provides text-to-speech functionality for AI agents using Microsoft Edge's text-to-speech technology, supporting multiple voices, languages, and voice customization. Ratings: license A, quality B, maintenance C. License: MIT.
- An MCP (Model Context Protocol) server that integrates Fish Audio's Text-to-Speech API with LLMs like Claude, enabling natural language-driven speech synthesis. Ratings: license A, quality B, maintenance F. License: MIT.
- A Windows-native MCP server that lets Claude Desktop transcribe audio files locally using whisper.cpp, with no internet connection required. Ratings: license A, quality A, maintenance A. License: Unlicense - libtelnet variant.
- A Node.js server that enables AI assistants to interact with Bouyomi-chan's text-to-speech functionality through Model Context Protocol (MCP), allowing for voice reading of text with adjustable parameters. Ratings: license A, quality B, maintenance C. License: MIT.
- Provides voice notifications using Grok's text-to-speech API to alert users when Claude Code completes tasks, with support for both local and remote server configurations. Ratings: license A, quality B, maintenance C. License: MIT.
- A Model Context Protocol server that enables AI models to generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API. Ratings: license A, quality A, maintenance C. License: The Unlicense.
- Enables interaction with ElevenLabs Text-to-Speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. Ratings: license A, quality A, maintenance unrated.
- Provides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs. Ratings: license A, quality B, maintenance C. License: MIT.
- First Voice AI MCP for AI Agents. Ratings: license A, quality A, maintenance C. License: MIT.
- Enables downloading videos from platforms like YouTube and converting them to text using OpenAI Whisper and ffmpeg. It supports multiple output formats including TXT, JSON, SRT, and VTT for transcriptions. Ratings: license A, quality B, maintenance C. License: ISC.
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing. Ratings: license A, quality A, maintenance B. License: MIT.
- A text-to-speech MCP server that enables AI assistants to speak using the VOICEVOX engine with support for multi-character conversations. It features queue management, low-latency streaming via FFplay, and cross-platform playback across Windows, macOS, and Linux. Ratings: license A, quality A, maintenance D. License: ISC.
- A cross-platform MCP server that enables Claude to speak using Microsoft Edge TTS with support for over 300 voices across 50+ languages. It requires no API keys and allows for customization of speech rate, volume, and pitch. Ratings: license A, quality A, maintenance C. License: MIT.
- Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools across Phoenix replicas, video generation, personas, lipsync, and text-to-speech capabilities. Ratings: license A, quality B, maintenance D. License: MIT.
- Provides accurate meeting transcription with speaker diarization and multilingual support, allowing users to submit audio URLs, poll transcription status, get transcripts, and summarize via MCP tools in their IDE. Ratings: license A, quality A, maintenance C. License: MIT.
- Bitcoin-powered AI tools via Lightning Network micropayments (L402). Image generation, text generation, video, music, speech, 3D models, file conversion, and SMS, with no signup or API keys required. Ratings: license A, quality A, maintenance C. License: MIT.
- A Model Context Protocol server that integrates with AivisSpeech to enable AI assistants to convert text to natural-sounding Japanese speech with customizable voice parameters. Ratings: license A, quality B, maintenance C. License: Apache 2.0.
- An MCP server that enables transcribing local audio files and Telegram voice messages using OpenAI's Whisper via local inference or cloud API. It supports multiple audio formats, automatic language detection, and optional word-level timestamps for AI-powered audio analysis. Ratings: license A, quality A, maintenance C. License: MIT.
MCP Connectors
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
AI voice agents on SMB websites — fully autonomous build in 2–3 min. 23 MCP tools. EU, GDPR.
OCR, transcription, file extraction, and image generation for AI agents via MCP.
An MCP server that fetches video transcripts/subtitles, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, Vimeo, Facebook, Bilibili, VK, Dailymotion, Reddit. Whisper fallback — transcribes audio when subtitles are unavailable.
Give AI agents real phone numbers, messages, and voice calls via MCP.
AI-powered calorie tracking with photo recognition, barcode scanning, and voice logging
Access your Cosmonote audio notes, transcriptions, summaries, and action items.
Create and manage AI voice agents, real-time conversations, and analytics with eigi.ai
Free hosted API serving 10 professional AI voice clones powered by ElevenLabs. Browse, search, and get platform-ready configurations for voice integration across 29 platforms. Endpoints include voice listing, search by keyword/language/use-case, natural language recommendations with platform-specific configs, audio previews, and OpenAPI documentation. Zero authentication required, zero integration fee.
Give your AI a face, a voice, and a personality. 3D avatars with custom personas.
Voice-led, FSRS-scheduled flashcards from YouTube, PDFs, web, or text. Auto-graded quizzes.
Search recordings, summarize meetings, create clips, and automate workflows from your AI assistant.
YouTube video search with transcript extraction as first-class output.