Speech Processing
Voice interaction and speech processing capabilities. Covers speech-to-text conversion, audio commands, and voice generation.
MCP Servers

ElevenLabs MCP Server (official)
[license A · quality A · maintenance C] An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. 241,334 · MIT
ContextPulse (official)
[license A · quality A · maintenance D] Lets AI assistants understand what you're working on (current screen content, recent dictation, clipboard, and saved notes), running entirely on your own machine with nothing sent to the cloud. 361
- [license A · quality A · maintenance A] Local MCP voice coach that provides English pronunciation, grammar, and fluency feedback from microphone recordings. It supports free-form voice conversation, focused practice drills, phoneme-level feedback, prosody hints, and learner-profile guidance. 28 · MIT
- [license A · quality B · maintenance F] A server that enables Claude 3.7 and other AI agents to access VOICEVOX-compatible speech synthesis engines (AivisSpeech, VOICEVOX, COEIROINK) through the Model Context Protocol. 111 · MIT
- [license A · quality A · maintenance C] An MCP server that enables transcribing local audio files and Telegram voice messages using OpenAI's Whisper via local inference or cloud API. It supports multiple audio formats, automatic language detection, and optional word-level timestamps for AI-powered audio analysis. 5 · MIT
- [license A · quality A · maintenance C] MCP Server for automated conversational phone calls using Asterisk with Speech-to-Speech capabilities, allowing users to make phone conversations as easily as writing a prompt. 9986 · MIT
- [license A · quality A · maintenance C] Provides accurate meeting transcription with speaker diarization and multilingual support, allowing users to submit audio URLs, poll transcription status, get transcripts, and summarize via MCP tools in their IDE. 8 · MIT
- [license A · quality B · maintenance C] A Model Context Protocol server that integrates with AivisSpeech to enable AI assistants to convert text to natural-sounding Japanese speech with customizable voice parameters. 11018 · Apache 2.0
- [license A · quality B · maintenance C] Provides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs. 419 · MIT
- [license A · quality B · maintenance C] Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools across Phoenix replicas, video generation, personas, lipsync, and text-to-speech capabilities. 292 · MIT
- [license A · quality B · maintenance C] A Model Context Protocol server that provides text-to-speech functionality for AI agents using Microsoft Edge's text-to-speech technology, supporting multiple voices, languages, and voice customization. 27 · MIT
- [license A · quality A · maintenance C] Bitcoin-powered AI tools via Lightning Network micropayments (L402). Image generation, text generation, video, music, speech, 3D models, file conversion, and SMS, with no signup or API keys required. 4916 · MIT
- [license A · quality A · maintenance C] A Model Context Protocol server that enables AI models to generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API. 1528 · The Unlicense
- [license A · quality A · maintenance B] Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing. 852 · MIT
- [license A · quality A · maintenance D] Enables AI assistants to transcribe audio files from URLs or local paths using AssemblyAI's services, with support for speaker diarization, language detection, and asynchronous job management through a standardized MCP interface. 451 · MIT
- [license A · quality A · maintenance -] Enables interaction with ElevenLabs Text-to-Speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. 24
- [license A · quality B · maintenance F] An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis. 26110 · MIT
- [license A · quality A · maintenance -] First Voice AI MCP for AI Agents. 5
- [license A · quality A · maintenance C] An MCP server that enables LLMs to generate spoken audio from text using OpenAI's Text-to-Speech API, supporting various voices, models, and audio formats. 131 · MIT
- [license A · quality A · maintenance D] A text-to-speech MCP server that enables AI assistants to speak using the VOICEVOX engine with support for multi-character conversations. It features queue management, low-latency streaming via FFplay, and cross-platform playback across Windows, macOS, and Linux. 714914 · ISC
- [license A · quality A · maintenance C] A cross-platform MCP server that enables Claude to speak using Microsoft Edge TTS with support for over 300 voices across 50+ languages. It requires no API keys and allows for customization of speech rate, volume, and pitch. 31 · MIT
- [license A · quality A · maintenance A] A Windows-native MCP server that lets Claude Desktop transcribe audio files locally using whisper.cpp, with no internet connection required. 121,552 · MIT
- [license A · quality B · maintenance C] An MCP server that enables LLMs to access the NijiVoice API for text-to-speech generation, supporting features like fetching available voice actors and checking credit balance. 32 · MIT
- [license A · quality B · maintenance C] Provides voice notifications using Grok's text-to-speech API to alert users when Claude Code completes tasks, with support for both local and remote server configurations. 1 · MIT
- [license A · quality B · maintenance D] An MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options. 129 · MIT
- [license A · quality B · maintenance C] Enables downloading videos from platforms like YouTube and converting them to text using OpenAI Whisper and ffmpeg. It supports multiple output formats including TXT, JSON, SRT, and VTT for transcriptions. 20 · ISC
- [license A · quality B · maintenance C] A Node.js server that enables AI assistants to interact with Bouyomi-chan's text-to-speech functionality through Model Context Protocol (MCP), allowing for voice reading of text with adjustable parameters. 12 · MIT
- [license A · quality A · maintenance B] AI-powered speech tools by Brainiall: pronunciation assessment with phoneme-level feedback, speech-to-text with language detection, and text-to-speech with multiple voices. 4 · MIT
- [license A · quality A · maintenance C] Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats. 4421 · MIT
MCP Connectors
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
AI voice agents on SMB websites — fully autonomous build in 2–3 min. 23 MCP tools. EU, GDPR.
OCR, transcription, file extraction, and image generation for AI agents via MCP.
An MCP server that fetches video transcripts/subtitles, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, Vimeo, Facebook, Bilibili, VK, Dailymotion, Reddit. Whisper fallback — transcribes audio when subtitles are unavailable.
Give AI agents real phone numbers, messages, and voice calls via MCP.
AI-powered calorie tracking with photo recognition, barcode scanning, and voice logging
Access your Cosmonote audio notes, transcriptions, summaries, and action items.
Create and manage AI voice agents, real-time conversations, and analytics with eigi.ai
Free hosted API serving 10 professional AI voice clones powered by ElevenLabs. Browse, search, and get platform-ready configurations for voice integration across 29 platforms. Endpoints include voice listing, search by keyword/language/use-case, natural language recommendations with platform-specific configs, audio previews, and OpenAPI documentation. Zero authentication required, zero integration fee.
Give your AI a face, a voice, and a personality. 3D avatars with custom personas.
Voice-led, FSRS-scheduled flashcards from YouTube, PDFs, web, or text. Auto-graded quizzes.
Search recordings, summarize meetings, create clips, and automate workflows from your AI assistant.
YouTube video search with transcript extraction as first-class output.