Speech Processing
Voice interaction and speech processing capabilities, including speech-to-text conversion, audio commands, and voice generation.
MCP Servers
- Enables transcription of audio and video files using the mocoVoice API, allowing users to start transcription jobs and retrieve results directly from Claude Desktop. Scores: license A, quality A, maintenance B. Licensed under MIT.
@vocea.app/mcp-server (official)
Enables AI agents to generate speech, transcribe audio, and manage voices via the Vocea API. Scores: license A, quality A, maintenance C. Licensed under MIT.
- Provides speech recognition (STT) and synthesis (TTS) tools via the Sber SaluteSpeech API, enabling audio transcription and voice generation through natural language. Scores: license A, quality A, maintenance C. Licensed under MIT.

ContextPulse (official)
Lets AI assistants understand what you're working on (current screen content, recent dictation, clipboard, and saved notes), running entirely on your own machine with nothing sent to the cloud. Scores: license A, quality A, maintenance D. Licensed under AGPL 3.0.
ElevenLabs MCP Server (official)
An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' text-to-speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. Scores: license A, quality A, maintenance B. Licensed under MIT.
supertone-mcp (official)
MCP server for the Supertone TTS API. Generate natural speech, browse and preview the voice catalog, predict synthesis cost, and create cloned voices directly from Claude Desktop, Cursor, or any MCP-compatible client. Supports Korean, English, Japanese, and 20+ other languages, with speed, pitch, and emotion-style control. Scores: license A, quality A, maintenance B. Licensed under MIT.
Anam MCP Server (official)
Lets any MCP client manage AI personas, avatars, voices, and sessions for integration with Anam AI. Scores: license A, quality B, maintenance F. Licensed under MIT.
- Lets your AI agent call your phone and talk to you: MCP servers for live, interruptible voice calls plus tiered alerts, built from free, self-hosted components (pjsua2, whisper.cpp, and Linphone). No paid telephony and no extra API key. Scores: license A, quality A, maintenance B. Licensed under Apache 2.0.
- Enables AI assistants to transcribe audio files from URLs or local paths using AssemblyAI's services, with support for speaker diarization, language detection, and asynchronous job management through a standardized MCP interface. Scores: license A, quality A, maintenance F. Licensed under MIT.
- Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats. Scores: license A, quality A, maintenance D. Licensed under MIT.
- Provides accurate meeting transcription with speaker diarization and multilingual support, letting users submit audio URLs, poll transcription status, retrieve transcripts, and generate summaries via MCP tools in their IDE. Scores: license A, quality A, maintenance C. Licensed under MIT.
- A Model Context Protocol server for FlowSpeech text-to-speech. It lets MCP-compatible clients generate human-like audio with context-aware emotion control, pause control, multi-speaker dialogue, and 30+ available voices. Scores: license A, quality A, maintenance C. Licensed under MIT.
- Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools covering Phoenix replicas, video generation, personas, lipsync, and text-to-speech. Scores: license A, quality B, maintenance D. Licensed under MIT.
- A Node.js server that enables AI assistants to use Bouyomi-chan's text-to-speech functionality through the Model Context Protocol (MCP), reading text aloud with adjustable parameters. Scores: license A, quality B, maintenance D. Licensed under MIT.
- A Model Context Protocol server that integrates with AivisSpeech so AI assistants can convert text to natural-sounding Japanese speech with customizable voice parameters. Scores: license A, quality B, maintenance C. Licensed under Apache 2.0.
- Enables natural-language-driven speech synthesis using Fish Audio's Text-to-Speech API, supporting multiple voices, streaming, and flexible configuration. Scores: license A, quality B, maintenance C. Licensed under MIT.
- Provides voice notifications using Grok's text-to-speech API to alert users when Claude Code completes tasks, with support for both local and remote server configurations. Scores: license A, quality B, maintenance D. Licensed under MIT.
- MCP server for automated conversational phone calls using Asterisk with speech-to-speech capabilities, making a phone conversation as easy as writing a prompt. Scores: license A, quality A, maintenance D. Licensed under MIT.
- Analyzes speech audio to detect emotions, urgency, and sarcasm using prosodic features. Scores: license A, quality A, maintenance D. Licensed under MIT.
- Provides VOICEVOX text-to-speech as an MCP tool. Requires a VOICEVOX engine running on localhost. Scores: license A, quality B, maintenance D. Licensed under Apache 2.0.
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models, with support for multiple audio formats, file management, and parallel processing. Scores: license A, quality A, maintenance D. Licensed under MIT.
- An MCP server that makes AI agents speak a brief summary of every response aloud using TTS. Scores: license A, quality A, maintenance C. Licensed under GPL 3.0.
- Provides intelligent transcript processing for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs. Scores: license A, quality B, maintenance C. Licensed under MIT.
- An MCP server for transcribing local audio files and Telegram voice messages with OpenAI's Whisper, via local inference or the cloud API. It supports multiple audio formats, automatic language detection, and optional word-level timestamps for AI-powered audio analysis. Scores: license A, quality A, maintenance B. Licensed under MIT.
- Extracts and formats Bilibili video content into structured text, optimized for LLM processing and analysis. Scores: license A, quality A, maintenance D. Licensed under MIT.
- AI-powered speech tools by Brainiall: pronunciation assessment with phoneme-level feedback, speech-to-text with language detection, and text-to-speech with multiple voices. Scores: license A, quality A, maintenance C. Licensed under MIT.
- Enables interaction with ElevenLabs' text-to-speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. Scores: license A, quality A, maintenance not rated.
- An MCP (Model Context Protocol) server that integrates Fish Audio's Text-to-Speech API with LLMs like Claude, enabling natural-language-driven speech synthesis. Scores: license A, quality B, maintenance F. Licensed under MIT.
- Extracts and formats Bilibili video content into structured text for LLM processing and analysis. Scores: license A, quality A, maintenance D. Licensed under MIT.
- A cross-platform MCP server that enables Claude to speak using Microsoft Edge TTS, with over 300 voices across 50+ languages. It requires no API keys and supports customizing speech rate, volume, and pitch. Scores: license A, quality A, maintenance D. Licensed under MIT.
MCP Connectors
AI voice agents: assistants, calls, campaigns, leads, knowledge bases, WhatsApp, SMS & SIP trunks.
One key, 100+ models: chat with any LLM and generate video, images, and speech. Free trial at 370.ai.
Human-input bridge for AI agents with voice-first answer links, MCP tools, and HTTP APIs.
OCR, transcription, file extraction, and image generation for AI agents via MCP.
AI voice agents for SMB websites, built fully autonomously in 2–3 minutes. 23 MCP tools. EU, GDPR.
Pronunciation scoring, speech-to-text, and text-to-speech for language learning.
Manage Speko voice-AI agents, sessions, calls, phone numbers, knowledge bases, evals, and docs.
Noogat is a voice-first note-taking app for iOS and web with an MCP server for AI coding agents. Capture ideas hands-free via Siri, retrieve them inside Claude Code, Claude Desktop, or Cursor. Features: semantic search, AI auto-tagging, search by time, related notes. Pro subscription required for MCP access.
Voice notes that organize themselves. Capture by Siri, AI auto-tags, semantic search retrieves.
Search recordings, summarize meetings, create clips, and automate workflows from your AI assistant.
Official MCP server for OmniDimension. Drive voice agents, dispatch calls, and run bulk campaigns.
Podcast intelligence for agents: transcripts, clips, speaker diarization, mention tracking.
Voice AI assistant builder for websites. Create, deploy, and analyze AI voice bots that understand natural speech, navigate pages, fill forms, and respond in 50+ languages. Includes knowledge base training, visitor intelligence, and conversation analytics.
An MCP server that fetches video transcripts and subtitles, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, Vimeo, Facebook, Bilibili, VK, Dailymotion, and Reddit. Falls back to Whisper to transcribe the audio when subtitles are unavailable.
Read AI preliminary phone screens — your roles, candidate scores, transcripts, and analytics.
Voice-led, FSRS-scheduled flashcards from YouTube, PDFs, web, or text. Auto-graded quizzes.
Transcribe YouTube via Whisper. Summaries, chapters, and semantic search across your corpus.
Give AI agents real phone numbers, messages, and voice calls via MCP.
Access your Cosmonote audio notes, transcriptions, summaries, and action items.
YouTube video search with transcript extraction as first-class output.