Speech Processing
Voice interaction and speech processing capabilities, including speech-to-text conversion, audio commands, and voice generation.
MCP Servers
- An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. License: MIT.
- An MCP server that enables transcribing local audio files and Telegram voice messages using OpenAI's Whisper via local inference or cloud API. It supports multiple audio formats, automatic language detection, and optional word-level timestamps for AI-powered audio analysis. License: MIT.
- Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats. License: MIT.
- AI-powered speech tools by Brainiall: pronunciation assessment with phoneme-level feedback, speech-to-text with language detection, and text-to-speech with multiple voices. License: MIT.
- spekoai-mcp (official): A Model Context Protocol server that provides voice-AI session management and usage tracking tools for SpekoAI. It enables creating, retrieving, and ending voice sessions while monitoring usage through MCP clients such as Claude Desktop. License: MIT.
- An MCP server for automated conversational phone calls using Asterisk with speech-to-speech capabilities, letting users start a phone conversation as easily as writing a prompt. License: MIT.
- A Model Context Protocol server that enables AI models to generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API. License: The Unlicense.
- Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools across Phoenix replicas, video generation, personas, lipsync, and text-to-speech. License: MIT.
MCP Connectors
- An MCP server that fetches video transcripts and subtitles, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, Vimeo, Facebook, Bilibili, VK, Dailymotion, and Reddit. When subtitles are unavailable, it falls back to Whisper to transcribe the audio.
- Pronunciation scoring, speech-to-text, and text-to-speech for language learning.
- AI-powered calorie tracking with photo recognition, barcode scanning, and voice logging.
- Create and manage AI voice agents, real-time conversations, and analytics with eigi.ai.
- A free hosted API serving 10 professional AI voice clones powered by ElevenLabs. Browse, search, and get platform-ready configurations for voice integration across 29 platforms. Endpoints include voice listing, search by keyword, language, or use case, natural-language recommendations with platform-specific configurations, audio previews, and OpenAPI documentation. No authentication is required and there is no integration fee.