MCP Servers for Speech Processing

Servers that provide voice interaction and speech processing capabilities, including speech-to-text conversion, voice commands, and voice generation.

  • security: A, license: A, quality: A
    An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
    19
    633
    Python
    MIT License
    • Apple
  • security: A, license: A, quality: A
    A Node.js server that enables AI assistants to interact with Bouyomi-chan's text-to-speech functionality through Model Context Protocol (MCP), allowing for voice reading of text with adjustable parameters.
    1
    1
    JavaScript
    MIT License
    • Apple
  • security: A, license: A, quality: A
    A Model Context Protocol server that enables AI models to generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API.
    1
    176
    4
    JavaScript
    The Unlicense
    • Apple
    • Linux
  • security: A, license: A, quality: A
    An MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options.
    1
    2
    JavaScript
    MIT License
    • Linux
    • Apple
  • security: A, license: A, quality: A
    Provides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs.
    4
    8
    TypeScript
    MIT License
  • security: A, license: F, quality: A
    A Python-based server that provides access to Whissle API endpoints for speech-to-text, diarization, translation, and text summarization.
    5
    Python
    • Linux
    • Apple
  • security: -, license: A, quality: -
    A Model Context Protocol server that enables fast and free lipsync video creation for a wide range of digital avatars, supporting both audio and text inputs to generate synchronized lip movements.
    2
    Python
    MIT License
    • Linux
    • Apple
  • security: A, license: F, quality: A
    Facilitates direct speech generation using Claude for multiple languages and emotions, integrating with a Zonos TTS setup via the Model Context Protocol.
    1
    9
    TypeScript
    • Linux
  • security: -, license: A, quality: -
    Connects Claude Desktop to Hugging Face Spaces with minimal setup, enabling capabilities like image generation, vision tasks, text-to-speech, and chat with AI models.
    188
    MIT License
    • Apple
  • security: -, license: A, quality: -
    Enables Claude and other AI assistants to interact with your computer's audio system, allowing for recording from microphones and playing audio through speakers.
    1
    Python
    MIT License
    • Linux
    • Apple
  • security: -, license: A, quality: -
    Enables recording audio from a microphone and transcribing it using OpenAI's Whisper model. Works as both a standalone MCP server and a Goose AI agent extension.
    4
    Python
    MIT License
  • security: -, license: A, quality: -
    A server that enables Claude 3.7 and other AI agents to access VOICEVOX-compatible speech synthesis engines (AivisSpeech, VOICEVOX, COEIROINK) through the Model Context Protocol.
    2
    TypeScript
    MIT License
    • Linux
  • security: -, license: A, quality: -
    A Model Context Protocol server that enables AI assistants like Claude to initiate and manage real-time voice calls using Twilio and OpenAI's voice models.
    14
    TypeScript
    MIT License
    • Apple
  • security: -, license: -, quality: -
    Official Model Context Protocol server that enables interaction with powerful Speech-to-Text and Audio Intelligence APIs, allowing clients like Claude Desktop to transcribe audio, analyze speech, translate content, and more.
    Python
    MIT License
  • security: -, license: A, quality: -
    Provides voice recognition and text extraction capabilities with support for both stdio and MCP modes, processing audio files or base64-encoded data and returning structured results with language, emotion, and speaker information.
    Python
    MIT License
  • security: -, license: F, quality: -
    A multi-agent human-computer interaction system that enables natural interaction through integrated visual recognition, speech recognition, and speech synthesis capabilities.
    1
    Python
    • Linux
    • Apple
  • security: -, license: -, quality: -
    Official ElevenLabs Model Context Protocol server that enables AI assistants like Claude to interact with Text to Speech and audio processing APIs, allowing them to generate speech, clone voices, transcribe audio, and create soundscapes.
    Python
    MIT License
  • security: -, license: F, quality: -
    Enables users to manage Gmail accounts through AI agent-assisted operations via the Model Context Protocol, supporting email search, reading, deletion, and sending with a voice-powered interface.
    2
    5
    TypeScript
  • security: -, license: F, quality: -
    An MCP server that downloads videos or extracts audio from platforms such as YouTube, Bilibili, and TikTok, then transcribes the audio to text using OpenAI's Whisper model.
    Python
    • Linux
    • Apple
  • security: -, license: F, quality: -
    Provides text-to-speech capabilities through the Model Context Protocol, allowing applications to easily integrate speech synthesis with customizable voices, adjustable speech speed, and cross-platform audio playback support.
    2
    Python
  • security: -, license: -, quality: -
    Votars is the world's smartest multilingual meeting assistant, designed for voice recording, transcription, and advanced AI processing. It features real-time translation, intelligent error correction, AI summarization, smart content generation, and AI discussions. The Votars app is available on Web,
    5
    Go
    • Linux
    • Apple
  • security: -, license: F, quality: -
    A Model Context Protocol server that provides text-to-speech functionality for AI agents using Microsoft Edge's text-to-speech technology, supporting multiple voices, languages, and voice customization.
    1
    Python
  • security: -, license: F, quality: -
    A Model Context Protocol server implementation that enables AI assistants to interact with RetellAI's voice services for managing calls, agents, phone numbers, and voice options.
    1
    TypeScript
  • security: -, license: F, quality: -
    A specialized Model Context Protocol (MCP) server that enables AI-powered interview roleplay scenarios for practice with realistic conversational feedback.
    6
    3
    TypeScript
  • security: -, license: F, quality: -
    A Goose MCP extension providing voice interaction with modern audio visualization, allowing users to communicate with Goose through speech rather than text.
    26
    Python
    • Linux
    • Apple
  • security: -, license: F, quality: -
    A Model Context Protocol server that enables AI assistants to utilize AivisSpeech Engine's high-quality voice synthesis capabilities through a standardized API interface.
    TypeScript
  • security: -, license: -, quality: -
    An MCP server that enables LLMs to generate spoken audio from text using OpenAI's Text-to-Speech API, supporting various voices, models, and audio formats.
    1
    JavaScript
    MIT License
  • security: -, license: -, quality: -
    An MCP server that enables LLMs to access the NijiVoice API for text-to-speech generation, supporting features like fetching available voice actors and checking credit balance.
    1
    Python
    MIT License
  • security: -, license: F, quality: -
    A server providing text-to-speech and speech-to-text functionalities using Windows' native speech services without external dependencies.
    4
    JavaScript