MCP Servers for Speech Processing

Servers for voice interaction and speech processing, covering speech-to-text conversion, audio commands, and voice generation.

  • A security, A license, A quality
    JavaScript implementation of MiniMax MCP that enables interaction with MiniMax AI services for image generation, video generation, text-to-speech, and voice cloning through MCP-compatible clients.
    6
    113
    17
    TypeScript
    MIT License
  • A security, A license, A quality
    An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
    19
    543
    Python
    MIT License
    • Apple
  • A security, A license, A quality
    Provides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs.
    4
    8
    TypeScript
    MIT License
  • A security, A license, A quality
    Enables integration of DeepSeek's language models with MCP-compatible applications, offering features like chat completion, custom model selection, and parameter control for enhancing language-based interactions.
    2
    142
    165
    JavaScript
    MIT License
  • A security, A license, A quality
    A Model Context Protocol server that enables AI assistants to generate images, text, and audio through the Pollinations APIs without requiring authentication.
    7
    325
    4
    JavaScript
    MIT License
    • Linux
    • Apple
  • A security, A license, A quality
    An MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options.
    1
    2
    JavaScript
    MIT License
    • Linux
    • Apple
  • A security, A license, A quality
    An MCP server implementation that integrates with the MiniMax API to provide AI-powered image generation and text-to-speech functionality in editors like Windsurf and Cursor.
    2
    192
    1
    JavaScript
    MIT License
    • Apple
  • A security, A license, A quality
    A Model Context Protocol server that enables retrieval of transcripts from YouTube videos. This server provides direct access to video captions and subtitles through a simple interface.
    1
    723
    72
    JavaScript
    MIT License
  • A security, A license, A quality
    Enables text-to-speech functionality on macOS using the say command, offering extensive control over speech parameters like voice, rate, volume, and pitch for a customizable auditory experience.
    2
    7
    11
    JavaScript
    MIT License
    • Apple
  • A security, A license, A quality
    This server enables AI models to send SMS messages and initiate Text-to-Speech calls programmatically using ClickSend's API, with built-in rate limiting and input validation.
    2
    1
    JavaScript
    MIT License
  • A security, F license, A quality
    Enables users to control the cursor in Figma through verbal commands handled by an AI agent, streamlining the design process with a new interaction method.
    19
    5,006
    1
    JavaScript
  • A security, F license, A quality
    Facilitates direct speech generation using Claude for multiple languages and emotions, integrating with a Zonos TTS setup via the Model Context Protocol.
    1
    6
    TypeScript
  • A security, F license, A quality
    A FastMCP tool that enables control of Spotify through natural language commands in Cursor Composer, allowing users to manage playback, search for content, and interact with playlists.
    21
    1
    Python
  • - security, A license, - quality
    Exposes all Home Assistant voice intents through a Model Context Protocol server, allowing home control.
    30
    Python
    Apache 2.0
  • - security, A license, - quality
    Connects Claude Desktop to Hugging Face Spaces with minimal setup, enabling capabilities like image generation, vision tasks, text-to-speech, and chat with AI models.
    184
    MIT License
    • Apple
  • - security, A license, - quality
    Enables seamless integration between Ollama's local LLM models and MCP-compatible applications, supporting model management and chat interactions.
    50
    13
    TypeScript
    MIT License
  • - security, A license, - quality
    A Model Context Protocol server that enables AI assistants like Claude to initiate and manage real-time voice calls using Twilio and OpenAI's voice models.
    14
    TypeScript
    MIT License
    • Apple
  • - security, A license, - quality
    Use HuggingFace Spaces directly from Claude for open-source image generation, chat, vision tasks, and more. Supports image, audio, and text uploads and downloads.
    2
    184
    219
    TypeScript
    MIT License
    • Apple
  • - security, A license, - quality
    Enables recording audio from a microphone and transcribing it using OpenAI's Whisper model. Works as both a standalone MCP server and a Goose AI agent extension.
    4
    Python
    MIT License
  • - security, F license, - quality
    Provides text-to-speech capabilities through the Model Context Protocol, allowing applications to easily integrate speech synthesis with customizable voices, adjustable speech speed, and cross-platform audio playback support.
    2
    Python
  • - security, F license, - quality
    A multi-agent human-computer interaction system that enables natural interaction through integrated visual recognition, speech recognition, and speech synthesis capabilities.
    1
    Python
    • Linux
    • Apple
  • - security, F license, - quality
    A Goose MCP extension providing voice interaction with modern audio visualization, allowing users to communicate with Goose through speech rather than text.
    26
    Python
    • Linux
    • Apple
  • - security, F license, - quality
    A server providing text-to-speech and speech-to-text functionalities using Windows' native speech services without external dependencies.
    4
    JavaScript
  • - security, F license, - quality
    A specialized Model Context Protocol (MCP) server that enables AI-powered interview roleplay scenarios for practice with realistic conversational feedback.
    6
    3
    TypeScript
  • - security, F license, - quality
    Integrates ElevenLabs Text-to-Speech capabilities with Cursor through the Model Context Protocol, allowing users to convert text to speech with selectable voices within the Cursor editor.
    1
    Python
    • Linux
    • Apple
  • - security, F license, - quality
    Enables users to manage Gmail accounts using AI agent-assisted operations via the Model Context Protocol, supporting email search, reading, deletion, and sending with a voice-powered interface.
    2
    5
    TypeScript