MCP Servers for Speech Processing

Servers for voice interaction and speech processing, including speech-to-text conversion, audio commands, and voice generation.


  • A
    security
    A
    license
    A
    quality
    Enables integration of DeepSeek's language models with MCP-compatible applications, offering features like chat completion, custom model selection, and parameter control for enhancing language-based interactions.
    2
    135
    122
    JavaScript
    MIT License
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables retrieval of transcripts from YouTube videos. This server provides direct access to video captions and subtitles through a simple interface.
    1
    723
    72
    JavaScript
    MIT License
  • A
    security
    A
    license
    A
    quality
    This server enables AI models to send SMS messages and initiate Text-to-Speech calls programmatically using ClickSend's API with built-in rate limiting and input validation.
    2
    1
    JavaScript
    MIT License
  • A
    security
    A
    license
    A
    quality
    An MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options.
    1
    1
    JavaScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    An MCP server that provides audio transcription capabilities through OpenAI's API, allowing users to transcribe audio files with options for language specification and saving results to files.
    1
    1
    JavaScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    Enables text-to-speech functionality on macOS using the say command, offering extensive control over speech parameters like voice, rate, volume, and pitch for a customizable auditory experience.
    2
    7
    11
    JavaScript
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    Provides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs.
    4
    6
    TypeScript
    MIT License
  • A
    security
    F
    license
    A
    quality
    Enables users to control the cursor in Figma through verbal commands using an AI agent, streamlining the design process with a new interaction method.
    19
    1,239
    1
    JavaScript
  • A
    security
    F
    license
    A
    quality
    Facilitates direct speech generation using Claude for multiple languages and emotions, integrating with a Zonos TTS setup via the Model Context Protocol.
    1
    5
    TypeScript
  • A
    security
    F
    license
    A
    quality
    A FastMCP tool that enables control of Spotify through natural language commands in Cursor Composer, allowing users to manage playback, search for content, and interact with playlists.
    21
    1
    Python
  • -
    security
    A
    license
    -
    quality
    Enables seamless integration between Ollama's local LLM models and MCP-compatible applications, supporting model management and chat interactions.
    50
    13
    TypeScript
    MIT License
  • -
    security
    A
    license
    -
    quality
    Records audio from microphone and transcribes it using OpenAI's Whisper model, functioning as both a standalone MCP server and a Goose AI custom extension.
    3
    Python
    MIT License
  • -
    security
    A
    license
    -
    quality
    An MCP server implementation that integrates with the Minimax API to provide AI-powered image generation and text-to-speech functionality in editors like Windsurf and Cursor.
    38
    JavaScript
    MIT License
    • Apple
  • -
    security
    A
    license
    -
    quality
    Enables recording audio from a microphone and transcribing it using OpenAI's Whisper model. Works as both a standalone MCP server and a Goose AI agent extension.
    3
    Python
    MIT License
  • -
    security
    A
    license
    -
    quality
    Connects Claude Desktop to Hugging Face Spaces with minimal setup, enabling capabilities like image generation, vision tasks, text-to-speech, and chat with AI models.
    338
    MIT License
    • Apple
  • -
    security
    A
    license
    -
    quality
    Use Hugging Face Spaces directly from Claude, including open-source image generation, chat, vision tasks, and more. Supports image, audio, and text uploads and downloads.
    2
    338
    152
    TypeScript
    MIT License
    • Apple
  • -
    security
    A
    license
    -
    quality
    Exposes all Home Assistant voice intents through a Model Context Protocol server, allowing home control.
    30
    Python
    Apache 2.0
  • -
    security
    F
    license
    -
    quality
    A specialized Model Context Protocol (MCP) server that enables AI-powered interview roleplay scenarios for practice with realistic conversational feedback.
    6
    1
    TypeScript
  • -
    security
    F
    license
    -
    quality
    A Model Context Protocol integration for Zonos TTS that allows Claude to generate and speak text with different emotions and languages directly through audio playback.
    5
    TypeScript
    • Linux
  • -
    security
    F
    license
    -
    quality
    A multi-agent human-computer interaction system that enables natural interaction through integrated visual recognition, speech recognition, and speech synthesis capabilities.
    1
    Python
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A collection of Model Context Protocol servers that provide file searching functionality and speech-to-text transcription using Whisper, allowing AI assistants to find files and convert audio to text.
    JavaScript
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A Goose MCP extension that provides voice interaction capability with modern audio visualization, allowing users to speak to their AI assistant rather than typing.
    12
    Python
    • Apple
    • Linux
  • -
    security
    F
    license
    -
    quality
    Integrates ElevenLabs Text-to-Speech capabilities with Cursor through the Model Context Protocol, allowing users to convert text to speech with selectable voices within the Cursor editor.
    1
    Python
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A server providing text-to-speech and speech-to-text functionalities using Windows' native speech services without external dependencies.
    3
    JavaScript
  • -
    security
    F
    license
    -
    quality
    Provides text-to-speech capabilities through the Model Context Protocol, allowing applications to easily integrate speech synthesis with customizable voices, adjustable speech speed, and cross-platform audio playback support.
    2
    Python
  • -
    security
    F
    license
    -
    quality
    Enables users to manage Gmail accounts using AI agent-assisted operations via the Model Context Protocol, supporting email search, reading, deletion, and sending with a voice-powered interface.
    2
    5
    TypeScript
  • -
    security
    F
    license
    -
    quality
    A Goose MCP extension providing voice interaction with modern audio visualization, allowing users to communicate with Goose through speech rather than text.
    12
    Python
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A specialized Model Context Protocol server that enables AI-powered interview roleplay scenarios for practice with interactive voice interface and real-time feedback.
    6
    1
    TypeScript
  • -
    security
    F
    license
    -
    quality
    A Model Context Protocol server that provides text-to-speech and speech-to-text capabilities using Windows' built-in speech services, requiring no external APIs.
    3
    JavaScript