An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Create synchronized audio from video content or generate audio from text descriptions with configurable parameters.
Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats.
Enables LLMs to analyze music (genre, mood, tempo, key), separate audio stems, detect AI-generated music, and measure loudness using IRCAM Amplify's audio processing APIs.
An MCP server that enables users to control and edit REAPER projects through a Python-based interface and a Lua bridge. It supports managing tracks, controlling transport, manipulating MIDI and audio items, and adjusting FX parameters within the digital audio workstation.
An MCP server that enables transcribing local audio files and Telegram voice messages using OpenAI's Whisper via local inference or cloud API. It supports multiple audio formats, automatic language detection, and optional word-level timestamps for AI-powered audio analysis.
Provides high-quality text-to-speech synthesis with 10 natural voices, emotion control, and dynamic pacing for professional applications requiring expressive speech output.
Enables text-to-speech functionality on macOS using the say command, offering control over speech parameters such as voice, rate, volume, and pitch.
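The underlying invocation can be sketched as follows. This is an illustration, not the server's actual code; voice and rate use the documented `-v`/`-r` flags, while volume is passed as an embedded `[[volm ...]]` speech command, an assumption worth verifying against the `say(1)` man page on your macOS version.

```python
import subprocess


def build_say_command(text: str, voice: str = "Samantha",
                      rate_wpm: int = 180, volume: float = 0.8) -> list[str]:
    """Build an argv list for the macOS `say` command.

    `-v` selects the voice, `-r` sets the speaking rate in words per
    minute. Volume has no dedicated flag, so it is prepended as an
    embedded [[volm ...]] speech command (hypothetical usage).
    """
    return ["say", "-v", voice, "-r", str(rate_wpm),
            f"[[volm {volume}]] {text}"]


cmd = build_say_command("Build finished.")
# On macOS, this could then be executed with:
#   subprocess.run(cmd, check=True)
```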
Enables AI-powered podcast generation with single or dual speakers, FlowSpeech audio creation from text/URLs, speaker voice library management, and subscription tracking for ListenHub Pro users.
Enables playback control of local audio files through a virtual audio output device, supporting play, stop, and status queries with configurable root directory and path safety enforcement.
A Model Context Protocol server that enables AI agents to create fully mixed and mastered tracks in REAPER DAW, supporting project management, MIDI composition, audio recording, and mixing automation.
Enables AI-powered music generation through natural language commands, supporting both inspiration mode (AI-generated lyrics and style) and custom mode (user-provided lyrics and parameters) to create songs with direct download links.
Enables video and audio processing through FFmpeg, supporting format conversion, compression, trimming, audio extraction, frame extraction, video merging, and subtitle burning through natural language commands.
Connects Ableton Live to Claude AI through the Model Context Protocol, enabling AI-assisted music production by allowing Claude to directly interact with and control Ableton Live sessions.
Provides powerful video and audio editing capabilities through FFmpeg, enabling AI assistants to perform professional-grade operations including format conversion, trimming, overlays, transitions, and advanced audio processing.
Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs.
Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools across Phoenix replicas, video generation, personas, lipsync, and text-to-speech capabilities.
Enables searching and downloading audio samples from Freesound using keywords, filters, and sound IDs. It provides detailed sound metadata including duration, license information, and preview URLs.
An MCP tool that parses and manipulates MIDI files through natural language commands, supporting operations such as reading file information, analyzing and modifying tracks, adding notes, and setting tempo.
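The "reading file information" step comes down to decoding the fixed 14-byte header chunk of a Standard MIDI File; a minimal stdlib sketch (not this server's implementation):

```python
import struct


def parse_midi_header(data: bytes) -> dict:
    """Parse the header chunk of a Standard MIDI File.

    The header is the ASCII tag 'MThd', a 4-byte big-endian length
    (always 6), then three 16-bit big-endian fields: format type,
    number of track chunks, and time division (ticks per quarter note).
    """
    chunk, length, fmt, ntrks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk != b"MThd" or length != 6:
        raise ValueError("not a Standard MIDI File")
    return {"format": fmt, "tracks": ntrks, "division": division}


# A minimal format-0 header: one track, 480 ticks per quarter note.
header = struct.pack(">4sIHHH", b"MThd", 6, 0, 1, 480)
info = parse_midi_header(header)
# → {'format': 0, 'tracks': 1, 'division': 480}
```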
Connects Ableton Live to Claude AI via the Model Context Protocol for prompt-assisted music production and session manipulation. It enables users to create tracks, load instruments, and manage MIDI clips using natural language commands.
Enables integration with VOICEVOX text-to-speech services to convert text into audio using a variety of character voices. It provides tools for speech generation, listing available speakers, and monitoring system health.
Enables batch audio processing and optimization using FFmpeg with preset configurations for game audio, voice processing, and music mastering, including specialized optimization for ElevenLabs AI voice output.
Suno AI music generation with custom lyrics, song extension, cover/remix creation, lyrics generation, and persona management for reusable voice styles.
Facilitates the creation of DecentSampler drum kit configurations, supporting WAV file analysis and XML generation to ensure accurate sample lengths and well-structured presets.
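The WAV-analysis step mentioned above, measuring a sample's exact length so the preset can state it accurately, can be sketched with Python's stdlib `wave` module (an illustration under that assumption, not the server's code):

```python
import io
import wave


def wav_length_seconds(path_or_file) -> float:
    """Return the duration of a WAV sample in seconds:
    total frames divided by the sample rate."""
    with wave.open(path_or_file, "rb") as w:
        return w.getnframes() / w.getframerate()


# Build a 1-second silent 44.1 kHz mono 16-bit WAV in memory to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 44100)
buf.seek(0)

dur = wav_length_seconds(buf)  # → 1.0
```

The measured duration would then feed into the generated XML so sample start/end attributes match the actual audio.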
Gemini Audio MCP is a Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand.
A Model Context Protocol server that enables AI assistants to generate images, text, and audio through the Pollinations APIs without requiring authentication.
Enables AI assistants to control Audacity for real-time local audio editing, mastering, and transcription through over 90 specialized tools. It allows users to perform complex audio processing tasks like noise reduction and podcast cleanup using natural language commands.
Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing.
A Model Context Protocol server that enables AI assistants like Claude to use Bouyomichan (a Japanese text-to-speech program) for voice reading with adjustable voice types, volume, speed, and pitch.
Provides access to a database of over 8,800 headphones and IEMs for equalization settings, sound signature analysis, and Harman preference scores. It enables AI assistants to search, compare, and recommend headphones based on frequency response measurements and parametric EQ profiles.
Enables interaction with MiniMax AI APIs for text-to-speech, voice cloning, video generation, image generation, and music creation through MCP clients like Claude Desktop and Cursor.
Enables Claude Desktop and Claude Code to synthesize and play speech using VOICEVOX text-to-speech engine. Supports multiple voice characters, session-based voice assignment, and queue management for audio playback.
Enables LLMs to control Ableton Live digital audio workstation through OSC (Open Sound Control) protocol. Provides comprehensive tools for managing tracks, routing, and DAW configuration through natural language commands.
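On the wire, OSC encodes each message as a null-padded address string, a type-tag string, and big-endian arguments. A minimal stdlib sketch of that encoding follows; the `/live/song/set/tempo` address is an assumption for illustration, so check the address space of your actual Ableton OSC bridge:

```python
import struct


def osc_pad(b: bytes) -> bytes:
    # OSC strings are null-terminated and padded to a 4-byte boundary
    # (at least one null byte is always appended).
    return b + b"\x00" * (4 - len(b) % 4)


def osc_message(address: str, *args) -> bytes:
    """Encode an OSC message: padded address, padded type-tag
    string (',' plus one tag per argument), then the arguments
    as big-endian binary."""
    tags = ","
    payload = b""
    for a in args:
        if isinstance(a, float):
            tags += "f"
            payload += struct.pack(">f", a)
        elif isinstance(a, int):
            tags += "i"
            payload += struct.pack(">i", a)
        elif isinstance(a, str):
            tags += "s"
            payload += osc_pad(a.encode())
    return osc_pad(address.encode()) + osc_pad(tags.encode()) + payload


# Hypothetical address: set the Live set's tempo to 120 BPM.
msg = osc_message("/live/song/set/tempo", 120.0)
# The resulting bytes would be sent over UDP to the OSC bridge's port.
```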
Provides AI-powered audio generation and processing through the MusicGPT API, enabling music creation, voice conversion, audio manipulation, stem extraction, and audio analysis capabilities.
A lightweight server that exposes FFmpeg's video processing capabilities to AI assistants through the Model Context Protocol (MCP), supporting operations like video format conversion, audio extraction, and adding watermarks.
An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis.
Enables Claude Desktop and other MCP clients to generate images, videos, music, and audio using Fal.ai models. Supports text-to-image generation, video creation, music composition, text-to-speech, audio transcription, and image enhancement through natural language prompts.