An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. It can create synchronized audio from video content or generate audio from text descriptions, with configurable parameters.
Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models, with support for multiple audio formats, file management, and parallel processing.
Enables playback control of local audio files through a virtual audio output device, supporting play, stop, and status queries with configurable root directory and path safety enforcement.
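Path-safety enforcement of this kind usually means refusing any requested file that resolves outside the configured root directory. Below is a minimal Python sketch of such a check. It is not the server's actual code, and `resolve_under_root` is a hypothetical name. The helper resolves `..` segments and symlinks before comparing, so it rejects both directory traversal and absolute paths that point elsewhere.

```python
from pathlib import Path


def resolve_under_root(root: str, requested: str) -> Path:
    """Resolve `requested` relative to `root`; raise ValueError if it escapes.

    Hypothetical helper for illustration only.
    """
    root_path = Path(root).resolve()
    # If `requested` is absolute, the join discards root_path entirely;
    # resolve() then collapses ".." segments and follows symlinks.
    candidate = (root_path / requested).resolve()
    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(root_path):
        raise ValueError(f"path escapes root directory: {requested!r}")
    return candidate
```

A player would call this on every incoming path before opening the file, and would play only the returned `Path`.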
Gemini Audio MCP is a high-performance Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand.
Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs.
Enables searching and downloading audio samples from Freesound using keywords, filters, and sound IDs. It provides detailed sound metadata including duration, license information, and preview URLs.
Suno AI music generation with custom lyrics, song extension, cover/remix creation, lyrics generation, and persona management for reusable voice styles.
An MCP tool for parsing and manipulating MIDI files that lets users read, analyze, and modify them through natural-language commands, supporting operations such as reading file information, modifying tracks, adding notes, and setting tempo.
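In a Standard MIDI File, setting the tempo comes down to writing a Set Tempo meta event (`FF 51 03` followed by three bytes). That event stores microseconds per quarter note, not beats per minute. The sketch below shows the conversion in Python. It is independent of the tool described above, whose internals are not documented here, and the function names are hypothetical. It builds only the event body; inside a track, the event would still need a preceding delta-time.

```python
def bpm_to_tempo_event(bpm: float) -> bytes:
    """Build a Set Tempo meta event body: FF 51 03 tt tt tt (hypothetical helper)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    # MIDI stores tempo as microseconds per quarter note.
    usec_per_quarter = round(60_000_000 / bpm)
    if not 0 < usec_per_quarter < 2**24:
        raise ValueError("tempo does not fit in 3 bytes")
    return bytes([0xFF, 0x51, 0x03]) + usec_per_quarter.to_bytes(3, "big")


def tempo_event_to_bpm(event: bytes) -> float:
    """Decode a Set Tempo meta event body back to beats per minute."""
    if len(event) != 6 or event[:3] != b"\xff\x51\x03":
        raise ValueError("not a Set Tempo meta event")
    return 60_000_000 / int.from_bytes(event[3:6], "big")
```

For example, 120 BPM corresponds to 500,000 microseconds per quarter note, which encodes as `FF 51 03 07 A1 20`.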
Enables AI agents to search, browse, and play millions of meme sounds and sound effects from myinstants.com directly through the user's speakers. It streams audio from trending clips, categories, and viral soundboard buttons, adding reactive audio to agent interactions.
Enables batch audio processing and optimization using FFmpeg with preset configurations for game audio, voice processing, and music mastering, including specialized optimization for ElevenLabs AI voice output.
Facilitates the creation of DecentSampler drum kit configurations, supporting WAV file analysis and XML generation to ensure accurate sample lengths and well-structured presets.
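For the DecentSampler tool, checking sample lengths mainly involves reading each WAV file's frame count and sample rate. Below is a minimal sketch that uses Python's standard-library `wave` module, which handles PCM WAV only. `WavInfo` and `analyze_wav` are hypothetical names, and the sketch does not attempt to reproduce how the tool maps these values into its XML presets.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    frames: int        # sample frames per channel
    sample_rate: int   # frames per second (Hz)
    channels: int
    sample_width: int  # bytes per sample

    @property
    def duration_s(self) -> float:
        return self.frames / self.sample_rate


def analyze_wav(source) -> WavInfo:
    """Read basic properties from a PCM WAV path or binary file object."""
    with wave.open(source, "rb") as wf:
        return WavInfo(
            frames=wf.getnframes(),
            sample_rate=wf.getframerate(),
            channels=wf.getnchannels(),
            sample_width=wf.getsampwidth(),
        )
```

Counting frames rather than bytes keeps the length independent of bit depth and channel count. This matters when a drum kit mixes mono and stereo samples.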