Audio Processing
Services for manipulating, generating, and working with audio content. Includes audio synthesis, processing, playback control, and format conversion capabilities.
MCP Servers
- Gaudio Lab Audio AI — Stem Separation, DME Separation, AI Text Sync (license A, quality A, maintenance B; 744; MIT)

- ElevenLabs MCP Server (official): An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. (license A, quality A, maintenance B; 241,361; MIT)
- MMAudio MCP (official): Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Create synchronized audio from video content or generate audio from text descriptions with configurable parameters. (license A, quality B, maintenance B; 323; MIT)
- All Voice Lab MCP Server (license A, quality A, maintenance C; 1256; MIT)
- Provides powerful video and audio editing capabilities through FFmpeg, enabling AI assistants to perform professional-grade operations including format conversion, trimming, overlays, transitions, and advanced audio processing. (license A, quality B, maintenance C; 2772; MIT)
- Enables text-only models to process images and other media formats by providing access to multimodal models from OpenAI and Dashscope (Alibaba Cloud). Supports flexible deployment options and comprehensive tooling for multimodal AI interactions. (license A, quality B, maintenance C; 34; MIT)
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing. (license A, quality A, maintenance D; 854; MIT)
- An MCP server for analysing local audio files. (license A, quality B, maintenance C; 824; MIT)
- Gemini Audio MCP is a high-performance Model Context Protocol (MCP) server that leverages the power of the Gemini 2.0 Multimodal Live API to generate high-fidelity, environmental soundscapes on-demand. (license A, quality A, maintenance B; 9; MIT)
- Provides AI-powered audio generation and processing through the MusicGPT API, enabling music creation, voice conversion, audio manipulation, stem extraction, and audio analysis capabilities. (license A, quality B, maintenance C; 24151; MIT)
- Provides tools for image, audio, and video recognition using Google's Gemini AI through the Model Context Protocol. (license A, quality B, maintenance C; 310; MIT)
- Enables interaction with MiniMax AI APIs for text-to-speech, voice cloning, video generation, image generation, and music creation through MCP clients like Claude Desktop and Cursor. (license A, quality A, maintenance -; 9)
- Provides access to a database of over 8,800 headphones and IEMs for equalization settings, sound signature analysis, and Harman preference scores. It enables AI assistants to search, compare, and recommend headphones based on frequency response measurements and parametric EQ profiles. (license A, quality A, maintenance C; 73; MIT)
- Facilitates the creation of DecentSampler drum kit configurations, supporting WAV file analysis and XML generation to ensure accurate sample lengths and well-structured presets. (license A, quality A, maintenance C; 527; MIT)
- Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs. (license A, quality B, maintenance -; 21)
- Enables batch audio processing and optimization using FFmpeg with preset configurations for game audio, voice processing, and music mastering, including specialized optimization for ElevenLabs AI voice output. (license A, quality B, maintenance C; 92; MIT)
- Enables programmatic control of Ableton Live using natural language to manage session transport, tracks, MIDI clips, and device parameters. It also integrates ElevenLabs for AI-generated audio and provides a high-performance framework for real-time performance tools. (license A, quality A, maintenance C; 16; MIT)
- Enables AI agents to search, browse, and play millions of meme sounds and sound effects from myinstants.com directly through the user's speakers. It supports streaming audio for trending clips, categories, and viral soundboard buttons to enhance agent interactions with reactive audio. (license A, quality A, maintenance D; 35610; MIT)
- A server that generates MP3 audio files from text using Kokoro TTS technology with optional S3 upload capabilities. (license A, quality B, maintenance D; 176; Apache 2.0)
- Integrates with ElevenLabs text-to-speech API. (license A, quality A, maintenance D; 6117; MIT)
- Enables interaction with ElevenLabs Text-to-Speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. (license A, quality A, maintenance -; 24)
- A powerful MCP tool for parsing and manipulating MIDI files that allows users to read, analyze, and modify MIDI files through natural language commands, supporting operations like reading file information, modifying tracks, adding notes, and setting tempo. (license A, quality B, maintenance C; 11288; MIT)
- An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis. (license A, quality B, maintenance F; 22710; MIT)
- An MCP server that enables AI assistants to search, analyze, and retrieve information about audio samples from Freesound.org through their API. (license A, quality B, maintenance C; 82; MIT)
- Enables playback control of local audio files through a virtual audio output device, supporting play, stop, and status queries with configurable root directory and path safety enforcement. (license A, quality B, maintenance B; 1; MIT)
- Enables Claude Desktop and other MCP clients to generate images, videos, music, and audio using Fal.ai models. Supports text-to-image generation, video creation, music composition, text-to-speech, audio transcription, and image enhancement through natural language prompts. (license A, quality A, maintenance D; 1844; MIT)
- A Model Context Protocol server that enables AI assistants like Claude to use Bouyomichan (a Japanese text-to-speech program) for voice reading with adjustable voice types, volume, speed, and pitch. (license A, quality B, maintenance C; 12; MIT)
- An MCP server for accessing Zoom recordings and transcripts without requiring direct authentication from the end user. (license A, quality A, maintenance C; 4179; Apache 2.0)
MCP Connectors
Download YouTube videos as MP3/M4A/MP4 from any MCP-compatible AI assistant. Free 3/day, $3.99/mo.
Privacy-first audio intelligence: BPM, key, waveform. Audio never stored. Pay per second.
Transcribe, summarize, find and cut clips, publish to YouTube. Per-job pricing, no account.
AI audio tools for music producers — stem splitting, vocal removal, BPM & key detection, audio-to-MIDI, format conversion, trimming, video-to-audio extraction and AI song generation.
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
Media intelligence analysis for audio, video, and images via the Echosaw MCP server.
Turn any LLM multimodal; generate images, voices, videos, 3D models, music, and more.
Arabic-first AI creative platform for Egyptian and Arab businesses. Generate social media designs, write marketing copy in Egyptian dialect, build content calendars, produce Sora-2 videos, AI photoshoots, music tracks, and business documents — with your brand identity automatically applied. Requires a Grow or Business subscription at vizzy.space.
Detect AI-generated images, videos, and audio with identifAI's deepfake detection tools.
AI music and podcast platform for autonomous agents. SoundCloud for AI bots.
AI image, video & music generation. Flux, Veo 3.1, Suno V5. Free tier included.
AudioAlpha turns 100+ daily finance and crypto podcasts into structured intelligence — α-sentiment scores, narrative signals, asset mentions, transcripts, and market snapshots with 40+ custom metrics. Built for AI-driven research and trading workflows.
Process video, audio, images, and documents with 86+ cloud media processing robots.
Financial podcast intelligence platform — sentiment, narrative, and asset signals from 100+ podcasts
Focused MCP server for OpenAI image/audio generation (v2.0.0). Wraps endpoints via HAPI CLI.
25+ AI media generation tools — FLUX Pro, Ideogram v3, Recraft v3, Stable Diffusion XL, MiniMax video, and Kokoro TTS. Images, video, and audio from one server. $0.01/call.
125+ browser tools for PDF, Image, Video, Audio, AI, Scanner. Files never leave your device.
LibriVox public-domain audiobooks (about 17,000 titles in dozens of languages).
Generate game assets with AI: sprites, 3D models, animations, sound effects, music, and voices.
The audio intelligence layer. Search podcast transcripts, speakers, and entities across 250K+ shows.