Audio Processing
Services for manipulating, generating, and working with audio content. Includes audio synthesis, processing, playback control, and format conversion capabilities.
MCP Servers
- Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Creates synchronized audio from video content or generates audio from text descriptions, with configurable parameters. (MIT · grades: license A, quality B, maintenance B)
- Gaudio Lab Audio AI: stem separation, DME (dialogue, music, effects) separation, and AI text sync. (MIT · grades: license A, quality A, maintenance B)
- Remove vocals, extract instrumentals, and split any song into up to six stems, directly from Claude Desktop, Cursor, or any MCP client. Supports local audio files, YouTube URLs, and SoundCloud tracks. (MIT · grades: license A, quality A, maintenance B)

- ElevenLabs MCP Server (official): A Model Context Protocol (MCP) server that lets AI clients use ElevenLabs' text-to-speech and audio processing APIs for speech generation, voice cloning, audio transcription, and other audio tasks. (MIT · grades: license A, quality A, maintenance B)
- All Voice Lab MCP Server. (MIT · grades: license A, quality A, maintenance C)
- An MCP (Model Context Protocol) server that integrates Fish Audio's text-to-speech API with LLMs such as Claude, enabling speech synthesis driven by natural-language requests. (MIT · grades: license A, quality B, maintenance F)
- Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats. (MIT · grades: license A, quality A, maintenance C)
- Provides image, audio, and video recognition tools backed by Google's Gemini AI through the Model Context Protocol. (MIT · grades: license A, quality B, maintenance C)
- Enables Claude Desktop and other MCP clients to generate images, videos, music, and audio using Fal.ai models. Supports text-to-image generation, video creation, music composition, text-to-speech, audio transcription, and image enhancement through natural-language prompts. (MIT · grades: license A, quality A, maintenance D)
- An MCP tool for parsing and modifying MIDI files through natural-language commands, supporting operations such as reading file information, modifying tracks, adding notes, and setting tempo. (MIT · grades: license A, quality B, maintenance C)
- A lightweight server that exposes FFmpeg's video processing capabilities to AI assistants through the Model Context Protocol (MCP), supporting operations such as video format conversion, audio extraction, and watermarking. (MIT · grades: license A, quality A, maintenance C)
- Integrates with the ElevenLabs text-to-speech API. (MIT · grades: license A, quality A, maintenance D)
- A Windows-native MCP server that lets Claude Desktop transcribe audio files locally using whisper.cpp; no internet connection is required. (Unlicense, libtelnet variant · grades: license A, quality A, maintenance A)
- Enables playback control of local audio files through a virtual audio output device, supporting play, stop, and status queries, with a configurable root directory and path-safety enforcement. (MIT · grades: license A, quality B, maintenance B)
- Enables interaction with MiniMax AI APIs for text-to-speech, voice cloning, video generation, image generation, and music creation through MCP clients such as Claude Desktop and Cursor. (grades: license A, quality A, no maintenance grade)
- A Model Context Protocol server that enables AI assistants such as Claude to use Bouyomichan (a Japanese text-to-speech program) for voice reading with adjustable voice type, volume, speed, and pitch. (MIT · grades: license A, quality B, maintenance C)
- An MCP server that enables AI assistants to search, analyze, and retrieve information about audio samples from Freesound.org through its API. (MIT · grades: license A, quality B, maintenance C)
- Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs. (grades: license A, quality B, no maintenance grade)
- Provides access to a database of more than 8,800 headphones and IEMs (in-ear monitors) with equalization settings, sound signature analysis, and Harman preference scores. It enables AI assistants to search, compare, and recommend headphones based on frequency response measurements and parametric EQ profiles. (MIT · grades: license A, quality A, maintenance C)
- An MCP server for accessing Zoom recordings and transcripts without requiring direct authentication from the end user. (Apache 2.0 · grades: license A, quality A, maintenance C)
- Enables AI-powered music generation and live coding by controlling Strudel.cc directly through browser automation. Supports pattern creation, audio analysis, and pattern storage for TidalCycles/Strudel music patterns. (AGPL 3.0 · grades: license A, quality A, maintenance B)
- Enables batch audio processing and optimization using FFmpeg, with preset configurations for game audio, voice processing, and music mastering, including specialized optimization for ElevenLabs AI voice output. (MIT · grades: license A, quality B, maintenance C)
- A server that generates MP3 audio files from text using Kokoro TTS, with optional upload to S3. (Apache 2.0 · grades: license A, quality B, maintenance D)
- Gemini Audio MCP is a Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand. (MIT · grades: license A, quality A, maintenance B)
- An MCP server for analysing local audio files. (MIT · grades: license A, quality B, maintenance C)
- Enables Claude Desktop and Claude Code to synthesize and play speech using the VOICEVOX text-to-speech engine. Supports multiple voice characters, session-based voice assignment, and queue management for audio playback. (MIT · grades: license A, quality A, maintenance C)
- Provides AI-powered audio generation and processing through the MusicGPT API, enabling music creation, voice conversion, audio manipulation, stem extraction, and audio analysis. (MIT · grades: license A, quality B, maintenance C)
- AI music production with text-to-music generation, audio extension, remixing, and cover creation via the AceDataCloud API. (MIT · grades: license A, quality A, maintenance B)
- Provides meeting transcription with speaker diarization and multilingual support, letting users submit audio URLs, poll transcription status, retrieve transcripts, and summarize them via MCP tools in their IDE. (MIT · grades: license A, quality A, maintenance C)
MCP Connectors
Download YouTube videos as MP3/M4A/MP4 from any MCP-compatible AI assistant. Free 3/day, $3.99/mo.
Privacy-first audio intelligence: BPM, key, waveform. Audio never stored. Pay per second.
Transcribe, summarize, find and cut clips, publish to YouTube. Per-job pricing, no account.
AI audio tools for music producers: stem splitting, vocal removal, BPM and key detection, audio-to-MIDI, format conversion, trimming, video-to-audio extraction, and AI song generation.
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
Media intelligence analysis for audio, video, and images via the Echosaw MCP server.
Turn any LLM multimodal; generate images, voices, videos, 3D models, music, and more.
Arabic-first AI creative platform for Egyptian and Arab businesses. Generate social media designs, write marketing copy in Egyptian dialect, build content calendars, and produce Sora-2 videos, AI photoshoots, music tracks, and business documents, with your brand identity applied automatically. Requires a Grow or Business subscription at vizzy.space.
Detect AI-generated images, videos, and audio with identifAI's deepfake detection tools.
AI image, video & music generation. Flux, Veo 3.1, Suno V5. Free tier included.
AI music and podcast platform for autonomous agents. SoundCloud for AI bots.
AudioAlpha turns 100+ daily finance and crypto podcasts into structured intelligence: α-sentiment scores, narrative signals, asset mentions, transcripts, and market snapshots with 40+ custom metrics. Built for AI-driven research and trading workflows.
Process video, audio, images, and documents with 86+ cloud media processing robots.
Financial podcast intelligence platform: sentiment, narrative, and asset signals from 100+ podcasts.
Focused MCP server for OpenAI image/audio generation (v2.0.0). Wraps endpoints via HAPI CLI.
25+ AI media generation tools — FLUX Pro, Ideogram v3, Recraft v3, Stable Diffusion XL, MiniMax video, and Kokoro TTS. Images, video, and audio from one server. $0.01/call.
125+ browser tools for PDF, Image, Video, Audio, AI, Scanner. Files never leave your device.
LibriVox public-domain audiobooks (~17000 titles in dozens of languages)
Generate game assets with AI: sprites, 3D models, animations, sound effects, music, and voices.
The audio intelligence layer. Search podcast transcripts, speakers, and entities across 250K+ shows.