Audio Processing
Services for manipulating, generating, and working with audio content. Includes audio synthesis, processing, playback control, and format conversion capabilities.
MCP Servers
- Remove vocals, extract instrumentals, and split any song into up to six stems, directly from Claude Desktop, Cursor, or any MCP client. Supports local audio files, YouTube URLs, and SoundCloud track (MIT; ratings: license A, quality A, maintenance A)

- ElevenLabs MCP Server (official): An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. (MIT; ratings: license A, quality A, maintenance B)
- An MCP server that enables AI coding agents to control FMOD Studio for audio import, event creation, and bank building via TCP scripting. (MIT; ratings: license A, quality B, maintenance C)
- Gaudio Lab Audio AI: Stem Separation, DME Separation, AI Text Sync (MIT; ratings: license A, quality A, maintenance A)

- MMAudio MCP (official): Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Create synchronized audio from video content or generate audio from text descriptions with configurable parameters. (MIT; ratings: license A, quality B, maintenance C)
- Official MCP server for Rendobar. Lets AI agents run serverless media processing and upload local files. (MIT; ratings: license A, quality A, maintenance B)

- RunAPI MCP Server (official): Connects MCP-compatible coding tools to RunAPI for AI image, video, music, text-to-speech, and LLM generation using 130+ models from leading providers. (Apache 2.0; ratings: license A, quality A, maintenance B)
- ZeroTrue MCP Server (official): Enables detection of AI-generated content in text, images, video, and audio via the ZeroTrue API, supporting multiple analysis tools and MCP-compatible clients. (MIT; ratings: license A, quality A, maintenance A)
- All Voice Lab MCP Server (MIT; ratings: license A, quality A, maintenance D)

- mocoVoice MCP Server (official): Enables transcription of audio and video files using mocoVoice API, allowing users to start transcription jobs and retrieve results directly from Claude Desktop. (MIT; ratings: license A, quality A, maintenance B)
- Generate images, video, and audio directly in Claude Code, Cursor, Windsurf, or any MCP-compatible AI agent. 20+ models: Flux, GPT-Image-1, Imagen 4, Grok Imagine, Seedance, ElevenLabs TTS, and more. Free models work without an API key. Paid models require a Pollinations key. (MIT; ratings: license A, quality A, maintenance F)
- Enables AI assistants to control Audacity for real-time local audio editing, mastering, and transcription through over 90 specialized tools. It allows users to perform complex audio processing tasks like noise reduction and podcast cleanup using natural language commands. (Apache 2.0; ratings: license A, quality A, maintenance A)
- MCP server for downloading and processing audio from SoundCloud and other sources, with tools for URL probing, download enqueueing, and job status tracking. (MIT; ratings: license A, quality B, maintenance C)
- Transcribes videos from 1000+ platforms (YouTube, TikTok, Vimeo, etc.) and local video files using OpenAI's Whisper model, with support for 90+ languages and multiple output formats. (MIT; ratings: license A, quality A, maintenance D)
- Enables AI-powered music composition and synthesis by generating Pure Data patches, VCV Rack modules, and MIDI controller mappings through natural language. (MIT; ratings: license A, quality A, maintenance D)
- A powerful MCP tool for parsing and manipulating MIDI files that allows users to read, analyze, and modify MIDI files through natural language commands, supporting operations like reading file information, modifying tracks, adding notes, and setting tempo. (MIT; ratings: license A, quality B, maintenance C)
- Provides atomic music-theory and MIDI tools for composing, enabling LLMs to chain deterministic steps like scale/chord lookups, degree resolution, rhythm generation, and MIDI rendering. (MIT; ratings: license A, quality A, maintenance B)
- An MCP server that enables transcribing local audio files and Telegram voice messages using OpenAI's Whisper via local inference or cloud API. It supports multiple audio formats, automatic language detection, and optional word-level timestamps for AI-powered audio analysis. (MIT; ratings: license A, quality A, maintenance B)
- Enables AI video generation, replica management, conversational AI, lipsync, and speech synthesis through the Tavus API. Provides 29 tools across Phoenix replicas, video generation, personas, lipsync, and text-to-speech capabilities. (MIT; ratings: license A, quality B, maintenance D)
- Analyzes speech audio to detect emotions, urgency, and sarcasm using prosodic features. (MIT; ratings: license A, quality A, maintenance D)
- Connects Ableton Live to AI through MCP, enabling prompt-assisted music production with extended tools and a 33-personality style system for generating parts in various artist styles. (MIT; ratings: license A, quality B, maintenance B)
- An MCP server that integrates with fal.ai to provide AI agents with tools for image generation, text processing, audio synthesis, and model management via a unified interface. (MIT; ratings: license A, quality B, maintenance C)
- Suno AI music generation with custom lyrics, song extension, cover/remix creation, lyrics generation, and persona management for reusable voice styles. (MIT; ratings: license A, quality A, maintenance B)
- Enables comprehensive audio file analysis and metadata extraction with specialized game audio development features, supporting batch processing of multiple formats and providing platform-specific optimization recommendations. (MIT; ratings: license A, quality A, maintenance D)
- Enables interaction with MiniMax AI APIs for text-to-speech, voice cloning, video generation, image generation, and music creation through MCP clients like Claude Desktop and Cursor. (ratings: license A, quality A, maintenance -)
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing. (MIT; ratings: license A, quality A, maintenance D)
- Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs. (ratings: license A, quality B, maintenance -)
- Provides access to a database of over 8,800 headphones and IEMs for equalization settings, sound signature analysis, and Harman preference scores. It enables AI assistants to search, compare, and recommend headphones based on frequency response measurements and parametric EQ profiles. (MIT; ratings: license A, quality A, maintenance D)
- Gemini Audio MCP is a high-performance Model Context Protocol (MCP) server that leverages the power of the Gemini 2.0 Multimodal Live API to generate high-fidelity, environmental soundscapes on-demand. (MIT; ratings: license A, quality A, maintenance B)
MCP Connectors
Generate Suno AI music (v4.5/v5/v5.5) from any MCP client. Async; billed only on success.
Find & cut horizontal and vertical video clips (Shorts/Reels), transcribe & summarize. Pay per job.
Deepfake detection, media intelligence, and invisible watermarking for audio, image, and video via the Resemble AI API, plus docs tools. Remote MCP server (Streamable HTTP) — also published in the official MCP registry as io.github.resemble-ai/resemble-mcp.
Process video, audio, images, and documents with 86+ cloud media processing robots.
Audio features (BPM, key, mood, genre) for real tracks - a Spotify audio-features replacement.
Privacy-first audio intelligence: BPM, key, waveform. Audio never stored. Pay per second.
Download YouTube videos as MP3/M4A/MP4 from any MCP-compatible AI assistant. Free 3/day, $3.99/mo.
LibriVox public-domain audiobooks (~17000 titles in dozens of languages)
AI audio tools for music producers — stem splitting, vocal removal, BPM & key detection, audio-to-MIDI, format conversion, trimming, video-to-audio extraction and AI song generation.
Arabic-first AI creative platform for Egyptian and Arab businesses. Generate social media designs, write marketing copy in Egyptian dialect, build content calendars, produce Sora-2 videos, AI photoshoots, music tracks, and business documents — with your brand identity automatically applied. Requires a Grow or Business subscription at vizzy.space.
Detect AI-generated images, videos, and audio with identifAI's deepfake detection tools.
Financial podcast intelligence platform — sentiment, narrative, and asset signals from 100+ podcasts
AudioAlpha turns 100+ daily finance and crypto podcasts into structured intelligence — α-sentiment scores, narrative signals, asset mentions, transcripts, and market snapshots with 40+ custom metrics. Built for AI-driven research and trading workflows.
125+ browser tools for PDF, Image, Video, Audio, AI, Scanner. Files never leave your device.
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
AI music and podcast platform for autonomous agents. SoundCloud for AI bots.
25+ AI media generation tools — FLUX Pro, Ideogram v3, Recraft v3, Stable Diffusion XL, MiniMax video, and Kokoro TTS. Images, video, and audio from one server. $0.01/call.
AI image, video & music generation. Flux, Veo 3.1, Suno V5. Free tier included.
Focused MCP server for OpenAI image/audio generation (v2.0.0). Wraps endpoints via HAPI CLI.
Generate game assets with AI: sprites, 3D models, animations, sound effects, music, and voices.