Audio Processing
Services for manipulating, generating, and working with audio content. Includes audio synthesis, processing, playback control, and format conversion capabilities.
MCP Servers
- Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Create synchronized audio from video content or generate audio from text descriptions with configurable parameters. Grades: license A, quality B, maintenance B. License: MIT.
- ElevenLabs MCP Server: An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. Grades: license A, quality A, maintenance C. License: MIT.
- Gaudio Lab Audio AI: Stem Separation, DME Separation, AI Text Sync. Grades: license A, quality A, maintenance B. License: MIT.
- All Voice Lab MCP Server. Grades: license A, quality A, maintenance C. License: MIT.
- Provides powerful video and audio editing capabilities through FFmpeg, enabling AI assistants to perform professional-grade operations including format conversion, trimming, overlays, transitions, and advanced audio processing. Grades: license A, quality B, maintenance C. License: MIT.
- Enables text-only models to process images and other media formats by providing access to multimodal models from OpenAI and Dashscope (Alibaba Cloud). Supports flexible deployment options and comprehensive tooling for multimodal AI interactions. Grades: license A, quality B, maintenance C. License: MIT.
- Generate images, video, and audio directly in Claude Code, Cursor, Windsurf, or any MCP-compatible AI agent. Supports 20+ models, including Flux, GPT-Image-1, Imagen 4, Grok Imagine, Seedance, and ElevenLabs TTS. Free models work without an API key. Paid models require a Pollinations key. Grades: license A, quality A, maintenance C. License: MIT.
- Suno AI music generation with custom lyrics, song extension, cover/remix creation, lyrics generation, and persona management for reusable voice styles. Grades: license A, quality A, maintenance B. License: MIT.
- An MCP tool for parsing and manipulating MIDI files. It lets users read, analyze, and modify MIDI files through natural language commands, supporting operations such as reading file information, modifying tracks, adding notes, and setting tempo. Grades: license A, quality B, maintenance C. License: MIT.
- An MCP server that enables AI coding agents to control FMOD Studio for audio import, event creation, and bank building via TCP scripting. Grades: license A, quality B, maintenance D.
- Enables programmatic control of Ableton Live using natural language to manage session transport, tracks, MIDI clips, and device parameters. It also integrates ElevenLabs for AI-generated audio and provides a high-performance framework for real-time performance tools. Grades: license A, quality A, maintenance C. License: MIT.
- An MCP server for accessing Zoom recordings and transcripts without requiring direct authentication from the end user. Grades: license A, quality A, maintenance C. License: Apache 2.0.
- Integrates with the ElevenLabs text-to-speech API. Grades: license A, quality A, maintenance D. License: MIT.
- Enables downloading videos from platforms like YouTube and converting them to text using OpenAI Whisper and ffmpeg. It supports multiple output formats including TXT, JSON, SRT, and VTT for transcriptions. Grades: license A, quality B, maintenance C. License: ISC.
- A Model Context Protocol server that enables AI assistants to generate images, text, and audio through the Pollinations APIs without requiring authentication. Grades: license A, quality A, maintenance B. License: MIT.
- An MCP server for analysing local audio files. Grades: license A, quality B, maintenance C. License: MIT.
- Provides AI-powered audio generation and processing through the MusicGPT API, enabling music creation, voice conversion, audio manipulation, stem extraction, and audio analysis capabilities. Grades: license A, quality B, maintenance C. License: MIT.
- A server that generates MP3 audio files from text using Kokoro TTS technology with optional S3 upload capabilities. Grades: license A, quality B, maintenance D. License: Apache 2.0.
- Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs. Grades: license A, quality B, maintenance not rated.
- Gemini Audio MCP is a Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand. Grades: license A, quality A, maintenance C.
- Agent-native media processing: video encoding, image manipulation, document conversion, audio transcription, and more via 86+ cloud Robots. Grades: license A, quality B, maintenance B.
- Enables playback control of local audio files through a virtual audio output device, supporting play, stop, and status queries with configurable root directory and path safety enforcement. Grades: license A, quality B, maintenance B. License: MIT.
- Enables interaction with MiniMax AI APIs for text-to-speech, voice cloning, video generation, image generation, and music creation through MCP clients like Claude Desktop and Cursor. Grades: license A, quality A, maintenance not rated.
- Enables searching and downloading audio samples from Freesound using keywords, filters, and sound IDs. It provides detailed sound metadata including duration, license information, and preview URLs. Grades: license A, quality B, maintenance C. License: MIT.
- An MCP server that enables AI assistants to search, analyze, and retrieve information about audio samples from Freesound.org through their API. Grades: license A, quality B, maintenance C. License: MIT.
- Enables interaction with ElevenLabs Text-to-Speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. Grades: license A, quality A, maintenance not rated.
- A Model Context Protocol server that enables AI assistants like Claude to use Bouyomichan (a Japanese text-to-speech program) for voice reading with adjustable voice types, volume, speed, and pitch. Grades: license A, quality B, maintenance C. License: MIT.
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing. Grades: license A, quality A, maintenance B. License: MIT.
- Enables batch audio processing and optimization using FFmpeg with preset configurations for game audio, voice processing, and music mastering, including specialized optimization for ElevenLabs AI voice output. Grades: license A, quality B, maintenance C. License: MIT.
MCP Connectors
Transcribe and summarize audio and video. Pay per job via Stripe or crypto.
Privacy-first audio intelligence: BPM, key, waveform. Audio never stored. Pay per second.
Pronunciation scoring, speech-to-text, and text-to-speech for language learning.
AI audio tools for music producers — stem splitting, vocal removal, BPM & key detection, audio-to-MIDI, format conversion, trimming, video-to-audio extraction and AI song generation.
Media intelligence analysis for audio, video, and images via the Echosaw MCP server.
Turn any LLM multimodal; generate images, voices, videos, 3D models, music, and more.
Arabic-first AI creative platform for Egyptian and Arab businesses. Generate social media designs, write marketing copy in Egyptian dialect, build content calendars, produce Sora-2 videos, AI photoshoots, music tracks, and business documents — with your brand identity automatically applied. Requires a Grow or Business subscription at vizzy.space.
Detect AI-generated images, videos, and audio with identifAI's deepfake detection tools.
AI music and podcast platform for autonomous agents. SoundCloud for AI bots.
AI image, video & music generation. Flux, Veo 3.1, Suno V5. Free tier included.
AudioAlpha turns 100+ daily finance and crypto podcasts into structured intelligence — α-sentiment scores, narrative signals, asset mentions, transcripts, and market snapshots with 40+ custom metrics. Built for AI-driven research and trading workflows.
Financial podcast intelligence platform: sentiment, narrative, and asset signals from 100+ podcasts.
Process video, audio, images, and documents with 86+ cloud media processing robots.
Focused MCP server for OpenAI image/audio generation (v2.0.0). Wraps endpoints via HAPI CLI.
25+ AI media generation tools — FLUX Pro, Ideogram v3, Recraft v3, Stable Diffusion XL, MiniMax video, and Kokoro TTS. Images, video, and audio from one server. $0.01/call.
125+ browser tools for PDF, Image, Video, Audio, AI, Scanner. Files never leave your device.
Generate game assets with AI: sprites, 3D models, animations, sound effects, music, and voices.
The audio intelligence layer. Search podcast transcripts, speakers, and entities across 250K+ shows.
Create AI music videos and audio-reactive visuals from songs through MCP.
35 AI tools for image/video generation, TTS, transcription, OCR, and embeddings via deAPI.