Audio Processing
Services for manipulating, generating, and working with audio content. Includes audio synthesis, processing, playback control, and format conversion capabilities.
MCP Servers
- Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Create synchronized audio from video content or generate audio from text descriptions with configurable parameters. (Grades: license A, quality B, maintenance B. License: MIT.)
- ElevenLabs MCP Server (official): An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks. (Grades: license A, quality A, maintenance C. License: MIT.)
- All Voice Lab MCP Server. (Grades: license A, quality A, maintenance C. License: MIT.)
- Gaudio Lab Audio AI: Stem Separation, DME Separation, AI Text Sync. (Grades: license A, quality A, maintenance B. License: MIT.)
- Generate images, video, and audio directly in Claude Code, Cursor, Windsurf, or any MCP-compatible AI agent. 20+ models: Flux, GPT-Image-1, Imagen 4, Grok Imagine, Seedance, ElevenLabs TTS, and more. Free models work without an API key. Paid models require a Pollinations key. (Grades: license A, quality A, maintenance C. License: MIT.)
- Enables programmatic control of Ableton Live using natural language to manage session transport, tracks, MIDI clips, and device parameters. It also integrates ElevenLabs for AI-generated audio and provides a high-performance framework for real-time performance tools. (Grades: license A, quality A, maintenance C. License: MIT.)
- An enhanced server for ElevenLabs that enables high-quality text-to-speech, voice cloning, and multi-speaker dialogue management. It features advanced conversational tools for transcript retrieval, history tracking, and emotional audio synthesis using the v3 model. (Grades: license A, quality B, maintenance C. License: MIT.)
- Enables AI agents to search, browse, and play millions of meme sounds and sound effects from myinstants.com directly through the user's speakers. It supports streaming audio for trending clips, categories, and viral soundboard buttons to enhance agent interactions with reactive audio. (Grades: license A, quality A, maintenance D. License: MIT.)
- Provides tools for image, audio, and video recognition using Google's Gemini AI through the Model Context Protocol. (Grades: license A, quality B, maintenance C. License: MIT.)
- Enables interaction with MiniMax AI APIs for text-to-speech, voice cloning, video generation, image generation, and music creation through MCP clients like Claude Desktop and Cursor. (Grades: license A, quality A, maintenance not rated.)
- Enables interaction with ElevenLabs Text-to-Speech and audio processing APIs. Supports speech generation, voice cloning, audio transcription, and sound effect creation through natural language. (Grades: license A, quality A, maintenance not rated.)
- An MCP server for analysing local audio files. (Grades: license A, quality B, maintenance C. License: MIT.)
- Enables advanced audio transcription, text-to-speech generation, and audio processing using OpenAI's Whisper and GPT-4o models with support for multiple audio formats, file management, and parallel processing. (Grades: license A, quality A, maintenance B. License: MIT.)
- Enables text-only models to process images and other media formats by providing access to multimodal models from OpenAI and Dashscope (Alibaba Cloud). Supports flexible deployment options and comprehensive tooling for multimodal AI interactions. (Grades: license A, quality B, maintenance C. License: MIT.)
- Enables execution of SuperCollider synth code through the Model Context Protocol using supercolliderjs, allowing AI assistants to generate and run audio synthesis programs. (Grades: license A, quality B, maintenance not rated.)
- An MCP server for accessing Zoom recordings and transcripts without requiring direct authentication from the end user. (Grades: license A, quality A, maintenance C. License: Apache 2.0.)
- A Model Context Protocol server that enables AI assistants to generate images, text, and audio through the Pollinations APIs without requiring authentication. (Grades: license A, quality A, maintenance B. License: MIT.)
- Suno AI music generation with custom lyrics, song extension, cover/remix creation, lyrics generation, and persona management for reusable voice styles. (Grades: license A, quality A, maintenance B. License: MIT.)
- Provides powerful video and audio editing capabilities through FFmpeg, enabling AI assistants to perform professional-grade operations including format conversion, trimming, overlays, transitions, and advanced audio processing. (Grades: license A, quality B, maintenance C. License: MIT.)
- Plays sound effects (completion, newtype, and error sounds) in response to various situations like task completion, insights, or errors. Integrates with Claude Desktop to provide audio feedback for improved workflow efficiency and entertainment. (Grades: license A, quality B, maintenance C. License: MIT.)
- A powerful MCP tool for parsing and manipulating MIDI files that allows users to read, analyze, and modify MIDI files through natural language commands, supporting operations like reading file information, modifying tracks, adding notes, and setting tempo. (Grades: license A, quality B, maintenance C. License: MIT.)
- Enables LLMs to control Ableton Live digital audio workstation through OSC (Open Sound Control) protocol. Provides comprehensive tools for managing tracks, routing, and DAW configuration through natural language commands. (Grades: license A, quality B, maintenance not rated.)
- Enables playback control of local audio files through a virtual audio output device, supporting play, stop, and status queries with configurable root directory and path safety enforcement. (Grades: license A, quality B, maintenance B. License: MIT.)
- An MCP server that enables AI assistants to search, analyze, and retrieve information about audio samples from Freesound.org through their API. (Grades: license A, quality B, maintenance C. License: MIT.)
- A Windows-native MCP server that lets Claude Desktop transcribe audio files locally using whisper.cpp, with no internet connection required. (Grades: license A, quality A, maintenance A. License: MIT.)
- A Model Context Protocol server that enables AI assistants like Claude to use Bouyomichan (a Japanese text-to-speech program) for voice reading with adjustable voice types, volume, speed, and pitch. (Grades: license A, quality B, maintenance C. License: MIT.)
- Enables batch audio processing and optimization using FFmpeg with preset configurations for game audio, voice processing, and music mastering, including specialized optimization for ElevenLabs AI voice output. (Grades: license A, quality B, maintenance C. License: MIT.)
- An MCP server that enables AI coding agents to control FMOD Studio for audio import, event creation, and bank building via TCP scripting. (Grades: license A, quality B, maintenance D.)
- A server that generates MP3 audio files from text using Kokoro TTS technology with optional S3 upload capabilities. (Grades: license A, quality B, maintenance not rated.)
MCP Connectors
Transcribe and summarize audio and video. Pay per job via Stripe or crypto.
Privacy-first audio intelligence: BPM, key, waveform. Audio never stored. Pay per second.
Pronunciation scoring, speech-to-text, and text-to-speech for language learning
AI audio tools for music producers — stem splitting, vocal removal, BPM & key detection, audio-to-MIDI, format conversion, trimming, video-to-audio extraction and AI song generation.
Media intelligence analysis for audio, video, and images via the Echosaw MCP server.
Turn any LLM multimodal; generate images, voices, videos, 3D models, music, and more.
Arabic-first AI creative platform for Egyptian and Arab businesses. Generate social media designs, write marketing copy in Egyptian dialect, build content calendars, produce Sora-2 videos, AI photoshoots, music tracks, and business documents — with your brand identity automatically applied. Requires a Grow or Business subscription at vizzy.space.
Detect AI-generated images, videos, and audio with identifAI's deepfake detection tools.
AI music and podcast platform for autonomous agents. SoundCloud for AI bots.
AI image, video & music generation. Flux, Veo 3.1, Suno V5. Free tier included.
AudioAlpha turns 100+ daily finance and crypto podcasts into structured intelligence — α-sentiment scores, narrative signals, asset mentions, transcripts, and market snapshots with 40+ custom metrics. Built for AI-driven research and trading workflows.
Financial podcast intelligence platform — sentiment, narrative, and asset signals from 100+ podcasts
Process video, audio, images, and documents with 86+ cloud media processing robots.
Focused MCP server for OpenAI image/audio generation (v2.0.0). Wraps endpoints via HAPI CLI.
25+ AI media generation tools — FLUX Pro, Ideogram v3, Recraft v3, Stable Diffusion XL, MiniMax video, and Kokoro TTS. Images, video, and audio from one server. $0.01/call.
125+ browser tools for PDF, Image, Video, Audio, AI, Scanner. Files never leave your device.
Generate game assets with AI: sprites, 3D models, animations, sound effects, music, and voices.
The audio intelligence layer. Search podcast transcripts, speakers, and entities across 250K+ shows.
Create AI music videos and audio-reactive visuals from songs through MCP.
35 AI tools for image/video generation, TTS, transcription, OCR & embeddings via deAPI