MCP Servers for Audio Processing

Services for manipulating, generating, and working with audio content. Includes audio synthesis, processing, playback control, and format conversion capabilities.

View all MCP Servers

  • A
    security
    A
    license
    A
    quality
    An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
    Last updated -
    19
    633
    Python
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    A powerful MCP tool for parsing and manipulating MIDI files that allows users to read, analyze, and modify MIDI files through natural language commands, supporting operations like reading file information, modifying tracks, adding notes, and setting tempo.
    Last updated -
    11
    26
    1
    JavaScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    Facilitates the creation of DecentSampler drum kit configurations, supporting WAV file analysis and XML generation to ensure accurate sample lengths and well-structured presets.
    Last updated -
    5
    93
    1
    TypeScript
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables real-time interaction with Ableton Live, allowing AI assistants to control song creation, track management, clip operations, and audio recording workflows.
    Last updated -
    23
    0
    9
    TypeScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    Enables text-to-speech functionality on macOS using the say command, offering extensive control over speech parameters like voice, rate, volume, and pitch for a customizable auditory experience.
    Last updated -
    2
    7
    11
    JavaScript
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    Provides tools for image, audio, and video recognition using Google's Gemini AI through the Model Context Protocol.
    Last updated -
    3
    6
    TypeScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    A MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options.
    Last updated -
    1
    2
    JavaScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables AI assistants like Claude to use Bouyomichan (a Japanese text-to-speech program) for voice reading with adjustable voice types, volume, speed, and pitch.
    Last updated -
    1
    1
    JavaScript
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    MCP server for Synthesizer V AI Vocal Studio, which allows LLMs to create/edit vocal tracks e.g. adding lyrics to the melody.
    Last updated -
    6
    Apache 2.0
    • Apple
  • A
    security
    A
    license
    A
    quality
    A MCP server for accessing Zoom recordings and transcripts without requiring direct authentication from the end user.
    Last updated -
    4
    2
    Python
    Apache 2.0
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables AI assistants to generate images, text, and audio through the Pollinations APIs without requiring authentication.
    Last updated -
    7
    325
    4
    JavaScript
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol server that enables fast and free lipsync video creation for a wide range of digital avatars, supporting both audio and text inputs to generate synchronized lip movements.
    Last updated -
    2
    Python
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol server that integrates high-quality text-to-speech capabilities with Claude Desktop and other MCP-compatible clients, supporting multiple voice options and audio formats.
    Last updated -
    TypeScript
    MIT License
  • -
    security
    A
    license
    -
    quality
    Connects Ableton Live to Claude AI through the Model Context Protocol, enabling AI-assisted music production by allowing Claude to directly interact with and control Ableton Live sessions.
    Last updated -
    881
    Python
    MIT License
    • Apple
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol server that allows AI assistants like Claude and Cursor to create music and control Sonic Pi programmatically through OSC messages.
    Last updated -
    JavaScript
    MIT License
  • -
    security
    -
    license
    -
    quality
    Official Model Context Protocol server that enables interaction with powerful Speech-to-Text and Audio Intelligence APIs, allowing clients like Claude Desktop to transcribe audio, analyze speech, translate content, and more.
    Last updated -
    Python
    MIT License
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol implementation that plays sound effects (completion, error, notification) for Cursor AI and other MCP-compatible environments, providing audio feedback for a more interactive coding experience.
    Last updated -
    Python
    MIT License
    • Apple
    • Linux
  • -
    security
    A
    license
    -
    quality
    A server that allows Claude to control audio playback on your computer, supporting MP3, WAV, and OGG files with features like play, list, and stop commands.
    Last updated -
    1
    Python
    MIT License
    • Apple
    • Linux
  • -
    security
    A
    license
    -
    quality
    Provides voice recognition and text extraction capabilities with support for both stdio and MCP modes, processing audio files or base64 encoded data and returning structured results with language, emotion, and speaker information.
    Last updated -
    Python
    MIT License
  • -
    security
    A
    license
    -
    quality
    A server that generates MP3 audio files from text using Kokoro TTS technology with optional S3 upload capabilities.
    Last updated -
    Python
    Apache 2.0
    • Apple
  • -
    security
    A
    license
    -
    quality
    A service that extracts and transcribes audio content from videos across 1000+ streaming websites including YouTube, Bilibili, TikTok, and Twitter, supporting multiple transcription providers like Deepgram, Gladia, Speechmatics, and AssemblyAI.
    Last updated -
    5
    Python
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol server that enables AI assistants like Claude to generate lyrics, songs, and background music through Mureka's APIs.
    Last updated -
    4
    Python
    MIT License
    • Apple
  • -
    security
    A
    license
    -
    quality
    Use HuggingFace Spaces directly from Claude. Use Open Source Image Generation, Chat, Vision tasks and more. Supports Image, Audio and text uploads/downloads.
    Last updated -
    2
    188
    241
    TypeScript
    MIT License
    • Apple
  • -
    security
    A
    license
    -
    quality
    Enables recording audio from a microphone and transcribing it using OpenAI's Whisper model. Works as both a standalone MCP server and a Goose AI agent extension.
    Last updated -
    4
    Python
    MIT License
  • -
    security
    A
    license
    -
    quality
    Enables Claude and other AI assistants to interact with your computer's audio system, allowing for recording from microphones and playing audio through speakers.
    Last updated -
    1
    Python
    MIT License
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A Python-based server that enables accessing and analyzing bird detection data through the Model Context Protocol, offering features like filtering detections, accessing audio recordings, and generating reports.
    Last updated -
    3
    Python
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A Model Context Protocol server that provides text-to-speech capabilities using the Kokoro TTS model, offering multiple voice options and customizable speech parameters.
    Last updated -
    239
    JavaScript
    • Apple
    • Linux
  • -
    security
    F
    license
    -
    quality
    An MCP server that plays notification sounds when AI coding assistants like Windsurf or Cursor require user attention, such as when coding is complete or when user approval is needed.
    Last updated -
    124
    1
    TypeScript
    • Apple
  • -
    security
    F
    license
    -
    quality
    A MIDI composition system that enables AI assistants to create music through FluidSynth, with capabilities for playing notes, creating melodies, managing tracks, and exporting audio.
    Last updated -
    Python
  • -
    security
    -
    license
    -
    quality
    A Model Context Protocol server that provides a convenient interface for creating lipsynced videos by matching digital avatar videos with audio inputs.
    Last updated -
    1
    Python
  • -
    security
    -
    license
    -
    quality
    A lightweight server that exposes FFmpeg's video processing capabilities to AI assistants through the Model Context Protocol (MCP), supporting operations like video format conversion, audio extraction, and adding watermarks.
    Last updated -
    9
    TypeScript
    MIT License
  • -
    security
    F
    license
    -
    quality
    Provides audio feedback by playing sound effects when Cursor AI completes code generation, creating a more interactive coding experience.
    Last updated -
    15
    TypeScript
  • -
    security
    F
    license
    -
    quality
    Plays sound effects when Cursor AI completes code generation, providing audio feedback for a more interactive coding experience.
    Last updated -
    TypeScript
  • -
    security
    F
    license
    -
    quality
    A Node.js server that enables video manipulation through natural language requests, including resizing videos to different resolutions (360p to 1080p) and extracting audio in various formats (MP3, AAC, WAV, OGG).
    Last updated -
    34
    2
    TypeScript
    • Apple
    • Linux
  • -
    security
    F
    license
    -
    quality
    Enables seamless integration with Typecast API through the Model Context Protocol, allowing clients to manage voices, convert text to speech, and play audio in a standardized way.
    Last updated -
    1
    Python
  • -
    security
    F
    license
    -
    quality
    A Model Context Protocol server that enables AI agents to create fully mixed and mastered tracks in REAPER DAW, supporting project management, MIDI composition, audio recording, and mixing automation.
    Last updated -
    37
    Python
    • Apple
  • -
    security
    F
    license
    -
    quality
    An MCP server that connects Claude to FL Studio, allowing the AI to compose music, control instruments, and live record melodies, chords, and drums to the piano roll.
    Last updated -
    8
    Python
    • Apple
  • -
    security
    -
    license
    -
    quality
    Official ElevenLabs Model Context Protocol server that enables AI assistants like Claude to interact with Text to Speech and audio processing APIs, allowing them to generate speech, clone voices, transcribe audio, and create soundscapes.
    Last updated -
    Python
    MIT License
  • -
    security
    F
    license
    -
    quality
    A FastMCP server that creates a virtual MIDI output port, allowing LLMs to generate and send MIDI data to any software that accepts MIDI input.
    Last updated -
    1
    Python