Skip to main content
Glama

Open-Source MCP servers

Production-ready MCP servers that extend AI capabilities through file access, database connections, API integrations, and other contextual services.

6,854 servers. Last updated -

Matching MCP tools:

Matching MCP servers:

  • A
    security
    A
    license
    A
    quality
    A MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options.
    Last updated -
    1
    2
    JavaScript
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A portable, Dockerized Python tool that implements Model Context Protocol for audio transcription using Whisper models, featuring both CLI and web UI interfaces for converting audio files to JSON transcriptions.
    Last updated -
    Python
    MIT License
    • Linux
  • -
    security
    A
    license
    -
    quality
    A voice-to-text transcription service that converts audio files to transcripts using SiliconFlow, supporting both multipart/form-data and base64 formats.
    Last updated -
    1
    Python
    Apache 2.0
  • A
    security
    A
    license
    A
    quality
    A document conversion server that transforms various file formats (PDFs, documents, images, audio, web content) to Markdown with improved multilingual and UTF-8 support.
    Last updated -
    10
    4
    TypeScript
    MIT License
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    Converts various file types and web content to Markdown format. It provides a set of tools to transform PDFs, images, audio files, web pages, and more into easily readable and shareable Markdown text.
    Last updated -
    10
    2
    1,611
    TypeScript
    MIT License
    • Apple

Interested in MCP?

Join the MCP community for support and updates.

RedditDiscord
  • -
    security
    A
    license
    -
    quality
    Enables Claude and other AI assistants to interact with your computer's audio system, allowing for recording from microphones and playing audio through speakers.
    Last updated -
    2
    Python
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    Provides powerful video and audio editing capabilities through FFmpeg, enabling AI assistants to perform professional-grade operations including format conversion, trimming, overlays, transitions, and advanced audio processing.
    Last updated -
    4
    Python
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    Provides tools for image, audio, and video recognition using Google's Gemini AI through the Model Context Protocol.
    Last updated -
    3
    6
    TypeScript
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A server that allows Claude to control audio playback on your computer, supporting MP3, WAV, and OGG files with features like play, list, and stop commands.
    Last updated -
    1
    Python
    MIT License
    • Apple
    • Linux
  • A
    security
    A
    license
    A
    quality
    An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
    Last updated -
    19
    815
    Python
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables AI assistants to generate images, text, and audio through the Pollinations APIs without requiring authentication.
    Last updated -
    7
    48
    19
    JavaScript
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    Use HuggingFace Spaces directly from Claude. Use Open Source Image Generation, Chat, Vision tasks and more. Supports Image, Audio and text uploads/downloads.
    Last updated -
    2
    922
    292
    TypeScript
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables real-time interaction with Ableton Live, allowing AI assistants to control song creation, track management, clip operations, and audio recording workflows.
    Last updated -
    23
    153
    24
    TypeScript
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A service that extracts and transcribes audio content from videos across 1000+ streaming websites including YouTube, Bilibili, TikTok, and Twitter, supporting multiple transcription providers like Deepgram, Gladia, Speechmatics, and AssemblyAI.
    Last updated -
    5
    Python
    MIT License
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    An MCP server that downloads videos/extracts audio from various platforms like YouTube, Bilibili, and TikTok, then transcribes them to text using OpenAI's Whisper model.
    Last updated -
    2
    Python
    • Linux
    • Apple
  • A
    security
    F
    license
    A
    quality
    A Model Context Protocol server that allows AI assistants to generate music through the Suno API, supporting custom lyrics and style inputs or inspiration-based creation.
    Last updated -
    1
    4
    JavaScript
  • A
    security
    A
    license
    A
    quality
    A Node.js implementation of the Kagi Model Context Protocol server that enables Claude AI to search the web and summarize documents, videos, and audio using Kagi's APIs.
    Last updated -
    2
    JavaScript
    MIT License
    • Apple
  • A
    security
    F
    license
    A
    quality
    An MCP server designed to work with FFmpeg for media processing tasks, offering enhanced performance and secure communication for handling media processing requests.
    Last updated -
    2
    27
    11
    TypeScript
  • A
    security
    F
    license
    A
    quality
    A Python-based server that provides access to Whissle API endpoints for speech-to-text, diarization, translation, and text summarization.
    Last updated -
    5
    1
    Python
    • Linux
    • Apple
  • A
    security
    A
    license
    A
    quality
    Provides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs.
    Last updated -
    4
    8
    TypeScript
    MIT License
  • A
    security
    A
    license
    A
    quality
    Model Context Protocol server that enables interaction with Mobvoi's Text to Speech and Voice Clone APIs, allowing MCP clients like Cursor, Claude Desktop, and Cline to generate speech and clone voices.
    Last updated -
    4
    1
    Python
    MIT License
    • Apple
    • Linux
  • -
    security
    A
    license
    -
    quality
    An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis.
    Last updated -
    546
    4
    TypeScript
    MIT License
  • -
    security
    F
    license
    -
    quality
    Simple MCP server that returns the transcription of a Youtube video using url and desired language.
    Last updated -
    Python
  • -
    security
    A
    license
    -
    quality
    An MCP server that enables LLMs to generate spoken audio from text using OpenAI's Text-to-Speech API, supporting various voices, models, and audio formats.
    Last updated -
    4
    1
    JavaScript
    MIT License
  • -
    security
    F
    license
    -
    quality
    Converts various file types (documents, images, audio, web content) to markdown format without requiring Docker, supporting PDF, Word, Excel, PowerPoint, images, audio files, web URLs, and more.
    Last updated -
    98
    1
    JavaScript
    • Apple
    • Linux
  • -
    security
    -
    license
    -
    quality
    Official Model Context Protocol server that enables interaction with powerful Speech-to-Text and Audio Intelligence APIs, allowing clients like Claude Desktop to transcribe audio, analyze speech, translate content, and more.
    Last updated -
    Python
    MIT License
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables AI agents to join and interact with online meetings (Zoom and Google Meet), capturing transcripts and recordings to generate meeting summaries.
    Last updated -
    3
    3
    TypeScript
    MIT License
  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that enables AI models to generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API.
    Last updated -
    1
    15
    4
    JavaScript
    The Unlicense
    • Apple
    • Linux
  • -
    security
    -
    license
    -
    quality
    Official ElevenLabs Model Context Protocol server that enables AI assistants like Claude to interact with Text to Speech and audio processing APIs, allowing them to generate speech, clone voices, transcribe audio, and create soundscapes.
    Last updated -
    Python
    MIT License