Integrations
- Docker: enables containerized deployment of the transcription service, making it portable and providing a consistent runtime environment.
- FFmpeg: provides audio file processing, allowing the transcription service to handle audio formats such as .wav, .mp3, .ogg, and .m4a.
- OpenAI: integrates with OpenAI's Whisper models to provide high-quality, multi-language audio transcription with a choice of model sizes.
MCP Audio Transcriber
A portable, Dockerized Python tool that implements a Model Context Protocol (MCP) for audio transcription using OpenAI's Whisper models. It also ships with a Streamlit-powered web UI, so you can upload an audio file and download the transcription as JSON.
🚀 Features
- Modular MCP interface (mcp.py) that defines a standard ModelContextProtocol.
- Whisper-based implementation (WhisperMCP) for high-quality, multi-language transcription.
- Command-line interface (app.py) for batch or ad-hoc transcription.
- Docker support for a consistent runtime.
- Streamlit web app (streamlit_app.py) letting end users:
  - Upload any common audio file (.wav, .mp3, .ogg, .m4a)
  - Choose a Whisper model size
  - Preview the transcription live
  - Download the JSON result with one click
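To make the split between the interface and its Whisper-backed implementation concrete, here is a minimal sketch of how the two might relate. The class names come from the feature list above, but the method name transcribe, the constructor argument, and the shape of the returned dictionary are assumptions, not the project's actual code. EchoMCP is a hypothetical stand-in added only so the interface can be exercised without downloading a model.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ModelContextProtocol(ABC):
    """Assumed shape of the interface in mcp.py: one method, path in, dict out."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> Dict[str, Any]:
        """Transcribe the audio file at audio_path and return a JSON-ready dict."""


class WhisperMCP(ModelContextProtocol):
    """Sketch of a Whisper-backed implementation (requires the openai-whisper package)."""

    def __init__(self, model_name: str = "base") -> None:
        # Imported lazily so the interface stays importable without Whisper installed.
        import whisper

        self.model = whisper.load_model(model_name)

    def transcribe(self, audio_path: str) -> Dict[str, Any]:
        result = self.model.transcribe(audio_path)
        # Keep only a few fields; the real tool may return more or different keys.
        return {
            "text": result["text"],
            "language": result.get("language"),
            "segments": result.get("segments", []),
        }


class EchoMCP(ModelContextProtocol):
    """Hypothetical stand-in that returns a placeholder instead of running a model."""

    def transcribe(self, audio_path: str) -> Dict[str, Any]:
        return {"text": f"(placeholder for {audio_path})", "language": None, "segments": []}
```

Because callers depend only on the abstract interface, a different speech model could be swapped in by adding another subclass, which is presumably what "modular" refers to here.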
📦 Prerequisites
- Python 3.10+
- ffmpeg installed & on your PATH
- (Optional) Docker Engine / Docker Desktop
- (Optional) Streamlit
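If you want to confirm the required prerequisites before installing anything else, a small check like the following can help. This is a sketch that is not part of the project; the function name check_prerequisites is made up for illustration.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("ffmpeg",)):
    """Return a list of problems found; an empty list means everything was found."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling check_prerequisites() with no arguments checks for Python 3.10+ and ffmpeg; pass tools=("ffmpeg", "docker") to include the optional Docker check.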
🔧 Installation
- Clone the repo
- Python dependencies & FFmpeg
- (Optional) Docker
  - Install Docker Desktop
  - Enable WSL integration if using WSL2.
- (Optional) Streamlit
🎯 Usage
1. CLI Transcription
- <input_audio>: path to your audio file
- <output_json>: path where the JSON result will be saved
- --model: choose Whisper model size (default: base)
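The arguments above map naturally onto Python's argparse. The sketch below shows one way a CLI with this shape could be wired up. It is not the contents of app.py: the helper names build_parser and save_result are hypothetical, and restricting --model to the standard Whisper size names is an assumption.

```python
import argparse
import json
from pathlib import Path

# Standard Whisper model size names; the real CLI may accept others.
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Transcribe an audio file to JSON.")
    parser.add_argument("input_audio", help="path to your audio file")
    parser.add_argument("output_json", help="path where the JSON result will be saved")
    parser.add_argument("--model", default="base", choices=WHISPER_SIZES,
                        help="Whisper model size (default: base)")
    return parser


def save_result(result: dict, output_json: str) -> None:
    """Write the transcription dict as UTF-8 JSON, creating parent folders if needed."""
    path = Path(output_json)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
```

Using ensure_ascii=False keeps non-English transcriptions readable in the output file rather than escaping them.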
Example:
2. Docker
Build the image:
Run it (mounting your data/ folder):
Then inspect:
3. Streamlit Web UI
Launch the app:
- Open http://localhost:8501 in your browser
- Upload an audio file
- Select the Whisper model size
- Click Transcribe
- Preview & download the resulting JSON
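Behind these steps, a web app of this kind typically has to bridge two things: the browser upload arrives as in-memory bytes, while Whisper's transcribe call is usually given a file path. The Streamlit-free sketch below shows one way to handle that and to prepare the JSON for download. The helper names save_upload and to_download_bytes are hypothetical and are not taken from streamlit_app.py.

```python
import json
import tempfile
from pathlib import Path

# Formats listed in the feature list above.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".ogg", ".m4a"}


def save_upload(data: bytes, filename: str) -> str:
    """Persist uploaded bytes to a temp file, keeping the suffix so ffmpeg can detect the format."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported audio type: {suffix or filename}")
    # delete=False so the file survives the with-block; the caller removes it later.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


def to_download_bytes(result: dict) -> bytes:
    """Serialize a transcription dict to UTF-8 JSON bytes for a download button."""
    return json.dumps(result, ensure_ascii=False, indent=2).encode("utf-8")
```

In a Streamlit script, the bytes and filename would come from the object returned by st.file_uploader (its getvalue() method and name attribute), and the output of to_download_bytes could be passed as the data argument of st.download_button.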
📁 Project Structure