Enables containerized deployment of the transcription service, making it portable and providing a consistent runtime environment.
Provides audio file processing capabilities, allowing the transcription service to handle various audio formats like .wav, .mp3, .ogg, and .m4a.
Integrates with OpenAI's Whisper models to provide high-quality, multi-language audio transcription with options for different model sizes.
Offers a web-based user interface for uploading audio files, selecting model parameters, previewing transcriptions, and downloading results.
MCP Audio Transcriber
A Dockerized Python tool that implements the Model Context Protocol (MCP) via AssemblyAI's API. Upload or point to an audio file, and receive a structured JSON transcription.
Features
- AssemblyMCP: a concrete MCP implementation that uses AssemblyAI's REST API
- Command-line interface (app.py)
- Streamlit web UI (streamlit_app.py):
  - Upload local files or paste URLs
  - Click Transcribe
  - Preview transcript and download JSON
- Docker support for environment consistency and portability
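To make the AssemblyMCP idea more concrete, here is a minimal sketch of how a transcriber interface could wrap AssemblyAI's REST API. The class names `TranscriberMCP` and `AssemblyMCPSketch`, the `send` parameter, and the polling interval are assumptions made for illustration; the project's actual classes may look different. The sketch only covers audio that is already reachable by URL (submit a job to `/v2/transcript`, then poll it until it completes or fails).

```python
import json
import time
import urllib.request
from abc import ABC, abstractmethod

API_BASE = "https://api.assemblyai.com/v2"


def http_json(method, url, api_key, payload=None):
    """Send one JSON request with urllib and return the decoded JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("authorization", api_key)
    if data is not None:
        request.add_header("content-type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class TranscriberMCP(ABC):
    """Hypothetical base interface; not taken from the project's source."""

    @abstractmethod
    def transcribe(self, audio_url):
        """Return the transcription result as a dict."""


class AssemblyMCPSketch(TranscriberMCP):
    """Illustrative implementation backed by AssemblyAI's REST API."""

    def __init__(self, api_key, send=http_json, poll_seconds=3.0):
        self.api_key = api_key
        self.send = send  # injectable transport, which keeps the class testable offline
        self.poll_seconds = poll_seconds

    def transcribe(self, audio_url):
        # Step 1: submit the transcription job for a publicly reachable URL.
        job = self.send(
            "POST", f"{API_BASE}/transcript", self.api_key, {"audio_url": audio_url}
        )
        # Step 2: poll the job until it reaches a terminal status.
        while True:
            result = self.send(
                "GET", f"{API_BASE}/transcript/{job['id']}", self.api_key
            )
            if result["status"] == "completed":
                return result
            if result["status"] == "error":
                raise RuntimeError(result.get("error", "transcription failed"))
            time.sleep(self.poll_seconds)
```

Passing the transport in as `send` is a design choice for this sketch: it lets the polling logic be exercised with a fake function, without a network connection or an API key.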
Prerequisites
- Python 3.10+
- An AssemblyAI API key
- ffmpeg (for local decoding, if using local files)
- (Optional) Docker Desktop / Engine
- (Optional) Streamlit (pip install streamlit)
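The prerequisites above can be verified with a short script before the first run. This is a sketch only: the function name `check_prerequisites` and the environment variable name `ASSEMBLYAI_API_KEY` are assumptions, so adjust them to whatever the project actually reads.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []

    # Python 3.10+ is listed as a hard requirement.
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")

    # ASSEMBLYAI_API_KEY is an assumed variable name, not confirmed by the README.
    if not env.get("ASSEMBLYAI_API_KEY"):
        problems.append("AssemblyAI API key not set (expected ASSEMBLYAI_API_KEY)")

    # ffmpeg is only needed when decoding local files.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (needed only for local files)")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```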
🔧 Installation
- Clone the repo
- Create a .env
- Ensure .gitignore contains:
- Install Python dependencies
- Install ffmpeg
  - Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg -y
  - Windows: download from https://ffmpeg.org and add its bin/ to your PATH
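Once a .env file exists, its values have to reach the process environment. A library such as python-dotenv is a common choice for that; the sketch below shows the same idea with only the standard library. The helper names and the `ASSEMBLYAI_API_KEY` key used in the example are assumptions, since the expected contents of .env are not shown here.

```python
import os
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path=".env", environ=None):
    """Copy values from a .env file into the environment without overriding existing ones."""
    environ = os.environ if environ is None else environ
    file_path = Path(path)
    if not file_path.exists():
        return {}
    values = parse_env_text(file_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    load_env_file()
    # ASSEMBLYAI_API_KEY is an assumed variable name.
    print("API key configured:", bool(os.environ.get("ASSEMBLYAI_API_KEY")))
```

Using `setdefault` means a key already exported in the shell takes precedence over the file, which is handy when the same image runs in Docker with variables passed at runtime.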
Usage
1. CLI Transcription
- <input_audio>: any file or URL supported by AssemblyAI
- <output_json>: path for the generated JSON
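A command-line entry point taking these two positional arguments could be wired up roughly as follows. This is a sketch, not the contents of app.py: the function names and the `transcribe` callable passed into `main` are hypothetical stand-ins for the project's real transcription backend.

```python
import argparse
import json
from pathlib import Path


def build_parser():
    """Define the two positional arguments described above."""
    parser = argparse.ArgumentParser(
        prog="app.py", description="Transcribe an audio file or URL to JSON."
    )
    parser.add_argument("input_audio", help="audio file path or URL")
    parser.add_argument("output_json", help="path for the generated JSON")
    return parser


def write_transcript(result, output_json):
    """Write the transcription dict as pretty-printed UTF-8 JSON."""
    path = Path(output_json)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path


def main(argv, transcribe):
    """Parse arguments, run the supplied backend, and save its result.

    `transcribe` is a hypothetical callable: it receives the input path or URL
    and returns a JSON-serializable dict.
    """
    args = build_parser().parse_args(argv)
    result = transcribe(args.input_audio)
    write_transcript(result, args.output_json)
    return 0
```

Keeping the backend as a parameter separates argument handling from the API call, so the same `main` can be driven by a real client or by a stub during testing.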
Example:
2. Streamlit Web UI
- Open http://localhost:8501
- Upload or enter an audio URL
- Click Transcribe
- Download the JSON result
3. Docker
Build the image:
Run it (mounting your data/ folder):
Then inspect:
Windows PowerShell:
Project Structure