MCP Audio Transcriber

A Dockerized Python tool that implements the Model Context Protocol (MCP) via AssemblyAI's API. Upload or point to an audio file, and receive a structured JSON transcription.

Features

AssemblyMCP: a concrete MCP implementation that uses AssemblyAI's REST API
Command-line interface (app.py):
python app.py <input_audio> <output_json>
Streamlit web UI (streamlit_app.py):
- Upload local files or paste URLs
- Click Transcribe
- Preview transcript and download JSON
Docker support for environment consistency and portability

Prerequisites

Python 3.10+
An AssemblyAI API key
ffmpeg (for local decoding, if using local files)
(Optional) Docker Desktop / Engine
(Optional) Streamlit (pip install streamlit)

🔧 Installation

Clone the repo
git clone https://github.com/ShreyasTembhare/MCP---Audio-Transcriber.git cd MCP---Audio-Transcriber
Create a .env
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
Ensure .gitignore contains:
.env
Install Python dependencies
pip install --upgrade pip pip install -r requirements.txt
Install ffmpeg
- Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg -y
- Windows: download from https://ffmpeg.org and add its bin/ to your PATH

Usage

1. CLI Transcription

python app.py <input_audio> <output_json>

<input_audio>: any file or URL supported by AssemblyAI
<output_json>: path for the generated JSON

Example:

python app.py data/input.ogg data/output.json
cat data/output.json

2. Streamlit Web UI

streamlit run streamlit_app.py

Open http://localhost:8501
Upload or enter an audio URL
Click Transcribe
Download the JSON result

3. Docker

Build the image:

docker build -t mcp-transcriber .

Run it (mounting your data/ folder):

docker run --rm \
  -e ASSEMBLYAI_API_KEY="$ASSEMBLYAI_API_KEY" \
  -v "$(pwd)/data:/data" \
  mcp-transcriber:latest \
  /data/input.ogg /data/output.json

Then inspect:

ls data/output.json
cat data/output.json

Windows PowerShell:

docker run --rm `
  -e ASSEMBLYAI_API_KEY=$env:ASSEMBLYAI_API_KEY `
  -v "${PWD}\data:/data" `
  mcp-transcriber:latest `
  /data/input.ogg /data/output.json

Project Structure

MCP-Audio-Transcriber/
├── app.py               # CLI entrypoint (AssemblyMCP only)
├── mcp.py               # ModelContextProtocol + AssemblyMCP
├── streamlit_app.py     # Streamlit interface
├── requirements.txt     # assemblyai, python-dotenv, streamlit, etc.
├── Dockerfile           # builds the container
├── .gitignore           # ignores .env, __pycache__, etc.
├── LICENSE              # MIT license
└── data/                # sample input and output
    ├── input.ogg
    └── output.json

This server cannot be installed

security - not tested

license - permissive license

quality - not tested

How are these scores calculated?

local-only server

The server can only run on the client's local machine because it depends on local resources.

A portable, Dockerized Python tool that implements Model Context Protocol for audio transcription using Whisper models, featuring both CLI and web UI interfaces for converting audio files to JSON transcriptions.

Related MCP Servers

Kokoro TTS MCP Server
giannisanni
-
security
F
license
-
quality
Provides text-to-speech capabilities through the Model Context Protocol, allowing applications to easily integrate speech synthesis with customizable voices, adjustable speech speed, and cross-platform audio playback support.
Last updated -
6
Python
Sonic Pi MCP
abhishekjairath
-
security
A
license
-
quality
A Model Context Protocol server that allows AI assistants like Claude and Cursor to create music and control Sonic Pi programmatically through OSC messages.
Last updated -
331
7
TypeScript
MIT License
Audio Transcriber MCP Server
Ichigo3766
A
security
A
license
A
quality
A MCP server that enables transcription of audio files using OpenAI's Speech-to-Text API, with support for multiple languages and file saving options.
Last updated -
1
2
7
JavaScript
MIT License
Rime MCP
MatthewDailey
A
security
A
license
A
quality
A Model Context Protocol server that enables AI models to generate and play high-quality text-to-speech audio through your device's native audio system using Rime's voice synthesis API.
Last updated -
1
6
8
JavaScript
The Unlicense

View all related MCP servers

MCP Audio Transcriber