MCP Audio Transcriber

A Dockerized Python tool that implements the Model Context Protocol (MCP) via AssemblyAI's API. Upload or point to an audio file, and receive a structured JSON transcription.

Features

  • AssemblyMCP: a concrete MCP implementation that uses AssemblyAI's REST API
  • Command-line interface (app.py):
    python app.py <input_audio> <output_json>
  • Streamlit web UI (streamlit_app.py):
    • Upload local files or paste URLs
    • Click Transcribe
    • Preview transcript and download JSON
  • Docker support for environment consistency and portability
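The split between an abstract protocol and its AssemblyAI-backed implementation could look roughly like the sketch below. The class names ModelContextProtocol and AssemblyMCP are taken from the project structure; the method name `transcribe`, the output keys, and the helper `transcribe_to_json` are assumptions made for illustration, not the repository's actual code. The `assemblyai` import is deferred so the module loads even when the SDK is not installed.

```python
import json
import os
from abc import ABC, abstractmethod
from typing import Optional


class ModelContextProtocol(ABC):
    """Abstract interface: turn an audio source into a JSON-serializable dict."""

    @abstractmethod
    def transcribe(self, source: str) -> dict:
        """Transcribe a local path or URL (method name is an assumption)."""


class AssemblyMCP(ModelContextProtocol):
    """Hypothetical concrete backend built on the assemblyai SDK."""

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
        if not self.api_key:
            raise ValueError("ASSEMBLYAI_API_KEY is not set")

    def transcribe(self, source: str) -> dict:
        import assemblyai as aai  # deferred: only needed for real requests

        aai.settings.api_key = self.api_key
        transcript = aai.Transcriber().transcribe(source)
        if transcript.status == aai.TranscriptStatus.error:
            raise RuntimeError(transcript.error)
        # These keys are illustrative, not the project's documented schema.
        return {"id": transcript.id, "text": transcript.text}


def transcribe_to_json(backend: ModelContextProtocol, source: str, out_path: str) -> dict:
    """Run any backend and write its result to out_path as JSON."""
    result = backend.transcribe(source)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(result, fh, ensure_ascii=False, indent=2)
    return result
```

Keeping the interface abstract means a stub backend can stand in for AssemblyAI in tests, which avoids network calls and API usage.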

Prerequisites

  • Python 3.10+
  • An AssemblyAI API key
  • ffmpeg (for local decoding, if using local files)
  • (Optional) Docker Desktop / Engine
  • (Optional) Streamlit (pip install streamlit)

🔧 Installation

  1. Clone the repo
    git clone https://github.com/ShreyasTembhare/MCP---Audio-Transcriber.git
    cd MCP---Audio-Transcriber
  2. Create a .env
    ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
  3. Ensure .gitignore contains:
    .env
  4. Install Python dependencies
    pip install --upgrade pip
    pip install -r requirements.txt
  5. Install ffmpeg
    • Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg -y
    • Windows: download from https://ffmpeg.org and add its bin/ to your PATH
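After these steps, a quick preflight check can confirm that the API key and ffmpeg are visible to Python. This is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and it assumes the variable from .env has already been loaded into the process environment (for example by python-dotenv's `load_dotenv()`, or by exporting it in the shell).

```python
import os
import shutil


def check_environment(needs_ffmpeg: bool = True) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not os.environ.get("ASSEMBLYAI_API_KEY"):
        problems.append("ASSEMBLYAI_API_KEY is not set (check your .env)")
    # ffmpeg only matters when decoding local files.
    if needs_ffmpeg and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```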

Usage

1. CLI Transcription

python app.py <input_audio> <output_json>
  • <input_audio>: any file or URL supported by AssemblyAI
  • <output_json>: path for the generated JSON

Example:

python app.py data/input.ogg data/output.json
cat data/output.json
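The two positional arguments could be parsed along the following lines. This is a sketch under assumptions: the README does not show how app.py handles its arguments, so `build_parser` and `is_url` are hypothetical helpers, and the URL check is a simple prefix test rather than full validation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: python app.py <input_audio> <output_json>."""
    parser = argparse.ArgumentParser(
        prog="app.py",
        description="Transcribe an audio file or URL to JSON.",
    )
    parser.add_argument("input_audio", help="local path or URL of the audio")
    parser.add_argument("output_json", help="path for the generated JSON")
    return parser


def is_url(source: str) -> bool:
    """Treat http(s) sources as remote; anything else as a local path."""
    return source.startswith(("http://", "https://"))
```

Distinguishing URLs from local paths is useful because only local files would need to exist on disk before the request is made.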

2. Streamlit Web UI

streamlit run streamlit_app.py

3. Docker

Build the image:

docker build -t mcp-transcriber .

Run it (mounting your data/ folder):

docker run --rm \
  -e ASSEMBLYAI_API_KEY="$ASSEMBLYAI_API_KEY" \
  -v "$(pwd)/data:/data" \
  mcp-transcriber:latest \
  /data/input.ogg /data/output.json

Then inspect:

ls data/output.json
cat data/output.json

Windows PowerShell:

docker run --rm `
  -e ASSEMBLYAI_API_KEY=$env:ASSEMBLYAI_API_KEY `
  -v "${PWD}\data:/data" `
  mcp-transcriber:latest `
  /data/input.ogg /data/output.json
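On either platform, the resulting data/output.json can also be inspected from Python. The README does not document the JSON schema, so the sketch below assumes nothing about field names and only reports the top-level keys and their value types; `summarize_json` is a hypothetical helper.

```python
import json


def summarize_json(path: str) -> dict:
    """Map each top-level key to the type name of its value."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        # Not an object at the root: report the root type instead.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    import sys

    # Optional command-line use: pass the path to a JSON file.
    if len(sys.argv) > 1:
        print(summarize_json(sys.argv[1]))
```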

Project Structure

MCP-Audio-Transcriber/
├── app.py             # CLI entrypoint (AssemblyMCP only)
├── mcp.py             # ModelContextProtocol + AssemblyMCP
├── streamlit_app.py   # Streamlit interface
├── requirements.txt   # assemblyai, python-dotenv, streamlit, etc.
├── Dockerfile         # builds the container
├── .gitignore         # ignores .env, __pycache__, etc.
├── LICENSE            # MIT license
└── data/              # sample input and output
    ├── input.ogg
    └── output.json