# crispasr-agent-transcriber
Local-only transcription for Codex and MCP-based AI agents, powered by CrispASR. No cloud uploads, no API keys required for transcription.
What it does
Give it a local audio or video file. It:
Probes the spoken language (English or Chinese) using CrispASR's FireRed LID.
Starts a local CrispASR server with the right backend -- Cohere Transcribe for English, Qwen3-ASR for Chinese.
Extracts audio from video with ffmpeg when needed.
Calls CrispASR's /v1/audio/transcriptions endpoint.
Writes the transcript and metadata to disk.
Everything runs on your machine. Media never leaves it.
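As a rough illustration of the HTTP step, here is a minimal sketch that builds (but does not send) a multipart POST for /v1/audio/transcriptions using only the standard library. The form field names file and response_format are assumptions based on the common OpenAI-style convention, and build_transcription_request is a hypothetical helper, not part of this tool.

```python
import urllib.request
import uuid
from pathlib import Path


def build_transcription_request(
    server_url: str, media_path: Path, response_format: str = "verbose_json"
) -> urllib.request.Request:
    """Build a multipart/form-data POST without sending it."""
    boundary = uuid.uuid4().hex
    # Assumed field names: "response_format" and "file" (not confirmed by the README).
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        f"{response_format}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{media_path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + media_path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it to a server that is already running locally.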
Quick start
Prerequisites: Python 3.11+, uv, ffmpeg, and three model files (see below).
git clone https://github.com/EmiyaKatuz/crispasr-agent-transcriber.git
cd crispasr-agent-transcriber
# Install Python dependencies
uv sync --extra dev
# Install the CrispASR binary (auto-detects GPU: CUDA > Vulkan > CPU)
uv run python scripts/transcribe.py --install-crispasr
# Transcribe a file
uv run python scripts/transcribe.py sample.mp4 --profile auto `
  --manage-server `
  --lid-backend firered --lid-model models\firered-lid-q2_k.gguf `
  --model models\cohere-transcribe.gguf `
  --format verbose_json

Or run .\scripts\setup.ps1 for a guided first-time setup.
Required models
This tool does not download models automatically. Download these three
GGUF files and keep them in a local directory (the repo's models/ folder
works well):
Purpose | File | ~Size | Source |
English ASR | cohere-transcribe.gguf | 3.9 GB | |
Chinese ASR | qwen3-asr-1.7b-q4_k.gguf | 1.3 GB | |
Language detection | firered-lid-q2_k.gguf | 350 MB | |
Pass them on every run:
--model models\cohere-transcribe.gguf
--lid-backend firered --lid-model models\firered-lid-q2_k.gguf

CrispASR binary management
The tool auto-detects, installs, and updates the CrispASR binary from GitHub releases.
Flag | Effect |
--install-crispasr | Download latest platform binary to |
--update-crispasr | Upgrade to newest release |
--crispasr-status | Show installed version + update availability |
--crispasr-bin-dir PATH | Custom directory (default |
--crispasr-bin PATH | Exact path to |
When --manage-server is set and no binary is found, it auto-installs before
starting the server.
GPU detection
On install and update, the tool checks your hardware:
CUDA -- nvidia-smi available, or CUDA_PATH/CUDA_HOME set, or CUDA in PATH -> downloads crispasr-*-cuda variant.
Vulkan -- vulkaninfo or VULKAN_SDK set (only when CUDA is absent) -> downloads crispasr-*-vulkan variant.
CPU -- fallback when no GPU toolkit is detected.
macOS always uses the universal binary.
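The priority order above can be sketched as a small pure function. This is an illustration only: detect_gpu_variant is a hypothetical name, and the substring match used for "CUDA in PATH" is an assumption about how that check might work. The macOS universal-binary case is not covered.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def detect_gpu_variant(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "cuda", "vulkan", or "cpu" following CUDA > Vulkan > CPU."""
    # Assumption: "CUDA in PATH" means some PATH entry mentions "cuda".
    path_entries = env.get("PATH", "").split(os.pathsep)
    cuda_in_path = any("cuda" in entry.lower() for entry in path_entries)

    if which("nvidia-smi") or env.get("CUDA_PATH") or env.get("CUDA_HOME") or cuda_in_path:
        return "cuda"
    # Vulkan is only considered once every CUDA signal is absent.
    if which("vulkaninfo") or env.get("VULKAN_SDK"):
        return "vulkan"
    return "cpu"
```

Taking env and which as parameters keeps the function testable on machines without any GPU toolkit installed.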
Profiles
Profile | Backend | ASR model | Language hint |
english | cohere | Cohere Transcribe 03-2026 | |
chinese | | Qwen3-ASR 1.7B | |
auto | determined by LID | determined by LID | detected |
auto mode runs FireRed language detection on the media, then routes English
to Cohere Transcribe and Chinese to Qwen3-ASR 1.7B. Mixed or uncertain content
stops with a clear error asking you to re-run with --profile english or
--profile chinese.
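The routing rule can be sketched as follows. The language codes "en" and "zh", the confidence score, and the 0.8 threshold are all assumptions made for illustration; the README does not say how the tool decides that content is mixed or uncertain.

```python
from dataclasses import dataclass


class AmbiguousLanguageError(ValueError):
    """Raised when detection is not confident enough to pick a profile."""


@dataclass(frozen=True)
class Route:
    profile: str
    model_family: str


# Hypothetical mapping from a detected language code to a profile.
ROUTES = {
    "en": Route("english", "Cohere Transcribe"),
    "zh": Route("chinese", "Qwen3-ASR 1.7B"),
}


def route_by_language(lang: str, confidence: float, min_confidence: float = 0.8) -> Route:
    """Pick a profile from a language-detection result, or fail loudly."""
    if lang not in ROUTES or confidence < min_confidence:
        raise AmbiguousLanguageError(
            f"Could not confidently detect English or Chinese (got {lang!r}, "
            f"confidence {confidence:.2f}); re-run with --profile english "
            "or --profile chinese."
        )
    return ROUTES[lang]
```

Failing with an explicit error rather than guessing keeps a mixed-language file from being silently sent to the wrong model.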
Usage
Managed server (tool starts CrispASR for you)
uv run python scripts/transcribe.py sample.wav `
  --profile auto `
  --manage-server `
  --model models\qwen3-asr-1.7b-q4_k.gguf `
  --lid-backend firered --lid-model models\firered-lid-q2_k.gguf `
  --format srt `
  --out-dir outputs

Add --keep-server to leave the server running after transcription.
Manual server (you start CrispASR)
# Terminal 1 -- start the server
crispasr --server --backend cohere `
-m models\cohere-transcribe.gguf `
--port 8080
# Terminal 2 -- transcribe
uv run python scripts/transcribe.py sample.mp4 `
  --profile english `
  --server-url http://127.0.0.1:8080 `
  --format verbose_json

If the running server's backend doesn't match the selected profile, the tool prints the exact command you need to start the correct server.
Output formats
Format | File extension | Contents |
text | | Plain transcript |
verbose_json | | Full response with segments |
srt | | SubRip subtitles |
vtt | | WebVTT subtitles |
A .metadata.json sidecar is always written alongside the transcript.
Video files
Video files are detected automatically. ffmpeg extracts the audio track to a temporary mono 16 kHz WAV before sending it to CrispASR. The temporary file is deleted when transcription finishes.
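A minimal sketch of that extraction step, assuming ffmpeg is on PATH. The helper names are hypothetical and the exact flags the tool passes are not documented here; the ones below are standard ffmpeg options for dropping video (-vn), downmixing to mono (-ac 1), and resampling to 16 kHz (-ar 16000).

```python
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def ffmpeg_extract_args(src: Path, dst: Path) -> list[str]:
    """Argument list for extracting a mono 16 kHz WAV; no shell string is built."""
    return [
        "ffmpeg", "-nostdin", "-y",
        "-i", str(src),
        "-vn",           # ignore the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        str(dst),
    ]


@contextmanager
def extracted_wav(src: Path) -> Iterator[Path]:
    """Yield a temporary WAV path that is removed when the block exits."""
    with tempfile.TemporaryDirectory() as tmp:
        dst = Path(tmp) / f"{src.stem}.16k.wav"
        # A list of arguments with the default shell=False avoids shell parsing.
        subprocess.run(ffmpeg_extract_args(src, dst), check=True)
        yield dst
```

Used as `with extracted_wav(Path("sample.mp4")) as wav: ...`, the temporary directory and the WAV inside it are deleted once the block finishes, even if transcription raises.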
All CLI flags
--profile auto|english|chinese
--format text|verbose_json|srt|vtt
--out-dir PATH
--server-url URL
--allow-remote-server
--manage-server
--keep-server
--model PATH Local GGUF model path
--allow-model-auto-download
--lid-model PATH Local LID model path
--lid-backend firered|silero|ecapa|whisper
--host HOST Managed server host (default 127.0.0.1)
--port PORT Managed server port (default 8080)
--language CODE Language hint for transcription
--prompt TEXT Initial prompt/context
--vad Enable voice activity detection
--diarize Enable speaker diarization
--diarize-method METHOD
--hotwords WORD,WORD Comma-separated hotwords
--no-timestamps
--preprocess auto|always|never
--api-key KEY If CRISPASR_API_KEYS is enabled
--crispasr-bin-dir PATH
--crispasr-bin PATH
--install-crispasr
--update-crispasr
--crispasr-status

MCP server
uv sync --extra mcp
uv run python -m crispasr_mcp.server

Exposed tools:
Tool | Description |
| Check CrispASR server health |
| List available backends |
| Run language detection on a file |
| Transcribe an audio file |
| Transcribe a video file |
| Batch-transcribe a folder |
Security model
No cloud uploads. Media files stay on the local filesystem.
No remote servers by default. --server-url only accepts localhost unless --allow-remote-server is explicitly passed.
No URL inputs. Only local file paths are accepted. URLs, S3, and other remote schemes are rejected.
No shell injection. ffmpeg is called with argument lists and shell=False. No user-controlled strings are interpolated into shell commands.
No model downloads by default. CrispASR model auto-download (-m auto) requires --allow-model-auto-download. The same guard applies to language detection models.
Temporary files are cleaned up. Converted WAV files and LID probe windows are deleted when transcription finishes.
Binary downloads are explicit. CrispASR binary installs only from the official CrispStrobe/CrispASR GitHub releases.
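Two of these guards, the localhost-only server URL and the rejection of URL-style inputs, might look roughly like the sketch below. The function names and the exact matching rules are assumptions for illustration, not the tool's actual implementation.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_server_url(url: str) -> bool:
    """True when the URL's host is "localhost" or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote.
        return False


def check_server_url(url: str, allow_remote: bool = False) -> str:
    """Return the URL, or raise unless it is local or remote use was opted into."""
    if not allow_remote and not is_local_server_url(url):
        raise ValueError(f"{url} is not local; pass --allow-remote-server to use it.")
    return url


def looks_like_remote_input(value: str) -> bool:
    """Assumed heuristic: anything with a scheme separator is not a local path."""
    return "://" in value
```

Checking the parsed hostname rather than the raw string avoids being fooled by URLs such as http://localhost.example.com.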
Verify
uv run pytest # 52 tests
uv run ruff check . # zero lint warnings

License
This project is licensed under the MIT License.
Third-party components and attribution
This tool orchestrates several independently-licensed projects. It does not bundle, fork, or redistribute their code -- it downloads pre-built binaries and calls them as subprocesses or HTTP services at runtime.
Component | License | Role |
CrispASR | MIT | ASR engine, server, language detection |
ffmpeg | LGPL 2.1+ / GPL 2+ | Media decoding and audio extraction |
Cohere Transcribe | Cohere model license | English ASR model (loaded by CrispASR) |
Qwen3-ASR | Apache 2.0 | Chinese ASR model (loaded by CrispASR) |
FireRed LID | Apache 2.0 | Language detection model (loaded by CrispASR) |
 | BSD | HTTP client for CrispASR API |
 | MIT | MCP server framework |
Model files must be downloaded separately by the user from their respective HuggingFace repositories. See Required models above.
Related projects
CrispASR -- the ASR engine this tool wraps
CrisperWeaver -- CrispASR's desktop GUI (not used by this tool)