Skip to main content
Glama

whisper-windows-mcp

A Windows-native MCP (Model Context Protocol) server that lets Claude Desktop transcribe audio and video files locally using whisper.cpp — with GPU acceleration, multilingual support, and batch processing. All transcription runs locally — no audio, video, or file paths ever leave your machine.

Why does this exist? The popular whisper-mcp package was built for macOS and assumes a Unix environment. It does not work on Windows. This package was written specifically for Windows users who want local AI transcription integrated with Claude Desktop.


What you can do with it

Once installed, you can say things like this directly in Claude Desktop:

  • "Transcribe C:\Users\Me\Downloads\meeting.mp3"

  • "Transcribe this folder of recordings and save each as a text file"

  • "Generate Japanese and English subtitles for this video"

  • "Start a batch transcription of everything in this folder"

  • "How long will it take to transcribe these files?"

  • "Check if GPU acceleration is working"


Related MCP server: Whisper Speech Recognition MCP Server

Requirements

  1. Node.js 18 or laternodejs.org

  2. whisper.cpp binaries with Vulkan GPU support — see Step 1

  3. A Whisper model file — see Step 2

  4. FFmpeg — required for video files and non-WAV/MP3 audio


Step 1 — Install whisper.cpp binaries

Download whisper-vulkan-win-x64.zip from the releases page.

This is a custom-compiled build with Vulkan GPU acceleration enabled. Works with AMD, NVIDIA, and Intel GPUs — no vendor-specific SDK required.

Extract to C:\whisper\Release\. You should end up with:

C:\whisper\Release\whisper-cli.exe
C:\whisper\Release\ggml-vulkan.dll
C:\whisper\Release\ggml.dll
C:\whisper\Release\ggml-base.dll
C:\whisper\Release\ggml-cpu.dll
C:\whisper\Release\whisper.dll

GPU acceleration is automatic — no additional configuration needed.

Option B — Build from source

Requires: Git, CMake, Visual Studio Build Tools 2022+ with "Desktop development with C++", Vulkan SDK from lunarg.com.

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release --target whisper-cli

Copy the binaries from build\bin\Release\ to C:\whisper\Release\.

Note: The official whisper.cpp Windows releases on GitHub do not include a Vulkan build. You must use the pre-built release above or compile from source with -DGGML_VULKAN=ON.


Step 2 — Download a Whisper model

Model

Size

Speed

Accuracy

Best for

ggml-tiny.en.bin

75 MB

Very fast

Basic

Quick tests

ggml-base.en.bin

142 MB

Fast

Good

Everyday English

ggml-small.en.bin

466 MB

Moderate

Better

Important recordings

ggml-medium.en.bin

1.5 GB

Fast on GPU

Very good

Best quality English

ggml-large-v3-turbo.bin

1.6 GB

Fast on GPU

Excellent

Recommended for English GPU batch work — ~6x faster than large-v3 with minimal accuracy loss

ggml-large-v3.bin

2.9 GB

Fast on GPU

Excellent

Multilingual, maximum accuracy

ggml-medium.en-q5_0.bin

514 MB

Fast

Very good

Best CPU-only English option — high accuracy at low memory

ggml-large-v3-turbo-q5_0.bin

547 MB

Fast

Excellent

Best CPU-only multilingual option

ggml-large-v3-q5_0.bin

1.1 GB

Moderate on CPU

Excellent

Multilingual, CPU-friendly

Use download_model in Claude Desktop to install any of these directly. For English-only use: large-v3-turbo (GPU) or medium.en-q5_0 (CPU) are the best starting points. For multilingual use: large-v3-turbo or large-v3-turbo-q5_0 (CPU). English-only models (*.en.bin) output [FOREIGN] on non-English audio and cannot be used for other languages.


Step 3 — Install FFmpeg

FFmpeg is required for video files and non-native audio formats.

Install via winget:

winget install ffmpeg

Or download from ffmpeg.org and add to your PATH.

Verify:

ffmpeg -version

Step 4 — Install this MCP server

npm install -g whisper-windows-mcp

Step 5 — Configure Claude Desktop

Open Claude Desktop → Settings → Developer → Edit Config.

Add the whisper entry:

{
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "whisper-windows-mcp"],
      "env": {
        "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
        "WHISPER_MODEL": "C:\\whisper\\models\\ggml-medium.en.bin"
      }
    }
  }
}

Config file location: C:\Users\YourName\AppData\Roaming\Claude\claude_desktop_config.json

Use double backslashes in all paths.

Save and fully restart Claude Desktop. You should see whisper listed with a green running badge in Settings → Developer.


Step 6 — Verify your setup

In Claude Desktop, ask:

"Check your whisper config"

Then:

"Check your system hardware"

This confirms your GPU is detected and Vulkan acceleration is active.


Available tools

transcribe_audio

Transcribe a single file. Supports blocking (default) or background mode for long files.

Parameter

Description

file_path

Absolute path to the file (required)

language

Language code (en, ja, es, etc.) or auto to detect. Default: en

output_format

text (default), timestamps, json, or srt

save_to_file

Save transcript as .txt next to the source file

background

Run as detached job — returns a job ID immediately. Use check_progress to monitor. Recommended for files over 10 minutes.

threads

CPU thread override

temperature

Sampling temperature 0.0–1.0. Default 0.0 (deterministic). Higher values reduce hallucination on noisy audio.

prompt

Prior context string — improves accuracy for domain-specific vocabulary or speaker names. Example: "Names: Keemstar, DramaAlert."

condition_on_prev_text

Re-enable context conditioning between segments. Default false.

beam_size

Beam search width. Higher = more accurate, slower. Default 5.

best_of

Candidate sequences evaluated. Default 5.

gpu_device

GPU device index for multi-GPU systems. Default 0.

processors

Parallel processor count. Default 1.

word_timestamps

One word per timestamped segment. Useful for clip alignment.

max_segment_length

Max segment length in characters.

diarize

Stereo speaker diarization — requires stereo audio with speakers on separate channels.

vad_model

Path to Silero VAD model .bin. Strips silence before transcription — reduces hallucinations on noisy files.

offset_t

Start offset in milliseconds.

duration

Process duration in milliseconds from offset.


check_progress

Monitor a background transcription job started with transcribe_audio (background=true).

Returns elapsed time, last processed timestamp, percentage, and the full transcript when complete.

Parameter

Description

job_id

Job ID returned by transcribe_audio


start_batch

Automated sequential batch transcription of all untranscribed files in a folder. Sorts by duration (shortest first), processes one at a time as background jobs, validates each output.

Parameter

Description

folder_path

Path to folder (required)

language

Language code. Default: en

threads

CPU thread override


check_batch_progress

Monitor a running batch. Automatically advances to the next file when the current one finishes. Returns overall progress, current file with timestamp, ETA, and any failed files.

Parameter

Description

batch_id

Batch ID returned by start_batch


transcribe_batch (interactive)

Process files one at a time with a preview and confirmation before each. Useful when you want to review as you go.

Parameter

Description

folder_path

Path to folder (required)

file_index

Which file to process (1-based). Omit to list files first.

language

Language code. Default: en

recursive

Include subfolders


generate_subtitles

Generate SRT subtitle files. Supports automatic language detection and English translation output.

Parameter

Description

file_path

Path to file (required)

language

Language code or auto to detect. Default: en

translate_to_english

Also generate an English translation .en.srt. Only applies when source is not English.

threads

CPU thread override

When both native and translation are requested, two files are saved next to the source:

  • filename.ja.srt — original language

  • filename.en.srt — English translation

Whisper's built-in translation only translates to English. For other target languages, translate the .srt file contents separately.


analyze_media

Analyze files before committing to transcription. Returns duration, size, codec, and estimated transcription time on CPU and GPU. For folders, shows all files in a sortable table with transcription status.

Parameter

Description

path

Path to a single file or folder (required)

sort_by

For folders: duration (default), name, or size


check_config

Verify whisper-cli.exe, the model file, and FFmpeg are all accessible. Run this first if anything is failing.


list_models

List all Whisper model files installed in your models directory. Shows filename, size, whether it is currently active, quantization status, and recommended use case. No network calls — reads local filesystem only.


download_model

Download a Whisper model directly from Hugging Face into your models directory. Accepts a model name (e.g. large-v3-turbo, medium.en-q5_0) and handles the download automatically. Only downloads from trusted Hugging Face namespaces. After downloading, use switch_model to activate it.

Parameter

Description

model_name

Model name to download, e.g. large-v3-turbo, large-v3-turbo-q5_0, medium.en-q5_0


switch_model

Switch the active Whisper model for the current session without restarting Claude Desktop. Change is session-scoped — does not persist after restart. To make permanent, update WHISPER_MODEL in your config.

Parameter

Description

model_name

Model filename (e.g. ggml-large-v3-turbo.bin) or full path. Must be a .bin file in the configured models directory.


check_system

Detect GPU hardware and verify Vulkan acceleration is available. Reports GPU name, VRAM, whether ggml-vulkan.dll is present, and recommends the best model size for your hardware.


Supported formats

Type

Formats

Native (no conversion)

mp3, wav

Video (auto-converted via FFmpeg)

mp4, mkv, avi, mov, webm, flv, wmv, m4v, ts, 3gp

Audio (auto-converted via FFmpeg)

m4a, ogg, flac


GPU acceleration

The pre-built Vulkan release enables GPU acceleration automatically. Tested on AMD Radeon RX Vega 56 (GCN 5th gen). Any GPU with Vulkan 1.0+ support should work, including NVIDIA and Intel Arc.

Performance comparison (medium.en model, ~5 minute audio file):

Hardware

Time

CPU only (Ryzen 7 2700x, 8 threads)

8–12 minutes

GPU (Vega 56 via Vulkan)

20–40 seconds

GPU utilization during transcription is typically 15–20%, dropping back to idle between files. CPU stays around 15%.


Multilingual support

Whisper can auto-detect the spoken language and transcribe in that language. The built-in translation model translates to English only.

For best multilingual accuracy, use the large-v3 model. English-specific models (*.en.bin) cannot detect or transcribe other languages.

Example — foreign language video with subtitles:

  1. Ask Claude to generate subtitles with language=auto and translate_to_english=true

  2. Whisper detects the language and generates a native-language SRT

  3. A second pass generates an English translation SRT

  4. Load either file in VLC via Subtitle → Add Subtitle File


Designed for free-tier users

This tool is built to minimize Claude API interactions. The entire transcription workflow — scan, analyze, queue, run, validate — is designed to require as few Claude interactions as possible. Heavy lifting is done locally on your machine.


Optional environment variables

Variable

Description

WHISPER_CLI_PATH

Path to whisper-cli.exe (required)

WHISPER_MODEL

Path to model .bin file (required)

WHISPER_THREADS

CPU thread count override

FFMPEG_PATH

Path to ffmpeg if not in system PATH

WHISPER_PRIVACY_MODE

Planned. When set to true, tool responses return metadata only — no transcript text is returned to Claude's API. For regulated or confidential content. See PRIVACY.md.


Troubleshooting

See TROUBLESHOOTING.md for detailed solutions. See PRIVACY.md for compliance guidance if you handle regulated content.

Quick checklist:

  • Paths in config use double backslashes (C:\\whisper\\...)

  • whisper-cli.exe exists at the configured path

  • Model .bin file exists at the configured path

  • FFmpeg is installed and in PATH (ffmpeg -version works)

  • Claude Desktop was fully restarted after editing config

  • Whisper shows running in Settings → Developer


Security and Privacy

whisper-windows-mcp is designed with security as a core principle.

Audio never leaves your machine. No audio or video files, no file paths, and no telemetry are ever transmitted to any server. No cloud APIs are required for core functionality.

Transcript text and the API boundary. When a tool response includes transcript text, that text is processed by Claude's API — it leaves your local machine. For most users (public content, podcasts, streaming recordings) this is expected behavior. If you handle medical, legal, financial, or other regulated recordings, see PRIVACY.md for compliance guidance and configuration options.

A WHISPER_PRIVACY_MODE environment variable is planned that will restrict all tool responses to metadata only (filename, duration, word count) — no transcript text will be returned to Claude. This is the correct configuration for regulated or confidential content.

Input validation. All file paths are validated before use — UNC paths (\\server\share) and directory traversal sequences (..) are rejected. Files over 10 GB are rejected to prevent resource exhaustion.

Transcript injection awareness. Audio files can contain spoken content that, when transcribed, resembles instructions. Claude's built-in defenses handle this, but it is worth knowing that transcript content is treated as data — never as instructions — by the MCP server itself.

Model downloads are restricted. The download_model tool only downloads from two trusted Hugging Face namespaces (ggerganov/whisper.cpp and ggml-org). Arbitrary URLs are rejected. Redirects are validated against an allowlist before following.

Model switching is sandboxed. switch_model only accepts .bin files within the configured models directory. Paths outside that directory are rejected.

No new network dependencies. Model downloads use Node.js built-in https — no external HTTP libraries are added to the package.


License

Non-commercial use: MIT — free for personal, educational, and non-commercial use. See LICENSE.

Commercial use: A separate commercial license is required for any business, professional, or revenue-generating use. See LICENSE-COMMERCIAL.md for terms and contact information.

Contributing

Pull requests welcome. See ROADMAP.md for planned features.

If you've tested GPU acceleration on hardware not listed above, please open an issue with your results — GPU model, VRAM, model size, and observed throughput.

Install Server
A
license - permissive license
A
quality
A
maintenance

Maintenance

Maintainers
2hResponse time
1dRelease cycle
4Releases (12mo)

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/eviscerations/whisper-windows-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server