Skip to main content
Glama

video-vision-mcp

CI PyPI Python License: MIT

An MCP server that gives Claude Code the ability to analyze any video — a local file, a URL, or a Jira ticket attachment — through one set of tools.

Claude can't watch video natively (only text + the first frame of an image). This server converts a video into sampled frame images + an audio transcript, or — when a Gemini key is present — a native Gemini analysis of the whole video. It works alongside mcp-atlassian and shares its .env.

Scenario: open a Jira ticket with a video bug report → one command (analyze_video jira_issue_key=DEV-123) → you see the frames and the transcript (or Gemini's analysis if a key is configured), without juggling two MCP servers.

Three backend tiers (auto-selected)

Tier

Needs

What it does

1 — local (default)

nothing

ffmpeg frames + whisper.cpp transcript. Free, fully local, always works.

2 — cloud ASR

OPENAI_API_KEY or GROQ_API_KEY

Local frames, but transcription via OpenAI Whisper / Groq for higher quality.

3 — native Gemini

GEMINI_API_KEY

Gemini ingests the whole video (visual + audio) in one call, with MM:SS timestamps. Default when the key is set.

Precedence: Gemini > OpenAI > Groq > local. Set VIDEO_MCP_DISABLE_GEMINI=true to force tiers 1/2 even with a Gemini key. The backend used is named in every result.

Privacy: tier 1 never uploads anything. Tiers 2/3 print a one-time notice in the session the first time video content is sent to a third party.

Related MCP server: Video MCP

Tools

  • analyze_video — frames + transcript + metadata (the main tool).

  • get_video_transcript_only — transcript text only.

  • extract_frames_at — frames at specific timestamps ("00:42", "1:05", 12.5).

  • list_recent_analyses — cached analyses + backend used.

  • compare_backends — same video via tier 1 and tier 3 side by side.

Install

Requires Python ≥ 3.10. A single install pulls everything — backends, plus the ffmpeg and whisper.cpp dependencies. Nothing is ever installed globally on your machine (no brew/apt/winget, no sudo).

With uv you don't install it explicitly — uvx runs the published package on demand (see Register in Claude Code). To install into an environment instead:

uv pip install video-vision-mcp     # or: pip install video-vision-mcp

From source (development)

git clone https://github.com/KitDevUA/video-vision-mcp.git
cd video-vision-mcp
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"          # all backends bundled

Dependencies — fully self-contained

  • ffmpeg / ffprobe: if they are already on your PATH, those system binaries are used. Otherwise the bundled static-ffmpeg package supplies them (fetched once into its own local cache — never a system-wide install).

  • whisper.cpp (tier 1 transcription): shipped as the bundled pywhispercpp binding (prebuilt wheels; builds from source only if no wheel exists for your platform/Python). A whisper-cli already on PATH is used if present.

  • whisper model: the ggml model (base by default) downloads from Hugging Face into the cache on first transcription. Override with VIDEO_MCP_WHISPER_MODEL (tiny/base/small/medium/large-v3) or VIDEO_MCP_WHISPER_MODEL_PATH.

  • cloud-only: set OPENAI_API_KEY / GROQ_API_KEY (tier 2) or GEMINI_API_KEY (tier 3); whisper.cpp is then never invoked.

Configure

cp env.example .env
# edit .env — nothing is required for tier 1

See env.example for every variable. The .env format matches mcp-atlassian, so Jira creds (JIRA_URL / JIRA_USERNAME / JIRA_API_TOKEN) can be shared.

Register in Claude Code

Add to your project .mcp.json (or global config), next to mcp-atlassian — see .mcp.json.example:

{
  "mcpServers": {
    "video-vision": {
      "command": "uvx",
      "args": ["video-vision-mcp"],
      "env": { "VIDEO_MCP_ENV": "/abs/path/to/.env" }
    }
  }
}

uvx downloads and runs the published package automatically — no manual install step. VIDEO_MCP_ENV is optional (tier 1 needs no keys); point it at your .env if you use Jira or cloud backends. For local development against a checkout, use "args": ["--from", "/abs/path/to/video-vision-mcp", "video-vision-mcp"] instead. Restart Claude Code; the video-vision tools then appear.

Cache

Results are cached at ~/.cache/video-vision-mcp/ keyed by (file hash, backend) — re-analyzing the same video is instant, and switching backends keeps each result separately. Downloaded URLs/Jira files and whisper models live under the same dir. Override with VIDEO_MCP_CACHE_DIR.

How it fits with mcp-atlassian

mcp-atlassian can download a Jira attachment but can't analyze it. This server takes over from there: pass jira_issue_key and it fetches the attachment over Jira REST itself (same creds), so you stay in one tool call. If the Jira token is missing/invalid you get a clear error pointing at .env, not a silent failure.

A
license - permissive license
-
quality - not tested
A
maintenance

Maintenance

Maintainers
Response time
Release cycle
1Releases (12mo)
Commit activity

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/KitDevUA/video-vision-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server