What can you do with this server?

This server enables local, private analysis of video, audio, and image files (or URLs), extracting AI-ready context such as keyframes, transcripts, and on-screen text. Nothing is uploaded and no API keys are required.

Video Analysis: extract keyframes in one of four modes:
* sheet: tile frames into compact contact sheets for a cheap overview
* frames: individual full-size stills for detailed inspection
* scenes: scene-change frames (great for slide decks and screencasts)
* filmstrip: dense, near-native-rate strips that catch sub-second UI glitches and flickers

Audio Transcription: convert speech to text from audio files or video soundtracks using local Whisper (model sizes from tiny to large).

OCR: extract on-screen text from images or video frames via Tesseract, with configurable language and page-segmentation modes.

URL Support: fetch and analyze media from YouTube, Vimeo, and 1000+ other sites via yt-dlp, in addition to local files.

Glitch/Jump Detection: track on-screen numbers across frames and report non-monotonic jump-back glitches with timestamps.

Cropping & Time Windows: focus on specific UI regions or restrict analysis to a start/end time range.

Customization: control frame rate, resolution, number of frames, output format (webp/jpeg/png), OCR language and PSM, download size and duration limits, and more.

Dependency Check: use check_media_deps to verify that the required binaries (ffmpeg, ffprobe, yt-dlp, whisper, tesseract) are installed.

All processing runs entirely on your local machine: private, free, and open source (Apache-2.0).
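The Glitch/Jump Detection idea can be pictured as a scan over (timestamp, value) readings in which any value lower than the previous readable one counts as a jump-back. This is a minimal sketch, not the server's implementation: `Reading`, `JumpBack`, and `findJumpBacks` are hypothetical names, and it assumes OCR has already turned each sampled frame into a number (or `null` when nothing could be read).

```typescript
// Hypothetical shape: one OCR'd number per sampled frame.
interface Reading {
  timeSec: number;      // frame timestamp in seconds
  value: number | null; // number read from the frame, or null if OCR failed
}

interface JumpBack {
  timeSec: number; // when the lower value appeared
  from: number;    // last readable value before the jump
  to: number;      // the lower value that broke monotonic order
}

// Report every point where a counter that should only increase goes backwards.
// Unreadable frames are skipped rather than treated as jumps.
function findJumpBacks(readings: Reading[]): JumpBack[] {
  const jumps: JumpBack[] = [];
  let last: number | null = null;
  const ordered = readings.slice().sort((a, b) => a.timeSec - b.timeSec);
  for (const r of ordered) {
    if (r.value === null) continue;
    if (last !== null && r.value < last) {
      jumps.push({ timeSec: r.timeSec, from: last, to: r.value });
    }
    last = r.value;
  }
  return jumps;
}

const demo: Reading[] = [
  { timeSec: 6.0, value: 41 },
  { timeSec: 6.1, value: 42 },
  { timeSec: 6.2, value: 38 },   // counter jumps back
  { timeSec: 6.3, value: null }, // unreadable frame, ignored
  { timeSec: 6.4, value: 43 },
];
console.log(findJumpBacks(demo)); // [ { timeSec: 6.2, from: 42, to: 38 } ]
```

Comparing against the last readable value, rather than a running maximum, means the recovery from 38 to 43 is not reported as a second glitch; a running-maximum variant would instead flag every frame that stays below the peak.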

Which integrations are available for this server?

Vimeo: analyze Vimeo videos by fetching the video and extracting frames and audio.
YouTube: summarize, transcribe, and analyze YouTube videos by fetching the video and extracting frames and audio.

media-context-mcp

by vishalguptax


TypeScript

Local

LLMs read text and glance at a single image — but they can't watch a video or listen to audio. media-context-mcp closes that gap. Hand it a file or a link and it returns clean, model-ready context — keyframes, a transcript, or the text on screen — entirely on your machine. Nothing is uploaded.

🚀 Install

Two setup steps: add the server, then install the local helpers it uses. After that, just ask.

1 · Add the server to your client

# Claude Code
claude mcp add media-context -- npx -y media-context-mcp

The launch command is always npx -y media-context-mcp. Pick your client:

Claude Desktop: Settings → Developer → Edit Config (claude_desktop_config.json):

{
  "mcpServers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

Cursor: ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project):

{
  "mcpServers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

VS Code: .vscode/mcp.json (VS Code uses the servers key):

{
  "servers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

Windsurf: ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

Cline: cline_mcp_settings.json (the extension's MCP settings):

{
  "mcpServers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

Kiro: .kiro/settings/mcp.json (project) or ~/.kiro/settings/mcp.json (user):

{
  "mcpServers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

Gemini CLI: ~/.gemini/settings.json:

{
  "mcpServers": {
    "media-context": { "command": "npx", "args": ["-y", "media-context-mcp"] }
  }
}

Zed: settings.json (Zed uses the context_servers key):

{
  "context_servers": {
    "media-context": { "command": { "path": "npx", "args": ["-y", "media-context-mcp"] } }
  }
}

Codex: ~/.codex/config.toml:

[mcp_servers.media-context]
command = "npx"
args = ["-y", "media-context-mcp"]

JetBrains: Settings → Tools → AI Assistant → Model Context Protocol → Add, then use command npx with args -y media-context-mcp.
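Apart from Codex (TOML) and JetBrains (a settings dialog), the JSON configs above differ only in the top-level key and, for Zed, in how the command is nested. The sketch below makes that explicit; `ClientKind` and `serverConfig` are hypothetical helpers for illustration, not part of media-context-mcp, and the shapes are copied from the snippets above.

```typescript
// The launch command every client uses (taken from the snippets above).
const LAUNCH = { command: "npx", args: ["-y", "media-context-mcp"] };

// Hypothetical grouping of the JSON-based clients by config shape.
type ClientKind = "mcpServers" | "vscode" | "zed";

// Build the JSON config object for one client family.
function serverConfig(kind: ClientKind): Record<string, unknown> {
  switch (kind) {
    case "vscode":
      // VS Code nests servers under "servers".
      return { servers: { "media-context": { ...LAUNCH } } };
    case "zed":
      // Zed uses "context_servers" and a { path, args } command object.
      return {
        context_servers: {
          "media-context": { command: { path: LAUNCH.command, args: LAUNCH.args } },
        },
      };
    default:
      // Claude Desktop, Cursor, Windsurf, Cline, Kiro, and Gemini CLI.
      return { mcpServers: { "media-context": { ...LAUNCH } } };
  }
}

console.log(JSON.stringify(serverConfig("zed"), null, 2));
```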

Tip: in Claude Code you can install it as a plugin instead: run /plugin marketplace add vishalguptax/media-context-mcp, then /plugin install media-context. To share it with a team, install it per project: pass --scope project to claude mcp add (this writes .mcp.json), or commit a .cursor/mcp.json to the repo.

2 · Install the local helpers

One command sets up everything the server uses, via your OS package manager:

npx media-context-mcp setup          # core: keyframes, links, on-screen text
npx media-context-mcp setup --audio  # also enable transcription

The server finds the helpers automatically afterward, so no extra configuration is needed. Run check_media_deps to see what's ready, and npx media-context-mcp setup --uninstall to remove them. (You can also install the helpers by hand.)
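check_media_deps reports readiness for you. If you want to see what a PATH-based check for the same binaries looks like, here is a minimal sketch; it is an assumption that the server checks this way (it may also probe versions or other locations), and `findMissing` is a hypothetical helper. To keep the logic free of Node typings, the file-existence test is passed in.

```typescript
// Binaries listed for check_media_deps in the overview above.
const REQUIRED = ["ffmpeg", "ffprobe", "yt-dlp", "whisper", "tesseract"];

// Return the binaries that cannot be found in any PATH directory.
// `exists` is injected so this runs without Node-specific imports.
function findMissing(
  pathEnv: string,
  isWindows: boolean,
  exists: (file: string) => boolean,
  binaries: string[] = REQUIRED,
): string[] {
  const sep = isWindows ? ";" : ":";
  const slash = isWindows ? "\\" : "/";
  // Simplification: Windows really consults PATHEXT; three common extensions are hard-coded.
  const exts = isWindows ? [".exe", ".cmd", ".bat"] : [""];
  const dirs = pathEnv.split(sep).filter((d) => d.length > 0);
  return binaries.filter(
    (bin) => !dirs.some((dir) => exts.some((ext) => exists(`${dir}${slash}${bin}${ext}`))),
  );
}

// Usage in Node (commented out to stay dependency-free):
// import { existsSync } from "node:fs";
// console.log(findMissing(process.env.PATH ?? "", process.platform === "win32", existsSync));
```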

3 · Ask

“Summarize demo.mp4.”


✨ Capabilities


Video	Keyframe overview, full-size stills, scene detection, or a dense filmstrip that catches split-second glitches
Audio	Speech turned into text — clips, voice notes, meetings, podcasts
Images	The picture, plus the exact text shown on screen
Anywhere	Local files or links — YouTube, Vimeo, and 1000+ sites
Private	Runs on your machine. No API keys, no uploads
Efficient	A long clip becomes a couple of images, not hundreds
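To see why a long clip becomes only a couple of images, here is the arithmetic with the defaults from the Options table (`maxFrames` 30, `grid` 5, `scale` 320). This hypothetical `sheetPlan` helper assumes the full frame budget is used, frames are spaced evenly, and sheets have no padding between tiles; the server's exact sampling may differ.

```typescript
// Defaults taken from the Options table.
const DEFAULT_MAX_FRAMES = 30;
const DEFAULT_GRID = 5;
const DEFAULT_SCALE = 320;

// Hypothetical estimate of sheet-mode output for a clip of `durationSec` seconds.
function sheetPlan(
  durationSec: number,
  maxFrames = DEFAULT_MAX_FRAMES,
  grid = DEFAULT_GRID,
  scale = DEFAULT_SCALE,
) {
  const tilesPerSheet = grid * grid;                   // a grid x grid contact sheet
  const sheets = Math.ceil(maxFrames / tilesPerSheet); // sheets needed to hold every frame
  const intervalSec = durationSec / maxFrames;         // assumes evenly spaced sampling
  const sheetWidthPx = grid * scale;                   // assumes no padding between tiles
  return { tilesPerSheet, sheets, intervalSec, sheetWidthPx };
}

console.log(sheetPlan(600)); // 10-minute clip
// → { tilesPerSheet: 25, sheets: 2, intervalSec: 20, sheetWidthPx: 1600 }
console.log(sheetPlan(3600).intervalSec); // one-hour clip → 120
```

With 25 tiles per sheet, 30 frames always fit on at most two sheets, which matches the "one or two contact sheets" described for `sheet` mode; a longer clip only widens the gap between samples.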

🎞️ Modes

analyze_media auto-detects audio and images. For video, choose how frames are sampled:

Mode	Best for
`sheet` (default)	A cheap overview — frames tiled into one or two contact sheets
`frames`	Detail on specific moments — individual full-size stills
`scenes`	Slide decks & static screencasts — only scene-change frames
`filmstrip`	Catching a sub-second UI glitch — a dense, near-native-rate strip
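Each mode maps naturally onto standard ffmpeg video filters. The mapping below is an illustrative assumption, not taken from the server's source: `fps`, `scale`, `tile`, and `select` with the `scene` score are real ffmpeg filters, but `filterFor` and the exact chains are hypothetical. Defaults (`scale` 320, `grid` 5, `sceneThreshold` 0.4, `stripRows` 18) come from the Options table, and laying the filmstrip out as a single vertical column is a guess.

```typescript
type Mode = "sheet" | "frames" | "scenes" | "filmstrip";

interface ModeOptions {
  fps: number;             // sampling rate for fps-based modes
  scale?: number;          // per-frame width in px
  grid?: number;           // tiles per row/column (sheet)
  sceneThreshold?: number; // scene-change sensitivity (scenes)
  stripRows?: number;      // tiles per filmstrip image
}

// Build an ffmpeg -vf filter chain for a mode (hypothetical mapping).
function filterFor(mode: Mode, o: ModeOptions): string {
  const scale = `scale=${o.scale ?? 320}:-2`; // -2 keeps the aspect ratio with an even height
  switch (mode) {
    case "sheet": {
      const g = o.grid ?? 5;
      return `fps=${o.fps},${scale},tile=${g}x${g}`;
    }
    case "frames":
      return `fps=${o.fps},${scale}`;
    case "scenes":
      // Keep only frames whose scene-change score exceeds the threshold.
      // Pair with -fps_mode vfr (-vsync vfr on older builds) so frames are not duplicated.
      return `select='gt(scene,${o.sceneThreshold ?? 0.4})',${scale}`;
    case "filmstrip":
      // Assumed layout: one column of stripRows tiles per output image.
      return `fps=${o.fps},${scale},tile=1x${o.stripRows ?? 18}`;
  }
}

console.log(filterFor("sheet", { fps: 0.05 })); // fps=0.05,scale=320:-2,tile=5x5
```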

💬 Examples

Just ask in plain language — the assistant picks the right options.

You ask	What you get
“Summarize `demo.mp4`.”	A quick overview from sampled keyframes
“What error does `bug.mp4` show at the end?”	The exact on-screen text, read back
“Walk me through the UI flow in `onboarding.mov`.”	Step-by-step from scene-change frames
“Transcribe `standup.m4a` and list action items.”	A local transcript
“Summarize `https://youtu.be/…` with the transcript.”	Fetched and transcribed
“Read the error in this screenshot `crash.png`.”	The picture plus its exact text
“Find where the slider in `ui.mp4` flickers ~0:06.”	The exact frame of a sub-second glitch
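For the flicker example, a narrow time window plus a high `fps` keeps the filmstrip small enough to scan. The arithmetic below is a hypothetical sketch assuming evenly spaced sampling from `startSec` and tiles laid out in time order; `filmstripPlan` is not part of the server, and ffmpeg's actual frame count can differ by a frame at the window edges.

```typescript
// Hypothetical filmstrip plan for a time window sampled at a fixed fps.
function filmstripPlan(startSec: number, endSec: number, fps: number, stripRows = 18) {
  const frames = Math.floor((endSec - startSec) * fps); // sampled frames in the window
  const strips = Math.ceil(frames / stripRows);         // images needed at stripRows tiles each
  // Timestamp of tile `row` in strip `strip` (both 0-based), assuming tiles run in time order.
  const tileTime = (strip: number, row: number) => startSec + (strip * stripRows + row) / fps;
  return { frames, strips, tileTime };
}

// "Find where the slider in ui.mp4 flickers ~0:06": look at 5 s to 7 s at 30 fps.
const plan = filmstripPlan(5, 7, 30);
console.log(plan.frames, plan.strips);        // 60 4
console.log(plan.tileTime(1, 12).toFixed(3)); // 6.000
```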

🧰 Tools

Tool	What it does
`analyze_media`	Turn a video, audio, or image — file or URL — into model-readable context. Auto-detects the type and supports cropping, time windows, language, and sampling rate.
`check_media_deps`	Report which capabilities are ready on this machine.

Every call runs locally and cleans up after itself.
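Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request sent over the server's stdio, one JSON message per line. The sketch below only builds the request objects: a real client first performs MCP's `initialize` handshake, which is omitted here, and `AnalyzeMediaArgs` mirrors a subset of the Options table rather than the server's published schema. `toolCall` and `analyzeMedia` are hypothetical helpers; your MCP client normally does this for you.

```typescript
// A subset of analyze_media's options (see the Options table); only `source` is required.
type AnalyzeMediaArgs = {
  source: string;
  mode?: "sheet" | "frames" | "scenes" | "filmstrip";
  transcript?: boolean;
  ocr?: boolean;
  startSec?: number;
  endSec?: number;
};

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build an MCP tools/call request for any tool.
function toolCall(id: number, name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

function analyzeMedia(id: number, args: AnalyzeMediaArgs): ToolCallRequest {
  return toolCall(id, "analyze_media", args);
}

// "Walk me through the UI flow in onboarding.mov." might become:
const walkthrough = analyzeMedia(1, { source: "onboarding.mov", mode: "scenes" });
const depsCheck = toolCall(2, "check_media_deps");
// JSON.stringify without indentation emits no newlines, so each message fits on one line.
console.log(JSON.stringify(walkthrough));
console.log(JSON.stringify(depsCheck));
```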

⚙️ Options

Your assistant fills these in for you, but you can steer it (“use filmstrip mode”, “crop to the toolbar”).

Param	Default	Description
`source`	—	Local file path (video/audio/image) or http(s) URL
`context`	—	A note framing the analysis; echoed atop the summary
`detail`	—	`high` = readable stills for screen recordings; `low` = cheap overview
`mode`	`sheet`	`sheet` · `frames` · `scenes` · `filmstrip`
`format`	`webp`	`webp` (smallest) · `jpeg` · `png` (crisp text)
`maxFrames`	`30`	Upper bound on sampled frames
`grid`	`5`	Tiles per row/column for contact-sheet modes
`scale`	`320`	Per-frame width in px — lower = fewer tokens
`sceneThreshold`	`0.4`	Scene-change sensitivity (`scenes` mode)
`fps`	auto	Explicit sampling rate in frames per second; use a high value with `filmstrip`
`crop`	—	`{x,y,width,height}` (pixels, or `0–1` fractions) to zoom a region
`stripRows`	`18`	Tiles per image in `filmstrip` mode
`startSec` / `endSec`	—	Restrict to a time window
`transcript`	`false`	Also produce a transcript (video)
`whisperModel`	`small`	`tiny` · `base` · `small` · `medium` · `large`
`ocr`	`false`	Extract on-screen text
`ocrLang`	`eng`	Language code(s), e.g. `eng+deu`
`ocrPsm`	`3`	Page-segmentation: `3` auto · `6` block · `11` sparse
`detectJumps`	`false`	Track an on-screen number and report jump-back glitches with timestamps
`maxDurationSec`	`3600`	Reject URL downloads longer than this many seconds
`maxFileSizeMb`	`500`	Abort a URL download once it exceeds this many megabytes
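Several of these options correspond to real flags on the helper CLIs: Tesseract's `-l` and `--psm`, the openai-whisper CLI's `--model`, and yt-dlp's `--max-filesize` and `--match-filter`. Whether the server passes them exactly this way (or uses the Python whisper CLI at all) is an assumption; the builder functions below are hypothetical and only produce argument arrays suitable for a process spawner, without running anything.

```typescript
// Tesseract: `-l` takes language codes joined with "+", `--psm` the page-segmentation mode.
// "stdout" as the output base prints the recognized text instead of writing a file.
function tesseractArgs(image: string, ocrLang = "eng", ocrPsm = 3): string[] {
  return [image, "stdout", "-l", ocrLang, "--psm", String(ocrPsm)];
}

type WhisperModel = "tiny" | "base" | "small" | "medium" | "large";

// openai-whisper CLI (assumed): writes a plain-text transcript into outDir.
function whisperArgs(audio: string, outDir: string, model: WhisperModel = "small"): string[] {
  return [audio, "--model", model, "--output_format", "txt", "--output_dir", outDir];
}

// yt-dlp: enforce maxFileSizeMb and maxDurationSec for URL downloads.
function ytDlpArgs(url: string, outTemplate: string, maxFileSizeMb = 500, maxDurationSec = 3600): string[] {
  return [
    "--max-filesize", `${maxFileSizeMb}M`,
    // Without "?" after the operator, videos whose duration is unknown are skipped as well.
    "--match-filter", `duration <= ${maxDurationSec}`,
    "-o", outTemplate,
    url,
  ];
}

console.log(["tesseract", ...tesseractArgs("crash.png", "eng+deu", 6)].join(" "));
// tesseract crash.png stdout -l eng+deu --psm 6
```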

Worked recipes for each are in the usage guide.
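`crop` accepts either pixels or `0–1` fractions. One plausible way to resolve it against the source resolution and turn it into ffmpeg's `crop=w:h:x:y` filter is sketched below. The rule that a crop whose four values all lie between 0 and 1 is fractional is an assumption for illustration (the server's exact rule is not documented here), and `resolveCrop` and `cropFilter` are hypothetical names.

```typescript
interface Crop { x: number; y: number; width: number; height: number; }

// Assumption: if every value lies within 0..1 the crop is fractional, otherwise pixels.
function resolveCrop(c: Crop, frameW: number, frameH: number): Crop {
  const fractional = [c.x, c.y, c.width, c.height].every((v) => v >= 0 && v <= 1);
  const px = fractional
    ? { x: c.x * frameW, y: c.y * frameH, width: c.width * frameW, height: c.height * frameH }
    : c;
  // Round to whole pixels and clamp so the region stays inside the frame.
  const x = Math.min(Math.max(Math.round(px.x), 0), frameW - 1);
  const y = Math.min(Math.max(Math.round(px.y), 0), frameH - 1);
  const width = Math.max(1, Math.min(Math.round(px.width), frameW - x));
  const height = Math.max(1, Math.min(Math.round(px.height), frameH - y));
  return { x, y, width, height };
}

// ffmpeg's crop filter takes width:height:x:y.
function cropFilter(c: Crop): string {
  return `crop=${c.width}:${c.height}:${c.x}:${c.y}`;
}

// "Crop to the toolbar": the top 10% of a 1920x1080 recording.
console.log(cropFilter(resolveCrop({ x: 0, y: 0, width: 1, height: 0.1 }, 1920, 1080)));
// crop=1920:108:0:0
```

Under this rule, a pixel crop of `{0,0,1,1}` would be read as the whole frame; that ambiguity is the main cost of accepting both units in one field.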

❓ FAQ

Can an LLM watch a video? Not directly — models take images and text, not video. This server turns the video into frames and a transcript it can read.

Does anything get uploaded? No. Everything runs on your machine; no keys, no cloud.

Which clients work? Any MCP client — Claude Code, Claude Desktop, Cursor, VS Code, Windsurf, Cline, Kiro, Gemini CLI, JetBrains, Zed, Codex.

Does it handle YouTube and other links? Yes: YouTube, Vimeo, and 1000+ other sites via yt-dlp.

How much does it cost? It's free and open source.

📋 Requirements

Node.js 18+, on Windows, macOS, or Linux. The one-time npx media-context-mcp setup installs everything else.

🛠️ Development

npm install
npm run build
npm test

Issues and PRs welcome — see the usage guide for the architecture.

📄 License

Apache-2.0
