mcp-video-analyzer extracts transcripts, key frames, OCR text, and metadata from video URLs (Loom, .mp4, .webm, .mov, etc.) with the following capabilities:

- Full video analysis (`analyze_video`): Extract everything at once — timestamped transcript with speaker IDs, deduplicated key frames via scene-change detection, OCR text from frames (code, UI text, error messages), an annotated timeline merging transcript + frames + OCR, metadata (title, duration, platform), viewer comments, chapters, and AI summary. Supports `brief`, `standard`, and `detailed` depth levels.
- Transcript extraction (`get_transcript`): Pull only the timestamped transcript with speaker IDs; falls back to Whisper when no native transcript is available.
- Metadata retrieval (`get_metadata`): Fetch metadata, comments, chapters, and AI summary without downloading the video or extracting frames.
- Frame extraction (`get_frames`): Extract key frames via scene-change detection (default) or dense sampling (1 frame/sec), with deduplication and JPEG optimization.
- Single frame (`get_frame_at`): Capture one frame at a specific timestamp to inspect exactly what's on screen.
- Burst frame extraction (`get_frame_burst`): Extract N frames evenly across a narrow time range — ideal for analyzing motion, animations, or fast scrolling.
- Moment deep-dive (`analyze_moment`): Focused analysis of a specific time range combining burst frames, filtered transcript, OCR, and an annotated timeline.
- Caching: Results are cached in memory for 10 minutes; use `forceRefresh` to bypass.
- Flexible output: Filter returned fields, control frame count, adjust scene-change sensitivity, and optionally return frames as base64 inline.
Enables analysis of Loom videos by extracting transcripts, key frames, metadata, and comments to provide a unified timeline of visual and audio content.
mcp-video-analyzer
Featured in awesome-mcp-servers.
MCP server for video analysis — extracts transcripts, key frames, and metadata from video URLs. Supports Loom, direct video files (.mp4, .webm), and more.
No existing video MCP combines transcripts + visual frames + metadata in one tool. This one does.
Installation
Prerequisites
- Node.js 18+ — required to run the server via `npx`
- yt-dlp (optional) — enables frame extraction via ffmpeg. Install with `pip install yt-dlp`
- Chrome/Chromium (optional) — fallback for frame extraction if yt-dlp is unavailable
Without yt-dlp or Chrome, the server still works — you'll get transcripts, metadata, and comments, just no frames.
Claude Code (CLI)
```bash
claude mcp add video-analyzer -- npx mcp-video-analyzer@latest
```

Then restart Claude Code or start a new conversation.
VS Code / Cursor
Add to your MCP settings file:
- VS Code: File → Preferences → Settings → search "MCP", or edit `~/.vscode/mcp.json` (`%APPDATA%\Code\User\mcp.json` on Windows)
- Cursor: Settings → MCP Servers → Add
```json
{
  "servers": {
    "mcp-video-analyzer": {
      "type": "stdio",
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
```

Then reload the window (Ctrl+Shift+P → "Developer: Reload Window").
Claude Desktop
Add to your Claude Desktop config file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "video-analyzer": {
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
```

Then restart Claude Desktop.
Verify it works
Once installed, ask your AI assistant:
> Analyze this video: https://www.loom.com/share/bdebdfe44b294225ac718bad241a94fe

If the server is connected, it will automatically call the analyze_video tool.
Tools
analyze_video — Full video analysis
Extracts everything from a video URL in one call:
> Analyze this video: https://www.loom.com/share/abc123...

Returns:
Transcript with timestamps and speakers
Key frames extracted via scene-change detection (automatically deduplicated)
OCR text extracted from frames (code, error messages, UI text visible on screen)
Annotated timeline merging transcript + frames + OCR into a unified "what happened when" view
Metadata (title, duration, platform)
Comments from viewers
Chapters and AI summary (when available)
The AI will automatically call this tool when it sees a video URL — no need to ask.
Options:
- `detail` — analysis depth: `"brief"` (metadata + truncated transcript, no frames), `"standard"` (default), `"detailed"` (dense sampling, more frames)
- `fields` — array of specific fields to return, e.g. `["metadata", "transcript"]`. Available: `metadata`, `transcript`, `frames`, `comments`, `chapters`, `ocrResults`, `timeline`, `aiSummary`
- `maxFrames` (1-60, default depends on detail level) — cap on extracted frames
- `threshold` (0.0-1.0, default 0.1) — scene-change sensitivity
- `forceRefresh` — bypass cache and re-analyze
- `skipFrames` — skip frame extraction for transcript-only analysis
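As an illustration, a tool call combining several of these options might pass arguments like the following (the URL is a placeholder, and the top-level `url` parameter name is an assumption):

```json
{
  "url": "https://www.loom.com/share/abc123",
  "detail": "standard",
  "fields": ["metadata", "transcript", "timeline"],
  "maxFrames": 20,
  "threshold": 0.1,
  "forceRefresh": false
}
```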
get_transcript — Transcript only
> Get the transcript from this video

Quick transcript extraction. Falls back to Whisper transcription when no native transcript is available.
get_metadata — Metadata only
> What's this video about?

Returns metadata, comments, chapters, and AI summary without downloading the video.
get_frames — Frames only
> Extract frames from this video with dense sampling

Two modes:

- Scene-change detection (default) — captures visual transitions
- Dense sampling (`dense: true`) — 1 frame/sec for full coverage
analyze_moment — Deep-dive on a time range
> Analyze what happens between 1:30 and 2:00 in this video

Combines burst frame extraction + filtered transcript + OCR + annotated timeline for a focused segment. Use it when you need to understand exactly what happens at a specific moment.
get_frame_at — Single frame at a timestamp
> Show me the frame at 1:23 in this video

The AI reads the transcript, spots a critical moment, and requests the exact frame to see what's on screen.
get_frame_burst — N frames in a time range
> Show me 10 frames between 0:15 and 0:17 of this video

For motion, vibration, animations, or fast scrolling — burst mode captures N frames in a narrow window so the AI can see frame-by-frame changes.
Detail Levels
| Level | Frames | Transcript | OCR | Timeline | Use case |
|---|---|---|---|---|---|
| `brief` | None | First 10 entries | No | No | Quick check — what's this video about? |
| `standard` | Up to 20 (scene-change) | Full | Yes | Yes | Default — full analysis |
| `detailed` | Up to 60 (1fps dense) | Full | Yes | Yes | Deep analysis — every second captured |
Caching
Results are cached in memory for 10 minutes. Subsequent calls with the same URL and options return instantly. Use `forceRefresh: true` to bypass the cache.
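A minimal sketch of how such a TTL cache can work. This is illustrative only: the real implementation lives in `src/utils/cache.ts`, and the class name, `maxEntries` default, and key format here are assumptions.

```typescript
// Illustrative TTL cache sketch; not the server's actual implementation.
type Entry<V> = { value: V; expiresAt: number };

class TtlCache<V> {
  private store = new Map<string, Entry<V>>();

  constructor(private ttlMs: number, private maxEntries = 100) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// A cache key would combine the video URL with the options that affect output.
const cache = new TtlCache<string>(10 * 60 * 1000); // 10-minute TTL
cache.set("https://www.loom.com/share/abc123|standard", "analysis result");
```

Evicting the first key of a `Map` is a crude stand-in for true LRU ordering, but it keeps the sketch short.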
Supported Platforms
| Platform | Transcript | Metadata | Comments | Frames | Auth |
|---|---|---|---|---|---|
| Loom | Yes | Yes | Yes | Yes | None |
| Direct URL (.mp4, .webm) | No | Duration only | No | Yes | None |
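Routing a URL to the right adapter can be sketched as a simple hostname/extension check. This is a hypothetical illustration; the real logic lives in `src/utils/url-detector.ts` and may differ.

```typescript
// Hypothetical platform detection from a URL (not the actual url-detector.ts).
type Platform = "loom" | "direct" | "unknown";

function detectPlatform(url: string): Platform {
  const { hostname, pathname } = new URL(url);
  // Loom share links live on loom.com (or a subdomain like www.loom.com).
  if (hostname === "loom.com" || hostname.endsWith(".loom.com")) return "loom";
  // Direct video files are identified by their extension.
  if (/\.(mp4|webm|mov)$/i.test(pathname)) return "direct";
  return "unknown";
}
```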
Frame Extraction Strategies
Frame extraction uses a two-strategy fallback chain — no single dependency is required:
| Strategy | How it works | Speed | Requirements |
|---|---|---|---|
| yt-dlp + ffmpeg (primary) | Downloads video, extracts frames via scene detection | Fast, precise | yt-dlp (`pip install yt-dlp`) |
| Browser (fallback) | Opens video in headless Chrome, seeks to timestamps, takes screenshots | Slower, no download needed | Chrome or Chromium installed |
The fallback is automatic — if yt-dlp is not available, the server tries browser-based extraction via puppeteer-core. If neither is available, analysis still returns transcript + metadata + comments, just no frames.
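The fallback chain described above can be sketched as trying each strategy in order. The `Extractor` functions here are stand-ins for the real `frame-extractor` and `browser-frame-extractor` processors; the signature is an assumption.

```typescript
// Sketch of a fallback chain: try each extraction strategy until one succeeds.
type Frame = { timestamp: number };
type Extractor = (url: string) => Promise<Frame[]>;

async function extractFrames(
  url: string,
  strategies: Extractor[], // e.g. [ytDlpExtractor, browserExtractor]
): Promise<Frame[] | null> {
  for (const strategy of strategies) {
    try {
      return await strategy(url); // first strategy that succeeds wins
    } catch {
      // dependency missing or extraction failed: try the next strategy
    }
  }
  return null; // no frames; the caller still returns transcript + metadata
}
```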
Post-Processing Pipeline
After frame extraction, the pipeline automatically applies:
| Step | What it does | Why |
|---|---|---|
| Frame deduplication | Removes near-identical consecutive frames using perceptual hashing (dHash + Hamming distance) | Screencasts often have long static moments — dedup removes redundant frames, saving tokens |
| OCR | Extracts text visible on screen from each frame (via tesseract.js) | Captures code, error messages, terminal output, UI text that the transcript doesn't cover |
| Annotated timeline | Merges transcript timestamps + frame timestamps + OCR text into a single chronological view | Gives the AI a unified "what was said, what changed visually, and what text appeared" at each moment |
The OCR step requires tesseract.js (included as a dependency). If it fails to load, analysis continues without OCR — no frames or transcript are lost.
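The deduplication step can be sketched as comparing consecutive frames by the Hamming distance between their perceptual hashes. This sketch assumes hashes arrive as 64-bit hex strings and omits the actual dHash computation (done over a downscaled grayscale image in `src/processors/frame-dedup.ts`); the `maxDistance` default is an assumption, not the server's threshold.

```typescript
// Count differing bits between two hashes given as hex strings.
function hammingDistance(a: string, b: string): number {
  let diff = BigInt(`0x${a}`) ^ BigInt(`0x${b}`);
  let count = 0;
  while (diff > 0n) {
    count += Number(diff & 1n);
    diff >>= 1n;
  }
  return count;
}

// Keep a frame only if it differs enough from the last kept frame;
// returns the indices of the frames that survive dedup.
function dedupFrames(hashes: string[], maxDistance = 5): number[] {
  const keptIndices: number[] = [];
  let lastKept: string | null = null;
  for (let i = 0; i < hashes.length; i++) {
    if (lastKept === null || hammingDistance(lastKept, hashes[i]) > maxDistance) {
      keptIndices.push(i);
      lastKept = hashes[i];
    }
  }
  return keptIndices;
}
```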
Complementary Tools
Chrome DevTools MCP
For live web debugging alongside video analysis, pair this server with the Chrome DevTools MCP:
```bash
claude mcp add chrome-devtools npx @anthropic-ai/mcp-devtools@latest
```

When to use each:
| Scenario | Tool |
|---|---|
| Bug report recorded as a Loom video | mcp-video-analyzer — extract the transcript, frames, and OCR text from the recording |
| Live debugging a web page | Chrome DevTools MCP — inspect DOM, console, network, take screenshots |
| Video shows UI issue, need to reproduce it | Use both: analyze the video first, then open the page in Chrome DevTools to reproduce |
The two MCPs complement each other: video analyzer understands recorded content, DevTools interacts with live pages.
Example Output
The examples/loom-demo/ folder contains real outputs from analyzing a public Loom video (Boost In-App Demo Video, 2:55).
| File | What it shows |
|---|---|
| | Title, duration, platform |
| | 42 timestamped entries with speaker IDs |
| | Unified chronological view (transcript + frames merged) |
| | Filtered transcript for |
| | Complete |
Frame images (19 total in examples/loom-demo/frames/):
- `scene_*.jpg` — scene-change detection (key visual transitions)
- `dense_*.jpg` — 1fps dense sampling (every 10th frame saved as a sample)
- `burst_*.jpg` — burst extraction for moment analysis (0:30–0:45)
Regenerate after changes with `npx tsx examples/generate.ts` (requires yt-dlp + network access).
Development
```bash
# Install dependencies
npm install

# Run all checks (format, lint, typecheck, knip, tests)
npm run check

# Build
npm run build

# Run E2E tests (requires network)
npm run test:e2e

# Open MCP Inspector for manual testing
npm run inspect
```

Architecture
```
src/
├── index.ts                       # Entry point (shebang + stdio)
├── server.ts                      # FastMCP server + tool registration
├── tools/                         # MCP tool definitions (7 tools)
│   ├── analyze-video.ts           # Full analysis with detail levels + caching
│   ├── analyze-moment.ts          # Deep-dive on a time range
│   ├── get-transcript.ts          # Transcript-only with Whisper fallback
│   ├── get-metadata.ts            # Metadata + comments + chapters
│   ├── get-frames.ts              # Frames-only (scene-change or dense)
│   ├── get-frame-at.ts            # Single frame at timestamp
│   └── get-frame-burst.ts         # N frames in a time range
├── adapters/                      # Platform-specific logic
│   ├── adapter.interface.ts       # IVideoAdapter interface + registry
│   ├── loom.adapter.ts            # Loom: authless GraphQL
│   └── direct.adapter.ts          # Direct URL: any mp4/webm link
├── processors/                    # Shared processing
│   ├── frame-extractor.ts         # ffmpeg scene detection + dense + burst extraction
│   ├── browser-frame-extractor.ts # Headless Chrome fallback for frames
│   ├── audio-transcriber.ts       # Whisper fallback (HF transformers → CLI → OpenAI)
│   ├── image-optimizer.ts         # sharp resize/compress
│   ├── frame-dedup.ts             # Perceptual dedup (dHash + Hamming distance)
│   ├── frame-ocr.ts               # OCR text extraction (tesseract.js)
│   └── annotated-timeline.ts      # Unified timeline (transcript + frames + OCR)
├── config/
│   └── detail-levels.ts           # brief / standard / detailed config
├── utils/
│   ├── cache.ts                   # In-memory TTL cache with LRU eviction
│   ├── field-filter.ts            # Selective field filtering for responses
│   ├── url-detector.ts            # Platform detection from URL
│   ├── vtt-parser.ts              # WebVTT → transcript entries
│   └── temp-files.ts              # Temp directory management
└── types.ts                       # Shared TypeScript interfaces
```

License
MIT