What can you do with this server?

Framesleuth is a local video analysis server that converts videos into structured Context Bundles for coding agents, exposed via MCP tools. All processing runs locally; no data leaves the machine.

Video Analysis

* Submit any video (bug recording, demo, walkthrough) with optional intent, skill, and action mode to generate a structured report: classification, keyframes, transcript, error evidence, reproduction steps, and a timeline.
* Frame-by-frame analysis using a local vision model with adaptive keyframe selection and deduplication.
* Detects errors from console logs, OCR, and UI state; generates severity, reproduction steps, and Trust Signals (per-field confidence).
* Redacts secrets/PII before model processing.
* Supports Ollama, llama.cpp, and vLLM, and degrades gracefully without a vision model.

Report Management

* List all available reports, retrieve full or slim Context Bundles, and inspect specific elements: reproduction steps, error evidence, merged event timeline, or individual keyframe images.

Code Grounding

* Locate relevant file:line code candidates in a repository based on analysis, respecting .gitignore.

Artifact Rendering

* Render reports as markdown, GitHub issue text, or test plans.
* Generate animated GIF previews of analyzed videos (configurable fps, width, time range).
* Convert self-contained HTML/CSS/JS/canvas animations into MP4, WebM, or GIF, frame-by-frame, up to 4K at 5–60 fps.

Discovery & Actions

* List available summary skills (e.g. bug_report, tutorial, action_items) and action modes (e.g. fix, explain, triage, test, reproduce).
* Get a machine-readable suggested_actions menu (action, label, rationale, ref) derived from the report's classification.

Observability & Lifecycle

* Live job progress via Server-Sent Events (SSE), per-stage timings, job cancellation, webhooks, and TTL-based bundle retention.

How do I use Framesleuth?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Framesleuth analyze the bug recording video".

The server then responds to your query, and you can continue using it as needed.

Framesleuth

by santoshshinde2012


Python

Local

Framesleuth

Local video → structured context for coding agents, exposed over MCP.

Give Framesleuth any video — a bug recording, a feature demo, a design walkthrough, a Loom, a phone capture — and it understands it frame-by-frame (plus optional browser sidecars) and produces a structured Context Bundle. It is MCP-ready, so any MCP client — a VS Code agent, another coding agent, or a custom system — can drive the analysis and consume the result to fix a bug, add or change a feature, or build a whole new feature/app grounded in what the video actually shows.

Capture happens outside this repo: any video works, or a browser capture extension can record a session and post the video + sidecars to this agent's local API. This repo is the analysis agent only.

Everything runs locally. Nothing leaves your machine.

Quick start

Want to go from a video to a grounded change inside VS Code? Follow Use with VS Code & Claude (MCP) — connect the bundled MCP server and turn a recording into a fix, a feature, or a new build.

Fastest: one command with Docker

Everything — the model server, the models, and the API — comes up with a single command. No Python, no virtualenv, no manual model setup.

git clone https://github.com/santoshshinde2012/framesleuth.git
cd framesleuth
docker compose up            # or: ./scripts/dev_up.sh

Compose loads docker-compose.override.yml automatically; that override adds the Ollama server, model-pull job, and Ollama model volume. The first run pulls the vision and coder models (qwen2.5vl and qwen2.5-coder:7b, ~11 GB total) into a Docker volume, then starts the backend on http://127.0.0.1:8010. Subsequent runs reuse that volume and skip the download. The stack is ready when the health check reports healthy:

curl -s http://127.0.0.1:8010/v1/healthz | python -m json.tool   # "status": "healthy"

That's the whole setup — run your first analysis (below), or connect the MCP server in your editor (VS Code & Claude).

docker compose logs -f                  # follow progress / model download
docker compose down --remove-orphans    # stop  (add -v to also delete model volumes)

The stack runs its own Ollama on the internal Docker network only (its port is not published), so it never clashes with a native Ollama you may already run on :11434 — the only host port is the API on :8010.

Already run Ollama natively (with the models)? The Docker stack's Ollama is separate and would re-download them. Skip Docker and use the direct path below instead — it reuses your existing Ollama and is faster (especially on macOS, where Docker can't use the GPU).
macOS / no GPU: Docker runs the models on CPU, so the vision model is slow. NVIDIA GPU on Linux: uncomment the deploy: block on the ollama service in docker-compose.override.yml for acceleration.
To run only the backend container against a native/external model server, use docker compose -f docker-compose.yml up. The base compose file defaults to native Ollama on http://host.docker.internal:11434; override VLM_URL and CODER_URL for another server.

Run your first analysis (curl)

Once the API reports healthy (either setup path), go from a video to a Context Bundle in three calls — analysis is async (submit → poll → read):

# 1. Submit any screen recording (mp4/webm). Returns 202 { job_id, ... }
JOB=$(curl -s -F "video=@bug.mp4" http://127.0.0.1:8010/v1/analyze \
  | python -c "import sys, json; print(json.load(sys.stdin)['job_id'])")

# 2. Poll until state is "done" (queued → running → done)
curl -s "http://127.0.0.1:8010/v1/jobs/$JOB" | python -m json.tool

# 3. Read the Context Bundle
curl -s "http://127.0.0.1:8010/v1/report/$JOB" | python -m json.tool

Optional form fields on step 1: -F intent="why does save hang?", -F skill=bug_report, -F action=fix (GET /v1/skills and /v1/actions list the choices). Prefer a UI? Import the Postman collection — it chains these calls for you.
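The poll step above can also be scripted. The following is a minimal Python sketch, not part of the project: the function names (fetch_job, wait_for_job, read_report) are hypothetical, and it assumes the job JSON carries its status under a "state" key with the queued → running → done values named above. Any other value (for example a failure state, whose exact name is not given here) is treated as terminal and returned to the caller.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8010"  # default API address from the quick start


def fetch_job(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /v1/jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job leaves queued/running, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(job_id)
        # Assumption: status is reported under a "state" key.
        if job.get("state") not in ("queued", "running"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(interval_s)


def read_report(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /v1/report/{job_id}: the Context Bundle for a finished job."""
    with urllib.request.urlopen(f"{base_url}/v1/report/{job_id}") as resp:
        return json.load(resp)
```

Passing fetch and sleep as parameters keeps the loop testable without a running server; in normal use, wait_for_job(job_id) followed by read_report(job_id) mirrors steps 2 and 3.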

Run it directly (no Docker — fastest on macOS, best for development)

Prerequisites: Python 3.11+, uv, 8 GB+ RAM, and a local model server. ffmpeg is not required (PyAV bundles its own; ffprobe, if present, is used opportunistically to detect an audio stream).

git clone https://github.com/santoshshinde2012/framesleuth.git
cd framesleuth

# 1. Models — native Ollama (uses the Mac GPU) is the quick path
ollama serve &                                  # skip if already running
ollama pull qwen2.5vl && ollama pull qwen2.5-coder:7b

# 2. Install
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
python scripts/download_models.py               # optional: pre-warm ASR + check servers

# 3. Configure + start the API (binds 127.0.0.1:8010)
cp .env.example .env                            # already defaults to the Ollama path above
framesleuth-api                                 # or: uvicorn framesleuth.service.api:app --port 8010

# 4. Verify
curl -s http://127.0.0.1:11434/v1/models | grep -q qwen2.5vl && echo "VLM ready"
curl -s http://127.0.0.1:8010/v1/healthz | python -m json.tool   # status: healthy, vlm: ready

When /v1/healthz shows vlm: ready, recordings analyze with a real classification (analysis_quality.level = full/partial). With no vision model reachable, Framesleuth degrades gracefully — it still produces a valid Context Bundle from the browser sidecars (console errors, failed requests, clicks) and records what was thin in analysis_quality. Record with narration so the audio transcript (asr) stage contributes too.

Something not working? Run the setup doctor. It works with a plain python3 even when your virtualenv is broken, and prints a one-line fix for each problem (stale/missing venv, framesleuth-api not on PATH, ffmpeg/render prerequisites, backend or model server not reachable, wrong VLM_URL):

python3 scripts/doctor.py

Common gotcha: command not found: framesleuth-api or a uv pip install error about a missing interpreter means your active venv was deleted/moved. Fix it from the framesleuth directory: deactivate; unset VIRTUAL_ENV; uv venv && source .venv/bin/activate && uv pip install -e ".[dev]".

Stop

# Stop the backend: Ctrl+C in its terminal, or
pkill -f framesleuth-api

# Stop Ollama (optional — leaving it running keeps the model warm)
pkill -f "ollama serve"              # macOS app users: quit Ollama from the menu bar


Architecture

Any video (mp4/webm) + optional sidecars
    ↓
Local Analysis Service (pipeline)
    ├─ Preprocess (PyAV: duration/fps/dims)
    ├─ Transcript (faster-whisper)
    ├─ Keyframes (visual-delta change scoring)
    ├─ Understanding (local vision model — Qwen2.5-VL by default)
    ├─ Fusion + Classification
    ├─ Extraction → Context Bundle
    ├─ Summarize (skill/system-prompt-driven)
    └─ Grounding (workspace search)
    ↓
Context Bundle
    ↓
MCP server + local HTTP API
    └─ consumed by any MCP client (VS Code agent, other agents, capture extension)
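To make the "visual-delta change scoring" idea in the Keyframes stage concrete, here is a toy Python sketch. It is an illustration under simplifying assumptions, not Framesleuth's implementation: frames are flat lists of grayscale values, the score is a plain mean absolute difference, and the names visual_delta, pick_keyframes, and the threshold default are all hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, 0-255 (toy representation)


def visual_delta(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference, normalized to 0.0-1.0."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def pick_keyframes(frames: List[Frame], threshold: float = 0.1) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the baseline
    for i in range(1, len(frames)):
        # Compare against the last kept frame so slow drift still accumulates.
        if visual_delta(frames[kept[-1]], frames[i]) >= threshold:
            kept.append(i)
    return kept
```

Comparing each frame with the last kept frame, rather than with its immediate predecessor, means a gradual change is eventually captured once it adds up to the threshold.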

Features

Frame-by-frame understanding using a local vision model (Qwen2.5-VL by default; engine-agnostic)
Adaptive keyframe selection — coverage-binned, visual-salience-ranked (AKS-style), with a build-aware budget for feature/design videos and perceptual-hash dedup that drops near-identical frames so the VLM budget is spent on distinct content
Bug and build — a feature class plus a structured build context (screens, UI components, a screen-to-screen user flow, design notes, and where to implement) so an agent can implement, not just diagnose
Error detection and extraction from console, OCR, and UI state
Corpus-aware grounding — error symbols or feature/UI nouns → ranked file:line (definitions preferred, distinctive symbols weighted via IDF + whole-word match), respecting .gitignore and bounded for large repos
Trust signals — per-field confidence (with cross-modal corroboration — agreeing signals reinforce each other) and a task-aware actionability (ready/thin/insufficient) alongside the pipeline quality level
Redaction-first design — secrets (passwords, tokens, keys) and PII (emails, Luhn-valid card numbers, SSNs/phones, cloud keys) are scrubbed from OCR, captions, the transcript, and the raw sidecar streams before any of it reaches a model or is persisted (bundle and the sibling timeline.json / sidecars.json / transcript.json)
Observability — per-stage timings on every bundle (stage_timings) and live on GET /v1/jobs/{id}, so you can see where analysis time went
Job lifecycle & delivery — cooperative cancellation (DELETE /v1/jobs/{id}, checked between frames), a hard per-job timeout (JOB_TIMEOUT_S), crash recovery (orphaned jobs are failed on restart, not left as zombies), SSE progress with explicit terminal events (GET /v1/jobs/{id}/events), a completion webhook (WEBHOOK_URL), real queue depth in /healthz, and TTL retention cleanup (BUNDLE_TTL_DAYS) swept at startup and periodically (RETENTION_SWEEP_INTERVAL_S)
Interaction overlay — a click/cursor sidecar with coordinates draws a marker on the matching keyframe, so the model sees where the user acted
Cleaner transcripts — faster-whisper voice-activity filtering (ASR_VAD_FILTER) drops silence before decoding; detected/forced language is recorded
OCR backstop (optional ocr extra) — a sparse VLM OCR on an error frame gets a second, independent Tesseract reading; a no-op without the extra
No data leaves your machine — fully local, no telemetry or cloud APIs
Engine-agnostic — swap Ollama, llama.cpp, or vLLM via config only
Works on any video — not just bug recordings. A general video (a demo, a walkthrough, a talk, a phone/real-world clip) yields a faithful summary + a timeline of key moments (summary, key_moments[]) instead of being forced into a bug shape; the bug-only fields (severity, expected/actual, repro steps) stay null rather than carrying fabricated placeholders
Structured output — canonical Context Bundle with evidence citations
Configurable response — pick a summary skill and an action mode (fix/implement/design/summarize/explain/triage/test/report/reproduce, auto-picked from the classification), plus a machine-readable suggested_actions menu and on-demand artifact renderers (markdown / GitHub issue / test plan)
Eval harness — model-free classification / grounding / citation / faithfulness suites (python scripts/eval_harness.py --behavioral) run in CI (GitHub Actions: ruff, black, mypy --strict, pytest with coverage, then the eval harness) on every push and PR; the faithfulness suite proves every emitted key moment and step cites real, resolvable evidence (no fabrication)
Resilient — handles no-audio videos, weak local models, low-confidence cases
HTML → video (frame-by-frame) — turn a self-contained HTML animation (CSS/JS/canvas) into MP4, GIF, or WebM via the render_html_video MCP tool or POST /v1/render-html. Captures the animation frame-by-frame under a paused virtual clock and encodes a color-correct H.264 MP4 (yuv420p+bt709, near-lossless) — full color, no dropped frames, no quality loss (up to 4K, 5–60 fps). Included by default in the Docker image (headless Chromium + ffmpeg). For the direct (non-Docker) path, add the render extra (see below); returns 503 with an actionable message when unavailable.
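The redaction item in the list above mentions "Luhn-valid card numbers". As a rough illustration of what that check means, the Python sketch below redacts only digit runs that pass the Luhn checksum. It is a simplified stand-in, not the project's redactor: the regular expression, the function names, and the [REDACTED_CARD] placeholder are assumptions made for this example.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
_CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str, placeholder: str = "[REDACTED_CARD]") -> str:
    """Replace Luhn-valid candidates; leave other digit runs untouched."""

    def _sub(match: "re.Match[str]") -> str:
        digits = re.sub(r"\D", "", match.group())
        return placeholder if luhn_valid(digits) else match.group()

    return _CARD_RE.sub(_sub, text)
```

Requiring a valid checksum keeps order IDs and other long digit runs out of the redaction, at the cost of missing mistyped card numbers.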

Enable & troubleshoot HTML → video

Using Docker (docker compose up)? HTML→video already works — the image bakes in Playwright + Chromium + ffmpeg. (Build with --build-arg INSTALL_RENDER=false for a slimmer image without it.) The steps below are for the direct path.

Why is Playwright not in the core install? It's an optional [render] extra, not a core dependency, because it pulls a ~150 MB headless-Chromium browser the core video→bundle pipeline never needs — the standard way to ship a heavy, feature-specific dependency. (av, opencv, faster-whisper are core because the pipeline requires them.) Install the extra and you're done — the Chromium build downloads automatically on your first render, so there's no separate playwright install chromium step:

# In the same environment the server runs in:
uv pip install -e ".[render]"        # or ".[all]" = dev + render
# ffmpeg must be on PATH (brew install ffmpeg / apt-get install ffmpeg)

# Restart framesleuth-api, then verify (Chromium fetches itself on first render):
curl -s http://127.0.0.1:8010/v1/healthz | python -m json.tool
# → "render": {"playwright": true, "chromium": <true after first render>, "ffmpeg": true}

Set FRAMESLEUTH_AUTO_INSTALL_BROWSER=0 to disable the auto-download and run playwright install chromium yourself (e.g. in a locked-down environment).

Other optional extra — ocr. For the dedicated OCR backstop on error frames, uv pip install -e ".[ocr]" and put the tesseract binary on PATH (brew install tesseract / apt-get install tesseract-ocr). It's a no-op when absent — the VLM still does OCR; the backstop only adds a second reading. Use ".[all]" for dev + render + ocr.

If render.ready is false, the render.hint field tells you exactly what's missing. The most common cause of "Playwright is not installed" despite following the steps is that framesleuth-api is running from a different environment than the one you installed into (the render.python field shows which interpreter the server uses) — or the server simply wasn't restarted.
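A small Python sketch of reading those fields from a /v1/healthz response follows. The field names (render.ready, render.hint, render.python, and the playwright/chromium/ffmpeg flags) come from the text above; the function name and the output format are hypothetical, and the exact response shape should be checked against a real server.

```python
from typing import Any, Dict


def describe_render_status(health: Dict[str, Any]) -> str:
    """Summarize the "render" section of a decoded /v1/healthz payload."""
    render = health.get("render") or {}
    if render.get("ready"):
        return "render: ready"
    # Note: chromium is expected to be false until the first render.
    missing = [name for name in ("playwright", "chromium", "ffmpeg") if not render.get(name)]
    parts = ["render: not ready"]
    if missing:
        parts.append("missing: " + ", ".join(missing))
    if render.get("hint"):
        parts.append("hint: " + str(render["hint"]))
    if render.get("python"):
        parts.append("server interpreter: " + str(render["python"]))
    return "; ".join(parts)
```

Printing the server interpreter next to the hint makes the "installed into a different environment" case easy to spot.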

Project structure

framesleuth/
├── framesleuth/              # Main package
│   ├── config.py            # Typed config (pydantic-settings)
│   ├── schemas.py           # Data contracts (Context Bundle, enums)
│   ├── errors.py            # Exception taxonomy
│   ├── logging_config.py    # Structured JSON logging, job-id correlation
│   ├── prompts.py           # VLM / classify / summary / fix prompt templates
│   ├── skills.py            # Built-in summary skills (summary, bug_report, ...)
│   ├── actions.py           # Action modes (fix/explain/triage/...) + suggested-actions menu
│   ├── render.py            # Artifact renderers (markdown / GitHub issue / test plan)
│   ├── clients/             # VLM, coder HTTP clients (OpenAI-compatible)
│   ├── pipeline/            # preprocess, asr, scenes, understand, fusion, classify, bug_extract, redact, summarize, sidecars, grounding, html_render
│   ├── orchestrator/        # graph.py — linear async stage pipeline
│   ├── jobs/                # store.py — SQLite job state + bundle index
│   ├── service/             # FastAPI HTTP endpoints
│   └── mcp_server/          # framesleuth MCP server (VS Code + any MCP client)
├── tests/                   # pytest tests + fixtures
├── scripts/                 # doctor.py (setup check), download_models.py, dev_up.sh
├── postman/                 # HTTP API collection + environment
├── docs/                    # capabilities, use-with-vscode-and-claude, web-integration
└── pyproject.toml           # Dependencies and tool config

Development

Run tests

pytest tests/ -v --cov=framesleuth

Code quality

ruff check framesleuth tests
black --check framesleuth tests
mypy --strict framesleuth

Set up pre-commit hooks

pre-commit install

Documentation

A short, focused set:

Capabilities — the single reference: every input, output, skill, action, renderer, HTTP endpoint, and MCP tool
Use with VS Code & Claude (MCP) — connect the framesleuth MCP server to Copilot, Claude Code, and Claude Desktop
Web App Integration (end-to-end) — embed Framesleuth behind your own backend with an agent loop
Postman Collection — exercise the HTTP API end-to-end (import or run headless with Newman)
Runbook & Troubleshooting — setup, health checks, and common issues

License

Apache-2.0

Capture client

Bug capture lives outside this repo. Any screen recording works: drive the agent directly with your own video file. A browser capture extension can also record a session, collect browser sidecars (console errors, failed requests, clicks), and post the video + sidecars to this agent's local API. CORS is allowlisted (WEB_ORIGINS, default: the hosted demo site + local dev) plus chrome-extension:// origins, and the agent answers Chrome's Private Network Access preflight, so both a capture extension and the "Try it" widget on framesleuth.com work against a locally running backend with no extra setup. The agent stays bound to loopback; CORS only controls which browser origins may read its responses.
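The allowlist rule described above can be pictured as a simple origin check. The Python sketch below is illustrative only: it assumes WEB_ORIGINS is a comma-separated list (the actual format is not stated here), and parse_web_origins and origin_allowed are hypothetical helpers, not the server's code.

```python
from typing import Iterable, List


def parse_web_origins(value: str) -> List[str]:
    """Split a comma-separated WEB_ORIGINS value (assumed format)."""
    return [item.strip().rstrip("/") for item in value.split(",") if item.strip()]


def origin_allowed(origin: str, web_origins: Iterable[str]) -> bool:
    """Allow exact allowlist matches plus any chrome-extension:// origin."""
    if origin.startswith("chrome-extension://"):
        return True
    normalized = origin.rstrip("/")
    return any(normalized == allowed.rstrip("/") for allowed in web_origins)
```

For example, with an illustrative value such as "https://framesleuth.com, http://localhost:3000", a request from either of those origins or from an extension passes, and any other origin is refused.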

Status: Backend + pipeline + MCP server completed. Questions? Open an issue or check runbook.md for common questions.





