Video MCP Server
Provides audio transcription using OpenAI's Whisper model, enabling analysis of spoken content from video files.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@Video MCP Serveranalyze this meeting recording"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Video MCP Server
Bridge the gap between Claude and video content - An MCP (Model Context Protocol) server that extracts keyframes and transcribes audio from videos, enabling Claude to analyze video files.
Overview
While Claude is multimodal and can analyze images, it cannot directly process video files. This MCP server automatically:
🎬 Extracts keyframes using intelligent scene detection or fixed intervals
🎤 Transcribes audio using OpenAI's Whisper
⚡ Provides structured, timestamped output combining visual and textual information
💾 Caches results for instant follow-up queries
🖼️ Optimized image compression to fit within Claude Code's token limits
Related MCP server: ffmpeg-mcp-server
Use Cases
📹 Meeting Analysis: Process Teams/Zoom recordings to extract key points and visual content
🎓 Tutorial Understanding: Analyze demo videos and instructional content
🔍 Content Review: Quickly scan through long videos to find specific moments
📊 Presentation Analysis: Extract slides and speaker notes from recorded presentations
Installation
Prerequisites
Python 3.11 or higher
python3 --version # Should be 3.11+FFmpeg (required for video processing)
macOS:
brew install ffmpegUbuntu/Debian:
sudo apt-get update sudo apt-get install ffmpegWindows:
Download from ffmpeg.org
Add to PATH
Verify FFmpeg installation:
ffmpeg -version
Install Video MCP Server
Clone repository:
Create a virtual environment (recommended):
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activateInstall the package:
pip install -e .Verify installation:
video-mcp --help # Should show no errors (will start MCP server)
Configuration
MCP Configuration for Claude Code
Add the following to your Claude Code MCP configuration file:
Location: ~/.claude.json
{
"mcpServers": {
"video-mcp": {
"command": "/path/to/vidmcp/venv/bin/video-mcp",
"args": [],
"env": {
"VIDEO_MCP_FRAME_INTERVAL_SECONDS": "10",
"VIDEO_MCP_WHISPER_MODEL": "base",
"VIDEO_MCP_MAX_FRAMES_PER_VIDEO": "100",
"VIDEO_MCP_USE_SCENE_DETECTION": "true",
"VIDEO_MCP_SCENE_THRESHOLD": "0.3",
"VIDEO_MCP_SCENE_MAX_INTERVAL_SECONDS": "30",
"PYTHONUNBUFFERED": "1"
}
}
}
}Important Notes:
Adjust the
commandpath to match your virtual environment locationRestart Claude Code after modifying
~/.claude.json
Environment Variables
You can customize behavior using environment variables (all optional):
Variable | Default | Description |
|
| Interval between frames (fixed-interval mode) |
|
| Maximum frames to extract per video |
|
| Use scene detection instead of fixed intervals |
|
| Scene detection sensitivity (0.0-1.0, lower=more sensitive) |
|
| Maximum seconds between frames (ensures static scenes are captured) |
|
| Whisper model (tiny, base, small, medium, large) |
|
| Device for Whisper (cpu or cuda) |
|
| Directory for cache storage |
|
| Maximum age of cache entries |
|
| Maximum total cache size |
Scene Detection
Enabled by default - Intelligently extracts frames at scene changes instead of fixed intervals.
Settings:
VIDEO_MCP_USE_SCENE_DETECTION=true- Enable scene detectionVIDEO_MCP_SCENE_THRESHOLD=0.3- Sensitivity (0.0-1.0)0.1-0.2: Very sensitive (subtle changes, camera movements)
0.3: Default (balanced, most content)
0.4-0.5: Conservative (only major scene changes)
VIDEO_MCP_SCENE_MAX_INTERVAL_SECONDS=30- Maximum seconds between framesEnsures static content is captured even if scene doesn't change
Prevents missing long presentations, talking heads, or static slides
Set to
0to disable (pure scene detection only)
Benefits:
Captures key moments automatically
Avoids redundant frames in static scenes
Max interval ensures no content is missed during static periods
Better coverage for dynamic content (movies, vlogs, edited videos)
Example: If someone talks for 2 minutes without moving (no scene change), max interval of 30s ensures you still get frames at 0s, 30s, 60s, 90s, 120s.
When to use fixed intervals:
You need perfectly predictable, evenly-spaced coverage
Content is extremely static throughout
Whisper Model Selection
Choose based on your needs:
Model | Size | Speed | Accuracy | Best For |
| 39M | Fastest | Good | Quick scans, low-end hardware |
| 74M | Fast | Better | Recommended default |
| 244M | Medium | Great | High accuracy needs |
| 769M | Slow | Excellent | Production quality |
| 1550M | Slowest | Best | Maximum accuracy |
Token Limits and Video Length
Claude Code has a 25,000 token limit for MCP responses. Current settings are optimized to fit within this limit.
Image Token Calculation:
Formula (from Anthropic docs):
tokens = (width × height) / 750For 640x360 frames:
(640 × 360) / 750 = ~307 tokens per frame
Token Budget Breakdown:
Frame images: ~307 tokens each
Transcript text: ~1 token per 4 characters
Metadata: ~200 tokens
Capacity at 25K Limit:
Safe limit: ~70 frames (accounting for transcript)
Maximum: ~80 frames (minimal transcript)
Recommended Video Lengths (10-second intervals):
2-5 minutes: 12-30 frames (~4-10K tokens) ✅ Always safe
5-10 minutes: 30-60 frames (~10-20K tokens) ✅ Usually safe
10-12 minutes: 60-72 frames (~20-24K tokens) ⚠️ Approaching limit
12+ minutes: 72+ frames ❌ Will likely truncate
If you hit truncation:
Increase scene threshold:
VIDEO_MCP_SCENE_THRESHOLD=0.4Reduce max frames:
VIDEO_MCP_MAX_FRAMES_PER_VIDEO=60Increase frame interval:
VIDEO_MCP_FRAME_INTERVAL_SECONDS=15Process video in segments
Usage
Basic Usage with Claude Code
Once configured, you can ask Claude to process videos:
Process this video: /.../meeting_recording.mp4Claude will automatically:
Extract frames every 10 seconds (configurable)
Transcribe the audio
Present a timeline with frames and transcript
Cache results for future queries
Follow-up Queries
Ask questions about the video content:
What were the main points discussed in the meeting?Show me the slides that were presentedWhat was said around the 5-minute mark?These follow-up queries are instant - they use the cached processed video data.
Custom Processing Options
You can customize processing parameters:
Process this video with 5-second intervals: /path/to/video.mp4Process using the small Whisper model: /path/to/video.mp4List Cached Videos
List all processed videosShows all videos in cache with their settings and cache timestamps.
Tools
The MCP server provides two tools:
1. process_video
Process a video file into frames and transcription.
Parameters:
video_path(required): Absolute path to video fileframe_interval(optional): Interval between frames (fixed-interval mode only)use_scene_detection(optional): Use scene detection (default: true)scene_threshold(optional): Scene detection sensitivity 0.0-1.0 (default: 0.3)whisper_model(optional): Whisper model to usemax_frames(optional): Maximum number of frames to extract (default: 100)
Example:
{
"video_path": "/path/to/video.mp4",
"frame_interval": 5,
"whisper_model": "small",
"max_frames": 100
}2. list_processed_videos
List all videos currently in cache.
Returns: Summary of cached videos with paths, timestamps, and settings.
Resources
The server also provides MCP resources for cached videos:
URI Format: video:///absolute/path/to/video.mp4
Cached videos are automatically exposed as resources that Claude can access.
Cache Management
Cache Location
By default, cached data is stored in:
~/.cache/video-mcp/
├── {cache_key}/
│ ├── metadata.json
│ ├── transcript.json
│ ├── timeline.json
│ └── frames/
│ ├── frame_0000.jpg
│ ├── frame_0001.jpg
│ └── ...
└── cache_index.jsonCache Key Format
Cache keys are generated from:
Video file hash (MD5 of first 1MB + file size)
Processing settings (scene detection/interval, Whisper model)
Fixed Interval Mode:
Format: {hash}_{interval}s_{model}
Example: abc123def456_10s_base
Scene Detection Mode:
Format: {hash}_scene_{threshold}_{model}
Example: abc123def456_scene_0.3_base
Manual Cache Management
Check cache size:
du -sh ~/.cache/video-mcpClear cache:
rm -rf ~/.cache/video-mcpView cache index:
cat ~/.cache/video-mcp/cache_index.json | python3 -m json.toolAutomatic Cleanup
The cache automatically cleans up:
Entries older than 30 days (configurable)
When total size exceeds 10GB (configurable)
Supported Formats
MP4 (
.mp4)MOV (
.mov)AVI (
.avi)MKV (
.mkv)WebM (
.webm)
Most common video codecs are supported via FFmpeg.
Performance
Processing Time
Approximate times for a 10-minute video (on Apple M1):
Task | Time |
Frame extraction (10s intervals) | ~10s |
Audio transcription (base model) | ~60s |
Timeline building | ~1s |
Total | ~70s |
Cache Benefits
First query: ~70s (processes video)
Follow-up queries: Instant (uses cache)
Troubleshooting
Corporate Environments / JFrog Artifactory
Error: Could not find a version that satisfies the requirement fastmcp or No matching distribution found for fastmcp
Solutions:
Install with PyPI fallback:
pip install -e . --extra-index-url https://pypi.org/simpleThis checks your JFrog repo first, then falls back to public PyPI for missing packages.
Install FastMCP directly from PyPI first:
# Install FastMCP from public PyPI pip install --index-url https://pypi.org/simple fastmcp # Then install video-mcp (uses JFrog for other dependencies) pip install -e .
FFmpeg Not Found
Error: FFmpeg not found. Please install it...
Solution:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Verify
ffmpeg -versionVideo File Not Found
Error: Video file not found: /path/to/video.mp4
Solution:
Provide absolute path, not relative path
Check file exists:
ls -la /path/to/video.mp4Use tab completion to avoid typos
Unsupported Format
Error: Unsupported video format: .flv
Solution:
Convert to MP4:
ffmpeg -i input.flv output.mp4Supported formats: MP4, MOV, AVI, MKV, WebM
Whisper Model Download Issues
Error: Failed to load Whisper model...
Solution:
Models download automatically on first use
Requires internet connection
Check disk space (models: 39MB - 1.5GB)
Download location:
~/.cache/whisper/
Out of Memory
Error: Processing fails on large videos
Solution:
Reduce
max_frames:VIDEO_MCP_MAX_FRAMES_PER_VIDEO=30Increase
frame_interval:VIDEO_MCP_FRAME_INTERVAL_SECONDS=15Use smaller Whisper model:
VIDEO_MCP_WHISPER_MODEL=tiny
Slow Processing
Issue: Transcription takes too long
Solution:
Use smaller Whisper model (
tinyorbase)For GPU: Set
VIDEO_MCP_WHISPER_DEVICE=cuda(requires CUDA setup)Process shorter video segments
Development
Project Structure
vidmcp/
├── src/vidmcp/
│ ├── __init__.py # Package initialization
│ ├── server.py # MCP server implementation
│ ├── config.py # Configuration management
│ ├── models.py # Data models
│ ├── video_processor.py # FFmpeg integration
│ ├── audio_processor.py # Whisper integration
│ ├── timeline_builder.py # Timeline synchronization
│ └── cache.py # Cache management
├── tests/ # Test suite
├── pyproject.toml # Project configuration
└── README.md # This fileRunning Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=vidmcpCode Formatting
# Format code
black src/vidmcp
# Lint code
ruff check src/vidmcpArchitecture
┌─────────────────────────────────────────┐
│ Claude Code (Client) │
└───────────────┬─────────────────────────┘
│ MCP Protocol
┌───────────────▼─────────────────────────┐
│ Video MCP Server │
│ ┌─────────────────────────────────┐ │
│ │ MCP Interface Layer │ │
│ │ • process_video tool │ │
│ │ • list_processed_videos tool │ │
│ │ • video:// resources │ │
│ └──────────┬──────────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────────┐ │
│ │ Processing Pipeline │ │
│ │ • VideoProcessor (FFmpeg) │ │
│ │ • AudioProcessor (Whisper) │ │
│ │ • TimelineBuilder │ │
│ └──────────┬──────────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────────┐ │
│ │ Cache Layer │ │
│ │ • Hash-based caching │ │
│ │ • Persistent storage │ │
│ │ • Auto cleanup │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘Future Enhancements
Planned features:
GPU Support: CUDA acceleration for faster Whisper transcription
Video Chunking: Handle very long videos (>1 hour) efficiently
YouTube Support: Download and process YouTube videos directly
Speaker Diarization: Identify different speakers in transcript
Summary Mode: Generate video summaries with fewer frames
Custom Frame Selection: User-specified exact timestamps for frames
Faster Whisper: Use
faster-whisperlibrary for 4x speed improvementAdaptive Scene Detection: Auto-tune threshold based on content
Acknowledgments
MCP Protocol: Anthropic's Model Context Protocol
FFmpeg: Video processing
Whisper: OpenAI's speech recognition
Claude: Anthropic's AI assistant
Made with ❤️ for the Claude Code community
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
- Your AI Chatbot Just Exposed Your CEO's Salary to an InternBy Om-Shree-0709 on .Agent IdentityMCP SecurityOAuth Delegation
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/ethnn-b/vidmcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server