MCP Video Parser
A powerful video analysis system that uses the Model Context Protocol (MCP) to process, analyze, and query video content using AI vision models.
🎬 Features
- AI-Powered Video Analysis: Automatically extracts and analyzes frames using vision LLMs (Llava)
- Natural Language Queries: Search videos using conversational queries
- Time-Based Search: Query videos by relative time ("last week") or specific dates
- Location-Based Organization: Organize videos by location (shed, garage, etc.)
- Audio Transcription: Extract and search through video transcripts
- Chat Integration: Natural conversations with Mistral/Llama while maintaining video context
- Scene Detection: Intelligent frame extraction based on visual changes
- MCP Protocol: Standards-based integration with Claude and other MCP clients
🚀 Quick Start
Prerequisites
- Python 3.10+
- Ollama installed and running
- ffmpeg (for video processing)
Installation
- Clone the repository:
- Install dependencies:
- Pull required Ollama models:
- Start the MCP server (commands for all four steps are sketched below)
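A minimal sketch of these four steps, assuming a pip-based setup; the repository URL, directory name, and server entry point are placeholders, not confirmed paths:

```bash
# 1. Clone the repository (URL and directory name are placeholders)
git clone <repository-url>
cd mcp-video-parser

# 2. Install dependencies (assumes a requirements.txt at the repo root)
pip install -r requirements.txt

# 3. Pull the Ollama models used for vision and chat
ollama pull llava
ollama pull mistral

# 4. Start the MCP server (entry-point name is an assumption)
python server.py
```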
Basic Usage
- Process a video (see the command sketches after this list):
- Start the chat client:
- Example queries:
- "Show me the latest videos"
- "What happened at the garage yesterday?"
- "Find videos with cars"
- "Give me a summary of all videos from last week"
🏗️ Architecture
🛠️ Configuration
Edit config/default_config.json to customize:
- Frame extraction rate: How many frames to analyze
- Scene detection sensitivity: When to capture scene changes
- Storage settings: Where to store videos and data
- LLM models: Which models to use for vision and chat
See Configuration Guide for details.
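For illustration, a minimal Python sketch of reading this config; the key names shown are hypothetical examples, not taken from the actual default_config.json:

```python
import json
from pathlib import Path

# Load the project configuration file referenced above.
config = json.loads(Path("config/default_config.json").read_text())

# Hypothetical keys -- check default_config.json for the real names.
frame_rate = config.get("frame_extraction_rate", 1)             # frames analyzed per second
scene_threshold = config.get("scene_detection_threshold", 0.3)  # scene-change sensitivity
storage_dir = config.get("storage_dir", "data/videos")          # where videos and data are stored
vision_model = config.get("vision_model", "llava")              # model used for frame analysis
chat_model = config.get("chat_model", "mistral")                # model used for chat responses
```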
🔧 MCP Tools
The server exposes these MCP tools:
- process_video - Process and analyze a video file
- query_location_time - Query videos by location and time
- search_videos - Search video content and transcripts
- get_video_summary - Get AI-generated summary of a video
- ask_video - Ask questions about specific videos
- analyze_moment - Analyze specific timestamp in a video
- get_video_stats - Get system statistics
- get_video_guide - Get usage instructions
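These tools can also be called programmatically with the official MCP Python SDK. A minimal sketch, assuming the server is launched with `python server.py` and that process_video accepts a video_path argument (both assumptions):

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server over stdio (entry-point name is an assumption).
    server = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Tool name comes from the list above; the argument name is an assumption.
            result = await session.call_tool(
                "process_video", arguments={"video_path": "/path/to/video.mp4"}
            )
            print(result)

asyncio.run(main())
```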
🛠️ Utility Scripts
Video Cleanup
Clean all videos from the system and reset to a fresh state:
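The invocation might look like the following (the script path is an assumption; the --no-backup flag is described below):

```bash
# Remove all processed data and reset the system (script path is an assumption)
python scripts/clean_videos.py

# Same, but skip the automatic database backup
python scripts/clean_videos.py --no-backup
```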
This script will:
- Remove all video entries from the database
- Delete all processed frames and transcripts
- Delete all videos from the location-based structure
- Optionally delete original video files
- Create a backup of the database before cleaning (unless --no-backup is passed)
Video Processing
Process individual videos:
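A hedged sketch, reusing the hypothetical process_video.py script from Basic Usage; the --location flag is also an assumption, based on the location-based organization feature:

```bash
# Process one video and tag it with a location (script name and flag are assumptions)
python process_video.py /path/to/garage_cam.mp4 --location garage
```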
📖 Documentation
- API Reference - Detailed MCP tool documentation
- Configuration Guide - Customization options
- Video Analysis Info - How video processing works
- Development Guide - Contributing and testing
- Deployment Guide - Production setup
🚦 Development
Running Tests
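Tests are run with pytest; a typical invocation (the tests/ layout is an assumption):

```bash
# Run the full suite from the repository root
pytest

# Run a single test module with verbose output (path is an assumption)
pytest tests/test_video_processor.py -v
```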
Project Structure
🤝 Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
📝 Roadmap
- ✅ Basic video processing and analysis
- ✅ MCP server implementation
- ✅ Natural language queries
- ✅ Chat integration with context
- 🚧 Enhanced time parsing (see INTELLIGENT_QUERY_PLAN.md)
- 🚧 Multi-camera support
- 🚧 Real-time processing
- 🚧 Web interface
🐛 Troubleshooting
Common Issues
- Ollama not running: start the Ollama service and confirm it is reachable.
- Missing models: pull the vision and chat models the server expects (Llava and Mistral).
- Port already in use: stop the conflicting process or change the server port (see the command sketch below).
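Command sketches for the fixes above (the model names match those used elsewhere in this README; the port number is an assumption):

```bash
# Ollama not running: start the local Ollama service
ollama serve

# Missing models: pull the vision and chat models
ollama pull llava
ollama pull mistral

# Port already in use: find the conflicting process (port number is an assumption)
lsof -i :8000
```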
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
- Built on FastMCP framework
- Uses Ollama for local LLM inference
- Inspired by the Model Context Protocol specification
💬 Support
Version: 0.1.1
Author: Michael Baker
Status: Beta - Breaking changes possible