# Qwen Video Understanding MCP Server
An MCP (Model Context Protocol) server that enables Claude and other AI agents to analyze videos and images using **Qwen3-VL** deployed on Modal.
## Highlights
- **Hours-long video** support with full recall (a Qwen3-VL model capability; this server samples at most 64 frames per video, see Limitations)
- **Timestamp grounding** - second-level precision
- **256K context** (expandable to 1M)
- **32-language OCR** support
- **Self-hosted** on Modal serverless GPU (scales to zero when idle, so you pay only for processing time)
## Features
- **Video Analysis**: Analyze videos via URL with custom prompts
- **Image Analysis**: Analyze images via URL
- **Video Summarization**: Generate brief, standard, or detailed summaries
- **Text Extraction**: Extract on-screen text and transcribe speech
- **Video Q&A**: Ask specific questions about video content
- **Frame Comparison**: Analyze changes and progression in videos
## Architecture
```
Claude/Agent → MCP Server → Modal API → Qwen3-VL (GPU)
```
The MCP server acts as a bridge between Claude and your Qwen3-VL model deployed on Modal's serverless GPU infrastructure.
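To make the bridging step concrete, here is a minimal sketch of how a tool call could be turned into an HTTP request for the Modal endpoint. The payload field names simply mirror the `analyze_video` tool parameters, and the endpoint URL is a placeholder; the actual request schema of the deployed Modal app is an assumption here, not something this README specifies.

```python
import json
import urllib.request


def build_video_request(endpoint_url, video_url, question, max_frames=16):
    """Build (but do not send) a JSON POST request for a video analysis call."""
    # Assumed payload shape: mirrors the analyze_video tool parameters.
    payload = {
        "video_url": video_url,
        "question": question,
        "max_frames": max_frames,
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint; the real URL depends on your Modal deployment.
request = build_video_request(
    "https://example.com/analyze-video",
    "https://example.com/video.mp4",
    "What happens in this video?",
)
# Sending it would look like: urllib.request.urlopen(request, timeout=300)
```

Building the request separately from sending it keeps the sketch easy to inspect and test without a running GPU container.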
## Prerequisites
1. **Modal Account**: Sign up at [modal.com](https://modal.com)
2. **Deployed Qwen Model**: Deploy the video understanding model to Modal (see below)
3. **Python 3.10+**
## Quick Start
### 1. Deploy the Model to Modal (if not already done)
```bash
cd ~/qwen-video-modal
modal deploy qwen_video.py
```
### 2. Install the MCP Server
```bash
cd ~/qwen-video-mcp-server
pip install -e .
```
Or with uv:
```bash
uv pip install -e .
```
### 3. Configure Environment
```bash
cp .env.example .env
# Edit .env with your Modal workspace name
```
### 4. Add to Claude Desktop
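Add the following to your Claude Desktop config (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`), replacing the `--directory` path and `MODAL_WORKSPACE` with your own values: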
```json
{
  "mcpServers": {
    "qwen-video": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/adamanz/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_WORKSPACE": "adam-31541",
        "MODAL_APP": "qwen-video-understanding"
      }
    }
  }
}
```
### 5. Restart Claude Desktop
The `qwen-video` tools should now be available.
## Available Tools
### `analyze_video`
Analyze a video with a custom prompt.
```python
analyze_video(
    video_url="https://example.com/video.mp4",
    question="What happens in this video?",
    max_frames=16
)
```
### `analyze_image`
Analyze an image with a custom prompt.
```python
analyze_image(
    image_url="https://example.com/image.jpg",
    question="Describe this image"
)
```
### `summarize_video`
Generate a video summary in different styles.
```python
summarize_video(
    video_url="https://example.com/video.mp4",
    style="detailed"  # brief, standard, or detailed
)
```
### `extract_video_text`
Extract text and transcribe speech from a video.
```python
extract_video_text(
    video_url="https://example.com/presentation.mp4"
)
```
### `video_qa`
Ask specific questions about a video.
```python
video_qa(
    video_url="https://example.com/video.mp4",
    question="How many people appear in this video?"
)
```
### `compare_video_frames`
Analyze changes throughout a video.
```python
compare_video_frames(
    video_url="https://example.com/timelapse.mp4",
    comparison_prompt="How does the scene change?"
)
```
### `check_endpoint_status`
Check the Modal endpoint configuration.
### `list_capabilities`
List all server capabilities and supported formats.
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MODAL_WORKSPACE` | Your Modal workspace/username | `adam-31541` |
| `MODAL_APP` | Name of the Modal app | `qwen-video-understanding` |
| `QWEN_IMAGE_ENDPOINT` | Override image endpoint URL | Auto-generated |
| `QWEN_VIDEO_ENDPOINT` | Override video endpoint URL | Auto-generated |
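
The sketch below shows one way the "Auto-generated" endpoint URLs could be derived from these variables, with the explicit overrides taking precedence. The URL pattern and the function suffixes `analyze-image` and `analyze-video` are assumptions for illustration; check them against the URLs that `modal deploy` prints for your app.

```python
import os


def resolve_endpoints(env=None):
    """Return the image and video endpoint URLs, honoring explicit overrides."""
    env = os.environ if env is None else env
    workspace = env.get("MODAL_WORKSPACE", "adam-31541")
    app = env.get("MODAL_APP", "qwen-video-understanding")

    def auto_url(function_name):
        # Assumed URL pattern and hypothetical function names; verify both
        # against the output of `modal deploy`.
        return f"https://{workspace}--{app}-{function_name}.modal.run"

    return {
        "image": env.get("QWEN_IMAGE_ENDPOINT") or auto_url("analyze-image"),
        "video": env.get("QWEN_VIDEO_ENDPOINT") or auto_url("analyze-video"),
    }
```

Passing a plain dictionary as `env` makes it easy to try different configurations without touching the real environment.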
## Supported Formats
**Video**: mp4, webm, mov, avi, mkv
**Image**: jpg, jpeg, png, gif, webp, bmp
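
A client can check a URL against these lists before making a request. The helper below is a hypothetical sketch (the name `media_kind` is not part of this server) that classifies a URL by the extension of its path, ignoring any query string.

```python
import os
from urllib.parse import urlparse

VIDEO_EXTENSIONS = {"mp4", "webm", "mov", "avi", "mkv"}
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp", "bmp"}


def media_kind(url):
    """Return "video", "image", or None based on the URL path's extension."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return None
```

An extension check is only a heuristic: URLs without a file extension return `None` even if they serve a supported format.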
## Limitations
- Videos must be accessible via public URL
- Maximum 64 frames extracted per video
- Recommended video length: under 10 minutes for best results
- The first request may incur a cold-start delay while the Modal serverless container starts
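
To stay within the 64-frame limit, a caller might clamp `max_frames` before sending a request. This is a hypothetical helper, and the fallback of 16 is taken from the `analyze_video` example rather than from a documented default.

```python
MAX_FRAMES_LIMIT = 64  # maximum frames extracted per video, per the list above


def clamp_max_frames(requested, default=16):
    """Return a frame count between 1 and MAX_FRAMES_LIMIT."""
    if requested is None:
        # Assumed fallback, borrowed from the analyze_video example.
        return default
    return max(1, min(int(requested), MAX_FRAMES_LIMIT))
```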
## Cost
The Modal backend uses A100-40GB GPUs:
- ~$3.30/hour while processing
- Scales to zero when idle (no cost)
- Only charged for actual processing time
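
As a rough worked example of the figures above, the cost of a single request scales linearly with processing time. The rate is the approximate number quoted in this section; actual Modal pricing may differ.

```python
GPU_RATE_PER_HOUR = 3.30  # approximate A100-40GB rate quoted above, in USD


def estimate_cost(processing_seconds, rate_per_hour=GPU_RATE_PER_HOUR):
    """Estimate the GPU cost in USD for a given processing time."""
    return processing_seconds / 3600 * rate_per_hour


# A request that keeps the GPU busy for 90 seconds costs roughly $0.08.
print(f"${estimate_cost(90):.2f}")
```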
## Troubleshooting
### "Request timed out"
- Video may be too large
- Try a shorter video or reduce `max_frames`
### "HTTP error 502/503"
- Modal container is starting up (cold start)
- Wait a few seconds and retry
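
The wait-and-retry advice can be automated with a small backoff loop. This is a sketch, not the server's own behavior: `send` stands for any zero-argument callable that performs the request and returns a `(status_code, body)` pair, and the delays are illustrative.

```python
import time

RETRYABLE_STATUSES = {502, 503}


def call_with_retry(send, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < attempts - 1:
            # Exponential backoff: 2s, 4s, 8s, ... with the default base_delay.
            sleep(base_delay * (2 ** attempt))
    # Every attempt hit a 502/503; return the last response to the caller.
    return status, body
```

The `sleep` parameter is injectable so the loop can be exercised in tests without actually waiting.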
### "Video URL not accessible"
- Ensure the URL is publicly accessible
- Check for authentication requirements
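
One quick way to test accessibility from your own machine is an unauthenticated `HEAD` request. The helper below is a hypothetical sketch; note that some servers reject `HEAD` even when `GET` works, so a `False` result is a hint rather than proof.

```python
import urllib.error
import urllib.request


def is_publicly_accessible(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if an unauthenticated HEAD request to url succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError):
        # Covers HTTP errors such as 401/403/404, DNS failures, and timeouts.
        return False
```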
## Development
```bash
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
```
## License
MIT