Qwen Video Understanding MCP Server

An MCP (Model Context Protocol) server that enables Claude and other AI agents to analyze videos and images using Qwen3-VL deployed on Modal.

Highlights

  • Hours-long video support with full recall

  • Timestamp grounding with second-level precision

  • 256K context (expandable to 1M)

  • 32-language OCR support

  • Self-hosted on Modal serverless GPUs, billed only for actual processing time (see Cost)

Features

  • Video Analysis: Analyze videos via URL with custom prompts

  • Image Analysis: Analyze images via URL

  • Video Summarization: Generate brief, standard, or detailed summaries

  • Text Extraction: Extract on-screen text and transcribe speech

  • Video Q&A: Ask specific questions about video content

  • Frame Comparison: Analyze changes and progression in videos

Architecture

Claude/Agent → MCP Server → Modal API → Qwen3-VL (GPU)

The MCP server acts as a bridge between Claude and your Qwen3-VL model deployed on Modal's serverless GPU infrastructure.

Prerequisites

  1. Modal Account: Sign up at modal.com

  2. Deployed Qwen Model: Deploy the video understanding model to Modal (see below)

  3. Python 3.10+

Quick Start

1. Deploy the Model to Modal (if not already done)

cd ~/qwen-video-modal
modal deploy qwen_video.py

2. Install the MCP Server

cd ~/qwen-video-mcp-server
pip install -e .

Or with uv:

uv pip install -e .

3. Configure Environment

cp .env.example .env
# Edit .env with your Modal workspace name

4. Add to Claude Desktop

Add the following to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "qwen-video": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/adamanz/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_WORKSPACE": "adam-31541",
        "MODAL_APP": "qwen-video-understanding"
      }
    }
  }
}
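If the config file already lists other MCP servers, editing it by hand risks overwriting them. The following is a minimal sketch of merging the entry programmatically. The helper name `add_mcp_server` is hypothetical, and the path and workspace values are copied from the example above, so replace them with your own.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one entry under "mcpServers", keeping all other keys."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Values copied from the example config; replace the path and workspace with yours.
QWEN_VIDEO_ENTRY = {
    "command": "uv",
    "args": ["--directory", "/Users/adamanz/qwen-video-mcp-server", "run", "server.py"],
    "env": {"MODAL_WORKSPACE": "adam-31541", "MODAL_APP": "qwen-video-understanding"},
}
```

On macOS, this could be called as `add_mcp_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "qwen-video", QWEN_VIDEO_ENTRY)`.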

5. Restart Claude Desktop

The qwen-video tools should now be available.

Available Tools

analyze_video

Analyze a video with a custom prompt.

analyze_video(
    video_url="https://example.com/video.mp4",
    question="What happens in this video?",
    max_frames=16
)
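The call above is written the way an agent sees the tool. On the wire, an MCP client sends a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message using only the standard library. The argument names are taken from the example; the authoritative schema is whatever server.py declares, and the helper `build_tool_call` is hypothetical.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(
    1,
    "analyze_video",
    {
        "video_url": "https://example.com/video.mp4",
        "question": "What happens in this video?",
        "max_frames": 16,
    },
)
print(request)
```

The same helper applies to the other tools in this section by changing the tool name and arguments.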

analyze_image

Analyze an image with a custom prompt.

analyze_image(
    image_url="https://example.com/image.jpg",
    question="Describe this image"
)

summarize_video

Generate a video summary in different styles.

summarize_video(
    video_url="https://example.com/video.mp4",
    style="detailed"  # brief, standard, or detailed
)

extract_video_text

Extract text and transcribe speech from a video.

extract_video_text(
    video_url="https://example.com/presentation.mp4"
)

video_qa

Ask specific questions about a video.

video_qa(
    video_url="https://example.com/video.mp4",
    question="How many people appear in this video?"
)

compare_video_frames

Analyze changes throughout a video.

compare_video_frames(
    video_url="https://example.com/timelapse.mp4",
    comparison_prompt="How does the scene change?"
)

check_endpoint_status

Check the Modal endpoint configuration.

list_capabilities

List all server capabilities and supported formats.

Configuration

| Environment Variable | Description | Default |
| --- | --- | --- |
| MODAL_WORKSPACE | Your Modal workspace/username | adam-31541 |
| MODAL_APP | Name of the Modal app | qwen-video-understanding |
| QWEN_IMAGE_ENDPOINT | Override image endpoint URL | Auto-generated |
| QWEN_VIDEO_ENDPOINT | Override video endpoint URL | Auto-generated |
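To illustrate how these variables might interact, here is a sketch of endpoint resolution: an explicit override wins, otherwise a URL is derived from the workspace and app name. The README does not document the auto-generated URL format, so the pattern below (Modal's usual `https://<workspace>--<app>-<label>.modal.run` shape with a hypothetical `analyze-<kind>` label) is an assumption; check your Modal deployment for the real endpoint.

```python
import os


def resolve_endpoint(kind: str, env=None) -> str:
    """Return the endpoint URL for kind "image" or "video".

    An explicit QWEN_<KIND>_ENDPOINT override wins; otherwise a URL is
    derived from MODAL_WORKSPACE and MODAL_APP.
    """
    env = os.environ if env is None else env
    override = env.get(f"QWEN_{kind.upper()}_ENDPOINT")
    if override:
        return override
    workspace = env.get("MODAL_WORKSPACE", "adam-31541")
    app = env.get("MODAL_APP", "qwen-video-understanding")
    # ASSUMPTION: the function label "analyze-<kind>" is hypothetical;
    # the real auto-generated URL depends on how qwen_video.py names its endpoints.
    return f"https://{workspace}--{app}-analyze-{kind}.modal.run"
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.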

Supported Formats

Video: mp4, webm, mov, avi, mkv

Image: jpg, jpeg, png, gif, webp, bmp
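A client can check a URL against these lists before sending it. The sketch below classifies a URL by the file extension in its path. The function name `media_kind` is hypothetical, and the server itself may detect formats differently (for example by content type).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

VIDEO_FORMATS = {"mp4", "webm", "mov", "avi", "mkv"}
IMAGE_FORMATS = {"jpg", "jpeg", "png", "gif", "webp", "bmp"}


def media_kind(url: str):
    """Return "video", "image", or None based on the URL path's extension."""
    # urlparse drops the query string, so "?size=large" does not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    if suffix in VIDEO_FORMATS:
        return "video"
    if suffix in IMAGE_FORMATS:
        return "image"
    return None
```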

Limitations

  • Videos must be accessible via public URL

  • Maximum 64 frames extracted per video

  • Recommended video length: under 10 minutes for best results

  • First request may have cold start delay (Modal serverless)
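The frame cap helps explain the length recommendation. The sketch below assumes frames are sampled uniformly across the video, which the README does not confirm, and shows how a requested `max_frames` would be clamped to the 64-frame limit. The function name is hypothetical.

```python
MAX_FRAMES = 64  # limit stated in the Limitations list


def sample_frame_indices(total_frames: int, max_frames: int = 16) -> list[int]:
    """Pick up to max_frames evenly spaced frame indices, capped at MAX_FRAMES."""
    if total_frames <= 0:
        return []
    count = max(1, min(max_frames, MAX_FRAMES, total_frames))
    # Center each sample within its equal-width segment of the video.
    return [int((i + 0.5) * total_frames / count) for i in range(count)]
```

Under this assumption, a 10-minute video at the 64-frame cap is sampled about once every 9.4 seconds (600 s / 64), so longer videos leave increasingly large gaps between sampled frames.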

Cost

The Modal backend uses A100-40GB GPUs:

  • ~$3.30/hour while processing

  • Scales to zero when idle (no cost)

  • Only charged for actual processing time
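As a worked example of the billing arithmetic, the sketch below converts processing time into an approximate cost using the hourly rate quoted above. Actual Modal pricing may differ, so treat the rate as an assumption.

```python
A100_40GB_USD_PER_HOUR = 3.30  # approximate rate quoted in this section


def estimate_cost(processing_seconds: float, rate_per_hour: float = A100_40GB_USD_PER_HOUR) -> float:
    """Estimate the GPU cost in USD for a given amount of processing time."""
    return processing_seconds / 3600 * rate_per_hour
```

At this rate, a request that keeps the GPU busy for 60 seconds costs roughly $0.055.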

Troubleshooting

"Request timed out"

  • Video may be too large

  • Try a shorter video or reduce max_frames

"HTTP error 502/503"

  • Modal container is starting up (cold start)

  • Wait a few seconds and retry
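The "wait and retry" advice can be automated on the client side. Below is a minimal sketch of retrying on 502/503 with exponential backoff. The helper name, the attempt count, and the delays are illustrative choices, not values taken from the server.

```python
import time
import urllib.error

RETRYABLE_STATUS = {502, 503}


def call_with_retry(send, attempts: int = 5, base_delay: float = 2.0, sleep=time.sleep):
    """Call send() and retry on HTTP 502/503 with exponential backoff.

    `send` is any zero-argument callable that raises urllib.error.HTTPError
    on failure, e.g. lambda: urllib.request.urlopen(request, timeout=300).
    """
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            last_try = attempt == attempts - 1
            if err.code not in RETRYABLE_STATUS or last_try:
                raise  # non-retryable status, or out of attempts
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting.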

"Video URL not accessible"

  • Ensure the URL is publicly accessible

  • Check for authentication requirements
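One way to test accessibility before calling a tool is an unauthenticated HEAD request from your own machine. The sketch below is an approximation: some hosts reject HEAD or behave differently toward Modal's network, so a passing check does not guarantee the backend can fetch the file. The function name is hypothetical.

```python
import urllib.error
import urllib.request


def is_publicly_accessible(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if an unauthenticated HEAD request to url succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError):
        # Covers HTTP errors such as 401/403/404 as well as DNS and connection failures.
        return False
```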

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

License

MIT
