Qwen Video Understanding MCP Server

Server Configuration

Describes the environment variables that configure the server.

Name	Required	Description	Default
`MODAL_APP`	Yes	Name of the Modal app	qwen-video-understanding
`MODAL_WORKSPACE`	Yes	Your Modal workspace/username	adam-31541
`QWEN_IMAGE_ENDPOINT`	No	Override image endpoint URL (auto-generated if not provided)
`QWEN_VIDEO_ENDPOINT`	No	Override video endpoint URL (auto-generated if not provided)
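
The table implies a precedence rule: an explicit `QWEN_IMAGE_ENDPOINT` or `QWEN_VIDEO_ENDPOINT` wins, and otherwise the URL is derived from `MODAL_WORKSPACE` and `MODAL_APP`. The following is a minimal sketch of that rule, not the server's actual code. The helper name `resolve_endpoint`, the URL pattern, and the `analyze-<kind>` function label are assumptions made for illustration; the listing does not state how the URL is auto-generated.

```python
# Defaults copied from the configuration table.
DEFAULT_APP = "qwen-video-understanding"
DEFAULT_WORKSPACE = "adam-31541"


def resolve_endpoint(kind: str, env: dict) -> str:
    """Return the endpoint URL for kind ("image" or "video").

    An explicit QWEN_<KIND>_ENDPOINT override takes precedence;
    otherwise a URL is built from the workspace and app names.
    """
    override = env.get(f"QWEN_{kind.upper()}_ENDPOINT")
    if override:
        return override
    workspace = env.get("MODAL_WORKSPACE", DEFAULT_WORKSPACE)
    app = env.get("MODAL_APP", DEFAULT_APP)
    # ASSUMPTION: this URL pattern and the "analyze-<kind>" label are
    # hypothetical; the real server may generate a different URL.
    return f"https://{workspace}--{app}-analyze-{kind}.modal.run"
```

In practice you would pass `os.environ` as `env`; taking a plain mapping keeps the function easy to test.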

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": false }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{}
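
In MCP, `listChanged: false` means the server does not announce changes to its tool, prompt, or resource lists, and `subscribe: false` means clients cannot subscribe to resource updates. As a rough illustration of how a client might read these flags, the sketch below parses the capability values from the table. The helper name `supports` is hypothetical.

```python
import json

# Capability object assembled from the table above.
CAPABILITIES = json.loads("""
{
  "tools": {"listChanged": false},
  "prompts": {"listChanged": false},
  "resources": {"subscribe": false, "listChanged": false},
  "experimental": {}
}
""")


def supports(capabilities: dict, feature: str, flag: str) -> bool:
    """Return True only if the feature is declared and the flag is set."""
    return bool(capabilities.get(feature, {}).get(flag, False))


# With every flag false, a client should re-list tools on demand
# rather than wait for a change notification.
needs_polling = not supports(CAPABILITIES, "tools", "listChanged")
```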

Tools

Functions exposed to the LLM to take actions

Name	Description
`analyze_video`	Analyze a video using the Qwen3-VL vision-language model. The video must be accessible via a public URL. The model will: (1) download the video; (2) extract key frames (up to `max_frames`); (3) analyze the frames with your question; (4) provide timestamp-grounded responses when applicable. Examples: "What happens in this video?", "Summarize the main events with timestamps", "What products are shown?", "At what timestamp does the speaker mention X?", "What is being discussed or demonstrated?"
`analyze_image`	Analyze an image using the Qwen2.5-VL vision-language model. The image must be accessible via a public URL. Examples: "What's in this image?", "Describe the scene", "What text is visible?", "Identify any people or objects", "What is the mood or atmosphere?"
`summarize_video`	Generate a summary of a video. Styles: `brief` (1-2 sentence overview); `standard` (1-2 paragraph summary with key points); `detailed` (comprehensive analysis with timeline).
`extract_video_text`	Extract and transcribe any visible text or speech from a video. Useful for: reading on-screen text, titles, and captions; transcribing spoken content; extracting text from presentations or documents shown in the video.
`video_qa`	Ask a specific question about a video's content. Examples: "How many people appear in this video?", "What color is the car?", "What is the speaker's main argument?", "What products are being demonstrated?", "At what point does the action begin?"
`compare_video_frames`	Analyze changes and progression across a video. Useful for: before/after comparisons; tracking movement or changes; understanding the progression of events; analyzing tutorials or how-to videos.
`check_endpoint_status`	Check the configuration and status of the Modal endpoints. Returns the configured endpoint URLs and connection status.
`list_capabilities`	List the capabilities of this video understanding server.
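
MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for the video analysis tool, assuming it is registered under the name `analyze_video`. The argument names `video_url` and `question` are assumptions, since the listing does not publish the input schema; only `max_frames` appears in the tool description. The sample URL and the helper `tool_call` are hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    "analyze_video",
    {
        # ASSUMPTION: argument names are illustrative, not from the schema.
        "video_url": "https://example.com/demo.mp4",
        "question": "Summarize the main events with timestamps",
        "max_frames": 16,
    },
)
print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope for you; the raw form is shown only to make the request shape explicit.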

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
`get_server_info`	Get information about this MCP server's capabilities.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/adamanz/qwen-video-mcp-server'
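
The same request can be made from Python with the standard library. This is a sketch that assumes the endpoint returns a JSON body, which the page does not state; the helper names `server_url` and `fetch_server` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, repo: str) -> str:
    """Build the directory API URL for one server entry."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_server(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the server entry and decode it.

    ASSUMPTION: the response body is JSON.
    """
    with urllib.request.urlopen(server_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a network request):
# info = fetch_server("adamanz", "qwen-video-mcp-server")
```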

If you have feedback or need assistance with the MCP directory API, please join our Discord server.