Qwen Video Understanding MCP Server

by adamanz

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default
MODAL_APP | Yes | Name of the Modal app | qwen-video-understanding
MODAL_WORKSPACE | Yes | Your Modal workspace/username | adam-31541
QWEN_IMAGE_ENDPOINT | No | Override image endpoint URL (auto-generated if not provided) |
QWEN_VIDEO_ENDPOINT | No | Override video endpoint URL (auto-generated if not provided) |
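
As a rough illustration of how a client or wrapper script might read these variables, here is a minimal sketch. The `ServerConfig` class and `load_config` function are hypothetical names, not part of the server. The sketch treats the two "Required" variables as mandatory and leaves an unset endpoint override as `None`, since the listing does not say how the auto-generated URLs are built.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the four variables in the table above."""
    modal_app: str
    modal_workspace: str
    image_endpoint: Optional[str] = None  # None: leave URL generation to the server
    video_endpoint: Optional[str] = None  # None: leave URL generation to the server


def load_config(env: Mapping[str, str]) -> ServerConfig:
    """Build a ServerConfig from an environment-like mapping such as os.environ."""
    missing = [name for name in ("MODAL_APP", "MODAL_WORKSPACE") if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return ServerConfig(
        modal_app=env["MODAL_APP"],
        modal_workspace=env["MODAL_WORKSPACE"],
        # Empty strings are treated the same as "not provided".
        image_endpoint=env.get("QWEN_IMAGE_ENDPOINT") or None,
        video_endpoint=env.get("QWEN_VIDEO_ENDPOINT") or None,
    )
```

In practice this would be called as `load_config(os.environ)`; passing a plain dict keeps the function easy to test.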

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | {"listChanged": false}
prompts | {"listChanged": false}
resources | {"subscribe": false, "listChanged": false}
experimental | {}
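
In MCP, a `listChanged` value of `false` means the server does not announce changes to its tool, prompt, or resource lists, and `subscribe: false` means clients cannot subscribe to resource updates. The sketch below shows one way a client could check these flags before relying on them. The `has_flag` helper is a hypothetical name used only for illustration.

```python
import json

# The capability values as listed for this server.
CAPABILITIES = json.loads("""
{
  "tools": {"listChanged": false},
  "prompts": {"listChanged": false},
  "resources": {"subscribe": false, "listChanged": false},
  "experimental": {}
}
""")


def has_flag(capabilities: dict, feature: str, flag: str) -> bool:
    """Return True only if the feature is declared and the flag is set to true."""
    return bool(capabilities.get(feature, {}).get(flag, False))


# A client would fall back to re-listing tools on demand when this is False.
tools_list_can_change = has_flag(CAPABILITIES, "tools", "listChanged")
```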

Tools

Functions exposed to the LLM to take actions

analyze_video

Analyze a video using the Qwen3-VL vision-language model.

The video must be accessible via a public URL. The model will:

  1. Download the video

  2. Extract key frames (up to max_frames)

  3. Analyze the frames with your question

  4. Provide timestamp-grounded responses when applicable
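
The listing does not say how key frames are chosen. Purely as an illustration of step 2, the sketch below assumes evenly spaced sampling capped at `max_frames`, and shows how a frame index maps to a timestamp for step 4. The function names and the uniform strategy are assumptions, not the server's documented behavior.

```python
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int) -> List[int]:
    """Pick at most max_frames evenly spaced frame indices (assumed strategy)."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))  # short video: keep every frame
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]


def frame_timestamp(index: int, fps: float) -> float:
    """Convert a frame index to seconds from the start of the video."""
    return index / fps
```

For example, a 100-frame clip with `max_frames=4` yields indices `[0, 25, 50, 75]`, and at 25 frames per second index 50 falls at 2.0 seconds.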

Examples:

  • "What happens in this video?"

  • "Summarize the main events with timestamps"

  • "What products are shown?"

  • "At what timestamp does the speaker mention X?"

  • "What is being discussed or demonstrated?"
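
MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for `analyze_video`. Only `max_frames` is named in the description above; the argument names `video_url` and `question` are assumptions, so check the tool's input schema before relying on them.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "analyze_video",
    {
        "video_url": "https://example.com/demo.mp4",  # assumed argument name
        "question": "Summarize the main events with timestamps",  # assumed argument name
        "max_frames": 16,  # named in the tool description; the value is arbitrary
    },
)
payload = json.dumps(request)  # what would be sent over the MCP transport
```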

analyze_image

Analyze an image using the Qwen2.5-VL vision-language model.

The image must be accessible via a public URL.

Examples:

  • "What's in this image?"

  • "Describe the scene"

  • "What text is visible?"

  • "Identify any people or objects"

  • "What is the mood or atmosphere?"

summarize_video

Generate a summary of a video.

Styles:

  • brief: 1-2 sentence overview

  • standard: 1-2 paragraph summary with key points

  • detailed: Comprehensive analysis with timeline
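
A client can validate the style locally before calling the tool. In the sketch below, the three style names come from the list above, while the argument names `video_url` and `style` and the `"standard"` default are assumptions.

```python
from typing import Dict

# Style names and descriptions as listed for summarize_video.
SUMMARY_STYLES: Dict[str, str] = {
    "brief": "1-2 sentence overview",
    "standard": "1-2 paragraph summary with key points",
    "detailed": "Comprehensive analysis with timeline",
}


def summarize_video_arguments(video_url: str, style: str = "standard") -> Dict[str, str]:
    """Build the tool arguments, rejecting styles the tool does not list."""
    if style not in SUMMARY_STYLES:
        raise ValueError(
            f"unknown style {style!r}; expected one of {sorted(SUMMARY_STYLES)}"
        )
    return {"video_url": video_url, "style": style}
```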

extract_video_text

Extract and transcribe any visible text or speech from a video.

Useful for:

  • Reading on-screen text, titles, captions

  • Transcribing spoken content

  • Extracting text from presentations or documents shown in video

video_qa

Ask a specific question about a video's content.

Examples:

  • "How many people appear in this video?"

  • "What color is the car?"

  • "What is the speaker's main argument?"

  • "What products are being demonstrated?"

  • "At what point does the action begin?"
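
An MCP tool result carries its answer as a list of content items and may set `isError`. The sketch below pulls the text answer out of such a result. The sample result is invented for illustration, and the assumption that this server answers with text items is not confirmed by the listing.

```python
from typing import Any, Dict


def result_text(result: Dict[str, Any]) -> str:
    """Join the text items of an MCP tools/call result, raising on tool errors."""
    texts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    joined = "\n".join(texts)
    if result.get("isError"):
        raise RuntimeError(joined or "tool call failed")
    return joined


# Invented example of what a video_qa result could look like.
sample_result = {
    "content": [{"type": "text", "text": "Two people appear in the video."}],
    "isError": False,
}
answer = result_text(sample_result)
```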

compare_video_frames

Analyze changes and progression across a video.

Useful for:

  • Before/after comparisons

  • Tracking movement or changes

  • Understanding progression of events

  • Analyzing tutorials or how-to videos

check_endpoint_status

Check the configuration and status of the Modal endpoints.

Returns the configured endpoint URLs and connection status.
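
The listing does not describe how the status check works or what its output looks like. As a rough idea of what such a check involves, the sketch below probes each configured URL and reports whether anything answered. The function names, the result shape, and the "any HTTP response counts as reachable" rule are all assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers at the HTTP level, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the host responded, so the endpoint is reachable
    except (urllib.error.URLError, OSError, ValueError):
        return False


def endpoint_status(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, Dict[str, object]]:
    """Report the URL and reachability of each named endpoint."""
    return {
        name: {"url": url, "reachable": probe(url)}
        for name, url in endpoints.items()
    }
```

Passing the probe as a parameter keeps the network call replaceable, which makes the function usable in tests without contacting Modal.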

list_capabilities

List the capabilities of this video understanding server.

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

Name | Description
get_server_info | Get information about this MCP server's capabilities.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/adamanz/qwen-video-mcp-server'
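
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the API returns a JSON body, which the listing does not state.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> Any:
    """GET the server record and decode it, assuming a JSON response."""
    request = urllib.request.Request(
        server_url(owner, name),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("adamanz", "qwen-video-mcp-server")` performs the same GET as the curl command.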

If you have feedback or need assistance with the MCP directory API, please join our Discord server.