MiMo Multimodal Understanding MCP Server
This server provides AI-powered multimodal understanding of images, audio, and video using the Xiaomi MiMo API. It exposes three tools:
- Image Understanding (understand_image): Analyze single or multiple images via URL, local path, or base64. Supports JPEG, PNG, GIF, WebP (up to 10MB). Configurable prompt, system prompt, and max output tokens.
- Audio Understanding (understand_audio): Transcribe or analyze single or multiple audio files via URL or local path. Supports MP3, WAV, FLAC, M4A, OGG (URL: up to 100MB, Base64: up to 50MB). Configurable prompt, system prompt, and max output tokens.
- Video Understanding (understand_video): Analyze single or multiple video files via URL or local path. Supports MP4, MOV, AVI, WMV (URL: up to 300MB, Base64: up to 50MB). Configurable frames-per-second (range: 0.1–10, default: 2), media resolution (default or max), system prompt, and max output tokens.
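The format and size limits above can be checked before a tool is called. The following is a minimal sketch, not the server's own validation: `check_local_media` and `LIMITS` are hypothetical names, and it assumes that local files are sent base64-encoded (so the Base64 limit applies to audio and video), that "MB" means 1024 * 1024 bytes, and that the limit applies to the raw file size.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the README's "MB" means 1024 * 1024 bytes

# Extensions and local-file size limits taken from the tool list above.
# Assumption: local audio/video files are sent as base64, so the 50MB
# Base64 limit is used rather than the larger URL limit.
LIMITS = {
    "image": ({".jpg", ".jpeg", ".png", ".gif", ".webp"}, 10 * MB),
    "audio": ({".mp3", ".wav", ".flac", ".m4a", ".ogg"}, 50 * MB),
    "video": ({".mp4", ".mov", ".avi", ".wmv"}, 50 * MB),
}


def check_local_media(path: str, kind: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    extensions, max_bytes = LIMITS[kind]
    file = Path(path)
    problems = []
    if file.suffix.lower() not in extensions:
        problems.append(f"unsupported {kind} extension: {file.suffix or '(none)'}")
    if not file.is_file():
        problems.append(f"file not found: {path}")
    elif file.stat().st_size > max_bytes:
        problems.append(f"file exceeds {max_bytes // MB}MB limit")
    return problems
```

Calling `check_local_media("/path/to/screenshot.png", "image")` returns an empty list when the file exists, has a supported extension, and is within the limit.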
Provides tools for analyzing images, audio, and video using Xiaomi MiMo's multimodal understanding API, supporting multiple input formats and customization options like fps and resolution for video.
MCP server for Xiaomi MiMo multimodal understanding API (image, audio, video).
Features
- Image Understanding: Single/multiple images, URL and local file support
- Audio Understanding: Single/multiple audio files, URL and local file support
- Video Understanding: Single/multiple videos, URL and local file support, configurable fps and resolution
Setup
1. Install dependencies
    uv sync
2. Configure API Key
Copy .env.example to .env and fill in your API key:
    cp .env.example .env
Or set the environment variable directly:
    export MIMO_API_KEY=your_api_key_here
Get your API key from: https://platform.xiaomimimo.com
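As an illustration of the two configuration routes in this step, the sketch below reads `MIMO_API_KEY` from the environment and falls back to a `.env` file. It is an assumption about how such a lookup could work, not the server's actual loading code: `parse_env_file` and `load_api_key` are hypothetical helpers, and the precedence (environment variable first) is a choice made here.

```python
import os
from pathlib import Path


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_api_key(env_path: str = ".env") -> str:
    """Prefer the MIMO_API_KEY environment variable, then fall back to .env."""
    key = os.environ.get("MIMO_API_KEY")
    if not key and Path(env_path).is_file():
        file_values = parse_env_file(Path(env_path).read_text(encoding="utf-8"))
        key = file_values.get("MIMO_API_KEY")
    if not key:
        raise RuntimeError("MIMO_API_KEY is not set")
    return key
```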
3. (Optional) Configure API Base URL
The default API endpoint is determined by your API key prefix:
Key Prefix | Default Endpoint |
|
|
|
|
To use a different API endpoint:
    export MIMO_API_BASE=https://your-custom-endpoint/v1
Or add it to your .env file:
    MIMO_API_BASE=https://your-custom-endpoint/v1
Usage
Quick Start (with uvx)
    export MIMO_API_KEY=your_api_key_here
    uvx mimo-multimodal-mcp
Development mode (with MCP Inspector)
    uv run mcp dev src/mimo_multimodal_mcp/server.py
Install to Claude Desktop
    uv run mcp install src/mimo_multimodal_mcp/server.py
Direct execution
    uv run python src/mimo_multimodal_mcp/server.py
Claude Desktop Configuration
Add to ~/.config/claude/claude_desktop_config.json:
    {
      "mcpServers": {
        "mimo-multimodal": {
          "command": "uvx",
          "args": ["mimo-multimodal-mcp"],
          "env": {
            "MIMO_API_KEY": "your_api_key_here"
          }
        }
      }
    }
Tools
understand_image
Analyze images using the Xiaomi MiMo multimodal model.
Parameter | Type | Required | Description |
prompt | string | Yes | Image understanding task description |
image_url | string | No | Single image URL or data:image base64 |
image_path | string | No | Single local image file path |
image_urls | list[string] | No | Multiple image URLs |
| list[string] | No | Multiple local image file paths |
| string | No | Custom system prompt |
| integer | No | Max output length in tokens (default: 8192, max: 32768) |
Supported formats: JPEG, PNG, GIF, WebP
Size limit: 10MB
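The table above says the single-image URL parameter also accepts "data:image base64" input. A minimal sketch of building such a value from a local file follows. It assumes the standard `data:<mime>;base64,<payload>` form is what the tool expects, which the README does not spell out; `image_to_data_url` is a hypothetical helper.

```python
import base64
from pathlib import Path

# MIME types for the four supported image formats.
IMAGE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_to_data_url(path: str) -> str:
    """Encode a local image as a data:image/...;base64,... URL."""
    file = Path(path)
    mime = IMAGE_MIME_TYPES.get(file.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {file.suffix}")
    # Assumption: the standard data URL layout is accepted as-is.
    encoded = base64.b64encode(file.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result could then be passed as `image_url`. Base64 encoding grows the payload by roughly a third, so a file near the 10MB limit may be better passed by path.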
understand_audio
Analyze audio using the Xiaomi MiMo multimodal model.
Parameter | Type | Required | Description |
prompt | string | Yes | Audio understanding task description |
audio_url | string | No | Single audio URL |
audio_path | string | No | Single local audio file path |
| list[string] | No | Multiple audio URLs |
| list[string] | No | Multiple local audio file paths |
| string | No | Custom system prompt |
| integer | No | Max output length in tokens (default: 8192, max: 32768) |
Supported formats: MP3, WAV, FLAC, M4A, OGG
Size limit: URL 100MB, Base64 50MB
understand_video
Analyze video using the Xiaomi MiMo multimodal model.
Parameter | Type | Required | Description |
prompt | string | Yes | Video understanding task description |
video_url | string | No | Single video URL |
| string | No | Single local video file path |
| list[string] | No | Multiple video URLs |
| list[string] | No | Multiple local video file paths |
fps | float | No | Frames per second, range [0.1, 10], default: 2 |
media_resolution | string | No | Resolution: "default" or "max" |
| string | No | Custom system prompt |
| integer | No | Max output length in tokens (default: 8192, max: 32768) |
Supported formats: MP4, MOV, AVI, WMV
Size limit: URL 300MB, Base64 50MB
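The fps range and the two resolution values above can be validated on the caller's side before the tool is invoked. This is a sketch under stated assumptions: `build_video_arguments` is a hypothetical helper, and `media_resolution` is left out of the arguments when not given because the README does not state which value the server uses by default.

```python
from typing import Optional


def build_video_arguments(
    prompt: str,
    video_url: str,
    fps: float = 2.0,
    media_resolution: Optional[str] = None,
) -> dict:
    """Validate fps and media_resolution, then return tool arguments."""
    if not 0.1 <= fps <= 10:
        raise ValueError(f"fps must be within [0.1, 10], got {fps}")
    if media_resolution is not None and media_resolution not in ("default", "max"):
        raise ValueError('media_resolution must be "default" or "max"')
    arguments = {"prompt": prompt, "video_url": video_url, "fps": fps}
    if media_resolution is not None:
        # Only sent when chosen explicitly; the server-side default is not documented here.
        arguments["media_resolution"] = media_resolution
    return arguments
```

The returned dictionary uses the parameter names shown in the table and could be unpacked into a call such as `understand_video(**arguments)`.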
Examples
Image Understanding
    # URL
    await understand_image(prompt="Describe this image", image_url="https://example.com/image.jpg")
    # Local file
    await understand_image(prompt="What text is in this?", image_path="/path/to/screenshot.png")
    # Multiple images
    await understand_image(prompt="Compare these", image_urls=["url1", "url2"])
Audio Understanding
    # URL
    await understand_audio(prompt="Transcribe this audio", audio_url="https://example.com/audio.wav")
    # Local file
    await understand_audio(prompt="What is being said?", audio_path="/path/to/audio.mp3")
Video Understanding
    # URL with default settings
    await understand_video(prompt="Describe this video", video_url="https://example.com/video.mp4")
    # URL with custom fps and resolution
    await understand_video(
        prompt="Describe the action",
        video_url="https://example.com/video.mp4",
        fps=5.0,
        media_resolution="max"
    )