MiMo Multimodal Understanding MCP Server

by ChanthMiao

MCP server for Xiaomi MiMo multimodal understanding API (image, audio, video).

Features

  • Image Understanding: single or multiple images, from URLs or local files

  • Audio Understanding: single or multiple audio files, from URLs or local files

  • Video Understanding: single or multiple videos, from URLs or local files, with configurable fps and resolution

Setup

1. Install dependencies

uv sync

2. Configure API Key

Copy .env.example to .env and fill in your API key:

cp .env.example .env

Or set the environment variable directly:

export MIMO_API_KEY=your_api_key_here

Get your API key from: https://platform.xiaomimimo.com

3. (Optional) Configure API Base URL

The default API endpoint is determined by your API key prefix:

| Key Prefix | Default Endpoint |
| --- | --- |
| tp-* | https://token-plan-cn.xiaomimimo.com/v1 |
| sk-* or others | https://api.xiaomimimo.com/v1 |

To use a different API endpoint:

export MIMO_API_BASE=https://your-custom-endpoint/v1

Or add it to your .env file:

MIMO_API_BASE=https://your-custom-endpoint/v1
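
The selection rule above can be sketched in a few lines of Python. This is an illustration only, not the server's actual code: the helper name `resolve_api_base` is hypothetical, and it assumes that a non-empty MIMO_API_BASE always takes precedence over the key-prefix default.

```python
import os
from typing import Mapping, Optional

# Default endpoints copied from the table above.
TOKEN_PLAN_BASE = "https://token-plan-cn.xiaomimimo.com/v1"
STANDARD_BASE = "https://api.xiaomimimo.com/v1"


def resolve_api_base(api_key: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: choose the API base URL for a given key.

    Assumption: a non-empty MIMO_API_BASE overrides the prefix-based default.
    """
    env = os.environ if env is None else env
    override = env.get("MIMO_API_BASE", "").strip()
    if override:
        return override
    # Keys starting with "tp-" default to the token-plan endpoint.
    if api_key.startswith("tp-"):
        return TOKEN_PLAN_BASE
    # "sk-" keys and any other prefix use the standard endpoint.
    return STANDARD_BASE
```

Passing an explicit `env` mapping keeps the function easy to test without touching the real process environment.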

Usage

Quick Start (with uvx)

export MIMO_API_KEY=your_api_key_here
uvx mimo-multimodal-mcp

Development mode (with MCP Inspector)

uv run mcp dev src/mimo_multimodal_mcp/server.py

Install to Claude Desktop

uv run mcp install src/mimo_multimodal_mcp/server.py

Direct execution

uv run python src/mimo_multimodal_mcp/server.py

Claude Desktop Configuration

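Add to ~/.config/claude/claude_desktop_config.json (on macOS the file is typically ~/Library/Application Support/Claude/claude_desktop_config.json, on Windows %APPDATA%\Claude\claude_desktop_config.json):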

{
  "mcpServers": {
    "mimo-multimodal": {
      "command": "uvx",
      "args": ["mimo-multimodal-mcp"],
      "env": {
        "MIMO_API_KEY": "your_api_key_here"
      }
    }
  }
}
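
If the config file already lists other servers, the entry has to be merged rather than pasted over the whole file. A minimal sketch of that merge, using only the standard library, could look like this; `add_mimo_server` is a hypothetical helper and not part of this project.

```python
import json
from pathlib import Path


def add_mimo_server(config_path: Path, api_key: str) -> dict:
    """Hypothetical helper: merge the mimo-multimodal entry into a config file.

    Existing entries under "mcpServers" are preserved.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file like a missing one.
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["mimo-multimodal"] = {
        "command": "uvx",
        "args": ["mimo-multimodal-mcp"],
        "env": {"MIMO_API_KEY": api_key},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mimo_server(Path.home() / ".config/claude/claude_desktop_config.json", "your_api_key_here")` would update the file at the path named above. Restart Claude Desktop afterwards so the new server is picked up.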

Tools

understand_image

Analyze images using Xiaomi MiMo multimodal model.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Image understanding task description |
| image_url | string | No | Single image URL or data:image base64 |
| image_path | string | No | Single local image file path |
| image_urls | list[string] | No | Multiple image URLs |
| image_paths | list[string] | No | Multiple local image file paths |
| system_prompt | string | No | Custom system prompt |
| max_tokens | integer | No | Max output length (default: 8192, max: 32768) |

Supported formats: JPEG, PNG, GIF, WebP

Size limit: 10MB
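
Since image_url also accepts a data:image base64 value, a local file can be turned into such a value before the call. The sketch below shows one way to do that. It is not the server's own implementation: `image_to_data_url` is a hypothetical name, "10MB" is read as 10 MiB, and the limit is assumed to apply to the raw file size.

```python
import base64
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 MiB of raw (unencoded) file data.
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# MIME types for the formats listed above.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_to_data_url(path: str) -> str:
    """Hypothetical helper: encode a local image as a data:image base64 URL."""
    file = Path(path)
    mime = MIME_BY_SUFFIX.get(file.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {file.suffix or '(none)'}")
    data = file.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes, limit is {MAX_IMAGE_BYTES}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

In most cases passing image_path is simpler; building the data URL by hand is mainly useful when the image bytes come from somewhere other than a file the server can read.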

understand_audio

Analyze audio using Xiaomi MiMo multimodal model.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Audio understanding task description |
| audio_url | string | No | Single audio URL |
| audio_path | string | No | Single local audio file path |
| audio_urls | list[string] | No | Multiple audio URLs |
| audio_paths | list[string] | No | Multiple local audio file paths |
| system_prompt | string | No | Custom system prompt |
| max_tokens | integer | No | Max output length (default: 8192, max: 32768) |

Supported formats: MP3, WAV, FLAC, M4A, OGG

Size limit: URL 100MB, Base64 50MB
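
The two size limits depend on how the audio reaches the API. A rough pre-flight check might look like the following. It rests on assumptions: `check_audio` is a hypothetical helper, local files are assumed to be sent as Base64 (so the 50MB limit applies to them), and MB is read as MiB.

```python
import os
from urllib.parse import urlparse

MB = 1024 * 1024  # assumption: MB is interpreted as MiB
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
AUDIO_LIMITS = {"url": 100 * MB, "base64": 50 * MB}


def check_audio(source: str, size_bytes: int) -> str:
    """Hypothetical pre-flight check; returns "url" or "base64".

    Assumption: anything that is not an http(s) URL is a local file
    and will be sent as Base64.
    """
    is_url = source.startswith(("http://", "https://"))
    # For URLs, look at the path only so query strings do not hide the suffix.
    name = urlparse(source).path if is_url else source
    suffix = os.path.splitext(name)[1].lower()
    if suffix not in AUDIO_SUFFIXES:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")
    kind = "url" if is_url else "base64"
    if size_bytes > AUDIO_LIMITS[kind]:
        raise ValueError(f"{kind} audio exceeds {AUDIO_LIMITS[kind] // MB}MB")
    return kind
```

Under these assumptions, a 60MB recording passes when referenced by URL but is rejected as a local file.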

understand_video

Analyze video using Xiaomi MiMo multimodal model.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Video understanding task description |
| video_url | string | No | Single video URL |
| video_path | string | No | Single local video file path |
| video_urls | list[string] | No | Multiple video URLs |
| video_paths | list[string] | No | Multiple local video file paths |
| fps | float | No | Frames per second, range [0.1, 10], default: 2 |
| media_resolution | string | No | Resolution: "default" or "max" |
| system_prompt | string | No | Custom system prompt |
| max_tokens | integer | No | Max output length (default: 8192, max: 32768) |

Supported formats: MP4, MOV, AVI, WMV

Size limit: URL 300MB, Base64 50MB
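
The fps and media_resolution constraints can be checked on the client side before calling the tool. The sketch below is illustrative: `validate_video_options` is a hypothetical helper, and treating "default" as the value used when media_resolution is omitted is an assumption, since the table does not state a default.

```python
def validate_video_options(fps: float = 2.0, media_resolution: str = "default") -> dict:
    """Hypothetical helper: validate the video-specific options.

    Assumption: "default" is used when media_resolution is not given.
    """
    # fps must lie in the closed range [0.1, 10]; NaN fails this comparison too.
    if not 0.1 <= fps <= 10:
        raise ValueError(f"fps must be within [0.1, 10], got {fps}")
    if media_resolution not in ("default", "max"):
        raise ValueError(f'media_resolution must be "default" or "max", got {media_resolution!r}')
    return {"fps": float(fps), "media_resolution": media_resolution}
```

The returned dict can be unpacked into the tool call, for example `understand_video(prompt=..., video_url=..., **validate_video_options(fps=5.0, media_resolution="max"))`.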

Examples

Image Understanding

# URL
await understand_image(prompt="Describe this image", image_url="https://example.com/image.jpg")

# Local file
await understand_image(prompt="What text is in this?", image_path="/path/to/screenshot.png")

# Multiple images
await understand_image(prompt="Compare these", image_urls=["url1", "url2"])

Audio Understanding

# URL
await understand_audio(prompt="Transcribe this audio", audio_url="https://example.com/audio.wav")

# Local file
await understand_audio(prompt="What is being said?", audio_path="/path/to/audio.mp3")

Video Understanding

# URL with default settings
await understand_video(prompt="Describe this video", video_url="https://example.com/video.mp4")

# URL with custom fps and resolution
await understand_video(
    prompt="Describe the action",
    video_url="https://example.com/video.mp4",
    fps=5.0,
    media_resolution="max"
)