OpenRouter MCP Multimodal Server
The OpenRouter MCP Multimodal Server provides access to 300+ LLMs via OpenRouter with broad multimodal capabilities:
Text Chat: Converse with any OpenRouter model using chat_completion, with control over temperature and max tokens.
Image Analysis: Analyze images from local files, URLs, or data URIs using analyze_image with vision models.
Audio Analysis: Transcribe and analyze audio files (WAV, MP3, FLAC, OGG, etc.) via analyze_audio.
Video Analysis: Understand video content (mp4, mpeg, mov, webm) from files, URLs, or data URIs using analyze_video.
Image Generation: Create images from text prompts via generate_image, with optional disk save.
Audio Generation: Generate speech or music from text with generate_audio (auto-detects format).
Video Generation: Async video generation (Veo 3.1, Sora 2 Pro, Seedance, Wan) via generate_video with progress polling.
Video Job Management: Resume polling and download results for video jobs using get_video_status.
Model Search & Discovery: Filter models by name, provider, or capabilities (vision/audio/video) via search_models.
Model Info: Get pricing, context length, and capability details with get_model_info.
Model Validation: Verify a model ID exists on OpenRouter using validate_model.
Security & Error Handling: Includes SSRF mitigation, path sandboxing for file writes, and structured _meta.code errors for programmatic failure handling.
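Several of the analysis tools above accept a local file, a URL, or a data URI in the same argument. A minimal sketch of how such an input could be told apart is shown here; `MediaSource` and `classifySource` are hypothetical names for illustration and are not taken from the server's source.

```typescript
// Illustrative only: these names are assumptions, not the server's API.
type MediaSource =
  | { kind: "data-uri"; mime: string; base64: string }
  | { kind: "url"; url: string }
  | { kind: "file"; path: string };

function classifySource(input: string): MediaSource {
  // Base64 data URI: data:<mime>;base64,<payload>
  const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(input);
  if (match !== null) {
    return { kind: "data-uri", mime: match[1] ?? "", base64: match[2] ?? "" };
  }
  // Remote resource: only http and https are treated as URLs.
  if (/^https?:\/\//i.test(input)) {
    return { kind: "url", url: input };
  }
  // Anything else is assumed to be a local file path.
  return { kind: "file", path: input };
}
```

Treating every non-URL, non-data-URI string as a file path is a simplification; a real implementation would also validate that the path exists and is readable.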
Offers deployment through Docker containers, with support for environment variables and seamless integration with MCP configurations.
Enables use of Google's Gemini models through OpenRouter for text chat and multimodal conversations, with support for vision capabilities and model customization.
Provides Node.js-based installation and execution options with NPX support for easy integration into MCP environments.
Supports Python-based installation and execution using UV package manager for those preferring Python environments.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OpenRouter MCP Multimodal Server analyze this product photo and suggest improvements for the lighting"
That's it! The server will respond to your query, and you can continue using it as needed.
Access 300+ LLMs — Claude, Gemini, GPT, Llama, Qwen, Grok, and more — through OpenRouter via the Model Context Protocol. Analyze images, audio, and video. Generate images, speech, music, and video (Veo 3.1, Sora 2 Pro, Seedance, Wan). Chat with any model. Works with Claude Desktop, Cursor, Kiro, VS Code, Windsurf, Cline, and any MCP-compatible client. Every tool returns structured _meta.code errors so MCP clients can switch on failure modes without parsing strings.
One-Click Install
After clicking, the target client opens a confirmation prompt. You'll need to paste your OPENROUTER_API_KEY; the deeplink ships a placeholder so no secrets end up in shared links.
Why This One?
Feature | Status |
Text chat with 300+ models | ✅ |
Image analysis (vision) | ✅ Native with sharp optimization |
Audio analysis | ✅ Transcription + analysis, base64 auto-encoded |
Audio generation | ✅ Conversational, speech, and music with format auto-detection |
Image generation | ✅ Path-sandboxed disk output |
Video understanding | ✅ v3 — mp4, mpeg, mov, webm from files, URLs, or data URLs |
Video generation | ✅ v3 — Veo 3.1 / Sora 2 Pro / Seedance / Wan via async API with progress notifications |
Auto image resize + compress | ✅ Configurable (defaults 800px max, JPEG 80%) |
Model search + validation | ✅ Filter by vision / audio / video modality |
Free model support | ✅ Default: free Nemotron VL |
Docker support | ✅ Multi-arch (amd64 + arm64), ~345 MB Alpine |
Retry-After + jitter | ✅ Honors |
IPv4 + IPv6 SSRF blocklist | ✅ Covers mapped, compat, multicast, 6to4, Teredo, ORCHID |
Structured error taxonomy | ✅ Closed |
Reasoning-model awareness | ✅ Detects |
MCP 2025 tool annotations | ✅ |
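The "Retry-After + jitter" row can be illustrated with a small delay calculator. This is a sketch under stated assumptions: the constants, the cap applied to header values, and the name `retryDelayMs` are hypothetical and do not come from the server's code. It handles both forms of the `Retry-After` header (seconds and HTTP-date) and falls back to exponential backoff with full jitter.

```typescript
// Assumed values for illustration; the server's real settings are not documented here.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30_000;

function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  nowMs: number = Date.now(),
  random: () => number = Math.random,
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    // Form 1: a delay in whole seconds, e.g. "120".
    if (/^\d+$/.test(trimmed)) {
      return Math.min(Number(trimmed) * 1000, MAX_DELAY_MS);
    }
    // Form 2: an HTTP-date; wait until that moment (never a negative delay).
    const when = Date.parse(trimmed);
    if (!isNaN(when)) {
      return Math.min(Math.max(when - nowMs, 0), MAX_DELAY_MS);
    }
  }
  // No usable header: exponential backoff with full jitter.
  const ceiling = Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
  return Math.floor(random() * ceiling);
}
```

Passing `nowMs` and `random` as parameters keeps the function deterministic in tests; callers would normally rely on the defaults.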
Tools
Tool | Description |
chat_completion | Send messages to any OpenRouter model. Detects reasoning-model cutoffs. |
analyze_image | Analyze images from local files, URLs, or data URIs. Auto-optimized with sharp. |
analyze_audio | Analyze/transcribe audio (WAV, MP3, FLAC, OGG, etc.) from files, URLs, or data URIs. |
analyze_video | Analyze/transcribe video (mp4, mpeg, mov, webm) from files, URLs, or data URIs. |
generate_image | Generate images from text prompts, with optional disk save. |
generate_audio | Generate audio from text. Auto-detects format, wraps raw PCM in WAV. |
generate_video | Generate video via OpenRouter's async API (Veo 3.1 / Sora 2 Pro / Seedance / Wan). Submits, polls, downloads, saves. |
get_video_status | Resume polling and download results for a video job. |
search_models | Search/filter models by name, provider, or capabilities (vision / audio / video). |
get_model_info | Get pricing, context length, and capabilities for any model. |
validate_model | Check if a model ID exists on OpenRouter. |
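The submit, poll, and resume flow described for generate_video and get_video_status can be sketched as a pure decision function evaluated after each poll. The status strings and the names `JobStatus`, `PollDecision`, and `decideNextStep` are assumptions made for illustration; only the `JOB_FAILED` and `JOB_STILL_RUNNING` codes come from this README, and how the server maps statuses to them is not confirmed here.

```typescript
// Hypothetical status values; the real API payload may use different strings.
type JobStatus = "pending" | "in_progress" | "completed" | "failed";

type PollDecision =
  | { action: "download" }
  | { action: "wait"; delayMs: number }
  | { action: "fail"; code: "JOB_FAILED" }
  | { action: "return_handle"; code: "JOB_STILL_RUNNING" };

function decideNextStep(
  status: JobStatus,
  elapsedMs: number,
  pollIntervalMs: number,
  maxWaitMs: number,
): PollDecision {
  if (status === "completed") return { action: "download" };
  if (status === "failed") return { action: "fail", code: "JOB_FAILED" };
  // Still running: if another poll would exceed the wait budget, hand back
  // a resumable handle so the caller can continue with get_video_status.
  if (elapsedMs + pollIntervalMs > maxWaitMs) {
    return { action: "return_handle", code: "JOB_STILL_RUNNING" };
  }
  return { action: "wait", delayMs: pollIntervalMs };
}
```

Keeping the decision separate from the network calls makes the polling loop easy to test without a live job.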
All error responses carry _meta.code from a closed taxonomy: INVALID_INPUT · UNSAFE_PATH · UPSTREAM_HTTP · UPSTREAM_TIMEOUT · UPSTREAM_REFUSED · UNSUPPORTED_FORMAT · RESOURCE_TOO_LARGE · ZDR_INCOMPATIBLE · MODEL_NOT_FOUND · JOB_FAILED · JOB_STILL_RUNNING · INTERNAL
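Because the taxonomy is closed, a client can switch on the code exhaustively instead of parsing message strings. The sketch below assumes a result shape of `{ isError, _meta: { code } }` and groups the codes into recovery strategies; both the `ToolErrorResult` shape and the grouping are illustrative assumptions, not behavior documented by the server.

```typescript
type ErrorCode =
  | "INVALID_INPUT" | "UNSAFE_PATH" | "UPSTREAM_HTTP" | "UPSTREAM_TIMEOUT"
  | "UPSTREAM_REFUSED" | "UNSUPPORTED_FORMAT" | "RESOURCE_TOO_LARGE"
  | "ZDR_INCOMPATIBLE" | "MODEL_NOT_FOUND" | "JOB_FAILED"
  | "JOB_STILL_RUNNING" | "INTERNAL";

// Hypothetical result shape, reduced to the fields used here.
interface ToolErrorResult {
  isError: true;
  _meta: { code: ErrorCode };
}

type Recovery = "retry" | "poll_again" | "fix_request" | "give_up";

// The grouping below is an assumption about sensible client behavior.
function recoveryFor(result: ToolErrorResult): Recovery {
  switch (result._meta.code) {
    case "UPSTREAM_HTTP":
    case "UPSTREAM_TIMEOUT":
      return "retry";
    case "JOB_STILL_RUNNING":
      return "poll_again";
    case "INVALID_INPUT":
    case "UNSAFE_PATH":
    case "UNSUPPORTED_FORMAT":
    case "RESOURCE_TOO_LARGE":
    case "ZDR_INCOMPATIBLE":
    case "MODEL_NOT_FOUND":
      return "fix_request";
    case "UPSTREAM_REFUSED":
    case "JOB_FAILED":
    case "INTERNAL":
      return "give_up";
  }
}
```

If a new code were ever added to the union, the compiler would flag this switch as non-exhaustive, which is the main benefit of a closed set.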
Quick Start
Prerequisites
Get a free API key from openrouter.ai/keys.
Option 1: npx (no install)
{
  "mcpServers": {
    "openrouter": {
      "command": "npx",
      "args": ["-y", "@stabgan/openrouter-mcp-multimodal"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-..."
      }
    }
  }
}
Option 2: Docker
{
  "mcpServers": {
    "openrouter": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "OPENROUTER_API_KEY=sk-or-v1-...",
        "stabgan/openrouter-mcp-multimodal:latest"
      ]
    }
  }
}
Option 3: Global install
npm install -g @stabgan/openrouter-mcp-multimodal

{
  "mcpServers": {
    "openrouter": {
      "command": "openrouter-multimodal",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-..." }
    }
  }
}
Option 4: Smithery
npx -y @smithery/cli install @stabgan/openrouter-mcp-multimodal --client claude
Configuration
Variable | Required | Default | Description |
OPENROUTER_API_KEY | Yes | - | Your OpenRouter API key |
| No | | Default model for chat + analyze tools |
| No | - | Alias for above |
| No | | Model cache TTL (ms) |
| No | 800 | Longest edge for resize (px) |
| No | 80 | JPEG quality (1–100) |
| No | | Image URL timeout |
| No | | Image URL size cap (~25 MB) |
| No | | Image URL redirect cap |
| No | | Image data URL size cap (~20 MB) |
| No | | Audio URL timeout |
| No | | Audio URL size cap (~25 MB) |
| No | | Audio URL redirect cap |
| No | | Audio data URL size cap |
| No | | Default for |
| No | | Default for |
| No | | Video URL timeout |
| No | | Video URL size cap (~100 MB) |
| No | | Video URL redirect cap |
| No | | Video data URL size cap |
| No | | Async video poll cadence |
| No | | Max wait before returning a resumable handle |
| No | | Generated video download cap (~256 MB) |
| No | | Inline video ceiling (~10 MB) |
OPENROUTER_OUTPUT_DIR | No | | Sandbox root for save_path |
| No | - | |
| No | | |
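To make the "longest edge for resize" setting concrete, here is a minimal sketch of how an 800 px cap could translate into output dimensions while preserving aspect ratio. The function name `fitWithin` and the choice never to upscale smaller images are assumptions for illustration; the server delegates the actual resizing to sharp, and its exact rounding may differ.

```typescript
// Illustrative helper; not part of the server's API.
function fitWithin(
  width: number,
  height: number,
  maxEdge: number = 800,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  // Assumption: images already within the cap are left untouched.
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}
```

For example, a 1600×1200 input scales by 0.5 under the default cap, while a 400×300 input is returned unchanged.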
Security notes
Analyze tools can read local files and fetch HTTP(S) URLs. URL fetches block private/link-local/reserved IPv4 and IPv6 targets (SSRF mitigation) and cap response size.
Generate tools write to disk through a path sandbox:
save_path is resolved against OPENROUTER_OUTPUT_DIR and any traversal attempt is rejected. Override with OPENROUTER_ALLOW_UNSAFE_PATHS=1.
IPv6 SSRF blocklist covers loopback, unspecified, IPv4-mapped, IPv4-compatible, link-local, site-local, ULA, multicast, documentation, Teredo, ORCHID, and 6to4 of private IPv4.
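The IPv4 side of the SSRF mitigation described above can be sketched as a range check on a dotted-quad literal. The function names and the exact set of ranges are assumptions for illustration (the README does not list the IPv4 ranges individually); the ranges used are the well-known private, loopback, link-local, shared-address, multicast, and reserved blocks.

```typescript
type Octets = [number, number, number, number];

// Parse a dotted-quad IPv4 literal; returns null for anything else.
function parseIpv4(host: string): Octets | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return null;
  const octets: Octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  return octets.some((o) => o > 255) ? null : octets;
}

// Hypothetical helper: true when the literal falls in a non-public range.
function isBlockedIpv4Literal(host: string): boolean {
  const octets = parseIpv4(host);
  if (octets === null) return false; // not an IPv4 literal; see note below
  const [a, b] = octets;
  return (
    a === 0 ||                          // 0.0.0.0/8 "this network"
    a === 10 ||                         // 10.0.0.0/8 private
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 shared address space
    a === 127 ||                        // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||         // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) ||         // 192.168.0.0/16 private
    a >= 224                            // multicast, reserved, broadcast
  );
}
```

A literal check alone is not sufficient: hostnames must be resolved and every resolved address checked, otherwise a DNS name pointing at a private address would pass.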
Usage Examples
# Chat
Use chat_completion to explain quantum computing in simple terms.
# Vision
Use analyze_image on /path/to/photo.jpg and tell me what you see.
# Audio transcription
Use analyze_audio on /path/to/recording.mp3 to transcribe it.
# Video understanding
Use analyze_video on /path/to/clip.mp4 — what happens at 00:15?
# Generate audio
Use generate_audio with prompt "Explain neural networks" and voice "alloy", save to ./response.wav
# Generate music
Use generate_audio with model "google/lyria-3-clip-preview" and prompt "upbeat jazz piano trio"
# Generate image
Use generate_image with prompt "a cat astronaut on mars", aspect_ratio "16:9", image_size "1K", save to ./cat.png
# Generate video
Use generate_video with model "google/veo-3.1", prompt "a calm river at sunrise",
resolution 720p, duration 4, save to ./river.mp4
# Resume a video job
Use get_video_status with video_id "vid_abc123" and save_path "./river.mp4"
Architecture
src/
├── index.ts # Entry, env validation, graceful shutdown
├── tool-handlers.ts # 11 tools (annotated) + dispatch
├── model-cache.ts # TTL + in-flight coalescing
├── openrouter-api.ts # REST client (chat + /videos)
├── errors.ts # Closed ErrorCode enum
├── logger.ts # JSON-line structured logger
└── tool-handlers/
    ├── fetch-utils.ts # SSRF, bounded fetch, data-URL parser
    ├── openrouter-errors.ts # SDK/HTTP → ErrorCode classifier
    ├── completion-utils.ts # Reasoning-model cutoff detection
    ├── path-safety.ts # save_path sandbox
    ├── chat-completion.ts # Text + multimodal chat
    ├── analyze-image.ts # Vision analysis
    ├── analyze-audio.ts # Audio transcription
    ├── analyze-video.ts # Video understanding
    ├── generate-image.ts # Image generation
    ├── generate-audio.ts # Audio generation + streaming
    ├── generate-video.ts # Video generation (async)
    ├── image-utils.ts # Sharp optimization, MIME sniffing
    ├── audio-utils.ts # Audio format detection
    ├── video-utils.ts # Video format detection
    ├── search-models.ts # Model search
    ├── get-model-info.ts # Model detail lookup
    └── validate-model.ts # Model existence check
Development
git clone https://github.com/stabgan/openrouter-mcp-multimodal.git
cd openrouter-mcp-multimodal
npm install
cp .env.example .env # Add your API key
npm run build
npm start

npm test # 163 unit tests, <1s
npm run test:integration # Live API tests
npm run lint
node scripts/live-e2e.mjs # 16 live E2E scenarios
Upgrading from v2
v3 is additive — no tool schemas or env vars were removed.
Three new tools: analyze_video, generate_video, get_video_status
Structured _meta.code on every error response (text messages preserved)
save_path sandboxed by default; set OPENROUTER_OUTPUT_DIR or OPENROUTER_ALLOW_UNSAFE_PATHS=1
Reasoning-model awareness: content: null + finish_reason: length now returns INVALID_INPUT with a preview instead of an empty string
IPv6 SSRF coverage extended to mapped, compat, multicast, 6to4, Teredo, ORCHID
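The reasoning-model change can be illustrated with a small classifier over a chat completion choice. This is a sketch under assumptions: the `reasoning` field, the preview length, the message wording, and the names `Choice`, `Outcome`, and `interpretChoice` are hypothetical; only the trigger (`content: null` with `finish_reason: length`) and the resulting `INVALID_INPUT` code come from the notes above.

```typescript
// Minimal shape of one choice; `reasoning` is an assumed field name.
interface Choice {
  finish_reason: string | null;
  message: { content: string | null; reasoning?: string | null };
}

type Outcome =
  | { ok: true; text: string }
  | { ok: false; code: "INVALID_INPUT"; message: string };

function interpretChoice(choice: Choice, previewChars: number = 200): Outcome {
  const { content, reasoning } = choice.message;
  // The model spent its whole token budget before emitting an answer.
  if (content === null && choice.finish_reason === "length") {
    const preview = (reasoning ?? "").slice(0, previewChars);
    return {
      ok: false,
      code: "INVALID_INPUT",
      message: `Token limit reached before an answer was produced. Preview: ${preview}`,
    };
  }
  return { ok: true, text: content ?? "" };
}
```

Surfacing this case as an error lets the caller raise the token limit and retry instead of silently receiving an empty reply.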
Compatibility
Works with any MCP-compatible client, including Kiro · Claude Desktop · Cursor · Windsurf · Cline.
License
MIT
Contributing
Issues and PRs welcome. Please open an issue first for major changes.