What can you do with this server?

The OpenRouter MCP Multimodal Server provides access to 300+ LLMs via OpenRouter with broad multimodal capabilities:

* Text Chat: Converse with any OpenRouter model using chat_completion, with control over temperature and max tokens.
* Image Analysis: Analyze images from local files, URLs, or data URIs using analyze_image with vision models.
* Audio Analysis: Transcribe and analyze audio files (WAV, MP3, FLAC, OGG, etc.) via analyze_audio.
* Video Analysis: Understand video content (mp4, mpeg, mov, webm) from files, URLs, or data URIs using analyze_video.
* Image Generation: Create images from text prompts via generate_image, with optional disk save.
* Audio Generation: Generate speech or music from text with generate_audio (auto-detects format).
* Video Generation: Async video generation (Veo 3.1, Sora 2 Pro, Seedance, Wan) via generate_video with progress polling.
* Video Job Management: Resume polling and download results for video jobs using get_video_status.
* Model Search & Discovery: Filter models by name, provider, or capabilities (vision/audio/video) via search_models.
* Model Info: Get pricing, context length, and capability details with get_model_info.
* Model Validation: Verify a model ID exists on OpenRouter using validate_model.
* Security & Error Handling: Includes SSRF mitigation, path sandboxing for file writes, and structured _meta.code errors for programmatic failure handling.

Which integrations are available for this server?

The server can be deployed through Docker containers, with support for environment variables and integration with MCP configurations. It enables use of Google's Gemini models through OpenRouter for text chat and multimodal conversations, with support for vision capabilities and model customization. It provides Node.js-based installation and execution options, with NPX support for easy integration into MCP environments. It also supports Python-based installation and execution using the UV package manager for those who prefer Python environments.

How do I use OpenRouter MCP Multimodal Server?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@OpenRouter MCP Multimodal Server analyze this product photo and suggest improvements for the lighting".

That's it! The server will respond to your query, and you can continue using it as needed.


OpenRouter MCP Multimodal Server

by stabgan


TypeScript

Hybrid

Access 300+ LLMs — Claude, Gemini, GPT, Llama, Qwen, Grok, and more — through OpenRouter via the Model Context Protocol. Analyze images, audio, and video. Generate images, speech, music, and video (Veo 3.1, Sora 2 Pro, Seedance, Wan). Chat with any model. Works with Claude Desktop, Cursor, Kiro, VS Code, Windsurf, Cline, and any MCP-compatible client. Every tool returns structured _meta.code errors so MCP clients can switch on failure modes without parsing strings.

One-Click Install

After clicking, the target client opens a confirmation prompt. You'll need to paste your OPENROUTER_API_KEY — the deeplink ships a placeholder so no secrets end up in shared links.


Why This One?

Feature	Status
Text chat with 300+ models	✅
Image analysis (vision)	✅ Native with sharp optimization
Audio analysis	✅ Transcription + analysis, base64 auto-encoded
Audio generation	✅ Conversational, speech, and music with format auto-detection
Image generation	✅ Path-sandboxed disk output
Video understanding	✅ v3 — mp4, mpeg, mov, webm from files, URLs, or data URLs
Video generation	✅ v3 — Veo 3.1 / Sora 2 Pro / Seedance / Wan via async API with progress notifications
Auto image resize + compress	✅ Configurable (defaults 800px max, JPEG 80%)
Model search + validation	✅ Filter by vision / audio / video modality
Free model support	✅ Default: free Nemotron VL
Docker support	✅ Multi-arch (amd64 + arm64), ~345 MB Alpine
Retry-After + jitter	✅ Honors `Retry-After` header, avoids thundering herd
IPv4 + IPv6 SSRF blocklist	✅ Covers mapped, compat, multicast, 6to4, Teredo, ORCHID
Structured error taxonomy	✅ Closed `_meta.code` so clients can switch on failure modes
Reasoning-model awareness	✅ Detects `max_tokens` cutoff during CoT, guides the caller
MCP 2025 tool annotations	✅ `readOnlyHint` / `destructiveHint` / `idempotentHint` on every tool
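
The "Retry-After + jitter" row can be illustrated with a small delay calculator. This is a minimal sketch under stated assumptions, not the server's actual retry code: the function name `retryDelayMs`, the 500 ms base, the 30 s cap, and the 250 ms jitter window are all hypothetical. It handles both forms of the `Retry-After` header (delay in seconds or an HTTP-date) and falls back to exponential backoff with full jitter when the header is absent.

```typescript
// Hypothetical helper; constants and names are assumptions for illustration.
export function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  now: number = Date.now(),
  random: () => number = Math.random,
): number {
  const baseMs = 500; // assumed backoff base
  const capMs = 30000; // assumed ceiling for the fallback backoff
  const jitterMs = 250; // assumed spread added on top of a server hint

  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    // Form 1: a delay in whole seconds, e.g. "120".
    if (/^\d+$/.test(trimmed)) {
      return Number(trimmed) * 1000 + Math.floor(random() * jitterMs);
    }
    // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    const when = Date.parse(trimmed);
    if (!Number.isNaN(when)) {
      return Math.max(when - now, 0) + Math.floor(random() * jitterMs);
    }
  }

  // No usable header: exponential backoff with full jitter, so that many
  // clients failing at the same moment do not retry at the same moment.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

Passing `now` and `random` as parameters keeps the function deterministic under test; a caller would normally rely on the defaults.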

Tools

Tool	Description
`chat_completion`	Send messages to any OpenRouter model. Detects reasoning-model cutoffs.
`analyze_image`	Analyze images from local files, URLs, or data URIs. Auto-optimized with sharp.
`analyze_audio`	Analyze/transcribe audio (WAV, MP3, FLAC, OGG, etc.) from files, URLs, or data URIs.
`analyze_video`	Analyze/transcribe video (mp4, mpeg, mov, webm) from files, URLs, or data URIs.
`generate_image`	Generate images from text prompts. Supports `aspect_ratio` (14 values), `image_size` (0.5K–4K), and `max_tokens`. Optional path-sandboxed disk save.
`generate_audio`	Generate audio from text. Auto-detects format, wraps raw PCM in WAV.
`generate_video`	Generate video via OpenRouter's async API (Veo 3.1 / Sora 2 Pro / Seedance / Wan). Submits, polls, downloads, saves.
`get_video_status`	Resume polling a `generate_video` job by id. Download + save when complete.
`search_models`	Search/filter models by name, provider, or capabilities (vision / audio / video).
`get_model_info`	Get pricing, context length, and capabilities for any model.
`validate_model`	Check if a model ID exists on OpenRouter.
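
The `generate_audio` row mentions wrapping raw PCM in WAV. The following is a minimal sketch of what such a wrapper involves, assuming 16-bit little-endian PCM samples. The function name `wrapPcmInWav` and the default sample rate and channel count are illustrative assumptions, not values taken from the server's source.

```typescript
// Hypothetical helper: prepends a canonical 44-byte RIFF/WAVE header
// to raw PCM bytes. Defaults (24 kHz, mono, 16-bit) are assumptions.
export function wrapPcmInWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
  bitsPerSample: number = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  // Write a four-character ASCII tag at the given byte offset.
  const tag = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      out[offset + i] = text.charCodeAt(i);
    }
  };

  tag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  tag(8, "WAVE");
  tag(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk body
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  tag(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  out.set(pcm, 44);
  return out;
}
```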

All error responses carry _meta.code from a closed taxonomy: INVALID_INPUT · UNSAFE_PATH · UPSTREAM_HTTP · UPSTREAM_TIMEOUT · UPSTREAM_REFUSED · UNSUPPORTED_FORMAT · RESOURCE_TOO_LARGE · ZDR_INCOMPATIBLE · MODEL_NOT_FOUND · JOB_FAILED · JOB_STILL_RUNNING · INTERNAL
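
Because the taxonomy is closed, a client can branch on the code rather than on message text. The sketch below shows one way to do that. The `ToolResult` shape is a simplified assumption modeled on an MCP tool result, and the mapping from codes to actions (`retry_later`, `poll_again`, and so on) is purely illustrative; the README does not prescribe how a client should react to each code.

```typescript
// The code list is copied from the taxonomy above; everything else is hypothetical.
type ErrorCode =
  | "INVALID_INPUT" | "UNSAFE_PATH" | "UPSTREAM_HTTP" | "UPSTREAM_TIMEOUT"
  | "UPSTREAM_REFUSED" | "UNSUPPORTED_FORMAT" | "RESOURCE_TOO_LARGE"
  | "ZDR_INCOMPATIBLE" | "MODEL_NOT_FOUND" | "JOB_FAILED"
  | "JOB_STILL_RUNNING" | "INTERNAL";

// Simplified, assumed shape of a tool result as seen by a client.
interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
  _meta?: { code?: ErrorCode };
}

type Action = "ok" | "fix_request" | "retry_later" | "poll_again" | "give_up";

export function nextAction(result: ToolResult): Action {
  if (!result.isError) return "ok";
  switch (result._meta?.code) {
    case "UPSTREAM_HTTP":
    case "UPSTREAM_TIMEOUT":
      return "retry_later"; // possibly transient upstream trouble
    case "JOB_STILL_RUNNING":
      return "poll_again"; // e.g. call get_video_status with the job id
    case "INVALID_INPUT":
    case "UNSAFE_PATH":
    case "UNSUPPORTED_FORMAT":
    case "RESOURCE_TOO_LARGE":
    case "MODEL_NOT_FOUND":
      return "fix_request"; // the caller has to change its arguments
    default:
      return "give_up"; // remaining codes, or an error without a code
  }
}
```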

Quick Start

Prerequisites

Get a free API key from openrouter.ai/keys.

Option 1: npx (no install)

{
  "mcpServers": {
    "openrouter": {
      "command": "npx",
      "args": ["-y", "@stabgan/openrouter-mcp-multimodal"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-..."
      }
    }
  }
}

Option 2: Docker

{
  "mcpServers": {
    "openrouter": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "OPENROUTER_API_KEY=sk-or-v1-...",
        "stabgan/openrouter-mcp-multimodal:latest"
      ]
    }
  }
}

Option 3: Global install

npm install -g @stabgan/openrouter-mcp-multimodal

{
  "mcpServers": {
    "openrouter": {
      "command": "openrouter-multimodal",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-..." }
    }
  }
}

Option 4: Smithery

npx -y @smithery/cli install @stabgan/openrouter-mcp-multimodal --client claude

Configuration

Variable	Required	Default	Description
`OPENROUTER_API_KEY`	Yes	—	Your OpenRouter API key
`OPENROUTER_DEFAULT_MODEL`	No	`nvidia/nemotron-nano-12b-v2-vl:free`	Default model for chat + analyze tools
`DEFAULT_MODEL`	No	—	Alias for `OPENROUTER_DEFAULT_MODEL`
`OPENROUTER_MODEL_CACHE_TTL_MS`	No	`3600000`	Model cache TTL (ms)
`OPENROUTER_IMAGE_MAX_DIMENSION`	No	`800`	Longest edge for resize (px)
`OPENROUTER_IMAGE_JPEG_QUALITY`	No	`80`	JPEG quality (1–100)
`OPENROUTER_IMAGE_FETCH_TIMEOUT_MS`	No	`30000`	Image URL fetch timeout (ms)
`OPENROUTER_IMAGE_MAX_DOWNLOAD_BYTES`	No	`26214400`	Image URL size cap (~25 MB)
`OPENROUTER_IMAGE_MAX_REDIRECTS`	No	`8`	Image URL redirect cap
`OPENROUTER_IMAGE_MAX_DATA_URL_BYTES`	No	`20971520`	Image data URL size cap (~20 MB)
`OPENROUTER_AUDIO_FETCH_TIMEOUT_MS`	No	`30000`	Audio URL fetch timeout (ms)
`OPENROUTER_AUDIO_MAX_DOWNLOAD_BYTES`	No	`26214400`	Audio URL size cap (~25 MB)
`OPENROUTER_AUDIO_MAX_REDIRECTS`	No	`8`	Audio URL redirect cap
`OPENROUTER_AUDIO_MAX_DATA_URL_BYTES`	No	`20971520`	Audio data URL size cap (~20 MB)
`OPENROUTER_DEFAULT_VIDEO_MODEL`	No	`google/gemini-2.5-flash`	Default for `analyze_video`
`OPENROUTER_DEFAULT_VIDEO_GEN_MODEL`	No	`google/veo-3.1`	Default for `generate_video`
`OPENROUTER_VIDEO_FETCH_TIMEOUT_MS`	No	`60000`	Video URL fetch timeout (ms)
`OPENROUTER_VIDEO_MAX_DOWNLOAD_BYTES`	No	`104857600`	Video URL size cap (~100 MB)
`OPENROUTER_VIDEO_MAX_REDIRECTS`	No	`8`	Video URL redirect cap
`OPENROUTER_VIDEO_MAX_DATA_URL_BYTES`	No	`104857600`	Video data URL size cap (~100 MB)
`OPENROUTER_VIDEO_POLL_INTERVAL_MS`	No	`15000`	Async video poll interval (ms)
`OPENROUTER_VIDEO_MAX_WAIT_MS`	No	`600000`	Max wait (ms) before returning a resumable handle
`OPENROUTER_VIDEO_GEN_MAX_BYTES`	No	`268435456`	Generated video download cap (~256 MB)
`OPENROUTER_VIDEO_INLINE_MAX_BYTES`	No	`10485760`	Inline video ceiling (~10 MB)
`OPENROUTER_OUTPUT_DIR`	No	`process.cwd()`	Sandbox root for `save_path`
`OPENROUTER_ALLOW_UNSAFE_PATHS`	No	—	`1` disables the sandbox
`OPENROUTER_LOG_LEVEL`	No	`info`	`error` / `warn` / `info` / `debug`
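
The table above can be read as "environment variable, else default". The sketch below shows that pattern for a handful of the variables. The helper names `intFromEnv` and `loadConfig` are hypothetical, and two behaviors are assumptions rather than documented facts: that `OPENROUTER_DEFAULT_MODEL` takes precedence over its alias `DEFAULT_MODEL`, and that invalid numeric values fall back to the default instead of raising an error.

```typescript
type Env = Record<string, string | undefined>;

// Parse a positive integer from the environment, or fall back (assumed policy).
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical loader covering a subset of the documented variables.
export function loadConfig(env: Env) {
  return {
    // Assumed precedence: the prefixed name wins over the alias.
    defaultModel:
      env.OPENROUTER_DEFAULT_MODEL ??
      env.DEFAULT_MODEL ??
      "nvidia/nemotron-nano-12b-v2-vl:free",
    imageMaxDimension: intFromEnv(env, "OPENROUTER_IMAGE_MAX_DIMENSION", 800),
    imageJpegQuality: intFromEnv(env, "OPENROUTER_IMAGE_JPEG_QUALITY", 80),
    videoPollIntervalMs: intFromEnv(env, "OPENROUTER_VIDEO_POLL_INTERVAL_MS", 15000),
    videoMaxWaitMs: intFromEnv(env, "OPENROUTER_VIDEO_MAX_WAIT_MS", 600000),
  };
}
```

In a Node.js process the function would typically be called as `loadConfig(process.env)`; taking the environment as a parameter keeps it easy to test.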

Security notes

Analyze tools can read local files and fetch HTTP(S) URLs. URL fetches block private/link-local/reserved IPv4 and IPv6 targets (SSRF mitigation) and cap response size.
Generate tools write to disk through a path sandbox: save_path is resolved against OPENROUTER_OUTPUT_DIR and any traversal attempt is rejected. Override with OPENROUTER_ALLOW_UNSAFE_PATHS=1.
IPv6 SSRF blocklist covers loopback, unspecified, IPv4-mapped, IPv4-compatible, link-local, site-local, ULA, multicast, documentation, Teredo, ORCHID, and 6to4 of private IPv4.
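
To make the SSRF notes concrete, here is a minimal sketch of an IPv4 blocklist check. It is not the server's `fetch-utils.ts`: the function name `isBlockedIPv4` and the exact set of ranges are assumptions chosen to match the "private/link-local/reserved" wording above, and IPv6 is left out for brevity. A real check has to run against the address the hostname resolves to, not against the hostname string.

```typescript
// Hypothetical helper; returns true when a dotted-quad IPv4 address
// should not be fetched. Anything that fails to parse is blocked.
export function isBlockedIPv4(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4) return true; // not a dotted quad: fail closed

  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((n) => Number.isNaN(n) || n > 255)) return true;

  const a = octets[0];
  const b = octets[1];
  return (
    a === 0 || // 0.0.0.0/8 "this network"
    a === 10 || // 10.0.0.0/8 private
    a === 127 || // 127.0.0.0/8 loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 shared address space
    (a === 169 && b === 254) || // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) || // 192.168.0.0/16 private
    a >= 224 // 224.0.0.0/4 multicast and 240.0.0.0/4 reserved
  );
}
```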

Usage Examples

# Chat
Use chat_completion to explain quantum computing in simple terms.

# Vision
Use analyze_image on /path/to/photo.jpg and tell me what you see.

# Audio transcription
Use analyze_audio on /path/to/recording.mp3 to transcribe it.

# Video understanding
Use analyze_video on /path/to/clip.mp4 — what happens at 00:15?

# Generate audio
Use generate_audio with prompt "Explain neural networks" and voice "alloy", save to ./response.wav

# Generate music
Use generate_audio with model "google/lyria-3-clip-preview" and prompt "upbeat jazz piano trio"

# Generate image
Use generate_image with prompt "a cat astronaut on mars", aspect_ratio "16:9", image_size "1K", save to ./cat.png

# Generate video
Use generate_video with model "google/veo-3.1", prompt "a calm river at sunrise",
resolution 720p, duration 4, save to ./river.mp4

# Resume a video job
Use get_video_status with video_id "vid_abc123" and save_path "./river.mp4"
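
The last two examples rely on a submit-then-poll flow: generate_video polls until the job finishes or the maximum wait elapses, and get_video_status resumes from the job id. The sketch below illustrates that loop under stated assumptions. The `JobStatus` shape, the `fetchStatus` callback, and the outcome object are hypothetical and do not describe OpenRouter's actual response format; only the default interval, the default max wait, and the `JOB_FAILED` / `JOB_STILL_RUNNING` codes come from this README.

```typescript
// Hypothetical status shape returned by an injected status fetcher.
type JobStatus =
  | { state: "pending" }
  | { state: "completed"; url: string }
  | { state: "failed"; reason: string };

type PollOutcome =
  | { code: "OK"; url: string }
  | { code: "JOB_FAILED"; reason: string }
  | { code: "JOB_STILL_RUNNING"; videoId: string };

export async function pollVideoJob(
  videoId: string,
  fetchStatus: (id: string) => Promise<JobStatus>,
  opts: { intervalMs?: number; maxWaitMs?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<PollOutcome> {
  const intervalMs = opts.intervalMs ?? 15000; // OPENROUTER_VIDEO_POLL_INTERVAL_MS default
  const maxWaitMs = opts.maxWaitMs ?? 600000; // OPENROUTER_VIDEO_MAX_WAIT_MS default
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let waitedMs = 0;
  for (;;) {
    const status = await fetchStatus(videoId);
    if (status.state === "completed") return { code: "OK", url: status.url };
    if (status.state === "failed") return { code: "JOB_FAILED", reason: status.reason };
    // Out of budget: hand back the id so the caller can resume later.
    if (waitedMs + intervalMs > maxWaitMs) return { code: "JOB_STILL_RUNNING", videoId };
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

Returning the job id instead of throwing on timeout is what makes the job resumable: the caller can pass the same id back in later without resubmitting the prompt.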

Architecture

src/
├── index.ts                    # Entry, env validation, graceful shutdown
├── tool-handlers.ts            # 11 tools (annotated) + dispatch
├── model-cache.ts              # TTL + in-flight coalescing
├── openrouter-api.ts           # REST client (chat + /videos)
├── errors.ts                   # Closed ErrorCode enum
├── logger.ts                   # JSON-line structured logger
└── tool-handlers/
    ├── fetch-utils.ts          # SSRF, bounded fetch, data-URL parser
    ├── openrouter-errors.ts    # SDK/HTTP → ErrorCode classifier
    ├── completion-utils.ts     # Reasoning-model cutoff detection
    ├── path-safety.ts          # save_path sandbox
    ├── chat-completion.ts      # Text + multimodal chat
    ├── analyze-image.ts        # Vision analysis
    ├── analyze-audio.ts        # Audio transcription
    ├── analyze-video.ts        # Video understanding
    ├── generate-image.ts       # Image generation
    ├── generate-audio.ts       # Audio generation + streaming
    ├── generate-video.ts       # Video generation (async)
    ├── image-utils.ts          # Sharp optimization, MIME sniffing
    ├── audio-utils.ts          # Audio format detection
    ├── video-utils.ts          # Video format detection
    ├── search-models.ts        # Model search
    ├── get-model-info.ts       # Model detail lookup
    └── validate-model.ts       # Model existence check
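
The `model-cache.ts` entry mentions a TTL combined with in-flight coalescing. The sketch below shows what that combination generally means: a cached value is reused until it expires, and concurrent callers that arrive during a refresh share one pending request instead of each starting their own. The class name `CoalescingCache` and its constructor parameters are hypothetical; this is not the contents of `model-cache.ts`.

```typescript
// Hypothetical single-value cache with a TTL and request coalescing.
export class CoalescingCache<T> {
  private value: T | undefined = undefined;
  private expiresAt = 0;
  private inFlight: Promise<T> | null = null;
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<T>, ttlMs: number = 3600000, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    // Fresh value: serve from the cache.
    if (this.value !== undefined && this.now() < this.expiresAt) {
      return Promise.resolve(this.value);
    }
    // A refresh is already running: join it rather than starting another.
    if (this.inFlight !== null) return this.inFlight;

    const pending = this.load().then(
      (loaded) => {
        this.value = loaded;
        this.expiresAt = this.now() + this.ttlMs;
        this.inFlight = null;
        return loaded;
      },
      (err) => {
        this.inFlight = null; // let the next caller retry after a failure
        throw err;
      },
    );
    this.inFlight = pending;
    return pending;
  }
}
```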

Development

git clone https://github.com/stabgan/openrouter-mcp-multimodal.git
cd openrouter-mcp-multimodal
npm install
cp .env.example .env  # Add your API key
npm run build
npm start

npm test                    # 163 unit tests, <1s
npm run test:integration    # Live API tests
npm run lint
node scripts/live-e2e.mjs  # 16 live E2E scenarios

Upgrading from v2

v3 is additive — no tool schemas or env vars were removed.

Three new tools: analyze_video, generate_video, get_video_status
Structured _meta.code on every error response (text messages preserved)
save_path sandboxed by default — set OPENROUTER_OUTPUT_DIR or OPENROUTER_ALLOW_UNSAFE_PATHS=1
Reasoning-model awareness: a response with content: null and finish_reason: length now returns INVALID_INPUT with a preview instead of an empty string
IPv6 SSRF coverage extended to mapped, compat, multicast, 6to4, Teredo, ORCHID
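
The reasoning-model change can be sketched as a small check on a chat completion choice. This is an illustration under assumptions, not the code in `completion-utils.ts`: the `Choice` shape is simplified, the optional `reasoning` field and the idea that the preview is taken from it are assumptions, and the function name `interpretChoice` is hypothetical.

```typescript
// Simplified, assumed shape of one chat completion choice.
interface Choice {
  finish_reason: string | null;
  message: { content: string | null; reasoning?: string | null };
}

type Outcome =
  | { ok: true; text: string }
  | { ok: false; code: "INVALID_INPUT"; message: string };

export function interpretChoice(choice: Choice, previewChars: number = 200): Outcome {
  const content = choice.message.content;
  if (content !== null && content !== "") return { ok: true, text: content };

  // No answer text and the model stopped on the token limit: the budget
  // was most likely spent on reasoning before any answer was written.
  if (choice.finish_reason === "length") {
    const preview = (choice.message.reasoning ?? "").slice(0, previewChars);
    return {
      ok: false,
      code: "INVALID_INPUT",
      message:
        "The model reached max_tokens before producing an answer; try a larger max_tokens. " +
        "Reasoning preview: " + preview,
    };
  }
  return { ok: true, text: "" };
}
```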

Compatibility

Works with any MCP-compatible client, including Kiro, Claude Desktop, Cursor, Windsurf, and Cline.

License

MIT

Contributing

Issues and PRs welcome. Please open an issue first for major changes.
