Which integrations are available for this server?

Provides tools for describing images, extracting text, answering questions about images, and deleting files using NVIDIA vision models.

How do I use nvidia-vision-mcp?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@nvidia-vision-mcp describe the screenshot at /tmp/screenshot.png".

That's it! The server will respond to your query, and you can continue using it as needed.

nvidia-vision-mcp

by Juupeee


JavaScript

Local

NVIDIA Vision MCP

A small MCP server for reading local images with NVIDIA vision models.

This is useful when the AI model you are using cannot see images directly. A common case is browser debugging: Chrome DevTools can capture a screenshot, but the model still cannot inspect what is inside the image. This server gives the model a simple way to read that screenshot.

What It Does

Describes local images and screenshots
Extracts visible text from images
Answers specific questions about an image
Turns a UI screenshot into code, a prompt, a spec, or a description
OCRs screenshots optimized for code, terminal output, documents, or general text
Diagnoses error screenshots and proposes fixes
Interprets technical diagrams (architecture, flow, UML, ER, sequence, system)
Reads charts and dashboards to surface insights and trends
Compares two UI screenshots to flag visual drift
General-purpose image understanding as a fallback
Deletes temporary screenshot files after use


Setup

Add the server to your MCP client config:

{
  "mcpServers": {
    "nvidia-vision": {
      "command": "npx",
      "args": ["-y", "nvidia-vision-mcp"],
      "env": {
        "NVIDIA_MODEL": "meta/llama-4-maverick-17b-128e-instruct",
        "NVIDIA_API_KEY": "your_nvidia_api_key"
      }
    }
  }
}

The API key is read from the MCP server environment. No .env file is needed.

NVIDIA_MODEL is optional. If it is not set, the server uses:

meta/llama-4-maverick-17b-128e-instruct

You can replace it with another NVIDIA-hosted vision-capable chat model when needed.
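The environment handling described above can be sketched as follows. This is a minimal illustration, not the server's actual code: `resolveConfig` and `DEFAULT_MODEL` are hypothetical names, and only the two variable names and the default model string come from this README.

```javascript
// Hypothetical helper: the server's real internals are not shown in this README.
const DEFAULT_MODEL = "meta/llama-4-maverick-17b-128e-instruct";

function resolveConfig(env) {
  const apiKey = (env.NVIDIA_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error("NVIDIA_API_KEY is not set in the MCP server environment");
  }
  // NVIDIA_MODEL is optional; fall back to the documented default when it is missing or blank.
  const model = (env.NVIDIA_MODEL || "").trim() || DEFAULT_MODEL;
  return { apiKey, model };
}

// Usage: under an MCP client you would pass process.env instead of a literal object.
const config = resolveConfig({ NVIDIA_API_KEY: "your_nvidia_api_key" });
console.log(config.model); // meta/llama-4-maverick-17b-128e-instruct
```

Failing early on a missing key is an assumption about sensible behavior; the README only states that the key is read from the environment.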

For local development from this folder:

{
  "mcpServers": {
    "nvidia-vision": {
      "command": "node",
      "args": ["/path/to/nvidia-vision/src/server.js"],
      "env": {
        "NVIDIA_MODEL": "meta/llama-4-maverick-17b-128e-instruct",
        "NVIDIA_API_KEY": "your_nvidia_api_key"
      }
    }
  }
}

Tools

describe_image

Describes what is visible in a local image.

extract_text_from_image

Extracts text from an image or screenshot. Useful for UI errors, terminal output, form labels, dialogs, and short documents.

analyze_image

Answers a custom question about an image. For example, you can ask where a button is, what color an element uses, or whether an error message is visible.
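To make the idea concrete, here is one way a tool like analyze_image might package a local image and a question for a vision-capable chat model. It assumes an OpenAI-style chat payload with a base64 data URL, which is a common convention but is not confirmed by this README; `buildVisionRequest` is a hypothetical name, and the sketch builds the payload only, without sending it.

```javascript
// Sketch only: assumes an OpenAI-style chat payload. The server's real request format is not documented here.
function buildVisionRequest({ imageBytes, mimeType, question, model }) {
  // Encode the raw image bytes as a base64 data URL so they can travel inside JSON.
  const base64 = Buffer.from(imageBytes).toString("base64");
  const dataUrl = `data:${mimeType};base64,${base64}`;
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

const request = buildVisionRequest({
  // First four bytes of the PNG signature, standing in for a real file read from disk.
  imageBytes: Uint8Array.from([0x89, 0x50, 0x4e, 0x47]),
  mimeType: "image/png",
  question: "What does the primary button say, and where is it located?",
  model: "meta/llama-4-maverick-17b-128e-instruct",
});
console.log(request.messages[0].content[1].image_url.url); // data:image/png;base64,iVBORw==
```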

ui_to_artifact

Turns a UI screenshot into a reusable artifact. Choose artifact_type:

code — production-ready code recreating the UI (optionally set target, e.g. react + tailwind).
prompt — a text-to-UI prompt that reproduces the screenshot.
spec — a structured UI specification.
description — a written description for documentation.

extract_text_from_screenshot

OCR tuned for a specific kind of content: code, terminal, document, or general (default). Reproduces text verbatim with structure preserved.

diagnose_error_screenshot

Analyzes an error screenshot (stack trace, crash dialog, failed build, browser console). Extracts the error, explains it, finds the likely root cause, and lists ordered fix steps. Pass optional context for what was being attempted.

understand_technical_diagram

Interprets a technical diagram. Set diagram_type to architecture, flow, uml, er, sequence, system, or auto (default). Optionally ask a follow-up question.
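Options such as diagram_type take one value from a fixed set with a default. A minimal sketch of how such an argument could be normalized is shown below; `normalizeDiagramType` is a hypothetical helper, and rejecting unknown values is an assumption, since the README does not say how the server treats them.

```javascript
// Allowed values taken from the tool description above; "auto" is the documented default.
const DIAGRAM_TYPES = ["architecture", "flow", "uml", "er", "sequence", "system", "auto"];

// Hypothetical helper, not the server's actual validation code.
function normalizeDiagramType(value) {
  if (value === undefined || value === null || value === "") {
    return "auto";
  }
  const normalized = String(value).trim().toLowerCase();
  if (!DIAGRAM_TYPES.includes(normalized)) {
    throw new Error(`Unknown diagram_type "${value}". Expected one of: ${DIAGRAM_TYPES.join(", ")}`);
  }
  return normalized;
}

console.log(normalizeDiagramType(undefined)); // auto
console.log(normalizeDiagramType(" UML "));   // uml
```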

analyze_data_visualization

Reads a chart, graph, or dashboard. Reports visualization type, axes/units, key values, trends, and insights. Optionally answers a specific question. It does not guess at numbers it cannot read.

ui_diff_check

Compares two UI screenshots (image_path_a / image_path_b) and flags visual or implementation drift, with per-difference severity and recommendations. Optionally focus on an aspect like spacing, colors, layout, or typography.

image_analysis

General-purpose image understanding when a more specific tool does not fit. Pass any freeform task instruction.

delete_file

Deletes a local file. This is mostly for cleaning up temporary screenshots.
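A cleanup step like this could look roughly as follows. This is a sketch under assumptions: `deleteFile` is a hypothetical function, and the choices to treat a missing file as a no-op and to refuse anything that is not a regular file are illustrative, not documented behavior.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper, not the server's actual implementation.
function deleteFile(filePath) {
  const resolved = path.resolve(filePath);
  if (!fs.existsSync(resolved)) {
    // Nothing to clean up; report it instead of failing.
    return { deleted: false, path: resolved };
  }
  if (!fs.statSync(resolved).isFile()) {
    throw new Error(`Not a regular file: ${resolved}`);
  }
  fs.unlinkSync(resolved);
  return { deleted: true, path: resolved };
}

// Usage with a throwaway file in a fresh temporary directory.
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "nvidia-vision-"));
const tempFile = path.join(tempDir, "screenshot.png");
fs.writeFileSync(tempFile, "placeholder");
const result = deleteFile(tempFile);
console.log(result.deleted, fs.existsSync(tempFile)); // true false
```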

Examples

Read text from a screenshot:

extract_text_from_image(image_path="/tmp/screenshot.png")

Ask about a specific part of the UI:

analyze_image(
  image_path="/tmp/screenshot.png",
  question="What does the primary button say, and where is it located?"
)

Describe a screenshot and remove it afterwards:

describe_image(image_path="/tmp/screenshot.png", cleanup=true)

Turn a UI screenshot into React + Tailwind code:

ui_to_artifact(
  image_path="/tmp/screenshot.png",
  artifact_type="code",
  target="react + tailwind"
)

OCR terminal output from a screenshot:

extract_text_from_screenshot(image_path="/tmp/terminal.png", kind="terminal")

Diagnose a build error screenshot with context:

diagnose_error_screenshot(
  image_path="/tmp/build-error.png",
  context="Running vite build on a React + TypeScript project"
)

Compare two versions of a UI:

ui_diff_check(
  image_path_a="/tmp/before.png",
  image_path_b="/tmp/after.png",
  focus="spacing"
)
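The calls above are written in shorthand. On the wire, an MCP client sends each one as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for the last example; `toolCallRequest` is a hypothetical helper, and the transport that carries the message to the server is left out.

```javascript
let nextId = 1;

// Hypothetical helper that wraps a tool name and its arguments in an MCP tools/call request.
function toolCallRequest(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const message = toolCallRequest("ui_diff_check", {
  image_path_a: "/tmp/before.png",
  image_path_b: "/tmp/after.png",
  focus: "spacing",
});
console.log(JSON.stringify(message, null, 2));
```

In practice the MCP client builds these messages for you; the sketch only shows how the shorthand arguments map onto the request.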

Notes

This server intentionally stays narrow. It exists to help models inspect local screenshots when another tool can produce the image file but cannot explain what is inside it.





MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Juupeee/nvidia-vision-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.