
NVIDIA Vision MCP

A small MCP server for reading local images with NVIDIA vision models.

This is useful when the AI model you are using cannot see images directly. A common case is browser debugging: Chrome DevTools can capture a screenshot, but the model still cannot inspect what is inside the image. This server gives the model a simple way to read that screenshot.

What It Does

  • Describes local images and screenshots
  • Extracts visible text from images
  • Answers specific questions about an image
  • Deletes temporary screenshot files after use

Setup

Add the server to your MCP client config:

{
  "mcpServers": {
    "nvidia-vision": {
      "command": "npx",
      "args": ["-y", "nvidia-vision-mcp"],
      "env": {
        "NVIDIA_MODEL": "meta/llama-4-maverick-17b-128e-instruct",
        "NVIDIA_API_KEY": "your_nvidia_api_key"
      }
    }
  }
}

The API key is read from the MCP server environment. No .env file is needed.

NVIDIA_MODEL is optional. If it is not set, the server uses:

meta/llama-4-maverick-17b-128e-instruct

You can replace it with another NVIDIA-hosted vision-capable chat model when needed.
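The environment handling described above could look roughly like the sketch below. It is an illustration rather than the server's actual source: `resolveConfig` and `DEFAULT_MODEL` are hypothetical names, and only the variable names `NVIDIA_API_KEY` and `NVIDIA_MODEL` and the default model id are taken from this README.

```javascript
// Default model id as stated in this README.
const DEFAULT_MODEL = "meta/llama-4-maverick-17b-128e-instruct";

// Hypothetical helper: reads settings from the environment the MCP client
// passes to the server process. No .env file is involved.
function resolveConfig(env = process.env) {
  const apiKey = env.NVIDIA_API_KEY;
  if (!apiKey) {
    throw new Error("NVIDIA_API_KEY is not set in the MCP server environment");
  }

  // Assumption of this sketch: an empty or whitespace-only NVIDIA_MODEL
  // is treated the same as an unset one.
  const requested = (env.NVIDIA_MODEL || "").trim();
  const model = requested !== "" ? requested : DEFAULT_MODEL;

  return { apiKey, model };
}
```

Failing early on a missing key gives the MCP client a clear startup error instead of a rejected request later on.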

For local development from a checkout of this repository, point the client at the server script directly:

{
  "mcpServers": {
    "nvidia-vision": {
      "command": "node",
      "args": ["/path/to/nvidia-vision/src/server.js"],
      "env": {
        "NVIDIA_MODEL": "meta/llama-4-maverick-17b-128e-instruct",
        "NVIDIA_API_KEY": "your_nvidia_api_key"
      }
    }
  }
}

Tools

describe_image

Describes what is visible in a local image.

extract_text_from_image

Extracts text from an image or screenshot. Useful for UI errors, terminal output, form labels, dialogs, and short documents.

analyze_image

Answers a custom question about an image. For example, you can ask where a button is, what color an element uses, or whether an error message is visible.
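To make the idea concrete, here is one way a question and a local image might be packaged for a vision-capable chat model. This is a sketch under assumptions: it uses an OpenAI-style chat payload with an `image_url` content part carrying a base64 data URL, and the helper names `imageToDataUrl` and `buildVisionRequest` are hypothetical. The server may format its requests differently.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Common image extensions mapped to MIME types (illustrative, not exhaustive).
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Reads a local image and returns it as a base64 data URL.
function imageToDataUrl(imagePath) {
  const ext = path.extname(imagePath).toLowerCase();
  const mime = MIME_TYPES[ext];
  if (!mime) {
    throw new Error(`Unsupported image type: ${ext || "(none)"}`);
  }
  const base64 = fs.readFileSync(imagePath).toString("base64");
  return `data:${mime};base64,${base64}`;
}

// Hypothetical request builder: one user message holding the question
// and the image. The payload shape is an assumption, not confirmed by
// this README.
function buildVisionRequest(model, imagePath, question) {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageToDataUrl(imagePath) } },
        ],
      },
    ],
  };
}
```

Under this framing, all three image tools could share one builder and differ only in the question text they send.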

delete_file

Deletes a local file. This is mostly for cleaning up temporary screenshots.

Examples

Read text from a screenshot:

extract_text_from_image(image_path="/tmp/screenshot.png")

Ask about a specific part of the UI:

analyze_image(
  image_path="/tmp/screenshot.png",
  question="What does the primary button say, and where is it located?"
)
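The examples in this section are shorthand for tool calls. On the wire, an MCP client sends a JSON-RPC 2.0 `tools/call` request, which can be sketched as below. `buildToolCall` is a hypothetical helper for illustration; the tool name and argument keys are the ones used in the example above.

```javascript
// Builds an MCP `tools/call` request (JSON-RPC 2.0).
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "analyze_image", {
  image_path: "/tmp/screenshot.png",
  question: "What does the primary button say, and where is it located?",
});

console.log(JSON.stringify(request, null, 2));
```

In practice the MCP client builds this request itself; you normally only ask the model to use the tool.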

Describe a screenshot and remove it afterwards:

describe_image(image_path="/tmp/screenshot.png", cleanup=true)
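The describe-then-delete flow could be ordered as in the sketch below. `describeWithCleanup` and its `analyze` callback are hypothetical names, and keeping the file when the analysis fails is an assumption of this sketch, not documented server behaviour.

```javascript
const fs = require("node:fs");

// Hypothetical helper: runs an analysis callback on an image and, when
// `cleanup` is true, deletes the image once the analysis has succeeded.
// If the analysis throws, the file is left in place so the call can be
// retried.
async function describeWithCleanup(imagePath, cleanup, analyze) {
  const result = await analyze(imagePath);
  if (cleanup) {
    // force: true makes removal a no-op if the file is already gone.
    await fs.promises.rm(imagePath, { force: true });
  }
  return result;
}
```

Deleting only after a successful result means a network or API error does not cost you the screenshot.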

Notes

This server intentionally stays narrow. It exists to help models inspect local screenshots when another tool can produce the image file but cannot explain what is inside it.
