nvidia-vision-mcp
Provides tools for describing images, extracting text, answering questions about images, and deleting files using NVIDIA vision models.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@nvidia-vision-mcpdescribe the screenshot at /tmp/screenshot.png"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
NVIDIA Vision MCP
A small MCP server for reading local images with NVIDIA vision models.
This is useful when the AI model you are using cannot see images directly. A common case is browser debugging: Chrome DevTools can capture a screenshot, but the model still cannot inspect what is inside the image. This server gives the model a simple way to read that screenshot.
What It Does
Describes local images and screenshots
Extracts visible text from images
Answers specific questions about an image
Deletes temporary screenshot files after use
Related MCP server: AI Vision MCP Server
Setup
Add the server to your MCP client config:
{
"mcpServers": {
"nvidia-vision": {
"command": "npx",
"args": ["-y", "nvidia-vision-mcp"],
"env": {
"NVIDIA_MODEL": "meta/llama-4-maverick-17b-128e-instruct",
"NVIDIA_API_KEY": "your_nvidia_api_key"
}
}
}
}The API key is read from the MCP server environment. No .env file is needed.
NVIDIA_MODEL is optional. If it is not set, the server uses:
meta/llama-4-maverick-17b-128e-instructYou can replace it with another NVIDIA-hosted vision-capable chat model when needed.
For local development from this folder:
{
"mcpServers": {
"nvidia-vision": {
"command": "node",
"args": ["/path/to/nvidia-vision/src/server.js"],
"env": {
"NVIDIA_MODEL": "meta/llama-4-maverick-17b-128e-instruct",
"NVIDIA_API_KEY": "your_nvidia_api_key"
}
}
}
}Tools
describe_image
Describes what is visible in a local image.
extract_text_from_image
Extracts text from an image or screenshot. Useful for UI errors, terminal output, form labels, dialogs, and short documents.
analyze_image
Answers a custom question about an image. For example, you can ask where a button is, what color an element uses, or whether an error message is visible.
delete_file
Deletes a local file. This is mostly for cleaning up temporary screenshots.
Examples
Read text from a screenshot:
extract_text_from_image(image_path="/tmp/screenshot.png")Ask about a specific part of the UI:
analyze_image(
image_path="/tmp/screenshot.png",
question="What does the primary button say, and where is it located?"
)Describe a screenshot and remove it afterwards:
describe_image(image_path="/tmp/screenshot.png", cleanup=true)Notes
This server intentionally stays narrow. It exists to help models inspect local screenshots when another tool can produce the image file but cannot explain what is inside it.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/Juupeee/nvidia-vision-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server