
DINO-X Image Detection MCP Server

DINO-X MCP


The official DINO-X MCP server, powered by the DINO-X and Grounding DINO models, brings fine-grained object detection and image understanding to your multimodal applications.

Why DINO-X MCP?

With DINO-X MCP, you can:

  • Fine-Grained Understanding: Full-scene detection, text-prompted object detection, and region-level descriptions.
  • Structured Outputs: Get object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.
  • Composable: Combine it with other MCP servers to build end-to-end visual agents or automation pipelines.
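As a small illustration of how structured outputs feed counting and multi-step reasoning, the sketch below tallies detections per category. The `Detection` shape (a `category` string plus an `[x1, y1, x2, y2]` bbox) and the `countByCategory` helper are assumptions made for this example, not the server's documented output schema.

```typescript
// Hypothetical shape of one detection; the real tool output may differ.
interface Detection {
  category: string;
  bbox: [number, number, number, number]; // assumed [x1, y1, x2, y2] in pixels
  caption?: string;
}

// Count how many detections fall into each category.
function countByCategory(detections: Detection[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const d of detections) {
    counts.set(d.category, (counts.get(d.category) ?? 0) + 1);
  }
  return counts;
}

// Example input: three cardboard boxes and one person.
const sample: Detection[] = [
  { category: "cardboard box", bbox: [10, 10, 50, 50] },
  { category: "cardboard box", bbox: [60, 10, 100, 50] },
  { category: "person", bbox: [120, 0, 180, 200] },
  { category: "cardboard box", bbox: [10, 60, 50, 100] },
];
console.log(countByCategory(sample).get("cardboard box")); // 3
```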

Transport Modes

DINO-X MCP supports two transport modes:

  • STDIO (default): runs locally; communicates over standard I/O; accepts file:// and https:// image sources; supports visualization (annotated images are saved locally).
  • Streamable HTTP: runs locally or in the cloud; communicates over HTTP with streaming responses; accepts https:// image sources only; visualization is not supported (for now).

Quick Start

1. Prepare an MCP client

Any MCP-compatible client works.

2. Get your API key

Request an API key on the DINO-X platform (new users receive a free quota).

3. Configure MCP

Option A: Use the hosted server (Streamable HTTP)

Add the following to your MCP client config, replacing your-api-key with your API key:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}
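If you generate client configs programmatically, URL-encode the key before placing it in the `?key=` query string. A minimal sketch: `buildHostedConfig` is a hypothetical helper, and only the endpoint URL is taken from the config above.

```typescript
// Endpoint from the hosted-server config above.
const HOSTED_ENDPOINT = "https://mcp.deepdataspace.com/mcp";

// Hypothetical helper: builds the hosted-server client config for an API key.
function buildHostedConfig(apiKey: string) {
  // encodeURIComponent escapes characters such as "&", "=" or "+" that would
  // otherwise corrupt the query string.
  const url = `${HOSTED_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
  return { mcpServers: { "dinox-mcp": { url } } };
}

// Print JSON that can be pasted into the client config file.
console.log(JSON.stringify(buildHostedConfig("your-api-key"), null, 2));
```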
Option B: Use the NPM package locally (STDIO)

Install Node.js first

  • Download the installer from nodejs.org
  • Or install it from the command line:
# macOS / Linux
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# load nvm into the current shell (choose the one you use)
source ~/.bashrc || true
source ~/.zshrc || true

# install and use the LTS release of Node.js
nvm install --lts
nvm use --lts

# Windows (one of the following)
winget install OpenJS.NodeJS.LTS
# or with Chocolatey (in an admin PowerShell)
iwr -useb https://community.chocolatey.org/install.ps1 | iex
choco install nodejs-lts -y

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

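Note: Replace your-api-key-here with your real key, and /path/to/your/image/directory with the directory where annotated images should be saved (optional).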

Option C: Run from source locally

Make sure Node.js is installed (see Option B), then:

# clone
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# install dependencies
npm install

# build
npm run build

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
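The npm-package and from-source STDIO configs differ only in the command that launches the server. The sketch below, built around a hypothetical `buildStdioConfig` helper, produces either variant and omits `IMAGE_STORAGE_DIRECTORY` when no directory is given, since that variable is optional.

```typescript
// Launch style: the published npm package, or a local build of the repository.
type LaunchMode = { kind: "npx" } | { kind: "source"; buildEntry: string };

interface StdioServerConfig {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper that assembles the mcpServers entry shown above.
function buildStdioConfig(mode: LaunchMode, apiKey: string, imageDir?: string) {
  const env: Record<string, string> = { DINOX_API_KEY: apiKey };
  if (imageDir) {
    // Optional: where annotated images are written in STDIO mode.
    env.IMAGE_STORAGE_DIRECTORY = imageDir;
  }
  const server: StdioServerConfig =
    mode.kind === "npx"
      ? { command: "npx", args: ["-y", "@deepdataspace/dinox-mcp"], env }
      : { command: "node", args: [mode.buildEntry], env };
  return { mcpServers: { "dinox-mcp": server } };
}

const fromSource = buildStdioConfig(
  { kind: "source", buildEntry: "/path/to/DINO-X-MCP/build/index.js" },
  "your-api-key-here",
);
console.log(JSON.stringify(fromSource, null, 2));
```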

CLI Flags & Environment Variables

  • Common flags
    • --http: start in Streamable HTTP mode (STDIO is the default)
    • --stdio: force STDIO mode
    • --dinox-api-key=...: set the API key
    • --enable-client-key: let clients supply their own API key via the URL query parameter ?key= (Streamable HTTP only)
    • --port=8080: HTTP port (default 3020)
  • Environment variables
    • DINOX_API_KEY (conditionally required): DINO-X platform API key; needed unless the key is passed with --dinox-api-key or, with --enable-client-key, supplied by each client
    • IMAGE_STORAGE_DIRECTORY (optional, STDIO): directory where annotated images are saved
    • AUTH_TOKEN (optional, HTTP): if set, clients must send the header Authorization: Bearer <token>
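To make the interplay of flags and variables concrete, here is a minimal sketch of how they could be resolved into runtime settings. It is not the server's actual parser: `resolveSettings` is hypothetical, and two behaviors are assumptions, namely that `--stdio` wins if both mode flags are given and that `--dinox-api-key` takes precedence over `DINOX_API_KEY`.

```typescript
type Transport = "stdio" | "http";

interface Settings {
  transport: Transport;
  port: number;
  apiKey?: string;
  clientKeyEnabled: boolean;
}

// Hypothetical resolver for the documented flags and environment variables.
function resolveSettings(argv: string[], env: Record<string, string | undefined>): Settings {
  const has = (flag: string) => argv.includes(flag);
  // Read the value of a "--name=value" flag, if present.
  const valueOf = (name: string): string | undefined => {
    const hit = argv.find((a) => a.startsWith(`--${name}=`));
    return hit === undefined ? undefined : hit.slice(name.length + 3);
  };

  // STDIO is the default; assumption: --stdio overrides --http.
  const transport: Transport = has("--http") && !has("--stdio") ? "http" : "stdio";
  const port = Number(valueOf("port") ?? "3020"); // documented default: 3020
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`invalid --port value: ${valueOf("port")}`);
  }
  // Assumption: the CLI flag takes precedence over the environment variable.
  const apiKey = valueOf("dinox-api-key") ?? env.DINOX_API_KEY;
  // --enable-client-key only applies to Streamable HTTP.
  const clientKeyEnabled = transport === "http" && has("--enable-client-key");

  if (!apiKey && !clientKeyEnabled) {
    throw new Error("an API key is required (flag, DINOX_API_KEY, or client ?key=)");
  }
  return { transport, port, apiKey, clientKeyEnabled };
}

console.log(resolveSettings(["--http", "--port=8080"], { DINOX_API_KEY: "k" }));
```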

Examples:

# STDIO (local)
node build/index.js --dinox-api-key=your-api-key

# Streamable HTTP (server provides a shared API key)
node build/index.js --http --dinox-api-key=your-api-key

# Streamable HTTP (custom port)
node build/index.js --http --dinox-api-key=your-api-key --port=8080

# Streamable HTTP (require client-provided API key via URL)
node build/index.js --http --enable-client-key

Client config when using ?key=:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "http://localhost:3020/mcp?key=your-api-key"
    }
  }
}

Using AUTH_TOKEN with a gateway that injects Authorization: Bearer <token>:

AUTH_TOKEN=my-token node build/index.js --http --enable-client-key

Client example with supergateway:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "supergateway",
        "--streamableHttp",
        "http://localhost:3020/mcp?key=your-api-key",
        "--oauth2Bearer",
        "my-token"
      ]
    }
  }
}
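When AUTH_TOKEN is set, each request must carry a matching `Authorization: Bearer <token>` header. The sketch below shows that check in isolation; `isAuthorized` is a hypothetical helper and does not describe how the server itself implements it. A production version should compare secrets in constant time (for example with Node's `crypto.timingSafeEqual`).

```typescript
// Hypothetical check mirroring the documented AUTH_TOKEN behavior.
// Returns true when no token is configured, or when the bearer token matches exactly.
function isAuthorized(authorizationHeader: string | undefined, authToken: string | undefined): boolean {
  if (!authToken) return true; // AUTH_TOKEN unset: no bearer check
  if (!authorizationHeader) return false;
  // The auth scheme name is case-insensitive; the token itself is compared exactly.
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  // Plain string comparison for clarity; use a constant-time comparison in production.
  return match !== null && match[1] === authToken;
}

console.log(isAuthorized("Bearer my-token", "my-token")); // true
console.log(isAuthorized(undefined, "my-token")); // false
```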

Tools

  • Full-scene object detection (detect-all-objects). Transport: STDIO / HTTP. Input: image URL. Output: categories + bounding boxes + (optional) captions.
  • Text-prompted object detection (detect-objects-by-text). Transport: STDIO / HTTP. Input: image URL + English nouns (dot-separated for multiple, e.g., person.car). Output: bounding boxes of the target objects + (optional) captions.
  • Human pose estimation (detect-human-pose-keypoints). Transport: STDIO / HTTP. Input: image URL. Output: 17 keypoints + bounding box + (optional) captions.
  • Visualization (visualize-detection-result). Transport: STDIO only. Input: image URL + array of detection results. Output: local path to the annotated image.
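MCP clients invoke these tools with a JSON-RPC `tools/call` request. The sketch below builds one for detect-objects-by-text and joins several nouns into the dot-separated prompt format. The argument names `imageFileUri` and `textPrompt` are placeholders, not the server's documented schema; the real names are in the input schema the server returns from `tools/list`.

```typescript
// Join English nouns into the dot-separated prompt format, e.g. ["person", "car"] -> "person.car".
function toDotPrompt(nouns: string[]): string {
  const cleaned = nouns.map((n) => n.trim()).filter((n) => n.length > 0);
  if (cleaned.some((n) => n.includes("."))) {
    throw new Error("a noun must not contain '.', which separates categories");
  }
  return cleaned.join(".");
}

// Build a JSON-RPC 2.0 "tools/call" request as used by MCP.
// NOTE: the keys inside `arguments` are hypothetical placeholders.
function buildDetectByTextRequest(id: number, imageUrl: string, nouns: string[]) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "detect-objects-by-text",
      arguments: { imageFileUri: imageUrl, textPrompt: toDotPrompt(nouns) },
    },
  };
}

console.log(JSON.stringify(buildDetectByTextRequest(1, "https://example.com/street.jpg", ["person", "car"])));
```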

🎬 Use Cases

Each scenario pairs a prompt with an input image:

  • Detection & Localization: "Detect and visualize the fire areas in the forest"
  • Object Counting: "Please analyze this warehouse image, detect all the cardboard boxes, count the total number"
  • Feature Detection: "Find all red cars in the image"
  • Attribute Reasoning: "Find the tallest person in the image, describe their clothing"
  • Full Scene Detection: "Find the fruit with the highest vitamin C content in the image". Answer: Kiwi fruit (93mg/100g)
  • Pose Analysis: "Please analyze what yoga pose this is"
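Scenarios such as Feature Detection and Attribute Reasoning come down to filtering and comparing boxes after detection. A minimal sketch, assuming each result has a `category` and an `[x1, y1, x2, y2]` pixel bbox (an assumed shape; `ofCategory` and `tallest` are hypothetical helpers). Note that the tallest box in pixels is only a proxy for the tallest person, since distance from the camera also changes apparent size.

```typescript
// Assumed result shape for illustration; the real output schema may differ.
interface Box {
  category: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in pixels
}

// Keep only detections of one category (case-insensitive).
function ofCategory(boxes: Box[], category: string): Box[] {
  const wanted = category.toLowerCase();
  return boxes.filter((b) => b.category.toLowerCase() === wanted);
}

// Return the detection with the greatest pixel height, or undefined if there is none.
function tallest(boxes: Box[]): Box | undefined {
  let best: Box | undefined;
  for (const b of boxes) {
    const h = b.bbox[3] - b.bbox[1];
    if (best === undefined || h > best.bbox[3] - best.bbox[1]) best = b;
  }
  return best;
}

const results: Box[] = [
  { category: "person", bbox: [0, 40, 50, 200] }, // height 160
  { category: "person", bbox: [80, 10, 130, 210] }, // height 200
  { category: "car", bbox: [150, 100, 400, 260] }, // height 160
];
// Logs the bbox of the second person, whose box is tallest.
console.log(tallest(ofCategory(results, "person"))?.bbox);
```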

FAQ

  • Supported image sources?
    • STDIO: file:// and https://
    • Streamable HTTP: https:// only
  • Supported image formats?
    • jpg, jpeg, webp, png
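The source and format rules above can be checked on the client side before calling a tool. A sketch with a hypothetical `checkImageSource` helper: it only inspects the URL scheme and the file extension, so an image served without a recognizable extension would be rejected here even if the server could process it (a simplifying assumption).

```typescript
type Transport = "stdio" | "http";

// Rules taken from the FAQ above.
const ALLOWED_SCHEMES: Record<Transport, string[]> = {
  stdio: ["file", "https"],
  http: ["https"],
};
const ALLOWED_EXTENSIONS = ["jpg", "jpeg", "webp", "png"];

// Hypothetical pre-flight check: returns null if the source looks acceptable,
// otherwise a short reason string.
function checkImageSource(source: string, transport: Transport): string | null {
  const schemeMatch = /^([a-z][a-z0-9+.-]*):\/\//i.exec(source);
  if (!schemeMatch) return `missing URL scheme: ${source}`;
  const scheme = schemeMatch[1].toLowerCase();
  if (!ALLOWED_SCHEMES[transport].includes(scheme)) {
    return `${scheme}:// sources are not supported over ${transport}`;
  }
  // Strip any query string or fragment, then take the extension of the last path segment.
  const path = source.split(/[?#]/)[0];
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  const ext = dot === -1 ? "" : lastSegment.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return `unsupported image format: "${ext}"`;
  }
  return null;
}

console.log(checkImageSource("file:///tmp/cat.png", "http"));
```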

Development & Debugging

Use watch mode to auto-rebuild during development:

npm run watch

Use the MCP Inspector for debugging:

npm run inspector

License

Apache License 2.0
