videre-mcp
MCP server that bridges vision models to text-only coding models using Florence-2.
Non-vision LLMs can't see images — videre-mcp fixes that. It loads a Florence-2 vision model locally and exposes four MCP tools that convert images (including SVGs) and screenshots into structured text descriptions that any text-based model can consume.
Screenshot tool → videre-mcp (Florence-2) → Text description → Coding model
Installation
pip install videre-mcp
Or with uv:
uv pip install videre-mcp
Requires Python 3.11+ and ~300MB disk space for the Florence-2-base model weights (downloaded automatically on first use).
Usage
Add to your OpenCode configuration:
{
  "mcpServers": {
    "videre-mcp": {
      "command": "videre-mcp"
    }
  }
}
Or run directly:
videre-mcp
# or
python -m videre_mcp
Tools
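When a client reaches the tools in this section over the protocol rather than importing them, each invocation travels as a JSON-RPC `tools/call` request. The following is a minimal sketch of building such a request by hand. `build_tool_call` is a hypothetical helper and not part of videre-mcp; in practice an MCP client library would also handle the transport and the initialization handshake.

```python
import json
from itertools import count

_ids = count(1)  # simple source of unique JSON-RPC request ids


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Tool name and argument names are taken from the describe_image entry.
request = build_tool_call(
    "describe_image",
    {"image_path": "/path/to/photo.png", "detail_level": "high"},
)
print(json.dumps(request, indent=2))
```

The same helper would apply to the other three tools by changing the name and arguments.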
describe_image
Generate a natural language description of an image.
Parameters:
image_path (str): Path to the image file (supports PNG, JPEG, SVG)
detail_level (str, optional): "normal" (default) for brief caption, "high" for detailed description
Example:
result = describe_image("/path/to/photo.png", detail_level="high")
# Returns:
# {
# "description": "A sunlit meadow with wildflowers in bloom...",
# "model": "Florence-2-base",
# "prompt_used": "<MORE_DETAILED_CAPTION>"
# }
ocr_image
Extract text from an image using optical character recognition.
Parameters:
image_path (str): Path to the image file (supports PNG, JPEG, SVG)
detail_level (str, optional): "normal" (default) for plain text, "high" for text with bounding regions
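In the example output for this tool, each region's bbox holds eight numbers rather than four. One plausible reading, stated here as an assumption, is that they are the four corner points of a quadrilateral in the order [x1, y1, x2, y2, x3, y3, x4, y4], which is how Florence-2 reports OCR regions. Under that assumption, a hypothetical helper such as `quad_to_rect` could collapse a quadrilateral into an axis-aligned rectangle:

```python
def quad_to_rect(quad: list[float]) -> tuple[float, float, float, float]:
    """Collapse an 8-value quadrilateral into (x_min, y_min, x_max, y_max).

    Assumes the values are four corner points laid out as
    [x1, y1, x2, y2, x3, y3, x4, y4]; this layout is not confirmed
    by the videre-mcp documentation.
    """
    if len(quad) != 8:
        raise ValueError(f"expected 8 values, got {len(quad)}")
    xs = quad[0::2]  # x coordinates of the four corners
    ys = quad[1::2]  # y coordinates of the four corners
    return (min(xs), min(ys), max(xs), max(ys))


# Region copied from the ocr_image example output.
region = {"label": "Invoice Number 12345", "bbox": [10, 20, 30, 40, 50, 60, 70, 80]}
rect = quad_to_rect(region["bbox"])
print(rect)
```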
Example:
result = ocr_image("/path/to/document.png", detail_level="high")
# Returns:
# {
# "text": "Invoice Number 12345",
# "regions": [
# {"label": "Invoice Number 12345", "bbox": [10, 20, 30, 40, 50, 60, 70, 80]}
# ]
# }
describe_screenshot
Describe UI regions in a screenshot — designed for coding agents that need to understand screen layouts.
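A text-only coding agent usually needs the regions as plain text rather than as a nested structure. As a sketch, assuming the result has the shape shown in this section's example (a `regions` list of `bbox` and `label` entries) and that each four-value bbox is [x1, y1, x2, y2], a hypothetical `regions_to_text` helper might flatten it like this:

```python
def regions_to_text(result: dict) -> str:
    """Render a describe_screenshot-style result as one line per region."""
    lines = []
    for region in result.get("regions", []):
        # Assumed layout: top-left corner followed by bottom-right corner.
        x1, y1, x2, y2 = region["bbox"]
        lines.append(f"- {region['label']} at ({x1}, {y1})-({x2}, {y2})")
    return "\n".join(lines)


# Sample data copied from the describe_screenshot example output.
sample = {
    "regions": [
        {"bbox": [10, 20, 30, 40], "label": "search bar"},
        {"bbox": [100, 200, 300, 250], "label": "submit button"},
    ],
    "model": "Florence-2-base",
}
print(regions_to_text(sample))
```

The resulting lines can be pasted into a prompt alongside the user's instructions.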
Parameters:
image_path (str): Path to the screenshot file (supports PNG, JPEG, SVG)
detail_level (str, optional): "normal" (default) for dense region captions, "high" for per-region descriptions
Example:
result = describe_screenshot("/path/to/screenshot.png")
# Returns:
# {
# "regions": [
# {"bbox": [10, 20, 30, 40], "label": "search bar"},
# {"bbox": [100, 200, 300, 250], "label": "submit button"}
# ],
# "model": "Florence-2-base"
# }
take_screenshot
Capture a screenshot and optionally describe it using Florence-2. Supports multi-monitor setups via the monitor parameter.
Parameters:
output_path (str, optional): Path to save the screenshot PNG. If None, saves to a temp file.
monitor (int, optional): Monitor index, where 0 = all monitors combined, 1 = primary, etc. (default: 0)
describe (bool, optional): If True, also run describe_screenshot on the captured image (default: True)
Example:
result = take_screenshot(monitor=1, describe=True)
# Returns:
# {
# "path": "/tmp/tmpxxxxxx.png",
# "width": 1920,
# "height": 1080,
# "monitor": 1,
# "regions": [
# {"label": "search bar", "bbox": [10, 20, 30, 40]},
# ...
# ]
# }
Supported Formats
| Format | Support |
| --- | --- |
| PNG | ✅ Native |
| JPEG | ✅ Native |
| SVG | ✅ Via cairosvg rasterization |
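A caller may want to reject unsupported files before invoking a tool. The sketch below checks only the file extension; the extension set is an assumption derived from the formats listed above, and `is_supported_image` is a hypothetical helper rather than part of videre-mcp. The server may well inspect file contents instead.

```python
from pathlib import Path

# Extensions assumed to correspond to the PNG, JPEG and SVG formats.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".svg"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


for candidate in ("/path/to/photo.PNG", "diagram.svg", "clip.gif"):
    print(candidate, is_supported_image(candidate))
```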
Requirements
Python 3.11+
~300MB disk for model weights (auto-downloaded on first inference)
Works on CPU; GPU (CUDA) is auto-detected and used if available
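The first two requirements can be checked before installation with the standard library alone. This is a rough preflight sketch: `meets_requirements` is a hypothetical helper, the 300MB figure is rounded from the estimate above, and the current directory's volume stands in for wherever the weights are actually cached, which this document does not specify.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)
MODEL_BYTES = 300 * 1024 * 1024  # ~300MB; the exact size is an assumption


def meets_requirements(python_version: tuple, free_bytes: int) -> bool:
    """Check the interpreter version and free disk space against the minimums."""
    version_ok = tuple(python_version[:2]) >= MIN_PYTHON
    disk_ok = free_bytes >= MODEL_BYTES
    return version_ok and disk_ok


# Free space on the volume holding the current directory (assumed stand-in
# for the model cache location).
free = shutil.disk_usage(".").free
print("requirements met:", meets_requirements(sys.version_info, free))
```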
Continuous Integration
The Florence-2 slow tests (real model load + inference) run on a nightly
schedule via GitHub Actions. See .github/workflows/slow-tests.yml.
License
MIT — see LICENSE.
Third-party licenses
This package vendors a patched copy of Microsoft's Florence-2 processor
(src/videre_mcp/_vendor/processing_florence2.py) under Microsoft's MIT license.
See src/videre_mcp/_vendor/LICENSE-Microsoft-Florence-2.