npu-vision-fallback
Local low-power vision for desktop AI agents
When accessibility APIs fail: NPU-first, zero GPU wake-up, 100% local
What is this?
A lightweight, local-first vision service for desktop agents that need to see and interact with screens where traditional accessibility APIs fall short: games, remote desktops, canvas apps, and more.
Built for efficiency: Native OS OCR · Intel NPU acceleration · Zero cloud calls · Battery-friendly by design

Why Use This?
Desktop agents face a challenge: how to perceive UI when the accessibility tree is empty?
| Common Approach | The Problem |
| --- | --- |
| Multimodal LLM screenshots | Expensive tokens, slow round-trips, coordinate hallucination |
| OS Accessibility APIs only | Blind to games, canvas apps, remote desktops, emulators |
| Heavy GPU OCR (PaddleOCR) | Big dependencies, high power draw, wakes discrete GPU |
npu-vision-fallback is your fallback layer: when the accessibility tree comes back empty, it gives your agent a small, fast, local vision service that doesn't touch the cloud or spin up the dGPU.
Perfect for:
Game UIs and emulators
Remote desktop / VNC clients (no remote accessibility tree)
Canvas / WASM web apps rendering outside the DOM
Local SLMs that can't afford multimodal screenshot tokens
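The fallback idea described above can be sketched in a few lines. This is only an illustration, not the project's code: `perceive`, `read_accessibility_tree`, and `vision_fallback` are hypothetical names standing in for whatever your agent uses to query the OS accessibility API and to call the vision service.

```python
from typing import Callable, Sequence

# Hypothetical element representation: a plain dict such as
# {"label": "Start Game", "bbox": [520, 580, 720, 640]}.
Element = dict


def perceive(
    read_accessibility_tree: Callable[[], Sequence[Element]],
    vision_fallback: Callable[[], Sequence[Element]],
) -> tuple[str, list[Element]]:
    """Return (source, elements), preferring the accessibility tree."""
    elements = list(read_accessibility_tree())
    if elements:
        return "accessibility", elements
    # Empty tree (game, canvas app, remote desktop): use local vision instead.
    return "vision", list(vision_fallback())


if __name__ == "__main__":
    # Stand-in callables; a real agent would query the OS and the MCP server.
    source, found = perceive(
        lambda: [],
        lambda: [{"label": "Start Game", "bbox": [520, 580, 720, 640]}],
    )
    print(source, found)
```

Keeping the two perception paths behind plain callables means the vision service is only invoked when the cheaper accessibility query returns nothing.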
Quick Start
1. Install (Windows + Intel NPU recommended)
pip install "npu-vision-fallback[ocr-win,detect]"
python scripts/download_ui_model.py  # One-time setup
2. Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "npu-vision-fallback": {
      "command": "npu-vision-fallback"
    }
  }
}
3. Use it
Restart Claude Desktop and try:
You: The accessibility tree for this game is empty. Can you read the screen at coordinates [0,0,1280,800] and find the "Start Game" button?
Claude: (calls analyze_screen) I found a button labeled "Start Game" at [520, 580, 720, 640]. Want me to click its center at (620, 610)?
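The click target in the exchange above is the midpoint of the reported box. A minimal sketch, assuming boxes are given as [left, top, right, bottom] in pixels (which matches the numbers in the example); `bbox_center` is a hypothetical helper, not part of the package:

```python
def bbox_center(bbox):
    """Midpoint of a box given as [left, top, right, bottom] (assumed layout)."""
    left, top, right, bottom = bbox
    return ((left + right) // 2, (top + bottom) // 2)


if __name__ == "__main__":
    print(bbox_center([520, 580, 720, 640]))  # (620, 610)
```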
Installation Options
Windows (Recommended)
Native OCR + NPU UI detection (~85 MB total):
pip install "npu-vision-fallback[ocr-win,detect]"
python scripts/download_ui_model.py
Linux / macOS
Cross-platform OCR + CPU detection (~130 MB):
pip install "npu-vision-fallback[ocr-rapid,detect]"
python scripts/download_ui_model.py
Full (All Backends)
For development or testing all backends:
pip install "npu-vision-fallback[all]"
python scripts/download_ui_model.py
Minimal Core
Just the MCP server (no OCR/detection, ~20 MB):
pip install npu-vision-fallback
Note: The detect extra uses OpenVINO (~80 MB) at runtime, not PyTorch. Model conversion requires the dev-convert extra (~2 GB), but that is a one-time setup most users skip.
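For setup scripts, the platform-specific choice between the two recommended installs can be made automatically. A small sketch using only the extras named above; `install_command` is a hypothetical helper, and it treats every non-Windows platform as the Linux / macOS case:

```python
import sys


def install_command(platform: str = sys.platform) -> str:
    """Pick the recommended pip command for the given sys.platform value."""
    # Windows gets native OCR; everything else gets the cross-platform OCR extra.
    extras = "ocr-win,detect" if platform.startswith("win") else "ocr-rapid,detect"
    return f'pip install "npu-vision-fallback[{extras}]"'


if __name__ == "__main__":
    print(install_command())
```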
Key Features
NPU-first architecture: UI detection runs on Intel AI Boost at ~80 ms per call (~0.3 J energy)
Zero dGPU wake-up: default paths use the NPU, system OCR, or CPU, so laptop battery life is preserved
Native OS OCR: uses the Windows OCR engine (macOS Vision planned) for quality
MCP protocol: works with Claude Desktop, Cursor, or any MCP client out of the box
Lightweight: no PyTorch/TensorFlow at runtime; all heavy deps are optional
Privacy-first: 100% local processing, no telemetry, no cloud
Performance
Measured on Intel Core Ultra 9 275HX (2560×1600 screen, on battery):
| Task | Backend | Latency | Energy | Notes |
| --- | --- | --- | --- | --- |
| OCR | WinOCR | ~1100 ms | 2.5 J | Native Windows API (full screen) |
| OCR | RapidOCR | ~6300 ms | 14.5 J | Cross-platform ONNX CPU |
| UI Detection | OpenVINO NPU | ~80 ms | 0.3 J | YOLOv8n on Intel AI Boost |
| UI Detection | OpenVINO CPU | ~120 ms | - | Fallback when no NPU |
Full benchmark details and reproduction steps: outputs/power_report.md
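The latency and energy columns can be combined into an implied average power per call (energy divided by duration). The sketch below uses the rounded table values and assumes each energy figure covers the same interval as its latency, so treat the results as rough estimates rather than measurements:

```python
# (latency in ms, energy in J), copied from the Performance table.
measurements = {
    "WinOCR": (1100, 2.5),
    "RapidOCR": (6300, 14.5),
    "OpenVINO NPU": (80, 0.3),
}


def average_power_watts(latency_ms: float, energy_j: float) -> float:
    """Implied average power over one call: P = E / t."""
    return energy_j / (latency_ms / 1000.0)


if __name__ == "__main__":
    for name, (latency_ms, energy_j) in measurements.items():
        print(f"{name}: {average_power_watts(latency_ms, energy_j):.2f} W")
```

Under these assumptions both OCR backends come out near 2.3 W, which suggests the energy gap between them is mostly a matter of how long each call runs rather than how hard it drives the hardware.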
MCP Tools
| Tool | Purpose | Key Arguments |
| --- | --- | --- |
| | Server status | - |
| | Available backends | - |
| | Extract text from region | |
| | Find UI elements | |
| analyze_screen | Combined OCR + detection | |
analyze_screen is the primary tool: it fuses detection and OCR and returns spatially sorted elements with text annotations, which makes it well suited to agent navigation.
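One way to picture the fusion step: attach each OCR line to the detected element whose box contains the line's center, then sort elements in reading order. This is an illustrative sketch, not the project's algorithm; `Box`, `Element`, `fuse`, and the `row_height` bucketing are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def center(self) -> tuple:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


@dataclass
class Element:
    kind: str
    box: Box
    text: list = field(default_factory=list)


def fuse(detections, ocr_lines, row_height=40):
    """detections: list of (kind, Box); ocr_lines: list of (text, Box)."""
    elements = [Element(kind=kind, box=box) for kind, box in detections]
    for text, text_box in ocr_lines:
        cx, cy = text_box.center()
        for element in elements:
            if element.box.contains(cx, cy):
                element.text.append(text)
                break  # each OCR line annotates at most one element
    # Reading order: bucket into rows by top edge, then left to right.
    elements.sort(key=lambda e: (e.box.top // row_height, e.box.left))
    return elements


if __name__ == "__main__":
    fused = fuse(
        [("button", Box(520, 580, 720, 640)), ("button", Box(100, 100, 300, 150))],
        [("Start Game", Box(560, 595, 680, 625)), ("Options", Box(120, 110, 200, 140))],
    )
    for element in fused:
        print(element.kind, element.text, element.box)
```

Matching on the text center rather than full containment tolerates OCR boxes that spill slightly past the detected element's edge.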
Documentation
Architecture Guide: System design and data flow
Backend Reference: Per-backend capabilities and priorities
FAQ: Common questions and troubleshooting
Contributing: How to contribute
Code Guide: Project constitution for contributors
Examples
| Example | Description |
| --- | --- |
| | Simple OCR call to screen region |
| | Find and click UI elements |
| | Vision fallback in remote desktop |
uv run python examples/basic_ocr.py --region 0 0 1280 800
Roadmap
v1.1: Multi-monitor support, DPI scaling awareness
v2.0: Custom model training interface, bring your own detector
v2.1: UI-TARS integration, macOS Vision backend, PP-OCR v4 on NPU
Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines. Please read CLAUDE.md; it is the project constitution that ensures code quality and architectural consistency.
Supported Backends
| Backend | Type | Device | Platform | Status |
| --- | --- | --- | --- | --- |
| | System OCR | CPU/NPU | Windows | Primary |
| | UI Detection | NPU | Win/Linux + Intel NPU | Primary |
| | UI Detection | CPU | Win/Linux/macOS | Fallback |
| | OCR | CPU | All | Cross-platform |
| | OCR | CPU | All | Last-resort |
| | System OCR | ANE | macOS | Planned |
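The Status column implies a priority order among backends. A minimal sketch of priority-based selection follows, assuming the order tracks that column; the backend labels and `pick_backend` are hypothetical and are not the package's real identifiers.

```python
# Hypothetical labels, listed from most to least preferred.
OCR_PRIORITY = ["winocr", "rapidocr", "tesseract"]
DETECTION_PRIORITY = ["openvino-npu", "openvino-cpu"]


def pick_backend(priority, available):
    """Return the first backend in priority order that is available, else None."""
    for name in priority:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    # Example: a Linux machine without an Intel NPU.
    usable = {"rapidocr", "tesseract", "openvino-cpu"}
    print(pick_backend(OCR_PRIORITY, usable))
    print(pick_backend(DETECTION_PRIORITY, usable))
```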
License
MIT © npu-vision-fallback contributors
Acknowledgments
Built with:
Model Context Protocol (Anthropic): Agent integration layer
OpenVINO: NPU/CPU inference runtime
Ultralytics YOLO: UI detection models
RapidOCR: Cross-platform OCR engine
Tesseract: OCR fallback
python-mss: Screen capture library
Development assisted by Claude Code (Anthropic). Architecture design and code review powered by AI collaboration.