OScribe
Enables vision-based desktop automation for the Brave browser, including CDP-enhanced element detection and control via screenshots.
Provides full UI element detection and automation for Electron applications (e.g., VS Code) via Windows accessibility tree or NVDA.
Enables vision-based desktop automation for the Opera browser with CDP-enhanced element detection and control.
Enables vision-based desktop automation for the Safari browser on macOS, using native accessibility APIs.
Vision-based desktop automation MCP server. Control any application via screenshot + AI vision.
Supported Platforms & Applications
Operating Systems
Native Applications
Web Browsers (CDP-enhanced)
Note: Chrome 136+ requires automatic profile sync (~20-30s) due to CDP security changes.
Why OScribe?
"If you can see it, OScribe can click it."
OScribe is your fallback when traditional automation tools fail:
Legacy apps without APIs
Games and canvas apps without DOM
Third-party software you can't modify
Ad-hoc automation without infrastructure setup
Demo
Helltaker - Full Chapter 1 Automated
Claude plays through the entire first chapter of Helltaker using OScribe MCP tools - navigating menus, solving puzzles, and progressing through dialogue, all via screenshot + vision.
Features
Vision-based - Locate UI elements by description using Claude vision
UI Automation - Get element coordinates via Windows accessibility tree
MCP Server - Integrates with Claude Desktop, Claude Code, Cursor, Windsurf
Native Input - Uses robotjs for reliable mouse/keyboard control
Multi-monitor - Supports multiple screens with DPI awareness
Windows - Currently tested on Windows only
Electron Support - Full UI element detection in Electron apps (via NVDA)
Quick Start
Guided Installation (Recommended)
Run our interactive installer that checks and installs all prerequisites for you:
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs | node
# Windows (PowerShell as Administrator)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs -OutFile install.mjs; node install.mjs
The installer will:
Check Node.js version (22+ required)
Check/install Python
Check/install build tools (VS Build Tools or Xcode CLI)
Install OScribe
Manual Installation
If you prefer manual installation or already have prerequisites:
npm install -g oscribe
Then configure your MCP client (see MCP Integration below).
Installation
System Prerequisites
OScribe uses robotjs for native mouse/keyboard control, which requires compilation tools:
Windows
Node.js 22+ - Download
Python 3.x - Download (check "Add to PATH" during install)
Visual Studio Build Tools - Install with C++ workload:
# Option 1: Via npm (recommended)
npm install -g windows-build-tools
# Option 2: Manual install
# Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload
macOS
Node.js 22+ - Download or brew install node
Xcode Command Line Tools: xcode-select --install
Python 3.x - Usually pre-installed, verify with python3 --version
Verify Prerequisites
Before installing, run the diagnostic script to check all prerequisites:
# macOS/Linux - Run directly without installation
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node
# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs
The doctor script checks:
Node.js version (22+)
Python installation
Build tools (VS Build Tools on Windows, Xcode CLI on macOS)
It provides step-by-step fix instructions for any missing prerequisites.
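The Node.js version check is the simplest of these to illustrate. The following is a minimal sketch of how such a check could work, not the actual code in doctor.mjs; the helper names `parseMajor` and `meetsMinimumNode` are hypothetical, and the version string would typically come from `node --version`.

```typescript
// Hypothetical helpers for illustration; not taken from OScribe's doctor script.

// Extracts the major version from strings like "v22.3.0" or "22.3.0".
// Returns null when the input does not look like a semantic version.
function parseMajor(version: string): number | null {
  const match = /^v?(\d+)\.\d+\.\d+/.exec(version.trim());
  return match ? Number(match[1]) : null;
}

// True when the given Node.js version satisfies the minimum major version.
// The default of 22 mirrors the "22+" requirement stated in this README.
function meetsMinimumNode(version: string, minMajor: number = 22): boolean {
  const major = parseMajor(version);
  return major !== null && major >= minMajor;
}

console.log(meetsMinimumNode("v22.3.0"));  // true
console.log(meetsMinimumNode("v20.11.1")); // false
```

Comparing only the major version is enough here because the stated requirement is "22+" with no minor or patch constraint.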
After OScribe is installed, you can also run:
oscribe doctor
Additional Requirements
Claude Desktop, Claude Code, or any MCP client (provides OAuth authentication)
From npm (Recommended)
# Global installation
npm install -g oscribe
# Verify installation
oscribe --version
From Source
git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
npm run build
npm link # Makes 'oscribe' command available globally
Platform Support
| Platform | Status |
| --- | --- |
| Windows | Fully supported |
| macOS | Supported |
| Linux | Not tested yet |
Windows Details
PowerShell (included)
UI Automation via PowerShell + .NET
NVDA support for Electron apps
macOS Details
Native screencapture command
UI Automation via AXUIElement API (ax-reader binary)
Requires: Accessibility permissions (System Settings → Privacy & Security → Accessibility)
Add Terminal or your IDE to allowed apps
IMPORTANT for VSCode users: You must also authorize VSCode in "App Management" (Login Items & Extensions)
Open System Settings → General → Login Items & Extensions
Find "Visual Studio Code"
Toggle ON the switch
Enter your password or use Touch ID to confirm
This is required for OScribe MCP to control your system from Claude Code
Native apps (Chrome, Safari, Finder) work well
Electron apps (VS Code, etc.) have limited element detection (same as Windows without NVDA)
Usage
CLI Commands
Vision-Based Clicking (The Core of OScribe!)
oscribe click "Submit button" # Click by description - the magic!
oscribe click "File menu" # Works on any visible element
oscribe click "Export as PNG" --screen 1 # Target specific monitor
oscribe click "Close" --dry-run # Preview without clicking
Input & Automation
oscribe type "hello world" # Type text
oscribe hotkey "ctrl+c" # Press keyboard shortcut
oscribe hotkey "ctrl+shift+esc" # Multiple modifiers
Screenshots
oscribe screenshot # Capture primary screen
oscribe screenshot -o capture.png # Save to file
oscribe screenshot --screen 1 # Capture second monitor
oscribe screenshot --list # List available screens
oscribe screenshot --describe # Describe screen content with AI
Window Management
oscribe windows # List open windows
oscribe focus "Chrome" # Focus window by name
oscribe focus "Calculator" # Works with partial matches
MCP Server
oscribe serve # Start MCP server (stdio transport)
Global Options
--verbose, -v # Detailed output
--dry-run # Simulate without executing
--quiet, -q # Minimal output
--screen N # Target specific screen (default: 0)
Examples
# Take screenshot and save
oscribe screenshot -o desktop.png
# Type with delay between keystrokes
oscribe type "slow typing" --delay 100
# Use second monitor
oscribe screenshot --screen 1 --describe
# Dry run to see what would happen
oscribe type "test" --dry-run
MCP Integration
OScribe exposes tools via Model Context Protocol for AI agents. Works with Claude Desktop, Claude Code, Cursor, Windsurf, and any MCP-compatible client.
Quick Setup
Claude Desktop
Edit your config file:
| OS | Config Path |
| --- | --- |
| Windows | |
| macOS | |
Add OScribe to mcpServers:
{
"mcpServers": {
"oscribe": {
"command": "npx",
"args": ["-y", "oscribe", "serve"]
}
}
}
Or if installed globally (npm install -g oscribe):
{
"mcpServers": {
"oscribe": {
"command": "oscribe",
"args": ["serve"]
}
}
}
Then restart Claude Desktop. An icon in the interface indicates that MCP tools are available.
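If the config file already lists other MCP servers, the OScribe entry has to be merged in rather than pasted over them. The sketch below shows one way to do that merge in memory; the `McpConfig` type and the `withOscribe` function are hypothetical names for illustration, and reading or writing the file itself is left out.

```typescript
// Illustrative types; the real config file may contain other keys as well.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Returns a copy of the config with an "oscribe" server entry added.
// Existing servers and unrelated top-level keys are preserved.
function withOscribe(config: McpConfig, installedGlobally: boolean): McpConfig {
  const entry: McpServerEntry = installedGlobally
    ? { command: "oscribe", args: ["serve"] }
    : { command: "npx", args: ["-y", "oscribe", "serve"] };
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), oscribe: entry },
  };
}

// Example: a config that already has another (made-up) server registered.
const existing: McpConfig = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};
console.log(JSON.stringify(withOscribe(existing, false), null, 2));
```

The two entry shapes are the same ones shown in the JSON snippets above: `npx -y oscribe serve` for on-demand use, or `oscribe serve` after a global install.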
Claude Code / Cursor / Windsurf
Add a .mcp.json file in your project root:
{
"mcpServers": {
"oscribe": {
"command": "npx",
"args": ["-y", "oscribe", "serve"]
}
}
}
Or if installed globally:
{
"mcpServers": {
"oscribe": {
"command": "oscribe",
"args": ["serve"]
}
}
}
Available MCP Tools
| Tool | Description | Parameters |
| --- | --- | --- |
| os_screenshot | Capture screenshot + cursor position | |
| os_inspect | Get UI elements via Windows UI Automation | |
| | Get element info at coordinates | |
| os_move | Move mouse cursor | |
| os_click | Click at current cursor position | |
| | Move + click in one action | |
| | Type text | |
| | Press keyboard shortcut | |
| | Scroll in direction | |
| | List open windows + screens | - |
| | Focus window by name | |
| | Wait for duration (UI loading) | |
| | Check NVDA screen reader status (Electron support) | - |
| os_nvda_install | Download NVDA portable for Electron apps | - |
| os_nvda_start | Start NVDA in silent mode | - |
| | Stop NVDA screen reader | - |
MCP Usage Example
Once configured, Claude can automate your desktop:
"Take a screenshot and describe what you see"
"Inspect the UI elements and click the Submit button"
"List all windows and focus on Chrome"
"Type 'hello world' and press Ctrl+Enter"
Workflow: Claude uses os_screenshot to see the screen, os_inspect to get element coordinates, then os_move + os_click for precise interaction.
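The step between inspecting and clicking is choosing a target and turning its bounding box into a click point. The sketch below illustrates that step only. The `UiElement` shape is an assumption, since the actual output format of os_inspect is not documented in this README, and `findClickTarget` is a hypothetical helper.

```typescript
// Assumed element shape: a name plus a bounding box in screen pixels.
// The real os_inspect output may use different field names.
interface UiElement {
  name: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Finds the first element whose name contains the label (case-insensitive)
// and returns the center of its bounding box, or null when nothing matches.
function findClickTarget(
  elements: UiElement[],
  label: string
): { x: number; y: number } | null {
  const wanted = label.toLowerCase();
  const hit = elements.find((el) => el.name.toLowerCase().includes(wanted));
  if (!hit) return null;
  return {
    x: Math.round(hit.x + hit.width / 2),
    y: Math.round(hit.y + hit.height / 2),
  };
}

const elements: UiElement[] = [
  { name: "Cancel", x: 100, y: 400, width: 80, height: 30 },
  { name: "Submit", x: 200, y: 400, width: 80, height: 30 },
];
console.log(findClickTarget(elements, "submit")); // { x: 240, y: 415 }
```

The returned point is what an agent would then pass to os_move before calling os_click.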
Configuration
Config directory: ~/.oscribe/
Files
config.json - Application settings
config.json
{
"defaultScreen": 0,
"dryRun": false,
"logLevel": "info",
"cursorSize": 128
}
Configuration Options
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| defaultScreen | number | | Default monitor to capture |
| dryRun | boolean | | Simulate actions without executing |
| logLevel | string | | Log level |
| cursorSize | number | | Cursor size in screenshots (32-256) |
| | boolean | | Auto-download NVDA when needed |
| | boolean | | Auto-start NVDA for Electron apps |
| nvda.customPath | string | - | Custom NVDA installation path |
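To make the constraints concrete, here is a minimal sketch of merging user overrides with fallback values and rejecting an out-of-range cursor size. It uses plain TypeScript rather than Zod, which the project itself lists for validation, so it is not the project's implementation. The fallback values are copied from the sample config.json above, and treating them as the real defaults is an assumption; `resolveConfig` is a hypothetical name.

```typescript
interface OscribeConfig {
  defaultScreen: number;
  dryRun: boolean;
  logLevel: string;
  cursorSize: number;
}

// Values taken from the sample config.json; assumed (not confirmed) to be defaults.
const SAMPLE_DEFAULTS: OscribeConfig = {
  defaultScreen: 0,
  dryRun: false,
  logLevel: "info",
  cursorSize: 128,
};

// Merges overrides onto the sample values and checks the documented ranges.
function resolveConfig(overrides: Partial<OscribeConfig>): OscribeConfig {
  const merged: OscribeConfig = { ...SAMPLE_DEFAULTS, ...overrides };
  if (!Number.isInteger(merged.defaultScreen) || merged.defaultScreen < 0) {
    throw new RangeError("defaultScreen must be a non-negative integer");
  }
  // The 32-256 range comes from the cursorSize description in the options table.
  if (merged.cursorSize < 32 || merged.cursorSize > 256) {
    throw new RangeError("cursorSize must be between 32 and 256");
  }
  return merged;
}

console.log(resolveConfig({ cursorSize: 64, dryRun: true }));
```

Failing loudly on a bad value is a design choice for this sketch; a real loader might instead clamp the value or fall back silently.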
How It Works
OScribe uses a multi-layer approach for desktop automation (Windows):
Screenshot Layer - Captures screen using PowerShell + .NET System.Drawing
UI Automation Layer - Gets element coordinates via Windows accessibility tree:
Uses Windows UI Automation API via PowerShell
Returns interactive elements with screen coordinates
Works like a DOM for desktop apps
Input Layer - Uses robotjs for:
Mouse movement and clicks
Keyboard input and hotkeys
Adapts to Windows mouse button swap settings
Best strategy: Use os_screenshot which returns UI elements with coordinates, then os_move + os_click for precise interaction.
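The mouse button swap handling in the Input Layer can be pictured as a small mapping step before the click is sent. The sketch below is one plausible reading of that behavior, not OScribe's code: how the swap setting is detected is not described here, so it is passed in as a plain boolean, and `toPhysicalButton` is a hypothetical helper.

```typescript
// Logical intent of the click versus the hardware button that gets pressed.
type LogicalButton = "primary" | "secondary";
type PhysicalButton = "left" | "right";

// Maps a logical click to the physical button to press.
// When Windows has the buttons swapped, the physical right button acts as
// the primary one, so the mapping is inverted.
function toPhysicalButton(
  logical: LogicalButton,
  buttonsSwapped: boolean
): PhysicalButton {
  const normal: PhysicalButton = logical === "primary" ? "left" : "right";
  if (!buttonsSwapped) return normal;
  return normal === "left" ? "right" : "left";
}

console.log(toPhysicalButton("primary", false)); // left
console.log(toPhysicalButton("primary", true));  // right
```

Keeping the mapping in one place means the rest of the input code can always think in terms of primary and secondary clicks.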
Development
Setup
git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
Scripts
npm run build # Build TypeScript
npm run dev # Development mode (watch)
npm run typecheck # Type check only
npm run lint # Run ESLint
npm run lint:fix # Fix linting issues
npm run format # Format with Prettier
npm run clean # Remove dist folder
Project Structure
oscribe/
├── bin/
│   └── oscribe.ts            # CLI entry point
├── src/
│   ├── core/
│   │   ├── screenshot.ts     # Multi-platform screen capture
│   │   ├── input.ts          # Mouse/keyboard control (robotjs)
│   │   ├── windows.ts        # Window management
│   │   └── uiautomation.ts   # Windows UI Automation (accessibility)
│   ├── cli/
│   │   ├── commands/         # CLI command implementations
│   │   └── index.ts          # Command registration
│   ├── mcp/
│   │   └── server.ts         # MCP server (12 tools)
│   ├── config/
│   │   └── index.ts          # Config management with Zod
│   └── index.ts              # Main exports
├── package.json
├── tsconfig.json
├── .env.example
└── LICENSE
Tech Stack
Runtime: Node.js 22+ (ESM)
Language: TypeScript 5.7+ (strict mode)
Validation: Zod
CLI: Commander + Chalk + Ora
Vision: Anthropic SDK (Claude Sonnet 4)
Input: robotjs (native automation)
Screenshot: screenshot-desktop + platform-specific tools
MCP: @modelcontextprotocol/sdk
Troubleshooting
Installation Issues
npm install fails with node-gyp errors:
First, run the diagnostic script (no installation required):
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node
# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs
This is usually due to missing build tools. robotjs requires native compilation.
# Error examples:
# - "gyp ERR! find Python"
# - "gyp ERR! find VS"
# - "node-pre-gyp ERR! build error"
Windows fix:
# 1. Install Python (if missing)
# Download from https://www.python.org/downloads/
# IMPORTANT: Check "Add Python to PATH" during installation
# 2. Install Visual Studio Build Tools
npm install -g windows-build-tools
# Or manually: download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload
# 3. Retry installation
npm install -g oscribe
macOS fix:
# 1. Install Xcode Command Line Tools
xcode-select --install
# 2. Retry installation
npm install -g oscribe
Still failing? Try clearing npm cache:
npm cache clean --force
npm install -g oscribe
MCP Server Issues
Server not starting:
Check Node.js version: node --version (requires 22+)
Rebuild if needed: npm run build
Check path in your MCP config file
Tools not appearing in Claude Desktop:
Restart Claude Desktop after config changes
Check claude_desktop_config.json syntax (valid JSON)
Look for the MCP tools icon in the Claude Desktop interface
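A quick way to rule out a syntax problem is to parse the file contents and confirm the OScribe entry is where the client expects it. The following is a minimal sketch of such a check; `checkClaudeConfig` is a hypothetical helper that takes the file contents as a string, so loading the file is up to the caller.

```typescript
// Returns a list of human-readable problems; an empty list means the
// config parsed as JSON and contains an "oscribe" entry under "mcpServers".
function checkClaudeConfig(text: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return [`Invalid JSON: ${reason}`];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["Top level must be a JSON object"];
  }
  const servers = (parsed as Record<string, unknown>)["mcpServers"];
  if (typeof servers !== "object" || servers === null) {
    return ['Missing "mcpServers" object'];
  }
  if (!("oscribe" in servers)) {
    return ['No "oscribe" entry under "mcpServers"'];
  }
  return [];
}

// A trailing comma is a common cause of "tools not appearing".
console.log(checkClaudeConfig('{ "mcpServers": { "oscribe": {}, } }'));
```

This only covers structure. It does not verify that the configured command actually exists on the PATH.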
Windows Issues
Clicks not working:
OScribe auto-detects swapped mouse buttons
No manual configuration needed
UI elements not detected:
Some apps don't expose UI Automation elements
Use os_screenshot to see what's visible
Coordinates are returned in the screenshot response
Electron apps showing few UI elements:
Electron/Chromium apps require NVDA screen reader to expose their full accessibility tree:
# Install NVDA portable (one-time)
oscribe nvda install
# Start NVDA silently (no audio)
oscribe nvda start
Or via MCP tools: os_nvda_install → os_nvda_start
NVDA runs in silent mode (no speech, no sounds). The agent will prompt to install NVDA when needed.
Manual NVDA installation:
If you prefer to install NVDA yourself, download from nvaccess.org and set the path in config:
{
"nvda": {
"customPath": "C:/Program Files/NVDA"
}
}
License
BSL 1.1 (Business Source License 1.1)
Free for personal use
Free for open-source projects
Commercial use requires a paid license (until 2029)
Converts to MIT on 2029-01-30 (then free for everyone)
See LICENSE for full terms.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Guidelines
Follow the existing code style (ESLint + Prettier configured)
Add tests for new features
Update documentation as needed
Ensure npm run build succeeds
Check types with npm run typecheck
Areas for Contribution
Additional platform support (BSD, other Unix variants)
More sophisticated element location strategies
Performance optimizations
Additional MCP tools
Better error messages
Documentation improvements
Support
Bug reports: GitHub Issues
Questions: GitHub Discussions
Documentation: This README + inline code comments
Roadmap
npm package distribution
Web interface for remote control
Recording and playback of automation sequences
Multi-provider vision support (GPT-4V, Gemini)
Plugin system for custom tools
Docker container distribution
Acknowledgements
OScribe is built on top of these great open-source projects:
robotjs - Native mouse/keyboard control
screenshot-desktop - Cross-platform screen capture
@anthropic-ai/sdk - Claude API client
@modelcontextprotocol/sdk - MCP server framework
ffmpeg - GIF generation (optional, external)
Maintained by Mickaël Bellun