Allows AI assistants to visually interact with macOS applications by capturing screenshots, controlling mouse and keyboard inputs, managing windows, extracting text via OCR, and detecting errors on screen.
Mentioned as an example application that can be controlled, allowing the AI to focus Safari windows and interact with web content.
Mac Commander MCP Server
🤖 Enable AI assistants to visually interact with your macOS applications
An MCP (Model Context Protocol) server that allows AI coding tools like Claude Desktop, Claude Code, and Cursor to see, control, and test macOS applications. Perfect for automated testing, UI debugging, and error detection.
🎆 What makes this special?
✨ Visual AI: Your AI can actually see what's on your screen
🎯 Smart UI Detection: Advanced element detection finds buttons, forms, and controls without relying on text
🗾 Error Detection: Automatically finds bugs and error dialogs
🔄 Full Control: Click, type, and navigate just like a human
📱 App Testing: Perfect for testing mobile apps, desktop software, or web interfaces
⚡ High Performance: Optimized memory usage and 60-80% faster text operations
🚀 Easy Setup: Get started in under 5 minutes
🚀 Quick Start
Option 1: Automated Install (Easiest)
The installer will:
✅ Check your Node.js version
✅ Install dependencies and build the project
✅ Show you the exact configuration to copy
✅ Offer to open System Settings for permissions
Option 2: Manual Install
✨ In 2 minutes, your AI will be able to see and control your Mac!
📚 Table of Contents
📖 Complete Documentation Index - Start here for all documentation
✨ Features
Core Features
📸 Screenshot Capture: High-performance screenshots with optional base64 compression and metadata-only responses
🖱️ Mouse Control: Click, double-click, and move the mouse cursor
⌨️ Keyboard Input: Type text and press key combinations
🪟 Window Management: List, find, focus, and get information about application windows
🔍 OCR Text Recognition: Extract and find text on screen using Tesseract.js
⚠️ Error Detection: Automatically detect error dialogs and messages using OCR
📏 Screen Information: Get display dimensions and coordinates
Advanced UI Element Detection (New!)
🎯 Multi-Strategy Detection Engine: Combines visual analysis, OCR, color patterns, and shape detection
🔍 20+ UI Element Types: Detects buttons, text fields, links, dialogs, menus, checkboxes, dropdowns, tabs, toolbars, scrollbars, and more
🍎 macOS-Specific Patterns: Optimized for Apple Human Interface Guidelines and native UI components
📊 Confidence Scoring: Each detected element includes reliability scores and validation methods
🔄 Element Classification: Advanced categorization with state detection (enabled/disabled/selected/focused)
🎨 Visual Feature Analysis: Color, shape, border radius, and spatial relationship analysis for accurate detection
🧠 Context-Aware Detection: Groups related elements and understands form patterns, dialog layouts, and menu structures
✅ Interactive Element Validation: Verifies clickability and interactivity through visual characteristics
Advanced Automation Features (New!)
🎯 Drag & Drop: Drag operations for moving UI elements and files
📜 Advanced Scrolling: Directional scrolling with customizable amounts
🖱️ Mouse Gestures: Hover, right-click, and mouse movement controls
⌨️ Keyboard Input: Text typing with configurable delays between keystrokes
🔄 Complex Interactions: Chain multiple actions for sophisticated automation
⏱️ Precise Timing: Built-in wait functionality and timing controls
Performance Features (New!)
⚡ Optimized Memory Usage: Reduced memory consumption from 99% to ~60-70% through intelligent buffering
🚀 Fast Text Search: 60-80% faster
find_text
operations with optimized OCR processing💾 Smart Caching System: Intelligent cache with 30-70% hit rates for frequently accessed screenshots
🖼️ Chunked Image Processing: Efficient handling of large images through intelligent chunking
🎛️ Automatic Memory Management: Built-in throttling and cleanup to prevent memory exhaustion
📊 Performance Monitoring: Real-time tracking of memory usage, cache performance, and operation timings
🔄 Request Batching: Optimized handling of multiple simultaneous operations
🛠️ Prerequisites
System Requirements
macOS 13+ (Ventura or later)
Node.js 18+ and npm
AI client with MCP support:
Claude Desktop (recommended)
Cursor with MCP support
Any other MCP-compatible client
Required macOS Permissions
⚠️ Important: You must grant these permissions or the server won't work!
Screen Recording Permission:
Go to System Settings → Privacy & Security → Screen Recording
Click the + button and add your AI client (Claude Desktop, Cursor, etc.)
✅ Check the box next to your AI client
Accessibility Permission:
Go to System Settings → Privacy & Security → Accessibility
Click the + button and add your AI client
✅ Check the box next to your AI client
💡 Tip: You might need to restart your AI client after granting permissions.
📦 Installation
💿 Automated Installation
Recommended for beginners:
The installer script will guide you through everything!
🔧 Manual Installation
For advanced users:
Option 2: Global Install
🔧 Verify Installation
Run the test script to make sure everything works:
You should see the server start and respond to test commands.
⚙️ Configuration
🖥️ Claude Desktop Setup
Open Claude Desktop and go to Settings (gear icon)
Click on the Developer tab
Click Edit Config to open the configuration file
Add the MCP server configuration:
🚨 Important: Replace
/FULL/PATH/TO/
with the actual absolute path to where you cloned this repository!
Example with real path:
Save the file and restart Claude Desktop
Start a new chat - you should see a 🔨 hammer icon indicating MCP is active
💻 Claude Code Setup
Navigate to your project folder in terminal
Create or edit
.claude/config.json
in your project root:
Start Claude Code in that project folder:
🎯 Cursor Setup
Open Cursor and go to Settings → Cursor Settings → MCP
Click "Add new global MCP server"
Add the configuration:
Name:
mac-commander
Command:
node
Args:
/FULL/PATH/TO/mac-commander/build/index.js
Or create ~/.cursor/mcp.json
:
🔍 Finding Your Full Path
Not sure what your full path is? Run this in the project directory:
Example output: /Users/yourname/Developer/mac-commander/build/index.js
Copy this exact path and use it in your configuration files above.
✅ Verify It's Working
After configuration:
Restart your AI client (Claude Desktop, Cursor, etc.)
Start a new chat/session
Look for the MCP indicator (hammer icon in Claude Desktop)
Try a test command: "Take a screenshot of my screen"
If it works, you'll see the AI successfully take a screenshot! 🎉
📖 Tool Parameter Reference
screenshot
Capture a screenshot of the screen or a specific region with optimized performance.
Parameters:
outputPath
(optional): Path to save the screenshot as PNGregion
(optional): Object withx
,y
,width
,height
to capture specific areareturnBase64
(optional): Return base64 data in response (default: false)compressionQuality
(optional): JPEG compression quality 10-100 for base64 responses (default: 80)
Performance Features:
Default mode returns metadata only (fast, small responses)
Base64 mode includes compressed image data when
returnBase64: true
60-80% size reduction through JPEG compression
Always saves to temp folder for later access regardless of mode
Usage Examples:
Metadata-only mode (recommended for performance):
With base64 data for immediate processing:
Region capture with high compression:
click
Click at specific coordinates on the screen.
Parameters:
x
: X coordinatey
: Y coordinatebutton
: "left", "right", or "middle" (default: "left")doubleClick
: boolean (default: false)verify
: boolean (default: false) - Take a screenshot after clicking to verify the action
type_text
Type text using the keyboard.
Parameters:
text
: Text to typedelay
: Delay between keystrokes in milliseconds (default: 50)
mouse_move
Move the mouse to specific coordinates.
Parameters:
x
: X coordinatey
: Y coordinate
key_press
Press a key or key combination.
Parameters:
key
: Key to press (e.g., "Enter", "Escape", "cmd+a")
check_for_errors
Check the screen for common error indicators.
Parameters:
region
(optional): Specific region to check
wait
Wait for a specified amount of time.
Parameters:
milliseconds
: Time to wait
wait_for_element
Wait for specific text or UI element to appear on screen before continuing. Essential for handling dynamic content, loading screens, and asynchronous UI updates.
Parameters:
text
: Text to wait for on screentimeout
: Maximum wait time in milliseconds (default: 10000)pollInterval
: How often to check in milliseconds (default: 500)region
(optional): Specific region to search in withx
,y
,width
,height
coordinates
Returns success/failure status and location of found element if successful. Perfect for waiting for buttons to become available, dialogs to appear, or loading indicators to disappear.
get_screen_info
Get information about the screen dimensions.
No parameters required.
list_windows
List all open windows with their titles and positions.
No parameters required.
get_active_window
Get information about the currently active window.
No parameters required.
find_window
Find a window by its title (partial match supported).
Parameters:
title
: Window title to search for
focus_window
Focus/activate a window by its title.
Parameters:
title
: Window title to focus
get_window_info
Get detailed information about a specific window.
Parameters:
title
: Window title to get info for
extract_text
Extract and read text from the screen or specific regions using advanced Optical Character Recognition (OCR). Features improved caching system for better performance, confidence scoring, and enhanced text recognition accuracy. Supports fuzzy text matching and configurable OCR settings for optimal results.
Parameters:
region
(optional): Specific region to extract text from withx
,y
,width
,height
coordinates
Enhanced Features:
Smart Caching: Multi-level caching system with image hash-based keys for better performance
Confidence Filtering: Configurable minimum confidence thresholds (default: 50%)
Optimized Processing: Uses worker pool for concurrent OCR operations
Error Handling: Comprehensive error detection and recovery
Performance Tracking: Built-in timing and performance metrics
find_text
Locate specific text on the screen using advanced OCR with fuzzy matching capabilities. Returns precise coordinates, confidence scores, and handles OCR variations automatically. Essential for robust UI automation that adapts to text rendering differences.
Parameters:
text
: Text to search for (supports fuzzy matching for OCR variations)region
(optional): Specific region to search in withx
,y
,width
,height
coordinates
Enhanced Features:
Fuzzy Text Matching: Handles OCR variations with configurable similarity thresholds
Standard threshold: 70% similarity (configurable)
Relaxed threshold: 50% similarity for difficult text
Levenshtein distance algorithm for accurate matching
Smart Sorting: Results sorted by similarity score and confidence level
Multiple Match Support: Returns all matching text locations with coordinates
Center Point Calculation: Provides precise click coordinates for each match
Confidence Scoring: Each match includes OCR confidence level
Performance Optimized: Cached results and memory management
Example Fuzzy Matching:
Search for "Submit" → Finds "Subm1t", "SUBMIT", "submit" (OCR variations)
Search for "Login" → Matches "Log1n", "LOGIN", "Iog in" (common OCR errors)
Search for "Cancel" → Finds "Cancei", "CANCEL", "cancel" (character misrecognition)
find_ui_elements
Advanced UI element detection system that intelligently identifies interactive components using multiple detection strategies. Unlike text-only detection, this tool can accurately find buttons, text fields, dropdowns, and other UI elements even when they don't contain visible text. Perfect for modern applications with visual-only buttons, icons, and complex layouts.
Parameters:
autoSave
(optional): Whether to save the screenshot for analysis (default: true)elementTypes
(optional): Array of specific element types to detect:['button', 'text_field', 'link', 'image', 'icon', 'dialog', 'menu', 'window', 'checkbox', 'radio_button', 'dropdown', 'slider', 'tab', 'toolbar', 'list', 'table', 'scrollbar', 'other']
region
(optional): Specific region to analyze withx
,y
,width
,height
coordinates
Detection Strategies:
Visual Analysis: Detects UI elements based on shape, color, and visual patterns
OCR Text Recognition: Identifies elements with text content and labels
Color Pattern Analysis: Recognizes macOS system colors and UI themes
Shape Detection: Finds rectangular buttons, rounded elements, and geometric patterns
Context Analysis: Groups related elements and understands spatial relationships
macOS-Specific Features:
Apple HIG Compliance: Optimized for Apple Human Interface Guidelines
System Color Recognition: Detects standard macOS button colors (#007AFF, #34C759, #FF3B30, etc.)
Touch Target Validation: Ensures elements meet minimum 44x44 pixel requirements
Native UI Patterns: Recognizes standard macOS dialogs, menus, and controls
Output Format: Returns comprehensive element information including:
Element type and subtype classification
Precise coordinates and clickable center points
Confidence scores and detection methods used
Visual features (colors, border radius, shadows)
Interactive validation results
Element state (enabled/disabled/selected)
Contextual relationships with nearby elements
Example Use Cases:
Find all clickable buttons in a dialog:
elementTypes: ['button']
Detect form elements:
elementTypes: ['text_field', 'button', 'dropdown']
Locate menu items:
elementTypes: ['menu', 'link']
Find all interactive elements: (no elementTypes filter)
drag
Drag from one point to another using mouse button hold.
Parameters:
startX
: Starting X coordinatestartY
: Starting Y coordinateendX
: Ending X coordinateendY
: Ending Y coordinatebutton
: Mouse button to use for dragging (default: "left")
scroll
Scroll in any direction within the current window or a specific region.
Parameters:
direction
: Direction to scroll ("up", "down", "left", "right")amount
: Number of scroll units (default: 5)x
(optional): X coordinate to scroll at (defaults to current mouse position)y
(optional): Y coordinate to scroll at (defaults to current mouse position)
hover
Hover the mouse at a specific position for a duration.
Parameters:
x
: X coordinate to hover aty
: Y coordinate to hover atduration
: Duration to hover in milliseconds (default: 1000)
right_click
Right-click at specific coordinates to open context menus.
Parameters:
x
: X coordinate to right-clicky
: Y coordinate to right-click
list_screenshots
List all screenshots saved in the temporary folder.
No parameters required.
list_recent_screenshots
List recently captured screenshots with detailed metadata including timestamps, file sizes, and dimensions.
Parameters:
limit
: Maximum number of screenshots to list (default: 10, max: 50)
view_screenshot
View/display a specific screenshot from the temporary folder.
Parameters:
filename
: Name of the screenshot file to view
cleanup_screenshots
Clean up old screenshots from temporary folder, keeping only recent ones.
Parameters:
keepLast
: Number of recent screenshots to keep (default: 10)
compare_screenshots
Compare two previously saved screenshots to identify differences and changes.
Parameters:
screenshot1
: Filename of the first screenshotscreenshot2
: Filename of the second screenshot
describe_screenshot
Capture and analyze a screenshot with AI-powered insights, combining OCR text extraction and UI element detection.
Parameters:
region
(optional): Specific region to analyze withx
,y
,width
,height
coordinatessavePath
(optional): Optional path to save the analyzed screenshot
performance_dashboard
Comprehensive performance monitoring dashboard providing real-time system health, metrics, and optimization recommendations.
Parameters:
includeMetrics
(optional): Include detailed metrics in response (default: true)includeRecommendations
(optional): Include optimization recommendations (default: true)includeHistory
(optional): Include performance history and trends (default: false)timeRangeMs
(optional): Time range for trends in milliseconds (default: 1 hour)
🔧 OCR Configuration Options
The OCR system can be customized with various configuration options to optimize performance and accuracy for different use cases:
configureOCR(options)
Configure OCR settings globally for all text recognition operations.
Available Options:
minConfidence
: Minimum confidence score for text recognition (default: 50, range: 0-100)fuzzyMatchThreshold
: Standard similarity threshold for fuzzy matching (default: 0.7, range: 0-1)relaxedFuzzyThreshold
: Fallback threshold for difficult text (default: 0.5, range: 0-1)cacheEnabled
: Enable/disable OCR result caching (default: true)cacheTTL
: Cache time-to-live in milliseconds (default: 30000)maxCacheSize
: Maximum number of cached results (default: 100)timeoutMs
: OCR operation timeout in milliseconds (default: 30000)
Example Configuration:
OCR Performance Features
Worker Pool Architecture:
Concurrent OCR processing with multiple worker threads
Automatic load balancing and task prioritization
Graceful fallback to single worker if pool initialization fails
Intelligent Caching:
Multi-level caching with image hash and region-based keys
Automatic cache cleanup and size management
Configurable TTL and cache size limits
Memory Management:
Automatic garbage collection triggers for large OCR operations
Memory usage monitoring and cleanup
Efficient image processing and buffer management
Error Handling:
Comprehensive error detection and recovery
Timeout protection for long-running OCR operations
Detailed error reporting with context information
📊 Implementation Status & Available Tools
✅ Fully Implemented Tools
Screenshot Management:
screenshot
- Screen capture with region support and compressionlist_screenshots
- List all saved screenshotslist_recent_screenshots
- List recent screenshots with metadataview_screenshot
- View specific screenshot filescleanup_screenshots
- Clean up old screenshot filescompare_screenshots
- Compare two screenshotsdescribe_screenshot
- AI-powered screenshot analysis
Mouse & Keyboard Control:
click
- Click with multiple button support and verificationtype_text
- Text input with configurable delayskey_press
- Key combinations and shortcutsmouse_move
- Move mouse cursordrag
- Drag and drop operationsscroll
- Directional scrollinghover
- Mouse hover with durationright_click
- Context menu access
Window Management:
list_windows
- List all open windowsget_active_window
- Get current window infofind_window
- Find window by titlefocus_window
- Bring window to frontget_window_info
- Detailed window information
OCR & Text Recognition:
extract_text
- OCR text extraction with cachingfind_text
- Locate text on screen with fuzzy matchingwait_for_element
- Wait for text/elements to appearfind_ui_elements
- Advanced visual UI element detection
System & Utilities:
get_screen_info
- Screen dimensionscheck_for_errors
- Visual error detectionwait
- Pause executiondiagnostic
- System health checkperformance_dashboard
- Performance monitoring
🚧 Planned Features (Not Yet Implemented)
The following features mentioned in examples are planned for future releases:
click_hold
- Click and hold operationsrelative_mouse_move
- Relative mouse positioningkey_hold
- Hold keys for durationtype_with_delay
- Human-like typing with variable delays and typosAdvanced smooth scrolling with easing
Pixel-perfect scrolling controls
🚀 Usage Examples
🎯 Basic Commands
Once configured, you can ask your AI assistant to:
Screenshots & Visual Inspection:
"Take a screenshot of my app" (metadata-only for fast responses)
"Capture just the top-left corner of the screen"
"Save a screenshot to ~/Desktop/app-screenshot.png"
"Take a screenshot and return the base64 data with 70% compression"
"Capture a region and return compressed image data for processing"
Mouse & Keyboard Control:
"Click the button at coordinates 100, 200"
"Double-click on the center of the screen"
"Type 'Hello World' in the current field"
"Press cmd+s to save the file"
"Press Enter to submit"
Window Management:
"List all open windows"
"Focus the Safari window"
"Get information about the active window"
"Find the window with 'Calculator' in the title"
Text Recognition & Search:
"Extract all text from the screen"
"Find the 'Submit' button on screen"
"Look for any text containing 'error' on screen"
"Read the text in the dialog box"
UI Element Detection:
"Find all clickable buttons on this screen"
"Detect text fields and form elements in this dialog"
"Locate all interactive elements (buttons, links, dropdowns)"
"Identify the toolbar and menu elements visually"
"Find UI elements by type: buttons, text fields, and checkboxes"
Error Detection:
"Check if there are any error dialogs on screen"
"Look for error messages in my app"
"Scan for any warning or error indicators"
🔧 Advanced Automation Examples
UI Testing Workflow:
Bug Investigation:
Automated Form Filling:
🚀 Advanced Automation Features
Drag and Drop Operations:
Natural Scrolling:
Text Input with Timing:
Drag and Drop Operations:
Keyboard Shortcuts:
Menu Navigation:
🎯 UI Element Detection Examples
Smart Button Detection:
Form Automation with Visual Detection:
Modern App UI Navigation:
macOS Dialog Interaction:
Complex Layout Analysis:
Responsive UI Testing:
Visual-Only Element Detection:
🚀 Performance Improvements
Mac Commander has been optimized for high-performance automation with significant improvements in memory usage, processing speed, and reliability:
Memory Optimization
99% → 60-70% Memory Usage: Intelligent memory management and buffer optimization
Automatic Cleanup: Built-in garbage collection and memory throttling
Smart Buffering: Efficient image processing with minimal memory footprint
Processing Speed
60-80% Faster Text Operations: Optimized OCR processing and text search algorithms
Chunked Image Processing: Large images are processed in efficient chunks
Parallel Processing: Multiple operations can run concurrently without blocking
Caching System
30-70% Cache Hit Rates: Intelligent caching of frequently accessed screenshots
Smart Cache Management: Automatic cache invalidation and memory-conscious storage
Performance Monitoring: Real-time tracking of cache effectiveness
Request Batching
Optimized Concurrency: Multiple simultaneous requests are handled efficiently
Resource Throttling: Prevents system overload during intensive operations
Performance Metrics: Built-in monitoring of operation timings and resource usage
These improvements make Mac Commander suitable for intensive automation tasks and long-running operations without performance degradation.
Development
Run in development mode:
npm run dev
Test with MCP Inspector:
npm run inspector
⚠️ Limitations & Troubleshooting
Known Limitations
OCR Accuracy: Text recognition depends on font size, contrast, and clarity
Permission Requirements: Must manually grant Screen Recording and Accessibility permissions
First OCR Run: Initial text extraction may be slower due to model loading
macOS Only: This server only works on macOS systems
🐛 Common Issues
"Permission denied" or "Screen recording not allowed"
✅ Grant Screen Recording permission to your AI client
✅ Grant Accessibility permission to your AI client
🔄 Restart your AI client after granting permissions
"Command not found" or "Cannot find module"
✅ Make sure you ran
npm install
andnpm run build
✅ Use the absolute path to
build/index.js
in your config✅ Verify Node.js is installed:
node --version
"MCP server not showing up"
✅ Check your configuration JSON syntax is valid
✅ Restart your AI client completely
✅ Try the test script:
node test-server.js
"Screenshots are black or empty"
✅ Grant Screen Recording permission
✅ Make sure the app you're screenshotting is visible (not minimized)
🆘 Getting Help
If you're still having issues:
Run the test script:
node test-server.js
to verify basic functionalityCheck the console: Look for error messages in your AI client
Open an issue: Create a GitHub issue with:
Your macOS version
Your AI client (Claude Desktop, Cursor, etc.)
The exact error message
Your configuration file (with paths anonymized)
🔒 Security & Privacy
Important Security Notes
⚠️ This server has powerful capabilities and requires significant system permissions.
What this server can access:
✅ Screen content: Can take screenshots of anything visible
✅ Keyboard input: Can type any text or key combinations
✅ Mouse control: Can click anywhere on screen
✅ Window information: Can see and control application windows
✅ Text recognition: Can read any text visible on screen
Security best practices:
🏠 Only use in trusted environments: Don't use on shared or public computers
🤝 Review AI requests: Be mindful of what you ask the AI to do
🔐 Sensitive data: Avoid using when sensitive information is visible
🚫 Revoke access: You can remove permissions anytime in System Settings
Privacy Notes
No data is sent externally by this MCP server itself
Your AI client (Claude Desktop, etc.) may process screenshots/data according to their privacy policies
Screenshots are temporary and not permanently stored unless you specify a save path
OCR processing happens locally on your machine
🤝 Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request
📄 License
MIT License - see LICENSE file for details.
📚 Additional Resources
📖 Complete Documentation Suite
DOCS.md - Complete documentation index and navigation guide
API.md - Comprehensive API reference and technical documentation
PERFORMANCE.md - Performance optimization and tuning guide
MIGRATION.md - Migration guide and version changes
🔗 Related Links
MCP Protocol - Learn about the Model Context Protocol
Claude Desktop - Download Claude Desktop
Claude Code - Web-based Claude Code interface
Cursor - AI-powered code editor
🛠️ Development Resources
GitHub Repository - Source code and issue tracking
Contributing Guidelines - How to contribute
Release Notes - Version history and changes
💬 Community and Support
GitHub Issues - Bug reports and feature requests
GitHub Discussions - Community discussions
MCP Community - Broader MCP ecosystem
🙏 Acknowledgments
Built with Model Context Protocol (MCP)
Uses @nut-tree-fork/nut-js for system automation
OCR powered by Tesseract.js
Image processing with node-canvas
Made with ❤️ for the MCP community
Having issues?
This server cannot be installed
local-only server
The server can only run on the client's local machine because it depends on local resources.
An MCP server that allows AI tools like Claude Desktop, Claude Code, and Cursor to visually interact with macOS applications by capturing screenshots and controlling the mouse and keyboard.
- 🚀 Quick Start
- 📚 Table of Contents
- ✨ Features
- 🛠️ Prerequisites
- 📦 Installation
- ⚙️ Configuration
- 📖 Tool Parameter Reference
- screenshot
- click
- type_text
- mouse_move
- key_press
- check_for_errors
- wait
- wait_for_element
- get_screen_info
- list_windows
- get_active_window
- find_window
- focus_window
- get_window_info
- extract_text
- find_text
- find_ui_elements
- drag
- scroll
- hover
- right_click
- list_screenshots
- list_recent_screenshots
- view_screenshot
- cleanup_screenshots
- compare_screenshots
- describe_screenshot
- performance_dashboard
- 🔧 OCR Configuration Options
- 📊 Implementation Status & Available Tools
- 🚀 Usage Examples
- 🚀 Performance Improvements
- Development
- ⚠️ Limitations & Troubleshooting
- 🔒 Security & Privacy
- 🤝 Contributing
- 📄 License
- 📚 Additional Resources
- 🙏 Acknowledgments
Related MCP Servers
- AsecurityAlicenseAqualityAn MCP server that implements Claude Code-like functionality, allowing the AI to analyze codebases, modify files, execute commands, and manage projects through direct file system interactions.Last updated -15264MIT License
- -securityFlicense-qualityAn MCP server that allows AI assistants like Claude to execute terminal commands on the user's computer and return the output, functioning like a terminal through AI.Last updated -58
- -securityAlicense-qualityAn MCP server that enables AI assistants like Claude to access and manipulate Apple Notes on macOS, allowing for retrieving, creating, and managing notes through natural language interactions.Last updated -76MIT License
- AsecurityAlicenseAqualityAn MCP server that allows AI assistants like Claude Code, Claude Desktop, and Cursor to interact with Things.app on macOS, enabling task creation, updates, viewing, scheduling, and organization through natural language.Last updated -682MIT License