Supports configuration through .env files for managing API keys and other settings like OpenAI endpoints and model selection
Provides cross-platform support for screen monitoring and interaction capabilities on Linux systems
Provides cross-platform support for screen monitoring and interaction capabilities on macOS systems
Enables integration with OpenAI's vision models for screen content analysis, supporting API key configuration and custom endpoints for visual processing tasks
Offers installation through PyPI package repository using pip install command
ScreenMonitorMCP - Revolutionary AI Vision Server
Give AI real-time sight and screen interaction capabilities
ScreenMonitorMCP is a revolutionary MCP (Model Context Protocol) server that provides Claude and other AI assistants with real-time screen monitoring, visual analysis, and intelligent interaction capabilities. This project enables AI to see, understand, and interact with your screen in ways never before possible.
Why ScreenMonitorMCP?
Transform your AI assistant from text-only to a visual powerhouse that can:
- Monitor your screen in real-time and detect important changes
- Click UI elements using natural language commands
- Extract text from any part of your screen
- Analyze screenshots and videos with AI
- Provide intelligent insights about screen activity
Core Features
Smart Monitoring System
- start_smart_monitoring() - Enable intelligent monitoring with configurable triggers
- get_monitoring_insights() - AI-powered analysis of screen activity
- get_recent_events() - History of detected screen changes
- stop_smart_monitoring() - Stop monitoring with preserved insights
Natural Language UI Interaction
- smart_click() - Click elements using descriptions like "Save button"
- extract_text_from_screen() - OCR text extraction from screen regions
- get_active_application() - Get current application context
Visual Analysis Tools
- capture_and_analyze() - Screenshot capture with AI analysis
- record_and_analyze() - Video recording with AI analysis
- query_vision_about_current_view() - Ask AI questions about current screen
🆕 Real-time Screen Streaming
- start_screen_stream() - Start real-time base64 screen streaming with performance optimizations
- get_stream_frame() - Get the latest frame from an active stream
- get_stream_status() - Monitor stream health, performance, and statistics
- stop_screen_stream() - Stop streaming and cleanup resources
- list_active_streams() - List all active streams with their status
System Performance
- get_system_metrics() - Comprehensive system health dashboard
- get_cache_stats() - Cache performance statistics
- optimize_image() - Advanced image optimization
- simulate_input() - Keyboard and mouse simulation
Quick Setup
1. Installation
Installation
Option 1: Install from PyPI (Recommended)
Option 2: Install from Source
Configuration
Create a .env
file in your working directory:
Example .env
configuration:
Claude Desktop Integration
Add to your Claude Desktop claude_desktop_config.json
:
Alternative with custom path:
Usage Examples
Available Tools (26 Total)
Smart Monitoring (6 tools): Real-time screen monitoring with AI analysis UI Interaction (2 tools): Natural language screen control Visual Analysis (3 tools): AI-powered image and video analysis 🆕 Real-time Streaming (5 tools): Base64 screen streaming with performance optimizations System Performance (7 tools): Performance monitoring and optimization Input Simulation (2 tools): Keyboard and mouse automation Utility (1 tool): Tool documentation and listing
Technical Features
- 21 Revolutionary Tools - Comprehensive AI vision capabilities
- Real-time Monitoring - Adaptive FPS with smart triggers
- Multi-AI Support - OpenAI, OpenRouter, and custom endpoints
- Advanced OCR - Tesseract and EasyOCR integration
- Cross-platform - Windows, macOS, Linux support
- Smart Caching - Performance optimization
- Security Focused - API key management
Vision & Mission
Vision: Enable AI assistants to see and interact with the visual world, breaking down the barrier between text-based AI and real-world interfaces.
Mission: Provide the foundational technology for AI-human visual interaction, making AI assistants truly helpful in visual tasks and screen-based workflows.
Contributing
We welcome contributions to this revolutionary project:
- Bug reports and feature requests
- Code contributions and improvements
- Documentation enhancements
See CONTRIBUTING.md for details.
License
This project is licensed under the MIT License. See LICENSE for details.
Ready to give your AI real sight?
ScreenMonitorMCP transforms AI assistants from text-only to visually intelligent companions.
This server cannot be installed
An MCP server that provides AI with real-time screen monitoring capabilities and UI element intelligence, allowing AI to observe, analyze, and interact with screen content through features like smart clicking and text extraction.
Related MCP Servers
- -securityAlicense-qualityAn MCP server that bridges AI agents with GUI automation capabilities, allowing them to control mouse, keyboard, windows, and take screenshots to interact with desktop applications.Last updated -7PythonMIT License
- -securityFlicense-qualityA MCP server that allows AI assistants to interact with the browser, including getting page content as markdown, modifying page styles, and searching browser history.Last updated -79TypeScript
- -securityAlicense-qualityA comprehensive MCP server providing tools for AI agents to interact with code, including reading symbols, importing modules, replacing text, and sending OS notifications.Last updated -998TypeScriptMIT License
- -securityAlicense-qualityAn MCP server that lets agents and humans monitor and control long-running processes, reducing copy-pasting between AI tools and enabling multiple agents to interact with the same process outputs.Last updated -3PythonMIT License