ScreenMonitorMCP - Revolutionary AI Vision Server

Give AI real-time sight and screen interaction capabilities

ScreenMonitorMCP is an MCP (Model Context Protocol) server that provides Claude and other AI assistants with real-time screen monitoring, visual analysis, and UI interaction capabilities. It lets an AI assistant see, interpret, and act on what is currently displayed on your screen.


Why ScreenMonitorMCP?

Transform your AI assistant from text-only to a visual powerhouse that can:

  • Monitor your screen in real-time and detect important changes
  • Click UI elements using natural language commands
  • Extract text from any part of your screen
  • Analyze screenshots and videos with AI
  • Provide intelligent insights about screen activity

Core Features

Smart Monitoring System

  • start_smart_monitoring() - Enable intelligent monitoring with configurable triggers
  • get_monitoring_insights() - AI-powered analysis of screen activity
  • get_recent_events() - History of detected screen changes
  • stop_smart_monitoring() - Stop monitoring with preserved insights
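
As a rough illustration of what a trigger such as 'significant_change' could mean, the sketch below compares two grayscale frames and reports a change when enough pixels differ. It is not ScreenMonitorMCP's implementation: the function names, the frame representation (flat sequences of 0-255 integers), and both thresholds are assumptions made for this example.

```python
from typing import Sequence


def changed_fraction(prev: Sequence[int], curr: Sequence[int],
                     pixel_tolerance: int = 10) -> float:
    """Return the fraction of pixels whose value moved by more than pixel_tolerance."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    if not prev:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_tolerance)
    return changed / len(prev)


def is_significant_change(prev: Sequence[int], curr: Sequence[int],
                          threshold: float = 0.05) -> bool:
    """Hypothetical trigger: fire when more than `threshold` of the pixels changed."""
    return changed_fraction(prev, curr) > threshold
```

A per-pixel tolerance keeps small rendering noise (cursor blink, antialiasing) from counting as change, while the overall threshold decides how much of the screen must differ before an event is raised.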

Natural Language UI Interaction

  • smart_click() - Click elements using descriptions like "Save button"
  • extract_text_from_screen() - OCR text extraction from screen regions
  • get_active_application() - Get current application context
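
One way to picture how a description like "Save button" might be resolved to a clickable element is fuzzy matching against labels found by OCR. The sketch below is only an illustration under that assumption: the `elements` structure, the `best_match` helper, and the 0.4 cutoff are hypothetical and are not taken from the project.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_match(description: str, elements: list, min_score: float = 0.4) -> Optional[dict]:
    """Pick the element whose label is most similar to the description.

    `elements` is assumed to be a list of dicts such as
    {"label": "Save", "x": 10, "y": 20}; this shape is hypothetical.
    """
    def score(label: str) -> float:
        return SequenceMatcher(None, description.lower(), label.lower()).ratio()

    best = max(elements, key=lambda e: score(e["label"]), default=None)
    if best is None or score(best["label"]) < min_score:
        return None  # nothing on screen is close enough to the description
    return best
```

Returning `None` below the cutoff matters for a click tool: clicking a poorly matched element is worse than reporting that nothing was found.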

Visual Analysis Tools

  • capture_and_analyze() - Screenshot capture with AI analysis
  • record_and_analyze() - Video recording with AI analysis
  • query_vision_about_current_view() - Ask AI questions about current screen

🆕 Real-time Screen Streaming

  • start_screen_stream() - Start real-time base64 screen streaming with performance optimizations
  • get_stream_frame() - Get the latest frame from an active stream
  • get_stream_status() - Monitor stream health, performance, and statistics
  • stop_screen_stream() - Stop streaming and cleanup resources
  • list_active_streams() - List all active streams with their status
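
The usage examples in this README note that a frame response carries the image as base64 text under frame['frame']['data']. A minimal sketch of turning that into raw image bytes follows; the helper names are made up for this example, and only the dictionary shape comes from the README.

```python
import base64


def decode_frame(frame_response: dict) -> bytes:
    """Return raw image bytes from a response shaped like {'frame': {'data': <base64 str>}}."""
    encoded = frame_response["frame"]["data"]
    # validate=True rejects characters outside the base64 alphabet instead of skipping them
    return base64.b64decode(encoded, validate=True)


def looks_like_jpeg(data: bytes) -> bool:
    """Check for the JPEG start-of-image marker (FF D8 FF)."""
    return data.startswith(b"\xff\xd8\xff")
```

The decoded bytes can be written to a file or handed to an image library; the JPEG check is a cheap sanity test when the stream was started with format="jpeg".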

System Performance

  • get_system_metrics() - Comprehensive system health dashboard
  • get_cache_stats() - Cache performance statistics
  • optimize_image() - Advanced image optimization
  • simulate_input() - Keyboard and mouse simulation

Quick Setup

Installation

Option 1: Install from PyPI (Recommended)

    # Install the package
    pip install screenmonitormcp

    # Run the server
    screenmonitormcp

    # or use the short alias
    smcp

Option 2: Install from Source

    git clone https://github.com/inkbytefo/ScreenMonitorMCP.git
    cd ScreenMonitorMCP
    pip install -e .

Configuration

Create a .env file in your working directory:

    # Copy the example configuration
    cp .env.example .env

    # Edit the .env file and add your OpenAI API key

Example .env configuration:

    OPENAI_API_KEY=your_openai_api_key_here
    OPENAI_BASE_URL=https://api.openai.com/v1
    DEFAULT_OPENAI_MODEL=gpt-4-vision-preview
    DEFAULT_MAX_TOKENS=1000
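
For readers who want to see how these four variables could be consumed, here is a small sketch that reads them from the process environment and applies the example values as defaults. How the server itself parses its configuration is not shown in this README, so the `VisionSettings` class and `load_settings` function are assumptions; the sketch also assumes the .env file has already been loaded into the environment by some other means.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VisionSettings:
    api_key: str
    base_url: str
    model: str
    max_tokens: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> VisionSettings:
    """Build settings from environment variables, using the README's example values as defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key or api_key == "your_openai_api_key_here":
        raise ValueError("OPENAI_API_KEY is missing or still set to the placeholder")
    return VisionSettings(
        api_key=api_key,
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        model=env.get("DEFAULT_OPENAI_MODEL", "gpt-4-vision-preview"),
        max_tokens=int(env.get("DEFAULT_MAX_TOKENS", "1000")),
    )
```

Failing early on the placeholder key gives a clearer error than a rejected API request later on.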

Claude Desktop Integration

Add the following to your Claude Desktop configuration file (claude_desktop_config.json):

    {
      "mcpServers": {
        "screenMonitorMCP": {
          "command": "screenmonitormcp",
          "args": []
        }
      }
    }

Alternatively, launch the server as a Python module:

    {
      "mcpServers": {
        "screenMonitorMCP": {
          "command": "python",
          "args": [
            "-m",
            "screenmonitormcp.main"
          ]
        }
      }
    }
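
If the configuration file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The sketch below shows one way to do that with the standard library. The `add_server` helper is hypothetical, and the location of claude_desktop_config.json differs between platforms, so the path is passed in explicitly instead of being guessed.

```python
import json
from pathlib import Path
from typing import Optional, Sequence


def add_server(config_path: Path,
               name: str = "screenMonitorMCP",
               command: str = "screenmonitormcp",
               args: Optional[Sequence[str]] = None) -> dict:
    """Insert or update one entry under "mcpServers" and keep every other entry."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args or [])}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For the module-based variant, the same helper could be called with command="python" and args=["-m", "screenmonitormcp.main"].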

Usage Examples

    # Start intelligent monitoring
    await start_smart_monitoring(triggers=['significant_change', 'error_detected'])

    # Natural language UI interaction
    await smart_click('Save button')
    await smart_click('Email input field')

    # Ask AI about current screen
    await query_vision_about_current_view('What errors are visible on this page?')

    # Extract text from screen
    await extract_text_from_screen()

    # 🆕 Real-time screen streaming
    stream_result = await start_screen_stream(
        fps=5,
        quality=70,
        format="jpeg",
        scale=0.5,
        change_detection=True,
        adaptive_quality=True
    )
    stream_id = stream_result['stream_id']

    # Get latest frame from stream
    frame = await get_stream_frame(stream_id)
    # frame['frame']['data'] contains base64 encoded image

    # Monitor stream performance
    status = await get_stream_status(stream_id)
    print(f"FPS: {status['stream_info']['stats']['current_fps']}")

    # Stop streaming
    await stop_screen_stream(stream_id)

Available Tools (26 Total)

  • Smart Monitoring (6 tools): Real-time screen monitoring with AI analysis
  • UI Interaction (2 tools): Natural language screen control
  • Visual Analysis (3 tools): AI-powered image and video analysis
  • 🆕 Real-time Streaming (5 tools): Base64 screen streaming with performance optimizations
  • System Performance (7 tools): Performance monitoring and optimization
  • Input Simulation (2 tools): Keyboard and mouse automation
  • Utility (1 tool): Tool documentation and listing

Technical Features

  • 26 Tools - Comprehensive AI vision capabilities
  • Real-time Monitoring - Adaptive FPS with smart triggers
  • Multi-AI Support - OpenAI, OpenRouter, and custom endpoints
  • Advanced OCR - Tesseract and EasyOCR integration
  • Cross-platform - Windows, macOS, Linux support
  • Smart Caching - Performance optimization
  • Security Focused - API key management

Vision & Mission

Vision: Enable AI assistants to see and interact with the visual world, breaking down the barrier between text-based AI and real-world interfaces.

Mission: Provide the foundational technology for AI-human visual interaction, making AI assistants truly helpful in visual tasks and screen-based workflows.

Contributing

We welcome contributions to this revolutionary project:

  • Bug reports and feature requests
  • Code contributions and improvements
  • Documentation enhancements

See CONTRIBUTING.md for details.

License

This project is licensed under the MIT License. See LICENSE for details.


Ready to give your AI real sight?

ScreenMonitorMCP transforms AI assistants from text-only to visually intelligent companions.



