AI Vision MCP Server

A Model Context Protocol (MCP) server that provides AI-powered visual analysis capabilities for Claude and other MCP-compatible AI assistants.

Features

  • Screenshot URL: Capture screenshots of any website by providing a URL
  • Visual Analysis: Analyze UI elements, layouts, and content in screenshots
  • File Operations: Read and modify files with line-specific precision
  • Report Generation: Create comprehensive UI/UX analysis reports
  • Debugging Session: Maintain context across multiple analysis steps

Installation

# Clone the repository
git clone https://github.com/samihalawa/mcp-server-ai-vision.git
cd mcp-server-ai-vision

# Install dependencies
npm install

# Build the server
npm run build

Usage

Starting the Server

npm start

Configuration

Add the server to your MCP configuration:

{
  "servers": {
    "ai-vision": {
      "command": "/path/to/node",
      "args": ["/path/to/mcp-server-ai-vision/build/index.js"],
      "enabled": true,
      "port": 3005,
      "environment": {
        "NODE_PATH": "/path/to/node_modules",
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
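As a rough illustration, the entry above could be generated from local paths rather than edited by hand. This is only a sketch: `buildAiVisionEntry` and the example paths are hypothetical, and it assumes `NODE_PATH` points at the `node_modules` directory inside the cloned repository, which the configuration above does not state.

```typescript
// Shape of one server entry, copied from the configuration shown above.
interface ServerEntry {
  command: string;
  args: string[];
  enabled: boolean;
  port: number;
  environment: Record<string, string>;
}

// Hypothetical helper (not part of the project): fills in the entry from
// the Node binary path, the clone directory, and the Gemini API key.
function buildAiVisionEntry(nodeBinary: string, repoDir: string, geminiApiKey: string): ServerEntry {
  const root = repoDir.replace(/\/+$/, ""); // drop any trailing slash
  return {
    command: nodeBinary,
    args: [`${root}/build/index.js`],
    enabled: true,
    port: 3005,
    environment: {
      // Assumption: node_modules lives inside the cloned repository.
      NODE_PATH: `${root}/node_modules`,
      PATH: "/usr/local/bin:/usr/bin:/bin",
      GEMINI_API_KEY: geminiApiKey,
    },
  };
}

// Placeholder paths and key; substitute real values.
const configJson: string = JSON.stringify(
  {
    servers: {
      "ai-vision": buildAiVisionEntry(
        "/usr/local/bin/node",
        "/opt/mcp-server-ai-vision",
        "your-gemini-api-key"
      ),
    },
  },
  null,
  2
);
```

The resulting `configJson` string has the same structure as the hand-written configuration and can be written to wherever your MCP client reads its settings.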

Available Tools

screenshot_url

Capture a screenshot of the page at the given URL in a web browser.

Parameters:

  • url (string, required): URL of the page to capture (e.g., http://localhost:4999, https://google.com)
  • fullPage (boolean, optional): Whether to capture the full page or only the viewport. Default: false
  • waitForSelector (string, optional): CSS selector to wait for before taking the screenshot
  • waitTime (number, optional): Time to wait, in milliseconds, before taking the screenshot. Default: 1000
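The parameter list above can be written down as a type, with the documented defaults applied on the client side. This is a minimal sketch: `withScreenshotDefaults` is a hypothetical helper, and the restriction to http(s) URLs is an assumption based on the examples given, not something the server is known to enforce.

```typescript
// Argument shape for screenshot_url, mirroring the parameter list above.
interface ScreenshotUrlArgs {
  url: string;
  fullPage?: boolean;
  waitForSelector?: string;
  waitTime?: number;
}

// Hypothetical client-side helper; not part of the server's API.
function withScreenshotDefaults(args: ScreenshotUrlArgs): ScreenshotUrlArgs {
  // Assumption: only http(s) URLs are intended, as in the documented examples.
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`Expected an http(s) URL, got: ${args.url}`);
  }
  const waitTime = args.waitTime ?? 1000; // documented default
  if (!Number.isFinite(waitTime) || waitTime < 0) {
    throw new RangeError("waitTime must be a non-negative number of milliseconds");
  }
  const result: ScreenshotUrlArgs = {
    url: args.url,
    fullPage: args.fullPage ?? false, // documented default
    waitTime,
  };
  // Only forward waitForSelector when the caller actually set it.
  if (args.waitForSelector !== undefined) {
    result.waitForSelector = args.waitForSelector;
  }
  return result;
}

const localShot = withScreenshotDefaults({ url: "http://localhost:4999", fullPage: true });
```

Here `localShot` carries `fullPage: true` and the default `waitTime` of 1000, and omits `waitForSelector`.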

analyze_screen

Analyze a screenshot with AI vision.

Parameters: None (uses the most recent screenshot)

read_file

Read content from a file between specified line numbers.

Parameters:

  • path (string): Path to the file
  • startLine (number): Starting line number (1-indexed)
  • endLine (number): Ending line number (1-indexed)
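To make the 1-indexed range concrete, here is a small sketch of the line selection that `startLine` and `endLine` describe. It assumes both ends are inclusive, which the description above does not state explicitly, and `sliceLines` is a hypothetical function operating on a string rather than the server's actual implementation.

```typescript
// Returns lines startLine..endLine of `text` (1-indexed).
// Assumption: both ends are inclusive.
function sliceLines(text: string, startLine: number, endLine: number): string {
  if (!Number.isInteger(startLine) || !Number.isInteger(endLine)) {
    throw new RangeError("startLine and endLine must be integers");
  }
  if (startLine < 1 || endLine < startLine) {
    throw new RangeError("expected 1 <= startLine <= endLine");
  }
  const lines = text.split("\n");
  // Array.prototype.slice takes a 0-indexed start and an exclusive end,
  // so the 1-indexed inclusive range [startLine, endLine] maps to
  // slice(startLine - 1, endLine). An endLine past the last line is clamped.
  return lines.slice(startLine - 1, endLine).join("\n");
}

const middleLines = sliceLines("alpha\nbeta\ngamma\ndelta", 2, 3); // "beta\ngamma"
```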

modify_file

Modify content in a file between specified line numbers.

Parameters:

  • path (string): Path to the file
  • startLine (number): Starting line number to replace (1-indexed)
  • endLine (number): Ending line number to replace (1-indexed)
  • content (string): New content to replace the specified lines
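The replacement semantics can be sketched on an in-memory string. As with the read example, this assumes an inclusive 1-indexed range; `replaceLines` is a hypothetical function for illustration, not the server's code.

```typescript
// Replaces lines startLine..endLine of `text` (1-indexed) with `content`.
// Assumption: both ends are inclusive.
function replaceLines(text: string, startLine: number, endLine: number, content: string): string {
  const lines = text.split("\n");
  if (!Number.isInteger(startLine) || !Number.isInteger(endLine)) {
    throw new RangeError("startLine and endLine must be integers");
  }
  if (startLine < 1 || endLine < startLine || endLine > lines.length) {
    throw new RangeError(`expected 1 <= startLine <= endLine <= ${lines.length}`);
  }
  // splice(start, deleteCount, ...items): remove the old lines and insert
  // the new ones in their place. The replacement may have a different
  // number of lines than the range it replaces.
  lines.splice(startLine - 1, endLine - startLine + 1, ...content.split("\n"));
  return lines.join("\n");
}

const edited = replaceLines("alpha\nbeta\ngamma\ndelta", 2, 3, "BETA"); // "alpha\nBETA\ndelta"
```

Because the replacement can change the line count, line numbers after the edited range shift; re-read the file before issuing a second edit that depends on them.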

generate_report

Generate a comprehensive UI/UX analysis report.

Parameters:

  • testUrl (string): URL of the application being tested
  • appName (string, optional): Name of the application being analyzed
  • date (string, optional): Date of the analysis (YYYY-MM-DD)
  • observations (object): Observations structured as components, data state, interactions, etc.
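One possible shape for these arguments is sketched below. The keys inside `observations` (`components`, `dataState`, `interactions`) and their values are made up for illustration, loosely following the description above; the actual schema is not documented here. `toIsoDate` and `checkReportArgs` are hypothetical helpers.

```typescript
// Argument shape for generate_report, mirroring the parameter list above.
interface GenerateReportArgs {
  testUrl: string;
  appName?: string;
  date?: string; // YYYY-MM-DD
  observations: Record<string, unknown>;
}

// Formats a Date as YYYY-MM-DD, using the UTC calendar date.
function toIsoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Hypothetical check: rejects a date that is not in YYYY-MM-DD form.
function checkReportArgs(args: GenerateReportArgs): GenerateReportArgs {
  if (args.date !== undefined && !/^\d{4}-\d{2}-\d{2}$/.test(args.date)) {
    throw new Error(`date must be YYYY-MM-DD, got: ${args.date}`);
  }
  return args;
}

const reportArgs = checkReportArgs({
  testUrl: "https://example.com",
  appName: "Example App",
  date: toIsoDate(new Date(Date.UTC(2024, 0, 15))), // "2024-01-15"
  observations: {
    // Illustrative keys and values only; the real structure may differ.
    components: ["header", "search form"],
    dataState: "empty list on first load",
    interactions: ["submit button shows no loading indicator"],
  },
});
```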

Example Workflow

  1. Take a screenshot of a website:
    screenshot_url(url: "https://example.com")
  2. Analyze the screenshot:
    analyze_screen()
  3. Generate a report based on the analysis:
    generate_report(testUrl: "https://example.com", observations: {...})
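On the wire, an MCP client issues each of these steps as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds the three request objects; it does not open a connection or perform the initialization handshake that a real client completes first. The `toolCall` helper is hypothetical, and the empty `observations` object is a placeholder to be filled in from the analysis result.

```typescript
// JSON-RPC 2.0 request for invoking an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a request.
function toolCall(toolName: string, toolArgs: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// The three workflow steps, in order. analyze_screen takes no arguments
// because it works on the most recent screenshot.
const workflow: ToolCallRequest[] = [
  toolCall("screenshot_url", { url: "https://example.com" }),
  toolCall("analyze_screen"),
  toolCall("generate_report", {
    testUrl: "https://example.com",
    observations: {}, // placeholder; populate from the analysis
  }),
];

// One JSON message per line, ready to hand to a transport.
const wireMessages: string = workflow.map((req) => JSON.stringify(req)).join("\n");
```

Order matters here: `analyze_screen` depends on a screenshot already having been taken, so the requests should be sent sequentially, waiting for each response before sending the next.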

Requirements

  • Node.js 14+
  • Playwright for browser automation
  • Gemini API key for AI vision analysis

License

MIT
