# AI Vision MCP Server

A Model Context Protocol (MCP) server that provides AI-powered visual analysis capabilities for Claude and other MCP-compatible AI assistants.

## Features

- **Screenshot URL**: Capture screenshots of any website by providing a URL
- **Visual Analysis**: Analyze UI elements, layouts, and content in screenshots
- **File Operations**: Read and modify files by line range
- **Report Generation**: Create comprehensive UI/UX analysis reports
- **Debugging Session**: Maintain context across multiple analysis steps

## Installation

```bash
# Clone the repository
git clone https://github.com/samihalawa/mcp-server-ai-vision.git
cd mcp-server-ai-vision

# Install dependencies
npm install

# Build the server
npm run build
```

## Usage

### Starting the Server

```bash
npm start
```

### Configuration

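Add the server to your MCP client's configuration. The exact key names depend on the client (Claude Desktop, for example, uses `mcpServers` and `env`), so adapt the example as needed: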

```json
{
  "servers": {
    "ai-vision": {
      "command": "/path/to/node",
      "args": ["/path/to/mcp-server-ai-vision/build/index.js"],
      "enabled": true,
      "port": 3005,
      "environment": {
        "NODE_PATH": "/path/to/node_modules",
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

### Available Tools

#### screenshot_url

Take a screenshot of a URL using a web browser.

Parameters:
- `url` (string, required): URL to capture a screenshot of (e.g., `http://localhost:4999`, `https://google.com`)
- `fullPage` (boolean, optional): Whether to capture the full page rather than only the viewport. Default: false
- `waitForSelector` (string, optional): CSS selector to wait for before taking the screenshot
- `waitTime` (number, optional): Time to wait, in milliseconds, before taking the screenshot. Default: 1000
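
MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of how a request for `screenshot_url` could be assembled by hand; the helper name `buildScreenshotRequest`, the request id, and the up-front URL check are illustrative assumptions, not part of this server's code.

```typescript
// Sketch only: argument names come from the parameter list above;
// the helper and the client-side validation are assumptions.
interface ScreenshotUrlArgs {
  url: string;
  fullPage?: boolean;
  waitForSelector?: string;
  waitTime?: number;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildScreenshotRequest(id: number, args: ScreenshotUrlArgs): ToolCallRequest {
  // Fail early on anything that is not an absolute http(s) URL.
  if (!/^https?:\/\//.test(args.url)) {
    throw new Error(`Expected an http(s) URL, got: ${args.url}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "screenshot_url", arguments: { ...args } },
  };
}

const request = buildScreenshotRequest(1, {
  url: "http://localhost:4999",
  fullPage: true,
  waitForSelector: "#app", // hypothetical selector
});
console.log(JSON.stringify(request, null, 2));
```

Optional parameters that are omitted are simply left out of `arguments`, so the server's documented defaults apply.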

#### analyze_screen

Analyze a screenshot with AI vision.

Parameters: None (uses the most recent screenshot)

#### read_file

Read content from a file between specified line numbers.

Parameters:
- `path` (string): Path to the file
- `startLine` (number): Starting line number (1-indexed)
- `endLine` (number): Ending line number (1-indexed)
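
To make the 1-indexed range concrete, here is a small sketch of the slicing arithmetic. It assumes that both `startLine` and `endLine` are inclusive, which the description above does not state explicitly; `sliceLines` is a hypothetical helper, not the server's implementation.

```typescript
// Returns lines startLine..endLine of `text`.
// Assumption: 1-indexed, both ends inclusive.
function sliceLines(text: string, startLine: number, endLine: number): string {
  if (startLine < 1 || endLine < startLine) {
    throw new RangeError(`Invalid line range ${startLine}-${endLine}`);
  }
  const lines = text.split("\n");
  // slice() takes a 0-indexed start and an exclusive end,
  // so only the start needs shifting by one.
  return lines.slice(startLine - 1, endLine).join("\n");
}

const sample = "alpha\nbeta\ngamma\ndelta";
console.log(sliceLines(sample, 2, 3)); // prints "beta" then "gamma"
```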

#### modify_file

Modify content in a file between specified line numbers.

Parameters:
- `path` (string): Path to the file
- `startLine` (number): Starting line number to replace (1-indexed)
- `endLine` (number): Ending line number to replace (1-indexed)
- `content` (string): New content to replace the specified lines
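
The replacement semantics can be sketched as follows, again assuming an inclusive, 1-indexed range. `replaceLines` is a hypothetical helper that illustrates the idea; the server's actual behavior may differ, for example in how it handles trailing newlines.

```typescript
// Replaces lines startLine..endLine of `text` with `content`.
// Assumption: 1-indexed, both ends inclusive.
function replaceLines(
  text: string,
  startLine: number,
  endLine: number,
  content: string
): string {
  const lines = text.split("\n");
  if (startLine < 1 || endLine < startLine || startLine > lines.length) {
    throw new RangeError(`Invalid line range ${startLine}-${endLine}`);
  }
  // Remove (endLine - startLine + 1) lines starting at index startLine - 1
  // and insert the new lines in their place.
  lines.splice(startLine - 1, endLine - startLine + 1, ...content.split("\n"));
  return lines.join("\n");
}

console.log(replaceLines("a\nb\nc\nd", 2, 3, "X")); // prints "a", "X", "d"
```

When the replacement has a different number of lines than the range it replaces, every later line number shifts, so re-read the file before issuing a second edit.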

#### generate_report

Generate a comprehensive UI/UX analysis report.

Parameters:
- `testUrl` (string): URL of the application being tested
- `appName` (string, optional): Name of the application being analyzed
- `date` (string, optional): Date of the analysis (YYYY-MM-DD)
- `observations` (object): Observations structured as components, data state, interactions, etc.
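
As an illustration, the arguments might be assembled like this. The key names inside `observations` (`components`, `dataState`, `interactions`) and their values are hypothetical, inferred from the description above; check the server source for the shape it actually expects.

```typescript
interface GenerateReportArgs {
  testUrl: string;
  appName?: string;
  date?: string; // YYYY-MM-DD
  observations: Record<string, unknown>;
}

// toISOString() always returns UTC, e.g. "2024-03-05T00:00:00.000Z";
// the first ten characters are the YYYY-MM-DD date.
function formatDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

const reportArgs: GenerateReportArgs = {
  testUrl: "https://example.com",
  appName: "Example App",
  date: formatDate(new Date(Date.UTC(2024, 2, 5))), // "2024-03-05"
  observations: {
    // Hypothetical structure, for illustration only.
    components: ["header", "login form"],
    dataState: "empty",
    interactions: ["submit button gives no loading feedback"],
  },
};
console.log(JSON.stringify(reportArgs, null, 2));
```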

## Example Workflow

1. Take a screenshot of a website:
   ```
   screenshot_url(url: "https://example.com")
   ```

2. Analyze the screenshot:
   ```
   analyze_screen()
   ```

3. Generate a report based on the analysis:
   ```
   generate_report(testUrl: "https://example.com", observations: {...})
   ```
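
The ordering matters because `analyze_screen` takes no parameters and works on the most recent screenshot. A rough sketch of the three steps as a single function is shown here; `CallTool` stands in for whatever transport your MCP client provides, and passing the raw analysis through as an observation is an assumption about the expected shape.

```typescript
// Hypothetical transport: sends one tool call and resolves with its result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

async function runVisualAudit(callTool: CallTool, url: string): Promise<unknown> {
  // Step 1 must complete before step 2, which reads the latest screenshot.
  await callTool("screenshot_url", { url, fullPage: true });
  const analysis = await callTool("analyze_screen", {});
  // Assumption: the analysis result can be forwarded as an observation.
  return callTool("generate_report", { testUrl: url, observations: { analysis } });
}

// Stub transport that only records the call order, for demonstration.
const calls: string[] = [];
const stub: CallTool = async (name) => {
  calls.push(name);
  return `${name}: ok`;
};

runVisualAudit(stub, "https://example.com").then((report) => {
  console.log(calls.join(" -> ")); // screenshot_url -> analyze_screen -> generate_report
  console.log(report); // generate_report: ok
});
```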

## Requirements

- Node.js 14+
- Playwright for browser automation
- Gemini API key for AI vision analysis

## License

MIT