Skip to main content
Glama

Where's Waldo Rick - Visual Regression MCP Server

A Model Context Protocol (MCP) server that brings agentic vision capabilities to Claude Code for visual regression testing using Gemini 3 Flash.

Overview

Never again have ambiguous conversations about visual changes. See exactly what changed, circled and annotated, with intended vs unintended change detection.

Problem Solved

  • Developer works for hours on UI changes

  • Build passes, code is "clean"

  • You open the app... same exact layout

  • You ask: "What specifically changed?"

  • Dev says: "We added 2 pixels to the card"

  • You ask: "Where? Top? Bottom? Inside the box? Around it?"

  • 😀 Wasted time, unclear communication

Solution

Where's Waldo Rick provides:

  1. Screenshot capture from multiple platforms (macOS, iOS Simulator, Web)

  2. Pixel-perfect comparison with configurable thresholds

  3. Agentic vision analysis using Gemini 3 Flash (iterative zoom/crop/annotate)

  4. Expected vs unintended change detection

  5. Conversational investigation ("Not that box, the child item")

Installation

Requirements

  • Python 3.10+

  • Gemini API key (free tier: 15 requests/minute)

Install from GitHub

# Install via uvx uvx --from git+https://github.com/bretbouchard/gemini-vision-mcp wheres_waldo.server # Or install locally pip install -e .

Configure Claude Code

Add to your Claude Code MCP configuration (~/.claude/mcp.json or project-specific):

{ "mcpServers": { "wheres-waldo-rick": { "command": "uvx", "args": ["--from", "git+https://github.com/bretbouchard/gemini-vision-mcp", "wheres_waldo.server"], "env": { "GEMINI_API_KEY": "your-api-key-here" } } } }

Usage

Basic Workflow

# 1. Declare expected changes before work /visual:prepare "Card padding increases by 2px, button moves to right" # 2. Capture baseline screenshot /visual:capture "Phase 3 - Before card update" # 3. Development happens... # 4. Capture current state /visual:capture "Phase 4 - After card update" # 5. Compare and see all changes /visual:compare screenshots/phases/3-before.png screenshots/phases/4-after.png

MCP Tools

visual_capture

Capture a screenshot and store it for visual regression testing.

await visual_capture( name="Phase 3 - Before card update", platform="macos" # auto, macos, ios, web )

visual_prepare

Declare a baseline with expected changes before development.

await visual_prepare( phase="Phase 3 - Card Layout Update", expected_changes="Card padding increases by 2px, button moves to right" )

visual_compare

Compare two screenshots with pixel-level precision and agentic vision.

await visual_compare( before_path="screenshots/phases/3-before.png", after_path="screenshots/phases/4-after.png", threshold=2 # 1px, 2px, or 3px )

visual_cleanup

Clean up old screenshots and cache.

await visual_cleanup(retention_days=7)

Development

Setup

# Clone repository git clone https://github.com/bretbouchard/gemini-vision-mcp cd gemini-vision-mcp # Install in development mode pip install -e ".[dev]" # Run tests pytest # Format code black src/ ruff check src/

Project Structure

src/wheres_waldo/ β”œβ”€β”€ __init__.py β”œβ”€β”€ server.py # MCP server with tool definitions β”œβ”€β”€ models/ # Pydantic domain models β”œβ”€β”€ services/ # Business logic (capture, compare, storage) β”œβ”€β”€ tools/ # MCP tool implementations └── utils/ # Logging, hashing, path helpers

Roadmap

  • Phase 1: Foundation (MCP server skeleton, types, storage)

  • Phase 2: Capture & Baselines (multi-platform screenshots)

  • Phase 3: Comparison Engine (OpenCV + Gemini integration) πŸ”₯ HIGH RISK

  • Phase 4: Operations (caching, progressive resolution, reporting)

  • Phase 5: Polish (conversational investigation)

See ROADMAP.md for complete execution plan.

Contributing

Contributions welcome! Please read REQUIREMENTS.md and ROADMAP.md before contributing.

License

MIT License - See LICENSE file for details

Acknowledgments

Built with:


Generated with

-
security - not tested
F
license - not found
-
quality - not tested

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bretbouchard/gemini-vision-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server