Where's Waldo Rick

by bretbouchard

Visual Regression MCP Server

A Model Context Protocol (MCP) server that brings agentic vision capabilities to Claude Code for visual regression testing using Gemini 3 Flash.

Overview

Never again have ambiguous conversations about visual changes. See exactly what changed, circled and annotated, with each change classified as intended or unintended.

Problem Solved

Developer works for hours on UI changes
Build passes, code is "clean"
You open the app... same exact layout
You ask: "What specifically changed?"
Dev says: "We added 2 pixels to the card"
You ask: "Where? Top? Bottom? Inside the box? Around it?"
😤 Wasted time, unclear communication

Solution

Where's Waldo Rick provides:

Screenshot capture from multiple platforms (macOS, iOS Simulator, Web)
Pixel-perfect comparison with configurable thresholds
Agentic vision analysis using Gemini 3 Flash (iterative zoom/crop/annotate)
Expected vs unintended change detection
Conversational investigation ("Not that box, the child item")
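The README does not say how "expected vs unintended" is decided; in the server it is presumably driven by the natural-language declaration and Gemini. As a purely geometric illustration, the sketch below assumes each expected change has been reduced to a bounding box and treats any detected change region that overlaps one of them as expected. `Box` and `classify_changes` are hypothetical names, not part of the server's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned region in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def overlaps(self, other: "Box") -> bool:
        # Half-open rectangles intersect only if they overlap on both axes.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def classify_changes(detected: list[Box], expected: list[Box]) -> dict[str, list[Box]]:
    """Split detected regions into 'expected' (overlaps a declared region) and 'unintended'."""
    result: dict[str, list[Box]] = {"expected": [], "unintended": []}
    for box in detected:
        key = "expected" if any(box.overlaps(e) for e in expected) else "unintended"
        result[key].append(box)
    return result


card = Box(10, 10, 110, 60)  # region declared as "card padding changes"
detected = [Box(12, 58, 108, 62), Box(200, 300, 220, 320)]
print(classify_changes(detected, [card]))
```

Here the first detected region touches the card and is reported as expected, while the second lies elsewhere on screen and is flagged as unintended.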

Installation

Requirements

Python 3.10+
Gemini API key (free tier: 15 requests/minute)

Install from GitHub

# Run directly with uvx (no separate install step)
uvx --from git+https://github.com/bretbouchard/gemini-vision-mcp wheres_waldo.server

# Or, from a local clone of the repository, install in editable mode
pip install -e .

Configure Claude Code

Add the server to your Claude Code MCP configuration (~/.claude/mcp.json or a project-specific file), replacing your-api-key-here with your Gemini API key:

{
  "mcpServers": {
    "wheres-waldo-rick": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/bretbouchard/gemini-vision-mcp", "wheres_waldo.server"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
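If the configuration file already lists other MCP servers, pasting the block above over it would drop them. The following standard-library sketch merges the entry instead of overwriting the file. `add_server` is a hypothetical helper, and the path you pass should be whichever configuration file your Claude Code setup actually reads.

```python
import json
from pathlib import Path

# Entry copied from the configuration above; replace the placeholder API key.
SERVER_NAME = "wheres-waldo-rick"
SERVER_ENTRY = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/bretbouchard/gemini-vision-mcp", "wheres_waldo.server"],
    "env": {"GEMINI_API_KEY": "your-api-key-here"},
}


def add_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one entry under "mcpServers", leaving other servers untouched."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_server(Path.home() / ".claude" / "mcp.json", SERVER_NAME, SERVER_ENTRY)` writes the entry to the user-level path mentioned above.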

Usage

Basic Workflow

# 1. Declare expected changes before work
/visual:prepare "Card padding increases by 2px, button moves to right"

# 2. Capture baseline screenshot
/visual:capture "Phase 3 - Before card update"

# 3. Development happens...

# 4. Capture current state
/visual:capture "Phase 4 - After card update"

# 5. Compare and see all changes
/visual:compare screenshots/phases/3-before.png screenshots/phases/4-after.png

MCP Tools

`visual_capture`

Capture a screenshot and store it for visual regression testing.

await visual_capture(
    name="Phase 3 - Before card update",
    platform="macos"  # one of: auto, macos, ios, web
)
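The README does not describe how each platform is captured. One plausible approach on a Mac is to shell out to the system tools: `screencapture` for the desktop and `xcrun simctl io booted screenshot` for the booted iOS Simulator. The sketch below only builds the command lines, so it runs anywhere. `capture_command` is a hypothetical helper, and web capture (which would need a headless browser) is deliberately left out.

```python
def capture_command(platform: str, output_path: str) -> list[str]:
    """Hypothetical helper: argv list that saves a screenshot to output_path."""
    if platform == "macos":
        # Built-in macOS screenshot tool; -x suppresses the shutter sound.
        return ["screencapture", "-x", output_path]
    if platform == "ios":
        # Screenshot of the currently booted iOS Simulator via Xcode's simctl.
        return ["xcrun", "simctl", "io", "booted", "screenshot", output_path]
    raise ValueError(f"no command-line capture sketched for platform {platform!r}")


print(capture_command("ios", "before.png"))
```

On a Mac with Xcode installed, the returned list could be passed to `subprocess.run(..., check=True)` to take the screenshot.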

`visual_prepare`

Declare a baseline with expected changes before development.

await visual_prepare(
    phase="Phase 3 - Card Layout Update",
    expected_changes="Card padding increases by 2px, button moves to right"
)
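The README does not specify how `expected_changes` is stored or parsed. As an illustration only, this sketch records a declaration with a UTC timestamp and splits the free-text description on commas, so each expectation could later be checked separately. `PhaseDeclaration` and `declare` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class PhaseDeclaration:
    """Hypothetical record of a phase and the changes expected during it."""
    phase: str
    expected_changes: list[str]
    declared_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def declare(phase: str, expected_changes: str) -> PhaseDeclaration:
    # One item per comma-separated expectation, with surrounding whitespace trimmed.
    items = [part.strip() for part in expected_changes.split(",") if part.strip()]
    return PhaseDeclaration(phase=phase, expected_changes=items)


decl = declare("Phase 3 - Card Layout Update",
               "Card padding increases by 2px, button moves to right")
print(json.dumps(asdict(decl), indent=2))
```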

`visual_compare`

Compare two screenshots with pixel-level precision and agentic vision.

await visual_compare(
    before_path="screenshots/phases/3-before.png",
    after_path="screenshots/phases/4-after.png",
    threshold=2  # allowed values: 1, 2, or 3 (pixels)
)

`visual_cleanup`

Delete screenshots and cached results older than the retention period.

await visual_cleanup(retention_days=7)

Development

Setup

# Clone repository
git clone https://github.com/bretbouchard/gemini-vision-mcp
cd gemini-vision-mcp

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format and lint code
black src/
ruff check src/

Project Structure

src/wheres_waldo/
├── __init__.py
├── server.py          # MCP server with tool definitions
├── models/            # Pydantic domain models
├── services/          # Business logic (capture, compare, storage)
├── tools/             # MCP tool implementations
└── utils/             # Logging, hashing, path helpers
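The structure lists hashing helpers under `utils/`, and the roadmap mentions caching, but the README does not say how they are used. One plausible use is a content-addressed cache key, so that re-comparing byte-identical screenshots with the same threshold can reuse an earlier result. `file_digest` and `comparison_cache_key` are hypothetical names in this sketch.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file's bytes, read in chunks so large screenshots are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def comparison_cache_key(before: Path, after: Path, threshold: int) -> str:
    """Same image contents plus same threshold give the same key, regardless of file names."""
    parts = f"{file_digest(before)}:{file_digest(after)}:{threshold}"
    return hashlib.sha256(parts.encode()).hexdigest()
```

Keying on content rather than paths means renaming or copying a screenshot does not invalidate its cached comparison.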

Roadmap

Phase 1: Foundation (MCP server skeleton, types, storage)
Phase 2: Capture & Baselines (multi-platform screenshots)
Phase 3: Comparison Engine (OpenCV + Gemini integration) 🔥 HIGH RISK
Phase 4: Operations (caching, progressive resolution, reporting)
Phase 5: Polish (conversational investigation)

See ROADMAP.md for the complete execution plan.

Contributing

Contributions welcome! Please read REQUIREMENTS.md and ROADMAP.md before contributing.

License

MIT License - See LICENSE file for details

Acknowledgments

Built with:

Generated with Claude Code via Happy
