REQUIREMENTS.md•15.4 KiB

# Requirements: Where's Waldo Rick

**Project**: Visual Regression MCP Server with Agentic Vision

**Version**: 1.0

**Last Updated**: 2025-02-04

---

## Table Stakes (Must-Have Features)

### CAPTURE: Screenshot Capture

#### REQ-CAPTURE-01: Multi-Platform Screenshot Capture

User can capture screenshots from multiple platforms (macOS desktop, iOS Simulator, and web browsers via chrome-devtools MCP) with automatic platform detection and adapter selection.

**Acceptance Criteria**:

- macOS screenshots use MSS (capture time < 50ms)
- iOS screenshots use simctl CLI tool
- Web screenshots use chrome-devtools MCP
- Platform is auto-detected from environment
- Fallback to manual platform selection if auto-detection fails

**Traceability**: Phase 2

#### REQ-CAPTURE-02: Phase-Based Organization

User can organize screenshots by development phase with automatic directory structure creation (`.screenshots/phases/{phase-name}/`) and JSON index metadata.

**Acceptance Criteria**:

- Screenshots stored in phase-named directories
- JSON index includes timestamp, platform, resolution
- Index is queryable for comparison operations
- Missing directories are auto-created

**Traceability**: Phase 2

#### REQ-CAPTURE-03: Configurable Capture Quality

User can configure screenshot quality settings (resolution scale, format selection) with persistent configuration across sessions.

**Acceptance Criteria**:

- Resolution scale options: 1x, 2x (Retina), 3x (super)
- Format options: PNG (lossless), JPEG (compressed), WebP
- Configuration persisted in `.screenshots/config.json`
- Default: PNG @ 2x resolution

**Traceability**: Phase 2

---

### COMPARE: Side-by-Side Comparison

#### REQ-COMPARE-01: Pixel-Perfect Diffing

User can compare two screenshots with pixel-level precision using OpenCV with configurable threshold for change detection (1px, 2px, 3px sensitivity).

**Acceptance Criteria**:

- OpenCV 4.10+ for image comparison
- Configurable pixel threshold (default: 2px)
- Absolute difference calculation
- Threshold selection based on image complexity
- Results include total changed pixels and percentage

**Traceability**: Phase 3

#### REQ-COMPARE-02: Anti-Aliasing Noise Reduction

User can enable 2-pixel Gaussian blur preprocessing to prevent false positives from anti-aliasing rendering artifacts.

**Acceptance Criteria**:

- 2px Gaussian blur kernel applied before comparison
- Option to enable/disable preprocessing
- Reduces false positives by >80% (measured against anti-aliasing test suite)
- Preserves genuine layout changes

**Traceability**: Phase 3

#### REQ-COMPARE-03: Heatmap Diff Visualization

User can view difference heatmap highlighting changed regions with color intensity based on magnitude of change (subtle to dramatic).

**Acceptance Criteria**:

- Color gradient: yellow (subtle) → orange (moderate) → red (dramatic)
- Overlay on original screenshot
- Adjustable intensity (opacity: 20-80%)
- Exportable as PNG

**Traceability**: Phase 3

---

### BASELINE: Baseline Management

#### REQ-BASELINE-01: Baseline Declaration

User can declare any screenshot as a baseline with automatic snapshot metadata storage and version tracking.

**Acceptance Criteria**:

- Baseline stored with unique ID (timestamp-based)
- Metadata includes: phase, description, expected changes
- Baseline listing with search/filter
- Baseline restoration from history

**Traceability**: Phase 2

#### REQ-BASELINE-02: Expected Changes Annotation

User can document expected changes when declaring a baseline (e.g., "moved button 2px right, added new card") with structured format for validation.

**Acceptance Criteria**:

- Structured annotation format (JSON schema)
- Free text description field
- Optional bounding box annotations
- Annotations linked to baseline ID

**Traceability**: Phase 2

#### REQ-BASELINE-03: Baseline Comparison

User can compare current screenshot against any declared baseline with automatic detection of differences beyond expected changes.

**Acceptance Criteria**:

- Select baseline by ID or search
- Compute diff against baseline
- Highlight all changes (intended + unintended)
- Flag changes not in expected changes annotation

**Traceability**: Phase 3

---

### VALIDATION: Pass/Fail Determination

#### REQ-VALIDATION-01: Pass/Fail Criteria

User can define pass/fail criteria based on change thresholds (maximum pixel change, maximum region count) with customizable rules.

**Acceptance Criteria**:

- Configurable pixel change threshold (default: 0.5% of total pixels)
- Configurable maximum changed regions (default: 10)
- Pass/fail status clearly displayed
- Criteria persisted per comparison

**Traceability**: Phase 3

#### REQ-VALIDATION-02: Regression Detection

User can automatically detect unintended changes by comparing actual changes against expected changes annotation.

**Acceptance Criteria**:

- Parse expected changes from baseline annotation
- Flag all changes not in expected list
- Categorize as "regression" or "unexpected enhancement"
- Provide detailed diff report

**Traceability**: Phase 3

#### REQ-VALIDATION-03: Comparison Report

User can generate detailed comparison reports including pass/fail status, change summary, annotated screenshots, and recommendations.

**Acceptance Criteria**:

- Markdown-formatted report
- Embedded screenshots with annotations
- Pass/fail badge with criteria explanation
- Change summary table (intended vs unintended)
- Report saved to `.screenshots/reports/`

**Traceability**: Phase 3

---

## Differentiators (Competitive Advantage)

### AGENTIC: Agentic Vision Analysis

#### REQ-AGENTIC-01: Gemini 3 Flash Integration

User can leverage Gemini 3 Flash's agentic vision capabilities for iterative zoom/crop/annotate analysis of visual differences.

**Acceptance Criteria**:

- Integration with gemini-3-flash-preview model
- Agentic vision API calls (multi-resolution analysis)
- Iterative refinement: low → medium → high → ultra high resolution
- Code execution for image manipulation (crop, annotate)
- Free tier usage: 15 req/min, 250K tokens/min

**Traceability**: Phase 3 (HIGH RISK)

#### REQ-AGENTIC-02: Intelligent Change Interpretation

User can receive natural language explanations of what changed (e.g., "padding increased by 2px on top of card") instead of just pixel coordinates.

**Acceptance Criteria**:

- Gemini analyzes pixel diffs and generates descriptions
- Descriptions reference UI elements (cards, buttons, text)
- Confidence scores for each interpretation
- Fallback to pixel coordinates if interpretation fails

**Traceability**: Phase 3

#### REQ-AGENTIC-03: Conversational Investigation

User can engage in multi-turn conversations with Waldo about visual changes, asking follow-up questions like "Where exactly did the padding change?" and receiving annotated zoom-ins.

**Acceptance Criteria**:

- Multi-turn conversation context maintained
- Follow-up questions trigger focused analysis
- Annotated zoom-ins generated on-demand
- Conversation history persisted for audit trail
- Exit condition: user satisfaction or 5-turn limit

**Traceability**: Phase 5

#### REQ-AGENTIC-04: Aggressive Caching

User can benefit from hash-based caching to minimize redundant Gemini API calls and stay within free tier limits.

**Acceptance Criteria**:

- Hash-based cache key (image pair + threshold)
- Cache hit returns cached analysis without API call
- Cache invalidation on threshold/configuration change
- Cache statistics (hit rate, API calls saved)

**Traceability**: Phase 3

#### REQ-AGENTIC-05: Progressive Resolution

User can automatically start with low-resolution analysis and only increase resolution if needed (confidence score < threshold) to optimize token usage.

**Acceptance Criteria**:

- Start: 720p (fast, cheap)
- Upgrade to 1080p if confidence < 80%
- Upgrade to 4K if confidence < 80% again
- Stop at ultra-high (8K) or confidence ≥ 80%
- Token savings tracked per comparison

**Traceability**: Phase 3

#### REQ-AGENTIC-06: Rate Limiting with Token Bucket

User can stay within Gemini free tier limits (15 req/min) with automatic token bucket rate limiter that queues requests during spikes.

**Acceptance Criteria**:

- Token bucket algorithm: 15 tokens refilled per minute
- Queued requests processed when tokens available
- Queue status visible to user (position, ETA)
- Burst allowance: up to 15 requests immediately if full bucket
- Backpressure: user notified if queue > 5 requests

**Traceability**: Phase 3

---

### EXPECTED: Expected vs Unintended Change Detection

#### REQ-EXPECTED-01: Expected Changes Declaration

User can declare expected changes when capturing baseline screenshots (e.g., "move button 2px right, add new card to bottom of list").

**Acceptance Criteria**:

- Structured input format (JSON schema or natural language)
- Natural language parsed by Gemini for structured extraction
- Expected changes linked to baseline ID
- Validation: changes must be verifiable visually

**Traceability**: Phase 2

#### REQ-EXPECTED-02: Intended vs Unintended Classification

User can see all changes categorized as "intended" (match expected changes) or "unintended" (not in expected changes) with clear visual distinction.

**Acceptance Criteria**:

- Pixel diffs compared against expected changes
- Classification algorithm: match by proximity, description, bounding box
- Intended changes: green highlights
- Unintended changes: red highlights
- Ambiguous changes: yellow highlights (manual review needed)

**Traceability**: Phase 3

#### REQ-EXPECTED-03: Regression Alerting

User can receive immediate alerts for regressions (unintended changes) with severity classification (critical, major, minor) based on visual impact.

**Acceptance Criteria**:

- Severity scoring algorithm:
  - Critical: center-stage layout breaks, text unreadability
  - Major: misalignment, overflow, spacing issues
  - Minor: 1px shifts, color variations
- Alert notification with severity badge
- Regression summary count (critical: N, major: N, minor: N)

**Traceability**: Phase 3

---

### CONVERSATIONAL: Conversational Investigation

#### REQ-CONV-01: Multi-Turn Conversation Context

User can maintain conversation context across up to 5 turns with Waldo, asking follow-up questions about visual changes.

**Acceptance Criteria**:

- Conversation state persisted per comparison session
- Maximum 5 turns (configurable)
- Context includes: original comparison, previous questions, previous answers
- Graceful exit after 5 turns with summary

**Traceability**: Phase 5

#### REQ-CONV-02: Focused Follow-Up Analysis

User can ask targeted follow-up questions (e.g., "Show me exactly where the padding changed") and receive annotated zoom-ins of specific regions.

**Acceptance Criteria**:

- Follow-up question triggers focused analysis
- Gemini generates annotated screenshot (zoom, crop, circles/arrows)
- Annotations highlight specific requested region
- Response includes bounding box coordinates

**Traceability**: Phase 5

#### REQ-CONV-03: Conversational UI Patterns

User can engage in natural conversations using common UI investigation patterns (zoom, crop, annotate, measure, compare).

**Acceptance Criteria**:

- Supported patterns:
  - "Zoom in on {region}"
  - "Crop to {element}"
  - "Annotate {change type}"
  - "Measure {dimension} between {elements}"
  - "Compare {region A} with {region B}"
- Pattern matching with fuzzy intent recognition
- Fallback to generic analysis if pattern unrecognized

**Traceability**: Phase 5

#### REQ-CONV-04: Conversation History Persistence

User can access conversation history for audit trail and learning (what questions were asked, what answers were provided).

**Acceptance Criteria**:

- Conversations stored in `.screenshots/conversations/`
- JSON format with timestamp, question, answer, annotations
- Searchable by comparison ID, date, question content
- Exportable as markdown report

**Traceability**: Phase 5

---

## Non-Functional Requirements

### Performance

#### REQ-PERF-01: Capture Latency

Screenshot capture must complete in < 100ms for macOS, < 500ms for iOS Simulator, < 1s for web browsers.

**Traceability**: Phase 2

#### REQ-PERF-02: Comparison Speed

Pixel-level comparison must complete in < 2s for standard screenshots (1920x1080 or smaller).

**Traceability**: Phase 3

#### REQ-PERF-03: Agentic Vision Response Time

Gemini agentic vision analysis must complete in < 10s for initial analysis, < 5s for follow-up questions.

**Traceability**: Phase 3

### Reliability

#### REQ-RELIABILITY-01: Rate Limit Compliance

System must stay within Gemini free tier limits (15 req/min) with 99.9% compliance rate.

**Traceability**: Phase 3

#### REQ-RELIABILITY-02: Cache Hit Rate

System must achieve >60% cache hit rate for repeated comparisons to minimize API usage.

**Traceability**: Phase 3

#### REQ-RELIABILITY-03: MCP JSON-RPC Compliance

System must maintain valid JSON-RPC protocol (no stdout pollution) with 100% compliance rate.

**Traceability**: Phase 1

### Usability

#### REQ-USABILITY-01: Zero-Config Baseline

User can perform basic screenshot capture and comparison without any configuration (sensible defaults out-of-the-box).

**Traceability**: Phase 2

#### REQ-USABILITY-02: Clear Error Messages

User receives actionable error messages for all failure modes (rate limits, API errors, file system issues) with remediation steps.

**Traceability**: Phase 4

#### REQ-USABILITY-03: Progressive Disclosure

Advanced features (conversational investigation, custom thresholds) are hidden behind opt-in flags to keep default workflow simple.

**Traceability**: Phase 5

---

## Anti-Requirements (Deliberately NOT Building)

### ❌ CI/CD Mass Integration

**Why Not**: Burns through free tier limits (15 req/min) with mass screenshot comparisons

**Alternative**: Manual on-demand testing during code reviews and major milestones

### ❌ Cross-Browser Testing Matrix

**Why Not**: Not target use case (focused on macOS/iOS development workflow)

**Alternative**: Use existing tools (BrowserStack, Percy) for cross-browser testing

### ❌ Pixel-Perfect Strict Mode

**Why Not**: Anti-aliasing rendering creates false positive noise storm

**Alternative**: 2px threshold default with Gaussian blur preprocessing

### ❌ Cloud-Based Baseline Storage

**Why Not**: Adds complexity, cost, and privacy concerns

**Alternative**: Local filesystem storage with Git LFS for version control

### ❌ Real-Time Browser Monitoring

**Why Not**: Out of scope for MCP server design (file-based comparison workflow)

**Alternative**: Use chrome-devtools MCP for live inspection

---

## Requirements Summary

| Category | Requirements | Phase |
|----------|--------------|-------|
| **Screenshot Capture** | 3 | Phase 2 |
| **Side-by-Side Comparison** | 3 | Phase 3 |
| **Baseline Management** | 3 | Phase 2-3 |
| **Pass/Fail Determination** | 3 | Phase 3 |
| **Agentic Vision Analysis** | 6 | Phase 3-5 |
| **Expected vs Unintended** | 3 | Phase 2-3 |
| **Conversational Investigation** | 4 | Phase 5 |
| **Non-Functional** | 9 | Phase 1-5 |
| **Total** | **34** | **5 phases** |

---

## Traceability Notes

Counts follow the **Traceability** field of each requirement above.

- **Phase 1 (Foundation)**: 1 non-functional requirement (MCP JSON-RPC compliance)
- **Phase 2 (Capture)**: 8 requirements (capture, baseline declaration and annotation, expected changes declaration, capture latency, zero-config defaults)
- **Phase 3 (Comparison)**: 18 requirements (comparison, baseline comparison, validation, agentic vision core, change classification and alerting, performance and reliability targets) - **HIGH RISK**
- **Phase 4 (Operations)**: 1 non-functional requirement (error messages)
- **Phase 5 (Polish)**: 6 requirements (conversational investigation, progressive disclosure)

**Critical Path**: REQ-AGENTIC-01 (Gemini Integration) is the highest-risk requirement and determines project success.

---

**Next Step**: Create ROADMAP.md with 5-8 phases mapping these 34 requirements to execution plans.
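
**Illustrative sketch (REQ-AGENTIC-06)**: The token bucket described in REQ-AGENTIC-06 could look roughly like the following. This is a minimal sketch, not the project's implementation: the `TokenBucket` class, its method names, and the injectable `clock` parameter are hypothetical, and it assumes "15 tokens refilled per minute" means continuous refill at 0.25 tokens per second rather than a single top-up each minute.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: `capacity` tokens, refilled evenly over `period` seconds."""

    def __init__(self, capacity: int = 15, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = capacity / period   # tokens added per second (assumed continuous refill)
        self.clock = clock              # injectable so tests need not sleep
        self.tokens = float(capacity)   # start full: a burst of `capacity` requests is allowed
        self.updated = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; the caller queues the request otherwise."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def eta_seconds(self, position: int = 1) -> float:
        """Estimated wait until the request at 1-based queue `position` can run."""
        self._refill()
        return max(0.0, (position - self.tokens) / self.rate)
```

A caller would try `try_acquire()` before each Gemini request, queue the request on `False`, and surface `eta_seconds(position)` as the queue ETA. The backpressure notice (queue > 5) would live in the queue wrapper, which is omitted here.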

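**Illustrative sketch (REQ-AGENTIC-04)**: The hash-based cache key from REQ-AGENTIC-04 (image pair + threshold, invalidated on configuration change) might be derived as shown below. The names `cache_key` and `AnalysisCache` are hypothetical, the choice of SHA-256 is an assumption, and the in-memory dictionary stands in for whatever storage the project actually uses.

```python
import hashlib
import json
from typing import Callable, Optional


def cache_key(baseline_png: bytes, current_png: bytes, threshold_px: int,
              config: Optional[dict] = None) -> str:
    """Hypothetical key: SHA-256 over both images plus every setting that affects the result."""
    h = hashlib.sha256()
    for blob in (baseline_png, current_png):
        # Hash each image separately so (b"ab", b"c") and (b"a", b"bc") cannot collide.
        h.update(hashlib.sha256(blob).digest())
    # Including threshold and config in the key means any change to them is a cache miss,
    # which is one simple way to get invalidation on configuration change.
    settings = {"threshold_px": threshold_px, "config": config or {}}
    h.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


class AnalysisCache:
    """In-memory stand-in that also tracks the hit-rate statistic."""

    def __init__(self) -> None:
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        if key in self._store:
            self.hits += 1              # cached analysis returned, no API call made
            return self._store[key]
        self.misses += 1
        self._store[key] = compute()    # the only place the (expensive) call happens
        return self._store[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Here `hits` doubles as the "API calls saved" counter, and `hit_rate` is the figure REQ-RELIABILITY-02 would be measured against.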