Analyze images to extract summaries, objects, text, or detailed insights using Gemini's multimodal vision capabilities. Supports JPEG, PNG, WebP, and other formats with optional context for enhanced results.
Analyze images using Gemini AI vision models to answer questions, identify content, or extract information from images provided via URL or base64 data.
Run automated UI tests with Playwright to validate user interactions, check assertions, and verify cross-browser compatibility on Chromium, Firefox, and WebKit.
An MCP (Model Context Protocol) server that provides a standardized interface to Google's Cloud Vision API, enabling AI agents to analyze images and extract visual information through natural language.
Enables browser automation and web page interaction using Playwright's accessibility tree for fast, deterministic actions without screenshots or vision models.
Enables browser automation and web page interaction using Playwright's accessibility tree for fast, structured automation without requiring vision models or screenshots.