analyze_screenshot
Capture and analyze web page screenshots using AI to describe visible content, detect elements, and provide structural insights based on user-defined context.
Instructions
Take a screenshot and analyze it with AI (Gemma3) to describe what is visible on the page
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| detailed | No | Provide detailed structural analysis of the page | false |
| fullPage | No | Capture full scrollable page | false |
| model | No | AI model to use for analysis (default: gemma3:4b) | gemma3:4b |
| path | No | Path to save screenshot (optional) | |
| pretext | No | Optional context or specific instructions for what to look for in the analysis | |
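As a rough illustration of how these arguments might be supplied, the sketch below wraps them in an MCP `tools/call` JSON-RPC request. The `AnalyzeScreenshotArgs` interface, the `buildAnalyzeScreenshotCall` helper, and the request `id` are hypothetical names introduced for this example; only the argument names come from the table above.

```typescript
// Argument shape taken from the input table; every field is optional.
interface AnalyzeScreenshotArgs {
  detailed?: boolean;
  fullPage?: boolean;
  model?: string;
  path?: string;
  pretext?: string;
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` JSON-RPC request.
function buildAnalyzeScreenshotCall(id: number, args: AnalyzeScreenshotArgs = {}) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "analyze_screenshot", arguments: args },
  };
}

// Example: full-page capture with extra guidance for the model.
const request = buildAnalyzeScreenshotCall(1, {
  fullPage: true,
  pretext: "Check whether the login form is visible",
});
console.log(JSON.stringify(request, null, 2));
```

Fields left out of `arguments` are expected to fall back to the server-side defaults, for example `gemma3:4b` for `model`.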
Implementation Reference
- src/index.ts:681-759 (handler): Handler for the 'analyze_screenshot' tool. Takes a screenshot of the current page, encodes it as base64, analyzes it using Ollama (default gemma3:4b) with a customizable prompt, and returns the AI-generated description. Handles errors by providing fallback screenshot info.

  ```typescript
  case 'analyze_screenshot': {
    if (!currentPage) {
      throw new Error('No browser page available. Launch a browser first.');
    }

    const params = AnalyzeScreenshotSchema.parse(args);

    // Take screenshot
    const screenshotPath = params.path || `screenshot-${Date.now()}.png`;
    const screenshotBuffer = await currentPage.screenshot({
      fullPage: params.fullPage,
      path: screenshotPath
    });

    try {
      // Initialize Ollama client
      const ollama = new Ollama({ host: 'http://localhost:11434' });

      // Convert screenshot to base64
      const base64Image = screenshotBuffer.toString('base64');

      // Prepare the prompt
      let prompt = 'Analyze this website screenshot and describe exactly what you see. ';

      if (params.detailed) {
        prompt += 'Provide a detailed structural analysis including layout, navigation elements, content sections, forms, buttons, and any interactive elements. ';
      } else {
        prompt += 'Focus on the main content and key elements visible on the page. ';
      }

      if (params.pretext) {
        prompt += `Additional context/instructions: ${params.pretext}. `;
      }

      prompt += 'Be specific about colors, text content, images, and the overall design and functionality of the page.';

      // Make AI request
      const response = await ollama.generate({
        model: params.model,
        prompt: prompt,
        images: [base64Image],
        stream: false
      });

      return {
        content: [
          {
            type: 'text',
            text: `AI Analysis of Screenshot (${screenshotPath}):

  ${response.response}

  Screenshot saved to: ${screenshotPath}
  Model used: ${params.model}
  Analysis type: ${params.detailed ? 'Detailed structural analysis' : 'General description'}`
          }
        ]
      };
    } catch (aiError) {
      // If AI analysis fails, still return screenshot info
      const fallbackMessage = aiError instanceof Error ?
        aiError.message : String(aiError);

      return {
        content: [
          {
            type: 'text',
            text: `Screenshot taken and saved to: ${screenshotPath}

  AI Analysis Error: ${fallbackMessage}

  Note: Make sure Ollama is running locally with the ${params.model} model installed.
  You can install it with: ollama pull ${params.model}
  And start Ollama with: ollama serve`
          }
        ]
      };
    }
  }
  ```
- src/index.ts:64-70 (schema): Zod schema definition for validating inputs to the analyze_screenshot tool, including options for fullPage screenshot, save path, AI prompt pretext, model, and detailed analysis flag.

  ```typescript
  const AnalyzeScreenshotSchema = z.object({
    fullPage: z.boolean().default(false),
    path: z.string().optional(),
    pretext: z.string().optional(),
    model: z.string().default('gemma3:4b'),
    detailed: z.boolean().default(false)
  });
  ```
- src/index.ts:330-361 (registration): Tool registration in the ListTools response, defining the name, description, and JSON inputSchema for analyze_screenshot.

  ```typescript
  {
    name: 'analyze_screenshot',
    description: 'Take a screenshot and analyze it with AI (Gemma3) to describe what is visible on the page',
    inputSchema: {
      type: 'object',
      properties: {
        fullPage: {
          type: 'boolean',
          default: false,
          description: 'Capture full scrollable page'
        },
        path: {
          type: 'string',
          description: 'Path to save screenshot (optional)'
        },
        pretext: {
          type: 'string',
          description: 'Optional context or specific instructions for what to look for in the analysis'
        },
        model: {
          type: 'string',
          default: 'gemma3:4b',
          description: 'AI model to use for analysis (default: gemma3:4b)'
        },
        detailed: {
          type: 'boolean',
          default: false,
          description: 'Provide detailed structural analysis of the page'
        }
      }
    }
  },
  ```
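To make the prompt assembly easier to follow in isolation, here is a minimal sketch that pulls it out of the handler into a pure function and pairs it with the object passed to `generate`. The names `PromptOptions`, `buildPrompt`, and `buildGenerateRequest` are hypothetical and do not appear in the source; the prompt strings and request fields mirror the handler excerpt, and the base64 string is assumed to have been produced elsewhere.

```typescript
// Hypothetical options type covering the two fields that shape the prompt.
interface PromptOptions {
  detailed?: boolean;
  pretext?: string;
}

// Hypothetical standalone version of the prompt-building step in the handler.
function buildPrompt({ detailed = false, pretext }: PromptOptions = {}): string {
  let prompt = "Analyze this website screenshot and describe exactly what you see. ";

  // `detailed` switches between a structural and a general description.
  prompt += detailed
    ? "Provide a detailed structural analysis including layout, navigation elements, content sections, forms, buttons, and any interactive elements. "
    : "Focus on the main content and key elements visible on the page. ";

  // `pretext` is appended only when the caller supplies it.
  if (pretext) {
    prompt += `Additional context/instructions: ${pretext}. `;
  }

  prompt += "Be specific about colors, text content, images, and the overall design and functionality of the page.";
  return prompt;
}

// Hypothetical helper: assembles the request fields used in the handler's generate call.
function buildGenerateRequest(
  base64Image: string,
  model: string = "gemma3:4b",
  options: PromptOptions = {},
) {
  return {
    model,
    prompt: buildPrompt(options),
    images: [base64Image],
    stream: false as const,
  };
}

console.log(buildPrompt({ pretext: "Look for cookie banners" }));
```

Keeping the prompt logic in a pure function like this would allow it to be unit-tested without a browser or a running Ollama instance.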