analyze_screenshot
Analyzes webpage screenshots with a local AI vision model (Gemma3 by default, served through Ollama) to describe visible content, identify elements, and provide structural insights. Captures either the current viewport or the full scrollable page.
Instructions
Take a screenshot and analyze it with AI (Gemma3) to describe what is visible on the page
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| detailed | No | Provide detailed structural analysis of the page | false |
| fullPage | No | Capture full scrollable page | false |
| model | No | AI model to use for analysis (default: gemma3:4b) | gemma3:4b |
| path | No | Path to save screenshot (optional). When omitted, the handler writes `screenshot-<timestamp>.png`, where the timestamp is `Date.now()` | |
| pretext | No | Optional context or specific instructions for what to look for in the analysis | |
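
To make the table concrete, here is a minimal sketch of how a client might package these arguments into an MCP `tools/call` request. The `AnalyzeScreenshotArgs` interface and the `buildToolCall` helper are hypothetical names introduced for illustration; only the tool name and the parameter names come from the table above.

```typescript
// Arguments accepted by analyze_screenshot, per the input schema table.
// Every field is optional; omitted fields fall back to the server defaults.
interface AnalyzeScreenshotArgs {
  detailed?: boolean;
  fullPage?: boolean;
  model?: string;
  path?: string;
  pretext?: string;
}

// Hypothetical helper: wraps the arguments in a JSON-RPC 2.0 `tools/call` request.
function buildToolCall(id: number, args: AnalyzeScreenshotArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "analyze_screenshot", arguments: args },
  };
}

// Request a detailed, full-page analysis with extra guidance for the model.
const request = buildToolCall(1, {
  fullPage: true,
  detailed: true,
  pretext: "Check whether the login form is visible",
});

console.log(JSON.stringify(request, null, 2));
```

Because `model` and `path` are left out here, the server would be expected to use `gemma3:4b` and a timestamped file name.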
Implementation Reference
- src/index.ts:681-759 (handler) The main handler function for the 'analyze_screenshot' tool. It takes a screenshot of the current page using Playwright, converts it to base64, sends it to Ollama AI (default gemma3:4b) with a descriptive prompt, and returns the AI-generated analysis of the page contents.

```typescript
case 'analyze_screenshot': {
  if (!currentPage) {
    throw new Error('No browser page available. Launch a browser first.');
  }

  const params = AnalyzeScreenshotSchema.parse(args);

  // Take screenshot
  const screenshotPath = params.path || `screenshot-${Date.now()}.png`;
  const screenshotBuffer = await currentPage.screenshot({
    fullPage: params.fullPage,
    path: screenshotPath
  });

  try {
    // Initialize Ollama client
    const ollama = new Ollama({ host: 'http://localhost:11434' });

    // Convert screenshot to base64
    const base64Image = screenshotBuffer.toString('base64');

    // Prepare the prompt
    let prompt = 'Analyze this website screenshot and describe exactly what you see. ';

    if (params.detailed) {
      prompt += 'Provide a detailed structural analysis including layout, navigation elements, content sections, forms, buttons, and any interactive elements. ';
    } else {
      prompt += 'Focus on the main content and key elements visible on the page. ';
    }

    if (params.pretext) {
      prompt += `Additional context/instructions: ${params.pretext}. `;
    }

    prompt += 'Be specific about colors, text content, images, and the overall design and functionality of the page.';

    // Make AI request
    const response = await ollama.generate({
      model: params.model,
      prompt: prompt,
      images: [base64Image],
      stream: false
    });

    return {
      content: [
        {
          type: 'text',
          text: `AI Analysis of Screenshot (${screenshotPath}):

${response.response}

Screenshot saved to: ${screenshotPath}
Model used: ${params.model}
Analysis type: ${params.detailed ? 'Detailed structural analysis' : 'General description'}`
        }
      ]
    };
  } catch (aiError) {
    // If AI analysis fails, still return screenshot info
    const fallbackMessage = aiError instanceof Error ? aiError.message : String(aiError);

    return {
      content: [
        {
          type: 'text',
          text: `Screenshot taken and saved to: ${screenshotPath}

AI Analysis Error: ${fallbackMessage}

Note: Make sure Ollama is running locally with the ${params.model} model installed.
You can install it with: ollama pull ${params.model}
And start Ollama with: ollama serve`
        }
      ]
    };
  }
}
```
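
The prompt assembly inside the handler can be pulled out as a pure function, which makes it easier to see how `detailed` and `pretext` shape what the model is asked. This is a sketch only: `buildPrompt` is a hypothetical name, not a function in `src/index.ts`, though the strings are copied from the handler.

```typescript
// Hypothetical pure function mirroring the handler's prompt assembly.
function buildPrompt(opts: { detailed?: boolean; pretext?: string }): string {
  let prompt = "Analyze this website screenshot and describe exactly what you see. ";

  // `detailed` switches between a structural breakdown and a general summary.
  if (opts.detailed) {
    prompt +=
      "Provide a detailed structural analysis including layout, navigation elements, content sections, forms, buttons, and any interactive elements. ";
  } else {
    prompt += "Focus on the main content and key elements visible on the page. ";
  }

  // `pretext` is appended verbatim, followed by a period and a space.
  if (opts.pretext) {
    prompt += `Additional context/instructions: ${opts.pretext}. `;
  }

  prompt +=
    "Be specific about colors, text content, images, and the overall design and functionality of the page.";
  return prompt;
}

const generalPrompt = buildPrompt({});
const focusedPrompt = buildPrompt({ detailed: true, pretext: "Look for cookie banners" });

console.log(generalPrompt);
console.log(focusedPrompt);
```

Note that the handler always adds a period after `pretext`, so a value that already ends in punctuation produces a doubled mark in the final prompt.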
- src/index.ts:64-70 (schema) Zod schema defining the input parameters for the analyze_screenshot tool, including options for full page screenshot, save path, analysis pretext, AI model, and detail level.

```typescript
const AnalyzeScreenshotSchema = z.object({
  fullPage: z.boolean().default(false),
  path: z.string().optional(),
  pretext: z.string().optional(),
  model: z.string().default('gemma3:4b'),
  detailed: z.boolean().default(false)
});
```
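
To show what the schema's defaults mean for a caller, the sketch below resolves omitted fields by hand. It deliberately avoids Zod so that it runs without dependencies; `AnalyzeScreenshotInput`, `ResolvedInput`, and `resolveDefaults` are hypothetical names, and the code performs no type validation, unlike `AnalyzeScreenshotSchema.parse`.

```typescript
// Caller-facing shape: every field may be omitted.
interface AnalyzeScreenshotInput {
  fullPage?: boolean;
  path?: string;
  pretext?: string;
  model?: string;
  detailed?: boolean;
}

// Shape after defaults are applied: fields with a default are always present.
interface ResolvedInput {
  fullPage: boolean;
  path?: string;
  pretext?: string;
  model: string;
  detailed: boolean;
}

// Hypothetical stand-in for the default-filling part of the Zod schema.
function resolveDefaults(input: AnalyzeScreenshotInput): ResolvedInput {
  return {
    fullPage: input.fullPage ?? false,
    path: input.path, // stays undefined; the handler picks a timestamped name
    pretext: input.pretext, // stays undefined; the handler skips the extra clause
    model: input.model ?? "gemma3:4b",
    detailed: input.detailed ?? false,
  };
}

const resolvedEmpty = resolveDefaults({});
const resolvedCustom = resolveDefaults({ fullPage: true, model: "gemma3:12b" });

console.log(resolvedEmpty);
console.log(resolvedCustom);
```

The model tag `gemma3:12b` is only an example value; the schema accepts any string, so whether a given model is installed is discovered only when the Ollama request runs.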
- src/index.ts:330-361 (registration) Tool registration in the listTools handler, defining the name, description, and inputSchema for analyze_screenshot.

```typescript
{
  name: 'analyze_screenshot',
  description: 'Take a screenshot and analyze it with AI (Gemma3) to describe what is visible on the page',
  inputSchema: {
    type: 'object',
    properties: {
      fullPage: {
        type: 'boolean',
        default: false,
        description: 'Capture full scrollable page'
      },
      path: {
        type: 'string',
        description: 'Path to save screenshot (optional)'
      },
      pretext: {
        type: 'string',
        description: 'Optional context or specific instructions for what to look for in the analysis'
      },
      model: {
        type: 'string',
        default: 'gemma3:4b',
        description: 'AI model to use for analysis (default: gemma3:4b)'
      },
      detailed: {
        type: 'boolean',
        default: false,
        description: 'Provide detailed structural analysis of the page'
      }
    }
  }
},
```