# analyze_screenshot
Analyze test screenshots using OCR and visual analysis to extract text, compare UI states, and provide detailed image analysis for QA validation.
## Instructions
🔍 Analyze test screenshot with OCR and visual analysis - returns image to Claude Vision for detailed analysis
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| screenshotUrl | No | Screenshot URL to download and analyze | |
| screenshotPath | No | Local path to screenshot file | |
| testId | No | Test ID for context | |
| enableOCR | No | Enable OCR text extraction (slower) | false |
| analysisType | No | basic=metadata+OCR only, detailed=includes image for Claude Vision | detailed |
| expectedState | No | Expected UI state for comparison | |
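
An illustrative set of arguments for the tool, shown as a TypeScript object matching the schema above. The URL and test ID are hypothetical, and presumably one of `screenshotUrl` or `screenshotPath` must be supplied even though both are individually optional:

```typescript
// Hypothetical analyze_screenshot arguments; values are for illustration only.
const args = {
  screenshotUrl: 'https://ci.example.com/artifacts/run-123/login.png', // hypothetical URL
  testId: 'login-flow-001',                                            // hypothetical test ID
  enableOCR: true,          // slower, but enables text extraction
  analysisType: 'detailed', // include the image for Claude Vision
  expectedState: 'Login screen with username and password fields visible'
};
```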
## Implementation Reference
- `src/utils/screenshot-analyzer.ts:255-299` (handler): Core handler function that implements the `analyze_screenshot` tool logic. It analyzes an image buffer for metadata (using Sharp), optional OCR (Tesseract.js), UI element detection, and device info detection. A usage sketch follows the code.

  ```typescript
  export async function analyzeScreenshot(
    buffer: Buffer,
    options: {
      enableOCR?: boolean;
      ocrLanguage?: string;
    } = {}
  ): Promise<ScreenshotAnalysis> {
    const { enableOCR = false, ocrLanguage = 'eng' } = options;

    // Extract metadata
    const metadata = await getImageMetadata(buffer);

    // Optional OCR
    let ocrResult: OCRResult | undefined;
    let uiElements: ScreenshotAnalysis['uiElements'] | undefined;

    if (enableOCR) {
      try {
        ocrResult = await extractTextOCR(buffer, { lang: ocrLanguage });
        const uiDetection = detectUIElements(ocrResult.text);
        uiElements = {
          hasLoadingIndicator: uiDetection.hasLoadingIndicator,
          hasErrorDialog: uiDetection.hasErrorDialog,
          hasEmptyState: uiDetection.hasEmptyState,
          hasNavigationBar: uiDetection.hasNavigationBar
        };
      } catch (error) {
        console.warn('OCR failed, continuing without text extraction:', error);
      }
    }

    // Device detection
    const deviceInfo = detectDeviceInfo(metadata);

    return {
      metadata,
      ocrText: ocrResult,
      deviceInfo: {
        detectedDevice: deviceInfo.detectedDevice,
        statusBarVisible: metadata.height > 2000, // Rough heuristic
        navigationBarVisible: uiElements?.hasNavigationBar
      },
      uiElements
    };
  }
  ```
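A minimal usage sketch, assuming an ESM context with top-level await and a screenshot already on disk (the path is hypothetical):

```typescript
import { readFile } from 'node:fs/promises';
import { analyzeScreenshot } from './utils/screenshot-analyzer';

const buffer = await readFile('./screenshots/login-test.png'); // hypothetical path

// Fast path: metadata only, OCR skipped by default
const quick = await analyzeScreenshot(buffer);
console.log(quick.metadata.width, quick.metadata.height, quick.metadata.format);

// Full analysis: OCR plus UI element detection (noticeably slower)
const full = await analyzeScreenshot(buffer, { enableOCR: true, ocrLanguage: 'eng' });
console.log(full.ocrText?.text);
console.log(full.uiElements?.hasErrorDialog);
```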
- Output type definition for the screenshot analysis result, including metadata, OCR results, device info, and UI elements.

  ```typescript
  export interface ScreenshotAnalysis {
    metadata: ImageMetadata;
    ocrText?: OCRResult;
    deviceInfo?: {
      detectedDevice?: string;
      statusBarVisible?: boolean;
      navigationBarVisible?: boolean;
    };
    uiElements?: {
      hasLoadingIndicator?: boolean;
      hasErrorDialog?: boolean;
      hasEmptyState?: boolean;
      hasNavigationBar?: boolean;
    };
  }
  ```
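For orientation, an illustrative result for a portrait phone screenshot might look like the following; every concrete value here is invented for the example:

```typescript
const example: ScreenshotAnalysis = {
  metadata: {
    width: 1170,
    height: 2532,
    format: 'png',
    size: 348211,           // bytes
    orientation: 'portrait',
    aspectRatio: '195:422', // raw GCD reduction of 1170:2532
    hasAlpha: false,
    colorSpace: 'srgb'
  },
  deviceInfo: {
    statusBarVisible: true, // height > 2000 heuristic from the handler
    navigationBarVisible: false
  },
  uiElements: {
    hasLoadingIndicator: false,
    hasErrorDialog: false,
    hasEmptyState: false,
    hasNavigationBar: false
  }
};
```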
- Type definition for the image metadata extracted by the Sharp library.

  ```typescript
  export interface ImageMetadata {
    width: number;
    height: number;
    format: string;
    size: number;
    orientation: 'portrait' | 'landscape' | 'square';
    aspectRatio: string;
    hasAlpha: boolean;
    colorSpace?: string;
  }
  ```
- Helper function that extracts detailed image metadata using the Sharp library.

  ```typescript
  export async function getImageMetadata(buffer: Buffer): Promise<ImageMetadata> {
    try {
      const image = sharp(buffer);
      const metadata = await image.metadata();
      const stats = await image.stats();

      const width = metadata.width || 0;
      const height = metadata.height || 0;

      let orientation: 'portrait' | 'landscape' | 'square' = 'square';
      if (width > height) orientation = 'landscape';
      else if (height > width) orientation = 'portrait';

      const gcd = (a: number, b: number): number => b === 0 ? a : gcd(b, a % b);
      const divisor = gcd(width, height);
      const aspectRatio = `${width / divisor}:${height / divisor}`;

      return {
        width,
        height,
        format: metadata.format || 'unknown',
        size: buffer.length,
        orientation,
        aspectRatio,
        hasAlpha: metadata.hasAlpha || false,
        colorSpace: metadata.space
      };
    } catch (error) {
      throw new Error(`Failed to extract image metadata: ${error instanceof Error ? error.message : error}`);
    }
  }
  ```
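Note that `aspectRatio` is the raw GCD reduction of the pixel dimensions, which does not always match the familiar marketing ratio. A standalone check of the same reduction logic:

```typescript
// Same GCD-based reduction as in getImageMetadata
const gcd = (a: number, b: number): number => (b === 0 ? a : gcd(b, a % b));
const ratio = (w: number, h: number) => `${w / gcd(w, h)}:${h / gcd(w, h)}`;

console.log(ratio(1920, 1080)); // "16:9" - the familiar form
console.log(ratio(1170, 2532)); // "195:422" - not "9:19.5", since the GCD is 6
```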
- Helper function for OCR text extraction using Tesseract.js, with configurable language and page segmentation mode (PSM).

  ```typescript
  export async function extractTextOCR(
    buffer: Buffer,
    options: {
      lang?: string;
      psm?: number;
    } = {}
  ): Promise<OCRResult> {
    const { lang = 'eng', psm = 3 } = options;

    let worker: Worker | null = null;

    try {
      worker = await createWorker(lang, 1, {
        logger: () => {}, // Suppress logs
      });

      await worker.setParameters({
        tessedit_pageseg_mode: psm as any,
      });

      const { data } = await worker.recognize(buffer);

      const words = data.words.map(word => ({
        text: word.text,
        confidence: word.confidence,
        bbox: {
          x: word.bbox.x0,
          y: word.bbox.y0,
          width: word.bbox.x1 - word.bbox.x0,
          height: word.bbox.y1 - word.bbox.y0
        }
      }));

      const lines = data.lines.map(line => line.text);

      return {
        text: data.text.trim(),
        confidence: data.confidence,
        words,
        lines
      };
    } catch (error) {
      throw new Error(`OCR extraction failed: ${error instanceof Error ? error.message : error}`);
    } finally {
      if (worker) {
        await worker.terminate();
      }
    }
  }
  ```
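A minimal usage sketch (hypothetical file path; assumes an ESM context with top-level await). PSM 3 is Tesseract's default fully automatic page segmentation; PSM 6, which assumes a single uniform block of text, can work better on dense dialog text:

```typescript
import { readFile } from 'node:fs/promises';
import { extractTextOCR } from './utils/screenshot-analyzer';

const buffer = await readFile('./screenshots/settings-screen.png'); // hypothetical path

const result = await extractTextOCR(buffer, { lang: 'eng', psm: 3 });

console.log(`Overall confidence: ${result.confidence}`);

// Keep only high-confidence words and show where they appear on screen
for (const word of result.words.filter(w => w.confidence > 80)) {
  console.log(word.text, word.bbox);
}
```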