# analyze_screenshot
Analyze test screenshots using OCR and visual analysis to extract text, compare UI states, and provide detailed image analysis for QA validation.
## Instructions
🔍 Analyze test screenshot with OCR and visual analysis - returns image to Claude Vision for detailed analysis
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| screenshotUrl | No | Screenshot URL to download and analyze | |
| screenshotPath | No | Local path to screenshot file | |
| testId | No | Test ID for context | |
| enableOCR | No | Enable OCR text extraction (slower) | false |
| analysisType | No | basic=metadata+OCR only, detailed=includes image for Claude Vision | detailed |
| expectedState | No | Expected UI state for comparison | |
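
An illustrative set of arguments for the tool, shown as a TypeScript object matching the schema above. The URL and test ID are hypothetical, and presumably one of `screenshotUrl` or `screenshotPath` must be supplied even though both are individually optional:

```typescript
// Hypothetical analyze_screenshot arguments; values are for illustration only.
const args = {
  screenshotUrl: 'https://ci.example.com/artifacts/run-123/login.png', // hypothetical URL
  testId: 'login-flow-001',                                            // hypothetical test ID
  enableOCR: true,          // slower, but enables text extraction
  analysisType: 'detailed', // include the image for Claude Vision
  expectedState: 'Login screen with username and password fields visible'
};
```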
## Implementation Reference
- `src/utils/screenshot-analyzer.ts:255-299` (handler): Core handler function that implements the `analyze_screenshot` tool logic. It analyzes an image buffer for metadata (using Sharp), optional OCR (Tesseract.js), UI element detection, and device info detection. A usage sketch follows the code.

  ```typescript
  export async function analyzeScreenshot(
    buffer: Buffer,
    options: {
      enableOCR?: boolean;
      ocrLanguage?: string;
    } = {}
  ): Promise<ScreenshotAnalysis> {
    const { enableOCR = false, ocrLanguage = 'eng' } = options;

    // Extract metadata
    const metadata = await getImageMetadata(buffer);

    // Optional OCR
    let ocrResult: OCRResult | undefined;
    let uiElements: ScreenshotAnalysis['uiElements'] | undefined;

    if (enableOCR) {
      try {
        ocrResult = await extractTextOCR(buffer, { lang: ocrLanguage });
        const uiDetection = detectUIElements(ocrResult.text);
        uiElements = {
          hasLoadingIndicator: uiDetection.hasLoadingIndicator,
          hasErrorDialog: uiDetection.hasErrorDialog,
          hasEmptyState: uiDetection.hasEmptyState,
          hasNavigationBar: uiDetection.hasNavigationBar
        };
      } catch (error) {
        console.warn('OCR failed, continuing without text extraction:', error);
      }
    }

    // Device detection
    const deviceInfo = detectDeviceInfo(metadata);

    return {
      metadata,
      ocrText: ocrResult,
      deviceInfo: {
        detectedDevice: deviceInfo.detectedDevice,
        statusBarVisible: metadata.height > 2000, // Rough heuristic
        navigationBarVisible: uiElements?.hasNavigationBar
      },
      uiElements
    };
  }
  ```
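A minimal usage sketch, assuming an ESM context with top-level await and a screenshot already on disk (the path is hypothetical):

```typescript
import { readFile } from 'node:fs/promises';
import { analyzeScreenshot } from './utils/screenshot-analyzer';

const buffer = await readFile('./screenshots/login-test.png'); // hypothetical path

// Fast path: metadata only, OCR skipped by default
const quick = await analyzeScreenshot(buffer);
console.log(quick.metadata.width, quick.metadata.height, quick.metadata.format);

// Full analysis: OCR plus UI element detection (noticeably slower)
const full = await analyzeScreenshot(buffer, { enableOCR: true, ocrLanguage: 'eng' });
console.log(full.ocrText?.text);
console.log(full.uiElements?.hasErrorDialog);
```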
- Output type definition for the screenshot analysis result, including metadata, OCR results, device info, and UI elements.

  ```typescript
  export interface ScreenshotAnalysis {
    metadata: ImageMetadata;
    ocrText?: OCRResult;
    deviceInfo?: {
      detectedDevice?: string;
      statusBarVisible?: boolean;
      navigationBarVisible?: boolean;
    };
    uiElements?: {
      hasLoadingIndicator?: boolean;
      hasErrorDialog?: boolean;
      hasEmptyState?: boolean;
      hasNavigationBar?: boolean;
    };
  }
  ```
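For orientation, an illustrative result for a portrait phone screenshot might look like the following; every concrete value here is invented for the example:

```typescript
const example: ScreenshotAnalysis = {
  metadata: {
    width: 1170,
    height: 2532,
    format: 'png',
    size: 348211,           // bytes
    orientation: 'portrait',
    aspectRatio: '195:422', // raw GCD reduction of 1170:2532
    hasAlpha: false,
    colorSpace: 'srgb'
  },
  deviceInfo: {
    statusBarVisible: true, // height > 2000 heuristic from the handler
    navigationBarVisible: false
  },
  uiElements: {
    hasLoadingIndicator: false,
    hasErrorDialog: false,
    hasEmptyState: false,
    hasNavigationBar: false
  }
};
```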
- Type definition for the image metadata extracted by the Sharp library.

  ```typescript
  export interface ImageMetadata {
    width: number;
    height: number;
    format: string;
    size: number;
    orientation: 'portrait' | 'landscape' | 'square';
    aspectRatio: string;
    hasAlpha: boolean;
    colorSpace?: string;
  }
  ```
- Helper function that extracts detailed image metadata using the Sharp library.

  ```typescript
  export async function getImageMetadata(buffer: Buffer): Promise<ImageMetadata> {
    try {
      const image = sharp(buffer);
      const metadata = await image.metadata();
      const stats = await image.stats();

      const width = metadata.width || 0;
      const height = metadata.height || 0;

      let orientation: 'portrait' | 'landscape' | 'square' = 'square';
      if (width > height) orientation = 'landscape';
      else if (height > width) orientation = 'portrait';

      const gcd = (a: number, b: number): number => b === 0 ? a : gcd(b, a % b);
      const divisor = gcd(width, height);
      const aspectRatio = `${width / divisor}:${height / divisor}`;

      return {
        width,
        height,
        format: metadata.format || 'unknown',
        size: buffer.length,
        orientation,
        aspectRatio,
        hasAlpha: metadata.hasAlpha || false,
        colorSpace: metadata.space
      };
    } catch (error) {
      throw new Error(`Failed to extract image metadata: ${error instanceof Error ? error.message : error}`);
    }
  }
  ```
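Note that `aspectRatio` is the raw GCD reduction of the pixel dimensions, which does not always match the familiar marketing ratio. A standalone check of the same reduction logic:

```typescript
// Same GCD-based reduction as in getImageMetadata
const gcd = (a: number, b: number): number => (b === 0 ? a : gcd(b, a % b));
const ratio = (w: number, h: number) => `${w / gcd(w, h)}:${h / gcd(w, h)}`;

console.log(ratio(1920, 1080)); // "16:9" - the familiar form
console.log(ratio(1170, 2532)); // "195:422" - not "9:19.5", since the GCD is 6
```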
- Helper function for OCR text extraction using Tesseract.js, with configurable language and page segmentation mode (PSM).

  ```typescript
  export async function extractTextOCR(
    buffer: Buffer,
    options: {
      lang?: string;
      psm?: number;
    } = {}
  ): Promise<OCRResult> {
    const { lang = 'eng', psm = 3 } = options;

    let worker: Worker | null = null;

    try {
      worker = await createWorker(lang, 1, {
        logger: () => {}, // Suppress logs
      });

      await worker.setParameters({
        tessedit_pageseg_mode: psm as any,
      });

      const { data } = await worker.recognize(buffer);

      const words = data.words.map(word => ({
        text: word.text,
        confidence: word.confidence,
        bbox: {
          x: word.bbox.x0,
          y: word.bbox.y0,
          width: word.bbox.x1 - word.bbox.x0,
          height: word.bbox.y1 - word.bbox.y0
        }
      }));

      const lines = data.lines.map(line => line.text);

      return {
        text: data.text.trim(),
        confidence: data.confidence,
        words,
        lines
      };
    } catch (error) {
      throw new Error(`OCR extraction failed: ${error instanceof Error ? error.message : error}`);
    } finally {
      if (worker) {
        await worker.terminate();
      }
    }
  }
  ```
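A minimal usage sketch (hypothetical file path; assumes an ESM context with top-level await). PSM 3 is Tesseract's default fully automatic page segmentation; PSM 6, which assumes a single uniform block of text, can work better on dense dialog text:

```typescript
import { readFile } from 'node:fs/promises';
import { extractTextOCR } from './utils/screenshot-analyzer';

const buffer = await readFile('./screenshots/settings-screen.png'); // hypothetical path

const result = await extractTextOCR(buffer, { lang: 'eng', psm: 3 });

console.log(`Overall confidence: ${result.confidence}`);

// Keep only high-confidence words and show where they appear on screen
for (const word of result.words.filter(w => w.confidence > 80)) {
  console.log(word.text, word.bbox);
}
```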