find_assets_in_screenshot
Locate specific image assets within webpage screenshots using template matching to return their exact positions and dimensions for layout analysis.
Instructions
Find known image assets within a screenshot. Uses template matching to locate each asset and return its position (x, y, width, height). Useful for determining where specific images appear in a webpage screenshot.
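To make the idea of template matching concrete, here is a minimal pure-Python sketch of sliding a small template over a larger grid and scoring each position with zero-mean normalized cross-correlation. The tool itself relies on OpenCV rather than this code; the names `ncc_score` and `best_match` and the list-of-lists image format are assumptions made only for illustration.

```python
from math import sqrt


def ncc_score(window, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D grids."""
    w = [v for row in window for v in row]
    t = [v for row in template for v in row]
    w_mean = sum(w) / len(w)
    t_mean = sum(t) / len(t)
    num = sum((a - w_mean) * (b - t_mean) for a, b in zip(w, t))
    den = sqrt(
        sum((a - w_mean) ** 2 for a in w) * sum((b - t_mean) ** 2 for b in t)
    )
    # A perfectly flat patch has no variance, so treat it as "no match".
    return num / den if den else 0.0


def best_match(image, template, threshold=0.8):
    """Return the best-scoring box as a dict, or None if below threshold."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_score, best_xy = -1.0, None
    for y in range(img_h - tpl_h + 1):
        for x in range(img_w - tpl_w + 1):
            window = [row[x:x + tpl_w] for row in image[y:y + tpl_h]]
            score = ncc_score(window, template)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    if best_xy is None or best_score < threshold:
        return None
    return {
        "x": best_xy[0],
        "y": best_xy[1],
        "width": tpl_w,
        "height": tpl_h,
        "confidence": best_score,
    }


# Hypothetical 4x5 grayscale "screenshot" with a 2x2 asset placed at x=2, y=1.
template = [[10, 200], [200, 10]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 10, 200, 0],
    [0, 0, 200, 10, 0],
    [0, 0, 0, 0, 0],
]
match = best_match(image, template)
blank = best_match([[0] * 5 for _ in range(4)], template)
```

The reported width and height are simply the template's own dimensions, which is why this kind of matching only finds assets rendered at their original size.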
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| screenshot_path | Yes | Absolute path to the screenshot image file | |
| asset_paths | Yes | List of absolute paths to asset images to find | |
| threshold | No | Match confidence threshold (0-1) | 0.8 |
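As a rough illustration of the schema above, a caller might assemble and sanity-check the arguments before invoking the tool. The helper name `build_arguments`, its checks, and the example paths are hypothetical and not part of the server; the schema itself only requires `screenshot_path` and `asset_paths`.

```python
import json
import os


def build_arguments(screenshot_path, asset_paths, threshold=0.8):
    """Hypothetical client-side helper: check arguments before calling the tool."""
    # The schema asks for absolute paths, so reject relative ones early.
    relative = [p for p in [screenshot_path, *asset_paths] if not os.path.isabs(p)]
    if relative:
        raise ValueError(f"Paths must be absolute: {relative}")
    # The threshold is documented as a confidence value in the 0-1 range.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be within 0-1, got {threshold}")
    return {
        "screenshot_path": screenshot_path,
        "asset_paths": list(asset_paths),
        "threshold": threshold,
    }


# Example paths are placeholders, not files that ship with the project.
arguments = build_arguments("/tmp/page.png", ["/tmp/logo.png", "/tmp/hero.png"])
print(json.dumps(arguments, indent=2))
```

Omitting `threshold` falls back to 0.8, which mirrors the default declared in the schema.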
Implementation Reference
- src/layout_detector/server.py:105-133 (handler): MCP server tool handler for 'find_assets_in_screenshot'. Validates inputs, calls the helper function find_all_assets, and formats and returns the result as JSON.

  ```python
  if name == "find_assets_in_screenshot":
      screenshot_path = arguments["screenshot_path"]
      asset_paths = arguments["asset_paths"]
      threshold = arguments.get("threshold", 0.8)

      # Validate paths
      if not Path(screenshot_path).exists():
          return [TextContent(
              type="text",
              text=json.dumps({"error": f"Screenshot not found: {screenshot_path}"})
          )]

      missing_assets = [p for p in asset_paths if not Path(p).exists()]
      if missing_assets:
          return [TextContent(
              type="text",
              text=json.dumps({"error": f"Assets not found: {missing_assets}"})
          )]

      # Find assets
      matches = find_all_assets(screenshot_path, asset_paths, threshold)

      result = {
          "found": len(matches),
          "total_assets": len(asset_paths),
          "matches": [m.to_dict() for m in matches],
      }

      return [TextContent(type="text", text=json.dumps(result, indent=2))]
  ```
- src/layout_detector/server.py:23-50 (registration): Registration of the 'find_assets_in_screenshot' tool in the list_tools handler, including name, description, and input schema.

  ```python
  Tool(
      name="find_assets_in_screenshot",
      description=(
          "Find known image assets within a screenshot. Uses template matching "
          "to locate each asset and return its position (x, y, width, height). "
          "Useful for determining where specific images appear in a webpage screenshot."
      ),
      inputSchema={
          "type": "object",
          "properties": {
              "screenshot_path": {
                  "type": "string",
                  "description": "Absolute path to the screenshot image file",
              },
              "asset_paths": {
                  "type": "array",
                  "items": {"type": "string"},
                  "description": "List of absolute paths to asset images to find",
              },
              "threshold": {
                  "type": "number",
                  "description": "Match confidence threshold (0-1). Default 0.8",
                  "default": 0.8,
              },
          },
          "required": ["screenshot_path", "asset_paths"],
      },
  ),
  ```
- Helper function called by the handler. Loops through asset paths and uses find_asset_in_screenshot to collect all matches.

  ```python
  def find_all_assets(
      screenshot_path: str,
      asset_paths: list[str],
      threshold: float = 0.8,
  ) -> list[AssetMatch]:
      """
      Find all provided assets within a screenshot.

      Args:
          screenshot_path: Path to the screenshot image
          asset_paths: List of paths to asset images to find
          threshold: Minimum confidence score (0-1) for matches

      Returns:
          List of AssetMatch for all found assets
      """
      matches = []
      for asset_path in asset_paths:
          match = find_asset_in_screenshot(screenshot_path, asset_path, threshold)
          if match:
              matches.append(match)
      return matches
  ```
- Core implementation using OpenCV template matching to find a single asset in the screenshot and return an AssetMatch with bounding box and confidence.

  ```python
  def find_asset_in_screenshot(
      screenshot_path: str,
      asset_path: str,
      threshold: float = 0.8,
  ) -> AssetMatch | None:
      """
      Find a single asset within a screenshot using template matching.

      Args:
          screenshot_path: Path to the screenshot image
          asset_path: Path to the asset/template image to find
          threshold: Minimum confidence score (0-1) for a match

      Returns:
          AssetMatch if found above threshold, None otherwise
      """
      # Load images
      screenshot = load_image(screenshot_path)
      template = load_image(asset_path)

      # Get dimensions
      template_h, template_w = template.shape[:2]

      # Convert to same format for matching
      # Handle alpha channel if present
      if len(screenshot.shape) == 3 and screenshot.shape[2] == 4:
          screenshot_gray = cv2.cvtColor(screenshot, cv2.COLOR_BGRA2GRAY)
      elif len(screenshot.shape) == 3:
          screenshot_gray = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)
      else:
          screenshot_gray = screenshot

      if len(template.shape) == 3 and template.shape[2] == 4:
          template_gray = cv2.cvtColor(template, cv2.COLOR_BGRA2GRAY)
      elif len(template.shape) == 3:
          template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
      else:
          template_gray = template

      # Template matching
      result = cv2.matchTemplate(screenshot_gray, template_gray, cv2.TM_CCOEFF_NORMED)

      # Find best match
      min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

      if max_val >= threshold:
          return AssetMatch(
              asset_path=asset_path,
              asset_name=os.path.basename(asset_path),
              bbox=BoundingBox(
                  x=max_loc[0],
                  y=max_loc[1],
                  width=template_w,
                  height=template_h,
              ),
              confidence=float(max_val),
          )

      return None
  ```
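The handler returns a JSON string that carries either an `error` key or the `found`, `total_assets`, and `matches` keys. Assets that score below the threshold are left out of `matches` without being listed anywhere, so a caller has to compare the two counts to notice misses. The sketch below shows one way a client could do that. The function name `summarize_response` and the sample payloads are assumptions, and the entries inside `matches` are treated as opaque because the output of `AssetMatch.to_dict()` is not shown in this reference.

```python
import json


def summarize_response(text):
    """Hypothetical client-side helper: interpret the tool's JSON text result."""
    payload = json.loads(text)
    if "error" in payload:
        # Missing screenshot or asset files are reported through an "error" key.
        return {"ok": False, "error": payload["error"]}
    return {
        "ok": True,
        "found": payload["found"],
        # Assets below the threshold are absent from "matches",
        # so the only sign of a miss is the difference in counts.
        "missing": payload["total_assets"] - payload["found"],
        "all_found": payload["found"] == payload["total_assets"],
    }


# Sample payloads built for illustration. The match entry's field is an
# assumption; only the top-level keys are taken from the handler code.
success_text = json.dumps(
    {"found": 1, "total_assets": 2, "matches": [{"asset_name": "logo.png"}]}
)
error_text = json.dumps({"error": "Screenshot not found: /tmp/page.png"})

success = summarize_response(success_text)
failure = summarize_response(error_text)
```

When `missing` is non-zero, lowering `threshold` and calling the tool again is one option, at the cost of more false positives.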