find_assets_in_screenshot

Locate specific image assets within webpage screenshots using template matching to return their exact positions and dimensions for layout analysis.

Instructions

Find known image assets within a screenshot. Uses template matching to locate each asset and return its position (x, y, width, height). Useful for determining where specific images appear in a webpage screenshot.
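
To illustrate the idea behind template matching, the following pure-Python sketch slides a small template over a grid of pixel values and scores each position with a normalized correlation coefficient, the same kind of score that OpenCV's TM_CCOEFF_NORMED mode produces. It is an illustration only, not the server's code: the function name best_match and the toy 5x6 image are hypothetical, and a real implementation would rely on an optimized library.

```python
import math


def best_match(image, template):
    """Return (score, (x, y)) for the best template position in image.

    Both arguments are lists of rows of grayscale values. The score is a
    normalized correlation coefficient in the range [-1, 1].
    """
    image_h, image_w = len(image), len(image[0])
    template_h, template_w = len(template), len(template[0])

    # Subtract the template mean once; it does not depend on the position.
    t_flat = [v for row in template for v in row]
    t_mean = sum(t_flat) / len(t_flat)
    t_zero = [v - t_mean for v in t_flat]
    t_norm = sum(v * v for v in t_zero)

    best_score, best_loc = -1.0, (0, 0)
    for y in range(image_h - template_h + 1):
        for x in range(image_w - template_w + 1):
            patch = [
                image[y + dy][x + dx]
                for dy in range(template_h)
                for dx in range(template_w)
            ]
            p_mean = sum(patch) / len(patch)
            p_zero = [v - p_mean for v in patch]
            p_norm = sum(v * v for v in p_zero)

            # A flat patch (or flat template) has no variance; score it as 0.
            denom = math.sqrt(t_norm * p_norm)
            score = sum(a * b for a, b in zip(t_zero, p_zero)) / denom if denom else 0.0

            if score > best_score:
                best_score, best_loc = score, (x, y)

    return best_score, best_loc


# Toy data: the 2x2 template is embedded at column 2, row 1 of the image.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0, 0],
    [0, 0, 2, 8, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 1], [2, 8]]

score, (x, y) = best_match(image, template)
found = score >= 0.8  # same comparison the tool applies with its threshold
```

The best position is the top-left corner of the match; combined with the template's own width and height, it gives the bounding box (x, y, width, height) that the tool reports.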

Input Schema

Name             Required  Description                                     Default
screenshot_path  Yes       Absolute path to the screenshot image file      -
asset_paths      Yes       List of absolute paths to asset images to find  -
threshold        No        Match confidence threshold (0-1)                0.8
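
As a rough illustration of the constraints this schema describes, a caller could check arguments before invoking the tool. The sketch below is an assumption, not part of the server: validate_arguments is a hypothetical helper, and the absolute-path and 0-1 range checks come from the schema descriptions rather than from any check the server is known to perform.

```python
import os

DEFAULT_THRESHOLD = 0.8  # default stated in the input schema


def validate_arguments(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: list[str] = []

    screenshot_path = arguments.get("screenshot_path")
    if not isinstance(screenshot_path, str):
        problems.append("screenshot_path is required and must be a string")
    elif not os.path.isabs(screenshot_path):
        problems.append("screenshot_path must be an absolute path")

    asset_paths = arguments.get("asset_paths")
    if not isinstance(asset_paths, list) or not all(
        isinstance(p, str) for p in asset_paths
    ):
        problems.append("asset_paths is required and must be a list of strings")
    else:
        relative = [p for p in asset_paths if not os.path.isabs(p)]
        if relative:
            problems.append(f"asset_paths must be absolute: {relative}")

    # threshold is optional; bool is excluded because it is a subclass of int.
    threshold = arguments.get("threshold", DEFAULT_THRESHOLD)
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 <= threshold <= 1:
        problems.append("threshold must be a number between 0 and 1")

    return problems
```

An empty return value means the arguments satisfy the schema as read here; whether the files actually exist is a separate check.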

Implementation Reference

  • MCP server tool handler for 'find_assets_in_screenshot'. Checks that the screenshot and every asset path exist, calls the helper function find_all_assets, and returns the result as JSON text.
    if name == "find_assets_in_screenshot":
        screenshot_path = arguments["screenshot_path"]
        asset_paths = arguments["asset_paths"]
        threshold = arguments.get("threshold", 0.8)
    
        # Validate paths
        if not Path(screenshot_path).exists():
            return [TextContent(
                type="text",
                text=json.dumps({"error": f"Screenshot not found: {screenshot_path}"})
            )]
    
        missing_assets = [p for p in asset_paths if not Path(p).exists()]
        if missing_assets:
            return [TextContent(
                type="text",
                text=json.dumps({"error": f"Assets not found: {missing_assets}"})
            )]
    
        # Find assets
        matches = find_all_assets(screenshot_path, asset_paths, threshold)
    
        result = {
            "found": len(matches),
            "total_assets": len(asset_paths),
            "matches": [m.to_dict() for m in matches],
        }
    
        return [TextContent(type="text", text=json.dumps(result, indent=2))]
  • Registration of the 'find_assets_in_screenshot' tool in the list_tools handler, including name, description, and input schema.
    Tool(
        name="find_assets_in_screenshot",
        description=(
            "Find known image assets within a screenshot. Uses template matching "
            "to locate each asset and return its position (x, y, width, height). "
            "Useful for determining where specific images appear in a webpage screenshot."
        ),
        inputSchema={
            "type": "object",
            "properties": {
                "screenshot_path": {
                    "type": "string",
                    "description": "Absolute path to the screenshot image file",
                },
                "asset_paths": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of absolute paths to asset images to find",
                },
                "threshold": {
                    "type": "number",
                    "description": "Match confidence threshold (0-1). Default 0.8",
                    "default": 0.8,
                },
            },
            "required": ["screenshot_path", "asset_paths"],
        },
    ),
  • Helper function called by the handler. Loops through the asset paths, calls find_asset_in_screenshot for each one, and collects the matches; assets with no match above the threshold are left out of the returned list.
    def find_all_assets(
        screenshot_path: str,
        asset_paths: list[str],
        threshold: float = 0.8,
    ) -> list[AssetMatch]:
        """
        Find all provided assets within a screenshot.
    
        Args:
            screenshot_path: Path to the screenshot image
            asset_paths: List of paths to asset images to find
            threshold: Minimum confidence score (0-1) for matches
    
        Returns:
            List of AssetMatch for all found assets
        """
        matches = []
    
        for asset_path in asset_paths:
            match = find_asset_in_screenshot(screenshot_path, asset_path, threshold)
            if match:
                matches.append(match)
    
        return matches
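  • Core implementation. Converts the screenshot and the asset to grayscale, runs OpenCV template matching (cv2.TM_CCOEFF_NORMED), and returns an AssetMatch with bounding box and confidence for the single best-scoring position, or None if that score is below the threshold.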
    def find_asset_in_screenshot(
        screenshot_path: str,
        asset_path: str,
        threshold: float = 0.8,
    ) -> AssetMatch | None:
        """
        Find a single asset within a screenshot using template matching.
    
        Args:
            screenshot_path: Path to the screenshot image
            asset_path: Path to the asset/template image to find
            threshold: Minimum confidence score (0-1) for a match
    
        Returns:
            AssetMatch if found above threshold, None otherwise
        """
        # Load images
        screenshot = load_image(screenshot_path)
        template = load_image(asset_path)
    
        # Get dimensions
        template_h, template_w = template.shape[:2]
    
        # Convert to same format for matching
        # Handle alpha channel if present
        if len(screenshot.shape) == 3 and screenshot.shape[2] == 4:
            screenshot_gray = cv2.cvtColor(screenshot, cv2.COLOR_BGRA2GRAY)
        elif len(screenshot.shape) == 3:
            screenshot_gray = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)
        else:
            screenshot_gray = screenshot
    
        if len(template.shape) == 3 and template.shape[2] == 4:
            template_gray = cv2.cvtColor(template, cv2.COLOR_BGRA2GRAY)
        elif len(template.shape) == 3:
            template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        else:
            template_gray = template
    
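        # Template matching. cv2.matchTemplate raises an error when the template
        # is larger than the image, so treat that case as "not found".
        screenshot_h, screenshot_w = screenshot_gray.shape[:2]
        if template_h > screenshot_h or template_w > screenshot_w:
            return None
    
        result = cv2.matchTemplate(screenshot_gray, template_gray, cv2.TM_CCOEFF_NORMED)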
    
        # Find best match
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
    
        if max_val >= threshold:
            return AssetMatch(
                asset_path=asset_path,
                asset_name=os.path.basename(asset_path),
                bbox=BoundingBox(
                    x=max_loc[0],
                    y=max_loc[1],
                    width=template_w,
                    height=template_h,
                ),
                confidence=float(max_val),
            )
    
        return None
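
The code above relies on AssetMatch and BoundingBox, whose definitions are not shown here. The following is a minimal sketch of what they might look like, assuming plain dataclasses. The field names come from the constructor calls above; the to_dict output shape, the build_result helper, and the sample values are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BoundingBox:
    # Top-left corner and size of the matched region, in pixels.
    x: int
    y: int
    width: int
    height: int


@dataclass
class AssetMatch:
    asset_path: str
    asset_name: str
    bbox: BoundingBox
    confidence: float

    def to_dict(self) -> dict:
        # asdict recurses into nested dataclasses, so bbox becomes a plain dict.
        # Assumed shape: the real to_dict may differ.
        return asdict(self)


def build_result(matches: list[AssetMatch], total_assets: int) -> dict:
    """Hypothetical helper mirroring the result dict assembled in the handler."""
    return {
        "found": len(matches),
        "total_assets": total_assets,
        "matches": [m.to_dict() for m in matches],
    }


# Hypothetical values: one of two requested assets was located.
match = AssetMatch(
    asset_path="/tmp/assets/logo.png",
    asset_name="logo.png",
    bbox=BoundingBox(x=24, y=16, width=120, height=40),
    confidence=0.97,
)
payload = build_result([match], total_assets=2)
print(json.dumps(payload, indent=2))
```

Under these assumptions, a caller can compare found against total_assets to see which assets were not located, since unmatched assets do not appear in matches.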

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/katlis/layout-detector-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.