Glama

find_assets_in_screenshot

Locate specific image assets within webpage screenshots using template matching, returning each asset's best-match position and dimensions for layout analysis.

Instructions

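Find known image assets within a screenshot. Uses template matching to locate each asset and return its position (x, y, width, height). Useful for determining where specific images appear in a webpage screenshot. For each asset, only the single best-scoring location is reported. Assets whose best score falls below the threshold are left out of the results.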
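The reported (x, y, width, height) follows the usual image convention. (x, y) is the top-left corner of the matched region in pixels, with y increasing downward. Width and height are the asset's own dimensions. The following minimal sketch shows how such a box converts to edges and a center point for simple layout checks. It uses a hypothetical `Box` type, not the server's own `BoundingBox` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a match's bounding box (pixels)."""

    x: int  # left edge (column of the top-left corner)
    y: int  # top edge (row of the top-left corner; y grows downward)
    width: int
    height: int

    @property
    def right(self) -> int:
        # Exclusive right edge: the last covered column is right - 1
        return self.x + self.width

    @property
    def bottom(self) -> int:
        # Exclusive bottom edge: the last covered row is bottom - 1
        return self.y + self.height

    @property
    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap only when they intersect on both axes
        return (
            self.x < other.right
            and other.x < self.right
            and self.y < other.bottom
            and other.y < self.bottom
        )


logo = Box(x=24, y=16, width=120, height=40)
hero = Box(x=0, y=80, width=1280, height=400)
print(logo.right, logo.bottom, logo.center)  # 144 56 (84.0, 36.0)
print(logo.overlaps(hero))  # False
```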

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| screenshot_path | Yes | Absolute path to the screenshot image file | |
| asset_paths | Yes | List of absolute paths to asset images to find | |
| threshold | No | Match confidence threshold (0-1) | 0.8 |
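The handler in the Implementation Reference only checks that the files exist. It does not reject a wrongly typed argument or a threshold outside 0-1. A threshold above 1 can never match, and a very low one accepts almost anything.

A stricter check could look like the following minimal sketch. The function name `validate_arguments` and its error messages are hypothetical. It applies the schema's 0.8 default when `threshold` is omitted.

```python
from typing import Any

DEFAULT_THRESHOLD = 0.8  # mirrors the schema's "default": 0.8


def validate_arguments(arguments: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator for find_assets_in_screenshot arguments.

    Returns a normalized copy with the default threshold filled in,
    or raises ValueError describing the first problem found.
    """
    for key in ("screenshot_path", "asset_paths"):
        if key not in arguments:
            raise ValueError(f"missing required argument: {key}")

    if not isinstance(arguments["screenshot_path"], str):
        raise ValueError("screenshot_path must be a string")

    asset_paths = arguments["asset_paths"]
    if not isinstance(asset_paths, list) or not all(isinstance(p, str) for p in asset_paths):
        raise ValueError("asset_paths must be a list of strings")

    threshold = arguments.get("threshold", DEFAULT_THRESHOLD)
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ValueError("threshold must be a number")
    # NaN fails both comparisons, so it is rejected here as well
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")

    return {**arguments, "threshold": float(threshold)}


args = validate_arguments({"screenshot_path": "/tmp/page.png", "asset_paths": ["/tmp/logo.png"]})
print(args["threshold"])  # 0.8
```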

Implementation Reference

  • MCP server tool handler for 'find_assets_in_screenshot'. Checks that the screenshot and every asset file exist (returning a JSON object with an "error" key if not), calls the helper function find_all_assets, and returns the result as JSON.
    if name == "find_assets_in_screenshot":
        screenshot_path = arguments["screenshot_path"]
        asset_paths = arguments["asset_paths"]
        threshold = arguments.get("threshold", 0.8)
    
        # Validate paths
        if not Path(screenshot_path).exists():
            return [TextContent(
                type="text",
                text=json.dumps({"error": f"Screenshot not found: {screenshot_path}"})
            )]
    
        missing_assets = [p for p in asset_paths if not Path(p).exists()]
        if missing_assets:
            return [TextContent(
                type="text",
                text=json.dumps({"error": f"Assets not found: {missing_assets}"})
            )]
    
        # Find assets
        matches = find_all_assets(screenshot_path, asset_paths, threshold)
    
        result = {
            "found": len(matches),
            "total_assets": len(asset_paths),
            "matches": [m.to_dict() for m in matches],
        }
    
        return [TextContent(type="text", text=json.dumps(result, indent=2))]
  • Registration of the 'find_assets_in_screenshot' tool in the list_tools handler, including name, description, and input schema.
    Tool(
        name="find_assets_in_screenshot",
        description=(
            "Find known image assets within a screenshot. Uses template matching "
            "to locate each asset and return its position (x, y, width, height). "
            "Useful for determining where specific images appear in a webpage screenshot."
        ),
        inputSchema={
            "type": "object",
            "properties": {
                "screenshot_path": {
                    "type": "string",
                    "description": "Absolute path to the screenshot image file",
                },
                "asset_paths": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of absolute paths to asset images to find",
                },
                "threshold": {
                    "type": "number",
                    "description": "Match confidence threshold (0-1). Default 0.8",
                    "default": 0.8,
                },
            },
            "required": ["screenshot_path", "asset_paths"],
        },
    ),
  • Helper function called by the handler. Loops through asset paths and uses find_asset_in_screenshot to collect all matches.
    def find_all_assets(
        screenshot_path: str,
        asset_paths: list[str],
        threshold: float = 0.8,
    ) -> list[AssetMatch]:
        """
        Find all provided assets within a screenshot.
    
        Args:
            screenshot_path: Path to the screenshot image
            asset_paths: List of paths to asset images to find
            threshold: Minimum confidence score (0-1) for matches
    
        Returns:
            List of AssetMatch for all found assets
        """
        matches = []
    
        for asset_path in asset_paths:
            match = find_asset_in_screenshot(screenshot_path, asset_path, threshold)
            if match:
                matches.append(match)
    
        return matches
  • Core implementation using OpenCV template matching to find a single asset in the screenshot and return an AssetMatch with bounding box and confidence.
    def find_asset_in_screenshot(
        screenshot_path: str,
        asset_path: str,
        threshold: float = 0.8,
    ) -> AssetMatch | None:
        """
        Find a single asset within a screenshot using template matching.
    
        Args:
            screenshot_path: Path to the screenshot image
            asset_path: Path to the asset/template image to find
            threshold: Minimum confidence score (0-1) for a match
    
        Returns:
            AssetMatch if found above threshold, None otherwise
        """
        # Load images
        screenshot = load_image(screenshot_path)
        template = load_image(asset_path)
    
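        # Get dimensions
        template_h, template_w = template.shape[:2]
        screenshot_h, screenshot_w = screenshot.shape[:2]
    
        # A template larger than the screenshot cannot appear in it, and
        # cv2.matchTemplate expects the template to fit within the image
        if template_h > screenshot_h or template_w > screenshot_w:
            return None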
    
        # Convert to same format for matching
        # Handle alpha channel if present
        if len(screenshot.shape) == 3 and screenshot.shape[2] == 4:
            screenshot_gray = cv2.cvtColor(screenshot, cv2.COLOR_BGRA2GRAY)
        elif len(screenshot.shape) == 3:
            screenshot_gray = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)
        else:
            screenshot_gray = screenshot
    
        if len(template.shape) == 3 and template.shape[2] == 4:
            template_gray = cv2.cvtColor(template, cv2.COLOR_BGRA2GRAY)
        elif len(template.shape) == 3:
            template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        else:
            template_gray = template
    
        # Template matching
        result = cv2.matchTemplate(screenshot_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    
        # Find best match
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
    
        if max_val >= threshold:
            return AssetMatch(
                asset_path=asset_path,
                asset_name=os.path.basename(asset_path),
                bbox=BoundingBox(
                    x=max_loc[0],
                    y=max_loc[1],
                    width=template_w,
                    height=template_h,
                ),
                confidence=float(max_val),
            )
    
        return None
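The source does not show `AssetMatch.to_dict()`. Assume it serializes the fields that find_asset_in_screenshot sets (asset_path, asset_name, bbox, confidence), with the bounding box as a nested object. Under that assumption, the handler's JSON response would take roughly the shape produced below. The dataclasses here are hypothetical stand-ins, not the server's definitions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BoundingBox:  # hypothetical stand-in for the server's class
    x: int
    y: int
    width: int
    height: int


@dataclass
class AssetMatch:  # hypothetical stand-in for the server's class
    asset_path: str
    asset_name: str
    bbox: BoundingBox
    confidence: float

    def to_dict(self) -> dict:
        # Assumed serialization: asdict() recurses into the nested BoundingBox
        return asdict(self)


def build_result(matches: list[AssetMatch], asset_paths: list[str]) -> dict:
    # Same three keys the handler emits
    return {
        "found": len(matches),
        "total_assets": len(asset_paths),
        "matches": [m.to_dict() for m in matches],
    }


match = AssetMatch(
    asset_path="/tmp/logo.png",
    asset_name="logo.png",
    bbox=BoundingBox(x=24, y=16, width=120, height=40),
    confidence=0.97,
)
result = build_result([match], ["/tmp/logo.png", "/tmp/banner.png"])
print(json.dumps(result, indent=2))
```

Here one of two assets met the threshold, so `found` is 1 while `total_assets` is 2. A client can detect unmatched assets by comparing the two counts, or by diffing the `asset_path` values against its request.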
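The default threshold of 0.8 appears in three places:

- the registration schema's `"default"`
- its description string
- the handler's `arguments.get("threshold", 0.8)`

One way to keep these in sync is to read defaults from the `inputSchema` dictionary itself. The following sketch shows that approach as an assumption, not as what the server does. `apply_schema_defaults` is a hypothetical helper.

```python
from typing import Any

# Trimmed copy of the tool's inputSchema (descriptions omitted)
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "screenshot_path": {"type": "string"},
        "asset_paths": {"type": "array", "items": {"type": "string"}},
        "threshold": {"type": "number", "default": 0.8},
    },
    "required": ["screenshot_path", "asset_paths"],
}


def apply_schema_defaults(schema: dict[str, Any], arguments: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fill omitted properties from their schema defaults."""
    filled = dict(arguments)  # never mutate the caller's dict
    for name, spec in schema.get("properties", {}).items():
        if name not in filled and "default" in spec:
            filled[name] = spec["default"]
    return filled


args = apply_schema_defaults(INPUT_SCHEMA, {"screenshot_path": "/tmp/page.png", "asset_paths": []})
print(args["threshold"])  # 0.8
```

Properties without a `default` (the two required paths) are left alone, so missing required arguments still have to be caught separately.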
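find_all_assets calls find_asset_in_screenshot once per asset, and that function calls `load_image` on the screenshot every time. As a result, the same screenshot is decoded N times for N assets.

The following minimal sketch loads the screenshot once and reuses it. It assumes the loader and the single-template matcher can be passed in as functions. `match_all`, `fake_load`, and `fake_match` are hypothetical names. The fakes stand in for image decoding and matching only to show the call count. Applying this to the server would mean splitting find_asset_in_screenshot so it accepts already-loaded images.

```python
from typing import Any, Callable, Optional


def match_all(
    screenshot_path: str,
    asset_paths: list[str],
    load: Callable[[str], Any],
    match_one: Callable[[Any, Any, str], Optional[Any]],
) -> list[Any]:
    """Hypothetical variant of find_all_assets that decodes the screenshot once."""
    screenshot = load(screenshot_path)  # loaded a single time, outside the loop
    matches = []
    for asset_path in asset_paths:
        match = match_one(screenshot, load(asset_path), asset_path)
        if match is not None:
            matches.append(match)
    return matches


loads: list[str] = []


def fake_load(path: str) -> str:
    loads.append(path)  # record every decode
    return path


def fake_match(screenshot: str, template: str, asset_path: str) -> Optional[str]:
    # Pretend only the logo is found in the screenshot
    return asset_path if "logo" in template else None


found = match_all("page.png", ["logo.png", "icon.png", "banner.png"], fake_load, fake_match)
print(found)  # ['logo.png']
print(loads.count("page.png"))  # 1
```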
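cv2.TM_CCOEFF_NORMED scores each placement of the template as follows:

1. Subtract the template's mean from the template, and the image patch's mean from the patch under it.
2. Correlate the two mean-subtracted arrays.
3. Divide by the product of their norms.

Scores run from -1 to 1. A score of 1 means the patch equals the template up to a brightness and contrast change (a positive linear transform).

The following brute-force NumPy sketch of that score is for illustration only, since OpenCV's implementation is far faster. Treating zero-variance patches as score 0 is a choice made in this sketch, not necessarily OpenCV's behavior. Note that NumPy indexes (row, column), so the best location comes back as (y, x), whereas `cv2.minMaxLoc` reports `max_loc` as (x, y).

```python
import numpy as np


def ccoeff_normed(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Brute-force normalized correlation coefficient for 2-D grayscale arrays.

    Returns an array of shape (H - h + 1, W - w + 1), like cv2.matchTemplate.
    """
    H, W = image.shape
    h, w = template.shape
    t = template.astype(np.float64)
    t = t - t.mean()
    t_norm = np.sqrt((t * t).sum())
    scores = np.zeros((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = image[y:y + h, x:x + w].astype(np.float64)
            patch = patch - patch.mean()
            denom = t_norm * np.sqrt((patch * patch).sum())
            # Zero-variance patch or template: undefined score, use 0 in this sketch
            scores[y, x] = (t * patch).sum() / denom if denom > 0 else 0.0
    return scores


rng = np.random.default_rng(0)
screenshot = rng.integers(0, 256, size=(40, 60)).astype(np.uint8)
template = screenshot[12:20, 30:45].copy()  # 8 rows x 15 columns, top-left at x=30, y=12

scores = ccoeff_normed(screenshot, template)
y, x = np.unravel_index(np.argmax(scores), scores.shape)  # (row, column) order
print(int(x), int(y), round(float(scores[y, x]), 6))  # 30 12 1.0
```

Because the score is mean- and norm-normalized, the same location still scores 1.0 if the template is uniformly brightened or its contrast is scaled. The threshold check (`max_val >= threshold`) in find_asset_in_screenshot is applied to exactly this kind of score.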

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/katlis/layout-detector-mcp'
