describe_screenshot
Analyze screenshots by detecting UI regions with vision models, returning bounding boxes and descriptive labels.
Instructions
Describe UI regions in a screenshot using Florence-2 by default, or MiniCPM-V 4.6 when model_mode is 'deep'.
Args:
image_path: Absolute or relative path to the screenshot file (supports PNG, JPEG, SVG).
detail_level: 'normal' for dense region captions, 'high' for per-region descriptions.
model_mode: 'fast' for Florence-2 (default), 'deep' for MiniCPM-V 4.6 (better document understanding).
Returns: Dict with detected regions (bounding boxes and labels) and model name.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
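| image_path | Yes | Absolute or relative path to the screenshot file (supports PNG, JPEG, SVG). | |
| model_mode | No | 'fast' for Florence-2, 'deep' for MiniCPM-V 4.6 (better document understanding). | fast |
| detail_level | No | 'normal' for dense region captions, 'high' for per-region descriptions. | normal |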
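
The input schema can be illustrated with a small argument builder. This is a minimal sketch, not part of the tool itself: the helper name `build_describe_screenshot_args` and the example path are hypothetical, and only the parameter names, allowed values, and defaults are taken from the description above.

```python
from typing import Dict

# Allowed values as listed in the Args description.
DETAIL_LEVELS = {"normal", "high"}
MODEL_MODES = {"fast", "deep"}


def build_describe_screenshot_args(
    image_path: str,
    detail_level: str = "normal",
    model_mode: str = "fast",
) -> Dict[str, str]:
    """Hypothetical helper: validate and assemble describe_screenshot arguments."""
    if not image_path:
        raise ValueError("image_path is required")
    if detail_level not in DETAIL_LEVELS:
        raise ValueError(f"detail_level must be one of {sorted(DETAIL_LEVELS)}")
    if model_mode not in MODEL_MODES:
        raise ValueError(f"model_mode must be one of {sorted(MODEL_MODES)}")
    return {
        "image_path": image_path,
        "detail_level": detail_level,
        "model_mode": model_mode,
    }


# Example: request the MiniCPM-V path while keeping the default detail level.
args = build_describe_screenshot_args("screenshots/login.png", model_mode="deep")
print(args)
```

Validating on the client side is a design choice for the sketch; the tool may well perform its own checks and report errors differently.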
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
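
Since the output schema lists no fields, the exact shape of the returned dict is not documented here. The sketch below shows one way a caller might filter regions by size, assuming a layout that is purely hypothetical: a `regions` list whose items carry a `bbox` of `[x1, y1, x2, y2]` pixel coordinates and a `label` string, plus a `model` key. The sample data is made up for illustration.

```python
from typing import Any, Dict, List


def large_regions(result: Dict[str, Any], min_area: float) -> List[str]:
    """Return labels of regions whose bounding-box area is at least min_area.

    Assumes (hypothetically) result["regions"] is a list of dicts with
    "bbox" = [x1, y1, x2, y2] and "label" = str.
    """
    labels = []
    for region in result.get("regions", []):
        x1, y1, x2, y2 = region["bbox"]
        # Clamp to zero so malformed boxes never yield a negative area.
        area = max(0, x2 - x1) * max(0, y2 - y1)
        if area >= min_area:
            labels.append(region["label"])
    return labels


# Made-up sample result, not real tool output.
sample = {
    "model": "example-model",
    "regions": [
        {"bbox": [0, 0, 200, 50], "label": "top navigation bar"},
        {"bbox": [10, 60, 30, 80], "label": "small icon"},
    ],
}
print(large_regions(sample, min_area=1000))
```

If the real result uses different key names or a different box format (for example `[x, y, width, height]`), only the unpacking line needs to change.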