qwen_vision_understand
Analyze images with a vision model to generate code from UI screenshots, interpret design mockups, debug visuals, and understand charts or documents.
Instructions
Analyze an image using Alibaba Qwen3.7-plus (multimodal vision model). Supports local image files and remote URLs. Reaches DashScope via the Anthropic-compatible endpoint (https://dashscope.aliyuncs.com/apps/anthropic). Excels at: UI screenshot→code, design mockup analysis, visual debugging, chart/document understanding.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Image source: local file path (e.g. C:/path/to/screenshot.png) or URL (https://...) | |
| prompt | Yes | What to ask about the image. Be specific for best results. E.g.: 'Recreate this UI as HTML with Tailwind CSS' | |
| max_tokens | No | Maximum output tokens | |
| temperature | No | Sampling temperature (0-2) |