agnes_vision
Analyze one or more images with a text instruction to describe, OCR, or answer questions about the visual content.
Instructions
Capability 4 — Multimodal understanding. Send one or more images (public URLs or data URIs) plus a text instruction and the model describes, analyzes, OCRs, or answers questions about the visual content. Models: agnes-2.0-flash, agnes-1.5-flash (both accept image_url input).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Vision-capable chat model. | agnes-2.0-flash |
| images | Yes | One or more publicly accessible image URLs or data:image/...;base64,... URIs. | |
| instruction | No | What to do with the image(s). | Describe the content of this image. |
| system | No | Optional system prompt. | |
| temperature | No | ||
| max_tokens | No | Max output tokens (up to 1M context). | |
| stream | No |