vision-analyze
Analyze any image URL: describe scenes, extract text, interpret charts, evaluate interfaces, identify objects, or answer questions using GPT-4o-mini vision.
Instructions
Analyze any image URL using GPT-4o-mini vision. Returns a structured analysis whose content depends on the mode: describe (full description), ocr (text extraction), chart (data and trend extraction), ui (interface analysis), identify (object or subject identification), or qa (answer a specific question about the image). The input must be a publicly accessible image URL in JPEG, PNG, GIF, or WebP format. Each call costs $0.050.
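Because the tool only accepts publicly reachable JPEG, PNG, GIF, or WebP images (up to 20MB, per the schema below), a client may want to check an image before paying for a call. The sketch below is a hypothetical pre-flight check, not part of the tool: it assumes you have already fetched the URL's response headers (for example with an HTTP HEAD request) into a dict with lowercase keys, and it assumes "20MB" means 20 MiB. A missing Content-Length is treated as unknown rather than as a failure.

```python
from typing import Mapping, Optional

# Content types the tool accepts, per its description.
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}
# Assumption: "20MB" is interpreted as 20 MiB (20 * 1024 * 1024 bytes).
MAX_BYTES = 20 * 1024 * 1024


def preflight_problem(headers: Mapping[str, str]) -> Optional[str]:
    """Return a reason the image would likely be rejected, or None if it looks acceptable.

    `headers` is a hypothetical mapping of HTTP response headers for the image URL,
    with lowercase keys (e.g. taken from a HEAD request made by your own HTTP client).
    """
    # Drop parameters such as "; charset=binary" and normalize case.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in ALLOWED_CONTENT_TYPES:
        return f"unsupported content-type: {content_type or '(missing)'}"

    # Only check the size when the server reports it; otherwise leave it to the tool.
    length = headers.get("content-length")
    if length is not None and length.isdigit() and int(length) > MAX_BYTES:
        return f"image is {int(length)} bytes, over the {MAX_BYTES}-byte limit"
    return None


if __name__ == "__main__":
    print(preflight_problem({"content-type": "image/png", "content-length": "1024"}))
    print(preflight_problem({"content-type": "text/html; charset=utf-8"}))
```

A None result only means the headers look acceptable; the service can still reject the URL, for instance if it requires authentication or the headers misreport the body.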
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | No | Publicly accessible URL of the image to analyze. Must return image/jpeg, image/png, image/gif, or image/webp content-type. Max file size: 20MB. | |
| mode | No | Analysis mode: describe (full scene description), ocr (text extraction), chart (data/chart analysis), ui (UI screenshot analysis), identify (object/subject identification), qa (answer a specific question about the image — requires the 'question' parameter). | |
| question | No | For mode=qa only: the specific question to answer about the image. E.g., 'What is the total revenue shown in Q3?' or 'What does the error message say?' | |
| detail | No | OpenAI vision detail level. 'auto' (default): the model decides based on image size. 'low': faster, cheaper, less detail (best for simple images). 'high': slower, more detail (best for charts, dense text, complex scenes). | auto |
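The parameter rules in the table (six allowed modes, 'question' only with mode=qa, and three detail levels) can be enforced client-side before a call. The following is a minimal sketch under stated assumptions: it assumes the tool takes its arguments as a flat JSON object keyed by the parameter names above, and the function name build_arguments is hypothetical. Rejecting a 'question' supplied with any mode other than qa is a strictness choice of this sketch; the schema only says the parameter is for mode=qa. The sketch does not pick a default mode, since the schema does not state one, and it omits detail when unset so the service applies its own default ('auto').

```python
from typing import Optional

# Allowed values, taken from the Input Schema table.
MODES = {"describe", "ocr", "chart", "ui", "identify", "qa"}
DETAIL_LEVELS = {"auto", "low", "high"}


def build_arguments(url: str, mode: str, question: Optional[str] = None,
                    detail: Optional[str] = None) -> dict:
    """Validate parameters and return a hypothetical argument object for vision-analyze."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    if mode == "qa" and not question:
        raise ValueError("mode=qa requires the 'question' parameter")
    if question is not None and mode != "qa":
        # Strictness choice of this sketch: 'question' is documented for mode=qa only.
        raise ValueError("'question' is only used with mode=qa")
    if detail is not None and detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown detail {detail!r}; expected one of {sorted(DETAIL_LEVELS)}")

    args = {"url": url, "mode": mode}
    if question is not None:
        args["question"] = question
    if detail is not None:
        args["detail"] = detail  # omitted otherwise, so the service default applies
    return args


if __name__ == "__main__":
    print(build_arguments("https://example.com/q3.png", "qa",
                          question="What is the total revenue shown in Q3?", detail="high"))
```

For a chart or a screenshot with dense text, passing detail="high" matches the guidance in the table; 'low' suits simple images where speed and cost matter more.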