OCR (Extract Text)
ocr
Extract text verbatim from images or PDFs with Gemini multimodal OCR. Returns raw text as Markdown, preserving structure without summarization.
Instructions
Extract text verbatim from images or PDFs using Gemini multimodal OCR. Returns the raw text (as Markdown, to preserve structure) with no summarising or analysis. For documents and PDFs, MEDIUM resolution gives the same OCR quality as HIGH at half the token cost.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| images | Yes | One or more images/PDFs to OCR. Use filePath for large files (incl. .pdf). | |
| language | No | Optional hint for the document language (e.g. "German"). Improves accuracy. | |
| prompt | No | Optional override/extra instruction for the OCR (appended to the default). | |
| model | No | Model to use (defaults to the configured image-analysis model). | |
| max_tokens | No | Maximum tokens in the response. | 16384 |
| global_media_resolution | No | Image quality. MEDIUM (default) = same OCR quality as HIGH at 50% token cost. | MEDIA_RESOLUTION_MEDIUM |
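
The parameters in the table can be assembled into an arguments payload before calling the tool. The following is a minimal sketch, not the tool's own client code: the helper name `build_ocr_arguments` and the sample path are hypothetical, and the shape of each `images` entry (an object with a `filePath` key) is an assumption based on the description of the `images` parameter.

```python
from typing import Any, Optional


def build_ocr_arguments(
    file_paths: list[str],
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    max_tokens: int = 16384,
    resolution: str = "MEDIA_RESOLUTION_MEDIUM",
) -> dict[str, Any]:
    """Build an arguments dict for the ocr tool from the documented parameters."""
    if not file_paths:
        # images is the only required parameter in the input schema.
        raise ValueError("images is required: pass at least one file path")
    args: dict[str, Any] = {
        # Assumption: each images entry is an object with a filePath key.
        "images": [{"filePath": path} for path in file_paths],
        "max_tokens": max_tokens,
        "global_media_resolution": resolution,
    }
    # Optional parameters are only sent when the caller sets them.
    if language is not None:
        args["language"] = language
    if prompt is not None:
        args["prompt"] = prompt
    return args


# Hypothetical usage: one PDF with a language hint.
example = build_ocr_arguments(["scans/invoice.pdf"], language="German")
```

Leaving `model` out of the payload lets the tool fall back to its configured image-analysis model, as the table describes.
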
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | | |
| success | Yes | | |
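
A caller will usually want to check `success` before using `content`. The sketch below assumes, since the schema above does not state field types, that `success` is a boolean and `content` is the extracted Markdown string. The helper name `extract_markdown` and the sample result are hypothetical.

```python
from typing import Any


def extract_markdown(result: dict[str, Any]) -> str:
    """Return the OCR text, or raise if the call did not succeed."""
    # Assumption: success is a boolean flag on the result object.
    if not result.get("success"):
        raise RuntimeError("OCR call reported failure")
    content = result.get("content")
    # Assumption: content holds the extracted text as a Markdown string.
    if not isinstance(content, str):
        raise TypeError("expected content to be a string")
    return content


# Hypothetical result, for illustration only.
sample = {"success": True, "content": "# Invoice\n\nTotal: 42 EUR"}
text = extract_markdown(sample)
```
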