Glama

OCR a scanned/image-only PDF (Tesseract.js)

obsidian_ocr_pdf
Read-only · Idempotent

Extracts text from scanned or image-only PDFs using Tesseract OCR, returning per-page text and confidence scores. Supports multilingual OCR and optional page ranges.

Instructions

Runs Tesseract OCR over each page of an image-only / scanned PDF, returning per-page text + per-page confidence + mean confidence + the same shape as obsidian_read_pdf. Use this when obsidian_read_pdf returns has_text: false (typical for scans, photographed paper, image-only PDFs). Multilingual via lang (default 'eng'; multi-lang via '+', e.g. 'eng+rus'). Optional pages range and scale (DPI multiplier, default 2 ~ 150 DPI, capped at 4). ~1-2s per page on M1 CPU. Read-only. Powered by Tesseract.js (Apache-2.0; language trained-data must be pre-installed via enquire-mcp install-ocr-lang <code> — serve mode makes zero outbound network calls, so a language missing from the local cache fails closed with an install hint rather than downloading at runtime) + @napi-rs/canvas for PDF→bitmap rendering. Both gated to optionalDependencies so the markdown-only path stays zero-cost.
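The "use this when obsidian_read_pdf returns has_text: false" rule can be sketched as a small routing helper. This is a minimal sketch, not the server's own client code: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the result dictionaries are assumed to expose `has_text` as a top-level key.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def read_pdf_with_ocr_fallback(
    call_tool: ToolCaller, path: str, lang: str = "eng"
) -> Dict[str, Any]:
    """Try the text-layer reader first; fall back to OCR only for image-only PDFs."""
    result = call_tool("obsidian_read_pdf", {"path": path})
    # Assumption: has_text is a top-level boolean in the read result.
    if result.get("has_text") is False:
        # No extractable text layer (scan, photo, image-only PDF): run OCR instead.
        return call_tool("obsidian_ocr_pdf", {"path": path, "lang": lang})
    return result
```

Trying the text-layer reader first avoids the per-page OCR cost for PDFs that already carry text.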

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Vault-relative path of the .pdf file (with or without .pdf) | |
| lang | No | Tesseract language pack(s). Default 'eng'. Multi-lang via '+': 'eng+rus' for English+Russian mixed scans (max 8 packs per call). Common: 'eng', 'rus', 'jpn', 'chi_sim', 'fra', 'deu'. | |
| pages | No | Optional 1-indexed inclusive page range, e.g. [2, 5] OCRs pages 2..5 | |
| scale | No | Render scale (DPI multiplier). Default 2 (~150 DPI). Higher = better OCR on small text but slower. | |
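The parameter constraints above can be checked client-side before a call is sent. The following is a rough sketch under stated assumptions: the helper names are hypothetical, clamping `scale` to 4 on the client is one guess at what "capped at 4" means, and the envelope follows the generic MCP `tools/call` JSON-RPC shape rather than anything specific to this server.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

MAX_LANG_PACKS = 8   # "max 8 packs per call"
MAX_SCALE = 4.0      # "capped at 4"


def build_ocr_arguments(
    path: str,
    langs: Sequence[str] = ("eng",),
    pages: Optional[Tuple[int, int]] = None,
    scale: float = 2.0,
) -> Dict[str, Any]:
    """Build the arguments object for obsidian_ocr_pdf (hypothetical helper)."""
    if not 1 <= len(langs) <= MAX_LANG_PACKS:
        raise ValueError(f"expected 1..{MAX_LANG_PACKS} language packs")
    if scale <= 0:
        raise ValueError("scale must be positive")
    args: Dict[str, Any] = {
        "path": path,
        "lang": "+".join(langs),          # multi-lang packs are joined with '+'
        "scale": min(scale, MAX_SCALE),   # assumption: clamp rather than reject
    }
    if pages is not None:
        first, last = pages
        if first < 1 or last < first:
            raise ValueError("pages is a 1-indexed inclusive range [first, last]")
        args["pages"] = [first, last]
    return args


def tools_call_request(request_id: int, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "obsidian_ocr_pdf", "arguments": arguments},
    }
```

For example, `build_ocr_arguments("scans/contract", ["eng", "rus"], pages=(2, 5))` would request pages 2 through 5 of a mixed English and Russian scan at the default scale.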
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: read-only nature, ~1-2s per page performance, optional dependencies for zero-cost markdown path, Tesseract.js license and pre-installation requirement, no outbound network calls. Annotations already declare readOnlyHint and idempotentHint, and description reinforces without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed and informative but dense. It front-loads the main purpose, which is good, yet it would be easier to parse with more structure (e.g., bullet points). Every sentence adds value, so the only deduction is for density: a 4 rather than a 5.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters and no output schema, the description is thorough. It explains the tool's role, performance, dependencies, installation, and fallback behavior. It describes return shape ('per-page text + per-page confidence + mean confidence + same shape as `obsidian_read_pdf`'), which compensates for lack of output schema.
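Because there is no output schema, any consumer has to guess at field names. As an illustration only, the sketch below assumes each page entry carries hypothetical `page` and `confidence` keys, with confidence on a 0 to 100 scale, and shows how a caller might recompute the mean and flag pages worth re-running at a higher `scale`. The threshold of 60 is arbitrary.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize_confidence(
    pages: List[Dict[str, Any]], threshold: float = 60.0
) -> Dict[str, Any]:
    """Summarize per-page OCR confidence (field names are assumptions)."""
    scores = [p["confidence"] for p in pages]
    return {
        # Mean over the pages that were actually OCRed; None for an empty result.
        "mean_confidence": mean(scores) if scores else None,
        # Pages below the threshold are candidates for a retry at higher scale.
        "low_pages": [p["page"] for p in pages if p["confidence"] < threshold],
    }
```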

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description adds extra meaning: default `lang` is 'eng', multi-lang via '+' with max 8 packs; `pages` is 1-indexed inclusive range; `scale` default 2 (~150 DPI), capped at 4, with performance trade-off. This enriches the schema significantly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs OCR on image-only/scanned PDFs, with specific verb 'OCR' and resource 'image-only/scanned PDF'. It distinguishes from sibling `obsidian_read_pdf` by referencing the `has_text: false` return case, ensuring no ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this when `obsidian_read_pdf` returns `has_text: false`', providing clear context. Also mentions multilingual setup via `lang`, optional `pages` and `scale`, and fallback for missing language data, guiding proper invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/oomkapwn/enquire-mcp'
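The same request can be made from Python's standard library. This is a minimal sketch that assumes the endpoint returns a JSON body; the function names are illustrative and the response fields are not documented here.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/oomkapwn/enquire-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request that mirrors the curl command."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```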

If you have feedback or need assistance with the MCP directory API, please join our Discord server.