screenshot_text
Capture a screenshot and extract text using OCR, enabling Claude to read on-screen content like error dialogs or terminal output.
Instructions
Take a screenshot and OCR it with tesseract. Returns the recognized text plus the path to the underlying PNG. Use when Claude needs to READ what is on screen (log windows, error dialogs, terminal output in non-focused windows) rather than just see the image. Requires tesseract-ocr installed.
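
As a rough illustration of consuming the return value described above, the sketch below unpacks a result on the client side. It assumes the result uses the same `{ content: [{ type: 'text', text }] }` wrapper that the handler itself parses from the `screenshot` tool, and that error results carry a plain-text message; `readScreenshotText` and the sample values are hypothetical and not part of server.js.

```javascript
// Sketch: unpack a screenshot_text result on the client side.
// Assumption: success results wrap a JSON string in content[0].text,
// and error results set isError with a plain-text message.
function readScreenshotText(result) {
  const first = result && Array.isArray(result.content) ? result.content[0] : undefined;
  const raw = first && typeof first.text === 'string' ? first.text : '';
  if (!result || result.isError) {
    // Error payloads are assumed to be plain text, not JSON.
    throw new Error(raw || 'screenshot_text failed');
  }
  const payload = JSON.parse(raw);
  return { text: payload.text, path: payload.path, lang: payload.lang };
}

// Hypothetical sample result, for illustration only.
const sample = {
  content: [{
    type: 'text',
    text: JSON.stringify({
      path: '/tmp/example.png',
      lang: 'eng',
      text: 'Segmentation fault',
      text_length: 18,
    }),
  }],
};

console.log(readScreenshotText(sample).text);
```

Returning only `text`, `path`, and `lang` is a choice made for this sketch; the handler's payload also includes `size_bytes`, `active_window`, `tool`, and `text_length`.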
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| active_window | No | If true, capture only the currently-focused window; otherwise capture the full screen. | `false` |
| path | No | Optional target path for the PNG. | `/tmp/claude-linux-mcp/shots/shot-<ts>.png` |
| lang | No | Tesseract language code (e.g. "eng", "fra", "deu", "nld", or "eng+fra" for multiple languages). Requires the matching tesseract-ocr-<lang> package. | `"eng"` |
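
To make the schema concrete, here is a minimal sketch of a JSON-RPC `tools/call` request that passes these parameters. The helper name `buildScreenshotTextCall` and its camelCase options are hypothetical; only the tool name and the three argument names come from the schema above.

```javascript
// Sketch: build a JSON-RPC "tools/call" request for screenshot_text.
// `buildScreenshotTextCall` is a hypothetical helper, not part of server.js.
function buildScreenshotTextCall(id, { activeWindow, path, lang } = {}) {
  const args = {};
  // All three parameters are optional, so send only those the caller set.
  if (activeWindow === true) args.active_window = true;
  if (typeof path === 'string' && path) args.path = path;
  if (typeof lang === 'string' && lang.trim()) args.lang = lang.trim();
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'screenshot_text', arguments: args },
  };
}

// OCR the focused window with English and French language data.
const request = buildScreenshotTextCall(1, { activeWindow: true, lang: 'eng+fra' });
console.log(JSON.stringify(request, null, 2));
```

Omitting unset parameters lets the server apply its own defaults (full screen, the default shot path, and "eng").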
Implementation Reference
- server.js:349-381 (handler) The handler function `screenshotText` that implements the screenshot_text tool logic. It takes a screenshot by reusing the `screenshot` tool, then runs tesseract OCR on the resulting PNG to extract text. Returns the recognized text and its length, the PNG path and size in bytes, the `active_window` and `tool` values reported by the screenshot step, and the language used.

      // ─── Tool: screenshot_text ────────────────────────────────────────────────
      // Take a screenshot, then OCR it with tesseract. Returns the recognized
      // text plus the path to the underlying PNG. Useful when Claude needs to
      // READ what's on screen (log windows, error dialogs, terminal output in
      // non-focused windows) rather than just see the image.
      async function screenshotText(args) {
        if (!BIN.tesseract) {
          return errorResult('tesseract is not installed. Install with: sudo apt install tesseract-ocr tesseract-ocr-eng (add tesseract-ocr-<lang> for other languages).');
        }
        // Reuse the screenshot tool to capture (and inherit its fallback chain).
        const shot = await screenshot({ active_window: args.active_window === true, path: args.path });
        if (shot.isError) return shot;
        // screenshot returns { content: [{ type: 'text', text: JSON.stringify({path, ...}) }] }
        const meta = JSON.parse(shot.content[0].text);
        const lang = (typeof args.lang === 'string' && args.lang.trim()) ? args.lang.trim() : 'eng';
        // tesseract <input> stdout -l <lang> writes plain text to stdout.
        const r = await run(BIN.tesseract, [meta.path, 'stdout', '-l', lang], {
          env: cleanEnv(),
        });
        if (r.code !== 0) {
          return errorResult(`tesseract failed (code ${r.code}): ${r.stderr || r.stdout || 'unknown'}. If lang=${lang} is missing, install tesseract-ocr-${lang}.`);
        }
        const text = (r.stdout || '').replace(/\f$/, '').trimEnd();
        return textResult({
          path: meta.path,
          size_bytes: meta.size_bytes,
          active_window: meta.active_window,
          tool: meta.tool,
          lang,
          text,
          text_length: text.length,
        });
      }

- server.js:538-551 (schema) Input schema and description for the screenshot_text tool. Defines three optional parameters: active_window (boolean), path (string for output PNG), and lang (string for tesseract language code, default: 'eng').

      {
        name: 'screenshot_text',
        description: 'Take a screenshot and OCR it with tesseract. Returns the recognized text plus the path to the underlying PNG. Use when Claude needs to READ what is on screen (log windows, error dialogs, terminal output in non-focused windows) rather than just see the image. Requires tesseract-ocr installed.',
        annotations: { title: 'Read text from screen (OCR)', readOnlyHint: true },
        inputSchema: {
          type: 'object',
          properties: {
            active_window: { type: 'boolean', description: 'If true, capture only the currently-focused window. Default false (full screen).' },
            path: { type: 'string', description: 'Optional target path for the PNG. Defaults to /tmp/claude-linux-mcp/shots/shot-<ts>.png.' },
            lang: { type: 'string', description: 'Tesseract language code (e.g. "eng", "fra", "deu", "nld", or "eng+fra" for multi). Default "eng". Requires the matching tesseract-ocr-<lang> package.' },
          },
        },
      },
      ];

- server.js:568-568 (registration) Registration of the screenshotText handler in the HANDLERS dispatch map, mapping the tool name 'screenshot_text' to the handler function.

      screenshot_text: screenshotText,

- server.js:20-28 (helper) Discovery of system binaries with the `which()` helper, including `tesseract` (line 27). The screenshotText handler checks `BIN.tesseract` before running OCR.

      const BIN = {
        xdotool: which('xdotool'),
        wmctrl: which('wmctrl'),
        xclip: which('xclip'),
        gnomeShot: which('gnome-screenshot'),
        scrot: which('scrot'),
        maim: which('maim'),
        tesseract: which('tesseract'),
      };
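
Two small steps in the handler are easy to check in isolation: the fallback to "eng" when `lang` is missing or blank, and the cleanup of tesseract's stdout (a trailing form feed and trailing whitespace). The sketch below pulls those two expressions into standalone functions. The names `normalizeLang` and `cleanOcrOutput` are hypothetical; server.js inlines this logic in `screenshotText`.

```javascript
// Sketch of two pure steps from screenshotText, extracted for illustration.
// Function names are hypothetical; the expressions mirror the handler.

// Fall back to 'eng' unless a non-blank string was supplied.
function normalizeLang(value) {
  return (typeof value === 'string' && value.trim()) ? value.trim() : 'eng';
}

// Drop a trailing form feed, then strip trailing whitespace.
function cleanOcrOutput(stdout) {
  return (stdout || '').replace(/\f$/, '').trimEnd();
}

console.log(normalizeLang('  fra '));
console.log(normalizeLang(undefined));
console.log(JSON.stringify(cleanOcrOutput('Line 1\nLine 2\n\f')));
```

Note that a non-string `lang` (for example a number) also falls back to "eng" rather than producing an error.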