macOS Native OCR MCP Server

SKILL.md•7.74 KiB

---
name: macos-ocr-helper
description: This skill should be used when a user asks to extract text from images or PDFs, especially on macOS. Use it for OCR (Optical Character Recognition) tasks including converting images to text, PDF to Markdown, extracting tables to CSV, processing receipts/invoices, extracting code from screenshots, or analyzing scanned documents. It provides two local tools: read_image_text for pure text extraction and read_image_layout for structured layout information.
---

# macOS OCR Helper

## Overview

This skill provides direct access to macOS's native Vision framework for offline, high-accuracy OCR (Optical Character Recognition). It enables extracting text from images and PDFs with support for multiple languages (Chinese, English, and mixed), automatic paragraph merging, table optimization, and structured layout analysis for LLM-friendly output.

**No MCP server required - works directly on macOS.**

## Quick Start

To use macOS OCR, call one of the two available local tools:

- **`read_image_text`**: Extract pure text from images or PDFs with automatic paragraph merging
- **`read_image_layout`**: Extract structured layout information including text blocks, bounding boxes, and semantic information

All processing happens locally on macOS with no cloud uploads or API keys required.

## Tool Selection Guide

### Use `read_image_text` when:

- Only need plain text content without layout information
- Quickly extract readable text from documents
- Perform text search, content analysis, or summarization
- Use OCR results for simple string manipulation

**Example**:

```
Call macos-ocr's read_image_text to read the following file:
image_path=/absolute/path/image.png
Output only the OCR text without explanations or formatting.
```

### Use `read_image_layout` when:

- Need to preserve or reconstruct document layout
- Process complex documents with tables or multi-column layouts
- Require LLM to reconstruct document structure
- Need to locate text positions within the image
- Convert OCR results to structured formats (CSV, Markdown, etc.)

**Example**:

```
Call macos-ocr's read_image_layout to read the following file:
image_path=/absolute/path/document.pdf
Convert the returned blocks to Markdown format based on bbox coordinates.
```

## Common Use Cases

### Extract Plain Text from Images

**Scenario**: Quickly copy text from an image screenshot

Call `read_image_text` with the image path. The tool automatically handles paragraph merging and table optimization.

**Output**: Pure text string

### Convert Images/PDFs to Markdown

**Scenario**: Transform a scanned document into editable Markdown

1. Call `read_image_layout` to get structured blocks with layout information
2. Process blocks using bbox coordinates to reconstruct layout
3. Apply appropriate Markdown syntax for headings, paragraphs, lists, and tables
4. Use page separators for multi-page documents (e.g., `--- Page N ---`)

**Output**: Markdown document with preserved layout

### Extract Tables to CSV

**Scenario**: Convert a table screenshot into spreadsheet-compatible format

1. Call `read_image_layout` to get table structure
2. Identify table boundaries using bbox coordinates
3. Format cells as comma-separated values
4. Handle merged cells with appropriate placeholders or repeated values

**Output**: CSV format with table data

### Process Receipts/Invoices

**Scenario**: Extract structured information from financial documents

1. Call `read_image_text` to get text content
2. Parse text to extract fields: merchant, date, amount, tax, items
3. Output structured JSON with all extracted fields

**Output**: JSON object with receipt details:

```json
{
  "merchant": string|null,
  "date": "YYYY-MM-DD"|null,
  "currency": string|null,
  "total": number|null,
  "tax": number|null,
  "items": [...],
  "payment_method": string|null,
  "invoice_no": string|null
}
```

### Extract Code from Screenshots

**Scenario**: Get executable code from terminal or code editor screenshots

1. Call `read_image_text` to get code content
2. Remove line numbers, prompts (e.g., `$`, `>>>`), and irrelevant characters
3. Fix common OCR errors: 0/O confusion, 1/l/I confusion, punctuation errors
4. Maintain original indentation
5. Wrap in Markdown code block with appropriate language identifier

**Output**: Markdown code block with corrected syntax

### Analyze Long Documents

**Scenario**: Extract structure and key points from scanned papers or contracts

1. Call `read_image_text` for full text extraction
2. Generate structured outline (H1/H2/H3 headings)
3. Identify key points (maximum 10 items)
4. List potential issues requiring human verification (unclear numbers, broken references)

**Output**: Structured outline with key information

## Best Practices

### Prompt Construction

- Always explicitly specify the tool to use (`read_image_text` or `read_image_layout`)
- Use absolute file paths
- Clearly state expected output format
- Avoid ambiguous instructions

### Quality Optimization

For code documents:

- Remove line numbers and prompts
- Fix character confusions (0/O, 1/l/I, :, ;)
- Preserve indentation
- Wrap in appropriate code block

For formal documents:

- Generate structural outline
- Extract key points with original text references
- List verification notes for unclear content

### Data Format Notes

**read_image_layout output structure**:

```json
[
  {
    "text": "Corrected semantic text",
    "bbox": {
      "x": 0.0,  // Normalized x-coordinate [0-1]
      "y": 0.0,  // Normalized y-coordinate [0-1]
      "w": 0.5,  // Normalized width [0-1]
      "h": 0.2   // Normalized height [0-1]
    },
    "lines": [  // Original line information (optional)
      {
        "text": "Original line text",
        "bbox": {...}
      }
    ]
  }
]
```

All bbox coordinates are normalized values between 0 and 1, representing positions relative to image dimensions.

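To locate a block in the source image, the normalized bbox has to be scaled by the image's pixel dimensions. The following is a minimal Python sketch of that conversion, not part of the skill itself. The helper name `bbox_to_pixels` and the `origin_top_left` flag are hypothetical. The skill does not state where the coordinate origin lies, so the sketch assumes a top-left origin by default and offers a flip for bottom-left-origin data.

```python
def bbox_to_pixels(bbox, image_width, image_height, origin_top_left=True):
    """Scale a normalized bbox dict to pixel (left, top, right, bottom).

    Assumption: `bbox` has the keys x, y, w, h shown in the output
    structure above. The coordinate origin is NOT documented by the skill,
    so `origin_top_left` is a hypothetical switch for either convention.
    """
    x, y, w, h = bbox["x"], bbox["y"], bbox["w"], bbox["h"]
    if not origin_top_left:
        # Flip the vertical axis for bottom-left-origin coordinates.
        y = 1.0 - y - h
    left = round(x * image_width)
    top = round(y * image_height)
    right = round((x + w) * image_width)
    bottom = round((y + h) * image_height)
    return left, top, right, bottom


# Sample block using the values from the output structure above.
block = {"text": "Corrected semantic text",
         "bbox": {"x": 0.0, "y": 0.0, "w": 0.5, "h": 0.2}}

print(bbox_to_pixels(block["bbox"], 1000, 800))  # (0, 0, 500, 160)
```

If cropped regions look vertically mirrored, the origin assumption is probably wrong for the data at hand; try `origin_top_left=False`.
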
### Error Handling

Common OCR error types and handling strategies:

- **Character confusion**: Fix 0/O, 1/l/I, punctuation in code blocks
- **Punctuation errors**: Correct quotes and bracket pairs
- **Line splitting**: Merge incorrectly split paragraphs (automatically handled by the tool)

### Performance Considerations

- For large documents, consider preprocessing (compression, cropping)
- For batch processing, clearly separate results for each file
- Request progress updates or summaries when processing multiple files

## Troubleshooting

### Poor Recognition Quality

- Check image resolution (recommended 300 DPI or higher)
- Ensure image clarity without blur
- Try increasing contrast
- Check and correct image rotation if needed

### Multi-Page PDF Issues

- Verify PDF file integrity
- Check for password protection
- Use `read_image_layout` for more detailed information

### Inaccurate Coordinates

- Verify image dimensions and orientation
- Ensure normalized coordinates are within [0, 1] range
- Use `read_image_layout` for complete structural information

## Resources

### references/

This skill includes reference documentation for detailed usage patterns and best practices.

**references/examples.md**:

- Six complete usage scenarios with template prompts
- Detailed examples for each common use case
- Output format specifications

**references/best_practices.md**:

- Tool selection guidelines with decision criteria
- Prompt writing recommendations
- Quality assurance strategies
- Error handling and troubleshooting guide
- Performance optimization tips

Load these reference documents when working with complex scenarios or when detailed guidance is needed.

---

## Important Notes

- All OCR processing happens locally on macOS
- No cloud uploads or API keys required
- Supports Chinese (simplified/traditional), English, and mixed text
- Built-in PDF rendering engine handles multi-page documents automatically
- Zero configuration required - works out of the box

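As a rough illustration of the "Extract Tables to CSV" use case, the sketch below groups layout blocks into rows by vertical position and orders the cells in each row from left to right. It is only one possible approach, not the skill's own method. The function name `blocks_to_csv` and the `row_tolerance` parameter are hypothetical, and the sketch assumes each block is a single table cell with `y` increasing downward. Merged cells are not handled.

```python
import csv
import io


def blocks_to_csv(blocks, row_tolerance=0.02):
    """Turn read_image_layout-style blocks into CSV text.

    Assumptions (not confirmed by the skill): one block per cell, bbox
    keys x/y/w/h normalized to [0, 1], and y increasing downward.
    `row_tolerance` is a hypothetical threshold for treating two blocks
    as belonging to the same row.
    """
    def center_y(block):
        return block["bbox"]["y"] + block["bbox"]["h"] / 2

    # Sort by vertical center so blocks in the same row are adjacent.
    rows = []
    for block in sorted(blocks, key=center_y):
        if rows and abs(center_y(block) - rows[-1]["center"]) <= row_tolerance:
            rows[-1]["cells"].append(block)
        else:
            rows.append({"center": center_y(block), "cells": [block]})

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Within a row, order cells left to right.
        cells = sorted(row["cells"], key=lambda b: b["bbox"]["x"])
        writer.writerow(cell["text"] for cell in cells)
    return out.getvalue()


# Made-up sample data for illustration only.
sample = [
    {"text": "Price", "bbox": {"x": 0.5, "y": 0.10, "w": 0.2, "h": 0.05}},
    {"text": "Item", "bbox": {"x": 0.1, "y": 0.10, "w": 0.2, "h": 0.05}},
    {"text": "1,50", "bbox": {"x": 0.5, "y": 0.20, "w": 0.2, "h": 0.05}},
    {"text": "Tea", "bbox": {"x": 0.1, "y": 0.21, "w": 0.2, "h": 0.05}},
]
print(blocks_to_csv(sample))
```

Using the `csv` module rather than joining on commas means cell text that itself contains a comma (such as `1,50`) is quoted correctly.
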
