Schema | macos-vision-mcp

macos-vision-mcp


Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
No arguments

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`resources`	{ "listChanged": true }
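
In MCP, a `listChanged: true` capability means the server may send a notification when its tool or resource list changes, and the client is then expected to re-request the list. The following is a minimal client-side sketch of that reaction, not code from this server. The notification and request method names come from the MCP specification; the helper name `refresh_request_for` is hypothetical.

```python
from typing import Any, Dict, Optional

# Notification methods defined by the MCP specification, mapped to the
# list request a client would re-issue after receiving them.
REFRESH_REQUEST = {
    "notifications/tools/list_changed": "tools/list",
    "notifications/resources/list_changed": "resources/list",
}


def refresh_request_for(message: Dict[str, Any]) -> Optional[str]:
    """Return the list method to re-issue for a list_changed notification.

    Hypothetical helper: returns None for any message that is not one of
    the two list_changed notifications.
    """
    # JSON-RPC notifications carry no "id"; a message with an id is a
    # request or a response and is ignored here.
    if "id" in message:
        return None
    return REFRESH_REQUEST.get(message.get("method", ""))
```

A client loop could call `refresh_request_for` on each incoming message and, when it returns a method name, send that request to refresh its cached tool or resource list.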

Tools

Functions exposed to the LLM to take actions

Name	Description
ocr_image	Extract text from a local image or PDF file using Apple Vision OCR (offline, no API key needed).
USE WHEN: The user provides a local file path to an image, screenshot, scanned document, or PDF and wants to extract the text from it.
DO NOT USE for: images hosted on URLs (download first), non-macOS systems, or when the user wants face/barcode detection (use the dedicated tools).
Supported formats: jpg, jpeg, png, heic, heif, tiff, bmp, pdf
Parameters:
  path: absolute or relative path to the image/PDF file.
  format: "text" returns a single plain-text string (default). "blocks" returns JSON { pages: [{ page, paragraphs, textBlocks }] } with reading-order paragraphs and per-block bounding boxes. Each textBlock carries lineId, paragraphId, confidence, and a page-local bbox (0–1). PDFs return one entry per page.
  start_page (PDFs only): 1-based index of the first page to OCR (default 1). Ignored for images. A start_page past the end returns an empty result.
  max_pages (PDFs only): maximum number of pages to OCR from start_page (default: all). Ignored for images.
Returns: extracted text as a string (format="text") or a JSON document with per-page paragraphs and text blocks (format="blocks").
detect_faces	Detect human faces in a local image file using Apple Vision (offline, no API key needed).
USE WHEN: The user wants to know how many faces are in a local image, or needs their positions.
DO NOT USE for: text extraction (use ocr_image), barcode reading (use detect_barcodes).
Returns: JSON with the total face count and an array of face positions expressed as a percentage of image dimensions (top, left, width, height).
detect_barcodes	Detect and decode barcodes or QR codes in a local image file using Apple Vision (offline, no API key needed).
USE WHEN: The user wants to read a QR code, barcode, EAN, UPC, Code128, PDF417, Aztec, DataMatrix or other 1D/2D code from a local file.
DO NOT USE for: text extraction (use ocr_image), face detection (use detect_faces).
Supported symbologies: QR, EAN-8, EAN-13, UPC-E, Code39, Code93, Code128, ITF, PDF417, Aztec, DataMatrix, GS1DataBar and more.
Returns: JSON array of detected codes, each with its decoded value and symbology type.
detect_document	Detect the boundary of a document in a local image using Apple Vision (offline, no API key needed).
USE WHEN: The user has a photo of a piece of paper, a receipt, a card, an ID, or any rectangular document and wants the four corner points, typically as a hint for cropping, deskewing, or straightening the image before further OCR.
DO NOT USE for: reading the document text (use ocr_image), classifying the image (use classify_image), or analyzing a PDF (PDFs are already rectangular pages).
Returns: JSON with the four corner points of the detected document (topLeft, topRight, bottomLeft, bottomRight), each as { x, y } in 0–1 image coordinates, plus a confidence score. Returns { "detected": false } if no document is found.
classify_image	Classify the content of a local image into categories using Apple Vision (offline, no API key needed).
USE WHEN: The user wants to know what is depicted in an image: objects, scenes, activities, animals, food, etc. Works with 1000+ categories and returns confidence scores.
DO NOT USE for: text extraction (use ocr_image), face/barcode detection (dedicated tools), images that need detailed visual description (use the model's built-in vision).
Returns: JSON array of classification labels sorted by confidence (highest first), each with a label name and confidence score (0–1).
analyze_document	Run a full analysis pipeline on a local image or PDF and return structured JSON for document reconstruction: OCR (with line/paragraph grouping in reading order), face detection, barcode/QR detection, and rectangle detection, all in parallel, fully offline, no API key needed.
USE WHEN: The user wants the model to reconstruct a document (invoices, scanned reports, contracts, IDs, receipts, mixed-content scans) into Markdown, HTML, DOCX, or any other format. Returns enough structure (paragraphs + raw text blocks with bounding boxes) that the model can render the output in whatever format the user asks for.
DO NOT USE when: the user needs only one capability (use the dedicated tool, which will be faster).
Returns: JSON with this shape:
  {
    "source": { "path", "pageCount", "isPdf" },
    "pages": [
      {
        "page": 0,
        "paragraphs": [{ "paragraphId", "lineIds", "text" }, ...],  // primary surface
        "textBlocks": [{ "text", "lineId", "paragraphId", "confidence", "bbox": { "x","y","width","height" } }, ...],
        "faces": [{ "x","y","width","height" }, ...],
        "barcodes": [{ "value","symbology","bbox" }, ...],
        "rectangles": [{ "confidence","bbox" }, ...]
      },
      ...
    ],
    "summary": { "totalTextBlocks","totalParagraphs","totalFaces","totalBarcodes","totalRectangles" }
  }
Use paragraphs[].text as the primary surface for reading-order content. Use textBlocks[] when spatial information matters: multi-column layouts, tables, forms. PDFs return one entry per page; all coordinates are page-local 0–1. Face/barcode/rectangle detection on PDFs is best-effort (the underlying binary analyzes the PDF as a whole rather than per page).
Parameters:
  path: absolute or relative path to the image/PDF file.
  start_page (PDFs only): 1-based index of the first page to analyze (default 1). Only narrows the OCR pass; face/barcode/rectangle detections are still whole-document and attached to the first returned page. Ignored for images.
  max_pages (PDFs only): maximum number of pages to OCR from start_page (default: all). Ignored for images.
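
The tool descriptions above can be exercised from a client roughly as follows. This is a minimal sketch, not code from this server. It builds a standard MCP `tools/call` JSON-RPC request for `ocr_image` and reads the result shape described for `analyze_document`. The helper names (`ocr_arguments`, `build_tool_call`, `paragraphs_by_page`, `bbox_to_pixels`) are hypothetical, and the listing does not say whether `y` is measured from the top or the bottom of the page, so the bounding-box helper only scales coordinates and makes no assumption about the origin.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def ocr_arguments(path: str, fmt: str = "text",
                  start_page: Optional[int] = None,
                  max_pages: Optional[int] = None) -> Dict[str, Any]:
    """Assemble ocr_image arguments using the parameter names from the listing."""
    if fmt not in ("text", "blocks"):
        raise ValueError('format must be "text" or "blocks"')
    args: Dict[str, Any] = {"path": path, "format": fmt}
    if start_page is not None:
        if start_page < 1:
            raise ValueError("start_page is 1-based")
        args["start_page"] = start_page
    if max_pages is not None:
        args["max_pages"] = max_pages
    return args


def build_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def paragraphs_by_page(result: Dict[str, Any]) -> Dict[int, List[str]]:
    """Collect paragraphs[].text per page, the reading-order surface."""
    return {
        page["page"]: [p["text"] for p in page.get("paragraphs", [])]
        for page in result.get("pages", [])
    }


def bbox_to_pixels(bbox: Dict[str, float], width_px: int,
                   height_px: int) -> Tuple[int, int, int, int]:
    """Scale a page-local 0-1 bbox to pixels as (x, y, width, height)."""
    return (
        round(bbox["x"] * width_px),
        round(bbox["y"] * height_px),
        round(bbox["width"] * width_px),
        round(bbox["height"] * height_px),
    )
```

For example, `build_tool_call(1, "ocr_image", ocr_arguments("scan.pdf", "blocks", start_page=2, max_pages=3))` yields a request for pages 2 through 4 of a hypothetical `scan.pdf`. Joining the lists from `paragraphs_by_page` with blank lines gives a plain-text starting point for reconstruction into another format.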

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
`macos-vision-capabilities`

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/woladi/macos-vision-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.