
Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools
{
  "listChanged": true
}
resources
{
  "listChanged": true
}

Tools

Functions exposed to the LLM to take actions

Name | Description

ocr_image

Extract text from a local image or PDF file using Apple Vision OCR (offline, no API key needed).

USE WHEN: The user provides a local file path to an image, screenshot, scanned document, or PDF and wants to extract the text from it. DO NOT USE for: images hosted on URLs (download first), non-macOS systems, or when the user wants face/barcode detection (use the dedicated tools).

Supported formats: jpg, jpeg, png, heic, heif, tiff, bmp, pdf

Parameters:
- path: absolute or relative path to the image/PDF file.
- format: "text" (default) returns a single plain-text string; "blocks" returns JSON { pages: [{ page, paragraphs, textBlocks }] } with reading-order paragraphs and per-block bounding boxes. Each textBlock carries lineId, paragraphId, confidence, and a page-local bbox (0–1). PDFs return one entry per page.
- start_page: PDFs only. 1-based index of the first page to OCR (default: 1). Ignored for images. A start_page past the end of the document returns an empty result.
- max_pages: PDFs only. Maximum number of pages to OCR, counting from start_page (default: all). Ignored for images.

Returns: extracted text as a string (format="text") or a JSON document with per-page paragraphs and text blocks (format="blocks").
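As a rough illustration of how a client might invoke this tool, the sketch below wraps the parameters described above in an MCP `tools/call` JSON-RPC request. The helper name `build_tool_call` and the file path are hypothetical; only the tool name and parameter names come from the listing above.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires them to be unique per session.
_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical path; OCR pages 2-4 of a PDF and ask for structured blocks.
request = build_tool_call(
    "ocr_image",
    {"path": "/tmp/scan.pdf", "format": "blocks", "start_page": 2, "max_pages": 3},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally build and send this message for you; the sketch only shows which arguments go where.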

detect_faces

Detect human faces in a local image file using Apple Vision (offline, no API key needed).

USE WHEN: The user wants to know how many faces are in a local image, or needs their positions. DO NOT USE for: text extraction (use ocr_image), barcode reading (use detect_barcodes).

Returns: JSON with the total face count and an array of face positions expressed as percentage of image dimensions (top, left, width, height).
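Because face positions are reported as percentages of the image dimensions, they usually need converting before cropping or drawing. The following is a minimal sketch, assuming each face is an object with numeric `top`, `left`, `width`, and `height` keys on a 0–100 scale; the exact key layout and scale are assumptions, so check a real response first.

```python
def face_to_pixels(face: dict, image_width: int, image_height: int) -> tuple:
    """Convert a percentage-based face box to (left, top, right, bottom) pixels."""
    left = round(face["left"] / 100 * image_width)
    top = round(face["top"] / 100 * image_height)
    right = round((face["left"] + face["width"]) / 100 * image_width)
    bottom = round((face["top"] + face["height"]) / 100 * image_height)
    return left, top, right, bottom


# Made-up sample face on an 800x600 image.
sample_face = {"top": 10.0, "left": 25.0, "width": 50.0, "height": 40.0}
box = face_to_pixels(sample_face, 800, 600)
print(box)
```

The (left, top, right, bottom) ordering matches what common imaging libraries expect for crop boxes.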

detect_barcodes

Detect and decode barcodes or QR codes in a local image file using Apple Vision (offline, no API key needed).

USE WHEN: The user wants to read a QR code, barcode, EAN, UPC, Code128, PDF417, Aztec, DataMatrix or other 1D/2D code from a local file. DO NOT USE for: text extraction (use ocr_image), face detection (use detect_faces).

Supported symbologies: QR, EAN-8, EAN-13, UPC-E, Code39, Code93, Code128, ITF, PDF417, Aztec, DataMatrix, GS1DataBar and more.

Returns: JSON array of detected codes, each with its decoded value and symbology type.
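When an image contains several codes, it can help to bucket the decoded values by type. This sketch assumes each array element has `value` and `symbology` keys (the names used for barcodes in the analyze_document shape further down; the detect_barcodes output may differ), and the sample data is made up.

```python
from collections import defaultdict


def group_by_symbology(codes: list) -> dict:
    """Group decoded values by their symbology, preserving detection order."""
    grouped = defaultdict(list)
    for code in codes:
        grouped[code["symbology"]].append(code["value"])
    return dict(grouped)


# Made-up sample resembling a detect_barcodes result.
sample_codes = [
    {"value": "https://example.com", "symbology": "QR"},
    {"value": "4006381333931", "symbology": "EAN-13"},
    {"value": "hello", "symbology": "QR"},
]
grouped = group_by_symbology(sample_codes)
print(grouped)
```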

detect_document

Detect the boundary of a document in a local image using Apple Vision (offline, no API key needed).

USE WHEN: The user has a photo of a piece of paper, a receipt, a card, an ID, or any rectangular document and wants the four corner points — typically as a hint for cropping, deskewing, or straightening the image before further OCR. DO NOT USE for: reading the document text (use ocr_image), classifying the image (use classify_image), or analyzing a PDF (PDFs are already rectangular pages).

Returns: JSON with the four corner points of the detected document — topLeft, topRight, bottomLeft, bottomRight — each as { x, y } in 0–1 image coordinates, plus a confidence score. Returns { "detected": false } if no document is found.
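To use the corners as a cropping or deskewing hint, the normalized points have to be scaled to pixels. The sketch below handles both the detected and the `{ "detected": false }` case. It does not flip the y axis: the listing does not say whether y is measured from the top or the bottom of the image, so that is an assumption to verify against a real result.

```python
from typing import List, Optional, Tuple

# Clockwise order starting at the top-left corner.
CORNER_ORDER = ("topLeft", "topRight", "bottomRight", "bottomLeft")


def corners_to_pixels(result: dict, width: int, height: int) -> Optional[List[Tuple[int, int]]]:
    """Scale 0-1 corner points to pixel coordinates, or return None if nothing was found."""
    if result.get("detected") is False:
        return None
    return [
        (round(result[name]["x"] * width), round(result[name]["y"] * height))
        for name in CORNER_ORDER
    ]


# Made-up sample result for a 1000x500 image.
sample_result = {
    "topLeft": {"x": 0.1, "y": 0.2},
    "topRight": {"x": 0.9, "y": 0.2},
    "bottomLeft": {"x": 0.1, "y": 0.8},
    "bottomRight": {"x": 0.9, "y": 0.8},
    "confidence": 0.97,
}
polygon = corners_to_pixels(sample_result, 1000, 500)
print(polygon)
```

Returning the points in clockwise order makes the list directly usable as a polygon for a perspective transform.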

classify_image

Classify the content of a local image into categories using Apple Vision (offline, no API key needed).

USE WHEN: The user wants to know what is depicted in an image — objects, scenes, activities, animals, food, etc. Works with 1000+ categories and returns confidence scores. DO NOT USE for: text extraction (use ocr_image), face/barcode detection (dedicated tools), images that need detailed visual description (use the model's built-in vision).

Returns: JSON array of classification labels sorted by confidence (highest first), each with a label name and confidence score (0–1).
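With 1000+ categories, most labels in a result tend to carry low confidence, so a caller will typically keep only the top few. A minimal sketch, assuming each entry has `label` and `confidence` keys (the key names are an assumption) and relying on the documented highest-first ordering:

```python
def top_labels(classifications: list, min_confidence: float = 0.5, limit: int = 3) -> list:
    """Return up to `limit` label names whose confidence meets the threshold."""
    kept = [c["label"] for c in classifications if c["confidence"] >= min_confidence]
    return kept[:limit]


# Made-up sample, already sorted by confidence as the tool describes.
sample_labels = [
    {"label": "dog", "confidence": 0.92},
    {"label": "animal", "confidence": 0.88},
    {"label": "grass", "confidence": 0.41},
]
labels = top_labels(sample_labels)
print(labels)
```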

analyze_document

Run a full analysis pipeline on a local image or PDF and return structured JSON for document reconstruction: OCR (with line/paragraph grouping in reading order), face detection, barcode/QR detection, and rectangle detection — all in parallel, fully offline, no API key needed.

USE WHEN: The user wants the model to reconstruct a document into Markdown, HTML, DOCX, or any other format — invoices, scanned reports, contracts, IDs, receipts, mixed-content scans. Returns enough structure (paragraphs + raw text blocks with bounding boxes) that the model can render the output in whatever format the user asks for. DO NOT USE when: the user needs only one capability (use the dedicated tool — it will be faster).

Returns: JSON with this shape:
{
  "source": { "path", "pageCount", "isPdf" },
  "pages": [
    {
      "page": 0,
      "paragraphs": [{ "paragraphId", "lineIds", "text" }, ...],   // primary surface
      "textBlocks": [{ "text", "lineId", "paragraphId", "confidence", "bbox": { "x","y","width","height" } }, ...],
      "faces": [{ "x","y","width","height" }, ...],
      "barcodes": [{ "value","symbology","bbox" }, ...],
      "rectangles": [{ "confidence","bbox" }, ...]
    },
    ...
  ],
  "summary": { "totalTextBlocks","totalParagraphs","totalFaces","totalBarcodes","totalRectangles" }
}

Use paragraphs[].text as the primary surface for reading-order content. Use textBlocks[] when spatial information matters — multi-column layouts, tables, forms. PDFs return one entry per page; all coordinates are page-local 0–1. Face/barcode/rectangle detection on PDFs is best-effort (the underlying binary analyzes the PDF as a whole rather than per page).
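As an illustration of using paragraphs[].text as the primary surface, the sketch below turns an analyze_document result into simple Markdown with one heading per page. The function name `to_markdown` and the sample data are hypothetical; only the `pages`, `page`, `paragraphs`, and `text` keys come from the shape above.

```python
def to_markdown(analysis: dict) -> str:
    """Render reading-order paragraphs as Markdown, one section per page."""
    parts = []
    for page in analysis["pages"]:
        parts.append(f"## Page {page['page']}")
        parts.extend(p["text"] for p in page["paragraphs"])
    return "\n\n".join(parts)


# Made-up, heavily trimmed sample result.
sample_analysis = {
    "pages": [
        {
            "page": 1,
            "paragraphs": [
                {"paragraphId": 0, "lineIds": [0], "text": "Invoice 42"},
                {"paragraphId": 1, "lineIds": [1, 2], "text": "Total due: 10 EUR"},
            ],
        }
    ]
}
markdown = to_markdown(sample_analysis)
print(markdown)
```

For tables or multi-column layouts, the same loop could instead walk textBlocks[] and sort by bbox before rendering.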

Parameters:
- path: absolute or relative path to the image/PDF file.
- start_page: PDFs only. 1-based index of the first page to analyze (default: 1). Only narrows the OCR pass; face/barcode/rectangle detections are still whole-document and attached to the first returned page. Ignored for images.
- max_pages: PDFs only. Maximum number of pages to OCR, counting from start_page (default: all). Ignored for images.
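For long PDFs, start_page and max_pages make it possible to process the document in chunks. The sketch below generates the argument pairs for successive calls; `page_windows` is a hypothetical helper, and the total page count is assumed to be known already (for example from `source.pageCount` in an earlier result).

```python
def page_windows(page_count: int, chunk: int):
    """Yield start_page/max_pages arguments covering pages 1..page_count."""
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    for start in range(1, page_count + 1, chunk):
        # The last window may be shorter than `chunk`.
        yield {"start_page": start, "max_pages": min(chunk, page_count - start + 1)}


# A 7-page PDF processed 3 pages at a time.
windows = list(page_windows(7, 3))
print(windows)
```

Since face/barcode/rectangle detection is described as whole-document, each chunked call will presumably repeat those detections, so a caller may want to keep them from the first call only.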

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description
macos-vision-capabilities

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/woladi/macos-vision-mcp'
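The same request can be made from Python using only the standard library. This is a sketch rather than an official client: the helper names are hypothetical, and the response is assumed to be JSON, since the listing does not document its shape.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str) -> dict:
    """GET the server record and decode it, assuming a JSON response body."""
    with urllib.request.urlopen(server_url(owner, name), timeout=10) as response:
        return json.load(response)


url = server_url("woladi", "macos-vision-mcp")
print(url)
# To perform the request: record = fetch_server("woladi", "macos-vision-mcp")
```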

If you have feedback or need assistance with the MCP directory API, please join our Discord server.