
Install and Go. One command, single binary. Your AI reads any document — PDF, text, Markdown, DOCX, images.

MCP server for multi-format document access — read, search, extract images, OCR, and fetch documents from URLs via the Model Context Protocol. 12 tools, 6 formats, zero configuration.

go install github.com/drolosoft/go-docs-mcp@latest
# That's it. Single binary, starts in milliseconds.

For a deeper look at why an MCP server beats a direct tool, see Why MCP?


🏆 Why Go-Docs MCP?

Every other document MCP server handles one format — a PDF server for PDFs, a DOCX server for DOCX. You'd need three separate servers to read three formats.

| Feature | Go-Docs MCP | Others |
| --- | --- | --- |
| Single binary, no runtime | Yes | Need Node/Python |
| go install one-liner | Yes | npm+deps or pip+venv |
| Multi-format (6 types) | Yes | One format each |
| Full-text search | Yes | Partial or none |
| OCR (scanned PDFs + images) | Yes | Rare |
| Image & table extraction | Yes | Partial |
| Document outline | Yes | Rare |
| Fetch from URL | Yes | Rare |
| Dir-locked, read-only | Yes | Varies |
| Smart caching | Yes | No |
| Fully offline | Yes | Yes |

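Go-Docs MCP reads all six formats from a single binary, with no Node or Python runtime to install. PDF, DOCX, and OCR support rely on the external tools listed under Prerequisites; TXT, MD, and CSV need nothing extra.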


📋 Features — 12 Tools

| Category | Tool | Description |
| --- | --- | --- |
| Discovery | list_documents | List all documents with metadata (format, pages, size) |
| Discovery | list_formats | List supported formats and dependency status |
| Reading | read_document | Full text, specific page, or page ranges from any format |
| Reading | read_url | Download from URL and extract text (50MB max) |
| Reading | get_document_summary | First 3 pages as a quick overview |
| Search | search_document | Case-insensitive full-text search with context |
| Analysis | get_document_metadata | Title, author, dates, version, page count |
| Analysis | get_document_outline | Table of contents / bookmarks |
| Analysis | extract_tables | Tables as structured data |
| Analysis | extract_images | Images as base64 (max 10 per call) |
| OCR | ocr_document | Force OCR on scanned/image-based PDFs |
| OCR | read_image | Extract text from PNG, JPG, TIFF via OCR |

Highlights:

  • Fast — mtime-based in-memory caching avoids redundant extraction

  • Multi-format — PDF, TXT, MD, CSV, DOCX, and images from one server

  • OCR — automatic fallback to tesseract for scanned documents

  • Secure — directory-locked with path traversal prevention

  • Portable — works on macOS and Linux
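
The mtime-based caching mentioned in the highlights can be pictured with a small sketch. This is not the project's actual code: `TextCache`, `Get`, and `writeTemp` are hypothetical names, and the sketch assumes a cached result is reused only while the file's modification time is unchanged.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"time"
)

// cacheEntry pairs extracted text with the mtime it was extracted at.
type cacheEntry struct {
	modTime time.Time
	text    string
}

// TextCache is a hypothetical in-memory cache keyed by file path.
type TextCache struct {
	mu      sync.Mutex
	entries map[string]cacheEntry
}

func NewTextCache() *TextCache {
	return &TextCache{entries: make(map[string]cacheEntry)}
}

// Get returns the cached text while the file's mtime is unchanged;
// otherwise it runs extract, stores the result, and returns it.
func (c *TextCache) Get(path string, extract func(string) (string, error)) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[path]; ok && e.modTime.Equal(info.ModTime()) {
		return e.text, nil // cache hit: file not modified since extraction
	}
	text, err := extract(path)
	if err != nil {
		return "", err
	}
	c.entries[path] = cacheEntry{modTime: info.ModTime(), text: text}
	return text, nil
}

// writeTemp creates a temporary file with the given content (demo helper).
func writeTemp(content string) string {
	f, err := os.CreateTemp("", "cache-demo-*.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		panic(err)
	}
	return f.Name()
}

func main() {
	path := writeTemp("hello")
	defer os.Remove(path)

	calls := 0
	extract := func(p string) (string, error) {
		calls++ // stands in for an expensive pdftotext or OCR run
		b, err := os.ReadFile(p)
		return string(b), err
	}

	cache := NewTextCache()
	for i := 0; i < 2; i++ {
		if _, err := cache.Get(path, extract); err != nil {
			panic(err)
		}
	}
	fmt.Println("extractions performed:", calls)
}
```

Holding the lock during extraction keeps the sketch short but serializes all extractions; a real server would likely want per-file locking.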


📄 Supported Formats

| Format | Dependencies | Notes |
| --- | --- | --- |
| PDF | poppler (pdftotext, pdfinfo, pdfimages, pdftoppm) | Full support: text, images, metadata, OCR fallback |
| TXT, MD, CSV | None | Native, zero dependencies |
| DOCX | pandoc (optional) | Word document extraction |
| Images (PNG, JPG, TIFF) | tesseract (optional) | OCR text extraction |


📦 Prerequisites

  • Go 1.25+ (install)

  • poppler — required for PDF support

  • tesseract (optional) — enables OCR for scanned docs and images

  • pandoc (optional) — enables DOCX support

# macOS
brew install poppler
brew install tesseract        # optional: OCR
brew install pandoc           # optional: DOCX

# Debian/Ubuntu
apt install poppler-utils
apt install tesseract-ocr     # optional: OCR
apt install pandoc            # optional: DOCX

# Fedora/RHEL
dnf install poppler-utils
dnf install tesseract         # optional: OCR
dnf install pandoc            # optional: DOCX

Note: TXT, MD, and CSV work out of the box with zero dependencies. Install only what you need.


🚀 Installation

From source

go install github.com/drolosoft/go-docs-mcp@latest

Build locally

git clone https://github.com/drolosoft/go-docs-mcp.git
cd go-docs-mcp
make build      # produces ./go-docs-mcp
make install    # installs to /usr/local/bin/

⚙️ Configuration

Go-Docs MCP reads documents from a configured directory. Set DOCS_MCP_DIR to change it:

| Variable | Default | Description |
| --- | --- | --- |
| DOCS_MCP_DIR | ~/.docs-mcp/documents/ | Directory containing documents to serve |
| PDF_MCP_DIR | (legacy alias) | Backward-compatible alias for DOCS_MCP_DIR |

Place your documents in the directory and the server finds them automatically. All supported formats are detected.
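
How the directory might be resolved can be sketched as follows. The precedence shown (DOCS_MCP_DIR first, then the legacy PDF_MCP_DIR, then the default) is an assumption, and `resolveDocsDir` is a hypothetical helper rather than a function from the project.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveDocsDir picks the documents directory.
// Assumed precedence: DOCS_MCP_DIR, then PDF_MCP_DIR, then the default.
// getenv is injected so the logic can be exercised without touching the
// real environment.
func resolveDocsDir(getenv func(string) string, home string) string {
	if dir := getenv("DOCS_MCP_DIR"); dir != "" {
		return dir
	}
	if dir := getenv("PDF_MCP_DIR"); dir != "" {
		return dir // legacy alias
	}
	return filepath.Join(home, ".docs-mcp", "documents")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	fmt.Println(resolveDocsDir(os.Getenv, home))
}
```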


💡 Usage

With Claude Code

Add to your .claude/settings.json:

{
  "mcpServers": {
    "docs": {
      "command": "go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}

With Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "docs": {
      "command": "/usr/local/bin/go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}

With any MCP client

The server communicates over stdio using JSON-RPC 2.0:

echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | go-docs-mcp
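
The same kind of request can be built programmatically. The sketch below only encodes JSON-RPC 2.0 request lines; `encodeRequest` and `toolCall` are hypothetical helper names. The `tools/call` parameter shape (`name` plus `arguments`) follows the MCP specification, and a full MCP session normally begins with an `initialize` handshake that is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params"`
}

// encodeRequest serializes one request as a single line of JSON.
func encodeRequest(id int, method string, params any) (string, error) {
	if params == nil {
		params = map[string]any{} // send {} rather than null
	}
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// toolCall builds a tools/call request for the named tool.
func toolCall(id int, tool string, args map[string]any) (string, error) {
	return encodeRequest(id, "tools/call", map[string]any{
		"name":      tool,
		"arguments": args,
	})
}

func main() {
	list, err := encodeRequest(1, "tools/list", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(list)

	call, err := toolCall(2, "search_document", map[string]any{
		"filename": "architecture-guide.pdf",
		"query":    "microservice",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(call)
}
```

Each printed line is one request; writing such lines to the server's stdin and reading its stdout is the whole transport.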

📖 Tool Reference

list_documents

Lists all documents in the configured directory with format detection.

Parameters: None

Example output:

[
  {
    "filename": "architecture-guide.pdf",
    "format": "pdf",
    "title": "architecture-guide",
    "pages": 42,
    "size_bytes": 1048576
  },
  {
    "filename": "notes.md",
    "format": "markdown",
    "title": "notes",
    "size_bytes": 4096
  }
]

list_formats

Lists all supported document formats and their dependency status.

Parameters: None
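
A dependency check of this kind could look roughly like the sketch below. It assumes status is determined by looking for the external tools on PATH; the tool names come from the Supported Formats table, while `requiredTools`, `onPath`, and `missingTools` are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requiredTools maps a format family to the external commands it needs,
// as listed in the Supported Formats table.
var requiredTools = map[string][]string{
	"pdf":    {"pdftotext", "pdfinfo", "pdfimages", "pdftoppm"},
	"docx":   {"pandoc"},
	"images": {"tesseract"},
}

// onPath reports whether a command can be found on PATH.
func onPath(tool string) bool {
	_, err := exec.LookPath(tool)
	return err == nil
}

// missingTools returns the tools for which found reports false.
func missingTools(tools []string, found func(string) bool) []string {
	var missing []string
	for _, t := range tools {
		if !found(t) {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	for _, format := range []string{"pdf", "docx", "images"} {
		missing := missingTools(requiredTools[format], onPath)
		if len(missing) == 0 {
			fmt.Printf("%-6s ready\n", format)
		} else {
			fmt.Printf("%-6s missing %v\n", format, missing)
		}
	}
}
```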


read_document

Reads the extracted text content of a document. Automatically falls back to OCR if the document is image-based/scanned and pdftotext returns empty text.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to read |
| page | number | No | Single page number (1-based). Omit for full text. |
| pages | string | No | Page ranges, e.g. "1-5", "10", "1-3,7,10-12". Overrides page. |

Example input:

{
  "filename": "architecture-guide.pdf",
  "pages": "1-3,10-12"
}
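
To make the `pages` syntax concrete, here is a minimal parser sketch. It is an illustration of the format described above, not the server's implementation; `parsePages` is a hypothetical name, and the validation rules (pages start at 1, a range may not run backwards) are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePages expands a spec such as "1-3,7,10-12" into individual
// 1-based page numbers, in the order given.
func parsePages(spec string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty entry in page spec %q", spec)
		}
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("invalid page %q", part)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(strings.TrimSpace(hi))
			if err != nil {
				return nil, fmt.Errorf("invalid range %q", part)
			}
		}
		if start < 1 || end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for p := start; p <= end; p++ {
			pages = append(pages, p)
		}
	}
	return pages, nil
}

func main() {
	pages, err := parsePages("1-3,7,10-12")
	if err != nil {
		panic(err)
	}
	fmt.Println(pages)
}
```

A production parser would also cap the range against the document's page count so that a spec like "1-1000000000" cannot allocate a huge slice.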

search_document

Searches within a document for lines matching a query. Returns matches with 2 lines of context and approximate page numbers.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to search |
| query | string | Yes | Search query (case-insensitive) |

Example output:

Found 3 matches for 'microservice' in architecture-guide.pdf:

--- Match 1 (page ~2, line 45) ---
  The system is composed of several
> microservice components that communicate
  via gRPC and message queues.
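
A line-based, case-insensitive search with surrounding context can be sketched as below. `search` and `match` are hypothetical names, and the number of context lines per side is left as a parameter because the example output above shows one line on each side of the hit. Page estimation is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// match is one hit: its 1-based line number and the surrounding lines,
// with the matching line prefixed by "> " and the others by two spaces.
type match struct {
	Line    int
	Context []string
}

// search finds lines containing query (case-insensitive) and returns
// each hit with up to `around` lines of context on either side.
func search(text, query string, around int) []match {
	needle := strings.ToLower(query)
	if needle == "" {
		return nil // an empty query would match every line
	}
	lines := strings.Split(text, "\n")
	var matches []match
	for i, line := range lines {
		if !strings.Contains(strings.ToLower(line), needle) {
			continue
		}
		start := max(0, i-around)
		end := min(len(lines)-1, i+around)
		var ctx []string
		for j := start; j <= end; j++ {
			prefix := "  "
			if j == i {
				prefix = "> "
			}
			ctx = append(ctx, prefix+lines[j])
		}
		matches = append(matches, match{Line: i + 1, Context: ctx})
	}
	return matches
}

func main() {
	text := "Overview\nThe system is composed of several\n" +
		"microservice components that communicate\n" +
		"via gRPC and message queues.\nEnd"
	for n, m := range search(text, "MicroService", 1) {
		fmt.Printf("--- Match %d (line %d) ---\n", n+1, m.Line)
		fmt.Println(strings.Join(m.Context, "\n"))
	}
}
```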

get_document_summary

Returns the text from the first 3 pages of a document as a quick summary.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to summarize |
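
For PDFs, one plausible way to take the first few pages relies on the fact that pdftotext separates pages with a form feed character. The sketch below assumes that approach; `firstPages` is a hypothetical helper, and how the server treats formats without pages (TXT, MD, CSV) is not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// firstPages returns the text of the first n pages, assuming pages are
// separated by form feed characters as in pdftotext output.
func firstPages(text string, n int) string {
	pages := strings.Split(text, "\f")
	if len(pages) > n {
		pages = pages[:n]
	}
	return strings.Join(pages, "\f")
}

func main() {
	text := "page one\fpage two\fpage three\fpage four\f"
	fmt.Printf("%q\n", firstPages(text, 3))
}
```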


get_document_metadata

Returns full document metadata.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to get metadata for |

Example output:

{
  "title": "Architecture Guide",
  "author": "Jane Doe",
  "subject": "System Design",
  "creator": "LaTeX",
  "producer": "pdfTeX",
  "creation_date": "Thu May 15 10:30:00 2025",
  "modification_date": "Thu May 15 10:30:00 2025",
  "pages": 42,
  "file_size_bytes": 1048576,
  "pdf_version": "1.5"
}

get_document_outline

Extracts the document outline (table of contents / bookmarks) as a structured list.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to extract outline from |


extract_tables

Extracts tables from a document as structured data.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to extract tables from |
| page | number | No | Specific page to extract from. Omit for all pages. |


extract_images

Extracts images from a document as base64-encoded data. Returns up to 10 images per call.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The document filename to extract images from |
| page | number | No | Specific page to extract from. Omit for all pages. |

Example output:

[
  {
    "page": 1,
    "index": 0,
    "format": "jpeg",
    "width": 800,
    "height": 600,
    "data_base64": "/9j/4AAQSkZJRg..."
  }
]
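
On the client side, output in this shape can be decoded back into raw image bytes. The sketch below assumes the JSON array has already been unwrapped from the MCP tool result; the struct mirrors the fields in the example output, `decodeImages` is a hypothetical helper, and the short base64 string in `main` is illustrative data, not real tool output.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// extractedImage mirrors one element of the example output above.
type extractedImage struct {
	Page       int    `json:"page"`
	Index      int    `json:"index"`
	Format     string `json:"format"`
	Width      int    `json:"width"`
	Height     int    `json:"height"`
	DataBase64 string `json:"data_base64"`
}

// decodedImage holds the raw bytes after base64 decoding.
type decodedImage struct {
	Page   int
	Format string
	Bytes  []byte
}

// decodeImages parses the JSON array and decodes each image payload.
func decodeImages(payload []byte) ([]decodedImage, error) {
	var raw []extractedImage
	if err := json.Unmarshal(payload, &raw); err != nil {
		return nil, fmt.Errorf("parse extract_images output: %w", err)
	}
	out := make([]decodedImage, 0, len(raw))
	for _, img := range raw {
		data, err := base64.StdEncoding.DecodeString(img.DataBase64)
		if err != nil {
			return nil, fmt.Errorf("image %d on page %d: %w", img.Index, img.Page, err)
		}
		out = append(out, decodedImage{Page: img.Page, Format: img.Format, Bytes: data})
	}
	return out, nil
}

func main() {
	// Illustrative payload: six bytes that begin like a JPEG header.
	payload := []byte(`[{"page":1,"index":0,"format":"jpeg","width":800,"height":600,"data_base64":"/9j/4AAQ"}]`)
	images, err := decodeImages(payload)
	if err != nil {
		panic(err)
	}
	for _, img := range images {
		fmt.Printf("page %d: %s, %d bytes\n", img.Page, img.Format, len(img.Bytes))
	}
}
```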

read_url

Downloads a document from a URL and extracts its text content. Maximum file size: 50MB.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The URL of the document to download and read |
| pages | string | No | Page ranges to extract, e.g. "1-5". Omit for full text. |

Example input:

{
  "url": "https://example.com/report.pdf",
  "pages": "1-3"
}
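
One common way to enforce a download cap like this in Go is `io.LimitReader`. The sketch below is an assumption about the general technique, not the server's code: `readLimited`, `limitString`, and `fetch` are hypothetical names, the 30-second timeout is arbitrary, and "50MB" is interpreted here as 50 MiB. Content-Type validation is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// maxDownloadBytes interprets the documented 50MB cap as 50 MiB (assumption).
const maxDownloadBytes = 50 << 20

var errTooLarge = errors.New("document exceeds size limit")

// readLimited reads the whole stream but fails once it exceeds limit.
// Reading limit+1 bytes distinguishes "exactly at the limit" from "over".
func readLimited(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// limitString applies readLimited to an in-memory string (demo helper).
func limitString(s string, limit int64) ([]byte, error) {
	return readLimited(strings.NewReader(s), limit)
}

// fetch downloads a URL, rejecting oversized or failed responses.
func fetch(url string) ([]byte, error) {
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	if resp.ContentLength > maxDownloadBytes {
		return nil, errTooLarge // early exit when the server declares a size
	}
	return readLimited(resp.Body, maxDownloadBytes)
}

func main() {
	// Offline demonstration of the limit check with a tiny cap.
	if _, err := limitString("hello world", 5); err != nil {
		fmt.Println("rejected:", err)
	}
	if data, err := limitString("hello", 5); err == nil {
		fmt.Println("accepted:", len(data), "bytes")
	}
}
```

The Content-Length check is only a shortcut; the limited read is what actually enforces the cap, since the header can be absent or wrong.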

ocr_document

Forces OCR on a PDF document using tesseract. Useful for scanned/image-based PDFs or when pdftotext returns garbled text. Requires tesseract and pdftoppm.

Note: read_document already auto-detects image-based PDFs and falls back to OCR. Use ocr_document when you want to force OCR regardless, or need to specify a non-English language.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The PDF filename to OCR |
| page | number | No | Specific page to OCR (1-based). Omit for all pages. |
| language | string | No | Tesseract language code (default: eng). Use spa, fra, etc. |

Example input:

{
  "filename": "scanned-contract.pdf",
  "page": 1,
  "language": "spa"
}

read_image

Extracts text from an image file using OCR. Supports PNG, JPG, and TIFF. Requires tesseract.

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | The image filename to read (PNG, JPG, TIFF) |
| language | string | No | Tesseract language code (default: eng). |

Example input:

{
  "filename": "receipt.png",
  "language": "eng"
}

🔒 Security

  • Directory-locked — only files within DOCS_MCP_DIR are accessible

  • Path traversal prevention — filenames sanitized; ../ rejected

  • Extension filter — only supported formats served

  • Read-only — no write operations

  • URL downloads — 50MB limit, Content-Type validated, temp files cleaned immediately
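
The directory lock and extension filter can be illustrated with a short sketch. It is one possible approach, not the project's implementation: `safeJoin` is a hypothetical helper, the exact extension list is an assumption based on the supported formats, and `filepath.IsLocal` (Go 1.20+) performs a purely lexical check, so symlinks inside the directory would need separate handling.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var (
	errOutsideRoot = errors.New("path escapes documents directory")
	errUnsupported = errors.New("unsupported file extension")
)

// allowedExt lists the served extensions (assumed from the supported formats).
var allowedExt = map[string]bool{
	".pdf": true, ".txt": true, ".md": true, ".csv": true,
	".docx": true, ".png": true, ".jpg": true, ".tiff": true,
}

// safeJoin resolves name inside root, rejecting absolute paths, any
// "../" escape, and files with unsupported extensions.
func safeJoin(root, name string) (string, error) {
	if !filepath.IsLocal(name) {
		return "", errOutsideRoot
	}
	if !allowedExt[strings.ToLower(filepath.Ext(name))] {
		return "", errUnsupported
	}
	return filepath.Join(root, name), nil
}

func main() {
	root := "/srv/docs" // hypothetical DOCS_MCP_DIR
	for _, name := range []string{"guide.pdf", "../etc/passwd", "sub/../../x.pdf", "tool.exe"} {
		full, err := safeJoin(root, name)
		if err != nil {
			fmt.Printf("%-18s rejected: %v\n", name, err)
			continue
		}
		fmt.Printf("%-18s -> %s\n", name, full)
	}
}
```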


🛠️ Development

make build     # Build the binary
make test      # Run tests with race detector
make clean     # Remove build artifacts

Project structure

go-docs-mcp/
  main.go              # MCP server setup, 12 tool registrations
  internal/
    pdf/
      reader.go        # Document extraction, caching, search, metadata, images, OCR
  Makefile             # Build targets
  go.mod               # Module definition

📄 License

MIT - Copyright 2026 Drolosoft

