doc-agent

Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.

Features

  • 🔍 Document Extraction: Extract structured data from PDFs and images using Vision AI

  • 🦙 Ollama-First: Privacy-first default using local llama3.2-vision model

  • 🔧 Zero Setup: Auto-installs Ollama via Homebrew if needed, auto-pulls models

  • 📄 Multi-Format: Supports PDFs and images (PNG, JPEG, WebP)

  • 🔬 OCR-Enhanced: Uses Tesseract.js for accurate text extraction from receipts

  • 💾 Local Storage: All data persists to local SQLite database

  • 🔎 Semantic Search: Natural language search over indexed documents (coming soon)

  • 🤖 MCP Integration: Use via Claude Desktop or any MCP-compatible assistant

  • 🔒 Privacy-First: Data stays on your machine (unless you opt for cloud AI)
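
As a rough illustration of the Multi-Format bullet, the sketch below maps a file extension to one of the two input kinds the feature list names. The names `DocumentKind` and `detectDocumentKind` are hypothetical and are not taken from doc-agent; the real CLI may well inspect file contents rather than extensions.

```typescript
// Formats come from the feature list: PDFs and images (PNG, JPEG, WebP).
// All names here are hypothetical, not doc-agent's actual API.
type DocumentKind = "pdf" | "image";

const KIND_BY_EXTENSION = new Map<string, DocumentKind>([
  ["pdf", "pdf"],
  ["png", "image"],
  ["jpg", "image"],
  ["jpeg", "image"],
  ["webp", "image"],
]);

// Returns the document kind for a path, or undefined when the
// extension is missing or not in the supported list.
function detectDocumentKind(path: string): DocumentKind | undefined {
  const dot = path.lastIndexOf(".");
  if (dot < 0 || dot === path.length - 1) return undefined;
  const extension = path.slice(dot + 1).toLowerCase();
  return KIND_BY_EXTENSION.get(extension);
}
```

A caller could use the `undefined` result to reject a file with a clear message before any model is invoked.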

Quick Start

Installation

npm install -g doc-agent

Usage

Extract document data (uses Ollama by default):

doc extract invoice.pdf

💡 Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.

With Gemini (cloud, higher accuracy):

export GEMINI_API_KEY=your_key_here
doc extract invoice.pdf --provider gemini
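
The provider choice described above (Ollama by default, Gemini when requested with an API key) could be modelled roughly as follows. This is a sketch under assumptions: `resolveProvider` and `ProviderConfig` are hypothetical names, and how doc-agent actually reacts to a missing key or an unknown provider is not documented here.

```typescript
// Hypothetical sketch of provider selection; not doc-agent's real code.
type Provider = "ollama" | "gemini";

interface ProviderConfig {
  provider: Provider;
  apiKey?: string;
}

// `flag` stands for the value passed to --provider (undefined if omitted).
// `env` stands for the process environment, passed in to keep this pure.
function resolveProvider(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): ProviderConfig {
  const requested = flag ?? "ollama"; // README: Ollama is the default
  if (requested === "ollama") {
    return { provider: "ollama" };
  }
  if (requested === "gemini") {
    const apiKey = env["GEMINI_API_KEY"];
    if (!apiKey) {
      // Assumed behaviour: fail early instead of calling the cloud API without a key.
      throw new Error("GEMINI_API_KEY must be set to use --provider gemini");
    }
    return { provider: "gemini", apiKey };
  }
  throw new Error(`Unknown provider: ${requested}`);
}
```

Passing the environment in as an argument keeps the function easy to test without touching real environment variables.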

Start MCP server:

doc mcp

Claude Desktop Integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "doc-agent": {
      "command": "npx",
      "args": ["-y", "doc-agent", "mcp"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}
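
If claude_desktop_config.json already lists other servers, the doc-agent entry has to be merged in rather than pasted over them. The sketch below shows one way to do that merge on an already-parsed config object. `withDocAgent` and the interface names are hypothetical, and treating the `env` block as optional for Ollama-only use is an assumption, not something the README states.

```typescript
// Hypothetical helper; shapes mirror the JSON snippet in the README.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are preserved untouched
}

// Returns a copy of `config` with the doc-agent server added or replaced.
function withDocAgent(
  config: ClaudeDesktopConfig,
  geminiApiKey?: string,
): ClaudeDesktopConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "doc-agent", "mcp"],
  };
  if (geminiApiKey) {
    // Assumption: the env block is only needed for the Gemini provider.
    entry.env = { GEMINI_API_KEY: geminiApiKey };
  }
  return {
    ...config,
    mcpServers: { ...config.mcpServers, "doc-agent": entry },
  };
}
```

The result can be written back with `JSON.stringify(result, null, 2)`; existing servers and unrelated settings are carried over unchanged.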

Then in Claude Desktop:

"Extract data from ~/Downloads/invoice.pdf"

Development

# Clone and install dependencies
git clone https://github.com/prosdevlab/doc-agent
cd doc-agent
pnpm install

# Build the project
pnpm build

# Run CLI locally
pnpm dev extract examples/invoice.pdf

# Run tests
pnpm test

# Start MCP server
pnpm mcp

Architecture

The CLI is built with Ink (React for CLIs) for rich interactive output:

packages/
├── cli/           # Ink-based CLI with services, hooks, and components
├── core/          # Shared types and interfaces
├── extract/       # Document extraction (Gemini, Ollama) + OCR
├── storage/       # SQLite persistence (Drizzle ORM)
└── vector-store/  # Vector database for semantic search
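
To make the package split more concrete, here is a minimal sketch of how shared interfaces in `core/` might let `cli/` combine an extractor and a store without depending on their implementations. Every type, class, and function name below is hypothetical; the stub extractor and in-memory store only stand in for the real Vision AI providers and the SQLite layer.

```typescript
// core/: shared types (hypothetical shapes, not doc-agent's real ones).
type DocumentType = "invoice" | "receipt" | "bank_statement";

interface ExtractedDocument {
  sourcePath: string;
  type: DocumentType;
  fields: Record<string, string>;
}

// extract/: anything that turns a file into structured data.
interface Extractor {
  extract(path: string): Promise<ExtractedDocument>;
}

// storage/: anything that persists extracted documents.
interface DocumentStore {
  save(doc: ExtractedDocument): Promise<void>;
  list(): Promise<ExtractedDocument[]>;
}

// Stand-in for a Vision AI provider; returns canned data.
class StubExtractor implements Extractor {
  async extract(path: string): Promise<ExtractedDocument> {
    return { sourcePath: path, type: "invoice", fields: { total: "42.00" } };
  }
}

// Stand-in for the SQLite-backed store; keeps documents in memory.
class InMemoryStore implements DocumentStore {
  private docs: ExtractedDocument[] = [];

  async save(doc: ExtractedDocument): Promise<void> {
    this.docs.push(doc);
  }

  async list(): Promise<ExtractedDocument[]> {
    return [...this.docs];
  }
}

// cli/: wires the two together through the interfaces only.
async function extractAndStore(
  path: string,
  extractor: Extractor,
  store: DocumentStore,
): Promise<ExtractedDocument> {
  const doc = await extractor.extract(path);
  await store.save(doc);
  return doc;
}
```

Because `extractAndStore` depends only on the interfaces, swapping Ollama for Gemini or replacing the store would not change the calling code in this sketch.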

Roadmap

See ROADMAP.md for the project plan.

License

MIT
