doc-agent
Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.
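The extraction flow described above can be sketched against Ollama's local REST API. This is a minimal sketch, not doc-agent's actual code. The helper names (buildExtractionRequest, extractWithOllama), the requested fields, the prompt wording, and the optional OCR hint are all assumptions. Only the POST /api/generate endpoint and its model, prompt, images, format, and stream fields come from Ollama itself. PDFs would presumably need to be rendered to page images first, because the images field carries image data.

```typescript
// Sketch: build and send a Vision-AI extraction request to a local Ollama server.
// ASSUMPTION: InvoiceFields, buildExtractionRequest, extractWithOllama, the field list,
// and the prompt wording are hypothetical, not doc-agent's actual implementation.
// Ollama's POST /api/generate endpoint and its model/prompt/images/format/stream fields are real.

// Hypothetical structured fields requested from the model.
interface InvoiceFields {
  vendor: string | null;
  date: string | null; // ISO date, e.g. "2024-03-31"
  total: number | null;
  currency: string | null;
}

interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image bytes
  format: "json"; // ask Ollama to constrain the output to valid JSON
  stream: false; // return one response object instead of a stream
}

function buildExtractionRequest(
  image: Uint8Array,
  ocrText?: string, // optional OCR output (e.g. from Tesseract.js) passed along as a hint
  model = "llama3.2-vision",
): OllamaGenerateRequest {
  let prompt =
    "Extract vendor, date (YYYY-MM-DD), total, and currency from this document. " +
    "Reply with a JSON object using exactly those keys and null for missing values.";
  if (ocrText) prompt += `\n\nOCR text of the same document, for reference:\n${ocrText}`;
  return {
    model,
    prompt,
    images: [Buffer.from(image).toString("base64")],
    format: "json",
    stream: false,
  };
}

async function extractWithOllama(
  image: Uint8Array,
  ocrText?: string,
  baseUrl = "http://localhost:11434", // Ollama's default listen address
): Promise<InvoiceFields> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildExtractionRequest(image, ocrText)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  // With stream: false, the model's text output is in the `response` field.
  const body = (await res.json()) as { response: string };
  return JSON.parse(body.response) as InvoiceFields;
}
```

Calling extractWithOllama with the bytes of a receipt image (for example, read with readFile from node:fs/promises) assumes Ollama is running and llama3.2-vision has already been pulled.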
Features
Document Extraction: Extract structured data from PDFs and images using Vision AI
Ollama-First: Privacy-first default using the local llama3.2-vision model
Zero Setup: Auto-installs Ollama via Homebrew if needed and auto-pulls models
Multi-Format: Supports PDFs and images (PNG, JPEG, WebP)
OCR-Enhanced: Uses Tesseract.js for accurate text extraction from receipts
Local Storage: All data persists to a local SQLite database
Semantic Search: Natural-language search over indexed documents (coming soon)
MCP Integration: Use via Claude Desktop or any MCP-compatible assistant
Privacy-First: Data stays on your machine (unless you opt for cloud AI)
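Supporting several input formats implies routing each file by type. One way to do that, sketched here on the assumption that the tool inspects file contents rather than trusting the extension, is to check each format's magic bytes. detectFormat and hasSignature are hypothetical names; the byte signatures themselves are the standard ones for PDF, PNG, JPEG, and WebP.

```typescript
// Sketch: classify an input file by its leading "magic" bytes rather than its extension.
// ASSUMPTION: detectFormat, hasSignature, and DocFormat are hypothetical names;
// doc-agent may detect formats differently (for example, by file extension).

type DocFormat = "pdf" | "png" | "jpeg" | "webp" | "unknown";

// True if `bytes` contains `sig` starting at `offset`.
function hasSignature(bytes: Uint8Array, sig: number[], offset = 0): boolean {
  if (bytes.length < offset + sig.length) return false;
  return sig.every((b, i) => bytes[offset + i] === b);
}

function detectFormat(bytes: Uint8Array): DocFormat {
  if (hasSignature(bytes, [0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf"; // "%PDF-"
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  // WebP: "RIFF" at offset 0, a 4-byte chunk size, then "WEBP" at offset 8.
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]) &&
    hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return "webp";
  }
  return "unknown";
}
```

A caller would pass new Uint8Array(readFileSync(path)) from node:fs. Images could go straight to a vision model, while a PDF would typically need its pages rendered to images first; that step is not shown.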
Quick Start
Installation
Usage
Extract document data (uses Ollama by default):
Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.
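The install-on-demand behavior can be modeled as a pure function that decides which commands still need to run, which keeps the decision logic testable without touching the system. This is a hedged sketch: SetupState and planSetup are hypothetical, and doc-agent's real checks and prompts may differ. The commands themselves are the standard Homebrew and Ollama invocations (brew install ollama, brew services start ollama, ollama pull).

```typescript
// Sketch of "Zero Setup" as a pure planning function: given what is installed,
// return the commands still needed before extraction can run.
// ASSUMPTION: SetupState and planSetup are hypothetical; doc-agent's real checks may differ.

interface SetupState {
  hasBrew: boolean; // e.g. whether `brew` is on PATH
  hasOllama: boolean; // e.g. whether `ollama` is on PATH
  installedModels: string[]; // names as shown by `ollama list`, e.g. "llama3.2-vision:latest"
}

function planSetup(state: SetupState, model = "llama3.2-vision"): string[][] {
  const steps: string[][] = [];
  if (!state.hasOllama) {
    if (!state.hasBrew) {
      throw new Error("Ollama is not installed and Homebrew is unavailable; install Ollama manually.");
    }
    steps.push(["brew", "install", "ollama"]);
    // The Ollama server must be running before `ollama pull` can download a model.
    steps.push(["brew", "services", "start", "ollama"]);
  }
  // `ollama list` shows tagged names, so an untagged model name matches any tag of it.
  const haveModel = state.installedModels.some(
    (m) => m === model || m.split(":")[0] === model,
  );
  if (!haveModel) steps.push(["ollama", "pull", model]);
  return steps;
}
```

Each planned step could then be executed with spawnSync(step[0], step.slice(1), { stdio: "inherit" }) from node:child_process, stopping at the first non-zero exit status.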
With Gemini (cloud, higher accuracy):
Start MCP server:
Claude Desktop Integration
Add to your claude_desktop_config.json:
Then in Claude Desktop:
"Extract data from ~/Downloads/invoice.pdf"
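Claude Desktop reads MCP servers from an mcpServers object in claude_desktop_config.json, where each entry names a command, optional args, and optional env. On macOS the file lives at ~/Library/Application Support/Claude/claude_desktop_config.json, and Claude Desktop must be restarted to pick up changes. The sketch below merges one entry into an already-parsed config so existing servers survive. withMcpServer and the "doc-agent" key are hypothetical, and the command to register should be whatever this project documents for starting its MCP server.

```typescript
// Sketch: add or replace one MCP server entry in a parsed claude_desktop_config.json,
// leaving every other server and top-level setting untouched.
// ASSUMPTION: withMcpServer is a hypothetical helper; the entry shape (command, args, env)
// follows Claude Desktop's mcpServers format.

interface McpServerEntry {
  command: string; // executable Claude Desktop launches; it speaks MCP over stdio
  args?: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // preserve any other settings in the file
}

function withMcpServer(
  config: ClaudeDesktopConfig,
  name: string,
  entry: McpServerEntry,
): ClaudeDesktopConfig {
  // Return a new object instead of mutating the caller's config.
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}
```

To apply it, read the file with readFileSync, JSON.parse it, pass the result through withMcpServer, and write JSON.stringify(updated, null, 2) back to the same path.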
Development
Architecture
The CLI is built with Ink (React for CLIs) for rich interactive output:
Roadmap
See ROADMAP.md for the project plan.
License
MIT