Skip to main content
Glama
README.mdβ€’2.54 kB
# doc-agent Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI. ## Features - πŸ” **Document Extraction**: Extract structured data from PDFs and images using Vision AI - πŸ¦™ **Ollama-First**: Privacy-first default using local `llama3.2-vision` model - πŸ”§ **Zero Setup**: Auto-installs Ollama via Homebrew if needed, auto-pulls models - πŸ“„ **Multi-Format**: Supports PDFs and images (PNG, JPEG, WebP) - πŸ”¬ **OCR-Enhanced**: Uses Tesseract.js for accurate text extraction from receipts - πŸ’Ύ **Local Storage**: All data persists to local SQLite database - πŸ”Ž **Semantic Search**: Natural language search over indexed documents (coming soon) - πŸ€– **MCP Integration**: Use via Claude Desktop or any MCP-compatible assistant - πŸ”’ **Privacy-First**: Data stays on your machine (unless you opt for cloud AI) ## Quick Start ### Installation ```bash npm install -g doc-agent ``` ### Usage **Extract document data (uses Ollama by default):** ```bash doc extract invoice.pdf ``` > πŸ’‘ Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew. **With Gemini (cloud, higher accuracy):** ```bash export GEMINI_API_KEY=your_key_here doc extract invoice.pdf --provider gemini ``` **Start MCP server:** ```bash doc mcp ``` ### Claude Desktop Integration Add to your `claude_desktop_config.json`: ```json { "mcpServers": { "doc-agent": { "command": "npx", "args": ["-y", "doc-agent", "mcp"], "env": { "GEMINI_API_KEY": "your_key_here" } } } } ``` Then in Claude Desktop: > "Extract data from ~/Downloads/invoice.pdf" ## Development ```bash # Clone and install dependencies git clone https://github.com/prosdevlab/doc-agent cd doc-agent pnpm install # Build the project pnpm build # Run CLI locally pnpm dev extract examples/invoice.pdf # Run tests pnpm test # Start MCP server pnpm mcp ``` ### Architecture The CLI is built with [Ink](https://github.com/vadimdemedes/ink) (React for CLIs) for rich interactive output: ``` packages/ β”œβ”€β”€ cli/ # Ink-based CLI with services, hooks, and components β”œβ”€β”€ core/ # Shared types and interfaces β”œβ”€β”€ extract/ # Document extraction (Gemini, Ollama) + OCR β”œβ”€β”€ storage/ # SQLite persistence (Drizzle ORM) └── vector-store/ # Vector database for semantic search ``` ## Roadmap See [ROADMAP.md](./ROADMAP.md) for the project plan. ## License MIT

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/prosdevlab/doc-agent'

If you have feedback or need assistance with the MCP directory API, please join our Discord server