# doc-agent
Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.
## Features
- π **Document Extraction**: Extract structured data from PDFs and images using Vision AI
- π¦ **Ollama-First**: Privacy-first default using local `llama3.2-vision` model
- π§ **Zero Setup**: Auto-installs Ollama via Homebrew if needed, auto-pulls models
- π **Multi-Format**: Supports PDFs and images (PNG, JPEG, WebP)
- π¬ **OCR-Enhanced**: Uses Tesseract.js for accurate text extraction from receipts
- πΎ **Local Storage**: All data persists to local SQLite database
- π **Semantic Search**: Natural language search over indexed documents (coming soon)
- π€ **MCP Integration**: Use via Claude Desktop or any MCP-compatible assistant
- π **Privacy-First**: Data stays on your machine (unless you opt for cloud AI)
## Quick Start
### Installation
```bash
npm install -g doc-agent
```
### Usage
**Extract document data (uses Ollama by default):**
```bash
doc extract invoice.pdf
```
> π‘ Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.
**With Gemini (cloud, higher accuracy):**
```bash
export GEMINI_API_KEY=your_key_here
doc extract invoice.pdf --provider gemini
```
**Start MCP server:**
```bash
doc mcp
```
### Claude Desktop Integration
Add to your `claude_desktop_config.json`:
```json
{
"mcpServers": {
"doc-agent": {
"command": "npx",
"args": ["-y", "doc-agent", "mcp"],
"env": {
"GEMINI_API_KEY": "your_key_here"
}
}
}
}
```
Then in Claude Desktop:
> "Extract data from ~/Downloads/invoice.pdf"
## Development
```bash
# Clone and install dependencies
git clone https://github.com/prosdevlab/doc-agent
cd doc-agent
pnpm install
# Build the project
pnpm build
# Run CLI locally
pnpm dev extract examples/invoice.pdf
# Run tests
pnpm test
# Start MCP server
pnpm mcp
```
### Architecture
The CLI is built with [Ink](https://github.com/vadimdemedes/ink) (React for CLIs) for rich interactive output:
```
packages/
βββ cli/ # Ink-based CLI with services, hooks, and components
βββ core/ # Shared types and interfaces
βββ extract/ # Document extraction (Gemini, Ollama) + OCR
βββ storage/ # SQLite persistence (Drizzle ORM)
βββ vector-store/ # Vector database for semantic search
```
## Roadmap
See [ROADMAP.md](./ROADMAP.md) for the project plan.
## License
MIT