doc-agent
Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.
Features
Document Extraction: Extract structured data from PDFs and images using Vision AI
Ollama-First: Privacy-first default using the local llama3.2-vision model
Zero Setup: Auto-installs Ollama via Homebrew if needed, auto-pulls models
Multi-Format: Supports PDFs and images (PNG, JPEG, WebP)
OCR-Enhanced: Uses Tesseract.js for accurate text extraction from receipts
Local Storage: All data persists to a local SQLite database
Semantic Search: Natural language search over indexed documents (coming soon)
MCP Integration: Use via Claude Desktop or any MCP-compatible assistant
Privacy-First: Data stays on your machine (unless you opt for cloud AI)
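To make the Multi-Format item concrete, here is a minimal sketch of an extension check for the formats listed above. It is not taken from the doc-agent source: `isSupportedDocument` and `SUPPORTED_EXTENSIONS` are hypothetical names, and accepting `.jpg` alongside `.jpeg` is an assumption.

```typescript
// Hypothetical helper, not part of doc-agent: checks a file path against
// the formats named in the feature list (PDF, PNG, JPEG, WebP).
// Treating ".jpg" as JPEG is an assumption.
const SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
  "pdf",
  "png",
  "jpg",
  "jpeg",
  "webp",
]);

function isSupportedDocument(filePath: string): boolean {
  // Look only at the last path segment so dots in directory names are ignored.
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  // No extension, or a dotfile such as ".pdf" with no base name.
  if (dot <= 0) return false;
  return SUPPORTED_EXTENSIONS.has(base.slice(dot + 1).toLowerCase());
}
```

A check like this only looks at the file name; the real CLI may well inspect file contents instead.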
Quick Start
Installation
npm install -g doc-agent
Usage
Extract document data (uses Ollama by default):
doc extract invoice.pdf
Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.
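As a rough mental model of provider selection (Ollama unless `--provider gemini` is passed, which this README covers next), the sketch below resolves a provider from a flag value and an environment map. Everything in it is an assumption for illustration: `resolveProvider`, `Provider`, and `ProviderChoice` are hypothetical names, the README does not confirm that `ollama` is itself a valid `--provider` value, and it only implies that Gemini needs `GEMINI_API_KEY`.

```typescript
// Hypothetical sketch, not doc-agent's actual implementation.
type Provider = "ollama" | "gemini";

interface ProviderChoice {
  provider: Provider;
  apiKey?: string;
}

function resolveProvider(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): ProviderChoice {
  // No flag means the local, privacy-first default.
  // Accepting an explicit "ollama" value is an assumption.
  if (flag === undefined || flag === "ollama") {
    return { provider: "ollama" };
  }
  if (flag === "gemini") {
    // Assumption: the cloud provider cannot run without an API key.
    const apiKey = env["GEMINI_API_KEY"];
    if (!apiKey) {
      throw new Error("GEMINI_API_KEY must be set to use the gemini provider");
    }
    return { provider: "gemini", apiKey };
  }
  throw new Error(`Unknown provider: ${flag}`);
}
```

Failing early on a missing key keeps the error close to its cause rather than surfacing it later as a failed API request.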
With Gemini (cloud, higher accuracy):
export GEMINI_API_KEY=your_key_here
doc extract invoice.pdf --provider gemini
Start MCP server:
doc mcp
Claude Desktop Integration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "doc-agent": {
      "command": "npx",
      "args": ["-y", "doc-agent", "mcp"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}
Then in Claude Desktop:
"Extract data from ~/Downloads/invoice.pdf"
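If claude_desktop_config.json already lists other MCP servers, the doc-agent entry has to be merged in rather than pasted over them. The sketch below shows one way to do that merge on the parsed JSON. `addDocAgent` and the two interfaces are hypothetical names introduced here, and treating the `env` block as optional (needed only for Gemini) is an assumption based on Ollama being the default provider.

```typescript
// Hypothetical types describing only the part of the config shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are passed through untouched
}

// Returns a new config with the doc-agent entry added or replaced.
// Assumption: the env block is only needed when using Gemini.
function addDocAgent(
  config: ClaudeDesktopConfig,
  geminiApiKey?: string,
): ClaudeDesktopConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "doc-agent", "mcp"],
    ...(geminiApiKey ? { env: { GEMINI_API_KEY: geminiApiKey } } : {}),
  };
  return {
    ...config,
    mcpServers: { ...config.mcpServers, "doc-agent": entry },
  };
}
```

The function does not mutate its input, so the original config can be kept as a backup before the result is written back to disk.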
Development
# Clone and install dependencies
git clone https://github.com/prosdevlab/doc-agent
cd doc-agent
pnpm install
# Build the project
pnpm build
# Run CLI locally
pnpm dev extract examples/invoice.pdf
# Run tests
pnpm test
# Start MCP server
pnpm mcp
Architecture
The CLI is built with Ink (React for CLIs) for rich interactive output:
packages/
├── cli/            # Ink-based CLI with services, hooks, and components
├── core/           # Shared types and interfaces
├── extract/        # Document extraction (Gemini, Ollama) + OCR
├── storage/        # SQLite persistence (Drizzle ORM)
└── vector-store/   # Vector database for semantic search
Roadmap
See ROADMAP.md for the project plan.
License
MIT