# AGENTS.md

This is the developer documentation for the **doc-agent** project.

**CRITICAL**: When starting any new task, feature, or fix, you MUST follow **The Drill™** outlined in [WORKFLOW.md](./WORKFLOW.md).

## Project Overview

**doc-agent** is a document extraction and semantic search CLI with MCP (Model Context Protocol) integration. It allows extracting structured data from PDFs and images (e.g., invoices, receipts, bank statements) using Vision AI and performing semantic search over indexed documents.

**Tech Stack:**

- **Runtime**: Node.js >= 22 (LTS)
- **Package Manager**: pnpm (using workspaces)
- **Language**: TypeScript
- **Build System**: Turborepo + tsup
- **Linter/Formatter**: Biome
- **Testing**: Vitest + ink-testing-library
- **AI Providers**: Ollama (default, local), Google Gemini (cloud)
- **Database**: SQLite via better-sqlite3 + Drizzle ORM
- **Vector Database**: LanceDB (via `vectordb` package)
- **CLI Framework**: Ink (React for CLIs) + Commander.js
- **OCR**: Tesseract.js (WASM-based, fully local)

## Repository Structure

The project is organized as a monorepo using pnpm workspaces:

```
packages/
├── cli/          # CLI entry point and MCP server
├── core/         # Shared types and interfaces
├── extract/      # Document extraction (Gemini, Ollama)
├── storage/      # SQLite persistence (Drizzle ORM)
└── vector-store/ # Vector database for semantic search
```

## Setup Commands

```bash
# Install dependencies
pnpm install

# Build all packages
pnpm build

# Run CLI locally during development
pnpm dev extract examples/invoice.pdf

# Run MCP server locally
pnpm mcp
```

## Development Workflow

See [WORKFLOW.md](./WORKFLOW.md) for the complete development process, including:

1. Finding work
2. Planning
3. Implementation
4. Quality Checks
5. Committing & PRs

### Working on the CLI

The CLI is the main entry point (`packages/cli`), built with **Ink** (React for CLIs).

- **Run CLI**: `pnpm --filter @doc-agent/cli dev ...` (or just `pnpm dev` from root)
- **Command Structure**:
  - `doc extract <file>`: Extract data from a document.
  - `doc mcp`: Start the MCP server.
  - `doc search <query>`: (Coming soon) Search indexed documents.

**CLI Architecture** (testable by design):

```
packages/cli/src/
├── cli.ts                  # Commander.js entry point
├── components/             # Ink React components (UI)
│   ├── ExtractApp.tsx      # Main extraction flow
│   ├── OllamaStatus.tsx
│   └── StreamingOutput.tsx
├── hooks/                  # React hooks (stateful logic)
│   ├── useOllama.ts
│   └── useExtraction.ts
├── services/               # Pure functions (external interactions)
│   └── ollama.ts           # Ollama install/start/pull
├── contexts/               # React contexts (dependency injection)
│   ├── OllamaContext.tsx
│   └── ExtractionContext.tsx
└── mcp/                    # MCP server
```

### Working on Core Logic

- **Extraction** (`packages/extract`): Handles AI provider communication and document parsing.
  - Supports Ollama (local) and Gemini (cloud).
  - PDF-to-image conversion via `pdf-to-img`.
  - OCR preprocessing via `tesseract.js` for improved accuracy.
  - Zod schema validation for AI responses (see the sketches below).
- **Storage** (`packages/storage`): SQLite persistence via Drizzle ORM.
  - Stores extracted document metadata and JSON data.
  - Auto-migrates schema on startup.
- **Vector Store** (`packages/vector-store`): LanceDB for semantic search (coming soon).
- **Types** (`packages/core`): Shared interfaces (`DocumentData`, `Config`).
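
For orientation, this is roughly what the Zod validation step looks like: an AI response arrives as untyped JSON and is rejected if it doesn't match the expected shape. The schema below is a minimal illustrative sketch, not the project's actual schema (field names are assumptions):

```typescript
import { z } from "zod";

// Illustrative schema only -- the real schemas live in packages/extract.
const InvoiceSchema = z.object({
  vendor: z.string(),
  invoiceNumber: z.string(),
  totalAmount: z.number(),
  currency: z.string().length(3),
  lineItems: z.array(
    z.object({ description: z.string(), amount: z.number() }),
  ),
});

type Invoice = z.infer<typeof InvoiceSchema>;

// safeParse rejects malformed provider output instead of letting it
// propagate into storage.
function parseAiResponse(raw: unknown): Invoice {
  const result = InvoiceSchema.safeParse(raw);
  if (!result.success) {
    throw new Error(`AI response failed validation: ${result.error.message}`);
  }
  return result.data;
}
```

Validating at this boundary keeps provider-specific quirks out of the storage layer.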
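
Similarly, the storage package persists extracted documents through Drizzle ORM on SQLite. A minimal sketch of what such a table definition might look like, assuming hypothetical table and column names (the real schema lives in `packages/storage`):

```typescript
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";

// Hypothetical table definition for illustration only.
export const documents = sqliteTable("documents", {
  id: integer("id").primaryKey({ autoIncrement: true }),
  fileName: text("file_name").notNull(),
  docType: text("doc_type").notNull(),            // e.g. "invoice", "receipt"
  data: text("data", { mode: "json" }),           // extracted fields as JSON
  createdAt: integer("created_at", { mode: "timestamp" }).notNull(),
});
```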
### Build Process

Dependencies must be built in order (handled by Turbo):

1. `@doc-agent/core` (no internal deps)
2. `@doc-agent/extract` & `@doc-agent/vector-store` (depend on `core`)
3. `@doc-agent/cli` (depends on all above)

**Critical**: Always run `pnpm build` before running the CLI if you've changed shared packages, as the CLI consumes the built output of dependencies.

## Testing

Tests are run using Vitest from the root:

```bash
pnpm test
```

## MCP Integration

The project exposes an MCP server (`packages/cli/src/mcp/server.ts`) that allows AI assistants (like Claude Desktop) to interact with the tool.

**Tools Exposed:**

- `extract_document`: Extracts data from a file path.
- `search_documents`: Searches the local index.

## Code Style

- **Linting/Formatting**: Run `pnpm lint` and `pnpm format` (using Biome).
- **Commits**: Follow Conventional Commits (e.g., `feat(cli): add new command`).

## Security & Privacy

This is a **privacy-first** tool. All data stays local. When implementing features that touch user data, follow these principles:

### PII Considerations

Before storing or logging any data, ask:

1. **Does this contain PII?** (usernames, paths, emails, IPs)
2. **Is storage necessary?** Can we derive what we need without storing raw PII?
3. **What's the blast radius?** If this data leaks, what's exposed?

**Preferred patterns:**

| Instead of | Use |
|------------|-----|
| `/Users/john/docs/invoice.pdf` | Hash of path + filename only |
| Full error stack with paths | Sanitized error messages |
| Logging request bodies | Logging request metadata only |

### Data Locality

- All extracted document data stays in local SQLite
- No telemetry, no cloud sync, no external API calls (except AI providers)
- User controls their data completely

### AI Provider Data

When sending data to AI providers:

- Gemini/OpenAI: Data leaves the machine (user accepts this by providing an API key)
- Ollama: Data stays local (default, privacy-first option)

Document these trade-offs clearly in user-facing docs.

## Adding New Features

1. **New AI Provider**: Update `packages/extract/src/index.ts` to implement the provider logic.
2. **New CLI Command**: Add the command to `packages/cli/src/cli.ts` (see the sketch at the end of this file).
3. **New Shared Type**: Add to `packages/core/src/index.ts`.

## Release

Release is handled via GitHub Actions and Changesets.

1. Run `pnpm changeset` to generate a version bump.
2. Commit the changeset.
3. On merge to main, a release PR is created/updated.
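
To make steps 2 and 3 of "Adding New Features" concrete, here is a minimal sketch of registering a new Commander command and backing it with a shared type from `packages/core`. The command, option, and type names are illustrative assumptions, not existing code:

```typescript
import { Command } from "commander";

// Step 3: a shared type like this would live in packages/core/src/index.ts.
export interface SearchResult {
  documentId: number;
  fileName: string;
  score: number;
}

// Step 2: the command itself would be registered in packages/cli/src/cli.ts.
const program = new Command();

program
  .command("search <query>")
  .description("Search indexed documents semantically")
  .option("--limit <n>", "maximum number of results", "10")
  .action(async (query: string, options: { limit: string }) => {
    // Placeholder until the vector store lands; the real command would
    // query @doc-agent/vector-store and render results with Ink.
    const results: SearchResult[] = [];
    console.log(`"${query}": ${results.length} results (limit ${options.limit})`);
  });

program.parse();
```

Keeping the data shape in `@doc-agent/core` lets the MCP server and the Ink UI consume the same type once the feature is implemented.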
