# System Design Specification
*Generated by gpt-creator create-sds*
## Table of Contents
- [1 Architecture Overview](#architecture-overview)
- [1.1 Operating Principles](#operating-principles)
- [1.2 Waterfall Retrieval Model](#waterfall-retrieval-model)
- [1.3 Repo Isolation Model](#repo-isolation-model)
- [1.4 Hardware Awareness](#hardware-awareness)
- [2 Core Components](#core-components)
- [2.1 Config and State Manager](#config-and-state-manager)
- [2.2 Repo Manager](#repo-manager)
- [2.3 Indexing and Search](#indexing-and-search)
- [2.4 Waterfall Orchestrator](#waterfall-orchestrator)
- [2.5 Web Discovery and Scraping](#web-discovery-and-scraping)
- [2.6 LLM and Embeddings](#llm-and-embeddings)
- [2.7 Memory and Reasoning DAG](#memory-and-reasoning-dag)
- [3 Data Management and Storage](#data-management-and-storage)
- [3.1 Directory and Fingerprint Layout](#directory-and-fingerprint-layout)
- [3.2 Schemas and Indexes](#schemas-and-indexes)
- [3.3 Caching Strategy](#caching-strategy)
- [4 Interfaces and Integrations](#interfaces-and-integrations)
- [4.1 CLI Commands](#cli-commands)
- [4.2 HTTP API](#http-api)
- [4.3 MCP Server](#mcp-server)
- [4.4 Local Dependencies](#local-dependencies)
- [5 Runtime, Deployment, and Operations](#runtime,-deployment,-and-operations)
- [5.1 Daemon Lifecycle and Binding](#daemon-lifecycle-and-binding)
- [5.2 Resource and Concurrency Controls](#resource-and-concurrency-controls)
- [5.3 Security and Privacy](#security-and-privacy)
- [5.4 Observability and Health](#observability-and-health)
- [5.5 Configuration Management](#configuration-management)
- [6 Quality, Testing, and Risks](#quality,-testing,-and-risks)
- [6.1 Phase Gates](#phase-gates)
- [6.2 Test Coverage Focus](#test-coverage-focus)
- [6.3 Risks and Mitigations](#risks-and-mitigations)
## Architecture Overview {#architecture-overview}
Docdex v2.0 runs a per-repo local-first daemon (`docdexd serve`) and also supports a singleton daemon (`docdexd daemon`) for multi-repo mounting. Per-repo daemons expose HTTP APIs and (optionally) stdio MCP; the singleton exposes HTTP APIs plus shared MCP over HTTP/SSE. The design aims to keep all inference and retrieval local by default, escalating to gated web enrichment only when confidence drops.
- **Core surfaces**: per-repo HTTP endpoint set (OpenAI-compatible chat) plus shared MCP over HTTP/SSE for the singleton daemon (legacy per-repo stdio MCP remains); CLI is a thin client to the daemon. No additional surfaces are introduced in this section beyond shared MCP transport.
- **Repo Manager**: normalizes repo paths, fingerprints via SHA256, lazily initializes per-repo state (Tantivy indexes, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`, `libs_index`), and ensures handle closure on shutdown.
- **Waterfall retrieval** (per repo): Tier 1 local indexes (source \+ libs), Tier 2 zero-cost web discovery/fetch (DuckDuckGo HTML \+ guarded headless Chrome), Tier 3 local cognition/memory (Ollama chat/embeddings, sqlite-vec memory). Cached library docs are treated as local within Tier 1\.
- **Context assembly**: fixed priority Memory → Repo Code → Library/Web; token budget roughly 10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer. Budgeting happens before Ollama calls (a budgeting sketch follows this list).
- **Isolation model**: per-repo state under `~/.docdex/state/repos/<fingerprint>/`; global caches (`cache/web`, `cache/libs`) are reused but ingested per repo. CLI/MCP require explicit repo id/path; HTTP uses the daemon repo by default and validates any provided repo id/path.
- **Hardware awareness**: daemon detects RAM/VRAM to recommend or constrain Ollama models (e.g., \<8GB ultra-light; ≥16GB default `phi3.5:3.8b`; ≥32GB \+ GPU suggests `llama3.1:70b` if present). No silent auto-install of Ollama/models; npm postinstall may prompt and installs only on explicit confirmation.
- **Security posture**: binds to `127.0.0.1` by default; `--expose` demands token auth on HTTP, and MCP enforces `auth_token` when configured. No telemetry or paid/cloud services.
- **Scalability & reliability (per PDR scope)**: targets ≥8 concurrent repos by running separate per-repo daemons; local search p95 \< 50ms (\<20ms typical). Browser guard prevents zombie Chrome; web rate limits (≥2s DDG, ≥1s fetch) mitigate bans.
- **Out-of-scope (per section)**: new surfaces, cloud/vector backends, cross-repo memory, clustered/multi-tenant daemon topologies are explicitly excluded.
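The budgeting rule above is simple enough to pin down in code. The following is a minimal sketch, assuming the fixed 10/20/50/20 split and the Memory → Repo Code → Library/Web priority stated in this section; the `TokenBudget` type and the `split_budget`/`fit` names are illustrative, not the actual implementation.

```rust
#[derive(Debug)]
struct TokenBudget {
    system_prompt: usize, // ~10%
    memory: usize,        // ~20%
    retrieval: usize,     // ~50% repo code + library/web
    generation: usize,    // ~20% reserved for the answer
}

fn split_budget(context_window: usize) -> TokenBudget {
    TokenBudget {
        system_prompt: context_window / 10,
        memory: context_window / 5,
        retrieval: context_window / 2,
        generation: context_window / 5,
    }
}

/// `snippets` must already be ordered Memory -> Repo Code -> Library/Web;
/// anything that overflows the slice budget is dropped from the low-priority end.
fn fit(snippets: &[(&str, usize)], budget_tokens: usize) -> Vec<String> {
    let mut used = 0;
    let mut kept = Vec::new();
    for (text, tokens) in snippets {
        if used + tokens > budget_tokens {
            break; // lower-priority snippets after this point are dropped
        }
        used += tokens;
        kept.push(text.to_string());
    }
    kept
}

fn main() {
    let budget = split_budget(8192);
    println!("{budget:?}");
    let kept = fit(
        &[("memory note", 800), ("repo snippet", 3000), ("web snippet", 2500)],
        budget.memory + budget.retrieval,
    );
    println!("kept {} of 3 snippets", kept.len());
}
```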
**Open Questions & Risks**
- Confirm shutdown behavior for active sessions: should in-flight requests be drained or rejected?
- How to handle simultaneous web-trigger requests across repos within rate limits without head-of-line blocking?
- Risk: confidence gating (`web_trigger_threshold` default 0.7) may under-trigger web enrichment for sparse repos.
**Verification Strategy**
- Run `docdexd check` to validate config, state perms, Ollama, Chrome, repo registry, bind configuration.
- Concurrency tests across per-repo daemons under ≥8 repos; ensure handle closure and no cross-repo leakage.
- Latency benchmarks: local search p95 \< 50ms and typical \<20ms on representative repos.
- Waterfall tests: force low-confidence queries and assert escalation order and rate-limit compliance.
- Security checks: ensure localhost bind by default and token required when `--expose` is set; reject invalid repo ids when supplied.
### Operating Principles {#operating-principles}
Local-first, per-repo daemon discipline governs all decisions: `docdexd` serves HTTP and MCP for a single repo, defaulting to offline behavior and zero paid components. Web access is a gated fallback on confidence drop or explicit user demand. All operations are repo-scoped; no cross-repo state or memory. Privacy and cost ceilings drive dependency choices (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec, Tree-sitter); any cloud/paid API is out of scope.
**Repo scoping and isolation**
- CLI and MCP calls must include repo id/path; HTTP calls default to the daemon repo and validate any provided repo id/path.
- Per-repo state lives under fingerprinted directories; caches are global but ingestion is repo-local to prevent bleed.
- LRU eviction is not required for per-repo daemons; resource caps apply per repo.
**Waterfall retrieval discipline**
- Tier 1: local indexes (source \+ libs) are always tried first; cached library docs count as local.
- Tier 2: web discovery/fetch only if top local score \< `web_trigger_threshold` or user forces it; DDG HTML discovery with ≥2s spacing, per-domain fetch delay ≥1s, guarded headless Chrome with readability.
- Tier 3: local cognition/memory via Ollama embeddings/chat; memory prioritized in context assembly over repo code, then library/web.
**Security and privacy defaults**
- Bind HTTP/MCP to `127.0.0.1`; `--expose` requires token auth on HTTP requests, and stdio MCP expects `auth_token` in `initialize` when configured (a minimal auth-check sketch follows this list).
- No telemetry, no paid keys; compliance demands open-source dependencies only.
- Browser lifecycle guarded; locks directory to prevent zombie Chrome processes.
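As a concrete reading of the bind/token rule, here is a hypothetical, framework-agnostic check. The exact HTTP status/body contract is still listed as an open question below, so only the accept/reject decision is sketched; names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum AuthDecision {
    Allow,
    Reject(&'static str),
}

/// Localhost binds need no token; exposed binds must present the configured token.
fn authorize(exposed: bool, configured_token: Option<&str>, bearer: Option<&str>) -> AuthDecision {
    if !exposed {
        // Default 127.0.0.1 bind: no token required.
        return AuthDecision::Allow;
    }
    match (configured_token, bearer) {
        (Some(expected), Some(got)) if expected == got => AuthDecision::Allow,
        (Some(_), _) => AuthDecision::Reject("invalid or missing token"),
        (None, _) => AuthDecision::Reject("--expose requires a configured auth token"),
    }
}

fn main() {
    assert_eq!(authorize(false, None, None), AuthDecision::Allow);
    assert_eq!(authorize(true, Some("s3cret"), Some("s3cret")), AuthDecision::Allow);
    assert!(matches!(authorize(true, Some("s3cret"), None), AuthDecision::Reject(_)));
}
```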
**Resource discipline and hardware awareness**
- RAM/VRAM detection guides model recommendations (≤8GB: ultra-light; ≥16GB: `phi3.5:3.8b` default; ≥32GB \+ GPU: `llama3.1:70b` if installed).
- Bounded Chrome concurrency and per-domain rate limits; clear errors when caps are hit.
**Out of scope (per PDR)**
- Cross-repo memory/indexing, clustered/multi-tenant daemon deployments, telemetry, paid/cloud APIs.
**Open Questions & Risks**
- Do we need configurable backpressure when multiple per-repo daemons spike web fetches simultaneously?
- What is the exact failure mode when `--expose` token is missing or malformed—HTTP status and body contract?
- Risk: DDG rate limiting/IP bans despite spacing; may need backoff tuning.
- Risk: Model recommendation accuracy on heterogeneous hardware (e.g., eGPU, shared RAM GPUs).
**Verification Strategy**
- Unit/integration: enforce repo-required flag on CLI/MCP; HTTP defaults to daemon repo; reject unknown repo ids.
- Concurrency tests: parallel per-repo daemon access; ensure handle closure.
- Waterfall tests: trigger web only below threshold; verify delays and cache reuse; assert Chrome teardown.
- Security tests: localhost bind by default; token required when exposed; no external calls without explicit trigger.
- Performance checks: local search p95 \<50ms, typical \<20ms; memory/token budgeting honors priority order.
### Waterfall Retrieval Model {#waterfall-retrieval-model}
Architectural intent: tiered retrieval that stays local by default, escalates to zero-cost web enrichment only on low confidence or explicit request, and finally leverages local cognition/memory; all operations are repo-scoped with strict isolation.
Components and flow
- Tier 1 Local: Tantivy source index plus per-repo `libs_index`; cached library docs are treated as local. BM25 search with optional local rerank. Provides score used for gating.
- Tier 2 Web (fallback): DuckDuckGo HTML discovery (≥2s between searches, blocklist) → headless Chrome fetch with readability (≥1s per-domain delay, page timeout \~15s) → cache HTML/cleaned JSON under `cache/web` → ingest into repo context as needed. Guarded browser lifecycle to avoid zombies; locks under `~/.docdex/state/locks`.
- Tier 3 Cognition/Memory: Local Ollama for chat/embeddings; per-repo `memory.db` (sqlite-vec) prioritized in context assembly; DAG logging per session.
- Gating logic: If the top local score ≥ `web_trigger_threshold` (default 0.7), stay in Tier 1; escalate to Tier 2 when the score falls below the threshold or the user explicitly forces web. Context assembly priority: Memory → Repo Code → Library/Web; token budget approx 10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer (the gate is sketched in code after this list).
- Surfaces using waterfall: CLI (`chat --repo`, `web-rag --repo`), HTTP `/v1/chat/completions` (defaults to daemon repo; repo id optional), MCP tools (all require repo). Repo Manager enforces repo selection and isolation throughout.
- Data contracts (implied from PDR): search results carry score \+ snippet \+ source path; web fetch outputs cleaned text plus metadata (url, fetched\_at, cache key); memory rows carry `id, content, embedding, created_at, metadata`. No additional schemas beyond stated.
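A minimal sketch of the confidence gate, assuming the default `web_trigger_threshold` of 0.7 and a force flag for explicit user requests; the `Escalation` enum and `gate` function are illustrative names, not the actual API.

```rust
#[derive(Debug, PartialEq)]
enum Escalation {
    StayLocal,   // Tier 1 answer is confident enough
    WebFallback, // proceed to Tier 2 discovery/fetch
}

fn gate(top_local_score: Option<f32>, web_trigger_threshold: f32, force_web: bool) -> Escalation {
    if force_web {
        return Escalation::WebFallback;
    }
    match top_local_score {
        Some(score) if score >= web_trigger_threshold => Escalation::StayLocal,
        // No local hits or a weak top hit: escalate.
        _ => Escalation::WebFallback,
    }
}

fn main() {
    assert_eq!(gate(Some(0.82), 0.7, false), Escalation::StayLocal);
    assert_eq!(gate(Some(0.41), 0.7, false), Escalation::WebFallback);
    assert_eq!(gate(Some(0.95), 0.7, true), Escalation::WebFallback);
}
```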
Scalability, reliability, security, observability, DevOps
- Scalability: Target local search p95 \<50ms; typical \<20ms. Scale by running per-repo daemons.
- Reliability: Browser guard to prevent zombie Chrome; clear errors on missing index/repo/models; fallback only on confidence drop to avoid unnecessary web calls.
- Security/Privacy: Offline-by-default; web only on gated escalation or explicit request. HTTP defaults to daemon repo; MCP requires repo selection; daemon binds 127.0.0.1 by default; `--expose` requires token auth.
- Observability: Not requested in PDR; expect logs for gating decisions, web escalations, and token budgeting drops.
- DevOps: No paid services; Ollama and Chrome are external dependencies validated via `docdexd check`. Cache reuse reduces repeated web fetches.
Assumptions
- BM25 search is sufficient for Tier 1 initial ranking; rerank is optional/local only.
- `web_trigger_threshold` default 0.7 is configurable; same threshold used across surfaces unless overridden.
- Browser availability and Ollama models are handled by setup flows; Playwright auto-install is opt-out and controlled by config/flags.
Open Questions & Risks
- Should Tier 2 be skipped entirely when offline is enforced by config/flag even if confidence is low? (risk: bad answers vs policy violation)
- How is rerank configured/enabled per repo or globally? (missing config detail)
- Cache eviction/TTL for `cache/web` not specified; risk of stale or unbounded cache.
- What is the exact backoff strategy on repeated web fetch failures beyond rate limits?
Verification Strategy
- Unit/integration: gating logic around `web_trigger_threshold`; ensure explicit force bypasses gate.
- Functional: Tier 1 latency benchmarks (\<50ms p95), Tier 2 rate-limit observance, Chrome guard tested for zombie-free teardown.
- End-to-end: `chat --repo` and `/v1/chat/completions` across tiers with correct repo scoping and context ordering (Memory → Repo → Library/Web).
- Negative tests: offline mode with forced web request returns clear error; missing index/repo/model paths produce expected failures.
### Repo Isolation Model {#repo-isolation-model}
Architectural intent: enforce strict per-repo scoping for all state, indexes, memory, and DAG data so multiple per-repo daemons can serve multiple repos without cross-contamination.
Design
- Repo identity: normalized repo path → SHA256 fingerprint; fingerprint is the sole key for on-disk state under `~/.docdex/state/repos/<fingerprint>/` (derivation sketched after this list).
- Per-repo state dirs: `index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`; all opened lazily on first access.
- Global/shared caches: `cache/web/` (HTML \+ cleaned JSON) and `cache/libs/<ecosystem>/<pkg>/`; reused across repos but ingested into per-repo indexes only on demand to avoid bleed.
- Repo Manager: maintains registry (path ↔ fingerprint); unknown/unindexed repos return clear errors.
- Access contract: CLI uses `--repo`; MCP tools require `project_root`/`repo_path` (unless `initialize` sets a default) and enforce it matches the MCP server repo; HTTP defaults to the daemon repo and validates any provided repo id/path. MCP server is per-repo; tools are repo-parameterized.
- Concurrency: multiple repos are served by running multiple per-repo daemons; operations within a repo serialize per underlying DB/index constraints.
- Security/privacy: data never leaves repo scope; no cross-repo memory/DAG queries; bound to 127.0.0.1 by default with optional token when exposed. No telemetry.
- Observability: not requested in PDR.
- Scalability/reliability: target ≥8 concurrent repos via multiple per-repo daemons; idle daemon memory target \<100MB; clear errors on missing repo/index.
- DevOps: state layout must remain stable across upgrades; `docdexd check` validates RW permissions and registry integrity.
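For illustration, a fingerprint derivation consistent with the design above: SHA256 over a canonicalized absolute path, hex-encoded, then used as the state directory name. Exact normalization rules remain an open question (see below), so `canonicalize()` here is just one plausible choice; requires the `sha2` crate.

```rust
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

fn repo_fingerprint(repo_path: &Path) -> std::io::Result<String> {
    // Resolve symlinks and relative components before hashing.
    let normalized = repo_path.canonicalize()?;
    let digest = Sha256::digest(normalized.to_string_lossy().as_bytes());
    // Hex-encode into the 64-character fingerprint used as the directory name.
    Ok(digest.iter().map(|b| format!("{:02x}", b)).collect())
}

fn repo_state_dir(state_root: &Path, repo_path: &Path) -> std::io::Result<PathBuf> {
    Ok(state_root.join("repos").join(repo_fingerprint(repo_path)?))
}

fn main() -> std::io::Result<()> {
    let dir = repo_state_dir(Path::new("/home/user/.docdex/state"), Path::new("."))?;
    println!("{}", dir.display()); // .../state/repos/<fingerprint>/
    Ok(())
}
```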
Assumptions
- Fingerprint is deterministic on normalized absolute path; moving a repo changes fingerprint (requires re-index) unless a future alias/relocation map is added.
- Per-repo daemons do not evict repos in-process.
Open Questions & Risks
- How should repo path moves/renames be handled without a re-index? (Out of current scope.)
- Race conditions on rapid open/close cycles under load; need tests.
Verification Strategy
- Unit/integration: Repo Manager handles concurrent open/close; errors on unknown/unindexed repo.
- State isolation tests: ensure no cross-repo reads/writes for indexes, memory, DAG, libs ingestion.
- Config validation: `docdexd check` confirms registry and state RW.
- Load tests: ≥8 concurrent repos operations without bleed across per-repo daemons.
### Hardware Awareness {#hardware-awareness}
Docdexd detects host RAM and (when present) GPU VRAM to guide Ollama model recommendations and default selection, keeping inference local and resource-safe. This logic informs CLI commands (`docdexd llm-list`, `docdexd check`, `docdexd llm-setup`) but does not introduce new APIs beyond what is already defined.
- **Detection scope**: Read total system RAM; detect GPU presence and VRAM when available. No other hardware signals are in scope per PDR.
- **Threshold policy (from PDR)**: RAM \<8GB → recommend ultra-light; ≥16GB → default `phi3.5:3.8b`; ≥32GB \+ GPU → suggest `llama3.1:70b` if installed. Keep decisions advisory; do not auto-download/install without explicit confirmation (a decision sketch follows this list).
- **Integration points**:
- `docdexd llm-list` runs detection, loads `llm_list.json`, filters, and outputs recommendations.
- `docdexd llm-setup` reuses detection to suggest pulls and update `[llm]` defaults in config; must honor offline-first (no automatic network installs).
- `docdexd check` reports hardware/readiness (Ollama reachability, models present) and should warn when the configured default exceeds detected capacity.
- **Configuration behavior**: `[llm]` default\_model should be validated against detected capacity; emit warnings, not hard failures, when oversized. No additional config keys required beyond existing PDR surface.
- **Security/Privacy**: Local-only detection; no telemetry or external calls. No new attack surface beyond existing CLI.
- **Reliability/DevOps**: Not requested in PDR beyond reporting readiness; ensure detection failures degrade to conservative recommendations rather than blocking startup.
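A small advisory sketch of the threshold policy above. The PDR leaves the 8-16GB band and the ≥32GB-without-GPU case unspecified, so this sketch conservatively maps both to the default tier; the enum and function names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum ModelTier {
    UltraLight, // < 8GB RAM
    Default,    // phi3.5:3.8b
    Heavy,      // llama3.1:70b, suggested only, never auto-installed
}

fn recommend(ram_gb: u64, vram_gb: Option<u64>) -> ModelTier {
    if ram_gb < 8 {
        ModelTier::UltraLight
    } else if ram_gb >= 32 && vram_gb.is_some() {
        ModelTier::Heavy
    } else {
        // 8-31GB, or >=32GB without a detected GPU: stay on the default tier (PDR silent).
        ModelTier::Default
    }
}

fn main() {
    assert_eq!(recommend(6, None), ModelTier::UltraLight);
    assert_eq!(recommend(16, None), ModelTier::Default);
    assert_eq!(recommend(64, Some(24)), ModelTier::Heavy);
    assert_eq!(recommend(64, None), ModelTier::Default); // conservative fallback
}
```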
Assumptions
- GPU VRAM detection is best-effort; absence of a GPU is treated as CPU-only.
- Hardware checks run locally with no privilege escalation; only readonly system queries are used.
Open Questions & Risks
- How to handle systems with ≥32GB RAM but no GPU: stay on 8B or offer a mid-tier 14B if present? (PDR silent.)
- Should warnings on oversized `[llm].default_model` be treated as non-zero exit in CI (`docdexd check`), or only logged?
- `llm_list.json` source and schema are assumed to already include model size/requirements; confirm format.
Verification Strategy
- Unit tests: hardware probe parser with mocked RAM/VRAM inputs across thresholds.
- CLI tests: `docdexd llm-list` and `docdexd llm-setup` output expected recommendations given mocked detection.
- Readiness: `docdexd check` emits warning for mismatched default model vs detected capacity; confirm no crash on missing GPU info.
## Core Components {#core-components}
This section defines the daemon’s key subsystems and how they cooperate to satisfy the per-repo, local-first constraints. Components are limited to the PDR-approved stack (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec, Tree-sitter); no new surfaces or services are added.
### Config and State Manager
- Responsibilities: Parse/validate `~/.docdex/config.toml`; ensure RW on `global_state_dir`; materialize defaults when missing; expose typed config to all services; enforce localhost bind unless `--expose` with token.
- State layout: Creates/validates `state/repos/<fingerprint>/{index/,libs_index/,memory.db,symbols.db,dag.db,impact_graph.json}` and shared caches `cache/web`, `cache/libs/<ecosystem>/<pkg>/`, `locks/` for browser/process guards.
- Hardware awareness: On startup and `llm-list`, detect RAM/VRAM to suggest models (`phi3.5:3.8b` default; heavier only if hardware allows).
- Data contract: Provides immutable config snapshot to consumers; emits normalized repo fingerprint function.
### Repo Manager
- Responsibilities: Map normalized repo paths → SHA256 fingerprints; lazy init per-repo state; prevent cross-repo contamination. Singleton daemons apply an LRU watcher lifecycle (stop watchers after ~2h idle, hibernate after ~24h); per-repo daemons skip LRU.
- Interactions: Called by CLI/HTTP/MCP entrypoints to resolve repo context before any operation; hands back handles to Tantivy indexes, sqlite DBs, and libs index.
### Indexing and Search
- Local index: Tantivy BM25 over repo source; optional symbol extraction (Tree-sitter) and libs index treated as Tier-1.
- Ignore rules: `.docdexignore` (first-party) and `.gitignore` are honored by the indexer and file watcher to skip unwanted files/dirs.
- Operations: `docdexd index --repo` builds/updates source index; search invoked by chat/RAG and Waterfall Tier 1\. Token budgeting favors Memory \> Repo \> Library/Web.
- Data contract: Query returns ranked hits with path, snippet, score; exposes top score for Waterfall gate comparison against `web_trigger_threshold`.
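An illustrative shape for this data contract, with a helper that surfaces the top score for the waterfall gate; field and function names are hypothetical.

```rust
#[derive(Debug, Clone)]
struct SearchHit {
    path: String,
    snippet: String,
    score: f32, // BM25 (optionally reranked) relevance score
}

/// Returns the best local score, which the orchestrator compares against `web_trigger_threshold`.
fn top_score(hits: &[SearchHit]) -> Option<f32> {
    hits.iter().map(|h| h.score).fold(None, |acc, s| match acc {
        Some(best) if best >= s => Some(best),
        _ => Some(s),
    })
}

fn main() {
    let hits = vec![
        SearchHit { path: "src/lib.rs".into(), snippet: "fn index(...)".into(), score: 0.81 },
        SearchHit { path: "docs/README.md".into(), snippet: "indexing".into(), score: 0.42 },
    ];
    println!("best hit: {} -> {}", hits[0].path, hits[0].snippet);
    assert_eq!(top_score(&hits), Some(0.81));
}
```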
### Waterfall Orchestrator
- Logic: Tier 1 local search; if top score \< `web_trigger_threshold` or forced, escalate to Tier 2 web (DDG discovery → Chrome fetch → readability → cache → ingest) and then Tier 3 cognition (LLM/memory). Cached library docs participate as local.
- Sequence (textual): Receive query with repo id → run local search → if below threshold, invoke DiscoveryService (rate-limited) → fetch pages with guarded Chrome → clean/cache → ingest relevant text into context → assemble prompt with token budget → stream via Ollama.
- Guardrails: Enforces scrape delays (≥2s DDG, ≥1s fetch/domain), page timeout 15s, browser lifecycle guard, and context priority order.
### Web Discovery and Scraping
- DiscoveryService: DuckDuckGo HTML only; blocklist support; respects 2s minimum between queries; caches results under `cache/web`.
- ScraperEngine: Headless Chrome (readability extraction), headless by default; guarded via `locks/`; timeouts per config; zero zombie processes requirement.
- Outputs: Cleaned HTML/JSON cached globally; per-repo ingestion handled by orchestrator.
### LLM and Embeddings
- Provider: configurable (Ollama default); streaming responses; token budgeting enforced pre-call; models filtered by hardware guidance.
- Embeddings: Ollama embeddings used for memory and rerank where applicable; max answer tokens from config.
- Interfaces: CLI chat, HTTP `/v1/chat/completions`, MCP tools all go through the same LLM gateway; HTTP defaults to the daemon repo and validates any provided repo id.
### Memory and Reasoning DAG
- Memory: Per-repo `memory.db` (sqlite-vec) with tables `memories` (id UUID, content TEXT, embedding BLOB, created_at INT, metadata JSON), `memory_vec` (vec0 embeddings), and `memory_meta` (key/value embedding_dim, schema_version); ops `memory_store`, `memory_recall` scoped by repo; prioritized in context merge.
- DAG: Per-repo `dag.db`; node types UserRequest/Thought/ToolCall/Observation/Decision; logging per session; `dag view --repo <path> <session_id>` renders text/DOT.
- Isolation: No cross-repo memory or DAG queries; per-repo daemons close handles on shutdown.
**Open Questions & Risks**
- How should rate and resource limits be coordinated across multiple per-repo daemons running concurrently? Not specified in the PDR.
- Exact limits for concurrent Chrome instances and fetch queue sizing are unstated; risk of overuse on low-end machines.
- Rerank presence/algorithm for local search optional in PDR; decision needed.
- Token budgeting percentages fixed in PDR; need confirmation on adaptability per model/context size.
**Verification Strategy**
- `docdexd check` validates config, state perms, Ollama reachability/models, Chrome availability, repo registry, HTTP bind, and MCP binary readiness.
- Repo Manager tests for isolation under concurrent access across multiple per-repo daemons.
- Waterfall tests: force low-confidence path to verify DDG spacing, fetch delays, cache use, and Chrome guard; ensure local-only when above threshold.
- Memory tests: store/recall per repo; ensure no cross-repo leakage; embedding flow via Ollama.
- DAG tests: log and view sessions across node types; ensure per-repo separation.
- Index/search tests: p95 latency targets (\<50ms), correct scoring exposure for threshold gating; symbol extraction present where supported.
### Config and State Manager
The config/state layer provides typed configuration, read/write validation, and a deterministic state layout that other subsystems rely on for per-repo isolation across per-repo daemons.
- **Intent**: Provide a single source of truth for daemon/runtime configuration and a predictable per-repo/global state directory tree with enforced read/write guarantees and auto-creation of sane defaults.
- **Config location & shape**: `~/.docdex/config.toml` auto-created on first run with localhost defaults. Sections per PDR: `[core] global_state_dir, log_level, max_concurrent_fetches`; `[llm] provider=<name> (default ollama), base_url, default_model, embedding_model, max_answer_tokens`; `[search] web_trigger_threshold, max_repo_hits, max_web_hits`; `[web] discovery_provider=duckduckgo_html, user_agent, ddg_base_url, ddg_proxy_base_url, min_spacing_ms, cache_ttl_secs, blocklist`; `[web.scraper] engine, headless, chrome_binary_path, auto_install, browser_kind, request_delay_ms, page_load_timeout_secs`; `[memory] enabled=true, backend=sqlite`; `[server] http_bind_addr=127.0.0.1:3210, enable_mcp=true`. Typed parsing with defaults; warn on unknown providers. The npm installer may update `http_bind_addr` during auto-port selection. A typed-config sketch appears after this list.
- **Env override**: `DOCDEX_WEB_BLOCKLIST=example.com,docs.example.org` sets the web discovery blocklist as a comma-separated list of domain suffixes.
- **Env override**: `DOCDEX_WEB_MIN_SPACING_MS` (DDG spacing, min 2000ms) and `DOCDEX_WEB_REQUEST_DELAY_MS` (per-domain fetch delay, min 1000ms).
- **Env override**: `DOCDEX_DDG_BASE_URL` to override the DuckDuckGo discovery endpoint (default `https://html.duckduckgo.com/html/`).
- **Env override**: `DOCDEX_DDG_PROXY_BASE_URL` to set an optional proxy fallback for DDG discovery (used when the primary endpoint returns anomaly/blocked pages).
- **State root & layout**: `~/.docdex/state/` with enforced creation/validation:
- `repos/<fingerprint>/index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`
- `cache/web/` (raw HTML \+ cleaned JSON), `cache/libs/<ecosystem>/<pkg>/`
- `locks/` for browser/process guards
- `logs/` optional daemon logs when enabled
- Repo fingerprint \= SHA256 of normalized repo path; all per-repo paths must use this key to prevent cross-contamination.
- **Responsibilities**:
- Validate RW on `global_state_dir` at startup and before per-repo init.
- Create missing config with defaults; on missing state subdirs/DBs for a repo, lazily initialize via Repo Manager.
- Expose normalized, typed config/state handles to: Repo Manager (per-repo paths), Waterfall (web caches), Memory/DAG/Index subsystems, and Chrome guard (locks path).
- Hardware awareness surfaced to LLM config recommender: detect RAM/VRAM and suggest model tiers (`ultra-light` \<8GB, default `phi3.5:3.8b` ≥16GB, `llama3.1:70b` with GPU ≥32GB).
- **Interactions (textual diagram)**:
- On daemon start: Config Loader → parse/validate `config.toml` (defaults) → State Manager → validate/create `global_state_dir`, `cache/*`, `locks/`.
- Per repo access: Repo Manager → fingerprint(repo\_path) → State Manager → ensure `repos/<fp>/*` exist → return handles/paths to Indexer, Memory, DAG, Symbols.
- Check command: `docdexd check` orchestrates Config Loader \+ State Manager validation \+ Ollama/Chrome reachability.
- **Scalability/Reliability**: Bounded by `max_concurrent_fetches` and per-repo resources; state layout supports multiple per-repo daemons without cross-contamination. RW validation prevents partial init; locks directory guards browser lifecycle.
- **Security/Isolation**: Enforce localhost defaults; state paths scoped by fingerprint to prevent cross-repo bleed. No telemetry. Token auth handled at server layer; config/state manager just supplies bind info.
- **Observability**: Log config warnings (unknown provider), RW failures, and auto-create events. Additional metrics not requested in PDR.
- **DevOps**: Persistence across upgrades; Playwright auto-installs a managed Chromium build on macOS/Windows/Linux when no browser is detected (opt-out supported), with system browsers as fallback. No cloud dependencies.
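To make the typed-parsing idea concrete, a hedged sketch of a subset of the config surface using `serde` + `toml`. Key names mirror the sections listed above; the struct layout and any non-PDR defaults (e.g., `max_concurrent_fetches = 4`, hit limits) are illustrative only.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Config {
    #[serde(default)]
    core: Core,
    #[serde(default)]
    search: Search,
    #[serde(default)]
    server: Server,
}

#[derive(Debug, Deserialize)]
#[serde(default)]
struct Core {
    global_state_dir: String,
    log_level: String,
    max_concurrent_fetches: u32,
}

impl Default for Core {
    fn default() -> Self {
        Self {
            global_state_dir: "~/.docdex/state".into(),
            log_level: "info".into(),
            max_concurrent_fetches: 4, // illustrative, not from the PDR
        }
    }
}

#[derive(Debug, Deserialize)]
#[serde(default)]
struct Search {
    web_trigger_threshold: f32,
    max_repo_hits: u32,
    max_web_hits: u32,
}

impl Default for Search {
    fn default() -> Self {
        // 0.7 is the documented default threshold; hit limits here are placeholders.
        Self { web_trigger_threshold: 0.7, max_repo_hits: 10, max_web_hits: 5 }
    }
}

#[derive(Debug, Deserialize)]
#[serde(default)]
struct Server {
    http_bind_addr: String,
    enable_mcp: bool,
}

impl Default for Server {
    fn default() -> Self {
        Self { http_bind_addr: "127.0.0.1:3210".into(), enable_mcp: true }
    }
}

fn main() {
    // Missing sections fall back to defaults; unknown-provider warnings would be
    // emitted by a higher validation layer.
    let cfg: Config = toml::from_str("[search]\nweb_trigger_threshold = 0.8\n").unwrap();
    println!("{cfg:?}");
}
```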
**Open Questions & Risks**
- Should config validation hard-fail on unknown keys or only warn? (PDR silent)
- Behavior when `global_state_dir` is moved or lacks perms after init—migrate vs. fail?
- Fingerprint collisions are improbable but not addressed; assume SHA256 sufficient.
- Chrome/browser path defaults on diverse OSes—multi-browser fallbacks and Playwright auto-install need to remain deterministic.
**Verification Strategy**
- `docdexd check`: parse config, verify RW on `global_state_dir`, presence/creation of required subdirs, validate provider-specific reachability (Ollama default), test Chrome availability.
- Unit/integration: fingerprint normalization tests; per-repo init creates expected layout; locks directory remains clean after guarded Chrome usage.
- Negative tests: fail on non-writable state dir; clear error on missing repo index/state when accessed.
### Repo Manager
The Repo Manager maintains a registry of normalized repo paths mapped to SHA256 fingerprints and lazily initializes per-repo state. Per-repo daemons do not require max-open-repos LRU eviction.
**Scope & Intent**
- Responsibilities: path normalization → fingerprinting; per-repo state directory creation; lazy init of handles for indexes/DBs; prevention of cross-repo contamination. Everything else (chat, search orchestration, memory ops, web tiers) depends on it but is out-of-scope here.
- Exclusions: no cross-repo memory/index sharing; no additional surfaces beyond those already defined (CLI/HTTP/MCP).
**Core Functions**
- Path normalization and fingerprinting: compute SHA256 over normalized repo path; fingerprint used for all state paths under `~/.docdex/state/repos/<fingerprint>/`.
- Lazy initialization: on first access, create/validate per-repo dirs and handles for `index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`; RW checks on `global_state_dir` before use.
- Registry & lookup: map from normalized path (and optionally repo id) to fingerprint and live handles; CLI/MCP callers must supply repo id/path, HTTP defaults to the daemon repo (a registry sketch follows this list).
- Max-open-repos: not required for per-repo daemons; reserved if multi-repo mode is reintroduced.
- Isolation: no cross-repo data mixing; shared caches (`cache/web`, `cache/libs`) are ingested per repo but never cross-read directly.
- Lifecycle integration: daemon startup validates repo registry readiness; CLI/API/MCP operations fail with clear error on unknown/unindexed repo.
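A sketch of the registry and lazy-init contract under the stated assumptions. The handle type, method names, and the placeholder fingerprint are illustrative; only the normalize → fingerprint → lazily create state dirs → cache-the-handle behavior comes from this section.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

struct RepoHandle {
    fingerprint: String,
    state_dir: PathBuf, // .../state/repos/<fingerprint>/
}

struct RepoManager {
    state_root: PathBuf,
    registry: HashMap<String, RepoHandle>, // fingerprint -> live handle
}

impl RepoManager {
    fn open(&mut self, repo_path: &Path) -> std::io::Result<&RepoHandle> {
        let fingerprint = fingerprint_of(repo_path)?;
        if !self.registry.contains_key(&fingerprint) {
            let state_dir = self.state_root.join("repos").join(&fingerprint);
            // Lazy init: create index/ and libs_index/ here; memory.db, symbols.db,
            // dag.db, impact_graph.json would be initialized similarly on first use.
            std::fs::create_dir_all(state_dir.join("index"))?;
            std::fs::create_dir_all(state_dir.join("libs_index"))?;
            self.registry.insert(
                fingerprint.clone(),
                RepoHandle { fingerprint: fingerprint.clone(), state_dir },
            );
        }
        Ok(&self.registry[&fingerprint])
    }
}

fn fingerprint_of(repo_path: &Path) -> std::io::Result<String> {
    // Placeholder only; the real derivation is SHA256 of the normalized path (see 1.3).
    Ok(format!("{}", repo_path.display()).replace(|c: char| c == '/' || c == '\\', "_"))
}

fn main() -> std::io::Result<()> {
    let mut mgr = RepoManager {
        state_root: PathBuf::from("/tmp/docdex-state"),
        registry: HashMap::new(),
    };
    let handle = mgr.open(Path::new("."))?;
    println!("{} -> {}", handle.fingerprint, handle.state_dir.display());
    Ok(())
}
```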
**Interactions & Data Contracts**
- Inputs: normalized repo path or repo\_id.
- Outputs: repo context handles (index, libs\_index, memory, symbols, dag) bound to fingerprinted state paths.
- Errors: clear signals for missing repo, missing index, over-capacity eviction, permission issues, or fingerprint mismatch.
**Scalability & Reliability**
- Bound resource use per repo; target ≥8 concurrent repos via multiple per-repo daemons.
- Safeguards against cross-contamination by scoping all paths under fingerprinted dirs.
- Startup checks (per PDR `docdexd check`) validate RW on state dir and registry.
**Security & Privacy**
- No telemetry; operates under local state; relies on daemon-level binding/token policies (defined elsewhere). Enforces repo scoping on all operations; rejects requests without repo selection.
**Observability & DevOps**
- Not explicitly requested in PDR; minimum: log repo open events and errors to aid diagnosing permission issues.
**Assumptions**
- Fingerprint is deterministic SHA256 over normalized absolute path; no secondary IDs needed.
- Per-repo daemons keep on-disk state; no eviction within a single repo context.
- Callers are responsible for ensuring repos are indexed before use; Repo Manager only manages lifecycle/handles.
**Open Questions & Risks**
- How are repo deletions handled (on-disk cleanup vs. orphaned state)?
- Risk: stale on-disk state if repos move without reindexing; define cleanup guidance.
**Verification Strategy**
- Unit tests: path normalization → fingerprint determinism; registry lookup; lazy init idempotence.
- Integration tests: concurrent per-repo daemon access shows no cross-contamination; state paths stay under fingerprinted dirs.
- CLI/daemon check: `docdexd check` validates RW perms and reports registry readiness.
### Indexing and Search
Architectural intent: deliver fast, repo-scoped retrieval that stays local-first, supports per-repo isolation, and feeds downstream chat/memory/DAG flows. Per PDR, indexing covers repo source, cached library docs, symbols, and impact graph metadata; search uses Tantivy BM25 with optional local rerank and respects waterfall gating to web only on low confidence.
Components and flows
- Repo Manager: lazily initializes per-repo `state/repos/<fingerprint>/index/` (source), `libs_index/`, `symbols.db`, `dag.db`, `impact_graph.json`.
- Tantivy source index (per repo): indexes files with BM25; scope limited to selected repo fingerprint. Out of scope: cross-repo search.
- Libraries index (per repo): ingests cached library docs (Phase 2.1) into `libs_index` so library answers count as Tier 1 local context.
- Query path (Tier 1): `docdexd chat --repo` and `/v1/chat/completions` call local BM25 search across source \+ libs index; optional local rerank (model unspecified in PDR—TBD). Waterfall escalation only if top score \< `web_trigger_threshold` or forced.
- Symbol extraction (Phase 6): Tree-sitter during `index` populates `symbols.db` with name/kind/file/lines/signature to support code intelligence and impact graph.
- Impact graph (Phase 6): dependency edges captured during indexing; served via `GET /v1/graph/impact?file=<path>` (repo id optional for per-repo daemon), returning schema-tagged inbound/outbound deps with explicit edge direction semantics.
Data contracts (as implied)
- Per-repo state layout: `state/repos/<fingerprint>/index/` (Tantivy), `libs_index/`, `symbols.db`, `dag.db`, `impact_graph.json`, `memory.db`.
- Impact API response: directed deps; exact schema not detailed in PDR (open question).
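Since the exact schema is an open question, the following is purely illustrative: it encodes only what this section does state (schema-tagged, directed inbound/outbound deps keyed by the queried file), reusing the `docdex.impact_graph` tag mentioned later in this document.

```rust
use serde::Serialize;

#[derive(Serialize)]
struct ImpactResponse {
    schema: String,       // e.g. "docdex.impact_graph" plus version metadata
    file: String,         // the queried path
    inbound: Vec<String>, // files that depend on `file`
    outbound: Vec<String>, // files that `file` depends on
}

fn main() {
    let resp = ImpactResponse {
        schema: "docdex.impact_graph/v2".into(), // hypothetical tag format
        file: "src/indexer.rs".into(),
        inbound: vec!["src/daemon.rs".into()],
        outbound: vec!["src/symbols.rs".into()],
    };
    println!("{}", serde_json::to_string_pretty(&resp).unwrap());
}
```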
Scalability, reliability, security, observability
- Performance targets: local search p95 \< 50ms, typical \< 20ms; indexing \< 1GB memory.
- Reliability: clear errors for missing repo/index. Browser guard and rate limits belong to web tier (not primary here).
- Security: repo scoping mandatory; no cross-repo data bleed; localhost bind unless `--expose` with token.
- Observability: not requested in PDR for this section.
Assumptions
- Optional rerank is local and uses available Ollama model; model choice not fixed.
- Impact graph edge extraction happens during indexing; no separate job runner.
Open Questions & Risks
- Impact API schema specifics and pagination/limits.
- Rerank model choice and enable/disable flag default.
- Handling of large binaries or generated files in Tantivy index (inclusion/exclusion policy).
- Consistency when a repo is modified mid-query (do we fail fast or retry?).
Verification Strategy
- `docdexd index --repo` builds Tantivy index and symbols without errors; measure memory bound (\<1GB).
- Local search latency benchmarks hit p95 \< 50ms under concurrent per-repo daemon load.
- Isolation tests: queries never return content from other repos.
- Waterfall gate: assert web escalation only when score \< `web_trigger_threshold` or forced flag.
- Impact API returns correct inbound/outbound deps for known fixtures.
### Waterfall Orchestrator
The Waterfall Orchestrator routes each query through a tiered pipeline (local → web → cognition) based on confidence, assembling context within a fixed token budget while honoring per-repo isolation and per-repo daemon constraints.
**Scope & Intent**
- Enforce local-first retrieval with gated escalation to web and cognition when local confidence \< `web_trigger_threshold` (default 0.7) or when explicitly forced.
- Keep repo isolation: all retrievals and caches are repo-scoped via Repo Manager fingerprints; no cross-repo bleed.
- Assemble context with fixed priority and budgeting: Memory \> Repo Code \> Library/Web, \~10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer.
**Flow (textual sequence)**
1) Receive query (CLI/HTTP/MCP) with repo selection; CLI uses `--repo`, MCP tools require `project_root`/`repo_path` unless `initialize` sets a default, HTTP defaults to the daemon repo and validates any provided repo id. Resolve RepoContext (paths, indexes, caches) and token budget.
2) Tier 1 Local: query Tantivy source index \+ repo `libs_index`; optional local rerank. If top score ≥ threshold, proceed to prompt assembly.
3) Gate: if score \< threshold or forced web, proceed to Tier 2\.
4) Tier 2 Web: DiscoveryService (DuckDuckGo HTML, ≥2s between searches) → ScraperEngine (headless Chrome, readability, ≥1s/domain). Cache raw/cleaned under `cache/web`; ingest snippets per repo.
5) Context merge: prioritize memory snippets, then repo code, then libs/web; drop lowest-priority content first on overflow.
6) Tier 3 Cognition: local Ollama for chat/embeddings; stream response; log DAG nodes if Phase 4+ enabled.
7) Return response; no cross-repo eviction within a per-repo daemon.
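A compact routing skeleton for the sequence above, with the tiers stubbed as closures. Only the ordering (local search → gate → web enrichment → merge → local LLM) and the 0.7 default threshold come from this section; every name in the sketch is invented for illustration.

```rust
struct Snippet {
    text: String,
    score: f32,
}

fn answer(
    query: &str,
    force_web: bool,
    threshold: f32,
    local_search: impl Fn(&str) -> Vec<Snippet>,
    web_enrich: impl Fn(&str) -> Vec<Snippet>,
    llm_stream: impl Fn(&[Snippet], &str) -> String,
) -> String {
    // Tier 1: repo index + libs_index (the caller has already resolved the RepoContext).
    let mut context = local_search(query);
    let top = context.iter().map(|s| s.score).fold(f32::MIN, f32::max);

    // Gate: escalate only on low confidence or an explicit force flag.
    if force_web || context.is_empty() || top < threshold {
        // Tier 2: rate-limited DDG discovery + guarded Chrome fetch, cached then ingested per repo.
        context.extend(web_enrich(query));
    }

    // Token budgeting (Memory -> Repo -> Library/Web) would prune `context` here,
    // then Tier 3 streams the answer through the local model.
    llm_stream(&context, query)
}

fn main() {
    let out = answer(
        "how is the impact graph built?",
        false,
        0.7,
        |_q| vec![Snippet { text: "local hit".into(), score: 0.55 }],
        |_q| vec![Snippet { text: "web snippet".into(), score: 0.0 }],
        |ctx, q| {
            let sources: Vec<&str> = ctx.iter().map(|s| s.text.as_str()).collect();
            format!("answer to '{q}' grounded in: {}", sources.join(", "))
        },
    );
    println!("{out}");
}
```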
**Components & Contracts**
- WaterfallOrchestrator: owns confidence gate, tier routing, and token budgeting.
- RepoContext accessor: supplies per-repo indexes (`index/`, `libs_index`, `memory.db`, `dag.db`, `impact_graph.json`) and cache handles; enforces SHA256 fingerprinting.
- DiscoveryService & ScraperEngine: respect rate limits, cache TTL, Chrome lifecycle guards; return cleaned documents with source metadata for ingestion.
- Memory layer: per-repo sqlite-vec recall, prioritized in assembly.
- Token budgeter: counts tokens pre-Ollama call; emits drop logs when pruning low-priority snippets.
**Scalability & Reliability**
- Scaling via multiple per-repo daemons; ensure handles close cleanly on shutdown.
- Performance target: local search p95 \< 50ms; keep web fetch concurrency bounded (`max_concurrent_fetches`).
- Browser guard prevents zombie Chrome; per-domain rate limits backoff on HTTP errors.
**Security & Privacy**
- Localhost-only by default; `--expose` requires token; HTTP uses daemon repo by default and validates any provided repo id.
- No paid/cloud APIs; web only on threshold drop/explicit request; cached data stored locally.
**Observability & DevOps**
- Logs: gate decisions (scores, threshold), tier chosen, snippets dropped due to budget, web rate-limit/backoff events, Chrome lifecycle.
- `docdexd check` validates Ollama, Chrome, config, repo registry before serving.
**Assumptions**
- Threshold comparator uses top-scoring local hit; rerank (if present) happens before gate.
- Web cache ingestion remains repo-scoped even though cache is global.
**Open Questions & Risks**
- How to tune `web_trigger_threshold` per repo or workload without regression?
- What is the fallback if discovery repeatedly fails (e.g., DDG blocked)? Backoff strategy only?
- Potential latency spikes if Chrome cold-starts under load; need warm pool?
**Verification Strategy**
- Unit/integration: confidence gate branches (\>=, \< threshold, forced web); token pruning order and logging.
- Performance: local search p95 \< 50ms; web rate-limit compliance (≥2s discovery, ≥1s/domain).
- Isolation: concurrent per-repo daemons show no cross-repo cache/index/memory bleed.
- Reliability: simulate scraper/Chrome failure; ensure graceful degradation and error clarity.
- End-to-end: `web-search`, `web-fetch`, `web-rag` flows, and `/v1/chat/completions` routing respect repo scoping and gating.
### Web Discovery and Scraping
The system provides zero-cost web enrichment as Tier 2 of the retrieval waterfall, using DuckDuckGo HTML for discovery and headless Chrome for content fetch with readability cleanup, all guarded for resource and privacy constraints.
**Components & Responsibilities**
- `DiscoveryService`: issues DuckDuckGo HTML searches with ≥2s delay between queries; applies blocklist; respects global `[web]` config (user agent, cache TTL).
- `ScraperEngine`: headless Chrome (local binary) fetch with readability extraction; enforces request delay ≥1s per domain and page load timeout (\~15s default); guarded lifecycle with locks to prevent zombie processes; runs only when Tier 2 is triggered or explicitly requested (a spacing-enforcement sketch follows this list).
- Cache: global `cache/web/` storing raw HTML and cleaned JSON; reused across repos but ingested per repo as needed.
- Waterfall Orchestrator: triggers discovery/fetch when local confidence \< `web_trigger_threshold` (default 0.7) or on explicit web commands; merges cleaned snippets into context after token budgeting (priority: Memory → Repo → Library/Web).
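A minimal spacing-enforcement sketch for the limits above (≥2s between DDG queries, ≥1s per domain). The real limits live in `[web]`/`[web.scraper]` config; this struct and its method names are illustrative only.

```rust
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

struct Spacing {
    min_gap: Duration,
    last: HashMap<String, Instant>, // keyed by "ddg" or by domain
}

impl Spacing {
    fn new(min_gap_ms: u64) -> Self {
        Spacing { min_gap: Duration::from_millis(min_gap_ms), last: HashMap::new() }
    }

    /// Blocks until at least `min_gap` has passed since the previous call for `key`.
    fn wait_turn(&mut self, key: &str) {
        if let Some(prev) = self.last.get(key) {
            let elapsed = prev.elapsed();
            if elapsed < self.min_gap {
                sleep(self.min_gap - elapsed);
            }
        }
        self.last.insert(key.to_string(), Instant::now());
    }
}

fn main() {
    let mut ddg = Spacing::new(2000);   // >=2s between DuckDuckGo HTML queries
    let mut fetch = Spacing::new(1000); // >=1s per domain for page fetches
    ddg.wait_turn("ddg");
    fetch.wait_turn("docs.rs");
    fetch.wait_turn("docs.rs"); // second hit to the same domain waits ~1s
}
```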
**Interactions & Data Flow**
- CLI: `web-search "<query>"` → DiscoveryService (rate-limited) → return URLs (cached).
- CLI/HTTP orchestration: `web-fetch <url>` or auto-fetch in `web-rag`/`/v1/chat/completions` Tier 2 path → ScraperEngine fetch → readability → cache write → cleaned content merged into context (repo-scoped ingestion).
- Guards: Browser lifecycle uses locks under `state/locks/`; orchestrator caps concurrency via `[core].max_concurrent_fetches` and scraper-specific config.
- Config hooks: `[web.scraper]` selects engine, headless mode, chrome path, request delay, timeouts; `[search]` sets `web_trigger_threshold`, `max_web_hits`.
**Scalability & Reliability**
- Rate limits per domain and per DDG query as specified; backoff on HTTP errors (from PDR risk section).
- Bounded Chrome concurrency prevents resource exhaustion; per-repo daemons do not evict global web cache.
- Idle daemon must avoid zombie Chrome; validated in `docdexd check`.
**Security & Privacy**
- Local-only by default; web access is explicit/gated by confidence; no paid APIs.
- When `--expose`, HTTP/MCP require token auth (inherited); web fetches still use local Chrome and DDG HTML only.
**Observability/DevOps**
- Not explicitly requested in PDR beyond `docdexd check` validating Chrome availability and rate-limit/backoff behavior.
**Assumptions**
- Readability extraction suffices for all supported content types; no PDF/JS rendering beyond Chrome page load.
- Cache TTL from `[web]` applies to both discovery results and cleaned pages unless overridden elsewhere (not specified).
**Open Questions & Risks**
- How to surface/handle DDG blocklist updates and HTTP backoff policy specifics?
- Do we need per-domain concurrency caps beyond global request delay?
- Handling of JS-heavy pages when readability fails; fallback strategy not specified.
- Cache eviction policy/TTL granularity not fully defined.
**Verification Strategy**
- `docdexd check` confirms Chrome availability, scraper guards, and rate-limit readiness.
- Automated tests: rate-limit enforcement (≥2s DDG, ≥1s fetch), cache reuse, waterfall trigger on confidence gate, Chrome lifecycle guard (no zombies).
- Manual tests: `web-search` and `web-fetch` commands; `web-rag` end-to-end with token budgeting and context priority enforcement.
### LLM and Embeddings
The LLM/embedding layer defaults to Ollama for both generation and embeddings, operating locally within the per-repo daemon constraint. It must respect token budgets, stream responses, and stay repo-scoped by construction.
- **Provider/Models**: `[llm] provider` is configurable (default `ollama`) with `base_url`; `default_model` for chat (hardware-guided selection) and `embedding_model` for sqlite-vec memory and search enrichment. No paid APIs by default; warn if provider is unknown or missing required config.
- **Invocation & Surfaces**: One call path shared by CLI/HTTP/MCP. `/v1/chat/completions` (OpenAI-compatible) and `docdexd chat` route to Ollama; HTTP defaults to the daemon repo and validates any provided repo id. MCP tools reuse the same pipe. Streaming responses required.
- **Token Budgeting**: Pre-call budgeting per request: \~10% system prompt, 20% memory (if enabled), 50% repo/library/web context, 20% generation buffer. Drop lowest-priority snippets first (library/web before repo before memory) with logging. Enforce `max_answer_tokens` from config.
- **Context Assembly**: Priority order Memory → Repo code (Tantivy \+ symbols) → Library/web artifacts. Library docs treated as Tier-1 support. Waterfall orchestrator only escalates to web when confidence \< `web_trigger_threshold` or explicitly forced.
- **Repo Isolation**: CLI/MCP calls require repo id/path; HTTP defaults to the daemon repo and validates any provided repo id. Embeddings and memory stored per-repo (`state/repos/<fingerprint>/memory.db`), no cross-repo bleed. Unknown/unindexed repo returns clear error.
- **Embeddings**: Ollama embedding model only; used for memory\_store/recall, local rerank (optional), and any vector similarity in sqlite-vec (an embedding-call sketch follows this list). No external vector DB.
- **Hardware Awareness**: `llm-list` detects RAM/VRAM and recommends models (e.g., `phi3.5:3.8b` by default, `llama3.1:70b` if resources allow and the model is installed; ultra-light if \<8GB RAM). `llm-setup` ensures `ollama` is in PATH and guides pulls; npm postinstall may prompt and installs only on explicit confirmation.
- **Reliability & Limits**: Streaming must tolerate backpressure; apply timeouts/retries aligned with daemon defaults. Ensure daemon startup (`check`) validates Ollama reachability/models and budget configuration. Token overflow mitigated by pruning per priorities above.
- **Security/Privacy**: Local-only by default (bind 127.0.0.1); when `--expose`, require auth token on HTTP/MCP. No telemetry; prompts and inference stay local.
- **Observability**: Log model used, token budget decisions, truncation events, and repo id; avoid logging sensitive prompt content. Additional metrics not requested in PDR.
- **DevOps**: No external dependencies beyond Ollama binary/models. Binaries distributed via npm wrapper; preserve state layout. Chrome/browser not part of LLM path (only web tier).
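A hedged sketch of a local embedding call. It assumes Ollama's HTTP API exposes an `/api/embeddings` endpoint taking `model` and `prompt` and returning an `embedding` array; verify against the Ollama version in use, and substitute the configured `[llm].embedding_model` (the model name below is a placeholder). Requires `reqwest` (with the `blocking` and `json` features) and `serde_json`.

```rust
use serde_json::{json, Value};

fn embed(base_url: &str, model: &str, text: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let resp: Value = reqwest::blocking::Client::new()
        .post(format!("{base_url}/api/embeddings"))
        .json(&json!({ "model": model, "prompt": text }))
        .send()?
        .json()?;
    let vector = resp["embedding"]
        .as_array()
        .ok_or("missing embedding field")?
        .iter()
        .filter_map(|v| v.as_f64().map(|f| f as f32))
        .collect();
    Ok(vector)
}

fn main() {
    // Placeholder model name; use the configured [llm].embedding_model in practice.
    match embed("http://127.0.0.1:11434", "nomic-embed-text", "hello docdex") {
        Ok(v) => println!("embedding dims: {}", v.len()),
        Err(e) => eprintln!("embedding call failed: {e}"),
    }
}
```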
**Open Questions & Risks**
- Clarify default `embedding_model` value and size guidance per hardware tier.
- Define concrete timeout/retry policy for Ollama streaming under load.
- Confirm whether rerank uses embeddings and how to toggle it per repo/config.
- Risk: large models pulled without hardware fit; mitigation via `llm-list` gating and warnings.
- Risk: token budgeting misconfiguration leading to truncation of high-priority context; need guardrails and logs.
**Verification Strategy**
- `docdexd check` validates provider reachability, required models present, and token budget config sane.
- Unit/integration tests: enforce repo scoping; assert memory isolation across per-repo daemons.
- Budgeting tests: construct oversized contexts and verify priority-based truncation and logging.
- Streaming tests: ensure chunked output end-to-end via CLI and `/v1/chat/completions`.
- Hardware-guidance tests: simulate RAM/VRAM tiers and assert model recommendations/warnings.
### Memory and Reasoning DAG
This section defines per-repo long-term memory (sqlite-vec) and reasoning DAG logging, aligned to per-repo daemon isolation. Both live under `~/.docdex/state/repos/<fingerprint>/` and are always scoped by repo selection.
**Architectural intent**
- Provide repo-scoped recall to prioritize grounded answers (memory precedes code/library/web context).
- Capture reasoning sessions as DAGs for auditability and tooling (CLI/MCP/dashboard).
- Preserve local-first, zero-cost operation using sqlite-vec and sqlite for DAG logging.
**Components and data**
- `memory.db` (sqlite-vec): tables `memories` (id UUID, content TEXT, embedding BLOB, created_at INT, metadata JSON), `memory_vec` (vec0 embedding table), and `memory_meta` (embedding_dim, schema_version key/value). Ollama embeddings only (a DDL sketch follows this list).
- `dag.db` (sqlite): node types `UserRequest`, `Thought`, `ToolCall`, `Observation`, `Decision`; session-scoped logging.
- Repo Manager: ensures per-repo initialization/closing without cross-repo access.
- Embedding/model config: uses `[llm]` `embedding_model` via Ollama; no external vector DB.
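A DDL sketch matching the tables listed above. Loading the sqlite-vec extension (needed for the `vec0` virtual table) is assumed to happen before this runs and is omitted; the embedding dimension and column types are illustrative. Requires `rusqlite`.

```rust
use rusqlite::Connection;

fn init_memory_db(conn: &Connection, embedding_dim: usize) -> rusqlite::Result<()> {
    conn.execute_batch(&format!(
        "
        CREATE TABLE IF NOT EXISTS memories (
            id         TEXT PRIMARY KEY,      -- UUID
            content    TEXT NOT NULL,
            embedding  BLOB,
            created_at INTEGER NOT NULL,
            metadata   TEXT                   -- JSON
        );
        CREATE TABLE IF NOT EXISTS memory_meta (
            key   TEXT PRIMARY KEY,           -- embedding_dim, schema_version
            value TEXT NOT NULL
        );
        -- Requires the sqlite-vec extension to be loaded into this connection.
        CREATE VIRTUAL TABLE IF NOT EXISTS memory_vec USING vec0(embedding float[{embedding_dim}]);
        "
    ))
}

fn init_dag_db(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        "
        CREATE TABLE IF NOT EXISTS nodes (
            id         TEXT PRIMARY KEY,
            session_id TEXT NOT NULL,
            type       TEXT NOT NULL CHECK (type IN
                ('UserRequest','Thought','ToolCall','Observation','Decision')),
            payload    TEXT,                  -- JSON
            created_at INTEGER NOT NULL
        );
        ",
    )
}

fn main() -> rusqlite::Result<()> {
    let mem = Connection::open_in_memory()?;
    // The vec0 statement will fail here unless sqlite-vec is loaded; shown for shape only.
    let _ = init_memory_db(&mem, 768);
    let dag = Connection::open_in_memory()?;
    init_dag_db(&dag)?;
    Ok(())
}
```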
**Interactions**
- Memory store: `memory_store(text, metadata, repo)` computes embedding via Ollama, inserts into `memory.db`.
- Memory recall: `memory_recall(query, repo)` embeds query, performs sqlite-vec similarity, feeds top hits into context assembly with highest priority budget slice.
- DAG logging: each chat/waterfall session creates a session id; nodes appended with timestamps and minimal metadata (tool name, repo path/id). No cross-session edges.
- Viewing: `docdexd dag view --repo <path> <session_id>` renders text or DOT from `dag.db`.
- Context assembly order: Memory → Repo index → Library/Web; token budget enforced before LLM call.
**Scalability and reliability**
- sqlite-vec per repo; no cross-repo queries.
- Embedding calls stay local; no paid/network calls. Performance target aligns with p95 \< 50ms local search; memory recall should stay within that budget (assumes moderate memory set).
- DAG writes are lightweight appends; expected to remain small per session.
**Security and privacy**
- Local-only storage; bound to repo fingerprinted path. No telemetry. Token auth only when HTTP/MCP exposed; repo id/path required on every call.
**Observability**
- Log memory store/recall events (repo, counts, latency) and DAG session creation/render calls. No additional metrics beyond standard logging requested.
**DevOps**
- Managed by daemon lifecycle; `docdexd check` validates sqlite RW and Ollama reachability. No migration tooling specified; schema assumed stable per phase.
**Assumptions**
- Memory size per repo is modest; no sharding/compaction requirements defined.
- DOT rendering is generated on demand from stored nodes; no precomputed layouts.
- Embedding model availability is ensured by `[llm]` config and `llm-setup`.
**Open Questions & Risks**
- Memory growth limits/compaction strategy not defined; potential bloat over long use.
- Concurrency semantics for memory insert/recall under multi-client access not specified.
- DAG node schema may need expansion (costs, token counts); change protocol unclear.
- Performance expectations for large memory tables relative to p95 targets need validation.
**Verification Strategy**
- Automated tests: memory\_store/recall correctness per repo; isolation (no cross-repo results).
- CLI: `docdexd dag view --repo <path> <session_id>` renders expected nodes; memory recall returns stored items.
- `docdexd check` confirms `memory.db`/`dag.db` RW and Ollama embedding availability.
- Performance checks: recall latency within local search budget on representative dataset.
## Data Management and Storage {#data-management-and-storage}
Architectural intent: enforce per-repo isolation while sharing global caches, keeping all data local-by-default; guarantee deterministic layout keyed by repo fingerprint so multiple per-repo daemons can serve multiple repos without cross-contamination, while supporting fast search, memory, symbols, and DAG logging.
### Directory and Fingerprint Layout
- Repo fingerprint: SHA256 of normalized repo path; all per-repo paths nested under `~/.docdex/state/repos/<fingerprint>/`.
- Per-repo subdirs/files: `index/` (Tantivy source), `libs_index/` (ingested library docs), `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`.
- Global/shared: `~/.docdex/state/cache/web/` (raw HTML \+ cleaned JSON), `cache/libs/<ecosystem>/<pkg>/` (fetched docs), `locks/` (browser/process guards).
- Repo Manager duties: lazily create per-repo dirs on first touch; enforce RW checks before use.
### Schemas and Indexes
- Source index (Tantivy): BM25 primary; fields include path, content, lang, offsets; optional local rerank later (still Tantivy-based per PDR).
- Library index: Tantivy index under per-repo `libs_index/`, ingesting documents from global cache; treated as Tier-1 alongside source.
- Memory (`memory.db`): sqlite-vec tables `memories` (id UUID, content TEXT, embedding BLOB, created_at INT, metadata JSON), `memory_vec` (vec0 embedding table), and `memory_meta` (embedding_dim, schema_version key/value); per-repo only, no cross-repo queries.
- Symbols (`symbols.db`): Tree-sitter extraction stored with `name, kind, file_path, line_start, line_end, signature`; supports impact graph.
- DAG (`dag.db`): node types {UserRequest, Thought, ToolCall, Observation, Decision}; logged per session.
- Impact graph (Phase 6): directed edges from imports; served via `GET /v1/graph/impact` scoped by repo fingerprint.
- Data contracts: CLI/MCP require repo id/path; HTTP defaults to the daemon repo and validates any provided repo id. Unknown/unindexed repo returns clear error.
### Caching Strategy
- Web cache: global `cache/web/`; reused across repos; scraper enforces ≥1s per-domain fetch delay and ≥2s DDG discovery gap; guarded lifecycle to avoid zombie Chrome.
- Library cache: global `cache/libs/<ecosystem>/<pkg>/`; ingestion into per-repo `libs_index/` only (no direct reads); ingest sources must be under repo root or `cache/libs` to prevent bleed.
- Waterfall: Tier 1 (repo index \+ per-repo ingested libs) → Tier 2 (web discovery/fetch using cache; ingested per repo) → Tier 3 (memory/DAG context); escalation only when local score below `web_trigger_threshold` or explicitly forced.
- Eviction: not required for per-repo daemons; caches persist until TTL/purge (TTL for web defined in config).
Open Questions & Risks
- Clarify exact Tantivy schema fields and analyzers; PDR leaves flexible.
- Define cache TTL/purge policy for web and library caches (config mentions TTL but not default).
- Concurrency semantics when two repos ingest the same cached library doc—need locking or idempotent writes.
- Impact graph storage format uses `docdex.impact_graph` schema metadata (current v2) with in-memory migration for legacy files; reindex to persist upgrades.
- Risk: fingerprint collisions theoretically possible but negligible with SHA256; document assumption.
Verification Strategy
- `docdexd check` validates RW on `~/.docdex/state`, presence of per-repo dirs, and Chrome/Ollama availability.
- Unit/integration: repo isolation under concurrent access across per-repo daemons; prevent cross-repo reads.
- Rate-limit tests for scraper/discovery honoring delays and cache reuse.
- Schema migrations: initialize/upgrade `memory.db`, `symbols.db`, `dag.db` deterministically and reject cross-repo access by fingerprint; `impact_graph.json` uses schema metadata + migration guards.
- Functional: missing repo/index errors are clear; library ingestion only populates target repo `libs_index`.
### Directory and Fingerprint Layout
Architectural intent: enforce per-repo isolation while enabling shared, zero-cost caches; deterministic fingerprints prevent path ambiguity across per-repo daemons.
- **Fingerprinting**: SHA256 of normalized repo path; required for all per-repo state resolution and repo\_id references. Normalization definition must be consistent across CLI/HTTP/MCP surfaces (assumption: resolved realpath, lowercase on case-insensitive FS; confirm for Windows/WSL).
- **Per-repo state root**: `~/.docdex/state/repos/<fingerprint>/`. Created lazily by Repo Manager after RW validation. Per-repo daemons close DB/index handles on shutdown.
- **Per-repo contents** (all required, no cross-repo mixing):
- `index/` (Tantivy source index)
- `libs_index/` (Tantivy index of ingested library docs)
- `memory.db` (sqlite-vec) for long-term memory
- `symbols.db` (Tree-sitter symbols)
- `dag.db` (reasoning DAG)
- **Per-repo manifest**: `repo_meta.json` at the per-repo state root includes `fingerprint_sha256`, `fingerprint_version`, `canonical_path`, `created_at_epoch_ms`, and `last_seen_at_epoch_ms` to support diagnostics and migrations (a serde sketch follows this list).
- **Shared caches** (global, reused across repos but ingested per repo):
- `~/.docdex/state/cache/web/` for raw HTML \+ cleaned JSON from web fetches
- `~/.docdex/state/cache/libs/<ecosystem>/<pkg>/` for scraped library docs
- **Locks and guards**: `~/.docdex/state/locks/` for browser/process guards; Repo Manager enforces per-repo invariants.
- **Interactions**: Repo Manager maps repo path → fingerprint → per-repo directories; Waterfall uses per-repo index/libs\_index/memory; library/web caches are read-only to non-owner repos until ingested into that repo’s `libs_index`.
- **Security/Isolation**: No cross-repo memory/symbol/DAG access; all per-repo paths scoped by fingerprint; default localhost binding applies to any API that references these paths.
- **Reliability/cleanup**: Close Tantivy/sqlite handles on shutdown; Chrome guard uses `locks/` to avoid zombie processes; directories persist across restarts/upgrades.
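A serde sketch of the manifest; the field names come from the bullet above, while the derive setup and value types are illustrative.

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct RepoMeta {
    fingerprint_sha256: String,
    fingerprint_version: u32,
    canonical_path: String,
    created_at_epoch_ms: u64,
    last_seen_at_epoch_ms: u64,
}

fn main() {
    let meta = RepoMeta {
        fingerprint_sha256: "deadbeef".repeat(8), // placeholder 64-hex-char fingerprint
        fingerprint_version: 1,
        canonical_path: "/home/user/projects/docdex".into(),
        created_at_epoch_ms: 1_700_000_000_000,
        last_seen_at_epoch_ms: 1_700_000_600_000,
    };
    println!("{}", serde_json::to_string_pretty(&meta).unwrap());
}
```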
Open Questions & Risks
- Define exact path normalization rules across platforms (case sensitivity, symlinks, UNC/WSL).
- Policy for cleaning orphaned per-repo directories when repos are unused long-term.
- Concurrency: ensure atomic creation of per-repo directories under parallel `index`/`chat`.
- Cache poisoning risk if web/libs cache lacks integrity checks; consider hashing fetched content.
Verification Strategy
- Unit: fingerprint derivation idempotence across repeated calls and platforms.
- Integration: `docdexd check` validates RW on `state/`, presence/permissions of per-repo subdirs, and lock dir.
- Concurrency tests: parallel repo opens across per-repo daemons ensure handles closed and no cross-repo writes.
- Functional: per-repo isolation validated by querying memory/symbols/DAG for one repo and confirming absence in another; shared cache reuse verified via ingestion logs.
### Schemas and Indexes
Architectural intent: define per-repo and global storage schemas that support low-latency local search, code intelligence, long-term memory, and reasoning traces while preserving strict repo isolation and deterministic data lifecycles.
**Components & Data Contracts**
- `memory.db` (per repo, sqlite-vec): `memories` table plus `memory_vec` (vec0) and `memory_meta` (embedding_dim, schema_version); embeddings from Ollama; queried via vector search; prioritized in context assembly.
- `symbols.db` (per repo): Tree-sitter extracted symbols for Rust/TypeScript/JavaScript/Python/Go/Java/C#/C/C++/PHP/Kotlin/Swift/Ruby/Lua/Dart with columns `{name, kind, file_path, line_start, line_end, signature}`; enables symbol search and impact analysis inputs (a DDL sketch follows this list).
- `dag.db` (per repo): nodes table with `type ENUM(UserRequest|Thought|ToolCall|Observation|Decision)`, `session_id`, `payload JSON`, `created_at`; edges implied by `session_id` \+ ordering (PDR: DAG logging and view).
- `index/` (per repo, Tantivy): source index for repo code; `libs_index/` for ingested library docs; both scoped by repo fingerprint to prevent cross-contamination.
- `cache/web` and `cache/libs` (global read-mostly): raw HTML/cleaned JSON and cached library docs; ingestion into per-repo indexes is explicit.
- Impact graph (per repo, `impact_graph.json` at the repo state root): directed edges derived from imports; `GET /v1/graph/impact` returns schema-tagged inbound/outbound deps keyed by `file` (directed `source -> target`).
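A minimal sketch of the per-repo DDL, assuming `rusqlite`. Table and column names follow the contracts above, but exact types and constraints are assumptions; the sqlite-vec `memory_vec` (vec0) virtual table is omitted because it requires the extension to be loaded first.

```rust
// Illustrative per-repo schema setup; not the actual migration code.
use rusqlite::Connection;

fn init_repo_dbs(state_root: &std::path::Path) -> rusqlite::Result<()> {
    let memory = Connection::open(state_root.join("memory.db"))?;
    memory.execute_batch(
        "CREATE TABLE IF NOT EXISTS memories (
             id INTEGER PRIMARY KEY,
             content TEXT NOT NULL,
             metadata TEXT,               -- JSON blob
             created_at INTEGER NOT NULL  -- epoch ms
         );
         CREATE TABLE IF NOT EXISTS memory_meta (
             embedding_dim INTEGER NOT NULL,
             schema_version INTEGER NOT NULL
         );",
    )?;

    let symbols = Connection::open(state_root.join("symbols.db"))?;
    symbols.execute_batch(
        "CREATE TABLE IF NOT EXISTS symbols (
             name TEXT NOT NULL,
             kind TEXT NOT NULL,
             file_path TEXT NOT NULL,
             line_start INTEGER NOT NULL,
             line_end INTEGER NOT NULL,
             signature TEXT
         );",
    )?;

    let dag = Connection::open(state_root.join("dag.db"))?;
    dag.execute_batch(
        "CREATE TABLE IF NOT EXISTS nodes (
             id INTEGER PRIMARY KEY,
             type TEXT NOT NULL CHECK (type IN
                 ('UserRequest','Thought','ToolCall','Observation','Decision')),
             session_id TEXT NOT NULL,
             payload TEXT NOT NULL,       -- JSON
             created_at INTEGER NOT NULL
         );",
    )?;
    Ok(())
}
```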
**Interactions**
- Repo Manager initializes per-repo `index/, libs_index/, memory.db, symbols.db, dag.db, impact_graph.json` under `state/repos/<fingerprint>/`.
- Waterfall: Tier-1 search hits Tantivy indexes (`index/`, `libs_index/`) and can merge with `memory.db` results; confidence gate controls web fetch/caching.
- Indexing flow: `docdexd index --repo` populates Tantivy and `symbols.db`; dependency extraction feeds impact graph edges.
- Memory ops: `memory_store` writes to `memory.db`; `memory_recall` vector-searches embeddings.
- DAG logging: per session, append node rows; `dag view --repo <path> <session_id>` renders text/DOT from stored nodes.
**Scalability & Reliability**
- Performance targets: local search p95 \< 50ms; use repo-scoped indexes to keep postings bounded.
- Concurrency: multiple per-repo daemons serve multiple repos; schema design keeps no cross-repo tables to avoid locking contention and bleed.
- Caching: global caches are reused while ingestion remains repo-scoped; reusing fetched artifacts avoids redundant fetches and cache stampedes.
**Security & Isolation**
- Repo fingerprinted paths enforce isolation; no cross-repo queries for memory/DAG/symbols/impact.
- When `--expose` is set, token auth is enforced (per PDR); HTTP defaults to the daemon repo while MCP requires a repo id/path.
**Observability & DevOps**
- Not requested in PDR; minimal requirement: `docdexd check` validates DB presence/permissions, Ollama/Chrome availability; logs errors for missing indexes/DBs.
**Assumptions**
- Impact graph edges stored in per-repo `impact_graph.json` under the repo state root; schema matches `docdex.impact_graph` response requirements and carries version metadata. Legacy files are accepted and migrated in-memory; reindex to persist upgrades.
- No cross-repo memory or DAG aggregation is needed.
**Open Questions & Risks**
- Do we need migrations/versioning for `memory.db`, `symbols.db`, `dag.db` as schemas evolve?
- How to handle symbol kinds/signatures across languages uniformly (Tree-sitter node mapping consistency)?
- Resource risk: large repos may push Tantivy index size; need bounds/compaction policy.
**Verification Strategy**
- `docdexd check` confirms per-repo DBs/indexes exist and are writable; validates Ollama reachability and Chrome guard.
- Indexing test: `index --repo` followed by sample search ensures Tantivy and `symbols.db` populated.
- Memory test: `memory_store` then `memory_recall` returns stored content with embedding search functioning.
- DAG test: execute chat/session to generate nodes; `dag view` renders expected sequence/DOT.
- Impact API test: call `GET /v1/graph/impact` on known deps to verify inbound/outbound edge retrieval.
### Caching Strategy
Docdex v2.0 uses caches to avoid re-fetching external sources while keeping per-repo isolation. Global caches store fetched web pages and library docs; ingestion into per-repo indexes remains repo-scoped to prevent bleed.
- **Cache scopes**: Global `cache/web/` holds raw HTML and cleaned JSON from DuckDuckGo discovery \+ headless Chrome fetches. Global `cache/libs/<ecosystem>/<pkg>/` stores scraped library docs keyed by ecosystem/package. No cross-repo memory/DAG caching by design.
- **Reuse model**: Cached web pages and library docs are reused across repos, but ingestion into each repo’s Tantivy `index/` and `libs_index/` is performed per repo; Repo Manager enforces fingerprinted paths.
- **Freshness/TTL**: PDR calls for configurable web cache TTL via `[web] cache_ttl_secs`; reuse is preferred until TTL expiry, then refetch (see the TTL check sketch after this list). Library cache TTL is not specified; assume long-lived unless invalidated manually or on version change (open question).
- **Write paths and guards**: Cache directories live under `~/.docdex/state/cache/...`; Repo Manager must validate RW on startup (`docdexd check`) and ensure no writes occur outside fingerprinted state roots.
- **Waterfall interaction**: Waterfall tiering treats cached library docs as Tier-1 (local) once ingested; cached web content is Tier-2 but may bypass live fetch if cache hit is valid. Confidence gate (`web_trigger_threshold`) still applies.
- **Concurrency and eviction**: Per-repo daemons close only their own per-repo handles on shutdown; cached artifacts persist globally. Cache eviction policy beyond TTL is not specified; the default is unbounded growth within disk limits (risk).
- **Observability**: Log cache hits/misses and TTL expiry decisions; `docdexd check` should report cache directory health. Metrics beyond this are not requested in PDR.
- **Security/privacy**: Caches remain local-only; no upload/telemetry. When `--expose`, token auth protects HTTP/MCP, but caches stay on disk without extra encryption (not requested).
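A small sketch of the TTL decision, standard library only. The entry fields and function names are illustrative, and URL normalization for cache keys remains an open question below.

```rust
// Illustrative freshness check for a web-cache entry, assuming
// `[web] cache_ttl_secs` has already been read from config.
use std::time::{Duration, SystemTime};

struct WebCacheEntry {
    url: String,                          // cache key source (normalization is an open question)
    fetched_at: SystemTime,
    cleaned_json_path: std::path::PathBuf,
}

fn is_fresh(entry: &WebCacheEntry, cache_ttl_secs: u64) -> bool {
    match entry.fetched_at.elapsed() {
        Ok(age) => age < Duration::from_secs(cache_ttl_secs),
        // Clock skew (fetched_at in the future): treat as fresh rather than refetch.
        Err(_) => true,
    }
}

fn should_refetch(entry: Option<&WebCacheEntry>, cache_ttl_secs: u64) -> bool {
    match entry {
        Some(e) if is_fresh(e, cache_ttl_secs) => false, // reuse cached artifact
        _ => true,                                       // miss or expired: fetch again
    }
}
```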
Open Questions & Risks
- Library cache TTL/versioning policy is unspecified; risk of stale docs across package updates.
- Disk growth risk without eviction for `cache/web` and `cache/libs`.
- Cache keying for web fetches: normalize URLs vs. raw; redirects/canonicalization handling not described.
- Cache corruption/rebuild path (e.g., partial files) is not defined.
Verification Strategy
- `docdexd check` validates cache directories exist and are writable; fails fast otherwise.
- Unit/integration tests: cache hit/miss paths for web fetch, TTL expiry triggers refetch, per-repo ingestion uses cached artifacts without cross-repo writes.
- Concurrency tests: simultaneous fetches use the same cache entry without races/corruption (locks/guards).
- Disk-bound tests (if added later): ensure behavior under low-space conditions or large cache growth.
## Interfaces and Integrations {#interfaces-and-integrations}
Docdex v2.0 exposes local-first surfaces (CLI, HTTP, MCP) per repo, served either by per-repo `docdexd serve` daemons or by the singleton `docdexd daemon` that mounts repos dynamically. Integrations stay zero-cost and local by default (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec), with strict scoping to prevent cross-repo contamination.
### CLI Commands
- Surfaces: `check`, `index --repo <path>`, `chat --repo <path> [--query <q>]`, `llm-list`, `llm-setup`, `web-search "<query>"`, `web-fetch <url>`, `web-rag "<q>" --repo`, `libs fetch --repo`, `dag view --repo <path> <session_id>`, `run-tests --repo <path> --target <file|dir>`, `mcp`, `tui`.
- Behavior: every operation requiring repo context mandates `--repo <path>`; unknown/unindexed repo returns a clear error. The waterfall (local → web → cognition) escalates to the web tier when local confidence falls below `web_trigger_threshold` or when web commands are invoked explicitly.
- Token budgeting and streaming: CLI chat/web commands stream Ollama responses; budgets enforce priority (Memory \> Repo \> Lib/Web).
- Exposure: `--expose` (on daemon) requires token auth; otherwise bind is `127.0.0.1`.
### HTTP API
- Endpoints: `POST /v1/chat/completions` (OpenAI-compatible) and `GET /v1/graph/impact?repo_id=<id>&file=<path>` (handler in `src/api/v1/graph.rs`, routed from `src/search/mod.rs`).
- Repo routing: repo provided via body/header/query; missing/unknown repo is an error. Waterfall and token budgeting mirror CLI behavior; responses stream.
- Security: local bind by default; token required when exposed; no telemetry or paid API usage.
### MCP Server
- MCP: shared HTTP/SSE endpoint on the singleton daemon plus legacy stdio `docdexd mcp --repo <path>`; tools require `project_root`/`repo_path` unless `initialize` sets a default.
- Tools: `docdex_search`, `docdex_web_research`, `docdex_memory_save`, `docdex_memory_recall`; errors on unknown/unindexed repo.
- Lifecycle: stdio MCP runs alongside HTTP within each per-repo daemon and serves only that repo; multi-repo access goes through the singleton daemon’s shared HTTP/SSE MCP.
### Local Dependencies
- LLM/embeddings: Provider-configured; models recommended via hardware-aware `llm-list`/`llm-setup`.
- Retrieval: Tantivy for source/libs indexes; sqlite-vec for per-repo memory; Tree-sitter for symbols; headless Chrome (guarded) plus DuckDuckGo HTML for discovery/fetch with rate limits and caching.
- Caching/state: `~/.docdex/state/` per-repo fingerprints; shared caches for web and libs but ingested per repo.
**Open Questions & Risks**
- Need explicit error contract formats for HTTP/MCP when repo is missing or index is stale.
- Clarify auth header/key format for `--expose` mode across CLI/HTTP/MCP clients.
- Risk: Chrome lifecycle/zombie processes impacting MCP/HTTP availability; ensure guard hooks cover daemon crashes.
- Risk: Waterfall thresholds may differ per surface; confirm single source of truth in config.
**Verification Strategy**
- CLI: run `docdexd check`, `index`, `chat`, `web-search/fetch/rag`, `libs fetch`, `dag view`, `run-tests` against a known repo; assert repo-required errors on omission.
- HTTP: call `/v1/chat/completions` and `/v1/graph/impact` with and without repo ids; verify streaming and token budgeting enforcement.
- MCP: invoke each tool with/without valid repo; assert per-repo routing and clear errors.
- Dependency checks: ensure Ollama reachable/models present; Chrome availability and rate-limit enforcement; cache directories writable.
### CLI Commands
Repo-scoped CLI entry points exposed by `docdexd` (daemon) and `docdex` (wrapper). All commands require explicit repo selection where applicable to preserve per-repo isolation and align with per-repo daemon intent.
- **Command Surface & Scope**
- Core readiness: `docdexd check` validates config RW, state layout, repo registry, Ollama reachability/models, headless Chrome availability, HTTP bind, MCP enablement.
- Repo indexing/chat: `index --repo <path>` builds Tantivy \+ symbols \+ dag/lib scaffolding; `chat --repo <path> [--query <q>]` runs Tier-1 local search, optional REPL if no query.
- LLM ops: `llm-list` (hardware-aware recommendations from `llm_list.json`); `llm-setup` (verify `ollama` presence, list/pull models, update `[llm]` config; prompt-based install only).
- Web waterfall: `web-search "<query>"`, `web-fetch <url>`, `web-rag "<question>" --repo <path>` triggering discovery→scrape→cache with rate limits.
- Library docs: `libs fetch --repo <path>` detects deps (Cargo/Node/Python), scrapes docs, caches under `cache/libs`, ingests into repo `libs_index`.
- DAG: `dag view --repo <path> <session_id>` renders text/DOT from `dag.db`.
- Tests: `run-tests --repo <path> --target <file_or_dir>` executes configured test command locally; returns structured JSON.
- MCP/TUI: shared MCP is served by `docdexd daemon` over HTTP/SSE; `mcp` still starts a per-repo stdio MCP server; `tui` shells out to the `docdex-tui` binary (override with `DOCDEX_TUI_BIN`) as a local exception.
- HTTP alignment: CLI routes to daemon HTTP/MCP surfaces; enforces repo id/path on every call (except local-only `run-tests`/`tui`).
- **Interactions & Data Flow**
- Commands invoke daemon APIs; daemon resolves repo fingerprint → per-repo state dirs (`index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`). Local-only exceptions: `run-tests` and `tui`.
- CLI base URL derives from `server.http_bind_addr` unless `DOCDEX_HTTP_BASE_URL` is set; `DOCDEX_CLI_LOCAL=1` forces legacy local execution when the daemon is unavailable (see the resolution sketch at the end of this section).
- Waterfall commands share caches: `cache/web` (HTML \+ cleaned JSON) and `cache/libs`; ingestion is repo-scoped.
- Token budgeting for chat/web-rag enforced by daemon (not CLI); CLI streams outputs from Ollama via daemon.
- **Reliability & Resource Discipline**
- `check` surfaces readiness/errors (missing index, models, Chrome).
- Commands error clearly if repo unknown/unindexed.
- Web commands respect DDG (≥2s) and fetch (≥1s) delays; Chrome guarded to avoid zombies.
- **Security/Privacy**
- Default localhost bind; `--expose` requires token; CLI must pass token when remote.
- Stdio MCP enforces `auth_token` in `initialize` only when `DOCDEX_AUTH_TOKEN`/`--auth-token` is supplied (auto-started MCP inherits the daemon token).
- No paid APIs; offline-first; web only on low confidence or explicit web commands.
- All repo-scoped commands require repo arg to prevent cross-repo bleed; MCP tools mirror this.
- **Observability**
- Not detailed in PDR for CLI; rely on daemon logs for command outcomes and rate-limit notices.
- **Scalability**
- CLI defers to the per-repo daemon; run separate daemons per repo; performance target p95 local search \<50ms upheld by daemon.
- **DevOps**
- Config at `~/.docdex/config.toml` auto-created; CLI should warn if provider ≠ Ollama. Ollama installs remain prompt-based; Playwright browser installs are opt-out and can run via setup/auto-install.
- **Assumptions**
- CLI is a thin client; heavy work lives in the daemon except `run-tests` and `tui` local execution.
- HTTP and MCP endpoints already authenticated/authorized by daemon when exposed.
- Test command config provided per repo (outside scope here).
- **Open Questions & Risks**
- How is `run-tests` test command configured/discovered per repo? (config key vs repo file)
- Exact output schema for `run-tests` is not specified; DAG export formats are defined in `docs/contracts/dag_export_schema_v1.md`.
- Error codes/UX for missing indexes or models are underspecified.
- TUI dependency footprint and startup guards not described.
- **Verification Strategy**
- Manual: run `docdexd check`, `llm-list`, `llm-setup`, `index --repo`, `chat --repo`, `web-search/fetch/rag`, `libs fetch --repo`, `dag view --repo`, `run-tests --repo`, `mcp`, `tui` with success/error paths.
- Automated: CLI integration tests hitting daemon with repo-scoped fixtures; assert repo requirement enforcement and rate-limit behavior via logs.
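The base-URL resolution described under Interactions & Data Flow might look like the following sketch. It assumes only the `DOCDEX_HTTP_BASE_URL`/`DOCDEX_CLI_LOCAL` variables and `server.http_bind_addr` named above; function names are illustrative.

```rust
// Sketch: how a thin CLI client could decide where to send requests.
use std::env;

fn resolve_base_url(http_bind_addr: &str) -> String {
    // Explicit override wins.
    if let Ok(url) = env::var("DOCDEX_HTTP_BASE_URL") {
        if !url.trim().is_empty() {
            return url;
        }
    }
    // Otherwise derive from the daemon bind address, e.g. 127.0.0.1:3210.
    format!("http://{}", http_bind_addr)
}

fn use_legacy_local_execution() -> bool {
    // DOCDEX_CLI_LOCAL=1 forces local execution when the daemon is unavailable.
    env::var("DOCDEX_CLI_LOCAL").map(|v| v == "1").unwrap_or(false)
}
```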
### HTTP API
- Intent: expose a single machine-local HTTP surface (default bind `127.0.0.1:3210`) that is OpenAI-compatible for chat and a repo-scoped impact graph read API, matching CLI/MCP semantics of explicit repo selection and local-first execution.
**Endpoints**
- `POST /v1/chat/completions`: OpenAI-compatible; requires repo identification (body/header/query). Runs RepoContext resolution → Waterfall (Tier 1 local index \+ libs, Tier 2 web on low confidence/explicit, Tier 3 cognition/memory) with token budgeting before calling Ollama; supports streaming responses.
- `GET /v1/graph/impact?file=<path>`: returns schema-tagged inbound/outbound dependency edges from the per-repo `symbols.db`/dependency graph (directed `source -> target`); the response shape is sketched below.
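A hedged sketch of the impact response contract, assuming `serde`/`serde_json`. Field names mirror the description above (schema tag, `file` key, directed `source -> target` edges), but the exact JSON shape is still an open item, so treat every field name here as an assumption.

```rust
// Illustrative deserialization target for GET /v1/graph/impact responses.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ImpactEdge {
    source: String, // importing file
    target: String, // imported file
}

#[derive(Debug, Deserialize)]
struct ImpactResponse {
    schema: String,            // schema/version tag (assumed field name)
    file: String,              // the queried file path
    inbound: Vec<ImpactEdge>,  // edges where `file` is the target
    outbound: Vec<ImpactEdge>, // edges where `file` is the source
}

fn parse_impact(body: &str) -> serde_json::Result<ImpactResponse> {
    serde_json::from_str(body)
}
```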
**Behavior & Contracts**
- Repo scoping: HTTP defaults to the daemon repo; validate any provided repo id/path and reject unknown/unindexed repos.
- Token budgeting: fixed priority (Memory \> Repo \> Library/Web) with \~10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer; drop lowest-priority snippets first with logging (see the budgeting sketch after this list).
- Streaming: mirror OpenAI stream semantics for chat responses (chunked events).
- Waterfall gating: web escalation only if top local score \< `web_trigger_threshold` (default 0.7) or explicitly requested.
- State usage: per-repo dirs under `~/.docdex/state/repos/<fingerprint>/`; shared caches (`cache/web`, `cache/libs`) ingested per repo.
- Security: localhost by default; `--expose` requires token auth checked per request; no telemetry; no paid APIs.
- Performance targets: local search p95 \<50ms; typical \<20ms.
- Error handling: clear errors for missing repo/index, missing models, or offline web; web rate limits enforced (≥2s DDG discovery, ≥1s fetch delay, 15s page timeout).
- Observability: not requested in PDR.
- DevOps: per-repo daemon; `docdexd check` validates binding, Ollama, Chrome, repo registry before serving.
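The budgeting rule above reduces to simple arithmetic plus a drop order. The sketch below is illustrative only: the snippet type, the `estimate` function, and the rounding are assumptions, and whether the percentages are configurable is listed as an open question later.

```rust
/// Context priority order: Memory is kept longest, Library/Web dropped first.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Priority {
    Memory = 0,
    Repo = 1,
    LibWeb = 2,
}

#[derive(Debug)]
struct TokenBudget {
    system_prompt: usize,
    memory: usize,
    repo_lib_web: usize,
    generation: usize,
}

/// Split a context window into the ~10/20/50/20 allocation described above.
fn budget(context_window: usize) -> TokenBudget {
    let pct = |p: usize| context_window * p / 100;
    TokenBudget {
        system_prompt: pct(10),
        memory: pct(20),
        repo_lib_web: pct(50),
        generation: pct(20),
    }
}

/// Drop lowest-priority snippets (Library/Web, then Repo) until the estimated
/// token count fits `limit`; Memory is dropped last. `estimate` is assumed.
fn trim_to_budget<T>(
    mut snippets: Vec<(Priority, T)>,
    limit: usize,
    estimate: impl Fn(&T) -> usize,
) -> Vec<(Priority, T)> {
    snippets.sort_by_key(|(p, _)| *p);
    while snippets.iter().map(|(_, s)| estimate(s)).sum::<usize>() > limit {
        if snippets.pop().is_none() {
            break; // nothing left to drop
        }
        // A real implementation would log each dropped snippet here.
    }
    snippets
}
```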
**Diagrams (textual)**
- Sequence: Client → `/v1/chat/completions` → Repo Manager (resolve repo/fingerprint) → Waterfall Orchestrator (Tier 1 search → optional web discovery/fetch → context merge with memory/libs) → Token Budgeter → Ollama stream → Client.
- Component: HTTP Server (OpenAI-compatible adapter) ↔ Repo Manager ↔ Indexes (`index/`, `libs_index`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`) ↔ Waterfall services (DiscoveryService, ScraperEngine) ↔ Ollama.
Open Questions & Risks
- Need exact shape of repo selector in headers/query/body for OpenAI-compatible calls (e.g., `docdex-repo-id` header vs body extension).
- Token auth scheme/format when `--expose` (bearer vs custom) not specified.
- Streaming chunk format: assume OpenAI SSE-compatible; confirm no deviations.
- Web escalation override: exact request flag naming for “force web” is unspecified.
Verification Strategy
- Unit/integration: HTTP defaults to daemon repo; reject unknown repo ids.
- Contract tests: OpenAI API compatibility (non-stream/stream), token budgeting enforcement, Waterfall gating at `web_trigger_threshold`.
- Performance: measure local search latency p95 and streaming start time under load with ≥8 concurrent repos.
- Security: token required when exposed; localhost bind default; no external paid APIs invoked.
- Impact graph: validate inbound/outbound edges match stored dependency graph for known fixtures.
### MCP Server
Intent: shared MCP surface for the singleton daemon (HTTP/SSE) plus legacy per-repo stdio MCP (`docdexd mcp`). Tools remain repo-scoped with clear errors on unknown/unindexed repos to avoid cross-repo bleed.
Scope and components
- Surface: singleton daemon exposes MCP over HTTP/SSE (`/sse`, `/v1/mcp`, `/v1/mcp/message`) on the daemon bind address. MCP `initialize` with `rootUri`/`workspace_root` calls `/v1/initialize` and binds the MCP session to that repo; per-request `project_root`/`repo_path` can override the bound repo for `/v1/mcp`. Per-repo stdio MCP remains available via `docdexd mcp`.
- Auto-start: `docdexd daemon` starts the shared MCP proxy when enabled (config, `DOCDEX_ENABLE_MCP`, or `--enable-mcp`); `--disable-mcp` overrides config/env. `docdexd serve` continues to spawn a per-repo stdio MCP server when desired.
- Tools (`project_root`/`repo_path` required for MCP calls unless `initialize` sets a default; validated to match the server repo; an example `tools/call` payload follows this list):
- `docdex_search`: Tier-1 local (Tantivy \+ libs\_index) search; returns ranked snippets with source metadata.
- `docdex_web_research`: Waterfall gate checks `web_trigger_threshold`; on low confidence or explicit force, performs DDG discovery \+ guarded headless Chrome fetch \+ readability; ingests cache per repo before responding.
- `docdex_memory_save`: persists text \+ metadata into per-repo `memory.db` (sqlite-vec).
- `docdex_memory_recall`: semantic recall via Ollama embeddings scoped to repo memory.
- Error handling: unknown repo path/id → explicit error; missing index/memory → instruct to `index --repo` or enable memory; web disabled/offline → clear message.
- Interactions: MCP server delegates to Repo Manager (fingerprint resolution), Waterfall orchestrator, Memory service, Web cache, and Token budgeter to assemble context before tool responses.
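An example `tools/call` payload for `docdex_search`, built with `serde_json`. The JSON-RPC envelope follows MCP conventions and `project_root` matches the requirement above, while the `query`/`limit` argument names are assumptions for illustration.

```rust
// Sketch: what a client-side MCP tool invocation for docdex_search might carry.
use serde_json::json;

fn docdex_search_request(project_root: &str, query: &str) -> serde_json::Value {
    json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "docdex_search",
            "arguments": {
                "project_root": project_root, // required unless initialize set a default
                "query": query,               // assumed argument name
                "limit": 10                   // assumed argument name
            }
        }
    })
}

fn main() {
    let req = docdex_search_request("/path/to/repo", "waterfall orchestrator");
    println!("{}", serde_json::to_string_pretty(&req).unwrap());
}
```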
Behavior and constraints
- Repo scoping: MCP tools require `project_root`/`repo_path` unless `initialize` sets a default; shared HTTP/SSE sessions use `initialize` to mount and pin a repo id; Repo Manager guards isolation.
- Local-first: web tier only on confidence drop (\<`web_trigger_threshold`) or explicit request; DDG HTML \+ headless Chrome with rate limits (≥2s search, ≥1s fetch) and browser guard to avoid zombies.
- Security: default localhost bind; `--expose` requires token on MCP requests; no telemetry or paid APIs.
- Reliability: startup `docdexd check` validates MCP enabled, config perms, Ollama, Chrome, and repo registry; clear failures if dependencies missing.
- Performance: must sustain ≥8 concurrent repos via multiple per-repo daemons; local search p95 \<50ms target; memory and libs treated as Tier-1 support.
- Observability: log tool invocations with repo id, tier selection (local/web/memory), and errors; no additional metrics mandated in PDR.
- DevOps: no new deployment surface; MCP shipped with daemon; uses same config/state layout (`~/.docdex/state/repos/<fingerprint>/...`).
Assumptions
- MCP uses existing HTTP binding/port allocation from daemon; no separate port negotiation described.
- Ollama embeddings/models are reachable locally before MCP tools execute memory operations.
Open Questions & Risks
- Should MCP reject calls when `enable_mcp=false` with a distinct error code vs generic not-found?
- How to signal rate-limit/backoff to clients (tool error vs structured retry hint)?
- Risk: headless Chrome availability impacts `docdex_web_research`; must propagate actionable error instead of silent fallback.
Verification Strategy
- `docdexd check` confirms MCP enabled, dependencies (Ollama, Chrome), repo registry, and config RW.
- Tool-level tests: ensure each tool errors on missing repo/index, respects repo isolation, and enforces confidence gate for web tier.
- Concurrency tests: operate against ≥8 repos across per-repo daemons; verify no cross-repo data leakage.
- Web safety tests: validate DDG/search delays, Chrome guard, and cache reuse; confirm clear errors when offline.
### Local Dependencies
Docdex relies solely on locally managed, zero-cost components for LLM/embeddings, web discovery/fetch, and code intelligence. This section defines how Ollama, headless Chrome, DuckDuckGo HTML discovery, and Tree-sitter are integrated to preserve local sovereignty, reliability, and repo isolation.
**Components and Roles**
- LLM provider: configured via `[llm]` in `config.toml` (default `ollama`) with hardware-aware model guidance and token budgeting handled upstream.
- DuckDuckGo HTML discovery: search-only HTML endpoint used for web queries; enforces ≥2s between queries.
- Headless Chrome: fetch and readability extraction for discovered URLs; guarded lifecycle to avoid zombie processes; respects per-domain ≥1s fetch delay and 15s page timeout defaults (rate-limit sketch after this list).
- Tree-sitter: language parsers (Rust, TypeScript/JavaScript, Python, Go, Java, C#, C/C++, PHP, Kotlin, Swift, Ruby, Lua, Dart) for symbol extraction during `index`; outputs stored in per-repo `symbols.db`.
- Impact graph resolution (best-effort): import edges resolve static patterns including literal import strings, string concatenation with constant bindings, static path joins (`path.join`, `path.resolve`, `os.path.join`), template strings or f-strings with static bindings (multiple candidates use a deterministic tie-break), Python `importlib.import_module`, `importlib.util.spec_from_file_location`, `importlib.machinery.SourceFileLoader`, and Rust `mod`/`use`/`include!`. Unresolved dynamic imports are skipped and recorded in impact diagnostics.
- Import hints: `docdex.import_map.json` supports mapping overrides and pattern expansions (`targets` + `expand`); runtime traces can be supplied via repo-root `docdex.import_traces.jsonl` or `<repo-state-root>/import_traces.jsonl` (toggle with `[code_intelligence].import_traces_enabled` or `DOCDEX_ENABLE_IMPORT_TRACES`). Dynamic import scan limits can be tuned via `[code_intelligence].dynamic_import_scan_limit` or `DOCDEX_DYNAMIC_IMPORT_SCAN_LIMIT`.
- Parser drift policy: when stored Tree-sitter parser versions differ from the running build, Docdex invalidates symbols/AST, sets `symbols_reindex_required`, and `GET /v1/symbols`/`GET /v1/ast` return `409 stale_index` until reindex. Drift metadata is exposed via `GET /v1/symbols/status` and `docdexd symbols-status`.
- AST search surface: `GET /v1/ast/search` accepts `kinds` (node kinds), `mode` (`any|all`), and `limit` to list files matching AST criteria for richer code intelligence queries.
- AST query surface: `POST /v1/ast/query` accepts `kinds`, optional `name`/`field`/`pathPrefix`, and `mode`/`limit`/`sampleLimit` to return per-file match counts plus sample nodes.
- Ranking signals: symbol/AST boosts apply per-kind weights and optional name matches; enable/disable via `[search].symbol_ranking_enabled`, `[search].ast_ranking_enabled`, `[search].chat_symbol_ranking_enabled`, `[search].chat_ast_ranking_enabled` or env overrides.
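A standard-library sketch of the rate limits above (≥2s between DuckDuckGo queries, ≥1s between fetches to the same domain). The struct and method names are illustrative and the sketch ignores backoff and the blocklist.

```rust
// Illustrative per-domain rate limiter; not the actual DiscoveryService code.
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

struct RateLimiter {
    search_delay: Duration,               // >= 2s between DDG searches
    fetch_delay: Duration,                // >= 1s per domain
    last_search: Option<Instant>,
    last_fetch: HashMap<String, Instant>, // keyed by domain
}

impl RateLimiter {
    fn new() -> Self {
        Self {
            search_delay: Duration::from_secs(2),
            fetch_delay: Duration::from_secs(1),
            last_search: None,
            last_fetch: HashMap::new(),
        }
    }

    fn wait_for_search(&mut self) {
        if let Some(prev) = self.last_search {
            let elapsed = prev.elapsed();
            if elapsed < self.search_delay {
                sleep(self.search_delay - elapsed);
            }
        }
        self.last_search = Some(Instant::now());
    }

    fn wait_for_fetch(&mut self, domain: &str) {
        if let Some(prev) = self.last_fetch.get(domain) {
            let elapsed = prev.elapsed();
            if elapsed < self.fetch_delay {
                sleep(self.fetch_delay - elapsed);
            }
        }
        self.last_fetch.insert(domain.to_string(), Instant::now());
    }
}
```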
**Interactions and Data Flow**
- Waterfall Tier-2 (web) path: Local search confidence below `web_trigger_threshold` or explicit user request → DuckDuckGo discovery (rate-limited) → headless Chrome fetch with readability → raw HTML \+ cleaned JSON cached under `cache/web/` → ingested per repo when merged into context.
- Library docs path: Dependency detectors resolve docs URLs → headless Chrome fetch with same guardrails → cached under `cache/libs/<ecosystem>/<pkg>/` → ingested into per-repo `libs_index`, treated as Tier-1 support content.
- Indexing path: `docdexd index --repo` invokes Tree-sitter to emit symbols (name, kind, file\_path, line\_start/line\_end, signature) into `symbols.db`; Repo Manager ensures per-repo isolation via SHA256 fingerprinted paths.
- Diff-aware RAG path: when diff inputs are provided (CLI diff flags or `docdex.diff` in `/v1/chat/completions`), collect git diff (working tree, staged, or range; optionally path-scoped), expand to 1-hop dependencies using the impact graph, and assemble a diff context slice. Context ordering is Memory → Diff → Repo → Lib/Web with token budgets and drop logging. Dynamic imports resolve best-effort (literals, concatenations, static joins, template strings with static bindings, optional `docdex.import_map.json` hints); unresolved imports are skipped and reported via diagnostics/logs.
- LLM path: Completions and embeddings use the configured provider; local Ollama is the default. No implicit cloud fallback; `[llm]` warns if provider is unknown or missing required config.
**Operational Guardrails**
- Local-only by default: daemon binds to 127.0.0.1 unless `--expose` is set; when exposed, token auth enforced on HTTP/MCP surfaces.
- Resource controls: Chrome lifecycle guarded with locks and teardown; per-repo daemons close DB/index handles on shutdown.
- Caching behavior: Web/library caches are global but ingestion is repo-scoped; no cross-repo memory or symbol bleed.
- Observability: Dependency readiness surfaced via `docdexd check` (Ollama reachability, Chrome availability, model presence); additional telemetry not requested in PDR.
- Security: No paid APIs; no external egress beyond explicit web fetch; blocklist honored during discovery.
**Assumptions**
- Ollama binary is user-installed and present in PATH; SDS will not automate installation.
- Chrome/Chromium available locally and supports headless mode.
- Tree-sitter grammars for the supported languages are bundled or vendored; additional languages out of scope for this phase.
**Open Questions & Risks**
- What is the exact mechanism for Chrome guard/locks to ensure zero zombies under crash conditions?
- Do we need configurable per-domain rate limits beyond the stated defaults (≥1s fetch, ≥2s discovery)?
- How are Tree-sitter parser versions managed to avoid AST drift across releases?
- Fallback behavior if Chrome is unavailable (skip web tier vs. fail request) is not explicitly specified.
- Cache eviction/TTL policies for `cache/web` and `cache/libs` are not defined; risk of unbounded disk growth.
**Verification Strategy**
- `docdexd check` validates Ollama reachability/models, Chrome availability, repo registry, and bind availability; MCP spawn probe runs when `DOCDEX_CHECK_MCP_SPAWN=1` (timeout via `DOCDEX_CHECK_MCP_SPAWN_TIMEOUT_MS`).
- Rate-limit tests: assert ≥2s between DuckDuckGo queries and ≥1s between fetches; ensure errors/backoff on HTTP failures.
- Chrome lifecycle tests: start/stop under load and crash injection to confirm no lingering processes and lock cleanup.
- Tree-sitter extraction tests across supported languages to confirm symbols populated in `symbols.db` with correct spans.
- Web/library cache tests: fetch → cache → ingest per repo; verify no cross-repo contamination.
## Runtime, Deployment, and Operations {#runtime,-deployment,-and-operations}
**Intent**: Operate per-repo `docdexd` daemons that are localhost-bound by default, resource-disciplined, repo-scoped, and observable, while avoiding external costs and preventing browser/process leaks. Clustered/multi-tenant deployment modes are out of scope.
### Daemon Lifecycle and Binding
- Startup can run a preflight check (`--preflight-check` or `DOCDEX_PREFLIGHT_CHECK=1`): validates config readability/writability, state dirs, Ollama reachability/models, headless Chrome availability, repo registry, bind availability, and MCP readiness. Fails fast with actionable errors when enabled.
- Binding: default `127.0.0.1:3210`. `--expose` optional; when set, all HTTP/MCP surfaces require token auth (from env/config). No telemetry.
- Each per-repo daemon hosts one HTTP API and one MCP server; CLI and TUI connect locally. Run one daemon per repo.
- Chrome/browsers: headless lifecycle guarded; cleanup on exit/panic; locks under `state/locks/` to prevent concurrent zombie instances.
- Singleton daemon mode (install-and-forget): `docdexd daemon` runs one global daemon with a lockfile (`~/.docdex/daemon.lock`), mounts repos dynamically on initialize, and is auto-started from the CLI when needed.
### Resource and Concurrency Controls
- Repo lifecycle: per-repo daemon manages a single repo; DB/index handles close on shutdown.
- RAM/VRAM-aware LLM guidance; defaults tuned to keep idle daemon \<100MB, indexing \<1GB (configurable).
- Web access rate limits: ≥2s between DuckDuckGo searches; ≥1s per-domain fetch delay; page timeout default 15s; bounded Chrome concurrency.
- Token budgeting: \~10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer; lowest-priority snippets dropped first when over budget.
### Security and Privacy
- Local-first: offline by default; web escalation only on low confidence (`web_trigger_threshold`, default 0.7) or explicit request.
- Authentication: token required when `--expose` is used; reject unauthenticated remote HTTP/MCP calls. No paid APIs or telemetry; only local/open components (Ollama, DuckDuckGo HTML, headless Chrome).
- Data isolation: per-repo state under `state/repos/<fingerprint>/`; no cross-repo memory/index/DAG bleed; global caches are read/ingest-only per repo.
### Observability and Health
- Health/readiness via `docdexd check` output; includes Chrome guard status, model availability, repo registry, and bind availability.
- Logging: honor `log_level` from config; log rate-limit decisions, waterfall escalations, and browser lifecycle events.
- DAG and memory are per repo; exposed via CLI/HTTP/MCP only where specified—no additional telemetry channels.
### Configuration Management
- Config at `~/.docdex/config.toml`; auto-created with localhost defaults. Validates RW access to `global_state_dir`.
- Key sections enforced: `[core]`, `[llm]`, `[search]`, `[web]`, `[web.scraper]`, `[memory]`, `[server]`. Warn if `provider` is unknown or missing required config.
- State layout under `~/.docdex/state/` with repo fingerprints; includes `index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`, `cache/web`, `cache/libs`, `locks/`.
- No silent auto-install of Ollama; Playwright browser installs are opt-out and run via setup/auto-install. `llm-setup` provides guidance, and npm postinstall may prompt for explicit installs.
**Open Questions & Risks**
- Should health be exposed as a lightweight HTTP endpoint in addition to `docdexd check`? (Not specified in PDR.)
- Token distribution percentages: are they configurable or fixed constants?
- Chrome guard failure modes: is there a retry/backoff policy beyond startup validation?
- Risk: Resource exhaustion if many concurrent web requests despite rate limits—need clear error surfaces.
**Verification Strategy**
- Run `docdexd check` (or enable `--preflight-check`) to validate config/state/Ollama/Chrome/bind/MCP before serving.
- Concurrency tests: multiple per-repo daemons under load; ensure handles close on shutdown.
- Web safety: enforce DDG ≥2s interval and per-domain ≥1s delay; verify Chrome teardown and no zombie processes.
- Security: attempt exposed-mode calls without token → expect rejection; verify localhost-only bind by default.
- Token budgeting: construct over-budget requests to confirm lower-priority context is dropped first with logging.
- Isolation: concurrent per-repo daemons show no cross-repo data or memory bleed.
### Daemon Lifecycle and Binding
Per-repo `docdexd` processes expose one HTTP API and one MCP server each. Architectural intent: keep the daemon private by default (127.0.0.1:3210), allow optional exposure only with explicit user action and token auth, and ensure lifecycle guards prevent zombie processes or orphaned browser instances.
- **Process model**: One `docdexd` per repo; multi-repo access comes from running multiple daemons. No clustered multi-tenant mode (out of scope per PDR).
- **Singleton mode**: one global daemon (`docdexd daemon`) with a lockfile at `~/.docdex/daemon.lock` and dynamic repo mounting; the CLI pings and auto-starts the daemon as needed.
- **Default binding**: Bind to `127.0.0.1:3210` from `[server] http_bind_addr`. MCP enabled by default (`enable_mcp=true`), sharing the same bind/interface posture; override via `DOCDEX_ENABLE_MCP` or `--disable-mcp`.
- **Exposed mode**: `--expose` (or equivalent config override) permits non-localhost binding; requires token authentication provided via env/config. The token is enforced on HTTP and MCP requests when exposed; unauthenticated requests are rejected (see the validation sketch after this list).
- **Startup validation**: `docdexd check` ensures the bind address is free, permissions on `global_state_dir` are valid, Ollama and headless Chrome are reachable, and MCP can start when spawn checks are enabled (`DOCDEX_CHECK_MCP_SPAWN=1`). Preflight mode forces MCP spawn checks when MCP is enabled.
- **Shutdown/guard rails**: Browser guard ensures headless Chrome is started/stopped cleanly; lock directories under `~/.docdex/state/locks/` prevent zombie Chrome processes. On panic/exit, ensure teardown routines run to avoid lingering processes.
- **Resource limits (relevant here)**: DB/index handles are closed on shutdown; Chrome fetch concurrency is bounded; timeouts apply (page load \~15s).
- **Security posture**: Local-only by default; zero telemetry; no paid APIs. When exposed, token auth is mandatory; otherwise reject. No other authentication modes are specified in PDR.
- **Observability**: PDR does not request additional logging/tracing specifics here beyond readiness checks and error surfacing on failed binds or missing dependencies.
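A validation sketch for the binding/exposure rules above, assuming `serde` for config parsing. Only `http_bind_addr` is a documented key here; the `expose` and `auth_token` field names are assumptions, and the token mechanism itself is an open question below.

```rust
// Illustrative config shape and startup validation; not the shipped schema.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ServerConfig {
    #[serde(default = "default_bind")]
    http_bind_addr: String,     // default 127.0.0.1:3210
    #[serde(default)]
    expose: bool,               // assumed config mirror of --expose
    auth_token: Option<String>, // assumed key; source (env vs config) is open
}

fn default_bind() -> String {
    "127.0.0.1:3210".to_string()
}

fn validate(server: &ServerConfig) -> Result<(), String> {
    let is_loopback = server.http_bind_addr.starts_with("127.0.0.1")
        || server.http_bind_addr.starts_with("localhost");
    if !server.expose && !is_loopback {
        return Err("non-loopback bind requires --expose".into());
    }
    if server.expose {
        match &server.auth_token {
            Some(t) if !t.trim().is_empty() => {}
            _ => return Err("--expose requires a non-empty auth token".into()),
        }
    }
    Ok(())
}
```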
Open Questions & Risks
- How is the expose token configured/rotated (env var vs config field) and is reload without restart needed?
- Should daemon refuse `--expose` when token missing/blank, and how is this surfaced to callers?
- Are there OS-specific bind constraints (e.g., Windows loopback) that need explicit handling?
- What is the expected behavior when MCP is disabled but HTTP is enabled (and vice versa) in exposed mode?
Verification Strategy
- `docdexd check` confirms bind availability, token presence when `--expose`, MCP readiness, Ollama/Chrome availability, and state directory permissions.
- Integration tests: start daemon on default localhost, assert HTTP/MCP reachable only locally; start with `--expose` \+ token, assert remote access works with token and fails without.
- Lifecycle tests: start/stop daemon repeatedly with web fetches to ensure Chrome processes are cleaned up; verify locks directory is empty after shutdown.
- Resource tests: repeated start/stop cycles confirm handles close and daemon remains responsive.
### Resource and Concurrency Controls
Architectural intent: keep a per-repo `docdexd` responsive on commodity machines by bounding repo footprint, browser usage, and external fetch rates while preventing resource bleed across repos.
**Repo lifecycle controls**
- Per-repo daemon manages a single repo; DB/index handles close on shutdown.
- Fingerprinted per-repo state under `state/repos/<fingerprint>/` ensures isolation; cross-repo access is rejected early when repo id/path missing.
- Daemon must return clear errors when repo context is missing or invalid.
**Browser and web fetch controls**
- Headless browser guarded by a lifecycle manager: bounded concurrency (configurable, tied to `[core].max_concurrent_fetches`/web scraper settings), per-page load timeout (default 15s), and teardown to avoid zombie processes; the locks directory serializes guard state (lockfile sketch after this list).
- Browser discovery supports Chrome/Chromium/Edge/Brave/Vivaldi on macOS/Windows; Playwright auto-installs Chromium when none is found and persists the resolved path.
- Discovery rate limits: DuckDuckGo HTML queries spaced ≥2s apart; at most one fetch per second per domain (≥1s delay); backoff on HTTP errors; respect the blocklist; reuse the cache to avoid redundant fetches.
- Scraper uses readability cleanup; caches raw HTML \+ cleaned JSON under `cache/web/` with TTL from config.
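A lockfile-based guard sketch using only the standard library. The lock filename and PID recording are illustrative, and crash recovery would still need a stale-lock sweep (e.g., during `docdexd check`).

```rust
// Illustrative browser guard: hold an exclusive lockfile while Chrome runs.
use std::fs::{remove_file, OpenOptions};
use std::io::Write;
use std::path::PathBuf;

struct BrowserGuard {
    lock_path: PathBuf,
}

impl BrowserGuard {
    /// Acquire a lock under ~/.docdex/state/locks/; fails if another guard holds it.
    fn acquire(locks_dir: &std::path::Path) -> std::io::Result<Self> {
        std::fs::create_dir_all(locks_dir)?;
        let lock_path = locks_dir.join("chrome.lock"); // assumed filename
        let mut file = OpenOptions::new()
            .write(true)
            .create_new(true) // atomic: errors if the lock already exists
            .open(&lock_path)?;
        writeln!(file, "{}", std::process::id())?;
        Ok(Self { lock_path })
    }
}

impl Drop for BrowserGuard {
    fn drop(&mut self) {
        // Best-effort cleanup on normal exit or unwind; crash recovery needs
        // a separate stale-lock sweep.
        let _ = remove_file(&self.lock_path);
    }
}
```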
**Memory/CPU budgets**
- Idle daemon memory target \<100MB; indexing \<1GB (configurable). Hardware awareness informs model recommendations but does not change control surfaces here.
- Token budgeting enforces context mix priority (Memory \> Repo \> Library/Web) to prevent overflow; lowest-priority snippets dropped first with logging.
**Concurrency surface interactions**
- Waterfall orchestrator must honor `web_trigger_threshold` gate; web tier only invoked on low-confidence or explicit request, reducing unnecessary Chrome usage.
- MCP/CLI require repo selection; HTTP defaults to the daemon repo; prevents accidental cross-repo operations.
**Observability hooks**
- `docdexd check` validates Chrome availability, repo registry health, and config/state RW.
- Logging for rate-limit throttling, Chrome guard actions, and token-dropping decisions; metrics collection beyond logs not requested in PDR.
**Security posture**
- Localhost bind by default; exposing (`--expose`) requires token auth. Resource controls apply regardless of exposure; no telemetry or paid API usage.
**DevOps/scalability**
- No clustered multi-tenant mode; scale by running per-repo daemons and tuning fetch concurrency. State layout must remain upgrade-safe; Ollama installs remain prompt-based and Playwright browser installs are opt-out.
**Assumptions**
- Config provides knobs for fetch concurrency, rate delays, and timeouts; defaults match PDR.
- Cache directories are writable and shared across repos but ingestion remains per-repo to avoid cross-contamination.
**Open Questions & Risks**
- Should per-repo daemons expose more detailed resource telemetry for tuning?
- How to surface rate-limit/backoff status to clients (HTTP/MCP error codes vs. logs only)?
- Risk: misconfigured Chrome path or permissions could bypass guard and leave zombies; mitigation relies on `check` coverage.
**Verification Strategy**
- Unit/integration: Repo Manager handles concurrent access across per-repo daemons; verify handles closed cleanly.
- Rate-limit tests: enforce ≥2s DDG spacing and the ≥1s per-domain fetch delay; ensure cache hits skip delays.
- Browser guard tests: spawn/fetch/timeout cycles without orphaned Chrome processes.
- Token budgeting tests: confirm lower-priority snippets are dropped first with logs emitted.
- `docdexd check`: validates Chrome, repo registry, state RW, and config defaults.
### Security and Privacy
Docdex enforces local-first, zero-cost operation with explicit controls when the daemon is exposed. Security posture is intentionally minimal: bind to localhost by default, require a token if remote exposure is enabled, and avoid any telemetry or paid/cloud dependencies.
- **Network exposure**: `docdexd` binds to `127.0.0.1:3210` by default. Running with `--expose` (or equivalent config) requires a token; HTTP and MCP requests must present it or are rejected. No multi-tenant daemons; one daemon per repo.
- **Authentication & authorization**: Single shared bearer-style token validated on all HTTP/MCP endpoints when exposed. No role model or per-repo ACLs in scope; all authorization is coarse-grained (token holder \= allowed). Token configured via env/config; no additional identity providers (see the token check sketch after this list).
- **Data residency & locality**: All inference, embeddings, search, and state are local by default; no telemetry. Only zero-cost/open components (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec, Tree-sitter) are permitted. Web access is gated (confidence-based or explicit) and cached locally. No cloud/vector DBs, no paid APIs.
- **Repo isolation**: Per-repo state under `~/.docdex/state/repos/<fingerprint>/` (indexes, memory.db, symbols.db, dag.db, impact_graph.json, libs\_index). Global caches (`cache/web`, `cache/libs`) are shared storage but ingested per repo without cross-contamination.
- **Process/browsing safeguards**: Headless Chrome guarded with locks and lifecycle checks to avoid zombie processes; rate limits enforced to reduce abuse risk. Locks directory under state for browser/process guards.
- **Configuration defaults**: Auto-created config favors privacy: localhost bind, Ollama provider, MCP enabled locally. Warnings if LLM provider differs from Ollama. Ollama installs remain prompt-based; Playwright browser installs are opt-out and can run via setup/auto-install.
- **Logging/observability**: PDR does not request telemetry; assume minimal local logs only. No remote log shipping described. Optional state logs via `DOCDEX_LOG_TO_STATE=1` write to `~/.docdex/state/logs/docdexd-<pid>.log`.
- **Dependencies**: Open-source/local-only; no paid keys. DuckDuckGo HTML for discovery; local Chrome for scraping; Ollama for LLM/embeddings.
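A token-check sketch for exposed mode. The `Authorization: Bearer <token>` format is an assumption (the header/key format is listed as an open question below), and the comparison avoids early exit to limit timing leakage.

```rust
// Illustrative request authorization for exposed mode; header format assumed.
fn token_matches(expected: &str, presented: &str) -> bool {
    let (a, b) = (expected.as_bytes(), presented.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    // Fold over all bytes instead of returning on the first mismatch.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn authorize(expose: bool, expected_token: Option<&str>, auth_header: Option<&str>) -> bool {
    if !expose {
        return true; // localhost-only mode: no token required
    }
    let Some(expected) = expected_token else {
        return false; // exposed without a configured token: reject everything
    };
    match auth_header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => token_matches(expected, presented),
        None => false,
    }
}
```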
Open Questions & Risks
- Should TLS be supported/required when `--expose`? Not specified.
- Token storage hardening (file perms, rotation) not defined.
- No per-repo auth/ACLs; token grants full access—acceptable given PDR scope?
- How to handle malformed/absent tokens on MCP surface (HTTP semantics vs MCP error codes)?
- Browser sandboxing/SELinux/AppArmor not described; potential hardening gap.
Verification Strategy
- `docdexd check` confirms localhost bind (unless `--expose`), token requirement when exposed, and absence of telemetry/cloud calls.
- Automated tests to assert all API/MCP calls fail without token when exposed and succeed with valid token.
- Tests to confirm no network egress occurs for local-only operations; web access only when triggered and cached locally.
- Repo isolation tests: ensure per-repo state separation without cross-repo leakage.
- Chrome guard tests: locks and cleanup prevent zombie processes.
### Observability and Health
Docdexd must surface readiness and dependency health so operators can trust local-first behavior without external telemetry. Observability centers on explicit checks and guarded lifecycles (especially headless Chrome), with clear failure surfaces instead of silent degradation.
**Operational Health Model**
- `docdexd check` runs at install/startup or on demand; validates config/state RW, Ollama reachability/models, Chrome availability, repo registry integrity, HTTP bind, MCP enablement, and enforces local bind unless `--expose` is set. Output: human-readable failures \+ actionable hints; non-zero exit on any failed prerequisite (report shape sketched after this list).
- Dependency validation: confirms provider-specific dependencies (Ollama binary/models when configured), confirms Chrome binary present/launchable in headless mode with page timeout guard; warns (not fails) if the provider is unknown.
- Browser guard: lifecycle locks under `state/locks/`; enforces teardown on exit/panic; caps concurrent Chrome sessions; rejects new sessions when caps hit with clear error; ensures no zombie Chrome processes remain post-operations.
- Repo readiness: `check` verifies per-repo state existence and permissions; unknown/unindexed repos return clear errors across CLI/HTTP/MCP.
- Waterfall guardrails: confidence gating before web fetch; respects rate limits; logs when escalations occur and when token budget forces snippet dropping (memory \> repo \> libs/web).
- Security posture: `check` confirms default bind `127.0.0.1` and token presence when `--expose` is used; rejects unauthenticated remote calls.
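A sketch of the check-report shape implied above: human-readable results with hints and a non-zero exit on any failure. The struct, check names, and output format are illustrative, not the actual `docdexd check` output.

```rust
// Illustrative readiness report; real checks would probe config, Ollama, Chrome, etc.
struct CheckItem {
    name: &'static str,
    ok: bool,
    hint: &'static str,
}

fn report(items: &[CheckItem]) -> i32 {
    let mut failures = 0;
    for item in items {
        if item.ok {
            println!("OK    {}", item.name);
        } else {
            failures += 1;
            println!("FAIL  {} -- {}", item.name, item.hint);
        }
    }
    if failures == 0 { 0 } else { 1 } // non-zero exit blocks daemon startup
}

fn main() {
    let items = [
        CheckItem { name: "config/state RW", ok: true, hint: "check permissions on ~/.docdex" },
        CheckItem { name: "ollama reachable", ok: true, hint: "start ollama or fix [llm] base URL" },
        CheckItem { name: "headless chrome", ok: false, hint: "install Chrome/Chromium or rerun setup" },
    ];
    std::process::exit(report(&items));
}
```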
**Observability Surfaces**
- Logs: structured, human-readable; emit at least INFO for readiness and dependency checks, WARN for degradations (e.g., missing optional models), ERROR for failed prerequisites. No telemetry export; local only.
- Error surfaces: CLI/HTTP/MCP return repo-scoped, explicit messages (missing repo/index, missing models, offline web, Chrome not available, cap reached). No new interfaces beyond existing CLI/HTTP/MCP.
- Metrics/tracing: not requested in PDR; intentionally out of scope.
**Operations and Reliability**
- Start-up gate: daemon fails fast if `check` prerequisites are not met, preventing partial service.
- Resource discipline: bounded Chrome concurrency with timeouts; rejects work rather than hang.
- Offline-first: web dependence is optional; failures in web tier degrade gracefully to local responses with logs explaining the fallback.
**Assumptions**
- Operators run `docdexd check` during install/startup CI; logs retained locally.
- No external monitoring/telemetry is added beyond logs; acceptable per PDR.
- Locks directory is available and writable under `global_state_dir`.
**Open Questions & Risks**
- Should `check` optionally auto-clean zombie Chrome processes on detection vs. only fail?
- What is the default cap for concurrent Chrome sessions, and should it align with `max_concurrent_fetches`?
- Failure mode when `global_state_dir` is on slow/remote FS—do we warn or fail?
**Verification Strategy**
- Automated `docdexd check` must fail on missing Chrome/Ollama/model, missing locks dir, or unwritable state; inspect exit codes and log lines.
- Intentional Chrome crash test: confirm guard cleans up processes/locks.
- Concurrency test: exceed Chrome session cap and verify clear error and no zombie processes.
- Repo error handling: request with unknown repo over CLI/HTTP/MCP returns explicit, repo-scoped error.
- Web tier failure injection (network disabled): ensure local tier responds with logged WARN, not crash.
### Configuration Management
Configuration ensures `docdexd` starts with safe, local-first defaults, validates writable state paths, and guides hardware-aware model choices without adding new surfaces.
Defaults and creation
- Global config `~/.docdex/config.toml` auto-created on first run with localhost bind, Ollama-only LLM settings, and default thresholds (e.g., `web_trigger_threshold=0.7`).
- State root `~/.docdex/state/` structured as in PDR (per-repo `index`, `libs_index`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`; shared `cache/web`, `cache/libs`, `locks`). Paths derived from SHA256 of normalized repo path; reject any path not under the fingerprinted root.
Validation and safeguards
- On startup and via `docdexd check`, validate `global_state_dir` readability/writability, per-repo RW on demand, and that bindings remain on `127.0.0.1` unless explicitly exposed with token auth.
- Emit warning if `[llm].provider` is unrecognized or missing required settings; non-Ollama providers are permitted when configured.
- Validate HTTP bind address format, MCP enablement flag, and that Ollama base URL is reachable when configured.
- Ensure scraper/chrome settings exist but only report availability here; full browser lifecycle is covered elsewhere.
Hardware-aware model recommendations
- Detect RAM/VRAM at `llm-list`/`llm-setup` time; filter `llm_list.json` per thresholds: RAM \<8GB → ultra-light only; ≥16GB → default `phi3.5:3.8b`; ≥32GB with GPU → recommend `llama3.1:70b` if installed (threshold sketch after this list).
- Never auto-install models silently; only suggest pulls and update `[llm]` defaults upon explicit confirmation.
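The threshold rule above as a sketch. The 8–16GB band and the concrete catalog live in `llm_list.json`, so the fallback string and the function signature here are illustrative only.

```rust
// Illustrative recommendation filter; the real logic reads llm_list.json.
fn recommend_model(ram_gb: u64, has_gpu: bool, llama70b_installed: bool) -> &'static str {
    if ram_gb < 8 {
        "ultra-light models only"
    } else if ram_gb >= 32 && has_gpu && llama70b_installed {
        "llama3.1:70b"
    } else if ram_gb >= 16 {
        "phi3.5:3.8b"
    } else {
        // 8-16GB band is not pinned down above; defer to llm_list.json filtering.
        "filter llm_list.json by available RAM"
    }
}
```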
Scope boundaries
- No config is stored per repo beyond fingerprinted state layout; cross-repo/global overrides out of scope per PDR.
- No telemetry, paid providers, or cloud fallbacks introduced.
Open Questions & Risks
- How to surface granular permission errors on `global_state_dir` vs per-repo dirs without leaking host paths?
- What is the precise failure mode if the configured provider is unreachable (block startup vs warn)?
- Risk: misconfigured `--expose` without token enforcement; ensure config validation blocks this.
Verification Strategy
- Unit tests for config parsing/creation with missing file → defaults applied.
- Integration test: `docdexd check` fails on non-writable `global_state_dir`; warns on non-Ollama provider.
- Hardware detection tests: model recommendations align with RAM/VRAM thresholds and do not trigger pulls.
## Quality, Testing, and Risks {#quality,-testing,-and-risks}
Docdex v2.0 quality is enforced through phase-gated validation tied to local-first, zero-cost constraints and per-repo daemon isolation. Each gate proves the daemon can safely serve a repo with correct scoping, guarded web escalation, and required local dependencies (Ollama, headless Chrome) before advancing.
**Phase Gates**
- Phase 0: `docdexd check` validates config RW, state layout, Ollama/Chrome presence, repo registry, localhost bind, MCP enabled.
- Phase 1: `index --repo` builds per-repo source index; `chat --repo` answers from local snippets; `llm-list`/`llm-setup` functional with hardware-aware model guidance.
- Phase 2: Waterfall uses `web_trigger_threshold`; `web-search`, `web-fetch`, `web-rag` operate with DDG ≥2s spacing, ≥1s fetch delay, Chrome guarded/cleaned.
- Phase 2.1: `libs fetch --repo` detects Rust/Node/Python deps, caches, ingests into repo `libs_index`; chat grounded on ingested docs.
- Phase 3/3.5: `/v1/chat/completions` defaults to daemon repo, budgets tokens, streams; `memory_store/recall` isolated per repo.
- Phase 4: DAG nodes logged per session; `dag view --repo` renders text/DOT.
- Phase 5: Per-repo MCP server exposes repo-aware tools; errors clearly on unknown/unindexed repo.
- Phase 6: Symbols populated; impact API returns deps; `run-tests --repo` emits structured JSON; diff-aware summary produced.
- Phase 7: TUI repo switch, dashboard tabs, VSCode extension always passes `repo_path`.
**Test Coverage Focus**
- Isolation: per-repo state dirs under concurrency; reject missing/unknown repo; no cross-repo memory/index bleed.
- Local-first/no cost: no paid/external APIs beyond gated web; default localhost bind; token auth required when exposed.
- Waterfall correctness: confidence gate honored; source priority Memory \> Repo \> Library/Web; token budgeting enforces drop order with logging.
- Scraper safety: DDG spacing/backoff, fetch delay, cache TTL, readability cleanup, Chrome lifecycle guards, zero zombie processes.
- Library ingestion: dependency detection for Rust/Node/Python; cache reuse; libs treated as Tier-1.
- Performance: local search p95 \<50ms (\<20ms typical); resource caps respected (Chrome concurrency).
- Security: token required when `--expose`; reject unauthenticated HTTP/MCP calls; clear errors for missing models/Chrome.
**Open Questions & Risks**
- How to simulate adverse network (DDG throttling) within CI to validate backoff logic?
- Resource ceilings per hardware tier (RAM/VRAM) for concurrent repos beyond defaults need confirmation.
- Potential drift between CLI and HTTP/MCP repo routing semantics; need contract tests.
- Risk of token budget mis-sizing for large memory \+ libs contexts; requires guardrails and logging thresholds.
- Handling of partial dependency graphs (non-Rust/Node/Python ecosystems) not defined.
**Verification Strategy**
- Gate-by-gate acceptance using the Phase list above; block progression on failure.
- Automated integration tests per repo for: isolation, waterfall gate behavior, token auth when exposed, dependency detection and libs ingestion, Chrome lifecycle under stress.
- Performance benchmarks for local search latency and resource usage across 8+ concurrent repos.
- Fault injection: simulate missing Ollama/model, missing Chrome, slow/banned DDG responses.
- Security checks: enforce repo selection on all surfaces; ensure no paid API calls; localhost bind by default; token required when exposed.
### Phase Gates {#phase-gates}
Docdex advances through gated phases; each gate requires the preceding functionality to be demonstrably ready before enabling downstream features (local RAG → web waterfall → libs ingestion → unified API/memory → DAG → MCP → code intelligence → UI surfaces). Gates emphasize repo isolation, offline-first defaults, and deterministic promotion criteria.
**Gate Criteria by Phase**
- Phase 0 (Foundation): `docdexd check` passes; config/state RW validated; Ollama and headless Chrome availability confirmed.
- Phase 1 (Local RAG/Chat): `index --repo` builds per-repo Tantivy index; `chat --repo` serves answers from local snippets; `llm-list` hardware detection and `llm-setup` guidance functional.
- Phase 2 (Web Intelligence): Waterfall only triggers web when local confidence \< `web_trigger_threshold` or forced; `web-search`, `web-fetch`, `web-rag --repo` operate with DuckDuckGo HTML, headless Chrome, readability cleanup, enforced rate limits, and Chrome guard (no zombies).
- Phase 2.1 (Library Context): `libs fetch --repo` detects Rust/Node/Python deps, resolves docs URLs, scrapes, caches under `cache/libs`, ingests into per-repo `libs_index`; chat answers grounded in cached library docs.
- Phase 3/3.5 (Unified API \+ Memory): `/v1/chat/completions` defaults to the daemon repo (body/header/query optional), budgets tokens, streams via Ollama; per-repo `memory_store/recall` on `memory.db` with sqlite-vec embeddings; memory prioritized in context merge.
- Phase 4 (Reasoning DAG): Per-repo `dag.db` logging UserRequest/Thought/ToolCall/Observation/Decision; `dag view --repo <session_id>` renders text/DOT.
- Phase 5 (MCP): Per-repo MCP server exposes repo-aware tools (`docdex_search`, `docdex_web_research`, `docdex_memory_save/recall`); unknown/unindexed repo yields clear error.
- Phase 6 (Code Intelligence): Tree-sitter symbols for Rust/TypeScript/JavaScript/Python/Go/Java/C#/C/C++/PHP/Kotlin/Swift/Ruby/Lua/Dart stored in `symbols.db`; import-graph impact API `GET /v1/graph/impact?file=` returns schema-tagged inbound/outbound deps with explicit edge direction semantics; `run-tests --repo --target` returns structured JSON; diff-aware RAG uses git diff \+ impact graph \+ memory.
- Phase 7 (UI Surfaces): TUI repo switcher via external `docdex-tui` binary; web dashboard + VSCode extension live in separate packages but target `/v1/chat/completions` and MCP, always passing `repo_path`.
**Scalability/Reliability/Security Notes**
- Scalability: Parallel repo operations come from multiple per-repo daemons; web tier rate limits and cache reuse guard against DDG/Chrome overload.
- Reliability: Browser guard lifecycle verified before web gate; token budgeting must prevent context overflow before unified API gate.
- Security: Default localhost bind enforced at each gate; `--expose` requires token; repo selection required on every surface; no paid/cloud calls.
**Observability/DevOps**
- Minimal logs: gate checks log failures for config, model presence, Chrome readiness, repo availability. Additional observability not requested in PDR.
**Assumptions**
- Local-only execution unless web fallback explicitly triggered or confidence gate trips.
- Cached library docs treated as Tier-1; global caches reused but ingested per repo to maintain isolation.
**Open Questions & Risks**
- What is the precise policy for handling partial gate failures (e.g., Chrome unavailable but local RAG passes)? Promote with warnings or block?
- How to surface rate-limit/backoff state to operators during web gate validation?
- Risk: multiple per-repo daemons could contend for shared caches or Chrome resources; needs test coverage.
- Risk: VSCode extension must reliably pass `repo_path`; missing arg could bypass repo scoping.
**Verification Strategy**
- Automated `docdexd check` for Phase 0; repeated on daemon startup.
- CLI acceptance per gate: index/chat (Phase 1), web-search/fetch/rag with rate-limit assertions (Phase 2), libs fetch with fixture deps (Phase 2.1), HTTP `/v1/chat/completions` and memory CRUD (Phase 3/3.5), `dag view` snapshots (Phase 4), MCP tool calls with and without valid repo (Phase 5), symbols \+ impact API \+ `run-tests` structured output (Phase 6), UI smoke tests for repo selection and chat wiring (Phase 7).
- Regression checks for repo isolation and no cross-repo data bleed at each gate.
### Test Coverage Focus {#test-coverage-focus}
Docdex v2.0 testing targets risk hot-spots: per-repo daemon isolation under concurrency, strict local-first behavior, waterfall gating correctness, scraper safety, and security posture tied to repo selection.
- Scope/Intent: Validate that phase-gated behaviors enforce local sovereignty and repo correctness; focus on concurrency isolation, gated web escalation, and secure surfaces (HTTP/MCP/CLI). No additional components beyond PDR surfaces.
- Coverage Priorities:
- Repo isolation: concurrent `docdexd` operations across ≥8 repos; ensure per-repo state (`index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`) stays isolated (a fingerprinting sketch follows this list).
- Local-first behavior: daemon binds 127.0.0.1 by default; no paid/external APIs; Ollama-only inference; confirm `--expose` path enforces token auth.
- Waterfall gating: Tier-1 local search preferred; escalation only when confidence \< `web_trigger_threshold` or explicitly forced; token budget priority (Memory \> Repo \> Library/Web) preserved.
- Scraper safety: DuckDuckGo discovery delay ≥2s; fetch delay ≥1s/domain; Chrome lifecycle guarded (no zombies); cache reuse honored.
- Security with repo-required parameters: CLI/MCP require repo id/path; HTTP defaults to daemon repo; unknown/unindexed repos return clear errors; `--expose` requires token on HTTP/MCP.
- Phase gates: Phase 0–7 readiness checks per PDR; each gate blocks progression until preceding behaviors validated.
- Out of Scope: New surfaces/tech not in PDR; cloud telemetry/tests.
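To make the isolation assertions concrete, the following minimal sketch derives a per-repo state directory from a path fingerprint; it assumes the fingerprint is a SHA-256 hex digest of the canonicalized repo path and uses an illustrative state root, so the real derivation and layout may differ in detail.

```rust
// Minimal sketch, assuming the repo fingerprint is a SHA-256 hex digest of the
// canonicalized repo path; the real derivation may differ.
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

/// Derive the per-repo state directory `<state_root>/repos/<fingerprint>/`.
fn repo_state_dir(state_root: &Path, repo_path: &Path) -> std::io::Result<PathBuf> {
    // Canonicalize so `./repo`, `repo/`, and symlinked paths map to one fingerprint.
    let canonical = repo_path.canonicalize()?;
    let digest = Sha256::digest(canonical.to_string_lossy().as_bytes());
    let fingerprint: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    Ok(state_root.join("repos").join(fingerprint))
}

fn main() -> std::io::Result<()> {
    // Illustrative state root; each repo gets its own directory for index/,
    // libs_index/, memory.db, symbols.db, dag.db, impact_graph.json.
    let dir = repo_state_dir(Path::new("/tmp/docdex-state"), Path::new("."))?;
    println!("{}", dir.display());
    Ok(())
}
```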
**Open Questions & Risks**
- How to simulate prolonged multi-repo churn across per-repo daemons without exceeding resource caps? (needs harness definition)
- Token auth test vectors for `--expose` not specified (strength/format).
- Headless Chrome availability in CI and deterministic timing for rate-limit tests.
- Confidence scoring fixtures for waterfall gate (ground truth/threshold tuning).
**Verification Strategy**
- Concurrency isolation: parallel `index`, `chat`, `libs fetch` across per-repo daemons; assert state stays under each repo's own fingerprint and that DB/index handles close cleanly.
- Local-first/security: `docdexd check` under `--expose` with/without token; assert bind address defaults; ensure no external paid calls (mock/deny outbound).
- Waterfall gating: instrument confidence scores; assert the web tier is skipped when the score ≥ threshold; validate the forced-escalation path; confirm token budget ordering via trace logs (a gating sketch follows this list).
- Scraper safety: timed DDG/search/fetch calls with enforced delays; Chrome lifecycle hooks verified for teardown; cache hit/miss cases covered.
- Repo-required parameters: negative tests for missing/unknown repo across CLI/HTTP/MCP; expect clear error codes/messages.
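The gating rule under test is small enough to state as code; the sketch below assumes a simple two-tier decision keyed off `web_trigger_threshold` and a `forced` flag, and the type and function names are illustrative.

```rust
// Minimal sketch of the escalation rule only; `web_trigger_threshold` mirrors
// the config key, everything else is illustrative.
#[derive(Debug, PartialEq)]
enum Tier {
    LocalIndexes, // Tier 1: source + cached library docs
    WebDiscovery, // Tier 2: gated DDG/Chrome enrichment
}

/// Escalate to the web tier only when Tier-1 confidence falls below the
/// configured threshold, or when the caller explicitly forces it.
fn next_tier(confidence: f32, web_trigger_threshold: f32, forced: bool) -> Tier {
    if forced || confidence < web_trigger_threshold {
        Tier::WebDiscovery
    } else {
        Tier::LocalIndexes
    }
}

fn main() {
    // Test vectors matching the gating assertions above.
    assert_eq!(next_tier(0.9, 0.6, false), Tier::LocalIndexes); // score >= threshold: no web tier
    assert_eq!(next_tier(0.4, 0.6, false), Tier::WebDiscovery); // low confidence escalates
    assert_eq!(next_tier(0.9, 0.6, true), Tier::WebDiscovery);  // forced escalation path
    println!("gating rule holds for sample scores");
}
```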
### Risks and Mitigations {#risks-and-mitigations}
Architectural intent: enforce local-first, per-repo daemon invariants while preventing runaway processes, web throttling issues, context mis-budgeting, resource exhaustion, missing dependencies, hallucinated outputs, and unintended exposure. Controls attach to the Repo Manager, Waterfall orchestrator, ScraperEngine, config validator (`docdexd check`), and auth/binding logic; no new surfaces beyond PDR.
- **Zombie Chrome**: ScraperEngine runs headless Chrome under a guarded lifecycle with lockfiles in `~/.docdex/state/locks/`; start/stop wrapped to ensure teardown on exit/panic; `docdexd check` asserts Chrome availability and stale process absence.
- **DuckDuckGo throttling**: DiscoveryService enforces ≥2s between DDG searches and ≥1s fetch delay per domain; blocklist applied; caches reused (`cache/web`); HTTP error backoff before retry.
- **Context overflow**: Waterfall prompt assembler performs token budgeting (10% system, 20% memory, 50% repo/libs/web, 20% generation buffer). Fixed priority ordering (Memory \> Repo \> Library/Web); lowest-priority snippets are dropped first, with drops logged for traceability (a budgeting sketch follows this list).
- **Resource exhaustion (browser)**: ScraperEngine bounds concurrent Chrome sessions; clear errors when caps reached instead of silent degradation.
- **Missing dependencies (Ollama/Chrome/models)**: `docdexd check` validates `ollama` availability, model presence, Chrome binary/path, and RW on `global_state_dir`. `llm-setup` offers guided install/pull instructions; no cloud fallback permitted.
- **Hallucinated APIs**: Library docs must be ingested via `libs fetch --repo`; prompts instruct model to rely on indexed repo/libs; Waterfall only escalates to web when below `web_trigger_threshold` or explicitly forced.
- **Security exposure**: HTTP/MCP bind to `127.0.0.1` by default; `--expose` requires token in config/env and is checked per request. No telemetry or paid APIs; reject unknown repo ids to avoid cross-repo leakage.
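A minimal sketch of the budgeting behavior, assuming a plain numeric token count and a single drop pass; the real assembler tokenizes with the model's tokenizer and enforces the per-category caps, but the ratios and drop order below follow the mitigation described above.

```rust
// Minimal sketch of the fixed-ratio budget split and drop order; the real
// assembler counts model tokens, this example just splits a numeric budget.
#[derive(Debug)]
struct Budget {
    system: usize,
    memory: usize,
    retrieval: usize, // repo + library + web snippets share this slice
    generation: usize,
}

/// Split the context window using the documented 10/20/50/20 ratios.
fn split_budget(context_window: usize) -> Budget {
    Budget {
        system: context_window * 10 / 100,
        memory: context_window * 20 / 100,
        retrieval: context_window * 50 / 100,
        generation: context_window * 20 / 100,
    }
}

/// Keep snippets in priority order (Memory > Repo > Library/Web) and drop the
/// lowest-priority ones first once the budget is exhausted, logging each drop.
fn fit_snippets(mut snippets: Vec<(&'static str, usize)>, budget: usize) -> Vec<&'static str> {
    // Lower number = higher priority; ordering assumed for the sketch.
    let priority = |kind: &str| match kind {
        "memory" => 0,
        "repo" => 1,
        _ => 2, // library/web
    };
    snippets.sort_by_key(|&(kind, _)| priority(kind));
    let mut used = 0;
    let mut kept = Vec::new();
    for (kind, tokens) in snippets {
        if used + tokens <= budget {
            used += tokens;
            kept.push(kind);
        } else {
            eprintln!("dropping {kind} snippet ({tokens} tokens) to stay within budget");
        }
    }
    kept
}

fn main() {
    let budget = split_budget(8192);
    println!("{budget:?}");
    // Simplified: treat memory + retrieval as one pool; the real assembler also
    // enforces the per-category caps above.
    let kept = fit_snippets(
        vec![("web", 3000), ("library", 1500), ("memory", 1000), ("repo", 1500)],
        budget.memory + budget.retrieval,
    );
    println!("kept (highest priority first): {kept:?}");
}
```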
**Open Questions & Risks**
- How should zombie Chrome be detected and cleaned up when a lockfile is present but its PID has been reused by another process? Precise PID/PPID validation is needed; one possible check is sketched after this list.
- Should DDG backoff escalate to disabling web tier for the session after repeated 429s to prevent bans?
- What is the policy when token budgeting repeatedly drops memory context—log-only or user-visible warning?
- Failure mode if `global_state_dir` is on a slow/readonly FS: degrade gracefully or block startup?
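One possible PID check for the first question, offered as a sketch rather than the shipped guard: it assumes a Linux host, a lockfile containing only the PID, and an illustrative lockfile path, and it only compares the process name (a fuller check would also compare PPID or process start time).

```rust
// One possible check, not the shipped implementation: validate that the PID in a
// Chrome lockfile still belongs to a Chrome process before trusting or killing it.
// Linux-only sketch; the lockfile path and its "<pid>\n" format are assumptions.
use std::fs;
use std::path::Path;

fn lockfile_pid_is_live_chrome(lockfile: &Path) -> bool {
    // Read the PID recorded by the guarded browser lifecycle.
    let pid: u32 = match fs::read_to_string(lockfile)
        .ok()
        .and_then(|s| s.trim().parse().ok())
    {
        Some(pid) => pid,
        None => return false, // unreadable/garbled lockfile => treat as stale
    };
    // /proc/<pid>/comm holds the executable name; if the PID was reused by an
    // unrelated process, the name will not match and the lock is stale.
    match fs::read_to_string(format!("/proc/{pid}/comm")) {
        Ok(comm) => comm.trim().contains("chrome"),
        Err(_) => false, // process gone => stale lock, safe to clean up
    }
}

fn main() {
    let lock = Path::new("/tmp/docdex-chrome.lock"); // hypothetical path for the sketch
    if lockfile_pid_is_live_chrome(lock) {
        println!("a live Chrome process still owns the lock");
    } else {
        println!("lock is stale or missing; safe to remove and restart the browser guard");
    }
}
```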
**Verification Strategy**
- `docdexd check`: validate Chrome process guard, lock directory health, RW perms, Ollama reachability/models, and config correctness.
- Rate-limit tests: simulate rapid DDG queries and assert the enforced delays, backoff behavior, and cache hits (a per-domain limiter sketch follows this list).
- Token budgeting tests: crafted large context ensuring Memory \> Repo \> Library/Web ordering and logged drops.
- Resource cap tests: repeated start/stop cycles confirm handle closure; exceed Chrome concurrency to ensure bounded queue/error.
- Security tests: bind exposure requires token; reject unknown repo ids; verify no telemetry or paid API calls.
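A std-only sketch of the per-domain delay rule these tests exercise; the real ScraperEngine is asynchronous and adds error backoff, so this shows only the minimum-interval behavior, with illustrative type names.

```rust
// Minimal sketch of the per-domain delay rule the rate-limit tests exercise:
// >=2s between DDG searches, >=1s between fetches to the same domain.
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

struct DomainLimiter {
    min_interval: Duration,
    last_hit: HashMap<String, Instant>,
}

impl DomainLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_hit: HashMap::new() }
    }

    /// Block until `min_interval` has elapsed since the last request to `domain`.
    fn wait_for(&mut self, domain: &str) {
        if let Some(last) = self.last_hit.get(domain) {
            let elapsed = last.elapsed();
            if elapsed < self.min_interval {
                sleep(self.min_interval - elapsed);
            }
        }
        self.last_hit.insert(domain.to_string(), Instant::now());
    }
}

fn main() {
    let mut ddg = DomainLimiter::new(Duration::from_secs(2)); // discovery: >=2s between searches
    let mut fetch = DomainLimiter::new(Duration::from_secs(1)); // fetch: >=1s per domain

    let start = Instant::now();
    ddg.wait_for("duckduckgo.com");
    ddg.wait_for("duckduckgo.com"); // second search waits ~2s
    fetch.wait_for("docs.rs");
    fetch.wait_for("docs.rs"); // second fetch to the same domain waits ~1s
    assert!(start.elapsed() >= Duration::from_secs(3));
    println!("enforced delays: {:?}", start.elapsed());
}
```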