# System Design Specification
*Generated by gpt-creator create-sds*
## Table of Contents
- [1 Architecture Overview](#architecture-overview)
- [1.1 Operating Principles](#operating-principles)
- [1.2 Waterfall Retrieval Model](#waterfall-retrieval-model)
- [1.3 Repo Isolation Model](#repo-isolation-model)
- [1.4 Hardware Awareness](#hardware-awareness)
- [2 Core Components](#core-components)
- [2.1 Config and State Manager](#config-and-state-manager)
- [2.2 Repo Manager](#repo-manager)
- [2.3 Indexing and Search](#indexing-and-search)
- [2.4 Waterfall Orchestrator](#waterfall-orchestrator)
- [2.5 Web Discovery and Scraping](#web-discovery-and-scraping)
- [2.6 LLM and Embeddings](#llm-and-embeddings)
- [2.7 Memory and Reasoning DAG](#memory-and-reasoning-dag)
- [3 Data Management and Storage](#data-management-and-storage)
- [3.1 Directory and Fingerprint Layout](#directory-and-fingerprint-layout)
- [3.2 Schemas and Indexes](#schemas-and-indexes)
- [3.3 Caching Strategy](#caching-strategy)
- [4 Interfaces and Integrations](#interfaces-and-integrations)
- [4.1 CLI Commands](#cli-commands)
- [4.2 HTTP API](#http-api)
- [4.3 MCP Server](#mcp-server)
- [4.4 Local Dependencies](#local-dependencies)
- [5 Runtime, Deployment, and Operations](#runtime,-deployment,-and-operations)
- [5.1 Daemon Lifecycle and Binding](#daemon-lifecycle-and-binding)
- [5.2 Resource and Concurrency Controls](#resource-and-concurrency-controls)
- [5.3 Security and Privacy](#security-and-privacy)
- [5.4 Observability and Health](#observability-and-health)
- [5.5 Configuration Management](#configuration-management)
- [6 Quality, Testing, and Risks](#quality,-testing,-and-risks)
- [6.1 Phase Gates](#phase-gates)
- [6.2 Test Coverage Focus](#test-coverage-focus)
- [6.3 Risks and Mitigations](#risks-and-mitigations)
## Architecture Overview {#architecture-overview}
Docdex v2.0 runs a per-repo local-first daemon (`docdexd serve`) and also supports a singleton daemon (`docdexd daemon`) for multi-repo mounting. Per-repo daemons expose HTTP APIs and (optionally) stdio MCP; the singleton exposes HTTP APIs plus shared MCP over HTTP/SSE. The design aims to keep all inference and retrieval local by default, escalating to gated web enrichment only when confidence drops.
- **Core surfaces**: per-repo HTTP endpoint set (OpenAI-compatible chat) plus shared MCP over HTTP/SSE for the singleton daemon (legacy per-repo stdio MCP remains); CLI is a thin client to the daemon. No additional surfaces are introduced in this section beyond shared MCP transport.
- **Repo Manager**: normalizes repo paths, fingerprints via SHA256, lazily initializes per-repo state (Tantivy indexes, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`, `libs_index`), and ensures handle closure on shutdown.
- **Waterfall retrieval** (per repo): Tier 1 local indexes (source \+ libs), Tier 2 zero-cost web discovery/fetch (DuckDuckGo HTML \+ guarded headless Chrome), Tier 3 local cognition/memory (Ollama chat/embeddings, sqlite-vec memory). Cached library docs are treated as local within Tier 1\.
- **Context assembly**: fixed priority Memory → Repo Code → Library/Web; token budget roughly 10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer. Budgeting happens before Ollama calls (a budgeting sketch follows this list).
- **Isolation model**: per-repo state under `~/.docdex/state/repos/<fingerprint>/`; global caches (`cache/web`, `cache/libs`) are reused but ingested per repo. CLI/MCP require explicit repo id/path; HTTP uses the daemon repo by default and validates any provided repo id/path.
- **Hardware awareness**: daemon detects RAM/VRAM to recommend or constrain Ollama models (e.g., \<8GB ultra-light; ≥16GB default `phi3.5:3.8b`; ≥32GB \+ GPU suggests `llama3.1:70b` if present). No silent auto-install of Ollama/models; npm postinstall may prompt and installs only on explicit confirmation.
- **Security posture**: binds to `127.0.0.1` by default; `--expose` demands token auth on HTTP, and MCP enforces `auth_token` when configured. No telemetry or paid/cloud services.
- **Scalability & reliability (per PDR scope)**: targets ≥8 concurrent repos by running separate per-repo daemons; local search p95 \< 50ms (\<20ms typical). Browser guard prevents zombie Chrome; web rate limits (≥2s DDG, ≥1s fetch) mitigate bans.
- **Out-of-scope (per section)**: new surfaces, cloud/vector backends, cross-repo memory, clustered/multi-tenant daemon topologies are explicitly excluded.
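The budgeting rule above is simple enough to pin down in code. The following is a minimal sketch, assuming the fixed 10/20/50/20 split and the Memory → Repo Code → Library/Web priority stated in this section; the `TokenBudget` type and the `split_budget`/`fit` names are illustrative, not the actual implementation.

```rust
#[derive(Debug)]
struct TokenBudget {
    system_prompt: usize, // ~10%
    memory: usize,        // ~20%
    retrieval: usize,     // ~50% repo code + library/web
    generation: usize,    // ~20% reserved for the answer
}

fn split_budget(context_window: usize) -> TokenBudget {
    TokenBudget {
        system_prompt: context_window / 10,
        memory: context_window / 5,
        retrieval: context_window / 2,
        generation: context_window / 5,
    }
}

/// `snippets` must already be ordered Memory -> Repo Code -> Library/Web;
/// anything that overflows the slice budget is dropped from the low-priority end.
fn fit(snippets: &[(&str, usize)], budget_tokens: usize) -> Vec<String> {
    let mut used = 0;
    let mut kept = Vec::new();
    for (text, tokens) in snippets {
        if used + tokens > budget_tokens {
            break; // lower-priority snippets after this point are dropped
        }
        used += tokens;
        kept.push(text.to_string());
    }
    kept
}

fn main() {
    let budget = split_budget(8192);
    println!("{budget:?}");
    let kept = fit(
        &[("memory note", 800), ("repo snippet", 3000), ("web snippet", 2500)],
        budget.memory + budget.retrieval,
    );
    println!("kept {} of 3 snippets", kept.len());
}
```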
**Open Questions & Risks**
- Confirm shutdown behavior for active sessions: should in-flight requests be drained or rejected?
- How to handle simultaneous web-trigger requests across repos within rate limits without head-of-line blocking?
- Risk: confidence gating (`web_trigger_threshold` default 0.7) may under-trigger web enrichment for sparse repos.
**Verification Strategy**
- Run `docdexd check` to validate config, state perms, Ollama, Chrome, repo registry, bind configuration.
- Concurrency tests across per-repo daemons under ≥8 repos; ensure handle closure and no cross-repo leakage.
- Latency benchmarks: local search p95 \< 50ms and typical \<20ms on representative repos.
- Waterfall tests: force low-confidence queries and assert escalation order and rate-limit compliance.
- Security checks: ensure localhost bind by default and token required when `--expose` is set; reject invalid repo ids when supplied.
### Operating Principles {#operating-principles}
Local-first, per-repo daemon discipline governs all decisions: `docdexd` serves HTTP and MCP for a single repo, defaulting to offline behavior and zero paid components. Web access is a gated fallback on confidence drop or explicit user demand. All operations are repo-scoped; no cross-repo state or memory. Privacy and cost ceilings drive dependency choices (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec, Tree-sitter); any cloud/paid API is out of scope.
**Repo scoping and isolation**
- CLI and MCP calls must include repo id/path; HTTP calls default to the daemon repo and validate any provided repo id/path.
- Per-repo state lives under fingerprinted directories; caches are global but ingestion is repo-local to prevent bleed.
- LRU eviction is not required for per-repo daemons; resource caps apply per repo.
**Waterfall retrieval discipline**
- Tier 1: local indexes (source \+ libs) are always tried first; cached library docs count as local.
- Tier 2: web discovery/fetch only if top local score \< `web_trigger_threshold` or user forces it; DDG HTML discovery with ≥2s spacing, per-domain fetch delay ≥1s, guarded headless Chrome with readability.
- Tier 3: local cognition/memory via Ollama embeddings/chat; memory prioritized in context assembly over repo code, then library/web.
**Security and privacy defaults**
- Bind HTTP/MCP to `127.0.0.1`; `--expose` requires token auth on HTTP requests, and stdio MCP expects `auth_token` in `initialize` when configured (a minimal auth-check sketch follows this list).
- No telemetry, no paid keys; compliance demands open-source dependencies only.
- Browser lifecycle guarded; locks directory to prevent zombie Chrome processes.
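As a concrete reading of the bind/token rule, here is a hypothetical, framework-agnostic check. The exact HTTP status/body contract is still listed as an open question below, so only the accept/reject decision is sketched; names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum AuthDecision {
    Allow,
    Reject(&'static str),
}

/// Localhost binds need no token; exposed binds must present the configured token.
fn authorize(exposed: bool, configured_token: Option<&str>, bearer: Option<&str>) -> AuthDecision {
    if !exposed {
        // Default 127.0.0.1 bind: no token required.
        return AuthDecision::Allow;
    }
    match (configured_token, bearer) {
        (Some(expected), Some(got)) if expected == got => AuthDecision::Allow,
        (Some(_), _) => AuthDecision::Reject("invalid or missing token"),
        (None, _) => AuthDecision::Reject("--expose requires a configured auth token"),
    }
}

fn main() {
    assert_eq!(authorize(false, None, None), AuthDecision::Allow);
    assert_eq!(authorize(true, Some("s3cret"), Some("s3cret")), AuthDecision::Allow);
    assert!(matches!(authorize(true, Some("s3cret"), None), AuthDecision::Reject(_)));
}
```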
**Resource discipline and hardware awareness**
- RAM/VRAM detection guides model recommendations (≤8GB: ultra-light; ≥16GB: `phi3.5:3.8b` default; ≥32GB \+ GPU: `llama3.1:70b` if installed).
- Bounded Chrome concurrency and per-domain rate limits; clear errors when caps are hit.
**Out of scope (per PDR)**
- Cross-repo memory/indexing, clustered/multi-tenant daemon deployments, telemetry, paid/cloud APIs.
**Open Questions & Risks**
- Do we need configurable backpressure when multiple per-repo daemons spike web fetches simultaneously?
- What is the exact failure mode when `--expose` token is missing or malformed—HTTP status and body contract?
- Risk: DDG rate limiting/IP bans despite spacing; may need backoff tuning.
- Risk: Model recommendation accuracy on heterogeneous hardware (e.g., eGPU, shared RAM GPUs).
**Verification Strategy**
- Unit/integration: enforce repo-required flag on CLI/MCP; HTTP defaults to daemon repo; reject unknown repo ids.
- Concurrency tests: parallel per-repo daemon access; ensure handle closure.
- Waterfall tests: trigger web only below threshold; verify delays and cache reuse; assert Chrome teardown.
- Security tests: localhost bind by default; token required when exposed; no external calls without explicit trigger.
- Performance checks: local search p95 \<50ms, typical \<20ms; memory/token budgeting honors priority order.
### Waterfall Retrieval Model {#waterfall-retrieval-model}
Architectural intent: tiered retrieval that stays local by default, escalates to zero-cost web enrichment only on low confidence or explicit request, and finally leverages local cognition/memory; all operations are repo-scoped with strict isolation.
Components and flow
- Tier 1 Local: Tantivy source index plus per-repo `libs_index`; cached library docs are treated as local. BM25 search with optional local rerank. Provides score used for gating.
- Tier 2 Web (fallback): DuckDuckGo HTML discovery (≥2s between searches, blocklist) → headless Chrome fetch with readability (≥1s per-domain delay, page timeout \~15s) → cache HTML/cleaned JSON under `cache/web` → ingest into repo context as needed. Guarded browser lifecycle to avoid zombies; locks under `~/.docdex/state/locks`.
- Tier 3 Cognition/Memory: Local Ollama for chat/embeddings; per-repo `memory.db` (sqlite-vec) prioritized in context assembly; DAG logging per session.
- Gating logic: If the top local score ≥ `web_trigger_threshold` (default 0.7), stay in Tier 1; escalate to Tier 2 when the score falls below the threshold or the user explicitly forces web. Context assembly priority: Memory → Repo Code → Library/Web; token budget approx 10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer (the gate is sketched in code after this list).
- Surfaces using waterfall: CLI (`chat --repo`, `web-rag --repo`), HTTP `/v1/chat/completions` (defaults to daemon repo; repo id optional), MCP tools (all require repo). Repo Manager enforces repo selection and isolation throughout.
- Data contracts (implied from PDR): search results carry score \+ snippet \+ source path; web fetch outputs cleaned text plus metadata (url, fetched\_at, cache key); memory rows carry `id, content, embedding, created_at, metadata`. No additional schemas beyond stated.
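A minimal sketch of the confidence gate, assuming the default `web_trigger_threshold` of 0.7 and a force flag for explicit user requests; the `Escalation` enum and `gate` function are illustrative names, not the actual API.

```rust
#[derive(Debug, PartialEq)]
enum Escalation {
    StayLocal,   // Tier 1 answer is confident enough
    WebFallback, // proceed to Tier 2 discovery/fetch
}

fn gate(top_local_score: Option<f32>, web_trigger_threshold: f32, force_web: bool) -> Escalation {
    if force_web {
        return Escalation::WebFallback;
    }
    match top_local_score {
        Some(score) if score >= web_trigger_threshold => Escalation::StayLocal,
        // No local hits or a weak top hit: escalate.
        _ => Escalation::WebFallback,
    }
}

fn main() {
    assert_eq!(gate(Some(0.82), 0.7, false), Escalation::StayLocal);
    assert_eq!(gate(Some(0.41), 0.7, false), Escalation::WebFallback);
    assert_eq!(gate(Some(0.95), 0.7, true), Escalation::WebFallback);
}
```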
Scalability, reliability, security, observability, DevOps
- Scalability: Target local search p95 \<50ms; typical \<20ms. Scale by running per-repo daemons.
- Reliability: Browser guard to prevent zombie Chrome; clear errors on missing index/repo/models; fallback only on confidence drop to avoid unnecessary web calls.
- Security/Privacy: Offline-by-default; web only on gated escalation or explicit request. HTTP defaults to daemon repo; MCP requires repo selection; daemon binds 127.0.0.1 by default; `--expose` requires token auth.
- Observability: Not requested in PDR; expect logs for gating decisions, web escalations, and token budgeting drops.
- DevOps: No paid services; Ollama and Chrome are external dependencies validated via `docdexd check`. Cache reuse reduces repeated web fetches.
Assumptions
- BM25 search is sufficient for Tier 1 initial ranking; rerank is optional/local only.
- `web_trigger_threshold` default 0.7 is configurable; same threshold used across surfaces unless overridden.
- Browser availability and Ollama models are handled by setup flows; Playwright auto-install is opt-out and controlled by config/flags.
Open Questions & Risks
- Should Tier 2 be skipped entirely when offline is enforced by config/flag even if confidence is low? (risk: bad answers vs policy violation)
- How is rerank configured/enabled per repo or globally? (missing config detail)
- Cache eviction/TTL for `cache/web` not specified; risk of stale or unbounded cache.
- What is the exact backoff strategy on repeated web fetch failures beyond rate limits?
Verification Strategy
- Unit/integration: gating logic around `web_trigger_threshold`; ensure explicit force bypasses gate.
- Functional: Tier 1 latency benchmarks (\<50ms p95), Tier 2 rate-limit observance, Chrome guard tested for zombie-free teardown.
- End-to-end: `chat --repo` and `/v1/chat/completions` across tiers with correct repo scoping and context ordering (Memory → Repo → Library/Web).
- Negative tests: offline mode with forced web request returns clear error; missing index/repo/model paths produce expected failures.
### Repo Isolation Model {#repo-isolation-model}
Architectural intent: enforce strict per-repo scoping for all state, indexes, memory, and DAG data so multiple per-repo daemons can serve multiple repos without cross-contamination.
Design
- Repo identity: normalized repo path → SHA256 fingerprint; fingerprint is the sole key for on-disk state under `~/.docdex/state/repos/<fingerprint>/` (derivation sketched after this list).
- Per-repo state dirs: `index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`; all opened lazily on first access.
- Global/shared caches: `cache/web/` (HTML \+ cleaned JSON) and `cache/libs/<ecosystem>/<pkg>/`; reused across repos but ingested into per-repo indexes only on demand to avoid bleed.
- Repo Manager: maintains registry (path ↔ fingerprint); unknown/unindexed repos return clear errors.
- Access contract: CLI uses `--repo`; MCP tools require `project_root`/`repo_path` (unless `initialize` sets a default) and enforce it matches the MCP server repo; HTTP defaults to the daemon repo and validates any provided repo id/path. MCP server is per-repo; tools are repo-parameterized.
- Concurrency: multiple repos are served by running multiple per-repo daemons; operations within a repo serialize per underlying DB/index constraints.
- Security/privacy: data never leaves repo scope; no cross-repo memory/DAG queries; bound to 127.0.0.1 by default with optional token when exposed. No telemetry.
- Observability: not requested in PDR.
- Scalability/reliability: target ≥8 concurrent repos via multiple per-repo daemons; idle daemon memory target \<100MB; clear errors on missing repo/index.
- DevOps: state layout must remain stable across upgrades; `docdexd check` validates RW permissions and registry integrity.
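For illustration, a fingerprint derivation consistent with the design above: SHA256 over a canonicalized absolute path, hex-encoded, then used as the state directory name. Exact normalization rules remain an open question (see below), so `canonicalize()` here is just one plausible choice; requires the `sha2` crate.

```rust
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

fn repo_fingerprint(repo_path: &Path) -> std::io::Result<String> {
    // Resolve symlinks and relative components before hashing.
    let normalized = repo_path.canonicalize()?;
    let digest = Sha256::digest(normalized.to_string_lossy().as_bytes());
    // Hex-encode into the 64-character fingerprint used as the directory name.
    Ok(digest.iter().map(|b| format!("{:02x}", b)).collect())
}

fn repo_state_dir(state_root: &Path, repo_path: &Path) -> std::io::Result<PathBuf> {
    Ok(state_root.join("repos").join(repo_fingerprint(repo_path)?))
}

fn main() -> std::io::Result<()> {
    let dir = repo_state_dir(Path::new("/home/user/.docdex/state"), Path::new("."))?;
    println!("{}", dir.display()); // .../state/repos/<fingerprint>/
    Ok(())
}
```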
Assumptions
- Fingerprint is deterministic on normalized absolute path; moving a repo changes fingerprint (requires re-index) unless a future alias/relocation map is added.
- Per-repo daemons do not evict repos in-process.
Open Questions & Risks
- How should repo path moves/renames be handled without a re-index? (Out of current scope.)
- Race conditions on rapid open/close cycles under load; need tests.
Verification Strategy
- Unit/integration: Repo Manager handles concurrent open/close; errors on unknown/unindexed repo.
- State isolation tests: ensure no cross-repo reads/writes for indexes, memory, DAG, libs ingestion.
- Config validation: `docdexd check` confirms registry and state RW.
- Load tests: ≥8 concurrent repos operations without bleed across per-repo daemons.
### Hardware Awareness {#hardware-awareness}
Docdexd detects host RAM and (when present) GPU VRAM to guide Ollama model recommendations and default selection, keeping inference local and resource-safe. This logic informs CLI commands (`docdexd llm-list`, `docdexd check`, `docdexd llm-setup`) but does not introduce new APIs beyond what is already defined.
- **Detection scope**: Read total system RAM; detect GPU presence and VRAM when available. No other hardware signals are in scope per PDR.
- **Threshold policy (from PDR)**: RAM \<8GB → recommend ultra-light; ≥16GB → default `phi3.5:3.8b`; ≥32GB \+ GPU → suggest `llama3.1:70b` if installed. Keep decisions advisory; do not auto-download/install without explicit confirmation (a decision sketch follows this list).
- **Integration points**:
- `docdexd llm-list` runs detection, loads `llm_list.json`, filters, and outputs recommendations.
- `docdexd llm-setup` reuses detection to suggest pulls and update `[llm]` defaults in config; must honor offline-first (no automatic network installs).
- `docdexd check` reports hardware/readiness (Ollama reachability, models present) and should warn when the configured default exceeds detected capacity.
- **Configuration behavior**: `[llm]` default\_model should be validated against detected capacity; emit warnings, not hard failures, when oversized. No additional config keys required beyond existing PDR surface.
- **Security/Privacy**: Local-only detection; no telemetry or external calls. No new attack surface beyond existing CLI.
- **Reliability/DevOps**: Not requested in PDR beyond reporting readiness; ensure detection failures degrade to conservative recommendations rather than blocking startup.
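A small advisory sketch of the threshold policy above. The PDR leaves the 8-16GB band and the ≥32GB-without-GPU case unspecified, so this sketch conservatively maps both to the default tier; the enum and function names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum ModelTier {
    UltraLight, // < 8GB RAM
    Default,    // phi3.5:3.8b
    Heavy,      // llama3.1:70b, suggested only, never auto-installed
}

fn recommend(ram_gb: u64, vram_gb: Option<u64>) -> ModelTier {
    if ram_gb < 8 {
        ModelTier::UltraLight
    } else if ram_gb >= 32 && vram_gb.is_some() {
        ModelTier::Heavy
    } else {
        // 8-31GB, or >=32GB without a detected GPU: stay on the default tier (PDR silent).
        ModelTier::Default
    }
}

fn main() {
    assert_eq!(recommend(6, None), ModelTier::UltraLight);
    assert_eq!(recommend(16, None), ModelTier::Default);
    assert_eq!(recommend(64, Some(24)), ModelTier::Heavy);
    assert_eq!(recommend(64, None), ModelTier::Default); // conservative fallback
}
```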
Assumptions
- GPU VRAM detection is best-effort; absence of a GPU is treated as CPU-only.
- Hardware checks run locally with no privilege escalation; only readonly system queries are used.
Open Questions & Risks
- How to handle systems with ≥32GB RAM but no GPU: stay on 8B or offer a mid-tier 14B if present? (PDR silent.)
- Should warnings on oversized `[llm].default_model` be treated as non-zero exit in CI (`docdexd check`), or only logged?
- `llm_list.json` source and schema are assumed to already include model size/requirements; confirm format.
Verification Strategy
- Unit tests: hardware probe parser with mocked RAM/VRAM inputs across thresholds.
- CLI tests: `docdexd llm-list` and `docdexd llm-setup` output expected recommendations given mocked detection.
- Readiness: `docdexd check` emits warning for mismatched default model vs detected capacity; confirm no crash on missing GPU info.
## Core Components {#core-components}
This section defines the daemon’s key subsystems and how they cooperate to satisfy the per-repo, local-first constraints. Components are limited to the PDR-approved stack (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec, Tree-sitter); no new surfaces or services are added.
### Config and State Manager
- Responsibilities: Parse/validate `~/.docdex/config.toml`; ensure RW on `global_state_dir`; materialize defaults when missing; expose typed config to all services; enforce localhost bind unless `--expose` with token.
- State layout: Creates/validates `state/repos/<fingerprint>/{index/,libs_index/,memory.db,symbols.db,dag.db,impact_graph.json}` and shared caches `cache/web`, `cache/libs/<ecosystem>/<pkg>/`, `locks/` for browser/process guards.
- Hardware awareness: On startup and `llm-list`, detect RAM/VRAM to suggest models (`phi3.5:3.8b` default; heavier only if hardware allows).
- Data contract: Provides immutable config snapshot to consumers; emits normalized repo fingerprint function.
### Repo Manager
- Responsibilities: Map normalized repo paths → SHA256 fingerprints; lazy init per-repo state; prevent cross-repo contamination. Singleton daemons apply an LRU watcher lifecycle (stop watchers after ~2h idle, hibernate after ~24h); per-repo daemons skip LRU.
- Interactions: Called by CLI/HTTP/MCP entrypoints to resolve repo context before any operation; hands back handles to Tantivy indexes, sqlite DBs, and libs index.
### Indexing and Search
- Local index: Tantivy BM25 over repo source; optional symbol extraction (Tree-sitter) and libs index treated as Tier-1.
- Ignore rules: `.docdexignore` (first-party) and `.gitignore` are honored by the indexer and file watcher to skip unwanted files/dirs.
- Operations: `docdexd index --repo` builds/updates source index; search invoked by chat/RAG and Waterfall Tier 1\. Token budgeting favors Memory \> Repo \> Library/Web.
- Data contract: Query returns ranked hits with path, snippet, score; exposes top score for Waterfall gate comparison against `web_trigger_threshold`.
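An illustrative shape for this data contract, with a helper that surfaces the top score for the waterfall gate; field and function names are hypothetical.

```rust
#[derive(Debug, Clone)]
struct SearchHit {
    path: String,
    snippet: String,
    score: f32, // BM25 (optionally reranked) relevance score
}

/// Returns the best local score, which the orchestrator compares against `web_trigger_threshold`.
fn top_score(hits: &[SearchHit]) -> Option<f32> {
    hits.iter().map(|h| h.score).fold(None, |acc, s| match acc {
        Some(best) if best >= s => Some(best),
        _ => Some(s),
    })
}

fn main() {
    let hits = vec![
        SearchHit { path: "src/lib.rs".into(), snippet: "fn index(...)".into(), score: 0.81 },
        SearchHit { path: "docs/README.md".into(), snippet: "indexing".into(), score: 0.42 },
    ];
    println!("best hit: {} -> {}", hits[0].path, hits[0].snippet);
    assert_eq!(top_score(&hits), Some(0.81));
}
```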
### Waterfall Orchestrator
- Logic: Tier 1 local search; if top score \< `web_trigger_threshold` or forced, escalate to Tier 2 web (DDG discovery → Chrome fetch → readability → cache → ingest) and then Tier 3 cognition (LLM/memory). Cached library docs participate as local.
- Sequence (textual): Receive query with repo id → run local search → if below threshold, invoke DiscoveryService (rate-limited) → fetch pages with guarded Chrome → clean/cache → ingest relevant text into context → assemble prompt with token budget → stream via Ollama.
- Guardrails: Enforces scrape delays (≥2s DDG, ≥1s fetch/domain), page timeout 15s, browser lifecycle guard, and context priority order.
### Web Discovery and Scraping
- DiscoveryService: DuckDuckGo HTML only; blocklist support; respects 2s minimum between queries; caches results under `cache/web`.
- ScraperEngine: Headless Chrome (readability extraction), headless by default; guarded via `locks/`; timeouts per config; zero zombie processes requirement.
- Outputs: Cleaned HTML/JSON cached globally; per-repo ingestion handled by orchestrator.
### LLM and Embeddings
- Provider: configurable (Ollama default); streaming responses; token budgeting enforced pre-call; models filtered by hardware guidance.
- Embeddings: Ollama embeddings used for memory and rerank where applicable; max answer tokens from config.
- Interfaces: CLI chat, HTTP `/v1/chat/completions`, MCP tools all go through the same LLM gateway; HTTP defaults to the daemon repo and validates any provided repo id.
### Memory and Reasoning DAG
- Memory: Per-repo `memory.db` (sqlite-vec) with tables `memories` (id UUID, content TEXT, embedding BLOB, created_at INT, metadata JSON), `memory_vec` (vec0 embeddings), and `memory_meta` (key/value embedding_dim, schema_version); ops `memory_store`, `memory_recall` scoped by repo; prioritized in context merge.
- DAG: Per-repo `dag.db`; node types UserRequest/Thought/ToolCall/Observation/Decision; logging per session; `dag view --repo <path> <session_id>` renders text/DOT.
- Isolation: No cross-repo memory or DAG queries; per-repo daemons close handles on shutdown.
**Open Questions & Risks**
- How should rate and resource limits be coordinated across multiple per-repo daemons running concurrently? Not specified in the PDR.
- Exact limits for concurrent Chrome instances and fetch queue sizing are unstated; risk of overuse on low-end machines.
- Rerank presence/algorithm for local search optional in PDR; decision needed.
- Token budgeting percentages fixed in PDR; need confirmation on adaptability per model/context size.
**Verification Strategy**
- `docdexd check` validates config, state perms, Ollama reachability/models, Chrome availability, repo registry, HTTP bind, and MCP binary readiness.
- Repo Manager tests for isolation under concurrent access across multiple per-repo daemons.
- Waterfall tests: force low-confidence path to verify DDG spacing, fetch delays, cache use, and Chrome guard; ensure local-only when above threshold.
- Memory tests: store/recall per repo; ensure no cross-repo leakage; embedding flow via Ollama.
- DAG tests: log and view sessions across node types; ensure per-repo separation.
- Index/search tests: p95 latency targets (\<50ms), correct scoring exposure for threshold gating; symbol extraction present where supported.
### Config and State Manager
The config/state layer provides typed configuration, read/write validation, and a deterministic state layout that other subsystems rely on for per-repo isolation across per-repo daemons.
- **Intent**: Provide a single source of truth for daemon/runtime configuration and a predictable per-repo/global state directory tree with enforced read/write guarantees and auto-creation of sane defaults.
- **Config location & shape**: `~/.docdex/config.toml` auto-created on first run with localhost defaults. Sections per PDR: `[core] global_state_dir, log_level, max_concurrent_fetches`; `[llm] provider=<name> (default ollama), base_url, default_model, embedding_model, max_answer_tokens`; `[search] web_trigger_threshold, max_repo_hits, max_web_hits`; `[web] discovery_provider=duckduckgo_html, user_agent, ddg_base_url, ddg_proxy_base_url, min_spacing_ms, cache_ttl_secs, blocklist`; `[web.scraper] engine, headless, chrome_binary_path, auto_install, browser_kind, request_delay_ms, page_load_timeout_secs`; `[memory] enabled=true, backend=sqlite`; `[server] http_bind_addr=127.0.0.1:3210, enable_mcp=true`. Typed parsing with defaults; warn on unknown providers. The npm installer may update `http_bind_addr` during auto-port selection. A typed-config sketch appears after this list.
- **Env override**: `DOCDEX_WEB_BLOCKLIST=example.com,docs.example.org` sets the web discovery blocklist as a comma-separated list of domain suffixes.
- **Env override**: `DOCDEX_WEB_MIN_SPACING_MS` (DDG spacing, min 2000ms) and `DOCDEX_WEB_REQUEST_DELAY_MS` (per-domain fetch delay, min 1000ms).
- **Env override**: `DOCDEX_DDG_BASE_URL` to override the DuckDuckGo discovery endpoint (default `https://html.duckduckgo.com/html/`).
- **Env override**: `DOCDEX_DDG_PROXY_BASE_URL` to set an optional proxy fallback for DDG discovery (used when the primary endpoint returns anomaly/blocked pages).
- **State root & layout**: `~/.docdex/state/` with enforced creation/validation:
- `repos/<fingerprint>/index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`
- `cache/web/` (raw HTML \+ cleaned JSON), `cache/libs/<ecosystem>/<pkg>/`
- `locks/` for browser/process guards
- `logs/` optional daemon logs when enabled
- Repo fingerprint \= SHA256 of normalized repo path; all per-repo paths must use this key to prevent cross-contamination.
- **Responsibilities**:
- Validate RW on `global_state_dir` at startup and before per-repo init.
- Create missing config with defaults; on missing state subdirs/DBs for a repo, lazily initialize via Repo Manager.
- Expose normalized, typed config/state handles to: Repo Manager (per-repo paths), Waterfall (web caches), Memory/DAG/Index subsystems, and Chrome guard (locks path).
- Hardware awareness surfaced to LLM config recommender: detect RAM/VRAM and suggest model tiers (`ultra-light` \<8GB, default `phi3.5:3.8b` ≥16GB, `llama3.1:70b` with GPU ≥32GB).
- **Interactions (textual diagram)**:
- On daemon start: Config Loader → parse/validate `config.toml` (defaults) → State Manager → validate/create `global_state_dir`, `cache/*`, `locks/`.
- Per repo access: Repo Manager → fingerprint(repo\_path) → State Manager → ensure `repos/<fp>/*` exist → return handles/paths to Indexer, Memory, DAG, Symbols.
- Check command: `docdexd check` orchestrates Config Loader \+ State Manager validation \+ Ollama/Chrome reachability.
- **Scalability/Reliability**: Bounded by `max_concurrent_fetches` and per-repo resources; state layout supports multiple per-repo daemons without cross-contamination. RW validation prevents partial init; locks directory guards browser lifecycle.
- **Security/Isolation**: Enforce localhost defaults; state paths scoped by fingerprint to prevent cross-repo bleed. No telemetry. Token auth handled at server layer; config/state manager just supplies bind info.
- **Observability**: Log config warnings (unknown provider), RW failures, and auto-create events. Additional metrics not requested in PDR.
- **DevOps**: Persistence across upgrades; Playwright auto-installs a managed Chromium build on macOS/Windows/Linux when no browser is detected (opt-out supported), with system browsers as fallback. No cloud dependencies.
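To make the typed-parsing idea concrete, a hedged sketch of a subset of the config surface using `serde` + `toml`. Key names mirror the sections listed above; the struct layout and any non-PDR defaults (e.g., `max_concurrent_fetches = 4`, hit limits) are illustrative only.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Config {
    #[serde(default)]
    core: Core,
    #[serde(default)]
    search: Search,
    #[serde(default)]
    server: Server,
}

#[derive(Debug, Deserialize)]
#[serde(default)]
struct Core {
    global_state_dir: String,
    log_level: String,
    max_concurrent_fetches: u32,
}

impl Default for Core {
    fn default() -> Self {
        Self {
            global_state_dir: "~/.docdex/state".into(),
            log_level: "info".into(),
            max_concurrent_fetches: 4, // illustrative, not from the PDR
        }
    }
}

#[derive(Debug, Deserialize)]
#[serde(default)]
struct Search {
    web_trigger_threshold: f32,
    max_repo_hits: u32,
    max_web_hits: u32,
}

impl Default for Search {
    fn default() -> Self {
        // 0.7 is the documented default threshold; hit limits here are placeholders.
        Self { web_trigger_threshold: 0.7, max_repo_hits: 10, max_web_hits: 5 }
    }
}

#[derive(Debug, Deserialize)]
#[serde(default)]
struct Server {
    http_bind_addr: String,
    enable_mcp: bool,
}

impl Default for Server {
    fn default() -> Self {
        Self { http_bind_addr: "127.0.0.1:3210".into(), enable_mcp: true }
    }
}

fn main() {
    // Missing sections fall back to defaults; unknown-provider warnings would be
    // emitted by a higher validation layer.
    let cfg: Config = toml::from_str("[search]\nweb_trigger_threshold = 0.8\n").unwrap();
    println!("{cfg:?}");
}
```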
**Open Questions & Risks**
- Should config validation hard-fail on unknown keys or only warn? (PDR silent)
- Behavior when `global_state_dir` is moved or lacks perms after init—migrate vs. fail?
- Fingerprint collisions are improbable but not addressed; assume SHA256 sufficient.
- Chrome/browser path defaults on diverse OSes—multi-browser fallbacks and Playwright auto-install need to remain deterministic.
**Verification Strategy**
- `docdexd check`: parse config, verify RW on `global_state_dir`, presence/creation of required subdirs, validate provider-specific reachability (Ollama default), test Chrome availability.
- Unit/integration: fingerprint normalization tests; per-repo init creates expected layout; locks directory remains clean after guarded Chrome usage.
- Negative tests: fail on non-writable state dir; clear error on missing repo index/state when accessed.
### Repo Manager
The Repo Manager maintains a registry of normalized repo paths mapped to SHA256 fingerprints and lazily initializes per-repo state. Per-repo daemons do not require max-open-repos LRU eviction.
**Scope & Intent**
- Responsibilities: path normalization → fingerprinting; per-repo state directory creation; lazy init of handles for indexes/DBs; prevention of cross-repo contamination. Everything else (chat, search orchestration, memory ops, web tiers) depends on it but is out-of-scope here.
- Exclusions: no cross-repo memory/index sharing; no additional surfaces beyond those already defined (CLI/HTTP/MCP).
**Core Functions**
- Path normalization and fingerprinting: compute SHA256 over normalized repo path; fingerprint used for all state paths under `~/.docdex/state/repos/<fingerprint>/`.
- Lazy initialization: on first access, create/validate per-repo dirs and handles for `index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`; RW checks on `global_state_dir` before use.
- Registry & lookup: map from normalized path (and optionally repo id) to fingerprint and live handles; CLI/MCP callers must supply repo id/path, HTTP defaults to the daemon repo (a registry sketch follows this list).
- Max-open-repos: not required for per-repo daemons; reserved if multi-repo mode is reintroduced.
- Isolation: no cross-repo data mixing; shared caches (`cache/web`, `cache/libs`) are ingested per repo but never cross-read directly.
- Lifecycle integration: daemon startup validates repo registry readiness; CLI/API/MCP operations fail with clear error on unknown/unindexed repo.
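A sketch of the registry and lazy-init contract under the stated assumptions. The handle type, method names, and the placeholder fingerprint are illustrative; only the normalize → fingerprint → lazily create state dirs → cache-the-handle behavior comes from this section.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

struct RepoHandle {
    fingerprint: String,
    state_dir: PathBuf, // .../state/repos/<fingerprint>/
}

struct RepoManager {
    state_root: PathBuf,
    registry: HashMap<String, RepoHandle>, // fingerprint -> live handle
}

impl RepoManager {
    fn open(&mut self, repo_path: &Path) -> std::io::Result<&RepoHandle> {
        let fingerprint = fingerprint_of(repo_path)?;
        if !self.registry.contains_key(&fingerprint) {
            let state_dir = self.state_root.join("repos").join(&fingerprint);
            // Lazy init: create index/ and libs_index/ here; memory.db, symbols.db,
            // dag.db, impact_graph.json would be initialized similarly on first use.
            std::fs::create_dir_all(state_dir.join("index"))?;
            std::fs::create_dir_all(state_dir.join("libs_index"))?;
            self.registry.insert(
                fingerprint.clone(),
                RepoHandle { fingerprint: fingerprint.clone(), state_dir },
            );
        }
        Ok(&self.registry[&fingerprint])
    }
}

fn fingerprint_of(repo_path: &Path) -> std::io::Result<String> {
    // Placeholder only; the real derivation is SHA256 of the normalized path (see 1.3).
    Ok(format!("{}", repo_path.display()).replace(|c: char| c == '/' || c == '\\', "_"))
}

fn main() -> std::io::Result<()> {
    let mut mgr = RepoManager {
        state_root: PathBuf::from("/tmp/docdex-state"),
        registry: HashMap::new(),
    };
    let handle = mgr.open(Path::new("."))?;
    println!("{} -> {}", handle.fingerprint, handle.state_dir.display());
    Ok(())
}
```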
**Interactions & Data Contracts**
- Inputs: normalized repo path or repo\_id.
- Outputs: repo context handles (index, libs\_index, memory, symbols, dag) bound to fingerprinted state paths.
- Errors: clear signals for missing repo, missing index, over-capacity eviction, permission issues, or fingerprint mismatch.
**Scalability & Reliability**
- Bound resource use per repo; target ≥8 concurrent repos via multiple per-repo daemons.
- Safeguards against cross-contamination by scoping all paths under fingerprinted dirs.
- Startup checks (per PDR `docdexd check`) validate RW on state dir and registry.
**Security & Privacy**
- No telemetry; operates under local state; relies on daemon-level binding/token policies (defined elsewhere). Enforces repo scoping on all operations; rejects requests without repo selection.
**Observability & DevOps**
- Not explicitly requested in PDR; minimum: log repo open events and errors to aid diagnosing permission issues.
**Assumptions**
- Fingerprint is deterministic SHA256 over normalized absolute path; no secondary IDs needed.
- Per-repo daemons keep on-disk state; no eviction within a single repo context.
- Callers are responsible for ensuring repos are indexed before use; Repo Manager only manages lifecycle/handles.
**Open Questions & Risks**
- How are repo deletions handled (on-disk cleanup vs. orphaned state)?
- Risk: stale on-disk state if repos move without reindexing; define cleanup guidance.
**Verification Strategy**
- Unit tests: path normalization → fingerprint determinism; registry lookup; lazy init idempotence.
- Integration tests: concurrent per-repo daemon access shows no cross-contamination; state paths stay under fingerprinted dirs.
- CLI/daemon check: `docdexd check` validates RW perms and reports registry readiness.
### Indexing and Search
Architectural intent: deliver fast, repo-scoped retrieval that stays local-first, supports per-repo isolation, and feeds downstream chat/memory/DAG flows. Per PDR, indexing covers repo source, cached library docs, symbols, and impact graph metadata; search uses Tantivy BM25 with optional local rerank and respects waterfall gating to web only on low confidence.
Components and flows
- Repo Manager: lazily initializes per-repo `state/repos/<fingerprint>/index/` (source), `libs_index/`, `symbols.db`, `dag.db`, `impact_graph.json`.
- Tantivy source index (per repo): indexes files with BM25; scope limited to selected repo fingerprint. Out of scope: cross-repo search.
- Libraries index (per repo): ingests cached library docs (Phase 2.1) into `libs_index` so library answers count as Tier 1 local context.
- Query path (Tier 1): `docdexd chat --repo` and `/v1/chat/completions` call local BM25 search across source \+ libs index; optional local rerank (model unspecified in PDR—TBD). Waterfall escalation only if top score \< `web_trigger_threshold` or forced.
- Symbol extraction (Phase 6): Tree-sitter during `index` populates `symbols.db` with name/kind/file/lines/signature to support code intelligence and impact graph.
- Impact graph (Phase 6): dependency edges captured during indexing; served via `GET /v1/graph/impact?file=<path>` (repo id optional for per-repo daemon), returning schema-tagged inbound/outbound deps with explicit edge direction semantics.
Data contracts (as implied)
- Per-repo state layout: `state/repos/<fingerprint>/index/` (Tantivy), `libs_index/`, `symbols.db`, `dag.db`, `impact_graph.json`, `memory.db`.
- Impact API response: directed deps; exact schema not detailed in PDR (open question).
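Since the exact schema is an open question, the following is purely illustrative: it encodes only what this section does state (schema-tagged, directed inbound/outbound deps keyed by the queried file), reusing the `docdex.impact_graph` tag mentioned later in this document.

```rust
use serde::Serialize;

#[derive(Serialize)]
struct ImpactResponse {
    schema: String,       // e.g. "docdex.impact_graph" plus version metadata
    file: String,         // the queried path
    inbound: Vec<String>, // files that depend on `file`
    outbound: Vec<String>, // files that `file` depends on
}

fn main() {
    let resp = ImpactResponse {
        schema: "docdex.impact_graph/v2".into(), // hypothetical tag format
        file: "src/indexer.rs".into(),
        inbound: vec!["src/daemon.rs".into()],
        outbound: vec!["src/symbols.rs".into()],
    };
    println!("{}", serde_json::to_string_pretty(&resp).unwrap());
}
```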
Scalability, reliability, security, observability
- Performance targets: local search p95 \< 50ms, typical \< 20ms; indexing \< 1GB memory.
- Reliability: clear errors for missing repo/index. Browser guard and rate limits belong to web tier (not primary here).
- Security: repo scoping mandatory; no cross-repo data bleed; localhost bind unless `--expose` with token.
- Observability: not requested in PDR for this section.
Assumptions
- Optional rerank is local and uses available Ollama model; model choice not fixed.
- Impact graph edge extraction happens during indexing; no separate job runner.
Open Questions & Risks
- Impact API schema specifics and pagination/limits.
- Rerank model choice and enable/disable flag default.
- Handling of large binaries or generated files in Tantivy index (inclusion/exclusion policy).
- Consistency when a repo is modified mid-query (do we fail fast or retry?).
Verification Strategy
- `docdexd index --repo` builds Tantivy index and symbols without errors; measure memory bound (\<1GB).
- Local search latency benchmarks hit p95 \< 50ms under concurrent per-repo daemon load.
- Isolation tests: queries never return content from other repos.
- Waterfall gate: assert web escalation only when score \< `web_trigger_threshold` or forced flag.
- Impact API returns correct inbound/outbound deps for known fixtures.
### Waterfall Orchestrator
The Waterfall Orchestrator routes each query through a tiered pipeline (local → web → cognition) based on confidence, assembling context within a fixed token budget while honoring per-repo isolation and per-repo daemon constraints.
**Scope & Intent**
- Enforce local-first retrieval with gated escalation to web and cognition when local confidence \< `web_trigger_threshold` (default 0.7) or when explicitly forced.
- Keep repo isolation: all retrievals and caches are repo-scoped via Repo Manager fingerprints; no cross-repo bleed.
- Assemble context with fixed priority and budgeting: Memory \> Repo Code \> Library/Web, \~10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer.
**Flow (textual sequence)**
1) Receive query (CLI/HTTP/MCP) with repo selection; CLI uses `--repo`, MCP tools require `project_root`/`repo_path` unless `initialize` sets a default, HTTP defaults to the daemon repo and validates any provided repo id. Resolve RepoContext (paths, indexes, caches) and token budget.
2) Tier 1 Local: query Tantivy source index \+ repo `libs_index`; optional local rerank. If top score ≥ threshold, proceed to prompt assembly.
3) Gate: if score \< threshold or forced web, proceed to Tier 2\.
4) Tier 2 Web: DiscoveryService (DuckDuckGo HTML, ≥2s between searches) → ScraperEngine (headless Chrome, readability, ≥1s/domain). Cache raw/cleaned under `cache/web`; ingest snippets per repo.
5) Context merge: prioritize memory snippets, then repo code, then libs/web; drop lowest-priority content first on overflow.
6) Tier 3 Cognition: local Ollama for chat/embeddings; stream response; log DAG nodes if Phase 4+ enabled.
7) Return response; no cross-repo eviction within a per-repo daemon.
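A compact routing skeleton for the sequence above, with the tiers stubbed as closures. Only the ordering (local search → gate → web enrichment → merge → local LLM) and the 0.7 default threshold come from this section; every name in the sketch is invented for illustration.

```rust
struct Snippet {
    text: String,
    score: f32,
}

fn answer(
    query: &str,
    force_web: bool,
    threshold: f32,
    local_search: impl Fn(&str) -> Vec<Snippet>,
    web_enrich: impl Fn(&str) -> Vec<Snippet>,
    llm_stream: impl Fn(&[Snippet], &str) -> String,
) -> String {
    // Tier 1: repo index + libs_index (the caller has already resolved the RepoContext).
    let mut context = local_search(query);
    let top = context.iter().map(|s| s.score).fold(f32::MIN, f32::max);

    // Gate: escalate only on low confidence or an explicit force flag.
    if force_web || context.is_empty() || top < threshold {
        // Tier 2: rate-limited DDG discovery + guarded Chrome fetch, cached then ingested per repo.
        context.extend(web_enrich(query));
    }

    // Token budgeting (Memory -> Repo -> Library/Web) would prune `context` here,
    // then Tier 3 streams the answer through the local model.
    llm_stream(&context, query)
}

fn main() {
    let out = answer(
        "how is the impact graph built?",
        false,
        0.7,
        |_q| vec![Snippet { text: "local hit".into(), score: 0.55 }],
        |_q| vec![Snippet { text: "web snippet".into(), score: 0.0 }],
        |ctx, q| {
            let sources: Vec<&str> = ctx.iter().map(|s| s.text.as_str()).collect();
            format!("answer to '{q}' grounded in: {}", sources.join(", "))
        },
    );
    println!("{out}");
}
```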
**Components & Contracts**
- WaterfallOrchestrator: owns confidence gate, tier routing, and token budgeting.
- RepoContext accessor: supplies per-repo indexes (`index/`, `libs_index`, `memory.db`, `dag.db`, `impact_graph.json`) and cache handles; enforces SHA256 fingerprinting.
- DiscoveryService & ScraperEngine: respect rate limits, cache TTL, Chrome lifecycle guards; return cleaned documents with source metadata for ingestion.
- Memory layer: per-repo sqlite-vec recall, prioritized in assembly.
- Token budgeter: counts tokens pre-Ollama call; emits drop logs when pruning low-priority snippets.
**Scalability & Reliability**
- Scaling via multiple per-repo daemons; ensure handles close cleanly on shutdown.
- Performance target: local search p95 \< 50ms; keep web fetch concurrency bounded (`max_concurrent_fetches`).
- Browser guard prevents zombie Chrome; per-domain rate limits backoff on HTTP errors.
**Security & Privacy**
- Localhost-only by default; `--expose` requires token; HTTP uses daemon repo by default and validates any provided repo id.
- No paid/cloud APIs; web only on threshold drop/explicit request; cached data stored locally.
**Observability & DevOps**
- Logs: gate decisions (scores, threshold), tier chosen, snippets dropped due to budget, web rate-limit/backoff events, Chrome lifecycle.
- `docdexd check` validates Ollama, Chrome, config, repo registry before serving.
**Assumptions**
- Threshold comparator uses top-scoring local hit; rerank (if present) happens before gate.
- Web cache ingestion remains repo-scoped even though cache is global.
**Open Questions & Risks**
- How to tune `web_trigger_threshold` per repo or workload without regression?
- What is the fallback if discovery repeatedly fails (e.g., DDG blocked)? Backoff strategy only?
- Potential latency spikes if Chrome cold-starts under load; need warm pool?
**Verification Strategy**
- Unit/integration: confidence gate branches (\>=, \< threshold, forced web); token pruning order and logging.
- Performance: local search p95 \< 50ms; web rate-limit compliance (≥2s discovery, ≥1s/domain).
- Isolation: concurrent per-repo daemons show no cross-repo cache/index/memory bleed.
- Reliability: simulate scraper/Chrome failure; ensure graceful degradation and error clarity.
- End-to-end: `web-search`, `web-fetch`, `web-rag` flows, and `/v1/chat/completions` routing respect repo scoping and gating.
### Web Discovery and Scraping
The system provides zero-cost web enrichment as Tier 2 of the retrieval waterfall, using DuckDuckGo HTML for discovery and headless Chrome for content fetch with readability cleanup, all guarded for resource and privacy constraints.
**Components & Responsibilities**
- `DiscoveryService`: issues DuckDuckGo HTML searches with ≥2s delay between queries; applies blocklist; respects global `[web]` config (user agent, cache TTL).
- `ScraperEngine`: headless Chrome (local binary) fetch with readability extraction; enforces request delay ≥1s per domain and page load timeout (\~15s default); guarded lifecycle with locks to prevent zombie processes; runs only when Tier 2 is triggered or explicitly requested (a spacing-enforcement sketch follows this list).
- Cache: global `cache/web/` storing raw HTML and cleaned JSON; reused across repos but ingested per repo as needed.
- Waterfall Orchestrator: triggers discovery/fetch when local confidence \< `web_trigger_threshold` (default 0.7) or on explicit web commands; merges cleaned snippets into context after token budgeting (priority: Memory → Repo → Library/Web).
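A minimal spacing-enforcement sketch for the limits above (≥2s between DDG queries, ≥1s per domain). The real limits live in `[web]`/`[web.scraper]` config; this struct and its method names are illustrative only.

```rust
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

struct Spacing {
    min_gap: Duration,
    last: HashMap<String, Instant>, // keyed by "ddg" or by domain
}

impl Spacing {
    fn new(min_gap_ms: u64) -> Self {
        Spacing { min_gap: Duration::from_millis(min_gap_ms), last: HashMap::new() }
    }

    /// Blocks until at least `min_gap` has passed since the previous call for `key`.
    fn wait_turn(&mut self, key: &str) {
        if let Some(prev) = self.last.get(key) {
            let elapsed = prev.elapsed();
            if elapsed < self.min_gap {
                sleep(self.min_gap - elapsed);
            }
        }
        self.last.insert(key.to_string(), Instant::now());
    }
}

fn main() {
    let mut ddg = Spacing::new(2000);   // >=2s between DuckDuckGo HTML queries
    let mut fetch = Spacing::new(1000); // >=1s per domain for page fetches
    ddg.wait_turn("ddg");
    fetch.wait_turn("docs.rs");
    fetch.wait_turn("docs.rs"); // second hit to the same domain waits ~1s
}
```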
**Interactions & Data Flow**
- CLI: `web-search "<query>"` → DiscoveryService (rate-limited) → return URLs (cached).
- CLI/HTTP orchestration: `web-fetch <url>` or auto-fetch in `web-rag`/`/v1/chat/completions` Tier 2 path → ScraperEngine fetch → readability → cache write → cleaned content merged into context (repo-scoped ingestion).
- Guards: Browser lifecycle uses locks under `state/locks/`; orchestrator caps concurrency via `[core].max_concurrent_fetches` and scraper-specific config.
- Config hooks: `[web.scraper]` selects engine, headless mode, chrome path, request delay, timeouts; `[search]` sets `web_trigger_threshold`, `max_web_hits`.
**Scalability & Reliability**
- Rate limits per domain and per DDG query as specified; backoff on HTTP errors (from PDR risk section).
- Bounded Chrome concurrency prevents resource exhaustion; per-repo daemons do not evict global web cache.
- Idle daemon must avoid zombie Chrome; validated in `docdexd check`.
**Security & Privacy**
- Local-only by default; web access is explicit/gated by confidence; no paid APIs.
- When `--expose`, HTTP/MCP require token auth (inherited); web fetches still use local Chrome and DDG HTML only.
**Observability/DevOps**
- Not explicitly requested in PDR beyond `docdexd check` validating Chrome availability and rate-limit/backoff behavior.
**Assumptions**
- Readability extraction suffices for all supported content types; no PDF/JS rendering beyond Chrome page load.
- Cache TTL from `[web]` applies to both discovery results and cleaned pages unless overridden elsewhere (not specified).
**Open Questions & Risks**
- How to surface/handle DDG blocklist updates and HTTP backoff policy specifics?
- Do we need per-domain concurrency caps beyond global request delay?
- Handling of JS-heavy pages when readability fails; fallback strategy not specified.
- Cache eviction policy/TTL granularity not fully defined.
**Verification Strategy**
- `docdexd check` confirms Chrome availability, scraper guards, and rate-limit readiness.
- Automated tests: rate-limit enforcement (≥2s DDG, ≥1s fetch), cache reuse, waterfall trigger on confidence gate, Chrome lifecycle guard (no zombies).
- Manual tests: `web-search` and `web-fetch` commands; `web-rag` end-to-end with token budgeting and context priority enforcement.
### LLM and Embeddings
The LLM/embedding layer defaults to Ollama for both generation and embeddings, operating locally within the per-repo daemon constraint. It must respect token budgets, stream responses, and stay repo-scoped by construction.
- **Provider/Models**: `[llm] provider` is configurable (default `ollama`) with `base_url`; `default_model` for chat (hardware-guided selection) and `embedding_model` for sqlite-vec memory and search enrichment. No paid APIs by default; warn if provider is unknown or missing required config.
- **Invocation & Surfaces**: One call path shared by CLI/HTTP/MCP. `/v1/chat/completions` (OpenAI-compatible) and `docdexd chat` route to Ollama; HTTP defaults to the daemon repo and validates any provided repo id. MCP tools reuse the same pipe. Streaming responses required.
- **Token Budgeting**: Pre-call budgeting per request: \~10% system prompt, 20% memory (if enabled), 50% repo/library/web context, 20% generation buffer. Drop lowest-priority snippets first (library/web before repo before memory) with logging. Enforce `max_answer_tokens` from config.
- **Context Assembly**: Priority order Memory → Repo code (Tantivy \+ symbols) → Library/web artifacts. Library docs treated as Tier-1 support. Waterfall orchestrator only escalates to web when confidence \< `web_trigger_threshold` or explicitly forced.
- **Repo Isolation**: CLI/MCP calls require repo id/path; HTTP defaults to the daemon repo and validates any provided repo id. Embeddings and memory stored per-repo (`state/repos/<fingerprint>/memory.db`), no cross-repo bleed. Unknown/unindexed repo returns clear error.
- **Embeddings**: Ollama embedding model only; used for memory\_store/recall, local rerank (optional), and any vector similarity in sqlite-vec (an embedding-call sketch follows this list). No external vector DB.
- **Hardware Awareness**: `llm-list` detects RAM/VRAM and recommends models (e.g., `phi3.5:3.8b` by default, `llama3.1:70b` if resources allow and the model is installed; ultra-light if \<8GB RAM). `llm-setup` ensures `ollama` is in PATH and guides pulls; npm postinstall may prompt and installs only on explicit confirmation.
- **Reliability & Limits**: Streaming must tolerate backpressure; apply timeouts/retries aligned with daemon defaults. Ensure daemon startup (`check`) validates Ollama reachability/models and budget configuration. Token overflow mitigated by pruning per priorities above.
- **Security/Privacy**: Local-only by default (bind 127.0.0.1); when `--expose`, require auth token on HTTP/MCP. No telemetry; prompts and inference stay local.
- **Observability**: Log model used, token budget decisions, truncation events, and repo id; avoid logging sensitive prompt content. Additional metrics not requested in PDR.
- **DevOps**: No external dependencies beyond Ollama binary/models. Binaries distributed via npm wrapper; preserve state layout. Chrome/browser not part of LLM path (only web tier).
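A hedged sketch of a local embedding call. It assumes Ollama's HTTP API exposes an `/api/embeddings` endpoint taking `model` and `prompt` and returning an `embedding` array; verify against the Ollama version in use, and substitute the configured `[llm].embedding_model` (the model name below is a placeholder). Requires `reqwest` (with the `blocking` and `json` features) and `serde_json`.

```rust
use serde_json::{json, Value};

fn embed(base_url: &str, model: &str, text: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let resp: Value = reqwest::blocking::Client::new()
        .post(format!("{base_url}/api/embeddings"))
        .json(&json!({ "model": model, "prompt": text }))
        .send()?
        .json()?;
    let vector = resp["embedding"]
        .as_array()
        .ok_or("missing embedding field")?
        .iter()
        .filter_map(|v| v.as_f64().map(|f| f as f32))
        .collect();
    Ok(vector)
}

fn main() {
    // Placeholder model name; use the configured [llm].embedding_model in practice.
    match embed("http://127.0.0.1:11434", "nomic-embed-text", "hello docdex") {
        Ok(v) => println!("embedding dims: {}", v.len()),
        Err(e) => eprintln!("embedding call failed: {e}"),
    }
}
```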
**Open Questions & Risks**
- Clarify default `embedding_model` value and size guidance per hardware tier.
- Define concrete timeout/retry policy for Ollama streaming under load.
- Confirm whether rerank uses embeddings and how to toggle it per repo/config.
- Risk: large models pulled without hardware fit; mitigation via `llm-list` gating and warnings.
- Risk: token budgeting misconfiguration leading to truncation of high-priority context; need guardrails and logs.
**Verification Strategy**
- `docdexd check` validates provider reachability, required models present, and token budget config sane.
- Unit/integration tests: enforce repo scoping; assert memory isolation across per-repo daemons.
- Budgeting tests: construct oversized contexts and verify priority-based truncation and logging.
- Streaming tests: ensure chunked output end-to-end via CLI and `/v1/chat/completions`.
- Hardware-guidance tests: simulate RAM/VRAM tiers and assert model recommendations/warnings.
### Memory and Reasoning DAG
This section defines per-repo long-term memory (sqlite-vec) and reasoning DAG logging, aligned to per-repo daemon isolation. Both live under `~/.docdex/state/repos/<fingerprint>/` and are always scoped by repo selection.
**Architectural intent**
- Provide repo-scoped recall to prioritize grounded answers (memory precedes code/library/web context).
- Capture reasoning sessions as DAGs for auditability and tooling (CLI/MCP/dashboard).
- Preserve local-first, zero-cost operation using sqlite-vec and sqlite for DAG logging.
**Components and data**
- `memory.db` (sqlite-vec): tables `memories` (id UUID, content TEXT, embedding BLOB, created_at INT, metadata JSON), `memory_vec` (vec0 embedding table), and `memory_meta` (embedding_dim, schema_version key/value). Ollama embeddings only (a DDL sketch follows this list).
- `dag.db` (sqlite): node types `UserRequest`, `Thought`, `ToolCall`, `Observation`, `Decision`; session-scoped logging.
- Repo Manager: ensures per-repo initialization/closing without cross-repo access.
- Embedding/model config: uses `[llm]` `embedding_model` via Ollama; no external vector DB.
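A DDL sketch matching the tables listed above. Loading the sqlite-vec extension (needed for the `vec0` virtual table) is assumed to happen before this runs and is omitted; the embedding dimension and column types are illustrative. Requires `rusqlite`.

```rust
use rusqlite::Connection;

fn init_memory_db(conn: &Connection, embedding_dim: usize) -> rusqlite::Result<()> {
    conn.execute_batch(&format!(
        "
        CREATE TABLE IF NOT EXISTS memories (
            id         TEXT PRIMARY KEY,      -- UUID
            content    TEXT NOT NULL,
            embedding  BLOB,
            created_at INTEGER NOT NULL,
            metadata   TEXT                   -- JSON
        );
        CREATE TABLE IF NOT EXISTS memory_meta (
            key   TEXT PRIMARY KEY,           -- embedding_dim, schema_version
            value TEXT NOT NULL
        );
        -- Requires the sqlite-vec extension to be loaded into this connection.
        CREATE VIRTUAL TABLE IF NOT EXISTS memory_vec USING vec0(embedding float[{embedding_dim}]);
        "
    ))
}

fn init_dag_db(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        "
        CREATE TABLE IF NOT EXISTS nodes (
            id         TEXT PRIMARY KEY,
            session_id TEXT NOT NULL,
            type       TEXT NOT NULL CHECK (type IN
                ('UserRequest','Thought','ToolCall','Observation','Decision')),
            payload    TEXT,                  -- JSON
            created_at INTEGER NOT NULL
        );
        ",
    )
}

fn main() -> rusqlite::Result<()> {
    let mem = Connection::open_in_memory()?;
    // The vec0 statement will fail here unless sqlite-vec is loaded; shown for shape only.
    let _ = init_memory_db(&mem, 768);
    let dag = Connection::open_in_memory()?;
    init_dag_db(&dag)?;
    Ok(())
}
```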
**Interactions**
- Memory store: `memory_store(text, metadata, repo)` computes embedding via Ollama, inserts into `memory.db`.
- Memory recall: `memory_recall(query, repo)` embeds query, performs sqlite-vec similarity, feeds top hits into context assembly with highest priority budget slice.
- DAG logging: each chat/waterfall session creates a session id; nodes appended with timestamps and minimal metadata (tool name, repo path/id). No cross-session edges.
- Viewing: `docdexd dag view --repo <path> <session_id>` renders text or DOT from `dag.db`.
- Context assembly order: Memory → Repo index → Library/Web; token budget enforced before LLM call.
**Scalability and reliability**
- sqlite-vec per repo; no cross-repo queries.
- Embedding calls stay local; no paid/network calls. Performance target aligns with p95 \< 50ms local search; memory recall should stay within that budget (assumes moderate memory set).
- DAG writes are lightweight appends; expected to remain small per session.
**Security and privacy**
- Local-only storage; bound to repo fingerprinted path. No telemetry. Token auth only when HTTP/MCP exposed; repo id/path required on every call.
**Observability**
- Log memory store/recall events (repo, counts, latency) and DAG session creation/render calls. No additional metrics beyond standard logging requested.
**DevOps**
- Managed by daemon lifecycle; `docdexd check` validates sqlite RW and Ollama reachability. No migration tooling specified; schema assumed stable per phase.
**Assumptions**
- Memory size per repo is modest; no sharding/compaction requirements defined.
- DOT rendering is generated on demand from stored nodes; no precomputed layouts.
- Embedding model availability is ensured by `[llm]` config and `llm-setup`.
**Open Questions & Risks**
- Memory growth limits/compaction strategy not defined; potential bloat over long use.
- Concurrency semantics for memory insert/recall under multi-client access not specified.
- DAG node schema may need expansion (costs, token counts); change protocol unclear.
- Performance expectations for large memory tables relative to p95 targets need validation.
**Verification Strategy**
- Automated tests: memory\_store/recall correctness per repo; isolation (no cross-repo results).
- CLI: `docdexd dag view --repo <path> <session_id>` renders expected nodes; memory recall returns stored items.
- `docdexd check` confirms `memory.db`/`dag.db` RW and Ollama embedding availability.
- Performance checks: recall latency within local search budget on representative dataset.
## Data Management and Storage {#data-management-and-storage}
Architectural intent: enforce per-repo isolation while sharing global caches, keeping all data local-by-default; guarantee deterministic layout keyed by repo fingerprint so multiple per-repo daemons can serve multiple repos without cross-contamination, while supporting fast search, memory, symbols, and DAG logging.
### Directory and Fingerprint Layout
- Repo fingerprint: SHA256 of normalized repo path; all per-repo paths nested under `~/.docdex/state/repos/<fingerprint>/`.
- Per-repo subdirs/files: `index/` (Tantivy source), `libs_index/` (ingested library docs), `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`.
- Global/shared: `~/.docdex/state/cache/web/` (raw HTML \+ cleaned JSON), `cache/libs/<ecosystem>/<pkg>/` (fetched docs), `locks/` (browser/process guards).
- Repo Manager duties: lazily create per-repo dirs on first touch; enforce RW checks before use.
### Schemas and Indexes
- Source index (Tantivy): BM25 primary; fields include path, content, lang, offsets; optional local rerank later (still Tantivy-based per PDR).
- Library index: Tantivy index under per-repo `libs_index/`, ingesting documents from global cache; treated as Tier-1 alongside source.
- Memory (`memory.db`): sqlite-vec tables `memories` (id UUID, content TEXT, embedding BLOB, created_at INT, metadata JSON), `memory_vec` (vec0 embedding table), and `memory_meta` (embedding_dim, schema_version key/value); per-repo only, no cross-repo queries.
- Symbols (`symbols.db`): Tree-sitter extraction stored with `name, kind, file_path, line_start, line_end, signature`; supports impact graph.
- DAG (`dag.db`): node types {UserRequest, Thought, ToolCall, Observation, Decision}; logged per session.
- Impact graph (Phase 6): directed edges from imports; served via `GET /v1/graph/impact` scoped by repo fingerprint.
- Data contracts: CLI/MCP require repo id/path; HTTP defaults to the daemon repo and validates any provided repo id. Unknown/unindexed repo returns clear error.
### Caching Strategy
- Web cache: global `cache/web/`; reused across repos; scraper enforces ≥1s per-domain fetch delay and ≥2s DDG discovery gap; guarded lifecycle to avoid zombie Chrome.
- Library cache: global `cache/libs/<ecosystem>/<pkg>/`; ingestion into per-repo `libs_index/` only (no direct reads); ingest sources must be under repo root or `cache/libs` to prevent bleed.
- Waterfall: Tier 1 (repo index \+ per-repo ingested libs) → Tier 2 (web discovery/fetch using cache; ingested per repo) → Tier 3 (memory/DAG context); escalation only when local score below `web_trigger_threshold` or explicitly forced.
- Eviction: not required for per-repo daemons; caches persist until TTL/purge (TTL for web defined in config).
Open Questions & Risks
- Clarify exact Tantivy schema fields and analyzers; PDR leaves flexible.
- Define cache TTL/purge policy for web and library caches (config mentions TTL but not default).
- Concurrency semantics when two repos ingest the same cached library doc—need locking or idempotent writes.
- Impact graph storage format uses `docdex.impact_graph` schema metadata (current v2) with in-memory migration for legacy files; reindex to persist upgrades.
- Risk: fingerprint collisions theoretically possible but negligible with SHA256; document assumption.
Verification Strategy
- `docdexd check` validates RW on `~/.docdex/state`, presence of per-repo dirs, and Chrome/Ollama availability.
- Unit/integration: repo isolation under concurrent access across per-repo daemons; prevent cross-repo reads.
- Rate-limit tests for scraper/discovery honoring delays and cache reuse.
- Schema migrations: initialize/upgrade `memory.db`, `symbols.db`, `dag.db` deterministically and reject cross-repo access by fingerprint; `impact_graph.json` uses schema metadata + migration guards.
- Functional: missing repo/index errors are clear; library ingestion only populates target repo `libs_index`.
### Directory and Fingerprint Layout
Architectural intent: enforce per-repo isolation while enabling shared, zero-cost caches; deterministic fingerprints prevent path ambiguity across per-repo daemons.
- **Fingerprinting**: SHA256 of normalized repo path; required for all per-repo state resolution and repo\_id references. Normalization definition must be consistent across CLI/HTTP/MCP surfaces (assumption: resolved realpath, lowercase on case-insensitive FS; confirm for Windows/WSL).
- **Per-repo state root**: `~/.docdex/state/repos/<fingerprint>/`. Created lazily by Repo Manager after RW validation. Per-repo daemons close DB/index handles on shutdown.
- **Per-repo contents** (all required, no cross-repo mixing):
- `index/` (Tantivy source index)
- `libs_index/` (Tantivy index of ingested library docs)
- `memory.db` (sqlite-vec) for long-term memory
- `symbols.db` (Tree-sitter symbols)
- `dag.db` (reasoning DAG)
- **Per-repo manifest**: `repo_meta.json` at the per-repo state root includes `fingerprint_sha256`, `fingerprint_version`, `canonical_path`, `created_at_epoch_ms`, and `last_seen_at_epoch_ms` to support diagnostics and migrations (a serde sketch follows this list).
- **Shared caches** (global, reused across repos but ingested per repo):
- `~/.docdex/state/cache/web/` for raw HTML \+ cleaned JSON from web fetches
- `~/.docdex/state/cache/libs/<ecosystem>/<pkg>/` for scraped library docs
- **Locks and guards**: `~/.docdex/state/locks/` for browser/process guards; Repo Manager enforces per-repo invariants.
- **Interactions**: Repo Manager maps repo path → fingerprint → per-repo directories; Waterfall uses per-repo index/libs\_index/memory; library/web caches are read-only to non-owner repos until ingested into that repo’s `libs_index`.
- **Security/Isolation**: No cross-repo memory/symbol/DAG access; all per-repo paths scoped by fingerprint; default localhost binding applies to any API that references these paths.
- **Reliability/cleanup**: Close Tantivy/sqlite handles on shutdown; Chrome guard uses `locks/` to avoid zombie processes; directories persist across restarts/upgrades.
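A serde sketch of the manifest; the field names come from the bullet above, while the derive setup and value types are illustrative.

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct RepoMeta {
    fingerprint_sha256: String,
    fingerprint_version: u32,
    canonical_path: String,
    created_at_epoch_ms: u64,
    last_seen_at_epoch_ms: u64,
}

fn main() {
    let meta = RepoMeta {
        fingerprint_sha256: "deadbeef".repeat(8), // placeholder 64-hex-char fingerprint
        fingerprint_version: 1,
        canonical_path: "/home/user/projects/docdex".into(),
        created_at_epoch_ms: 1_700_000_000_000,
        last_seen_at_epoch_ms: 1_700_000_600_000,
    };
    println!("{}", serde_json::to_string_pretty(&meta).unwrap());
}
```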
Open Questions & Risks
- Define exact path normalization rules across platforms (case sensitivity, symlinks, UNC/WSL).
- Policy for cleaning orphaned per-repo directories when repos are unused long-term.
- Concurrency: ensure atomic creation of per-repo directories under parallel `index`/`chat`.
- Cache poisoning risk if web/libs cache lacks integrity checks; consider hashing fetched content.
Verification Strategy
- Unit: fingerprint derivation idempotence across repeated calls and platforms.
- Integration: `docdexd check` validates RW on `state/`, presence/permissions of per-repo subdirs, and lock dir.
- Concurrency tests: parallel repo opens across per-repo daemons ensure handles closed and no cross-repo writes.
- Functional: per-repo isolation validated by querying memory/symbols/DAG for one repo and confirming absence in another; shared cache reuse verified via ingestion logs.
### Schemas and Indexes
Architectural intent: define per-repo and global storage schemas that support low-latency local search, code intelligence, long-term memory, and reasoning traces while preserving strict repo isolation and deterministic data lifecycles.
**Components & Data Contracts**
- `memory.db` (per repo, sqlite-vec): `memories` table plus `memory_vec` (vec0) and `memory_meta` (embedding_dim, schema_version); embeddings from Ollama; queried via vector search; prioritized in context assembly.
- `symbols.db` (per repo): Tree-sitter extracted symbols for Rust/TypeScript/JavaScript/Python/Go/Java/C#/C/C++/PHP/Kotlin/Swift/Ruby/Lua/Dart with columns `{name, kind, file_path, line_start, line_end, signature}`; enables symbol search and impact analysis inputs (a DDL sketch follows this list).
- `dag.db` (per repo): nodes table with `type ENUM(UserRequest|Thought|ToolCall|Observation|Decision)`, `session_id`, `payload JSON`, `created_at`; edges implied by `session_id` \+ ordering (PDR: DAG logging and view).
- `index/` (per repo, Tantivy): source index for repo code; `libs_index/` for ingested library docs; both scoped by repo fingerprint to prevent cross-contamination.
- `cache/web` and `cache/libs` (global read-mostly): raw HTML/cleaned JSON and cached library docs; ingestion into per-repo indexes is explicit.
- Impact graph (per repo, `impact_graph.json` at the repo state root): directed edges derived from imports; `GET /v1/graph/impact` returns schema-tagged inbound/outbound deps keyed by `file` (directed `source -> target`).
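A minimal sketch of the per-repo DDL, assuming `rusqlite`. Table and column names follow the contracts above, but exact types and constraints are assumptions; the sqlite-vec `memory_vec` (vec0) virtual table is omitted because it requires the extension to be loaded first.

```rust
// Illustrative per-repo schema setup; not the actual migration code.
use rusqlite::Connection;

fn init_repo_dbs(state_root: &std::path::Path) -> rusqlite::Result<()> {
    let memory = Connection::open(state_root.join("memory.db"))?;
    memory.execute_batch(
        "CREATE TABLE IF NOT EXISTS memories (
             id INTEGER PRIMARY KEY,
             content TEXT NOT NULL,
             metadata TEXT,               -- JSON blob
             created_at INTEGER NOT NULL  -- epoch ms
         );
         CREATE TABLE IF NOT EXISTS memory_meta (
             embedding_dim INTEGER NOT NULL,
             schema_version INTEGER NOT NULL
         );",
    )?;

    let symbols = Connection::open(state_root.join("symbols.db"))?;
    symbols.execute_batch(
        "CREATE TABLE IF NOT EXISTS symbols (
             name TEXT NOT NULL,
             kind TEXT NOT NULL,
             file_path TEXT NOT NULL,
             line_start INTEGER NOT NULL,
             line_end INTEGER NOT NULL,
             signature TEXT
         );",
    )?;

    let dag = Connection::open(state_root.join("dag.db"))?;
    dag.execute_batch(
        "CREATE TABLE IF NOT EXISTS nodes (
             id INTEGER PRIMARY KEY,
             type TEXT NOT NULL CHECK (type IN
                 ('UserRequest','Thought','ToolCall','Observation','Decision')),
             session_id TEXT NOT NULL,
             payload TEXT NOT NULL,       -- JSON
             created_at INTEGER NOT NULL
         );",
    )?;
    Ok(())
}
```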
**Interactions**
- Repo Manager initializes per-repo `index/, libs_index/, memory.db, symbols.db, dag.db, impact_graph.json` under `state/repos/<fingerprint>/`.
- Waterfall: Tier-1 search hits Tantivy indexes (`index/`, `libs_index/`) and can merge with `memory.db` results; confidence gate controls web fetch/caching.
- Indexing flow: `docdexd index --repo` populates Tantivy and `symbols.db`; dependency extraction feeds impact graph edges.
- Memory ops: `memory_store` writes to `memory.db`; `memory_recall` vector-searches embeddings.
- DAG logging: per session, append node rows; `dag view --repo <path> <session_id>` renders text/DOT from stored nodes.
**Scalability & Reliability**
- Performance targets: local search p95 \< 50ms; use repo-scoped indexes to keep postings bounded.
- Concurrency: multiple per-repo daemons serve multiple repos; schema design keeps no cross-repo tables to avoid locking contention and bleed.
- Caching: global caches are reused while ingestion remains repo-scoped; reusing fetched artifacts avoids redundant fetches and cache stampedes.
**Security & Isolation**
- Repo fingerprinted paths enforce isolation; no cross-repo queries for memory/DAG/symbols/impact.
- When `--expose` is set, token auth is enforced (per PDR); HTTP defaults to the daemon repo while MCP requires a repo id/path.
**Observability & DevOps**
- Not requested in PDR; minimal requirement: `docdexd check` validates DB presence/permissions, Ollama/Chrome availability; logs errors for missing indexes/DBs.
**Assumptions**
- Impact graph edges stored in per-repo `impact_graph.json` under the repo state root; schema matches `docdex.impact_graph` response requirements and carries version metadata. Legacy files are accepted and migrated in-memory; reindex to persist upgrades.
- No cross-repo memory or DAG aggregation is needed.
**Open Questions & Risks**
- Do we need migrations/versioning for `memory.db`, `symbols.db`, `dag.db` as schemas evolve?
- How to handle symbol kinds/signatures across languages uniformly (Tree-sitter node mapping consistency)?
- Resource risk: large repos may push Tantivy index size; need bounds/compaction policy.
**Verification Strategy**
- `docdexd check` confirms per-repo DBs/indexes exist and are writable; validates Ollama reachability and Chrome guard.
- Indexing test: `index --repo` followed by sample search ensures Tantivy and `symbols.db` populated.
- Memory test: `memory_store` then `memory_recall` returns stored content with embedding search functioning.
- DAG test: execute chat/session to generate nodes; `dag view` renders expected sequence/DOT.
- Impact API test: call `GET /v1/graph/impact` on known deps to verify inbound/outbound edge retrieval.
### Caching Strategy
Docdex v2.0 uses caches to avoid re-fetching external sources while keeping per-repo isolation. Global caches store fetched web pages and library docs; ingestion into per-repo indexes remains repo-scoped to prevent bleed.
- **Cache scopes**: Global `cache/web/` holds raw HTML and cleaned JSON from DuckDuckGo discovery \+ headless Chrome fetches. Global `cache/libs/<ecosystem>/<pkg>/` stores scraped library docs keyed by ecosystem/package. No cross-repo memory/DAG caching by design.
- **Reuse model**: Cached web pages and library docs are reused across repos, but ingestion into each repo’s Tantivy `index/` and `libs_index/` is performed per repo; Repo Manager enforces fingerprinted paths.
- **Freshness/TTL**: PDR calls for configurable web cache TTL via `[web] cache_ttl_secs`; reuse is preferred until TTL expiry, then refetch (see the TTL check sketch after this list). Library cache TTL is not specified; assume long-lived unless invalidated manually or on version change (open question).
- **Write paths and guards**: Cache directories live under `~/.docdex/state/cache/...`; Repo Manager must validate RW on startup (`docdexd check`) and ensure no writes occur outside fingerprinted state roots.
- **Waterfall interaction**: Waterfall tiering treats cached library docs as Tier-1 (local) once ingested; cached web content is Tier-2 but may bypass live fetch if cache hit is valid. Confidence gate (`web_trigger_threshold`) still applies.
- **Concurrency and eviction**: Per-repo daemons close only their own per-repo handles on shutdown; cached artifacts persist globally. Cache eviction policy beyond TTL is not specified; the default is unbounded growth within disk limits (risk).
- **Observability**: Log cache hits/misses and TTL expiry decisions; `docdexd check` should report cache directory health. Metrics beyond this are not requested in PDR.
- **Security/privacy**: Caches remain local-only; no upload/telemetry. When `--expose`, token auth protects HTTP/MCP, but caches stay on disk without extra encryption (not requested).
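A small sketch of the TTL decision, standard library only. The entry fields and function names are illustrative, and URL normalization for cache keys remains an open question below.

```rust
// Illustrative freshness check for a web-cache entry, assuming
// `[web] cache_ttl_secs` has already been read from config.
use std::time::{Duration, SystemTime};

struct WebCacheEntry {
    url: String,                          // cache key source (normalization is an open question)
    fetched_at: SystemTime,
    cleaned_json_path: std::path::PathBuf,
}

fn is_fresh(entry: &WebCacheEntry, cache_ttl_secs: u64) -> bool {
    match entry.fetched_at.elapsed() {
        Ok(age) => age < Duration::from_secs(cache_ttl_secs),
        // Clock skew (fetched_at in the future): treat as fresh rather than refetch.
        Err(_) => true,
    }
}

fn should_refetch(entry: Option<&WebCacheEntry>, cache_ttl_secs: u64) -> bool {
    match entry {
        Some(e) if is_fresh(e, cache_ttl_secs) => false, // reuse cached artifact
        _ => true,                                       // miss or expired: fetch again
    }
}
```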
Open Questions & Risks
- Library cache TTL/versioning policy is unspecified; risk of stale docs across package updates.
- Disk growth risk without eviction for `cache/web` and `cache/libs`.
- Cache keying for web fetches: normalize URLs vs. raw; redirects/canonicalization handling not described.
- Cache corruption/rebuild path (e.g., partial files) is not defined.
Verification Strategy
- `docdexd check` validates cache directories exist and are writable; fails fast otherwise.
- Unit/integration tests: cache hit/miss paths for web fetch, TTL expiry triggers refetch, per-repo ingestion uses cached artifacts without cross-repo writes.
- Concurrency tests: simultaneous fetches use the same cache entry without races/corruption (locks/guards).
- Disk-bound tests (if added later): ensure behavior under low-space conditions or large cache growth.
## Interfaces and Integrations {#interfaces-and-integrations}
Docdex v2.0 exposes local-first surfaces (CLI, HTTP, MCP) per repo, served either by per-repo `docdexd serve` daemons or by the singleton `docdexd daemon` that mounts repos dynamically. Integrations stay zero-cost and local by default (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec), with strict scoping to prevent cross-repo contamination.
### CLI Commands
- Surfaces: `check`, `index --repo <path>`, `chat --repo <path> [--query <q>]`, `llm-list`, `llm-setup`, `web-search "<query>"`, `web-fetch <url>`, `web-rag "<q>" --repo`, `libs fetch --repo`, `dag view --repo <path> <session_id>`, `run-tests --repo <path> --target <file|dir>`, `mcp`, `tui`.
- Behavior: every operation requiring repo context mandates `--repo <path>`; unknown/unindexed repo returns a clear error. The waterfall (local → web → cognition) escalates to the web tier when local confidence falls below `web_trigger_threshold` or when web commands are invoked explicitly.
- Token budgeting and streaming: CLI chat/web commands stream Ollama responses; budgets enforce priority (Memory \> Repo \> Lib/Web).
- Exposure: `--expose` (on daemon) requires token auth; otherwise bind is `127.0.0.1`.
### HTTP API
- Endpoints: `POST /v1/chat/completions` (OpenAI-compatible) and `GET /v1/graph/impact?repo_id=<id>&file=<path>` (handler in `src/api/v1/graph.rs`, routed from `src/search/mod.rs`).
- Repo routing: repo provided via body/header/query; missing/unknown repo is an error. Waterfall and token budgeting mirror CLI behavior; responses stream.
- Security: local bind by default; token required when exposed; no telemetry or paid API usage.
### MCP Server
- MCP: shared HTTP/SSE endpoint on the singleton daemon plus legacy stdio `docdexd mcp --repo <path>`; tools require `project_root`/`repo_path` unless `initialize` sets a default.
- Tools: `docdex_search`, `docdex_web_research`, `docdex_memory_save`, `docdex_memory_recall`; errors on unknown/unindexed repo.
- Lifecycle: stdio MCP runs alongside HTTP within each per-repo daemon and serves only that repo; multi-repo access goes through the singleton daemon’s shared HTTP/SSE MCP.
### Local Dependencies
- LLM/embeddings: Provider-configured; models recommended via hardware-aware `llm-list`/`llm-setup`.
- Retrieval: Tantivy for source/libs indexes; sqlite-vec for per-repo memory; Tree-sitter for symbols; headless Chrome (guarded) plus DuckDuckGo HTML for discovery/fetch with rate limits and caching.
- Caching/state: `~/.docdex/state/` per-repo fingerprints; shared caches for web and libs but ingested per repo.
**Open Questions & Risks**
- Need explicit error contract formats for HTTP/MCP when repo is missing or index is stale.
- Clarify auth header/key format for `--expose` mode across CLI/HTTP/MCP clients.
- Risk: Chrome lifecycle/zombie processes impacting MCP/HTTP availability; ensure guard hooks cover daemon crashes.
- Risk: Waterfall thresholds may differ per surface; confirm single source of truth in config.
**Verification Strategy**
- CLI: run `docdexd check`, `index`, `chat`, `web-search/fetch/rag`, `libs fetch`, `dag view`, `run-tests` against a known repo; assert repo-required errors on omission.
- HTTP: call `/v1/chat/completions` and `/v1/graph/impact` with and without repo ids; verify streaming and token budgeting enforcement.
- MCP: invoke each tool with/without valid repo; assert per-repo routing and clear errors.
- Dependency checks: ensure Ollama reachable/models present; Chrome availability and rate-limit enforcement; cache directories writable.
### CLI Commands
Repo-scoped CLI entry points exposed by `docdexd` (daemon) and `docdex` (wrapper). All commands require explicit repo selection where applicable to preserve per-repo isolation and align with per-repo daemon intent.
- **Command Surface & Scope**
- Core readiness: `docdexd check` validates config RW, state layout, repo registry, Ollama reachability/models, headless Chrome availability, HTTP bind, MCP enablement.
- Repo indexing/chat: `index --repo <path>` builds Tantivy \+ symbols \+ dag/lib scaffolding; `chat --repo <path> [--query <q>]` runs Tier-1 local search, optional REPL if no query.
- LLM ops: `llm-list` (hardware-aware recommendations from `llm_list.json`); `llm-setup` (verify `ollama` presence, list/pull models, update `[llm]` config; prompt-based install only).
- Web waterfall: `web-search "<query>"`, `web-fetch <url>`, `web-rag "<question>" --repo <path>` triggering discovery→scrape→cache with rate limits.
- Library docs: `libs fetch --repo <path>` detects deps (Cargo/Node/Python), scrapes docs, caches under `cache/libs`, ingests into repo `libs_index`.
- DAG: `dag view --repo <path> <session_id>` renders text/DOT from `dag.db`.
- Tests: `run-tests --repo <path> --target <file_or_dir>` executes configured test command locally; returns structured JSON.
- MCP/TUI: shared MCP is served by `docdexd daemon` over HTTP/SSE; `mcp` still starts a per-repo stdio MCP server; `tui` shells out to the `docdex-tui` binary (override with `DOCDEX_TUI_BIN`) as a local exception.
- HTTP alignment: CLI routes to daemon HTTP/MCP surfaces; enforces repo id/path on every call (except local-only `run-tests`/`tui`).
- **Interactions & Data Flow**
- Commands invoke daemon APIs; daemon resolves repo fingerprint → per-repo state dirs (`index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`). Local-only exceptions: `run-tests` and `tui`.
- CLI base URL derives from `server.http_bind_addr` unless `DOCDEX_HTTP_BASE_URL` is set; `DOCDEX_CLI_LOCAL=1` forces legacy local execution when the daemon is unavailable (see the resolution sketch at the end of this section).
- Waterfall commands share caches: `cache/web` (HTML \+ cleaned JSON) and `cache/libs`; ingestion is repo-scoped.
- Token budgeting for chat/web-rag enforced by daemon (not CLI); CLI streams outputs from Ollama via daemon.
- **Reliability & Resource Discipline**
- `check` surfaces readiness/errors (missing index, models, Chrome).
- Commands error clearly if repo unknown/unindexed.
- Web commands respect DDG (≥2s) and fetch (≥1s) delays; Chrome guarded to avoid zombies.
- **Security/Privacy**
- Default localhost bind; `--expose` requires token; CLI must pass token when remote.
- Stdio MCP enforces `auth_token` in `initialize` only when `DOCDEX_AUTH_TOKEN`/`--auth-token` is supplied (auto-started MCP inherits the daemon token).
- No paid APIs; offline-first; web only on low confidence or explicit web commands.
- All repo-scoped commands require repo arg to prevent cross-repo bleed; MCP tools mirror this.
- **Observability**
- Not detailed in PDR for CLI; rely on daemon logs for command outcomes and rate-limit notices.
- **Scalability**
- CLI defers to the per-repo daemon; run separate daemons per repo; performance target p95 local search \<50ms upheld by daemon.
- **DevOps**
- Config at `~/.docdex/config.toml` auto-created; CLI should warn if provider ≠ Ollama. Ollama installs remain prompt-based; Playwright browser installs are opt-out and can run via setup/auto-install.
- **Assumptions**
- CLI is a thin client; heavy work lives in the daemon except `run-tests` and `tui` local execution.
- HTTP and MCP endpoints already authenticated/authorized by daemon when exposed.
- Test command config provided per repo (outside scope here).
- **Open Questions & Risks**
- How is `run-tests` test command configured/discovered per repo? (config key vs repo file)
- Exact output schema for `run-tests` is not specified; DAG export formats are defined in `docs/contracts/dag_export_schema_v1.md`.
- Error codes/UX for missing indexes or models are underspecified.
- TUI dependency footprint and startup guards not described.
- **Verification Strategy**
- Manual: run `docdexd check`, `llm-list`, `llm-setup`, `index --repo`, `chat --repo`, `web-search/fetch/rag`, `libs fetch --repo`, `dag view --repo`, `run-tests --repo`, `mcp`, `tui` with success/error paths.
- Automated: CLI integration tests hitting daemon with repo-scoped fixtures; assert repo requirement enforcement and rate-limit behavior via logs.
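The base-URL resolution described under Interactions & Data Flow might look like the following sketch. It assumes only the `DOCDEX_HTTP_BASE_URL`/`DOCDEX_CLI_LOCAL` variables and `server.http_bind_addr` named above; function names are illustrative.

```rust
// Sketch: how a thin CLI client could decide where to send requests.
use std::env;

fn resolve_base_url(http_bind_addr: &str) -> String {
    // Explicit override wins.
    if let Ok(url) = env::var("DOCDEX_HTTP_BASE_URL") {
        if !url.trim().is_empty() {
            return url;
        }
    }
    // Otherwise derive from the daemon bind address, e.g. 127.0.0.1:3210.
    format!("http://{}", http_bind_addr)
}

fn use_legacy_local_execution() -> bool {
    // DOCDEX_CLI_LOCAL=1 forces local execution when the daemon is unavailable.
    env::var("DOCDEX_CLI_LOCAL").map(|v| v == "1").unwrap_or(false)
}
```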
### HTTP API
- Intent: expose a single machine-local HTTP surface (default bind `127.0.0.1:3210`) that is OpenAI-compatible for chat and a repo-scoped impact graph read API, matching CLI/MCP semantics of explicit repo selection and local-first execution.
**Endpoints**
- `POST /v1/chat/completions`: OpenAI-compatible; requires repo identification (body/header/query). Runs RepoContext resolution → Waterfall (Tier 1 local index \+ libs, Tier 2 web on low confidence/explicit, Tier 3 cognition/memory) with token budgeting before calling Ollama; supports streaming responses.
- `GET /v1/graph/impact?file=<path>`: returns schema-tagged inbound/outbound dependency edges from the per-repo `symbols.db`/dependency graph (directed `source -> target`); the response shape is sketched below.
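A hedged sketch of the impact response contract, assuming `serde`/`serde_json`. Field names mirror the description above (schema tag, `file` key, directed `source -> target` edges), but the exact JSON shape is still an open item, so treat every field name here as an assumption.

```rust
// Illustrative deserialization target for GET /v1/graph/impact responses.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ImpactEdge {
    source: String, // importing file
    target: String, // imported file
}

#[derive(Debug, Deserialize)]
struct ImpactResponse {
    schema: String,            // schema/version tag (assumed field name)
    file: String,              // the queried file path
    inbound: Vec<ImpactEdge>,  // edges where `file` is the target
    outbound: Vec<ImpactEdge>, // edges where `file` is the source
}

fn parse_impact(body: &str) -> serde_json::Result<ImpactResponse> {
    serde_json::from_str(body)
}
```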
**Behavior & Contracts**
- Repo scoping: HTTP defaults to the daemon repo; validate any provided repo id/path and reject unknown/unindexed repos.
- Token budgeting: fixed priority (Memory \> Repo \> Library/Web) with \~10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer; drop lowest-priority snippets first with logging (see the budgeting sketch after this list).
- Streaming: mirror OpenAI stream semantics for chat responses (chunked events).
- Waterfall gating: web escalation only if top local score \< `web_trigger_threshold` (default 0.7) or explicitly requested.
- State usage: per-repo dirs under `~/.docdex/state/repos/<fingerprint>/`; shared caches (`cache/web`, `cache/libs`) ingested per repo.
- Security: localhost by default; `--expose` requires token auth checked per request; no telemetry; no paid APIs.
- Performance targets: local search p95 \<50ms; typical \<20ms.
- Error handling: clear errors for missing repo/index, missing models, or offline web; web rate limits enforced (≥2s DDG discovery, ≥1s fetch delay, 15s page timeout).
- Observability: not requested in PDR.
- DevOps: per-repo daemon; `docdexd check` validates binding, Ollama, Chrome, repo registry before serving.
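The budgeting rule above reduces to simple arithmetic plus a drop order. The sketch below is illustrative only: the snippet type, the `estimate` function, and the rounding are assumptions, and whether the percentages are configurable is listed as an open question later.

```rust
/// Context priority order: Memory is kept longest, Library/Web dropped first.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Priority {
    Memory = 0,
    Repo = 1,
    LibWeb = 2,
}

#[derive(Debug)]
struct TokenBudget {
    system_prompt: usize,
    memory: usize,
    repo_lib_web: usize,
    generation: usize,
}

/// Split a context window into the ~10/20/50/20 allocation described above.
fn budget(context_window: usize) -> TokenBudget {
    let pct = |p: usize| context_window * p / 100;
    TokenBudget {
        system_prompt: pct(10),
        memory: pct(20),
        repo_lib_web: pct(50),
        generation: pct(20),
    }
}

/// Drop lowest-priority snippets (Library/Web, then Repo) until the estimated
/// token count fits `limit`; Memory is dropped last. `estimate` is assumed.
fn trim_to_budget<T>(
    mut snippets: Vec<(Priority, T)>,
    limit: usize,
    estimate: impl Fn(&T) -> usize,
) -> Vec<(Priority, T)> {
    snippets.sort_by_key(|(p, _)| *p);
    while snippets.iter().map(|(_, s)| estimate(s)).sum::<usize>() > limit {
        if snippets.pop().is_none() {
            break; // nothing left to drop
        }
        // A real implementation would log each dropped snippet here.
    }
    snippets
}
```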
**Diagrams (textual)**
- Sequence: Client → `/v1/chat/completions` → Repo Manager (resolve repo/fingerprint) → Waterfall Orchestrator (Tier 1 search → optional web discovery/fetch → context merge with memory/libs) → Token Budgeter → Ollama stream → Client.
- Component: HTTP Server (OpenAI-compatible adapter) ↔ Repo Manager ↔ Indexes (`index/`, `libs_index`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`) ↔ Waterfall services (DiscoveryService, ScraperEngine) ↔ Ollama.
Open Questions & Risks
- Need exact shape of repo selector in headers/query/body for OpenAI-compatible calls (e.g., `docdex-repo-id` header vs body extension).
- Token auth scheme/format when `--expose` (bearer vs custom) not specified.
- Streaming chunk format: assume OpenAI SSE-compatible; confirm no deviations.
- Web escalation override: exact request flag naming for “force web” is unspecified.
Verification Strategy
- Unit/integration: HTTP defaults to daemon repo; reject unknown repo ids.
- Contract tests: OpenAI API compatibility (non-stream/stream), token budgeting enforcement, Waterfall gating at `web_trigger_threshold`.
- Performance: measure local search latency p95 and streaming start time under load with ≥8 concurrent repos.
- Security: token required when exposed; localhost bind default; no external paid APIs invoked.
- Impact graph: validate inbound/outbound edges match stored dependency graph for known fixtures.
### MCP Server
Intent: shared MCP surface for the singleton daemon (HTTP/SSE) plus legacy per-repo stdio MCP (`docdexd mcp`). Tools remain repo-scoped with clear errors on unknown/unindexed repos to avoid cross-repo bleed.
Scope and components
- Surface: singleton daemon exposes MCP over HTTP/SSE (`/sse`, `/v1/mcp`, `/v1/mcp/message`) on the daemon bind address. MCP `initialize` with `rootUri`/`workspace_root` calls `/v1/initialize` and binds the MCP session to that repo; per-request `project_root`/`repo_path` can override the bound repo for `/v1/mcp`. Per-repo stdio MCP remains available via `docdexd mcp`.
- Auto-start: `docdexd daemon` starts the shared MCP proxy when enabled (config, `DOCDEX_ENABLE_MCP`, or `--enable-mcp`); `--disable-mcp` overrides config/env. `docdexd serve` continues to spawn a per-repo stdio MCP server when desired.
- Tools (`project_root`/`repo_path` required for MCP calls unless `initialize` sets a default; validated to match the server repo; an example `tools/call` payload follows this list):
- `docdex_search`: Tier-1 local (Tantivy \+ libs\_index) search; returns ranked snippets with source metadata.
- `docdex_web_research`: Waterfall gate checks `web_trigger_threshold`; on low confidence or explicit force, performs DDG discovery \+ guarded headless Chrome fetch \+ readability; ingests cache per repo before responding.
- `docdex_memory_save`: persists text \+ metadata into per-repo `memory.db` (sqlite-vec).
- `docdex_memory_recall`: semantic recall via Ollama embeddings scoped to repo memory.
- Error handling: unknown repo path/id → explicit error; missing index/memory → instruct to `index --repo` or enable memory; web disabled/offline → clear message.
- Interactions: MCP server delegates to Repo Manager (fingerprint resolution), Waterfall orchestrator, Memory service, Web cache, and Token budgeter to assemble context before tool responses.
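An example `tools/call` payload for `docdex_search`, built with `serde_json`. The JSON-RPC envelope follows MCP conventions and `project_root` matches the requirement above, while the `query`/`limit` argument names are assumptions for illustration.

```rust
// Sketch: what a client-side MCP tool invocation for docdex_search might carry.
use serde_json::json;

fn docdex_search_request(project_root: &str, query: &str) -> serde_json::Value {
    json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "docdex_search",
            "arguments": {
                "project_root": project_root, // required unless initialize set a default
                "query": query,               // assumed argument name
                "limit": 10                   // assumed argument name
            }
        }
    })
}

fn main() {
    let req = docdex_search_request("/path/to/repo", "waterfall orchestrator");
    println!("{}", serde_json::to_string_pretty(&req).unwrap());
}
```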
Behavior and constraints
- Repo scoping: MCP tools require `project_root`/`repo_path` unless `initialize` sets a default; shared HTTP/SSE sessions use `initialize` to mount and pin a repo id; Repo Manager guards isolation.
- Local-first: web tier only on confidence drop (\<`web_trigger_threshold`) or explicit request; DDG HTML \+ headless Chrome with rate limits (≥2s search, ≥1s fetch) and browser guard to avoid zombies.
- Security: default localhost bind; `--expose` requires token on MCP requests; no telemetry or paid APIs.
- Reliability: startup `docdexd check` validates MCP enabled, config perms, Ollama, Chrome, and repo registry; clear failures if dependencies missing.
- Performance: must sustain ≥8 concurrent repos via multiple per-repo daemons; local search p95 \<50ms target; memory and libs treated as Tier-1 support.
- Observability: log tool invocations with repo id, tier selection (local/web/memory), and errors; no additional metrics mandated in PDR.
- DevOps: no new deployment surface; MCP shipped with daemon; uses same config/state layout (`~/.docdex/state/repos/<fingerprint>/...`).
Assumptions
- MCP uses existing HTTP binding/port allocation from daemon; no separate port negotiation described.
- Ollama embeddings/models are reachable locally before MCP tools execute memory operations.
Open Questions & Risks
- Should MCP reject calls when `enable_mcp=false` with a distinct error code vs generic not-found?
- How to signal rate-limit/backoff to clients (tool error vs structured retry hint)?
- Risk: headless Chrome availability impacts `docdex_web_research`; must propagate actionable error instead of silent fallback.
Verification Strategy
- `docdexd check` confirms MCP enabled, dependencies (Ollama, Chrome), repo registry, and config RW.
- Tool-level tests: ensure each tool errors on missing repo/index, respects repo isolation, and enforces confidence gate for web tier.
- Concurrency tests: operate against ≥8 repos across per-repo daemons; verify no cross-repo data leakage.
- Web safety tests: validate DDG/search delays, Chrome guard, and cache reuse; confirm clear errors when offline.
### Local Dependencies
Docdex relies solely on locally managed, zero-cost components for LLM/embeddings, web discovery/fetch, and code intelligence. This section defines how Ollama, headless Chrome, DuckDuckGo HTML discovery, and Tree-sitter are integrated to preserve local sovereignty, reliability, and repo isolation.
**Components and Roles**
- LLM provider: configured via `[llm]` in `config.toml` (default `ollama`) with hardware-aware model guidance and token budgeting handled upstream.
- DuckDuckGo HTML discovery: search-only HTML endpoint used for web queries; enforces ≥2s between queries.
- Headless Chrome: fetch and readability extraction for discovered URLs; guarded lifecycle to avoid zombie processes; respects per-domain ≥1s fetch delay and 15s page timeout defaults (rate-limit sketch after this list).
- Tree-sitter: language parsers (Rust, TypeScript/JavaScript, Python, Go, Java, C#, C/C++, PHP, Kotlin, Swift, Ruby, Lua, Dart) for symbol extraction during `index`; outputs stored in per-repo `symbols.db`.
- Impact graph resolution (best-effort): import edges resolve static patterns including literal import strings, string concatenation with constant bindings, static path joins (`path.join`, `path.resolve`, `os.path.join`), template strings or f-strings with static bindings (multiple candidates use a deterministic tie-break), Python `importlib.import_module`, `importlib.util.spec_from_file_location`, `importlib.machinery.SourceFileLoader`, and Rust `mod`/`use`/`include!`. Unresolved dynamic imports are skipped and recorded in impact diagnostics.
- Import hints: `docdex.import_map.json` supports mapping overrides and pattern expansions (`targets` + `expand`); runtime traces can be supplied via repo-root `docdex.import_traces.jsonl` or `<repo-state-root>/import_traces.jsonl` (toggle with `[code_intelligence].import_traces_enabled` or `DOCDEX_ENABLE_IMPORT_TRACES`). Dynamic import scan limits can be tuned via `[code_intelligence].dynamic_import_scan_limit` or `DOCDEX_DYNAMIC_IMPORT_SCAN_LIMIT`.
- Parser drift policy: when stored Tree-sitter parser versions differ from the running build, Docdex invalidates symbols/AST, sets `symbols_reindex_required`, and `GET /v1/symbols`/`GET /v1/ast` return `409 stale_index` until reindex. Drift metadata is exposed via `GET /v1/symbols/status` and `docdexd symbols-status`.
- AST search surface: `GET /v1/ast/search` accepts `kinds` (node kinds), `mode` (`any|all`), and `limit` to list files matching AST criteria for richer code intelligence queries.
- AST query surface: `POST /v1/ast/query` accepts `kinds`, optional `name`/`field`/`pathPrefix`, and `mode`/`limit`/`sampleLimit` to return per-file match counts plus sample nodes.
- Ranking signals: symbol/AST boosts apply per-kind weights and optional name matches; enable/disable via `[search].symbol_ranking_enabled`, `[search].ast_ranking_enabled`, `[search].chat_symbol_ranking_enabled`, `[search].chat_ast_ranking_enabled` or env overrides.
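A standard-library sketch of the rate limits above (≥2s between DuckDuckGo queries, ≥1s between fetches to the same domain). The struct and method names are illustrative and the sketch ignores backoff and the blocklist.

```rust
// Illustrative per-domain rate limiter; not the actual DiscoveryService code.
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

struct RateLimiter {
    search_delay: Duration,               // >= 2s between DDG searches
    fetch_delay: Duration,                // >= 1s per domain
    last_search: Option<Instant>,
    last_fetch: HashMap<String, Instant>, // keyed by domain
}

impl RateLimiter {
    fn new() -> Self {
        Self {
            search_delay: Duration::from_secs(2),
            fetch_delay: Duration::from_secs(1),
            last_search: None,
            last_fetch: HashMap::new(),
        }
    }

    fn wait_for_search(&mut self) {
        if let Some(prev) = self.last_search {
            let elapsed = prev.elapsed();
            if elapsed < self.search_delay {
                sleep(self.search_delay - elapsed);
            }
        }
        self.last_search = Some(Instant::now());
    }

    fn wait_for_fetch(&mut self, domain: &str) {
        if let Some(prev) = self.last_fetch.get(domain) {
            let elapsed = prev.elapsed();
            if elapsed < self.fetch_delay {
                sleep(self.fetch_delay - elapsed);
            }
        }
        self.last_fetch.insert(domain.to_string(), Instant::now());
    }
}
```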
**Interactions and Data Flow**
- Waterfall Tier-2 (web) path: Local search confidence below `web_trigger_threshold` or explicit user request → DuckDuckGo discovery (rate-limited) → headless Chrome fetch with readability → raw HTML \+ cleaned JSON cached under `cache/web/` → ingested per repo when merged into context.
- Library docs path: Dependency detectors resolve docs URLs → headless Chrome fetch with same guardrails → cached under `cache/libs/<ecosystem>/<pkg>/` → ingested into per-repo `libs_index`, treated as Tier-1 support content.
- Indexing path: `docdexd index --repo` invokes Tree-sitter to emit symbols (name, kind, file\_path, line\_start/line\_end, signature) into `symbols.db`; Repo Manager ensures per-repo isolation via SHA256 fingerprinted paths.
- Diff-aware RAG path: when diff inputs are provided (CLI diff flags or `docdex.diff` in `/v1/chat/completions`), collect git diff (working tree, staged, or range; optionally path-scoped), expand to 1-hop dependencies using the impact graph, and assemble a diff context slice. Context ordering is Memory → Diff → Repo → Lib/Web with token budgets and drop logging. Dynamic imports resolve best-effort (literals, concatenations, static joins, template strings with static bindings, optional `docdex.import_map.json` hints); unresolved imports are skipped and reported via diagnostics/logs.
- LLM path: Completions and embeddings use the configured provider; local Ollama is the default. No implicit cloud fallback; `[llm]` warns if provider is unknown or missing required config.
**Operational Guardrails**
- Local-only by default: daemon binds to 127.0.0.1 unless `--expose` is set; when exposed, token auth enforced on HTTP/MCP surfaces.
- Resource controls: Chrome lifecycle guarded with locks and teardown; per-repo daemons close DB/index handles on shutdown.
- Caching behavior: Web/library caches are global but ingestion is repo-scoped; no cross-repo memory or symbol bleed.
- Observability: Dependency readiness surfaced via `docdexd check` (Ollama reachability, Chrome availability, model presence); additional telemetry not requested in PDR.
- Security: No paid APIs; no external egress beyond explicit web fetch; blocklist honored during discovery.
**Assumptions**
- Ollama binary is user-installed and present in PATH; SDS will not automate installation.
- Chrome/Chromium available locally and supports headless mode.
- Tree-sitter grammars for the supported languages are bundled or vendored; additional languages out of scope for this phase.
**Open Questions & Risks**
- What is the exact mechanism for Chrome guard/locks to ensure zero zombies under crash conditions?
- Do we need configurable per-domain rate limits beyond the stated defaults (≥1s fetch, ≥2s discovery)?
- How are Tree-sitter parser versions managed to avoid AST drift across releases?
- Fallback behavior if Chrome is unavailable (skip web tier vs. fail request) is not explicitly specified.
- Cache eviction/TTL policies for `cache/web` and `cache/libs` are not defined; risk of unbounded disk growth.
**Verification Strategy**
- `docdexd check` validates Ollama reachability/models, Chrome availability, repo registry, and bind availability; MCP spawn probe runs when `DOCDEX_CHECK_MCP_SPAWN=1` (timeout via `DOCDEX_CHECK_MCP_SPAWN_TIMEOUT_MS`).
- Rate-limit tests: assert ≥2s between DuckDuckGo queries and ≥1s between fetches; ensure errors/backoff on HTTP failures.
- Chrome lifecycle tests: start/stop under load and crash injection to confirm no lingering processes and lock cleanup.
- Tree-sitter extraction tests across supported languages to confirm symbols populated in `symbols.db` with correct spans.
- Web/library cache tests: fetch → cache → ingest per repo; verify no cross-repo contamination.
## Runtime, Deployment, and Operations {#runtime,-deployment,-and-operations}
**Intent**: Operate per-repo `docdexd` daemons that are localhost-bound by default, resource-disciplined, repo-scoped, and observable, while avoiding external costs and preventing browser/process leaks. Clustered/multi-tenant deployment modes are out of scope.
### Daemon Lifecycle and Binding
- Startup can run a preflight check (`--preflight-check` or `DOCDEX_PREFLIGHT_CHECK=1`): validates config readability/writability, state dirs, Ollama reachability/models, headless Chrome availability, repo registry, bind availability, and MCP readiness. Fails fast with actionable errors when enabled.
- Binding: default `127.0.0.1:3210`. `--expose` optional; when set, all HTTP/MCP surfaces require token auth (from env/config). No telemetry.
- Each per-repo daemon hosts one HTTP API and one MCP server; CLI and TUI connect locally. Run one daemon per repo.
- Chrome/browsers: headless lifecycle guarded; cleanup on exit/panic; locks under `state/locks/` to prevent concurrent zombie instances.
- Singleton daemon mode (install-and-forget): `docdexd daemon` runs one global daemon with a lockfile (`~/.docdex/daemon.lock`), mounts repos dynamically on initialize, and is auto-started from the CLI when needed.
### Resource and Concurrency Controls
- Repo lifecycle: per-repo daemon manages a single repo; DB/index handles close on shutdown.
- RAM/VRAM-aware LLM guidance; defaults tuned to keep idle daemon \<100MB, indexing \<1GB (configurable).
- Web access rate limits: ≥2s between DuckDuckGo searches; ≥1s per-domain fetch delay; page timeout default 15s; bounded Chrome concurrency.
- Token budgeting: \~10% system prompt, 20% memory, 50% repo/library/web, 20% generation buffer; lowest-priority snippets dropped first when over budget.
### Security and Privacy
- Local-first: offline by default; web escalation only on low confidence (`web_trigger_threshold`, default 0.7) or explicit request.
- Authentication: token required when `--expose` is used; reject unauthenticated remote HTTP/MCP calls. No paid APIs or telemetry; only local/open components (Ollama, DuckDuckGo HTML, headless Chrome).
- Data isolation: per-repo state under `state/repos/<fingerprint>/`; no cross-repo memory/index/DAG bleed; global caches are read/ingest-only per repo.
### Observability and Health
- Health/readiness via `docdexd check` output; includes Chrome guard status, model availability, repo registry, and bind availability.
- Logging: honor `log_level` from config; log rate-limit decisions, waterfall escalations, and browser lifecycle events.
- DAG and memory are per repo; exposed via CLI/HTTP/MCP only where specified—no additional telemetry channels.
### Configuration Management
- Config at `~/.docdex/config.toml`; auto-created with localhost defaults. Validates RW access to `global_state_dir`.
- Key sections enforced: `[core]`, `[llm]`, `[search]`, `[web]`, `[web.scraper]`, `[memory]`, `[server]`. Warn if `provider` is unknown or missing required config.
- State layout under `~/.docdex/state/` with repo fingerprints; includes `index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`, `cache/web`, `cache/libs`, `locks/`.
- No silent auto-install of Ollama; Playwright browser installs are opt-out and run via setup/auto-install. `llm-setup` provides guidance, and npm postinstall may prompt for explicit installs.
**Open Questions & Risks**
- Should health be exposed as a lightweight HTTP endpoint in addition to `docdexd check`? (Not specified in PDR.)
- Token distribution percentages: are they configurable or fixed constants?
- Chrome guard failure modes: is there a retry/backoff policy beyond startup validation?
- Risk: Resource exhaustion if many concurrent web requests despite rate limits—need clear error surfaces.
**Verification Strategy**
- Run `docdexd check` (or enable `--preflight-check`) to validate config/state/Ollama/Chrome/bind/MCP before serving.
- Concurrency tests: multiple per-repo daemons under load; ensure handles close on shutdown.
- Web safety: enforce DDG ≥2s interval and per-domain ≥1s delay; verify Chrome teardown and no zombie processes.
- Security: attempt exposed-mode calls without token → expect rejection; verify localhost-only bind by default.
- Token budgeting: construct over-budget requests to confirm lower-priority context is dropped first with logging.
- Isolation: concurrent per-repo daemons show no cross-repo data or memory bleed.
### Daemon Lifecycle and Binding
Per-repo `docdexd` processes expose one HTTP API and one MCP server each. Architectural intent: keep the daemon private by default (127.0.0.1:3210), allow optional exposure only with explicit user action and token auth, and ensure lifecycle guards prevent zombie processes or orphaned browser instances.
- **Process model**: One `docdexd` per repo; multi-repo access comes from running multiple daemons. No clustered multi-tenant mode (out of scope per PDR).
- **Singleton mode**: one global daemon (`docdexd daemon`) with a lockfile at `~/.docdex/daemon.lock` and dynamic repo mounting; the CLI pings and auto-starts the daemon as needed.
- **Default binding**: Bind to `127.0.0.1:3210` from `[server] http_bind_addr`. MCP enabled by default (`enable_mcp=true`), sharing the same bind/interface posture; override via `DOCDEX_ENABLE_MCP` or `--disable-mcp`.
- **Exposed mode**: `--expose` (or equivalent config override) permits non-localhost binding; requires token authentication provided via env/config. The token is enforced on HTTP and MCP requests when exposed; unauthenticated requests are rejected (see the validation sketch after this list).
- **Startup validation**: `docdexd check` ensures the bind address is free, permissions on `global_state_dir` are valid, Ollama and headless Chrome are reachable, and MCP can start when spawn checks are enabled (`DOCDEX_CHECK_MCP_SPAWN=1`). Preflight mode forces MCP spawn checks when MCP is enabled.
- **Shutdown/guard rails**: Browser guard ensures headless Chrome is started/stopped cleanly; lock directories under `~/.docdex/state/locks/` prevent zombie Chrome processes. On panic/exit, ensure teardown routines run to avoid lingering processes.
- **Resource limits (relevant here)**: DB/index handles are closed on shutdown; Chrome fetch concurrency is bounded; timeouts apply (page load \~15s).
- **Security posture**: Local-only by default; zero telemetry; no paid APIs. When exposed, token auth is mandatory; otherwise reject. No other authentication modes are specified in PDR.
- **Observability**: PDR does not request additional logging/tracing specifics here beyond readiness checks and error surfacing on failed binds or missing dependencies.
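A validation sketch for the binding/exposure rules above, assuming `serde` for config parsing. Only `http_bind_addr` is a documented key here; the `expose` and `auth_token` field names are assumptions, and the token mechanism itself is an open question below.

```rust
// Illustrative config shape and startup validation; not the shipped schema.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ServerConfig {
    #[serde(default = "default_bind")]
    http_bind_addr: String,     // default 127.0.0.1:3210
    #[serde(default)]
    expose: bool,               // assumed config mirror of --expose
    auth_token: Option<String>, // assumed key; source (env vs config) is open
}

fn default_bind() -> String {
    "127.0.0.1:3210".to_string()
}

fn validate(server: &ServerConfig) -> Result<(), String> {
    let is_loopback = server.http_bind_addr.starts_with("127.0.0.1")
        || server.http_bind_addr.starts_with("localhost");
    if !server.expose && !is_loopback {
        return Err("non-loopback bind requires --expose".into());
    }
    if server.expose {
        match &server.auth_token {
            Some(t) if !t.trim().is_empty() => {}
            _ => return Err("--expose requires a non-empty auth token".into()),
        }
    }
    Ok(())
}
```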
Open Questions & Risks
- How is the expose token configured/rotated (env var vs config field) and is reload without restart needed?
- Should daemon refuse `--expose` when token missing/blank, and how is this surfaced to callers?
- Are there OS-specific bind constraints (e.g., Windows loopback) that need explicit handling?
- What is the expected behavior when MCP is disabled but HTTP is enabled (and vice versa) in exposed mode?
Verification Strategy
- `docdexd check` confirms bind availability, token presence when `--expose`, MCP readiness, Ollama/Chrome availability, and state directory permissions.
- Integration tests: start daemon on default localhost, assert HTTP/MCP reachable only locally; start with `--expose` \+ token, assert remote access works with token and fails without.
- Lifecycle tests: start/stop daemon repeatedly with web fetches to ensure Chrome processes are cleaned up; verify locks directory is empty after shutdown.
- Resource tests: repeated start/stop cycles confirm handles close and daemon remains responsive.
### Resource and Concurrency Controls
Architectural intent: keep a per-repo `docdexd` responsive on commodity machines by bounding repo footprint, browser usage, and external fetch rates while preventing resource bleed across repos.
**Repo lifecycle controls**
- Per-repo daemon manages a single repo; DB/index handles close on shutdown.
- Fingerprinted per-repo state under `state/repos/<fingerprint>/` ensures isolation; cross-repo access is rejected early when repo id/path missing.
- Daemon must return clear errors when repo context is missing or invalid.
**Browser and web fetch controls**
- Headless browser guarded by a lifecycle manager: bounded concurrency (configurable, tied to `[core].max_concurrent_fetches`/web scraper settings), per-page load timeout (default 15s), and teardown to avoid zombie processes; the locks directory serializes guard state (lockfile sketch after this list).
- Browser discovery supports Chrome/Chromium/Edge/Brave/Vivaldi on macOS/Windows; Playwright auto-installs Chromium when none is found and persists the resolved path.
- Discovery rate limits: DuckDuckGo HTML queries spaced ≥2s apart; at most one fetch per second per domain (≥1s delay); backoff on HTTP errors; respect the blocklist; reuse the cache to avoid redundant fetches.
- Scraper uses readability cleanup; caches raw HTML \+ cleaned JSON under `cache/web/` with TTL from config.
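A lockfile-based guard sketch using only the standard library. The lock filename and PID recording are illustrative, and crash recovery would still need a stale-lock sweep (e.g., during `docdexd check`).

```rust
// Illustrative browser guard: hold an exclusive lockfile while Chrome runs.
use std::fs::{remove_file, OpenOptions};
use std::io::Write;
use std::path::PathBuf;

struct BrowserGuard {
    lock_path: PathBuf,
}

impl BrowserGuard {
    /// Acquire a lock under ~/.docdex/state/locks/; fails if another guard holds it.
    fn acquire(locks_dir: &std::path::Path) -> std::io::Result<Self> {
        std::fs::create_dir_all(locks_dir)?;
        let lock_path = locks_dir.join("chrome.lock"); // assumed filename
        let mut file = OpenOptions::new()
            .write(true)
            .create_new(true) // atomic: errors if the lock already exists
            .open(&lock_path)?;
        writeln!(file, "{}", std::process::id())?;
        Ok(Self { lock_path })
    }
}

impl Drop for BrowserGuard {
    fn drop(&mut self) {
        // Best-effort cleanup on normal exit or unwind; crash recovery needs
        // a separate stale-lock sweep.
        let _ = remove_file(&self.lock_path);
    }
}
```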
**Memory/CPU budgets**
- Idle daemon memory target \<100MB; indexing \<1GB (configurable). Hardware awareness informs model recommendations but does not change control surfaces here.
- Token budgeting enforces context mix priority (Memory \> Repo \> Library/Web) to prevent overflow; lowest-priority snippets dropped first with logging.
**Concurrency surface interactions**
- Waterfall orchestrator must honor `web_trigger_threshold` gate; web tier only invoked on low-confidence or explicit request, reducing unnecessary Chrome usage.
- MCP/CLI require repo selection; HTTP defaults to the daemon repo; prevents accidental cross-repo operations.
**Observability hooks**
- `docdexd check` validates Chrome availability, repo registry health, and config/state RW.
- Logging for rate-limit throttling, Chrome guard actions, and token-dropping decisions; metrics collection beyond logs not requested in PDR.
**Security posture**
- Localhost bind by default; exposing (`--expose`) requires token auth. Resource controls apply regardless of exposure; no telemetry or paid API usage.
**DevOps/scalability**
- No clustered multi-tenant mode; scale by running per-repo daemons and tuning fetch concurrency. State layout must remain upgrade-safe; Ollama installs remain prompt-based and Playwright browser installs are opt-out.
**Assumptions**
- Config provides knobs for fetch concurrency, rate delays, and timeouts; defaults match PDR.
- Cache directories are writable and shared across repos but ingestion remains per-repo to avoid cross-contamination.
**Open Questions & Risks**
- Should per-repo daemons expose more detailed resource telemetry for tuning?
- How to surface rate-limit/backoff status to clients (HTTP/MCP error codes vs. logs only)?
- Risk: misconfigured Chrome path or permissions could bypass guard and leave zombies; mitigation relies on `check` coverage.
**Verification Strategy**
- Unit/integration: Repo Manager handles concurrent access across per-repo daemons; verify handles closed cleanly.
- Rate-limit tests: enforce ≥2s DDG spacing and the ≥1s per-domain fetch delay; ensure cache hits skip delays.
- Browser guard tests: spawn/fetch/timeout cycles without orphaned Chrome processes.
- Token budgeting tests: confirm lower-priority snippets are dropped first with logs emitted.
- `docdexd check`: validates Chrome, repo registry, state RW, and config defaults.
### Security and Privacy
Docdex enforces local-first, zero-cost operation with explicit controls when the daemon is exposed. Security posture is intentionally minimal: bind to localhost by default, require a token if remote exposure is enabled, and avoid any telemetry or paid/cloud dependencies.
- **Network exposure**: `docdexd` binds to `127.0.0.1:3210` by default. Running with `--expose` (or equivalent config) requires a token; HTTP and MCP requests must present it or are rejected. No multi-tenant daemons; one daemon per repo.
- **Authentication & authorization**: Single shared bearer-style token validated on all HTTP/MCP endpoints when exposed. No role model or per-repo ACLs in scope; all authorization is coarse-grained (token holder \= allowed). Token configured via env/config; no additional identity providers (see the token check sketch after this list).
- **Data residency & locality**: All inference, embeddings, search, and state are local by default; no telemetry. Only zero-cost/open components (Ollama, Tantivy, DuckDuckGo HTML, headless Chrome, sqlite-vec, Tree-sitter) are permitted. Web access is gated (confidence-based or explicit) and cached locally. No cloud/vector DBs, no paid APIs.
- **Repo isolation**: Per-repo state under `~/.docdex/state/repos/<fingerprint>/` (indexes, memory.db, symbols.db, dag.db, impact_graph.json, libs\_index). Global caches (`cache/web`, `cache/libs`) are shared storage but ingested per repo without cross-contamination.
- **Process/browsing safeguards**: Headless Chrome guarded with locks and lifecycle checks to avoid zombie processes; rate limits enforced to reduce abuse risk. Locks directory under state for browser/process guards.
- **Configuration defaults**: Auto-created config favors privacy: localhost bind, Ollama provider, MCP enabled locally. Warnings if LLM provider differs from Ollama. Ollama installs remain prompt-based; Playwright browser installs are opt-out and can run via setup/auto-install.
- **Logging/observability**: PDR does not request telemetry; assume minimal local logs only. No remote log shipping described. Optional state logs via `DOCDEX_LOG_TO_STATE=1` write to `~/.docdex/state/logs/docdexd-<pid>.log`.
- **Dependencies**: Open-source/local-only; no paid keys. DuckDuckGo HTML for discovery; local Chrome for scraping; Ollama for LLM/embeddings.
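A token-check sketch for exposed mode. The `Authorization: Bearer <token>` format is an assumption (the header/key format is listed as an open question below), and the comparison avoids early exit to limit timing leakage.

```rust
// Illustrative request authorization for exposed mode; header format assumed.
fn token_matches(expected: &str, presented: &str) -> bool {
    let (a, b) = (expected.as_bytes(), presented.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    // Fold over all bytes instead of returning on the first mismatch.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn authorize(expose: bool, expected_token: Option<&str>, auth_header: Option<&str>) -> bool {
    if !expose {
        return true; // localhost-only mode: no token required
    }
    let Some(expected) = expected_token else {
        return false; // exposed without a configured token: reject everything
    };
    match auth_header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => token_matches(expected, presented),
        None => false,
    }
}
```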
Open Questions & Risks
- Should TLS be supported/required when `--expose`? Not specified.
- Token storage hardening (file perms, rotation) not defined.
- No per-repo auth/ACLs; token grants full access—acceptable given PDR scope?
- How to handle malformed/absent tokens on MCP surface (HTTP semantics vs MCP error codes)?
- Browser sandboxing/SELinux/AppArmor not described; potential hardening gap.
Verification Strategy
- `docdexd check` confirms localhost bind (unless `--expose`), token requirement when exposed, and absence of telemetry/cloud calls.
- Automated tests to assert all API/MCP calls fail without token when exposed and succeed with valid token.
- Tests to confirm no network egress occurs for local-only operations; web access only when triggered and cached locally.
- Repo isolation tests: ensure per-repo state separation without cross-repo leakage.
- Chrome guard tests: locks and cleanup prevent zombie processes.
### Observability and Health
Docdexd must surface readiness and dependency health so operators can trust local-first behavior without external telemetry. Observability centers on explicit checks and guarded lifecycles (especially headless Chrome), with clear failure surfaces instead of silent degradation.
**Operational Health Model**
- `docdexd check` runs at install/startup or on demand; validates config/state RW, Ollama reachability/models, Chrome availability, repo registry integrity, HTTP bind, MCP enablement, and enforces local bind unless `--expose` is set. Output: human-readable failures \+ actionable hints; non-zero exit on any failed prerequisite (report shape sketched after this list).
- Dependency validation: confirms provider-specific dependencies (Ollama binary/models when configured), confirms Chrome binary present/launchable in headless mode with page timeout guard; warns (not fails) if the provider is unknown.
- Browser guard: lifecycle locks under `state/locks/`; enforces teardown on exit/panic; caps concurrent Chrome sessions; rejects new sessions when caps hit with clear error; ensures no zombie Chrome processes remain post-operations.
- Repo readiness: `check` verifies per-repo state existence and permissions; unknown/unindexed repos return clear errors across CLI/HTTP/MCP.
- Waterfall guardrails: confidence gating before web fetch; respects rate limits; logs when escalations occur and when token budget forces snippet dropping (memory \> repo \> libs/web).
- Security posture: `check` confirms default bind `127.0.0.1` and token presence when `--expose` is used; rejects unauthenticated remote calls.
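A sketch of the check-report shape implied above: human-readable results with hints and a non-zero exit on any failure. The struct, check names, and output format are illustrative, not the actual `docdexd check` output.

```rust
// Illustrative readiness report; real checks would probe config, Ollama, Chrome, etc.
struct CheckItem {
    name: &'static str,
    ok: bool,
    hint: &'static str,
}

fn report(items: &[CheckItem]) -> i32 {
    let mut failures = 0;
    for item in items {
        if item.ok {
            println!("OK    {}", item.name);
        } else {
            failures += 1;
            println!("FAIL  {} -- {}", item.name, item.hint);
        }
    }
    if failures == 0 { 0 } else { 1 } // non-zero exit blocks daemon startup
}

fn main() {
    let items = [
        CheckItem { name: "config/state RW", ok: true, hint: "check permissions on ~/.docdex" },
        CheckItem { name: "ollama reachable", ok: true, hint: "start ollama or fix [llm] base URL" },
        CheckItem { name: "headless chrome", ok: false, hint: "install Chrome/Chromium or rerun setup" },
    ];
    std::process::exit(report(&items));
}
```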
**Observability Surfaces**
- Logs: structured, human-readable; emit at least INFO for readiness and dependency checks, WARN for degradations (e.g., missing optional models), ERROR for failed prerequisites. No telemetry export; local only.
- Error surfaces: CLI/HTTP/MCP return repo-scoped, explicit messages (missing repo/index, missing models, offline web, Chrome not available, cap reached). No new interfaces beyond existing CLI/HTTP/MCP.
- Metrics/tracing: not requested in PDR; intentionally out of scope.
**Operations and Reliability**
- Start-up gate: daemon fails fast if `check` prerequisites are not met, preventing partial service.
- Resource discipline: bounded Chrome concurrency with timeouts; rejects work rather than hang.
- Offline-first: web dependence is optional; failures in web tier degrade gracefully to local responses with logs explaining the fallback.
**Assumptions**
- Operators run `docdexd check` during install/startup CI; logs retained locally.
- No external monitoring/telemetry is added beyond logs; acceptable per PDR.
- Locks directory is available and writable under `global_state_dir`.
**Open Questions & Risks**
- Should `check` optionally auto-clean zombie Chrome processes on detection vs. only fail?
- What is the default cap for concurrent Chrome sessions, and should it align with `max_concurrent_fetches`?
- Failure mode when `global_state_dir` is on slow/remote FS—do we warn or fail?
**Verification Strategy**
- Automated `docdexd check` must fail on missing Chrome/Ollama/model, missing locks dir, or unwritable state; inspect exit codes and log lines.
- Intentional Chrome crash test: confirm guard cleans up processes/locks.
- Concurrency test: exceed Chrome session cap and verify clear error and no zombie processes.
- Repo error handling: request with unknown repo over CLI/HTTP/MCP returns explicit, repo-scoped error.
- Web tier failure injection (network disabled): ensure local tier responds with logged WARN, not crash.
### Configuration Management
Configuration ensures `docdexd` starts with safe, local-first defaults, validates writable state paths, and guides hardware-aware model choices without adding new surfaces.
Defaults and creation
- Global config `~/.docdex/config.toml` auto-created on first run with localhost bind, Ollama-only LLM settings, and default thresholds (e.g., `web_trigger_threshold=0.7`).
- State root `~/.docdex/state/` structured as in PDR (per-repo `index`, `libs_index`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`; shared `cache/web`, `cache/libs`, `locks`). Paths derived from SHA256 of normalized repo path; reject any path not under the fingerprinted root.
Validation and safeguards
- On startup and via `docdexd check`, validate `global_state_dir` readability/writability, per-repo RW on demand, and that bindings remain on `127.0.0.1` unless explicitly exposed with token auth.
- Emit warning if `[llm].provider` is unrecognized or missing required settings; non-Ollama providers are permitted when configured.
- Validate HTTP bind address format, MCP enablement flag, and that Ollama base URL is reachable when configured.
- Ensure scraper/chrome settings exist but only report availability here; full browser lifecycle is covered elsewhere.
Hardware-aware model recommendations
- Detect RAM/VRAM at `llm-list`/`llm-setup` time; filter `llm_list.json` per thresholds: RAM \<8GB → ultra-light only; ≥16GB → default `phi3.5:3.8b`; ≥32GB with GPU → recommend `llama3.1:70b` if installed (threshold sketch after this list).
- Never auto-install models silently; only suggest pulls and update `[llm]` defaults upon explicit confirmation.
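The threshold rule above as a sketch. The 8–16GB band and the concrete catalog live in `llm_list.json`, so the fallback string and the function signature here are illustrative only.

```rust
// Illustrative recommendation filter; the real logic reads llm_list.json.
fn recommend_model(ram_gb: u64, has_gpu: bool, llama70b_installed: bool) -> &'static str {
    if ram_gb < 8 {
        "ultra-light models only"
    } else if ram_gb >= 32 && has_gpu && llama70b_installed {
        "llama3.1:70b"
    } else if ram_gb >= 16 {
        "phi3.5:3.8b"
    } else {
        // 8-16GB band is not pinned down above; defer to llm_list.json filtering.
        "filter llm_list.json by available RAM"
    }
}
```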
Scope boundaries
- No config is stored per repo beyond fingerprinted state layout; cross-repo/global overrides out of scope per PDR.
- No telemetry, paid providers, or cloud fallbacks introduced.
Open Questions & Risks
- How to surface granular permission errors on `global_state_dir` vs per-repo dirs without leaking host paths?
- What is the precise failure mode if the configured provider is unreachable (block startup vs warn)?
- Risk: misconfigured `--expose` without token enforcement; ensure config validation blocks this.
Verification Strategy
- Unit tests for config parsing/creation with missing file → defaults applied.
- Integration test: `docdexd check` fails on non-writable `global_state_dir`; warns on non-Ollama provider.
- Hardware detection tests: model recommendations align with RAM/VRAM thresholds and do not trigger pulls.
## Quality, Testing, and Risks {#quality,-testing,-and-risks}
Docdex v2.0 quality is enforced through phase-gated validation tied to local-first, zero-cost constraints and per-repo daemon isolation. Each gate proves the daemon can safely serve a repo with correct scoping, guarded web escalation, and required local dependencies (Ollama, headless Chrome) before advancing.
**Phase Gates**
- Phase 0: `docdexd check` validates config RW, state layout, Ollama/Chrome presence, repo registry, localhost bind, MCP enabled.
- Phase 1: `index --repo` builds per-repo source index; `chat --repo` answers from local snippets; `llm-list`/`llm-setup` functional with hardware-aware model guidance.
- Phase 2: Waterfall uses `web_trigger_threshold`; `web-search`, `web-fetch`, `web-rag` operate with DDG ≥2s spacing, ≥1s fetch delay, Chrome guarded/cleaned.
- Phase 2.1: `libs fetch --repo` detects Rust/Node/Python deps, caches, ingests into repo `libs_index`; chat grounded on ingested docs.
- Phase 3/3.5: `/v1/chat/completions` defaults to daemon repo, budgets tokens, streams; `memory_store/recall` isolated per repo.
- Phase 4: DAG nodes logged per session; `dag view --repo` renders text/DOT.
- Phase 5: Per-repo MCP server exposes repo-aware tools; errors clearly on unknown/unindexed repo.
- Phase 6: Symbols populated; impact API returns deps; `run-tests --repo` emits structured JSON; diff-aware summary produced.
- Phase 7: TUI repo switch, dashboard tabs, VSCode extension always passes `repo_path`.
**Test Coverage Focus**
- Isolation: per-repo state dirs under concurrency; reject missing/unknown repo; no cross-repo memory/index bleed.
- Local-first/no cost: no paid/external APIs beyond gated web; default localhost bind; token auth required when exposed.
- Waterfall correctness: confidence gate honored; source priority Memory \> Repo \> Library/Web; token budgeting enforces drop order with logging.
- Scraper safety: DDG spacing/backoff, fetch delay, cache TTL, readability cleanup, Chrome lifecycle guards, zero zombie processes.
- Library ingestion: dependency detection for Rust/Node/Python; cache reuse; libs treated as Tier-1.
- Performance: local search p95 \<50ms (\<20ms typical); resource caps respected (Chrome concurrency).
- Security: token required when `--expose`; reject unauthenticated HTTP/MCP calls; clear errors for missing models/Chrome.
**Open Questions & Risks**
- How to simulate adverse network (DDG throttling) within CI to validate backoff logic?
- Resource ceilings per hardware tier (RAM/VRAM) for concurrent repos beyond defaults need confirmation.
- Potential drift between CLI and HTTP/MCP repo routing semantics; need contract tests.
- Risk of token budget mis-sizing for large memory \+ libs contexts; requires guardrails and logging thresholds.
- Handling of partial dependency graphs (non-Rust/Node/Python ecosystems) not defined.
**Verification Strategy**
- Gate-by-gate acceptance using the Phase list above; block progression on failure.
- Automated integration tests per repo for: isolation, waterfall gate behavior, token auth when exposed, dependency detection and libs ingestion, Chrome lifecycle under stress.
- Performance benchmarks for local search latency and resource usage across 8+ concurrent repos.
- Fault injection: simulate missing Ollama/model, missing Chrome, slow/banned DDG responses.
- Security checks: enforce repo selection on all surfaces; ensure no paid API calls; localhost bind by default; token required when exposed.
### Phase Gates {#phase-gates}
Docdex advances through gated phases; each gate requires the preceding functionality to be demonstrably ready before enabling downstream features (local RAG → web waterfall → libs ingestion → unified API/memory → DAG → MCP → code intelligence → UI surfaces). Gates emphasize repo isolation, offline-first defaults, and deterministic promotion criteria.
**Gate Criteria by Phase**
- Phase 0 (Foundation): `docdexd check` passes; config/state RW validated; Ollama and headless Chrome availability confirmed.
- Phase 1 (Local RAG/Chat): `index --repo` builds per-repo Tantivy index; `chat --repo` serves answers from local snippets; `llm-list` hardware detection and `llm-setup` guidance functional.
- Phase 2 (Web Intelligence): Waterfall only triggers web when local confidence \< `web_trigger_threshold` or forced; `web-search`, `web-fetch`, `web-rag --repo` operate with DuckDuckGo HTML, headless Chrome, readability cleanup, enforced rate limits, and Chrome guard (no zombies).
- Phase 2.1 (Library Context): `libs fetch --repo` detects Rust/Node/Python deps, resolves docs URLs, scrapes, caches under `cache/libs`, ingests into per-repo `libs_index`; chat answers grounded in cached library docs.
- Phase 3/3.5 (Unified API \+ Memory): `/v1/chat/completions` defaults to the daemon repo (body/header/query optional), budgets tokens, streams via Ollama; per-repo `memory_store/recall` on `memory.db` with sqlite-vec embeddings; memory prioritized in context merge.
- Phase 4 (Reasoning DAG): Per-repo `dag.db` logging UserRequest/Thought/ToolCall/Observation/Decision; `dag view --repo <session_id>` renders text/DOT.
- Phase 5 (MCP): Per-repo MCP server exposes repo-aware tools (`docdex_search`, `docdex_web_research`, `docdex_memory_save/recall`); unknown/unindexed repo yields clear error.
- Phase 6 (Code Intelligence): Tree-sitter symbols for Rust/TypeScript/JavaScript/Python/Go/Java/C#/C/C++/PHP/Kotlin/Swift/Ruby/Lua/Dart stored in `symbols.db`; import-graph impact API `GET /v1/graph/impact?file=` returns schema-tagged inbound/outbound deps with explicit edge direction semantics; `run-tests --repo --target` returns structured JSON; diff-aware RAG uses git diff \+ impact graph \+ memory.
- Phase 7 (UI Surfaces): TUI repo switcher via external `docdex-tui` binary; web dashboard + VSCode extension live in separate packages but target `/v1/chat/completions` and MCP, always passing `repo_path`.
**Scalability/Reliability/Security Notes**
- Scalability: Parallel repo operations come from multiple per-repo daemons; web tier rate limits and cache reuse guard against DDG/Chrome overload.
- Reliability: Browser guard lifecycle verified before web gate; token budgeting must prevent context overflow before unified API gate.
- Security: Default localhost bind enforced at each gate; `--expose` requires token; repo selection required on every surface; no paid/cloud calls.
**Observability/DevOps**
- Minimal logs: gate checks log failures for config, model presence, Chrome readiness, repo availability. Additional observability not requested in PDR.
**Assumptions**
- Local-only execution unless web fallback explicitly triggered or confidence gate trips.
- Cached library docs treated as Tier-1; global caches reused but ingested per repo to maintain isolation.
**Open Questions & Risks**
- What is the precise policy for handling partial gate failures (e.g., Chrome unavailable but local RAG passes)? Promote with warnings or block?
- How to surface rate-limit/backoff state to operators during web gate validation?
- Risk: multiple per-repo daemons could contend for shared caches or Chrome resources; needs test coverage.
- Risk: VSCode extension must reliably pass `repo_path`; missing arg could bypass repo scoping.
**Verification Strategy**
- Automated `docdexd check` for Phase 0; repeated on daemon startup.
- CLI acceptance per gate: index/chat (Phase 1), web-search/fetch/rag with rate-limit assertions (Phase 2), libs fetch with fixture deps (Phase 2.1), HTTP `/v1/chat/completions` and memory CRUD (Phase 3/3.5), `dag view` snapshots (Phase 4), MCP tool calls with and without valid repo (Phase 5), symbols \+ impact API \+ `run-tests` structured output (Phase 6), UI smoke tests for repo selection and chat wiring (Phase 7).
- Regression checks for repo isolation and no cross-repo data bleed at each gate.
### Test Coverage Focus {#test-coverage-focus}
Docdex v2.0 testing targets risk hot-spots: per-repo daemon isolation under concurrency, strict local-first behavior, waterfall gating correctness, scraper safety, and security posture tied to repo selection.
- Scope/Intent: Validate that phase-gated behaviors enforce local sovereignty and repo correctness; focus on concurrency isolation, gated web escalation, and secure surfaces (HTTP/MCP/CLI). No additional components beyond PDR surfaces.
- Coverage Priorities:
- Repo isolation: concurrent `docdexd` operations across ≥8 repos; ensure per-repo state (`index/`, `libs_index/`, `memory.db`, `symbols.db`, `dag.db`, `impact_graph.json`) stays isolated (a fingerprinting sketch follows this list).
- Local-first behavior: daemon binds 127.0.0.1 by default; no paid/external APIs; Ollama-only inference; confirm `--expose` path enforces token auth.
- Waterfall gating: Tier-1 local search preferred; escalation only when confidence \< `web_trigger_threshold` or explicitly forced; token budget priority (Memory \> Repo \> Library/Web) preserved.
- Scraper safety: DuckDuckGo discovery delay ≥2s; fetch delay ≥1s/domain; Chrome lifecycle guarded (no zombies); cache reuse honored.
- Security with repo-required parameters: CLI/MCP require repo id/path; HTTP defaults to daemon repo; unknown/unindexed repos return clear errors; `--expose` requires token on HTTP/MCP.
- Phase gates: Phase 0–7 readiness checks per PDR; each gate blocks progression until preceding behaviors validated.
- Out of Scope: New surfaces/tech not in PDR; cloud telemetry/tests.
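To make the isolation assertions concrete, the following minimal sketch derives a per-repo state directory from a path fingerprint; it assumes the fingerprint is a SHA-256 hex digest of the canonicalized repo path and uses an illustrative state root, so the real derivation and layout may differ in detail.

```rust
// Minimal sketch, assuming the repo fingerprint is a SHA-256 hex digest of the
// canonicalized repo path; the real derivation may differ.
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

/// Derive the per-repo state directory `<state_root>/repos/<fingerprint>/`.
fn repo_state_dir(state_root: &Path, repo_path: &Path) -> std::io::Result<PathBuf> {
    // Canonicalize so `./repo`, `repo/`, and symlinked paths map to one fingerprint.
    let canonical = repo_path.canonicalize()?;
    let digest = Sha256::digest(canonical.to_string_lossy().as_bytes());
    let fingerprint: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    Ok(state_root.join("repos").join(fingerprint))
}

fn main() -> std::io::Result<()> {
    // Illustrative state root; each repo gets its own directory for index/,
    // libs_index/, memory.db, symbols.db, dag.db, impact_graph.json.
    let dir = repo_state_dir(Path::new("/tmp/docdex-state"), Path::new("."))?;
    println!("{}", dir.display());
    Ok(())
}
```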
**Open Questions & Risks**
- How to simulate prolonged multi-repo churn across per-repo daemons without exceeding resource caps? (needs harness definition)
- Token auth test vectors for `--expose` not specified (strength/format).
- Headless Chrome availability in CI and deterministic timing for rate-limit tests.
- Confidence scoring fixtures for waterfall gate (ground truth/threshold tuning).
**Verification Strategy**
- Concurrency isolation: parallel `index`, `chat`, `libs fetch` across per-repo daemons; assert state stays under each repo's own fingerprint and that DB/index handles close cleanly.
- Local-first/security: `docdexd check` under `--expose` with/without token; assert bind address defaults; ensure no external paid calls (mock/deny outbound).
- Waterfall gating: instrument confidence scores; assert the web tier is skipped when the score ≥ threshold; validate the forced-escalation path; confirm token budget ordering via trace logs (a gating sketch follows this list).
- Scraper safety: timed DDG/search/fetch calls with enforced delays; Chrome lifecycle hooks verified for teardown; cache hit/miss cases covered.
- Repo-required parameters: negative tests for missing/unknown repo across CLI/HTTP/MCP; expect clear error codes/messages.
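The gating rule under test is small enough to state as code; the sketch below assumes a simple two-tier decision keyed off `web_trigger_threshold` and a `forced` flag, and the type and function names are illustrative.

```rust
// Minimal sketch of the escalation rule only; `web_trigger_threshold` mirrors
// the config key, everything else is illustrative.
#[derive(Debug, PartialEq)]
enum Tier {
    LocalIndexes, // Tier 1: source + cached library docs
    WebDiscovery, // Tier 2: gated DDG/Chrome enrichment
}

/// Escalate to the web tier only when Tier-1 confidence falls below the
/// configured threshold, or when the caller explicitly forces it.
fn next_tier(confidence: f32, web_trigger_threshold: f32, forced: bool) -> Tier {
    if forced || confidence < web_trigger_threshold {
        Tier::WebDiscovery
    } else {
        Tier::LocalIndexes
    }
}

fn main() {
    // Test vectors matching the gating assertions above.
    assert_eq!(next_tier(0.9, 0.6, false), Tier::LocalIndexes); // score >= threshold: no web tier
    assert_eq!(next_tier(0.4, 0.6, false), Tier::WebDiscovery); // low confidence escalates
    assert_eq!(next_tier(0.9, 0.6, true), Tier::WebDiscovery);  // forced escalation path
    println!("gating rule holds for sample scores");
}
```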
### Risks and Mitigations {#risks-and-mitigations}
Architectural intent: enforce local-first, per-repo daemon invariants while preventing runaway processes, web throttling issues, context mis-budgeting, resource exhaustion, missing dependencies, hallucinated outputs, and unintended exposure. Controls attach to the Repo Manager, Waterfall orchestrator, ScraperEngine, config validator (`docdexd check`), and auth/binding logic; no new surfaces beyond PDR.
- **Zombie Chrome**: ScraperEngine runs headless Chrome under a guarded lifecycle with lockfiles in `~/.docdex/state/locks/`; start/stop wrapped to ensure teardown on exit/panic; `docdexd check` asserts Chrome availability and stale process absence.
- **DuckDuckGo throttling**: DiscoveryService enforces ≥2s between DDG searches and ≥1s fetch delay per domain; blocklist applied; caches reused (`cache/web`); HTTP error backoff before retry.
- **Context overflow**: Waterfall prompt assembler performs token budgeting (10% system, 20% memory, 50% repo/libs/web, 20% generation buffer). Fixed priority ordering (Memory \> Repo \> Library/Web); lowest-priority snippets are dropped first, with drops logged for traceability (a budgeting sketch follows this list).
- **Resource exhaustion (browser)**: ScraperEngine bounds concurrent Chrome sessions; clear errors when caps reached instead of silent degradation.
- **Missing dependencies (Ollama/Chrome/models)**: `docdexd check` validates `ollama` availability, model presence, Chrome binary/path, and RW on `global_state_dir`. `llm-setup` offers guided install/pull instructions; no cloud fallback permitted.
- **Hallucinated APIs**: Library docs must be ingested via `libs fetch --repo`; prompts instruct model to rely on indexed repo/libs; Waterfall only escalates to web when below `web_trigger_threshold` or explicitly forced.
- **Security exposure**: HTTP/MCP bind to `127.0.0.1` by default; `--expose` requires token in config/env and is checked per request. No telemetry or paid APIs; reject unknown repo ids to avoid cross-repo leakage.
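A minimal sketch of the budgeting behavior, assuming a plain numeric token count and a single drop pass; the real assembler tokenizes with the model's tokenizer and enforces the per-category caps, but the ratios and drop order below follow the mitigation described above.

```rust
// Minimal sketch of the fixed-ratio budget split and drop order; the real
// assembler counts model tokens, this example just splits a numeric budget.
#[derive(Debug)]
struct Budget {
    system: usize,
    memory: usize,
    retrieval: usize, // repo + library + web snippets share this slice
    generation: usize,
}

/// Split the context window using the documented 10/20/50/20 ratios.
fn split_budget(context_window: usize) -> Budget {
    Budget {
        system: context_window * 10 / 100,
        memory: context_window * 20 / 100,
        retrieval: context_window * 50 / 100,
        generation: context_window * 20 / 100,
    }
}

/// Keep snippets in priority order (Memory > Repo > Library/Web) and drop the
/// lowest-priority ones first once the budget is exhausted, logging each drop.
fn fit_snippets(mut snippets: Vec<(&'static str, usize)>, budget: usize) -> Vec<&'static str> {
    // Lower number = higher priority; ordering assumed for the sketch.
    let priority = |kind: &str| match kind {
        "memory" => 0,
        "repo" => 1,
        _ => 2, // library/web
    };
    snippets.sort_by_key(|&(kind, _)| priority(kind));
    let mut used = 0;
    let mut kept = Vec::new();
    for (kind, tokens) in snippets {
        if used + tokens <= budget {
            used += tokens;
            kept.push(kind);
        } else {
            eprintln!("dropping {kind} snippet ({tokens} tokens) to stay within budget");
        }
    }
    kept
}

fn main() {
    let budget = split_budget(8192);
    println!("{budget:?}");
    // Simplified: treat memory + retrieval as one pool; the real assembler also
    // enforces the per-category caps above.
    let kept = fit_snippets(
        vec![("web", 3000), ("library", 1500), ("memory", 1000), ("repo", 1500)],
        budget.memory + budget.retrieval,
    );
    println!("kept (highest priority first): {kept:?}");
}
```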
**Open Questions & Risks**
- How should zombie Chrome be detected and cleaned up when a lockfile is present but its PID has been reused by another process? Precise PID/PPID validation is needed; one possible check is sketched after this list.
- Should DDG backoff escalate to disabling web tier for the session after repeated 429s to prevent bans?
- What is the policy when token budgeting repeatedly drops memory context—log-only or user-visible warning?
- Failure mode if `global_state_dir` is on a slow/readonly FS: degrade gracefully or block startup?
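One possible PID check for the first question, offered as a sketch rather than the shipped guard: it assumes a Linux host, a lockfile containing only the PID, and an illustrative lockfile path, and it only compares the process name (a fuller check would also compare PPID or process start time).

```rust
// One possible check, not the shipped implementation: validate that the PID in a
// Chrome lockfile still belongs to a Chrome process before trusting or killing it.
// Linux-only sketch; the lockfile path and its "<pid>\n" format are assumptions.
use std::fs;
use std::path::Path;

fn lockfile_pid_is_live_chrome(lockfile: &Path) -> bool {
    // Read the PID recorded by the guarded browser lifecycle.
    let pid: u32 = match fs::read_to_string(lockfile)
        .ok()
        .and_then(|s| s.trim().parse().ok())
    {
        Some(pid) => pid,
        None => return false, // unreadable/garbled lockfile => treat as stale
    };
    // /proc/<pid>/comm holds the executable name; if the PID was reused by an
    // unrelated process, the name will not match and the lock is stale.
    match fs::read_to_string(format!("/proc/{pid}/comm")) {
        Ok(comm) => comm.trim().contains("chrome"),
        Err(_) => false, // process gone => stale lock, safe to clean up
    }
}

fn main() {
    let lock = Path::new("/tmp/docdex-chrome.lock"); // hypothetical path for the sketch
    if lockfile_pid_is_live_chrome(lock) {
        println!("a live Chrome process still owns the lock");
    } else {
        println!("lock is stale or missing; safe to remove and restart the browser guard");
    }
}
```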
**Verification Strategy**
- `docdexd check`: validate Chrome process guard, lock directory health, RW perms, Ollama reachability/models, and config correctness.
- Rate-limit tests: simulate rapid DDG queries and assert the enforced delays, backoff behavior, and cache hits (a per-domain limiter sketch follows this list).
- Token budgeting tests: crafted large context ensuring Memory \> Repo \> Library/Web ordering and logged drops.
- Resource cap tests: repeated start/stop cycles confirm handle closure; exceed Chrome concurrency to ensure bounded queue/error.
- Security tests: bind exposure requires token; reject unknown repo ids; verify no telemetry or paid API calls.
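A std-only sketch of the per-domain delay rule these tests exercise; the real ScraperEngine is asynchronous and adds error backoff, so this shows only the minimum-interval behavior, with illustrative type names.

```rust
// Minimal sketch of the per-domain delay rule the rate-limit tests exercise:
// >=2s between DDG searches, >=1s between fetches to the same domain.
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

struct DomainLimiter {
    min_interval: Duration,
    last_hit: HashMap<String, Instant>,
}

impl DomainLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_hit: HashMap::new() }
    }

    /// Block until `min_interval` has elapsed since the last request to `domain`.
    fn wait_for(&mut self, domain: &str) {
        if let Some(last) = self.last_hit.get(domain) {
            let elapsed = last.elapsed();
            if elapsed < self.min_interval {
                sleep(self.min_interval - elapsed);
            }
        }
        self.last_hit.insert(domain.to_string(), Instant::now());
    }
}

fn main() {
    let mut ddg = DomainLimiter::new(Duration::from_secs(2)); // discovery: >=2s between searches
    let mut fetch = DomainLimiter::new(Duration::from_secs(1)); // fetch: >=1s per domain

    let start = Instant::now();
    ddg.wait_for("duckduckgo.com");
    ddg.wait_for("duckduckgo.com"); // second search waits ~2s
    fetch.wait_for("docs.rs");
    fetch.wait_for("docs.rs"); // second fetch to the same domain waits ~1s
    assert!(start.elapsed() >= Duration::from_secs(3));
    println!("enforced delays: {:?}", start.elapsed());
}
```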