symbols_store.md•11.3 KiB

# Symbols store (optional code intelligence)

Docdex can optionally extract per-file **symbols** during indexing and persist them in a repo-scoped symbols store. Downstream features (e.g. MCP clients) can query this store via a documented interface.

This document describes:

- How to enable/disable symbol extraction
- Symbols store location and lifecycle
- Query interfaces (MCP + internal Rust)
- Payload schema/versioning and stable identifiers
- How `ok` / `skipped` / `failed` outcomes are represented and how failures affect indexing

## Enablement (and default behavior)

Symbol extraction is **always enabled** during indexing.

- The legacy toggle `DOCDEX_ENABLE_SYMBOL_EXTRACTION` / `--enable-symbol-extraction` is deprecated and ignored (a warning is logged if set to false).
- Indexing and search always populate `symbols.db` alongside the Tantivy index.
- Impact graph extraction is tied to symbols and is always enabled as well.
- AST extraction uses the same enablement path and is always enabled for supported languages.

## Store location and lifecycle

### State directory

The symbols store lives under the **per-repo state root** (the parent of `index/`):

- Default: `~/.docdex/state/repos/<repo_id>/`
- Override: `--state-dir <path>` / `DOCDEX_STATE_DIR`
  - Relative paths are resolved under the repo root.
  - Absolute paths outside the repo are treated as shared bases and scoped under `<state-dir>/repos/<repo_id>/index`.
- Legacy in-repo state (opt-in): `--state-dir .docdex/index` or `--state-dir .gpt-creator/docdex/index`

### Symbols store path and layout

When enabled, the symbols store is a **SQLite database** at:

`<repo-state-root>/symbols.db`

Tables (v4):

- `symbols_meta (key TEXT PRIMARY KEY, value TEXT)`
  - `schema_version` is stored here.
  - `parser_versions` stores a JSON map of Tree-sitter parser crate versions.
  - `parser_versions_previous` captures the prior parser version map when drift is detected.
  - `parser_versions_changed_at_ms` records when the parser version drift was detected.
  - `symbols_invalidation_reason` is set when symbols are cleared (e.g. `parser_versions_changed`).
  - `symbols_invalidated_at_ms` records when symbols were cleared due to drift.
  - `docdex_version` records the docdexd version that last wrote the store.
- `symbols_files (file_path TEXT PRIMARY KEY, outcome_status TEXT, outcome_reason TEXT, outcome_error_summary TEXT, file_lang TEXT)`
  - Per-file extraction outcome metadata.
- `symbols (id INTEGER PRIMARY KEY AUTOINCREMENT, file_path TEXT, symbol_id TEXT, name TEXT, kind TEXT, line_start INT, start_col INT, line_end INT, end_col INT, signature TEXT)`
  - `symbol_id` is optional in storage; it is computed if missing when reading.
  - Indexed by `file_path` for fast lookup.
  - Additional indexes on `name` and `kind` plus a `symbols_files(file_lang)` index are included for deterministic migrations.
- `ast_files (file_path TEXT PRIMARY KEY, outcome_status TEXT, outcome_reason TEXT, outcome_error_summary TEXT, node_count INT, truncated INT, file_lang TEXT)`
  - Per-file AST extraction outcome metadata.
- `ast_nodes (file_path TEXT, node_id INT, parent_id INT, kind TEXT, is_named INT, line_start INT, start_col INT, line_end INT, end_col INT)`
  - Stores Tree-sitter AST nodes for each file; primary key is `(file_path, node_id)`.
  - Indexed by `file_path` (and `kind`) for fast lookup.

Legacy migration:

- If a legacy `symbols.db/` directory exists (JSON files), Docdex moves it to `symbols.db.legacy*` and imports the JSON payloads into SQLite (best-effort).

### Lifecycle rules

- Full reindex (`docdexd index`):
  - Docdex attempts to remove `<repo-state-root>/symbols.db` and recreate the SQLite schema.
  - If the reset fails, indexing continues; stale symbol records may remain on disk for paths that are no longer indexed.
- Parser version drift:
  - If the stored Tree-sitter parser versions differ from the running build, Docdex clears `symbols` and `symbols_files` and records invalidation metadata in `symbols_meta` (reindex required).
  - The `symbols_reindex_required` flag is set when drift is detected and cleared after a full reindex.
- Incremental ingest (`docdexd ingest` / watcher ingestion):
  - Docdex overwrites the per-file record for the ingested file.
- File delete:
  - Docdex attempts to remove the per-file record for the deleted file; failures are logged and ignored.

## Query interfaces

### MCP tool: `docdex_symbols`

Tool name aliases: `docdex_symbols` and `docdex.symbols`.

Arguments:

```json
{
  "path": "path/relative/to/repo.ext",
  "project_root": "/path/to/repo"
}
```

Return value:

- A `docdex.symbols` payload, as defined in `docs/contracts/code_intelligence_schema_v1.md`.

Failure semantics (MCP JSON-RPC errors):

- `missing_index`: no symbols record exists for that `path` (common before the first index run).
- `invalid_path`: path is not a safe repo-relative path.

See `docs/mcp/errors.md` for the common error envelope.

### MCP tool: `docdex_ast`

Tool name aliases: `docdex_ast` and `docdex.ast`.

Arguments:

```json
{
  "path": "path/relative/to/repo.ext",
  "project_root": "/path/to/repo",
  "max_nodes": 20000
}
```

Return value:

- A `docdex.ast` payload, as defined in `docs/contracts/code_intelligence_schema_v1.md`.

### HTTP endpoint: `GET /v1/symbols`

Query:

```
GET /v1/symbols?path=path/relative/to/repo.ext
```

Repo context may be provided via `x-docdex-repo-id` header or `repo_id` query param (optional for per-repo daemons).

Return value:

- A `docdex.symbols` payload, as defined in `docs/contracts/code_intelligence_schema_v1.md`.

Failure semantics (HTTP JSON errors):

- `invalid_argument`: `path` is missing/empty or not a safe repo-relative path.
- `missing_index`: no symbols record exists for the requested path.
- `stale_index`: parser version drift invalidated symbols/AST; reindex required.

### HTTP endpoint: `GET /v1/ast`

Query:

```
GET /v1/ast?path=path/relative/to/repo.ext&maxNodes=20000
```

Return value:

- A `docdex.ast` payload, as defined in `docs/contracts/code_intelligence_schema_v1.md`.

Failure semantics (HTTP JSON errors):

- `invalid_argument`: `path` is missing/empty or not a safe repo-relative path.
- `missing_index`: no AST record exists for the requested path.
- `stale_index`: parser version drift invalidated symbols/AST; reindex required.

### HTTP endpoint: `GET /v1/ast/search`

Query:

```
GET /v1/ast/search?kinds=function_item,struct_item&mode=all&limit=50
```

Parameters:

- `kinds`: comma-separated list of Tree-sitter node kinds to match (required).
- `mode`: `any` (default) or `all` (require all kinds per file).
- `limit`: maximum files returned (default 50, server capped).

Return value:

- A `docdex.ast_search` payload listing files and match counts.

Failure semantics (HTTP JSON errors):

- `invalid_argument`: `kinds` is missing/empty or `mode` is unsupported.
- `stale_index`: parser version drift invalidated symbols/AST; reindex required.

### HTTP endpoint: `GET /v1/symbols/status`

Query:

```
GET /v1/symbols/status
```

Repo context may be provided via `x-docdex-repo-id` header or `repo_id` query param (optional for per-repo daemons).

Return value:

- A `docdex.symbols_status` payload describing Tree-sitter parser versions and drift metadata.
  - `parser_versions_changed`: `true` if parser versions changed since the previous run.
  - `requires_reindex`: `true` if symbols were invalidated and a full reindex is required.

### CLI command: `docdexd symbols-status`

Example:

```
docdexd symbols-status --repo /path/to/repo
```

Returns the same payload as `GET /v1/symbols/status`.

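The HTTP failure codes listed for the endpoints above each call for a different client reaction. The following is a minimal sketch of that mapping, not Docdex code: the `SymbolsError` enum and the `parse_error_code` and `next_step` functions are hypothetical names introduced for illustration, and only the three code strings are taken from this document. How the code is extracted from the error envelope is left out, since the envelope is defined elsewhere.

```rust
/// Hypothetical client-side view of the HTTP error codes documented above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SymbolsError {
    InvalidArgument,
    MissingIndex,
    StaleIndex,
}

/// Map a documented error code string to the enum; unknown codes yield `None`.
fn parse_error_code(code: &str) -> Option<SymbolsError> {
    match code {
        "invalid_argument" => Some(SymbolsError::InvalidArgument),
        "missing_index" => Some(SymbolsError::MissingIndex),
        "stale_index" => Some(SymbolsError::StaleIndex),
        _ => None,
    }
}

/// Suggest a follow-up action for each error. The wording is illustrative only.
fn next_step(err: SymbolsError) -> &'static str {
    match err {
        SymbolsError::InvalidArgument => "fix the request parameters before retrying",
        SymbolsError::MissingIndex => "index the repository, then retry",
        SymbolsError::StaleIndex => "run a full reindex (docdexd index), then retry",
    }
}

fn main() {
    for code in ["invalid_argument", "missing_index", "stale_index", "other"] {
        match parse_error_code(code) {
            Some(err) => println!("{code}: {}", next_step(err)),
            None => println!("{code}: unrecognized code, surface it to the caller"),
        }
    }
}
```

Keeping unknown codes as `None` rather than panicking lets a client keep working if the server adds codes later. Note that the MCP tools report an unsafe path as `invalid_path` rather than `invalid_argument`, so an MCP client would need its own mapping.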
### Rust interface (internal)

Internal consumers can use the `SymbolsStore` API in `src/symbols.rs`:

- `SymbolsStore::new(repo_root, state_dir) -> Result<SymbolsStore>`
- `SymbolsStore::read_symbols(rel_path) -> Result<Option<SymbolsResponseV1>>`
- `SymbolsStore::upsert_symbols(rel_path, payload) -> Result<()>`
- `SymbolsStore::delete_symbols(rel_path) -> Result<()>`

The store is repo-scoped via:

- `repo_id_for_root(repo_root) -> Result<String>`

## Payload schema and versioning

Each stored record is served as a `docdex.symbols` JSON payload:

- The wire contract is defined in `docs/contracts/code_intelligence_schema_v1.md`.
- The payload includes a top-level `schema` object:
  - `schema.name`: `docdex.symbols`
  - `schema.version`: currently `1`
  - `schema.compatible`: compatibility window (currently `{ "min": 1, "max": 1 }`)

`SymbolsStore::read_symbols()` is tolerant of older/missing fields:

- If `repo_id` or `file` are missing/empty, it fills them from the store context and the read path.
- If `symbol_id` is missing/empty on any symbol, it is recomputed.
- Symbols are sorted by `symbol_id` for deterministic outputs.

## Schema migrations (workflow)

When changing the symbols store schema:

- Bump `SYMBOLS_SCHEMA_VERSION` in `src/symbols.rs`.
- Add a new `migrate_to_vN` step and register it in `SymbolsStore::migration_steps` (even if it is a no-op).
- Add or update migration tests in `tests/symbols_schema_migration.rs` (upgrade + downgrade rejection).

## Stable identifiers

### `repo_id`

`repo_id` is a SHA-256 hex digest derived from the repo root path after canonicalization and slash normalization.

Assumption/implication:

- Moving the same repo to a different absolute path will change `repo_id` (and thus `symbol_id` prefixes).

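As a side note on the `schema.compatible` window described under "Payload schema and versioning": a consumer can use it to decide whether it understands a payload before relying on identifiers such as `repo_id`. The sketch below assumes the window means "readers whose supported version lies between `min` and `max` inclusive", which this document does not spell out. The `Compatible` struct and `reader_can_handle` function are hypothetical and are not part of `src/symbols.rs`.

```rust
/// Hypothetical mirror of the `schema.compatible` object, e.g. `{ "min": 1, "max": 1 }`.
#[derive(Debug, Clone, Copy)]
struct Compatible {
    min: u32,
    max: u32,
}

/// True when a reader built for `reader_version` falls inside the payload's window.
/// Assumes both bounds are inclusive.
fn reader_can_handle(reader_version: u32, window: Compatible) -> bool {
    window.min <= reader_version && reader_version <= window.max
}

fn main() {
    // The window this document lists as current.
    let current = Compatible { min: 1, max: 1 };
    println!("v1 reader accepted: {}", reader_can_handle(1, current));
    println!("v2 reader accepted: {}", reader_can_handle(2, current));
}
```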
### `symbol_id`

`symbol_id` is deterministic and stable for a given `repo_id`, `file`, and symbol location:

`"<repo_id>:<file>#<start_line>:<start_col>-<end_line>:<end_col>:<kind>:<name>"`

All position fields are **1-based** and refer to the repo-relative `file` path.

## Outcomes and failure semantics

Symbol extraction is best-effort and **must not fail indexing**.

For every indexed file, when symbol extraction is enabled, Docdex attempts to persist a `docdex.symbols` record with:

- `symbols`: extracted symbols (may be empty)
- `outcome`: per-file status and optional metadata

### Outcome statuses

The `outcome.status` field is one of:

- `ok`: extraction ran successfully (even if zero symbols were found).
- `skipped`: extraction was intentionally not run (e.g. unsupported language).
- `failed`: extraction attempted but failed (read failure or extraction error).

### Standard reasons (current implementation)

Docdex currently uses these `outcome.reason` values:

- `unsupported_language` (status: `skipped`)
- `read_failed (<language>)` (status: `failed`)
- `extract_failed (<language>)` (status: `failed`)

`outcome.error_summary` is a bounded, best-effort string intended for debugging (no stack traces).

### Supported languages (current implementation)

Symbol extraction is attempted only for:

- Markdown (`.md`, `.markdown`, `.mdx`)
- Rust (`.rs`)
- Python (`.py`)
- TypeScript/TSX (`.ts`, `.tsx`)
- JavaScript/JSX (`.js`, `.jsx`)
- Go (`.go`)
- Java (`.java`)
- C# (`.cs`)
- C (`.c`, `.h`)
- C++ (`.cc`, `.cpp`, `.cxx`, `.hh`, `.hpp`, `.hxx`)
- PHP (`.php`)
- Kotlin (`.kt`, `.kts`)
- Swift (`.swift`)
- Ruby (`.rb`)
- Lua (`.lua`)
- Dart (`.dart`)

Other extensions are recorded as `skipped` with `reason: unsupported_language`.

### Indexing behavior on symbol failures

If symbol extraction fails for a file/language:

- Docdex still indexes the file’s text (or indexes an empty body if the file could not be read).
- Docdex records a `failed` (or `skipped`) symbols `outcome` for the file.
- The overall indexing run continues; symbol extraction outcomes are per-file and do not change indexing exit status.

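To make the `symbol_id` layout and the `skipped` outcome concrete, here is a small sketch that assembles an identifier and decides whether an extension would be attempted at all. It is illustrative rather than Docdex's implementation: `Span`, `format_symbol_id`, and `outcome_for_extension` are hypothetical names, the `repo_id` and `kind` values in `main` are placeholders (a real `repo_id` is a SHA-256 hex digest), and exact, case-sensitive extension matching is an assumption.

```rust
/// 1-based source span, matching the position fields in the `symbol_id` layout.
struct Span {
    start_line: u32,
    start_col: u32,
    end_line: u32,
    end_col: u32,
}

/// Assemble a `symbol_id` following the documented layout:
/// "<repo_id>:<file>#<start_line>:<start_col>-<end_line>:<end_col>:<kind>:<name>"
fn format_symbol_id(repo_id: &str, file: &str, span: &Span, kind: &str, name: &str) -> String {
    format!(
        "{repo_id}:{file}#{}:{}-{}:{}:{kind}:{name}",
        span.start_line, span.start_col, span.end_line, span.end_col
    )
}

/// Extensions from the "Supported languages" list, without the leading dot.
const SUPPORTED: &[&str] = &[
    "md", "markdown", "mdx", "rs", "py", "ts", "tsx", "js", "jsx", "go", "java", "cs",
    "c", "h", "cc", "cpp", "cxx", "hh", "hpp", "hxx", "php", "kt", "kts", "swift", "rb",
    "lua", "dart",
];

/// Return (status, reason) assuming the file was read and parsed without error.
/// Read or extraction failures (status `failed`) are not modeled here.
fn outcome_for_extension(ext: &str) -> (&'static str, Option<&'static str>) {
    if SUPPORTED.contains(&ext) {
        ("ok", None)
    } else {
        ("skipped", Some("unsupported_language"))
    }
}

fn main() {
    let span = Span { start_line: 10, start_col: 1, end_line: 12, end_col: 2 };
    // "abc123" and "function" are placeholder values for illustration.
    println!("{}", format_symbol_id("abc123", "src/lib.rs", &span, "function", "parse"));
    println!("{:?}", outcome_for_extension("rs"));
    println!("{:?}", outcome_for_extension("zig"));
}
```

Because the identifier embeds `repo_id` and the span, it changes when the repo moves to a different absolute path or when the symbol's position shifts.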
