code-search

by RaziStuff

Semantic + lexical code search as an MCP server. Agents query in natural language and get back ranked file:line ranges to read precisely.

Quickstart

Requires Node 22+ (for the built-in node:sqlite).

git clone https://github.com/RaziStuff/code-search-mcp.git
cd code-search-mcp
npm install      # builds automatically; first search downloads a ~90MB model once

Index a project and search it — the .code-index.db is written next to the code:

cd /path/to/your/project
node /path/to/code-search-mcp/dist/cli.js index .
node /path/to/code-search-mcp/dist/cli.js search "how is the request body parsed"

npm link puts a code-search command on your PATH so you can drop the long path. To let a coding agent search for you, see MCP server below.

How it works

  • Chunking — syntax-aware via web-tree-sitter (function / class / method boundaries, with symbol names), heading-section chunks for markdown, and a line-window fallback otherwise.

  • Embeddings — local all-MiniLM-L6-v2 via transformers.js (384-dim, no API, no code leaves the machine).

  • Index — SQLite + sqlite-vec (.code-index.db). Incremental: only files whose content hash changed are re-embedded; deleted files are dropped. (A change to chunker/embedder logic needs a full rebuild — delete the .db — since incremental keys on file content, not code version.) Lockfiles, minified bundles, and source maps are skipped so they don't swamp results.

  • Ranking — hybrid: vector KNN fused with BM25 (FTS5) via reciprocal rank fusion, plus an extra-weighted exact-phrase list, a small code-over-prose nudge, and a test-file down-weight, so exact symbol/token matches and implementing code don't get lost behind embedding-friendly prose or their own test files. Each result's score is its true cosine similarity (0–1); when the top result is below CODE_SEARCH_MIN_SCORE (default 0.25) the response is flagged low-confidence so callers can detect "no good match". Confidence uses the best cosine in the result set, and results are hybrid-ranked (#1 = best overall), so the per-row cosine is a confidence annotation, not the sort key.

  • Freshness — optional chokidar watcher auto-reindexes on file changes.
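The ranking step above can be illustrated with a minimal sketch of weighted reciprocal rank fusion and the low-confidence check. This is not the project's code: the function names, the list weights, and the constant k = 60 (a common RRF default) are assumptions, and the code-over-prose and test-file adjustments are left out.

```typescript
// Weighted reciprocal rank fusion (RRF) over several ranked lists of chunk ids.
// All names here are hypothetical; k = 60 is a conventional RRF constant,
// not a value taken from this project.
type RankedList = { ids: string[]; weight: number };

function fuse(lists: RankedList[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, rank) => {
      // rank is 0-based, so the top hit of a list contributes weight / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Flag a response as low-confidence when the best cosine in the result set
// is under the threshold. Treating an empty result set as low-confidence is
// an assumption of this sketch.
function isLowConfidence(cosines: number[], minScore = 0.25): boolean {
  return cosines.length === 0 || Math.max(...cosines) < minScore;
}

const vectorHits: RankedList = { ids: ["a.ts:10-30", "README.md:1-20", "b.ts:5-9"], weight: 1 };
const bm25Hits: RankedList = { ids: ["b.ts:5-9", "a.ts:10-30"], weight: 1 };
const phraseHits: RankedList = { ids: ["b.ts:5-9"], weight: 2 }; // extra-weighted exact phrase

const fused = fuse([vectorHits, bm25Hits, phraseHits]);
console.log(fused);
console.log(isLowConfidence([0.12, 0.2]));
```

In this toy input the exact-phrase list lifts b.ts:5-9 above the chunk that only the embedding ranked first, which is the effect the hybrid ranking is meant to have.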

Setup

cd code-search-mcp
npm install

Node 22+ is required (for the built-in node:sqlite). The first embed downloads the model (~90MB), which is then cached locally.

CLI

npm run index ../some-project    # incremental re-index
npm run search "where are auth tokens validated"
npm run watch ../some-project    # index, then auto-reindex on changes
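As a rough illustration of what an incremental re-index has to decide, the sketch below compares stored content hashes with the files currently on disk. The names (planSync, HashMap) and the choice of SHA-256 are assumptions for illustration; the project's actual hashing and storage may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: file path -> content hash, as an index might store it.
type HashMap = Map<string, string>;

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compare what is on disk (path -> content) with previously indexed hashes.
function planSync(indexed: HashMap, current: Map<string, string>) {
  const toEmbed: string[] = [];
  const seen = new Set<string>();
  for (const [path, content] of current) {
    seen.add(path);
    // New files have no stored hash; changed files have a different one.
    if (indexed.get(path) !== sha256(content)) toEmbed.push(path);
  }
  // Anything indexed but no longer on disk gets dropped.
  const toDrop = [...indexed.keys()].filter((p) => !seen.has(p));
  return { toEmbed, toDrop };
}

const indexed: HashMap = new Map([
  ["src/a.ts", sha256("export const a = 1;")],
  ["src/old.ts", sha256("// gone")],
]);
const current = new Map([
  ["src/a.ts", "export const a = 1;"], // unchanged, skipped
  ["src/b.ts", "export const b = 2;"], // new, needs embedding
]);

const plan = planSync(indexed, current);
console.log(plan);
```

Because the plan keys only on file content, it also shows why a change to the chunker or embedder is invisible to an incremental run and needs a full rebuild.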

MCP server

CODE_SEARCH_ROOT=/path/to/project npm run serve
# add CODE_SEARCH_WATCH=1 to auto-reindex on file changes

Tools exposed: search_code (hybrid), reindex (incremental sync), and index_status. Register it with any MCP client — e.g. claude mcp add code-search -- node /abs/path/dist/server.js — or add a .mcp.json entry whose command/args point at dist/server.js. Set CODE_SEARCH_WATCH=1 in its env to auto-reindex on file changes.
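For the .mcp.json route, the snippet below builds one plausible entry and prints it as JSON. The mcpServers layout is the shape commonly used by MCP clients such as Claude Code; passing CODE_SEARCH_ROOT through env is an assumption based on the serve command above, and both paths are placeholders to replace with your own.

```typescript
// Builds a candidate .mcp.json entry for this server and prints it.
// The paths are placeholders; the "mcpServers" layout is assumed from
// common MCP client conventions, so check your client's documentation.
const serverPath = "/abs/path/dist/server.js";

const mcpConfig = {
  mcpServers: {
    "code-search": {
      command: "node",
      args: [serverPath],
      env: {
        CODE_SEARCH_ROOT: "/path/to/project",
        CODE_SEARCH_WATCH: "1", // optional: auto-reindex on file changes
      },
    },
  },
};

const mcpJson = JSON.stringify(mcpConfig, null, 2);
console.log(mcpJson);
```

Save the printed JSON as .mcp.json in the project root, or merge the code-search entry into an existing file.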

Tests

npm test

Uses Node's built-in node:test runner via tsx (no extra deps). Store / indexer / watcher tests use a deterministic FakeEmbedder, so the suite runs in ~1s with no model download or network. Voyage is tested against a local mock HTTP server — no API key needed.

Retrieval quality is measured separately:

npm run eval

Runs a labeled query set (eval/*.jsonl) and reports hit@1 / hit@3 / MRR plus no-match accuracy — so ranking changes are measured, not eyeballed. Point it at any prebuilt index with EVAL_FILE=… EVAL_DB=/path/.code-index.db EVAL_SYNC=0.
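For readers unfamiliar with the metrics, here is a minimal sketch of how hit@k and MRR are conventionally computed from the rank of the first correct result per query. The function names and input shape are assumptions; the eval script's own implementation and file format are not shown here.

```typescript
// Each entry is the 1-based rank of the first correct result for one query,
// or null when no correct result was returned. Names are hypothetical.
type Rank = number | null;

// Fraction of queries whose first correct result is within the top k.
function hitAtK(ranks: Rank[], k: number): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((r) => r !== null && r <= k).length;
  return hits / ranks.length;
}

// Mean reciprocal rank: average of 1/rank, counting misses as 0.
function meanReciprocalRank(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  const sum = ranks.reduce<number>((acc, r) => acc + (r === null ? 0 : 1 / r), 0);
  return sum / ranks.length;
}

const ranks: Rank[] = [1, 3, null, 2];
console.log(hitAtK(ranks, 1));          // 0.25
console.log(hitAtK(ranks, 3));          // 0.75
console.log(meanReciprocalRank(ranks)); // (1 + 1/3 + 0 + 1/2) / 4
```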

Choosing an embedder

Default is local MiniLM (private, free). To use Voyage's code-tuned model:

export CODE_SEARCH_EMBEDDER=voyage
export VOYAGE_API_KEY=...           # required
# optional: VOYAGE_MODEL (voyage-code-3), VOYAGE_DIM (1024), VOYAGE_BASE_URL

Caveats: this sends your code to api.voyageai.com and is billed per token. Switching embedders changes the vector dimension; the store detects the mismatch and wipes the index, forcing a full re-embed (i.e. every chunk is sent to Voyage on the next index/reindex). The local default sends nothing off-machine.
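The environment variables above could be resolved roughly as follows. This is a sketch under assumptions: resolveEmbedder, needsFullReembed, and the EmbedderConfig type are hypothetical names, and the values in parentheses in the listing are treated as defaults.

```typescript
// Hypothetical config shapes; only the env var names and the quoted
// model/dimension values come from the README.
type EmbedderConfig =
  | { kind: "local"; model: string; dim: number }
  | { kind: "voyage"; model: string; dim: number; apiKey: string; baseUrl: string | undefined };

function resolveEmbedder(env: Record<string, string | undefined>): EmbedderConfig {
  if (env.CODE_SEARCH_EMBEDDER !== "voyage") {
    return { kind: "local", model: "all-MiniLM-L6-v2", dim: 384 };
  }
  const apiKey = env.VOYAGE_API_KEY;
  if (!apiKey) {
    throw new Error("VOYAGE_API_KEY is required when CODE_SEARCH_EMBEDDER=voyage");
  }
  return {
    kind: "voyage",
    model: env.VOYAGE_MODEL ?? "voyage-code-3",
    dim: Number(env.VOYAGE_DIM ?? 1024),
    apiKey,
    baseUrl: env.VOYAGE_BASE_URL,
  };
}

// A stored index is only reusable if its vector dimension matches the embedder.
function needsFullReembed(storedDim: number | null, cfg: EmbedderConfig): boolean {
  return storedDim !== null && storedDim !== cfg.dim;
}

const local = resolveEmbedder({});
const voyage = resolveEmbedder({ CODE_SEARCH_EMBEDDER: "voyage", VOYAGE_API_KEY: "placeholder" });
console.log(local.dim, voyage.dim, needsFullReembed(local.dim, voyage));
```

Going from the 384-dim local model to a 1024-dim Voyage model trips the dimension check, which is the case where every chunk gets re-sent on the next index run.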

Version pin worth knowing

web-tree-sitter is pinned to 0.22.6 to match the prebuilt grammars in tree-sitter-wasms@0.1.13. Newer web-tree-sitter (0.25+) changed its WASM ABI and can't load those grammars. Bump both together or neither.

Still to come

  • ANN indexing when sqlite-vec ships it — search is currently an exhaustive (but fast, compiled-C) scan, fine into the tens of thousands of chunks.
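To make "exhaustive scan" concrete, the sketch below scores every stored vector against the query and keeps the best k. It is plain TypeScript for illustration only; in the project this work is done inside sqlite-vec, and the names cosine and topK are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors, which have no direction.
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exhaustive k-nearest-neighbour search: score every chunk, sort, take k.
function topK(query: number[], chunks: { id: string; vec: number[] }[], k: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const chunks = [
  { id: "a", vec: [1, 0] },
  { id: "b", vec: [0, 1] },
  { id: "c", vec: [1, 1] },
];
const nearest = topK([1, 0], chunks, 2);
console.log(nearest);
```

The cost grows linearly with the number of chunks, which is why an ANN index only starts to matter once an index outgrows the tens of thousands of chunks mentioned above.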
