Provides structural indexing and AST parsing for JavaScript codebases, enabling AI agents to perform precise searches for code definitions and references.
Integrates with OpenAI's API to generate vector embeddings for semantic search, allowing agents to search codebases by intent and natural language.
Provides structural indexing and AST parsing for TypeScript codebases, enabling AI agents to perform precise searches for code definitions and references.
Headless Codebase Indexer (MCP Server)
A minimalist, server-side codebase indexing tool built to give autonomous AI agents and language models the deep code-understanding capabilities typically reserved for visual IDEs like Cursor.
Why I built this: When transitioning manual IDE workflows into fully autonomous, server-hosted processes, I hit a wall: unsupervised agents running in the cloud lack structural context. They try to grep their way through massive codebases and end up hallucinating or blowing out their context windows.
Instead of deploying massive, heavy infrastructure, I wrote this lightweight tool to bridge that gap. It headlessly exposes both semantic meaning and strict AST structures via the Model Context Protocol (MCP), giving my automated agents the guardrails they need to navigate codebases securely and predictably.
Core Architecture
Semantic Search: Vector embeddings stored in ChromaDB let agents search by intent ("find authentication logic") rather than by exact strings.
Structural Search: Native AST / LSP parsing returns exact definitions, references, and full file token structures for TypeScript/JavaScript.
Zero-Block Indexing: File ingestion runs in the background, so the event loop never freezes while your agents work.
Live Cache Validation: A built-in watcher (chokidar) invalidates the AST cache as soon as files change on disk.
Cloud / Local Ready: Runs locally over StdIO for desktop clients (Claude), or over HTTP (SSE) behind Bearer token auth for remote pipelines.
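To make the Live Cache Validation idea concrete, here is a minimal sketch of a parse cache that a file watcher could invalidate. The `AstCache` class, the `ParsedFile` shape, and the stand-in parser are all hypothetical names chosen for illustration. They are not taken from this project's source.

```typescript
// Hypothetical cache: names and shapes are illustrative, not the project's API.
type ParsedFile = { path: string; symbols: string[] };

class AstCache {
  private entries = new Map<string, ParsedFile>();
  private parse: (path: string) => ParsedFile;
  parseCount = 0;

  constructor(parse: (path: string) => ParsedFile) {
    this.parse = parse;
  }

  // Return the cached parse result, parsing only on a cache miss.
  get(path: string): ParsedFile {
    let entry = this.entries.get(path);
    if (entry === undefined) {
      entry = this.parse(path);
      this.parseCount += 1;
      this.entries.set(path, entry);
    }
    return entry;
  }

  // Meant to be called by the file watcher when a file changes or is deleted.
  invalidate(path: string): boolean {
    return this.entries.delete(path);
  }
}

// Stand-in parser so the sketch runs without a real TypeScript parser.
const cache = new AstCache((path) => ({ path, symbols: ["processOrder"] }));

cache.get("src/orders.ts"); // miss: parses
cache.get("src/orders.ts"); // hit: reuses the cached result
cache.invalidate("src/orders.ts"); // what a watcher "change" event would trigger
cache.get("src/orders.ts"); // miss again: re-parses
console.log(cache.parseCount); // 2
```

With chokidar, the wiring would presumably be a `change` (and `unlink`) listener that passes the reported path to `invalidate`. The sketch leaves that out so it runs without extra dependencies.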
Getting Started
You will need the following dependencies:
Node.js (v18+)
An API key: Set OPENAI_API_KEY, GEMINI_API_KEY, or VOYAGE_API_KEY as an environment variable.
ChromaDB: A local vector database to store the code embeddings.
To start ChromaDB via Docker:
Local Setup (e.g., Claude Desktop)
Since this is an MCP server, it is typically launched by your AI client rather than run manually.
To connect it to Claude Desktop, open your configuration file (~/Library/Application Support/Claude/claude_desktop_config.json on Mac or %APPDATA%\Claude\claude_desktop_config.json on Windows) and add this configuration:
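As a rough guide, a Claude Desktop entry for a StdIO server generally sits under an `mcpServers` key with a `command`, `args`, and optional `env`. The sketch below writes that shape as a typed TypeScript object and prints the JSON. The server key `codebase-indexer`, the entry-point path, and the API key value are placeholders assumed for illustration, not values from this project.

```typescript
// Shape of one entry under "mcpServers" in claude_desktop_config.json.
type McpServerEntry = {
  command: string;
  args: string[];
  env?: Record<string, string>;
};

// Placeholder values: the server key, entry-point path, and API key are assumptions.
const config: { mcpServers: Record<string, McpServerEntry> } = {
  mcpServers: {
    "codebase-indexer": {
      command: "node",
      args: ["/absolute/path/to/indexer/entry.js"],
      env: { OPENAI_API_KEY: "your-api-key" },
    },
  },
};

// The printed JSON is what would go into the configuration file.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```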
Note: Replace
Cloud Setup (SSE / HTTP)
To host the indexer for remote agents, you can run it over HTTP by providing a PORT environment variable:
Remote agents can then connect securely using the Bearer token (your-secret-token) at http://localhost:3000/sse.
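For illustration, a Bearer check on the server side might look like the sketch below. How this project actually validates the token is not shown here, so `isAuthorized` and `constantTimeEqual` are hypothetical helpers. In Node.js, `crypto.timingSafeEqual` would be the usual choice for the comparison; a hand-rolled version is used only to keep the example free of imports.

```typescript
// Compare two strings without exiting early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  const length = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// Hypothetical helper: accept only "Bearer <token>" carrying the expected token.
function isAuthorized(header: string | undefined, expectedToken: string): boolean {
  if (header === undefined) return false;
  const match = /^Bearer (.+)$/.exec(header);
  return match !== null && constantTimeEqual(match[1], expectedToken);
}

console.log(isAuthorized("Bearer your-secret-token", "your-secret-token")); // true
console.log(isAuthorized("Bearer wrong-token", "your-secret-token")); // false
console.log(isAuthorized(undefined, "your-secret-token")); // false
```

A remote agent would send the matching header, `Authorization: Bearer your-secret-token`, with its request to the `/sse` endpoint.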
Use Cases
Deterministic Refactoring: I pass the agent a high-level task. It uses semantic_search to map the neighborhood (e.g., finding the "billing provider"), then uses get_references to track every upstream caller, confirming that cross-file edits are safe before opening a PR.
Automated Code Reviews in CI/CD: A pipeline agent semantically checks new pull requests against our existing architectural patterns. It uses structural lookups to verify that upstream dependencies weren't silently broken.
Auditing Technical Debt: Instead of manually tracing legacy code, I deploy a background worker. It calls get_file_structure to outline massive legacy files and traces deprecated API usage through exact AST definitions without running out of context.
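The refactoring flow can be sketched as two chained tool calls. Only the tool names `semantic_search` and `get_references` come from this README. The argument names (`query`, `symbol`), the result shapes, the `ToolCaller` type, and the canned data are all assumptions made so the example runs without a server.

```typescript
// Hypothetical transport: anything that can invoke an MCP tool by name.
type ToolCaller = (tool: string, args: Record<string, string>) => Promise<unknown>;

// Assumed result shapes; the real tool output may differ.
type SearchHit = { symbol: string; file: string };
type Reference = { file: string; line: number };

// Find candidate symbols by intent, then collect every file that references them.
async function filesTouchedBy(call: ToolCaller, intent: string): Promise<string[]> {
  const hits = (await call("semantic_search", { query: intent })) as SearchHit[];
  const files = new Set<string>();
  for (const hit of hits) {
    files.add(hit.file);
    const refs = (await call("get_references", { symbol: hit.symbol })) as Reference[];
    for (const ref of refs) files.add(ref.file);
  }
  return Array.from(files).sort();
}

// Stub caller with canned data standing in for a live MCP connection.
const stubCall: ToolCaller = async (tool) => {
  if (tool === "semantic_search") {
    return [{ symbol: "BillingProvider", file: "src/billing/provider.ts" }];
  }
  if (tool === "get_references") {
    return [
      { file: "src/checkout/order.ts", line: 42 },
      { file: "src/admin/refunds.ts", line: 7 },
    ];
  }
  throw new Error(`unknown tool: ${tool}`);
};

const touched = filesTouchedBy(stubCall, "billing provider");
touched.then((files) => console.log(files));
```

The resulting file list is the set an agent would need to review before proposing a cross-file edit.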