# semantic-code-mcp
MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it indexes your codebase with embeddings and returns ranked results by meaning.
Supports **Python**, **Rust**, and **Markdown** — more languages planned.
## How It Works
```
Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                        │
                        ┌───────────────┼───────────────┐
                        ▼               ▼               ▼
                  AST Chunker       Embedder         LanceDB
                 (tree-sitter)  (sentence-trans)    (vectors)
```
1. **Chunking** — tree-sitter parses source files into functions, classes, methods, structs, traits, markdown sections, etc.
2. **Embedding** — sentence-transformers encodes each chunk (all-MiniLM-L6-v2, 384d)
3. **Storage** — vectors stored in LanceDB (embedded, like SQLite)
4. **Search** — hybrid semantic + keyword search with recency boosting
Indexing is incremental (mtime-based) and uses `git ls-files` for fast file discovery. The embedding model loads lazily on first query.
## Installation
### macOS / Windows
PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).
```bash
uvx semantic-code-mcp
```
**Claude Code integration:**
```bash
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp
```
### Linux
> [!IMPORTANT]
> Without the `--index` flag below, the default PyPI wheels pull in CUDA-bundled torch (~3.5GB). Embeddings run on CPU, so you don't need GPU acceleration; use the command below to get the CPU-only build (~1.7GB).
```bash
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
**Claude Code integration:**
```bash
claude mcp add --scope user semantic-code -- \
    uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
<details>
<summary>Claude Desktop / other MCP clients (JSON config)</summary>
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}
```
On macOS/Windows you can omit the `--index` and `pytorch-cpu` args.
</details>
### Updating
`uvx` caches the installed version. To get the latest release:
```bash
uvx --upgrade semantic-code-mcp
```
Or pin a specific version in your MCP config:
```bash
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp@0.2.0
```
## MCP Tools
### `search_code`
Search code by meaning, not just text matching. Auto-indexes on first search.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | required | Natural language description of what you're looking for |
| `project_path` | `str` | required | Absolute path to the project root |
| `limit` | `int` | `10` | Maximum number of results |
Returns ranked results with `file_path`, `line_start`, `line_end`, `name`, `chunk_type`, `content`, and `score`.
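For illustration, one returned entry might look like this (the values are hypothetical; the field set matches the list above):

```json
{
  "file_path": "src/auth/session.py",
  "line_start": 42,
  "line_end": 78,
  "name": "refresh_token",
  "chunk_type": "function",
  "content": "def refresh_token(...): ...",
  "score": 0.83
}
```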
### `index_codebase`
Index a codebase for semantic search. Only processes new and changed files unless `force=True`.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |
| `force` | `bool` | `False` | Re-index all files regardless of changes |
### `index_status`
Check indexing status for a project.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |
Returns `is_indexed`, `files_count`, and `chunks_count`.
## Configuration
All settings are environment variables with the `SEMANTIC_CODE_MCP_` prefix (via pydantic-settings):
| Variable | Default | Description |
|----------|---------|-------------|
| `SEMANTIC_CODE_MCP_CACHE_DIR` | `~/.cache/semantic-code-mcp` | Where indexes are stored |
| `SEMANTIC_CODE_MCP_LOCAL_INDEX` | `false` | Store index in `.semantic-code/` within each project |
| `SEMANTIC_CODE_MCP_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `SEMANTIC_CODE_MCP_DEBUG` | `false` | Enable debug logging |
| `SEMANTIC_CODE_MCP_PROFILE` | `false` | Enable pyinstrument profiling |
Pass environment variables via the `env` field in your MCP config:
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}
```
Or with Claude Code CLI:
```bash
claude mcp add --scope user semantic-code \
    -e SEMANTIC_CODE_MCP_DEBUG=true \
    -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
    -- uvx semantic-code-mcp
```
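Under the hood the mapping is just a prefix convention; a stdlib-only sketch of how each setting resolves (the real server uses pydantic-settings, and `load_setting` is a hypothetical helper for illustration):

```python
import os


def load_setting(name: str, default: str) -> str:
    # Every setting reads SEMANTIC_CODE_MCP_<NAME>, falling back to its default.
    return os.environ.get(f"SEMANTIC_CODE_MCP_{name.upper()}", default)


os.environ["SEMANTIC_CODE_MCP_DEBUG"] = "true"
print(load_setting("debug", "false"))    # → true
print(load_setting("profile", "false"))  # → false (unset, so the default wins)
```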
## Tech Stack
| Component | Choice | Rationale |
|-----------|--------|-----------|
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |
## Development
```bash
uv sync                              # Install dependencies
uv run python -m semantic_code_mcp   # Run server
uv run pytest                        # Run tests
uv run ruff check src/               # Lint
uv run ruff format src/              # Format
```
Pre-commit hooks enforce linting, formatting, type-checking (`ty`), security scanning (`bandit`), and [Conventional Commits](https://www.conventionalcommits.org/).
### Releasing
Versions are derived from git tags automatically (`hatch-vcs`) — there's no hardcoded version in `pyproject.toml`.
```bash
git tag v0.2.0
git push origin v0.2.0
```
CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.
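For reference, the usual hatch-vcs wiring in `pyproject.toml` looks roughly like this (a sketch of the standard setup; the project's actual file may differ in detail):

```toml
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[tool.hatch.version]
source = "vcs"
```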
### Adding a New Language
The chunker system is designed to make adding languages straightforward. Each language needs:
1. **A tree-sitter grammar package** (e.g. `tree-sitter-javascript`)
2. **A chunker subclass** that walks the AST and extracts meaningful chunks
Steps:
```bash
uv add tree-sitter-mylang
```
Create `src/semantic_code_mcp/chunkers/mylang.py`:
```python
from enum import StrEnum, auto

import tree_sitter_mylang as tsmylang
from tree_sitter import Language, Node

from semantic_code_mcp.chunkers.base import BaseTreeSitterChunker
from semantic_code_mcp.models import Chunk, ChunkType


class NodeType(StrEnum):
    function_definition = auto()
    # ... other node types


class MyLangChunker(BaseTreeSitterChunker):
    language = Language(tsmylang.language())
    extensions = (".ml",)

    def _extract_chunks(self, root: Node, file_path: str, lines: list[str]) -> list[Chunk]:
        chunks = []
        for node in root.children:
            match node.type:
                case NodeType.function_definition:
                    name = node.child_by_field_name("name").text.decode()
                    chunks.append(self._make_chunk(node, file_path, lines, ChunkType.function, name))
                # ... other node types
        return chunks
```
Register it in `src/semantic_code_mcp/container.py`:
```python
from semantic_code_mcp.chunkers.mylang import MyLangChunker

def get_chunkers(self) -> list[BaseTreeSitterChunker]:
    return [PythonChunker(), RustChunker(), MarkdownChunker(), MyLangChunker()]
```
The `CompositeChunker` handles dispatch by file extension automatically. Use `BaseTreeSitterChunker._make_chunk()` for consistent chunk construction. See `chunkers/python.py` and `chunkers/rust.py` for complete examples.
### Project Structure
- `src/semantic_code_mcp/chunkers/` — language chunkers (`base.py`, `composite.py`, `python.py`, `rust.py`, `markdown.py`)
- `src/semantic_code_mcp/services/` — IndexService (scan/chunk/index), SearchService (search + auto-index)
- `src/semantic_code_mcp/indexer.py` — embed + store pipeline
- `docs/decisions/` — architecture decision records
- `TODO.md` — epics and planning
- `CHANGELOG.md` — completed work (Keep a Changelog format)
- `.claude/rules/` — context-specific coding rules for AI agents
## License
MIT