semantic-code-mcp
MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it indexes your codebase with embeddings and returns ranked results by meaning.
Supports Python, Rust, and Markdown — more languages planned.
How It Works
```
Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                     │
                     ┌───────────────┼───────────────┐
                     ▼               ▼               ▼
                AST Chunker      Embedder        LanceDB
               (tree-sitter)  (sentence-trans)  (vectors)
```

- **Chunking** — tree-sitter parses source files into functions, classes, methods, structs, traits, markdown sections, etc.
- **Embedding** — sentence-transformers encodes each chunk (all-MiniLM-L6-v2, 384d)
- **Storage** — vectors stored in LanceDB (embedded, like SQLite)
- **Search** — hybrid semantic + keyword search with recency boosting
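The hybrid ranking step can be pictured as a weighted blend of semantic and keyword scores with a recency multiplier. This is only a sketch: `alpha`, `half_life_days`, and `boost` are illustrative assumptions, not the server's actual tuning values.

```python
import math


def hybrid_score(semantic: float, keyword: float, age_days: float,
                 alpha: float = 0.7, half_life_days: float = 30.0,
                 boost: float = 0.1) -> float:
    """Blend semantic and keyword relevance, then boost recently edited files.

    alpha, half_life_days, and boost are hypothetical knobs; the real
    server's weighting may differ.
    """
    base = alpha * semantic + (1 - alpha) * keyword
    # recency halves every half_life_days, so old files converge to the base score
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return base * (1 + boost * recency)
```

With this shape, two chunks with identical text relevance are tie-broken toward the one touched more recently, without ever letting recency outweigh relevance.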
Indexing is incremental (mtime-based) and uses git ls-files for fast file discovery. The embedding model loads lazily on first query.
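The mtime check can be sketched as below. The real server discovers files with git ls-files; this sketch takes the file list as an argument, and `stale_files` is a made-up name, not the server's API.

```python
import os


def stale_files(root: str, rel_paths: list[str],
                last_indexed: dict[str, float]) -> list[str]:
    """Return files whose mtime is newer than their recorded index time.

    last_indexed maps relative path -> mtime captured at last indexing;
    files never seen before (default 0.0) are always considered stale.
    """
    stale = []
    for rel in rel_paths:
        mtime = os.path.getmtime(os.path.join(root, rel))
        if mtime > last_indexed.get(rel, 0.0):
            stale.append(rel)
    return stale
```

Only the returned files need re-chunking and re-embedding; everything else keeps its existing vectors.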
Installation
macOS / Windows
PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).
```
uvx semantic-code-mcp
```

Claude Code integration:

```
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp
```

Linux
Without the --index flag, PyPI installs CUDA-bundled torch (~3.5GB). Unless you need GPU acceleration (you don't — embeddings run on CPU), use the command below to get the CPU-only build (~1.7GB).

```
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```

Claude Code integration:

```
claude mcp add --scope user semantic-code -- \
  uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```

Or configure the server manually in your MCP config:

```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}
```

On macOS/Windows you can omit the --index and pytorch-cpu args.
Updating
uvx caches the installed version. To get the latest release:
```
uvx --upgrade semantic-code-mcp
```

Or pin a specific version in your MCP config:

```
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp@0.2.0
```

MCP Tools
search_code
Search code by meaning, not just text matching. Auto-indexes on first search.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
|  |  | required | Natural language description of what you're looking for |
|  |  | required | Absolute path to the project root |
|  |  |  | Maximum number of results |
Returns ranked results with file_path, line_start, line_end, name, chunk_type, content, and score.
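An individual result with those fields might look like this; the values (path, symbol name, score) are purely illustrative, not actual server output.

```python
# One hypothetical search_code result, shown as a Python dict.
result = {
    "file_path": "src/auth/service.py",        # hypothetical file
    "line_start": 42,
    "line_end": 87,
    "name": "authenticate_user",               # hypothetical symbol
    "chunk_type": "function",
    "content": "def authenticate_user(...): ...",
    "score": 0.83,                             # higher means more relevant
}
```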
index_codebase
Index a codebase for semantic search. Only processes new and changed files unless force=True.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
|  |  | required | Absolute path to the project root |
| force |  |  | Re-index all files regardless of changes |
index_status
Check indexing status for a project.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
|  |  | required | Absolute path to the project root |
Returns is_indexed, files_count, and chunks_count.
Configuration
All settings are environment variables with the SEMANTIC_CODE_MCP_ prefix (via pydantic-settings):
| Variable | Default | Description |
| --- | --- | --- |
|  |  | Where indexes are stored |
| SEMANTIC_CODE_MCP_LOCAL_INDEX |  | Store index in the project directory |
|  | all-MiniLM-L6-v2 | Sentence-transformers model |
| SEMANTIC_CODE_MCP_DEBUG |  | Enable debug logging |
|  |  | Enable pyinstrument profiling |
Pass environment variables via the env field in your MCP config:
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}
```

Or with Claude Code CLI:

```
claude mcp add --scope user semantic-code \
  -e SEMANTIC_CODE_MCP_DEBUG=true \
  -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
  -- uvx semantic-code-mcp
```

Tech Stack
| Component | Choice | Rationale |
| --- | --- | --- |
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |
Development
```
uv sync                              # Install dependencies
uv run python -m semantic_code_mcp   # Run server
uv run pytest                        # Run tests
uv run ruff check src/               # Lint
uv run ruff format src/              # Format
```

Pre-commit hooks enforce linting, formatting, type-checking (ty), security scanning (bandit), and Conventional Commits.
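For the lint and format hooks, a .pre-commit-config.yaml typically looks like the sketch below. This is not the project's actual file: the revs are placeholders, and the ty and Conventional Commits hooks are omitted here.

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.0        # placeholder; pin to a current release
    hooks:
      - id: ruff
      - id: ruff-format
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.9         # placeholder; pin to a current release
    hooks:
      - id: bandit
```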
Releasing
Versions are derived from git tags automatically (hatch-vcs) — there's no hardcoded version in pyproject.toml.
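The hatch-vcs wiring typically looks like this in pyproject.toml (a sketch of the standard setup; the project's actual file may differ):

```toml
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[tool.hatch.version]
source = "vcs"   # version comes from the latest git tag, e.g. v0.2.0
```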
```
git tag v0.2.0
git push origin v0.2.0
```

CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.
Adding a New Language
The chunker system is designed to make adding languages straightforward. Each language needs:
1. A tree-sitter grammar package (e.g. tree-sitter-javascript)
2. A chunker subclass that walks the AST and extracts meaningful chunks

Steps:

First add the grammar package:

```
uv add tree-sitter-mylang
```

Then create src/semantic_code_mcp/chunkers/mylang.py:
```python
from enum import StrEnum, auto

import tree_sitter_mylang as tsmylang
from tree_sitter import Language, Node

from semantic_code_mcp.chunkers.base import BaseTreeSitterChunker
from semantic_code_mcp.models import Chunk, ChunkType


class NodeType(StrEnum):
    function_definition = auto()
    # ... other node types


class MyLangChunker(BaseTreeSitterChunker):
    language = Language(tsmylang.language())
    extensions = (".ml",)

    def _extract_chunks(self, root: Node, file_path: str, lines: list[str]) -> list[Chunk]:
        chunks = []
        for node in root.children:
            match node.type:
                case NodeType.function_definition:
                    name = node.child_by_field_name("name").text.decode()
                    chunks.append(self._make_chunk(node, file_path, lines, ChunkType.function, name))
                # ... other node types
        return chunks
```

Register it in src/semantic_code_mcp/container.py:
```python
from semantic_code_mcp.chunkers.mylang import MyLangChunker


def get_chunkers(self) -> list[BaseTreeSitterChunker]:
    return [PythonChunker(), RustChunker(), MarkdownChunker(), MyLangChunker()]
```

The CompositeChunker handles dispatch by file extension automatically. Use BaseTreeSitterChunker._make_chunk() for consistent chunk construction. See chunkers/python.py and chunkers/rust.py for complete examples.
Project Structure
- src/semantic_code_mcp/chunkers/ — language chunkers (base.py, composite.py, python.py, rust.py, markdown.py)
- src/semantic_code_mcp/services/ — IndexService (scan/chunk/index), SearchService (search + auto-index)
- src/semantic_code_mcp/indexer.py — embed + store pipeline
- docs/decisions/ — architecture decision records
- TODO.md — epics and planning
- CHANGELOG.md — completed work (Keep a Changelog format)
- .claude/rules/ — context-specific coding rules for AI agents
License
MIT