Semantic Search MCP Server
An MCP server that provides semantic code search using local embeddings. Search your codebase with natural language queries like "authentication middleware" or "database connection pooling".
Features
Hybrid search: Combines vector similarity (Jina code embeddings) with FTS5 keyword matching using Reciprocal Rank Fusion
165+ languages: Tree-sitter parsing for Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, PHP, and more
Incremental indexing: File watcher automatically detects additions, modifications, and deletions
Respects .gitignore: Honors your project's .gitignore files (including nested ones)
Auto-initialization: Model loads and codebase indexes in the background on server startup
Zero external APIs: All embeddings generated locally with FastEmbed
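As an illustration of the hybrid search feature above, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for this example, not values taken from the server's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the vector index and the keyword index.
vector_hits = ["auth.py::login", "middleware.py::check_token", "db.py::connect"]
keyword_hits = ["middleware.py::check_token", "routes.py::auth_routes", "auth.py::login"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

A chunk that ranks well in both lists (here `middleware.py::check_token`) rises above one that is first in only a single list, which is the point of fusing by rank rather than by raw score.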
Installation
uv tool install semantic-search-mcp
Or with pip:
pip install semantic-search-mcp
Or run directly without installing:
uvx semantic-search-mcp
Quick Start
Add to Claude Code
Option A: Project-level config (recommended)
After installing with uv tool install or pip install, create .mcp.json in your project root:
{
  "mcpServers": {
    "semantic-search": {
      "command": "semantic-search-mcp"
    }
  }
}
Option B: CLI
claude mcp add semantic-search -- semantic-search-mcp
Option C: Without installing (ephemeral)
If you prefer not to install, use uvx to run in an ephemeral environment:
{
  "mcpServers": {
    "semantic-search": {
      "command": "uvx",
      "args": ["semantic-search-mcp"]
    }
  }
}
Usage
The server auto-initializes on startup.
Available Tools
Tool | Description |
| Search codebase with natural language |
| Get server state, progress, and statistics |
| Pause file watching (events discarded) |
| Resume file watching |
| Start full reindex (runs in background) |
| Cancel running indexing job |
| Wipe all indexed data |
| Add paths to ignore (session-only) |
| Remove paths from exclusion list |
How It Works
Indexing
On startup, the server:
Scans your codebase for supported file types
Parses code into semantic chunks (functions, classes, methods) using Tree-sitter
Generates embeddings for each chunk using Jina's code embedding model
Stores everything in a local SQLite database with vector search support
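The chunking step above can be sketched for Python sources with the standard-library `ast` module. This is a stand-in for illustration only: the server uses Tree-sitter, and the function name and the chunk dictionary layout below are hypothetical.

```python
import ast


def chunk_python_source(source, path="<memory>"):
    """Split Python source into function/class chunks.

    Uses the stdlib ast module as a stand-in for Tree-sitter parsing.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,
                "name": node.name,
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # The exact source text of the definition, ready to embed.
                "text": ast.get_source_segment(source, node),
            })
    # ast.walk is breadth-first, so sort to restore file order.
    return sorted(chunks, key=lambda c: c["start_line"])


sample = '''class Pool:
    def acquire(self):
        return "conn"

def connect():
    return Pool().acquire()
'''

chunks = chunk_python_source(sample, path="db.py")
```

Each chunk carries its own text and line range, so an embedding can be stored per function or class and a search hit can point back to the exact lines.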
File Watching
The server monitors your codebase for changes in real time:
Event | Action |
File created | Parsed, embedded, and added to index |
File modified | Re-indexed if content hash changed |
File deleted | Removed from index |
Changes are debounced (default 1s) to batch rapid modifications.
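A rough sketch of how debouncing and content-hash checks can work together, assuming a 1-second quiet window. The class and method names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow; the server's real watcher may be structured differently.

```python
import hashlib


class DebouncedReindexer:
    """Release a path for re-indexing once it has been quiet for `delay`
    seconds, and only if its content hash differs from the last indexed one."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self.pending = {}  # path -> (time of latest event, latest content)
        self.hashes = {}   # path -> hash of the content last indexed

    def notify(self, path, content, now):
        # A newer event for the same path replaces the older one.
        self.pending[path] = (now, content)

    def flush(self, now):
        ready = []
        for path, (seen_at, content) in list(self.pending.items()):
            if now - seen_at < self.delay:
                continue  # still inside the quiet window
            del self.pending[path]
            digest = hashlib.sha256(content).hexdigest()
            if self.hashes.get(path) != digest:
                self.hashes[path] = digest
                ready.append(path)
        return ready


reindexer = DebouncedReindexer(delay=1.0)
reindexer.notify("app.py", b"v1", now=0.0)
reindexer.notify("app.py", b"v2", now=0.4)      # rapid second save
first = reindexer.flush(now=1.0)                # too soon after the last save
second = reindexer.flush(now=1.5)               # quiet long enough: re-index once
reindexer.notify("app.py", b"v2", now=2.0)      # touched, content unchanged
third = reindexer.flush(now=3.5)                # same hash: skipped
```

Two rapid saves produce a single re-index, and a save that leaves the content unchanged produces none.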
What Gets Indexed
Included:
Files with code extensions:
.py, .js, .ts, .tsx, .jsx, .go, .rs, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, and more
Excluded:
Files matching .gitignore patterns (all .gitignore files in your project are respected)
Common non-code directories: node_modules, __pycache__, .venv, build, dist, .git, vendor, etc.
Binary files and non-code file types
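The include and exclude rules above can be approximated with a small filter. This sketch covers only the extensions and directories listed in this section; it does not implement .gitignore matching, and the function name `should_index` is hypothetical.

```python
from pathlib import PurePosixPath

# Only the extensions and directories named in this README; the real
# lists are longer ("and more", "etc.").
CODE_EXTENSIONS = {
    ".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".java",
    ".c", ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala",
}
EXCLUDED_DIRS = {
    "node_modules", "__pycache__", ".venv", "build", "dist", ".git", "vendor",
}


def should_index(relative_path):
    """Return True if a project-relative path looks like indexable code."""
    path = PurePosixPath(relative_path)
    # Reject anything that lives under an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix in CODE_EXTENSIONS


candidates = [
    "src/auth/middleware.py",
    "node_modules/lib/index.js",
    "docs/logo.png",
    "build/out.js",
]
indexable = [p for p in candidates if should_index(p)]
```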
Configuration
Environment variables:
Variable | Default | Description |
|
| Index database location |
|
| Embedding model |
|
| Minimum relevance threshold (0-1) |
|
| File watcher debounce in milliseconds |
|
| Files per batch (reduce if running out of memory) |
|
| Skip files larger than this (KB) |
|
| Texts per embedding call (reduce if OOM) |
|
| ONNX runtime threads (higher = faster on multi-core) |
|
| Use INT8 quantized model (30-40% faster) |
Performance
GPU Acceleration
GPU acceleration is auto-detected and used when available:
Platform | Provider | Installation |
NVIDIA | CUDA |
|
Apple Silicon | CoreML | Automatic (M1/M2/M3) |
AMD | ROCm | Install ROCm-enabled onnxruntime |
Windows | DirectML | Install DirectML-enabled onnxruntime |
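To check which ONNX Runtime execution providers are available on your machine, something like the following can help. The provider strings are ONNX Runtime's own names and `onnxruntime.get_available_providers()` is its public API; the preference order and the helper names are assumptions, not the server's actual detection logic.

```python
# Assumed preference order: accelerators first, CPU as the fallback.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",    # NVIDIA
    "CoreMLExecutionProvider",  # Apple Silicon
    "ROCMExecutionProvider",    # AMD
    "DmlExecutionProvider",     # Windows (DirectML)
    "CPUExecutionProvider",
]


def pick_provider(available):
    """Return the first preferred provider present in `available`."""
    for name in PREFERRED_PROVIDERS:
        if name in available:
            return name
    return "CPUExecutionProvider"


def detect_provider():
    """Ask ONNX Runtime what it supports; fall back to CPU if it is missing."""
    try:
        import onnxruntime
    except ImportError:
        return "CPUExecutionProvider"
    return pick_provider(onnxruntime.get_available_providers())


example = pick_provider(["CPUExecutionProvider", "CoreMLExecutionProvider"])
```

If `detect_provider()` reports `CPUExecutionProvider` on a machine with a supported GPU, the installed onnxruntime build probably lacks that provider.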
Alternative Models
For faster indexing (with quality tradeoffs), you can use a lighter model:
Model | Dimensions | Speed | Best For |
| 768 | Baseline | Code search (default) |
| 384 | ~10x faster | General text |
| 384 | ~32x faster | Speed priority |
To use an alternative model:
export SEMANTIC_SEARCH_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
Note: Changing models requires a full reindex (delete .semantic-search/ directory).
UniXcoder (Experimental)
Microsoft UniXcoder is a code-specific model pre-trained on code + AST + comments. It may provide better semantic understanding of code structure, but is substantially slower (~20x slower than Jina).
Model | Dimensions | Speed | Languages |
| 768 | ~20x slower | 6 (java, ruby, python, php, js, go) |
| 768 | ~20x slower | 9 (+ c, c++, c#) |
Installation (requires additional dependencies):
pip install "semantic-search-mcp[unixcoder]"
Usage:
export SEMANTIC_SEARCH_EMBEDDING_MODEL="microsoft/unixcoder-base-nine"
When to use UniXcoder:
You prioritize search quality over indexing speed
Your codebase is small to medium sized
You have GPU acceleration (CUDA or Apple Silicon MPS)
When to avoid UniXcoder:
Large codebases (10,000+ files) - indexing will take hours
You need fast initial indexing
Running on CPU without GPU acceleration
Claude Code Integration
Skills and commands are automatically installed when the MCP server first starts:
Skills → ~/.claude/skills/ (AI auto-discovery)
Commands → ~/.claude/commands/ (user-invocable slash commands)
To manually reinstall or update:
semantic-search-mcp-install-skills
Available Slash Commands
Command | Description |
| Search codebase with natural language |
| Check server status and index stats |
| Trigger full codebase reindex |
| Cancel running indexing job |
| Wipe all indexed data |
| Pause file watcher |
| Resume file watcher |
Requirements
Python 3.11+
~700MB disk for embedding model (downloaded on first run, ~150MB with INT8 quantization)
~1GB RAM for embedding model
License