# Semantic Search MCP Server
A local Model Context Protocol (MCP) server that enables AI agents to perform semantic search over codebases using natural language queries. The server converts queries into efficient text search patterns (grep/ripgrep) and verifies relevance before returning results.
## Quick Setup
### Installation
```bash
pip install -e .
```
### Environment Variables
Set the following environment variables:
- `REPO_PATH` - Path to the repository to search (defaults to the current directory)
- `SEARCHER_TYPE` - Searcher implementation to use (default: `sgr_gemini_flash_lite`)
**API Keys** (choose one based on your searcher type):
- For Claude-based searchers: `CLAUDE_API_KEY` or `ANTHROPIC_API_KEY`
- For Gemini-based searchers: `GOOGLE_API_KEY`, `GEMINI_API_KEY`, `AI_STUDIO`, or `VERTEX_AI_API_KEY`
- For OpenAI-based searchers: `OPENAI_API_KEY`
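For example, a minimal shell setup for the default searcher might look like this (all paths and key values below are placeholders):
```bash
# Placeholder values - substitute your own repo path and API key
export REPO_PATH="/path/to/your/repo"
export SEARCHER_TYPE="sgr_gemini_flash_lite"  # the default; see Available Searchers below
export GEMINI_API_KEY="your-gemini-api-key"
```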
### Available Searchers
**SGR (Schema-Guided Reasoning) searchers** - Production-ready implementations:
- `sgr` / `sgr_gemini_flash_lite` - Default, recommended (Gemini Flash Lite)
- `sgr_gemini_flash` - SGR with Gemini Flash
- `sgr_gemini_pro` - SGR with Gemini Pro
- `sgr_gpt4o` - SGR with GPT-4o
- `sgr_gpt4o_mini` - SGR with GPT-4o Mini
**Note:** Other searcher types (`ripgrep_claude`, `agent_claude`, `agent_gemini_flash_lite`, etc.) are experimental implementations from earlier development phases and are not recommended for production use.
## Running the MCP Server
**Important:** The MCP server is not meant to be run directly in a terminal. It communicates over STDIO using the JSON-RPC protocol and must be launched by an IDE or MCP client.
### Cursor Configuration
Add to your `cursor-mcp-config.json`:
```json
{
  "mcpServers": {
    "qure-semantic-search": {
      "command": "/path/to/.venv/bin/qure-semantic-search-mcp",
      "env": {
        "REPO_PATH": "/path/to/your/repo"
      }
    }
  }
}
```
After configuring, restart Cursor. Cursor launches the server automatically when you use the `semantic_search` tool in its AI chat.
**Note:** If you see JSON parsing errors when running the command directly in a terminal, this is expected: the server requires an MCP client (such as Cursor) to drive it over JSON-RPC.
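That said, you can roughly smoke-test the binary by piping a single JSON-RPC `initialize` request into its stdin. This is a sketch, not a supported workflow, and assumes the standard MCP handshake (the exact `protocolVersion` string depends on the MCP revision the server implements):
```bash
# Rough smoke test only - a real session requires a full MCP client.
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.1"}}}' \
  | /path/to/.venv/bin/qure-semantic-search-mcp
```
If the handshake is accepted, you should see a JSON-RPC response on stdout rather than a parse error.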
## Evaluation
### Running Evaluation
**Standard mode** (single run per query):
```bash
python -m eval.run_eval
```
**Stability mode** (10 runs per query to measure consistency):
```bash
python -m eval.run_eval --stability
```
**Stability mode with custom runs** (e.g., 20 runs per query):
```bash
python -m eval.run_eval --stability --runs 20
```
**Evaluate all searchers** (compares different searcher implementations):
```bash
python -m eval.run_all_searchers --stability
```
**Additional options:**
- `--verbose` / `-v` - Print detailed per-query statistics
- `--single-dataset` - Use only the main dataset (excludes the easy dataset)
- `--output <path>` - Export results to JSON file
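These options compose with the modes above. For example, a verbose 20-run stability evaluation of the main dataset only, exported to a JSON file (the output path is arbitrary):
```bash
python -m eval.run_eval --stability --runs 20 --verbose --single-dataset --output results/main_stability.json
```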
### Datasets
The evaluation uses two datasets:
1. **Main dataset** (`data/dataset.jsonl`) - 12 challenging examples across different codebases (Django, Gin, CodeQL, QGIS, etc.) with non-trivial queries where simple keyword matching fails.
2. **Easy dataset** (`data/dataset_easy.jsonl`) - 14 simpler examples designed for faster evaluation and testing. These queries are more straightforward but still require semantic understanding.
By default, both datasets are used together (26 queries total). Use `--single-dataset` to evaluate only the main dataset.
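Each record is one JSON object per line (JSONL). The authoritative schema is whatever `data/dataset.jsonl` actually contains; the field names below are an illustrative guess inferred from the metrics the evaluation reports (expected files and required substrings), not the real schema:
```json
{"query": "Where is the maximum request body size enforced?", "repo": "django", "expected_files": ["django/http/request.py"], "required_substrings": ["DATA_UPLOAD_MAX_MEMORY_SIZE"]}
```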
## Metrics
For detailed metric definitions and mathematical proof of perfection, see **[METRICS_LOGIC.md](METRICS_LOGIC.md)**.
**Quick Summary:**
- **Precision@K** = TP / (TP + FP) - Fraction of returned results that are relevant
- **Recall@K** = TP / (TP + FN) - Fraction of all relevant items that were returned
- **F1@K** = Harmonic mean of Precision and Recall
- **File Discovery Rate** = Files Found / Files Expected
- **Substring Coverage** = Substrings Found / Substrings Required
**The Logic Test:** If all metrics score 1.0, the solution is mathematically perfect (see proof in METRICS_LOGIC.md).
See `eval/metrics.py` for detailed implementations.
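As a sketch of how Precision, Recall, and F1 fall out of simple set arithmetic (the authoritative implementations live in `eval/metrics.py`; this is not that code):
```python
def precision_recall_f1(returned: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Set-based Precision/Recall/F1 over returned vs. expected result identifiers."""
    tp = len(returned & relevant)                          # true positives
    precision = tp / len(returned) if returned else 0.0    # TP / (TP + FP)
    recall = tp / len(relevant) if relevant else 0.0       # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 3 of 5 returned results are relevant; 4 relevant items exist in total.
print(precision_recall_f1({"a", "b", "c", "x", "y"}, {"a", "b", "c", "d"}))
# -> (0.6, 0.75, 0.666...)
```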
## Performance Results
Evaluation results for the `sgr_gemini_flash_lite` searcher (10 runs per query, 26 queries total). Values are mean ± standard deviation across runs; CV is the coefficient of variation (std / mean):
### Overall Performance
| Metric | Value | Stability |
|--------|-------|-----------|
| **Precision@10** | 0.30 ± 0.38 | ⚠ High variance (CV=127%) |
| **Recall@10** | 0.31 ± 0.41 | ⚠ High variance (CV=133%) |
| **F1@10** | 0.29 ± 0.38 | ⚠ High variance (CV=130%) |
| **Success Rate@10** | 0.40 ± 0.46 | ⚠ High variance (CV=114%) |
| **File Discovery Rate** | 0.61 ± 0.40 | ⚠ Moderate variance (CV=66%) |
| **Substring Coverage** | 0.35 ± 0.39 | ⚠ High variance (CV=111%) |
| **Avg Latency** | 20.6s ± 7.9s | Range: 9.6s - 38.3s |
| **Stability Score** | 73.9% | 16/26 stable queries (61.5%) |
### Dataset Breakdown
#### Easy Dataset (14 examples)
- **Precision@10**: 0.40 ± 0.44
- **Recall@10**: 0.46 ± 0.49
- **F1@10**: 0.42 ± 0.45
- **File Discovery Rate**: 0.92 ± 0.13 ✓ (Good stability)
- **Avg Latency**: 15.0s ± 4.8s
- **Stability Score**: 85.9% ✓ (Good stability)
#### Main Dataset (12 examples)
- **Precision@10**: 0.17 ± 0.25
- **Recall@10**: 0.13 ± 0.18
- **F1@10**: 0.14 ± 0.20
- **File Discovery Rate**: 0.26 ± 0.30
- **Avg Latency**: 27.2s ± 5.3s
- **Stability Score**: 60.0% ⚠ (Moderate stability)
### Notes
- **High variance** in metrics is expected due to LLM non-determinism and the complexity of semantic search queries
- **File Discovery Rate** shows better stability, especially on easier queries (0.92 on the easy dataset)
- **Latency** varies significantly (9-38s) depending on query complexity and codebase size
- Results are evaluated on non-trivial queries where simple keyword matching fails
## Project Structure
- `src/` - Core MCP server and searcher implementations
- `eval/` - Evaluation scripts and metrics
- `data/` - Evaluation dataset and test repositories
- `scripts/` - Utility scripts for testing and debugging
## Documentation
- **[METRICS_LOGIC.md](METRICS_LOGIC.md)** - Mathematical justification for metric selection and proof of perfection
- **[KNOWN_ISSUES.md](KNOWN_ISSUES.md)** - Current limitations, known problems, and workarounds
- **[FUTURE_ROADMAP.md](FUTURE_ROADMAP.md)** - Planned improvements and mitigation strategies