# Search Scoring Analysis
## Executive Summary
Our search system is producing lower-than-expected scores (0.5-0.6 range) when searching for exact terms that should yield high-confidence matches (0.8-0.9 range). This analysis examines the root causes and proposes solutions.
## Current Scoring Behavior
### Example Query: "bm25_manager search"
**Expected**: High scores (0.8+) for exact matches of these terms
**Actual**: Scores in 0.5-0.6 range
#### Score Breakdown:
- **Vector Search Only**: 0.60-0.68
- **Keyword Search Only**: 1.0, 0.5, 0.333 (reciprocal rank scores)
- **Hybrid Search**: 0.603 (after linear combination and enhancement)
## Root Causes
### 1. Chunking Strategy Limitations
**Issue**: Code is split into small chunks that separate related concepts
- Classes are separated from their methods
- Methods are separated from their implementations
- Context is lost between chunks
**Example**:
```
Chunk 1: class BM25Manager
Chunk 2: def search(self, ...)
```
These should be together for "bm25_manager search" to match well.
### 2. BM25 Scoring Limitations
**Issue**: The keyword stage reports reciprocal-rank scores (1/rank) rather than the retriever's underlying BM25 relevance scores
- Rank 1: 1.0
- Rank 2: 0.5
- Rank 3: 0.333
**Problem**: These discrete, fast-decaying steps don't align with vector similarity scores, which are continuous on a 0.0-1.0 scale
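If the retriever can expose raw BM25 scores instead of ranks (per the recommendation below), min-max scaling them yields a continuous 0-1 signal that lines up with cosine similarities. A minimal sketch, assuming the `rank_bm25` package and an illustrative corpus:

```python
from rank_bm25 import BM25Okapi

# Illustrative mini-corpus of indexed chunks.
documents = [
    "bm25_manager search keyword retrieval",
    "vector store embedding query",
    "tokenizer for code identifiers",
]
bm25 = BM25Okapi([doc.split() for doc in documents])

# Raw BM25 scores are unbounded; min-max scaling puts them on 0-1.
raw = bm25.get_scores("bm25_manager search".split())
lo, hi = raw.min(), raw.max()
normalized = (raw - lo) / (hi - lo) if hi > lo else raw
```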
### 3. Vector Similarity Gaps
**Issue**: Generic embeddings don't understand code semantics well
- "bm25_manager" → "BM25Manager" similarity is only ~0.68
- Camel case, underscores, and code conventions reduce similarity
- Generic models weren't trained on code structure
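One lightweight mitigation, not yet in the pipeline, is to expand identifier conventions before indexing and querying so that `BM25Manager` and `bm25_manager` produce the same tokens. A sketch with a hypothetical helper:

```python
import re

def normalize_identifiers(text: str) -> str:
    """Expand CamelCase and snake_case so code conventions stop hurting similarity."""
    # Insert a space at lower/digit -> upper boundaries: "BM25Manager" -> "BM25 Manager".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Treat underscores as word separators: "bm25_manager" -> "bm25 manager".
    return text.replace("_", " ").lower()

assert normalize_identifiers("BM25Manager") == normalize_identifiers("bm25_manager")
```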
### 4. Linear Combination Weights
**Current**: 0.7 vector + 0.3 BM25
**Issue**: These weights may not be optimal for code search, where exact matches matter more
### 5. Enhanced Ranking Penalties
**Issue**: Additional signals sometimes reduce scores rather than boost them
- Code structure: 0.3 for class definitions (low)
- Recency: 0.0 for older files
- File proximity: Only helps when searching from within the same file
## Impact on Search Quality
1. **Exact matches don't surface well**: Terms that should be obvious matches get mediocre scores
2. **User confidence is reduced**: Scores of 0.5-0.6 suggest uncertain matches
3. **Progressive search suffers**: Low base scores mean progressive context shows weak matches
4. **Ranking is less reliable**: Similar scores make it hard to distinguish quality
## Score Pipeline Analysis
```
1. Vector Search: "bm25_manager search" → 0.68
2. BM25 Search: "bm25_manager search" → 0.5 (rank 2)
3. Linear Combination: (0.7 × 0.68) + (0.3 × 0.5) = 0.626
4. Enhanced Ranking: 0.626 → 0.603 (penalties applied)
```
Each step reduces confidence rather than reinforcing good matches.
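The same arithmetic in code, to make the compounding explicit; the enhancement penalty factor here is back-solved from the observed 0.603 and is illustrative only:

```python
vector_score = 0.68          # cosine similarity for "bm25_manager search"
bm25_score = 1.0 / 2         # reciprocal rank, result at rank 2
combined = 0.7 * vector_score + 0.3 * bm25_score
print(round(combined, 3))    # 0.626
enhanced = combined * 0.963  # penalty factor back-solved from the observed 0.603
print(round(enhanced, 3))    # 0.603
```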
## Recommendations
### Immediate Improvements (Low Effort)
1. **Adjust Linear Combination Weights**
- For code search: 0.5 vector + 0.5 BM25
- For docs search: 0.8 vector + 0.2 BM25
- Make weights configurable per search type (sketched after this list)
2. **Normalize BM25 Scores**
- Soften the reciprocal-rank decay so it better matches the continuous vector scale, e.g. `score = 2 / (rank + 1)` (ranks 1, 2, 3 → 1.0, 0.667, 0.5 instead of 1.0, 0.5, 0.333)
- Or use actual BM25 scores if available from the retriever
3. **Tune Enhanced Ranking**
- Increase base score weight to 0.6
- Make enhancement additive (a bonus) rather than multiplicative, as sketched below
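A minimal sketch of items 1 and 3, with hypothetical helper names and illustrative signal weights; the per-type weights come from the recommendation above:

```python
# Per-search-type weights (item 1), using the values recommended above.
WEIGHTS = {
    "code": {"vector": 0.5, "bm25": 0.5},
    "docs": {"vector": 0.8, "bm25": 0.2},
}

def combined_score(search_type: str, vector_score: float, bm25_score: float) -> float:
    w = WEIGHTS[search_type]
    return w["vector"] * vector_score + w["bm25"] * bm25_score

def enhanced_score(base: float, structure: float, recency: float, proximity: float) -> float:
    # Additive enhancement (item 3): extra signals grant a bounded bonus on top
    # of the base score instead of rescaling it, so they can never drag it down.
    bonus = 0.1 * structure + 0.05 * recency + 0.05 * proximity  # illustrative weights
    return min(1.0, base + bonus)
```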
### Medium-term Improvements (Moderate Effort)
1. **Implement Smarter Chunking**
- Keep classes with their methods
- Keep function signatures with bodies
- Use AST-aware chunking boundaries (see the sketch after this list)
2. **Add Exact Match Boosting**
- If query terms appear exactly in content, boost score by 0.2
- Implement phrase matching in BM25
3. **Context-Aware Scoring**
- If searching for code, prioritize code patterns
- If searching for docs, prioritize natural language
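A sketch of AST-aware chunking for Python sources using the stdlib `ast` module; each top-level class or function becomes one chunk, so `BM25Manager` and its `search` method stay together:

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """One chunk per top-level definition; classes keep all their methods."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment returns the node's exact source span, so a
            # class definition and its method bodies land in one chunk.
            chunks.append(ast.get_source_segment(source, node))
    return chunks

source = "class BM25Manager:\n    def search(self, query):\n        return []\n"
print(chunk_python_source(source))  # the whole class, methods included, as one chunk
```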
### Long-term Improvements (High Effort)
1. **Code-Specific Embeddings**
- Use CodeBERT or a similar code-trained model (see the sketch after this list)
- Fine-tune on your codebase
2. **Learning to Rank**
- Collect user feedback on result quality
- Train a model to predict good matches
3. **Semantic Code Understanding**
- Parse code structure (imports, calls, definitions)
- Build a code graph for better relevance
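A minimal sketch of swapping in a code-trained encoder via Hugging Face `transformers`; `microsoft/codebert-base` is one such checkpoint, and the mean-pooling step is a simple assumed choice, not a tuned recipe:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(text: str) -> torch.Tensor:
    """Mean-pool token embeddings into a single vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, tokens, dim)
    return hidden.mean(dim=1).squeeze(0)

query_vec = embed("bm25_manager search")
chunk_vec = embed("class BM25Manager:\n    def search(self, query): ...")
print(torch.cosine_similarity(query_vec, chunk_vec, dim=0).item())
```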
## Proposed Scoring Formula
```python
# Immediate fix: gentler BM25 decay, even weights, additive exact-match bonus.
def improved_score(vector_score, bm25_rank, query, content):
    # Soften the reciprocal-rank decay: ranks 1, 2, 3 -> 1.0, 0.667, 0.5.
    bm25_score = 2.0 / (bm25_rank + 1)
    # Bonus when every query term appears verbatim in the chunk.
    terms = query.lower().split()
    exact_match_bonus = 0.2 if all(term in content.lower() for term in terms) else 0.0
    # Even weighting of vector and keyword signals for code search.
    base_score = (0.5 * vector_score) + (0.5 * bm25_score)
    # Additive bonus, capped at 1.0.
    return min(1.0, base_score + exact_match_bonus)
```
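Applied to the running example (vector score 0.68, BM25 rank 2, both query terms present verbatim in the chunk), this lands in the target range:

```python
score = improved_score(
    vector_score=0.68,
    bm25_rank=2,
    query="bm25_manager search",
    content="self.bm25_manager = BM25Manager()\n    def search(self, query): ...",
)
print(round(score, 3))  # 0.873 ≈ (0.5 * 0.68) + (0.5 * 0.667) + 0.2
```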
## Expected Outcomes
With these improvements:
- Exact matches: 0.8-0.95 scores
- Good matches: 0.6-0.8 scores
- Weak matches: 0.3-0.6 scores
- Poor matches: <0.3 scores
This aligns better with user expectations and makes scores more interpretable.