
M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
# Cross-Encoder Reranking

**Two-Stage Retrieval for Higher Accuracy**

## Overview

Cross-encoder reranking is an optional Stage 2 retrieval step. It improves search quality by re-scoring candidates with a more accurate (but slower) model.

### How It Works

```
┌───────────────────────────────────────────────────────────────┐
│                      Two-Stage Retrieval                      │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Stage 1 (Fast)               Stage 2 (Accurate)              │
│  ──────────────               ──────────────────              │
│                                                               │
│  ┌─────────────┐              ┌─────────────────┐             │
│  │ Vector+BM25 │    ──→       │  Cross-Encoder  │ ──→ Top 10  │
│  │ RRF Fusion  │              │    Reranking    │     Results │
│  └─────────────┘              └─────────────────┘             │
│         ↓                              ↓                      │
│  100 candidates                    Re-scored                  │
│  (fast lookup)               (query+doc together)             │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```

### Why Cross-Encoder?

**Bi-encoders** (embeddings) encode the query and the document separately:

```
query_embedding = model.encode(query)
doc_embedding   = model.encode(document)   // Pre-computed at index time
score = cosine(query_embedding, doc_embedding)
```

**Cross-encoders** encode them together:

```
score = model.cross_encode(query, document)   // Sees query-document interaction
```

The cross-encoder can capture fine-grained semantic relationships that bi-encoders miss. That accuracy has a cost: it must run the model once for every (query, document) pair, so its cost grows linearly with the number of candidates (O(N) model calls).

A bi-encoder works differently. It reuses pre-computed document embeddings, which an index can search in roughly O(log N) time (for example, with approximate nearest-neighbor search). This is why the cross-encoder only re-scores the small candidate set returned by Stage 1.

### ELI12

Imagine finding a book in a library:

- **Stage 1 (Bi-encoder)**: Using the card catalog to find 100 potentially relevant books. Fast, but might miss nuances.
- **Stage 2 (Cross-encoder)**: Actually reading each book's summary to pick the best 10. More accurate, but takes longer.

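The two-stage flow can be sketched in Go under simplifying assumptions. This is a minimal illustration, not this project's implementation.

- **Stage 1** uses plain cosine similarity over pre-computed embeddings. It stands in for the Vector+BM25 RRF fusion.
- **Stage 2** uses the hypothetical function type `CrossScorer`, which plays the role of the reranking service. It is assumed to receive all candidates in one call and to return one score per candidate, in input order.
- **The scorer** is a toy word-overlap function that replaces a real cross-encoder model.

The names `Doc`, `Scored`, `TwoStage`, `overlapScorer`, and `failingScorer` are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
	"strings"
)

// Doc is a hypothetical candidate whose embedding was pre-computed at index time.
type Doc struct {
	ID, Text  string
	Embedding []float64
}
// Scored pairs a candidate with its current relevance score.
type Scored struct {
	Doc   Doc
	Score float64
}
// CrossScorer stands in for a reranking service: one call, one score per candidate (in order).
type CrossScorer func(query string, docs []Doc) ([]float64, error)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func byScoreDesc(s []Scored) {
	sort.SliceStable(s, func(i, j int) bool { return s[i].Score > s[j].Score })
}

// TwoStage: Stage 1 keeps the topK candidates by cosine similarity; Stage 2 re-scores
// them with the cross-encoder. If reranking fails, the Stage 1 order is kept.
func TwoStage(query string, qEmb []float64, corpus []Doc, topK, n int, rerank CrossScorer) []Scored {
	cands := make([]Scored, 0, len(corpus))
	for _, d := range corpus {
		cands = append(cands, Scored{Doc: d, Score: cosine(qEmb, d.Embedding)})
	}
	byScoreDesc(cands)
	if len(cands) > topK {
		cands = cands[:topK]
	}
	docs := make([]Doc, len(cands))
	for i, c := range cands {
		docs[i] = c.Doc
	}
	// A single batched call scores every (query, candidate) pair together.
	if scores, err := rerank(query, docs); err == nil && len(scores) == len(cands) {
		for i := range cands {
			cands[i].Score = scores[i]
		}
		byScoreDesc(cands)
	}
	if len(cands) > n {
		cands = cands[:n]
	}
	return cands
}

// overlapScorer is a toy cross-encoder stand-in: it counts shared lowercase words.
func overlapScorer(query string, docs []Doc) ([]float64, error) {
	qWords := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(query)) {
		qWords[w] = true
	}
	scores := make([]float64, len(docs))
	for i, d := range docs {
		for _, w := range strings.Fields(strings.ToLower(d.Text)) {
			if qWords[w] {
				scores[i]++
			}
		}
	}
	return scores, nil
}

// failingScorer simulates an unavailable reranking service.
func failingScorer(string, []Doc) ([]float64, error) { return nil, errors.New("reranker unavailable") }

func main() {
	corpus := []Doc{
		{ID: "a", Text: "reset your password", Embedding: []float64{1, 0}},
		{ID: "b", Text: "password policy overview", Embedding: []float64{0.9, 0.1}},
		{ID: "c", Text: "billing FAQ", Embedding: []float64{0, 1}},
	}
	q := []float64{1, 0}
	fmt.Println(TwoStage("password policy", q, corpus, 2, 1, overlapScorer)[0].Doc.ID) // prints "b"
	fmt.Println(TwoStage("password policy", q, corpus, 2, 1, failingScorer)[0].Doc.ID) // prints "a"
}
```

In the example, Stage 1 ranks `a` first because its embedding matches the query vector exactly. The reranker then sees both query words in `b` and promotes it.

Keeping the fallback inside `TwoStage` mirrors the best-effort behavior described under Error Handling: a failed rerank call leaves the Stage 1 ranking in place instead of failing the search.
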
## Quick Start

### Enable via Search Options

```go
opts := search.DefaultSearchOptions()
opts.RerankEnabled = true
opts.RerankTopK = 100     // Rerank the top 100 candidates
opts.RerankMinScore = 0.3 // Filter out low-confidence results

results, err := svc.Search(ctx, query, embedding, opts)
```

### Configure the Cross-Encoder

```go
// Configure the cross-encoder service
svc.SetCrossEncoder(search.NewCrossEncoder(&search.CrossEncoderConfig{
	Enabled:  true,
	APIURL:   "http://localhost:8081/rerank",
	Model:    "cross-encoder/ms-marco-MiniLM-L-6-v2",
	TopK:     100,
	Timeout:  30 * time.Second,
	MinScore: 0.0,
}))
```

## Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| `Enabled` | `false` | Enable cross-encoder reranking |
| `APIURL` | `http://localhost:8081/rerank` | Reranking service endpoint |
| `APIKey` | `""` | Authentication token (if required) |
| `Model` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Model name |
| `TopK` | `100` | Number of candidates to rerank |
| `Timeout` | `30s` | Request timeout |
| `MinScore` | `0.0` | Minimum reranker score a result must reach to be kept |

## Supported Reranking Services

### Cohere Rerank API

```go
ce := search.NewCrossEncoder(&search.CrossEncoderConfig{
	Enabled: true,
	APIURL:  "https://api.cohere.ai/v1/rerank",
	APIKey:  "your-api-key",
	Model:   "rerank-english-v3.0",
})
```

### HuggingFace Text Embeddings Inference (TEI)

```bash
# Start TEI with a reranking model
docker run -p 8081:80 ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id cross-encoder/ms-marco-MiniLM-L-6-v2
```

```go
ce := search.NewCrossEncoder(&search.CrossEncoderConfig{
	Enabled: true,
	APIURL:  "http://localhost:8081/rerank",
	Model:   "cross-encoder/ms-marco-MiniLM-L-6-v2",
})
```

### Local Models (llama.cpp with reranking)

```go
ce := search.NewCrossEncoder(&search.CrossEncoderConfig{
	Enabled: true,
	APIURL:  "http://localhost:8081/rerank",
	Model:   "bge-reranker-base",
})
```

## Response Format

The cross-encoder integration accepts any of the following response formats:

### Cohere Format

```json
{
  "results": [
    {"index": 0, "relevance_score": 0.95},
    {"index": 2, "relevance_score": 0.82},
    {"index": 1, "relevance_score": 0.71}
  ]
}
```

### HuggingFace TEI Format

```json
{
  "scores": [0.95, 0.71, 0.82]
}
```

### Simple Format

```json
{
  "rankings": [
    {"index": 0, "score": 0.95},
    {"index": 2, "score": 0.82}
  ]
}
```

## Performance Considerations

### Latency Trade-offs

| Method | Latency | Accuracy |
|--------|---------|----------|
| Vector only | ~5ms | Good |
| RRF Hybrid | ~10ms | Better |
| RRF + Cross-Encoder | ~50-200ms | Best |

### When to Use

✅ **Use cross-encoder when:**

- Accuracy is more important than latency
- Users are willing to wait for better results
- Search volume is low to moderate
- Search results feed high-stakes decisions

❌ **Skip cross-encoder when:**

- Low latency is critical (<50ms)
- Query volume is high (>1000 QPS)
- Results are "good enough" with bi-encoders
- Cost is a concern (e.g., paid API calls)

### Optimization Tips

1. **Limit TopK**: Rerank fewer candidates for faster responses.

   ```go
   opts.RerankTopK = 50 // Instead of 100
   ```

2. **Use MinScore**: Filter out low-confidence results.

   ```go
   opts.RerankMinScore = 0.5 // Skip weak matches
   ```

3. **Batch Requests**: The cross-encoder scores all candidates in a single request rather than one request per document.
4. **Cache Results**: For repeated queries, cache the reranked results.

## Combining with MMR

Cross-encoder reranking can be combined with MMR (Maximal Marginal Relevance) diversification:

```go
opts := search.DefaultSearchOptions()
opts.MMREnabled = true
opts.MMRLambda = 0.7
opts.RerankEnabled = true

// Pipeline: Vector+BM25 → RRF → MMR → Cross-Encoder → Results
```

The reported search method is then `rrf_hybrid+mmr+rerank`.

## Monitoring

Check whether the cross-encoder is available:

```go
if svc.CrossEncoderAvailable(ctx) {
	log.Println("Cross-encoder ready")
}
```

The search response includes the method used:

```json
{
  "search_method": "rrf_hybrid+rerank",
  "message": "RRF + Cross-Encoder Reranking"
}
```

## Error Handling

On errors, the cross-encoder falls back gracefully to the original rankings:

- API timeout → use original RRF scores
- Server unavailable → use original RRF scores
- Invalid response → use original RRF scores

No error is returned to the caller; the search continues with best-effort results.

## Popular Cross-Encoder Models

| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | 22M | Good | Fast |
| `cross-encoder/ms-marco-TinyBERT-L-6` | 14M | Good | Fastest |
| `BAAI/bge-reranker-base` | 278M | Better | Medium |
| `BAAI/bge-reranker-large` | 560M | Best | Slow |
| `Cohere rerank-english-v3.0` | - | Best | API |

## Related Documentation

- [Vector Search Guide](../user-guides/vector-search.md)
- [Hybrid Search Guide](../user-guides/hybrid-search.md)
- [RRF Algorithm](../user-guides/hybrid-search.md#rrf-algorithm)
- [Search Evaluation](../advanced/search-evaluation.md)

---

_Cross-Encoder Reranking v1.0 - December 2025_
