- Enriches code chunks with authorship, timestamps, churn metrics, and task IDs extracted from commit history and git blame data.
- Supports extracting GitHub task IDs from commit messages to provide context and linking between code and project issues.
- Enables extraction of JIRA task IDs from commit messages to associate indexed code chunks with specific project tickets.
- Integrates with Ollama for local, privacy-first embedding generation and semantic codebase search.
- Supports OpenAI embedding models for semantic vectorization and high-performance code search.
- Provides specialized Ruby AST-aware chunking to improve the accuracy and relevance of semantic search in Ruby codebases.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@Tea Rags MCPsearch for where user authentication is implemented"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
This is a fork of qdrant/mcp-server-qdrant.
A high-performance Model Context Protocol (MCP) server for semantic search using Qdrant vector database. Optimized for fast codebase indexing and incremental re-indexing.
🙏 Acknowledgments
Huge thanks to the qdrant/mcp-server-qdrant team and all contributors to the original project!
Special appreciation for:
🏗️ Clean and extensible architecture
📚 Excellent documentation and examples
🧪 Solid test coverage
🤝 Open-source spirit and MIT license
This fork is built on the solid foundation of your work. Thank you for your contribution to the community! 🙏
⚡ Fork Highlights
Why tea-rags-mcp?
🚀 Optimized embedding pipeline – indexing and re-indexing take minutes, not hours
🔥 1000x faster deletions – payload indexes make filter-based deletes instant
⚡ Parallel processing – sharded snapshots, concurrent workers, batched operations
🎯 Smart batching – automatic batch formation with backpressure control
🛠️ Production-ready – auto-migration, checkpointing, resume from interruption
💎 Ruby AST-aware – specialized chunking for Ruby codebases
🍴 Why Fork?
Why a fork instead of PRs to the original?
I love to experiment. A lot. And fast. 🧪
Coordinating changes with maintainers is the right thing to do, but it takes time: discussions, reviews, compromises, waiting. Sometimes an idea lives for a day; sometimes it turns into something useful.
A fork gives me the freedom to try crazy ideas without fear of breaking someone else's project or wasting anyone's time reviewing something that might not even work.
For maintainers & contributors: if you find something useful here, feel free to cherry-pick it into upstream. No need to ask – the MIT license covers it.
Questions? Reach me at artk0re@icloud.com 💬
TL;DR: This is an experimental playground. Use at your own risk. For production, I recommend the original project.
✨ What's New in This Fork
| Feature | Original | This Fork |
|---|---|---|
| Snapshot storage | Single JSON file | 📁 Sharded storage (v3) |
| Change detection | Sequential | ⚡ Parallel (N workers) |
| Hash distribution | ❌ | 🎯 Consistent hashing |
| Merkle tree | Single level | 🌳 Two-level (shard + meta) |
| Concurrency control | Fixed | 🎛️ Configurable |
| Delete operations | Filter scan | ⚡ Payload index (1000x faster) |
| Batch pipeline | Sequential | 🚀 Parallel with backpressure |
📁 Sharded Snapshots (v3 format)
File hashes are stored across multiple shards instead of a single file:
Parallel read/write across shards
Atomic updates via directory swap
Checksum validation per shard
⚡ Parallel Change Detection
Change detection runs in parallel across all shards:
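A minimal TypeScript sketch of the idea (illustrative, not the fork's actual code): each shard's stored hashes are re-checked concurrently, and the per-shard results are merged.

```typescript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

type Shard = Record<string, string>; // relative path -> stored content hash

async function hashFile(path: string): Promise<string> {
  return createHash("sha256").update(await readFile(path)).digest("hex");
}

// All shards are diffed concurrently; each shard walks its own files.
async function detectChanges(shards: Shard[]): Promise<string[]> {
  const perShard = await Promise.all(
    shards.map(async (shard) => {
      const changed: string[] = [];
      for (const [path, stored] of Object.entries(shard)) {
        if ((await hashFile(path)) !== stored) changed.push(path);
      }
      return changed;
    })
  );
  return perShard.flat();
}
```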
🎯 Consistent Hashing
When the number of workers changes, only a minimal number of files is redistributed:
4 → 8 workers: ~50% of files stay in place (vs ~25% with modulo)
Virtual nodes ensure even distribution
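To see why redistribution stays small, here is a hedged sketch of a hash ring with virtual nodes (the fork's actual ring size and hash function are assumptions here): each worker owns many points on a ring, and a file goes to the first point clockwise from its own hash, so resizing the pool only re-homes the slices that change owner.

```typescript
import { createHash } from "node:crypto";

// 32-bit ring position from sha256 (choice of hash is an assumption).
const pos = (s: string) =>
  parseInt(createHash("sha256").update(s).digest("hex").slice(0, 8), 16);

class HashRing {
  private ring: { point: number; worker: number }[] = [];

  constructor(workers: number, virtualNodes = 128) {
    for (let w = 0; w < workers; w++)
      for (let v = 0; v < virtualNodes; v++)
        this.ring.push({ point: pos(`worker-${w}#${v}`), worker: w });
    this.ring.sort((a, b) => a.point - b.point);
  }

  // The first virtual node clockwise from the file's hash owns the file.
  workerFor(file: string): number {
    const p = pos(file);
    return (this.ring.find((n) => n.point >= p) ?? this.ring[0]).worker;
  }
}

// Growing 4 -> 8 workers re-homes only files whose ring slice changed owner.
const before = new HashRing(4);
const after = new HashRing(8);
console.log(before.workerFor("src/index.ts"), after.workerFor("src/index.ts"));
```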
🌳 Two-Level Merkle Tree
Fast "any changes?" check:
Compare meta root hash (single read)
If changed → read only the affected shards
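A sketch of the two-level check (names and hashing are illustrative): the meta root is a hash over the shard roots, so a single comparison answers "any changes?", and only mismatched shards need to be read.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Level 2: one hash over all shard roots.
const metaRoot = (shardRoots: string[]) => sha256(shardRoots.join("\n"));

// Level 1: compare shard roots only when the meta root differs.
function changedShards(oldRoots: string[], newRoots: string[]): number[] {
  if (metaRoot(oldRoots) === metaRoot(newRoots)) return []; // single read, no changes
  return newRoots
    .map((root, i) => (root !== oldRoots[i] ? i : -1))
    .filter((i) => i !== -1);
}
```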
🚀 Future Improvements
Auto-detection of optimal concurrency based on CPU/IO
Compression for large shards
File locking for concurrent access
Features
Zero Setup: Works out of the box with Ollama - no API keys required
Privacy-First: Local embeddings and vector storage - data never leaves your machine
Code Vectorization: Intelligent codebase indexing with AST-aware chunking and semantic code search
Multiple Providers: Ollama (default), OpenAI, Cohere, and Voyage AI
Hybrid Search: Combine semantic and keyword search for better results
Semantic Search: Natural language search with metadata filtering
Incremental Indexing: Efficient updates - only re-index changed files
Git Blame Metadata: Enrich code with authorship, dates, churn metrics, and task IDs from commit history
Flexible Performance Tuning: Configurable batch sizes, concurrency, and pipeline parameters for maximum resource utilization
Smart Caching: Two-level cache (memory + disk) with content-hash invalidation for git blame and file snapshots
Configurable Prompts: Create custom prompts for guided workflows without code changes
Rate Limiting: Intelligent throttling with exponential backoff
Full CRUD: Create, search, and manage collections and documents
Flexible Deployment: Run locally (stdio) or as a remote HTTP server
API Key Authentication: Connect to secured Qdrant instances (Qdrant Cloud, self-hosted with API keys)
Git Blame Metadata
Each code chunk is enriched with aggregated signals from git blame:
Dominant author – who wrote most of the lines in the chunk (for ownership questions)
All authors – everyone who contributed to this code
Timestamps – first created and last modified dates
Age in days – how stale the code is
Commit count – churn indicator (high = frequently changed = potentially problematic)
Task IDs – automatically extracted from commit messages (JIRA, GitHub, Azure DevOps patterns)
This enables powerful filters: find code by author, find legacy code, find high-churn areas, trace code to tickets.
Flexible Performance Tuning
Every bottleneck is configurable via environment variables:
| Layer | Variables | Purpose |
|---|---|---|
| Embedding | EMBEDDING_BATCH_SIZE, EMBEDDING_CONCURRENCY | GPU utilization, parallel requests |
| Pipeline | CODE_BATCH_SIZE | Batch accumulation strategy |
| Qdrant | see Qdrant Batch Pipeline Configuration | Bulk operations throughput |
| I/O | MAX_IO_CONCURRENCY | Parallel file reads |
The pipeline uses backpressure control – if Qdrant or embeddings slow down, file processing automatically pauses to prevent memory overflow.
Smart Caching
Two-level caching minimizes redundant work:
| Cache | Storage | Invalidation | Purpose |
|---|---|---|---|
| Git blame | Memory (L1) + Disk (L2) | Content hash | Avoid re-running `git blame` |
| File snapshots | Sharded JSON | Merkle tree | Fast "any changes?" check for incremental indexing |
| Collection info | Memory | TTL | Reduce Qdrant API calls |
Content-hash invalidation: Cache keys include the file content hash, so changing a file automatically invalidates its cached blame data – no stale data, no manual cache clearing.
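In code, the key derivation is roughly this (a sketch assuming sha256; the actual key format is internal):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// The cache key embeds the file's content hash, so any edit produces a new
// key and stale blame entries are simply never looked up again.
function blameCacheKey(filePath: string): string {
  const contentHash = createHash("sha256")
    .update(readFileSync(filePath))
    .digest("hex");
  return `blame:${contentHash}`;
}
```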
Quick Start
Prerequisites
Node.js 22+
Podman or Docker with Compose support
Installation
Configuration
Add to Claude Code (recommended)
Remote Setup (HTTP transport)
โ ๏ธ Security Warning: When deploying the HTTP transport in production:
Always run behind a reverse proxy (nginx, Caddy) with HTTPS
Implement authentication/authorization at the proxy level
Use firewalls to restrict access to trusted networks
Never expose directly to the public internet without protection
Consider implementing rate limiting at the proxy level
Monitor server logs for suspicious activity
Start the server:
Configure client:
Using a different provider:
Restart after making changes.
See Advanced Configuration section below for all options.
Tools
Collection Management
| Tool | Description |
|---|---|
| | Create collection with specified distance metric (Cosine/Euclid/Dot) |
| | List all collections |
| | Get collection details and statistics |
| | Delete collection and all documents |
Document Operations
| Tool | Description |
|---|---|
| | Add documents with automatic embedding (supports string/number IDs, metadata) |
| | Natural language search with optional metadata filtering |
| | Hybrid search combining semantic and keyword (BM25) search with RRF |
| | Delete specific documents by ID |
Code Vectorization
| Tool | Description |
|---|---|
| index_codebase | Index a codebase for semantic code search with AST-aware chunking |
| | Search indexed codebase using natural language queries |
| reindex_changes | Incrementally re-index only changed files (detects added/modified/deleted) |
| get_index_status | Get indexing status and statistics for a codebase |
| | Delete all indexed data for a codebase |
Resources
- `qdrant://collections` – List all collections
- `qdrant://collection/{name}` – Collection details
Configurable Prompts
Create custom prompts tailored to your specific use cases without modifying code. Prompts provide guided workflows for common tasks.
Note: By default, the server looks for prompts.json in the project root directory. If the file exists, prompts are automatically loaded. You can specify a custom path using the PROMPTS_CONFIG_FILE environment variable.
Setup
1. Create a prompts configuration file (e.g., prompts.json in the project root). See prompts.example.json for example configurations you can copy and customize.
2. Configure the server (optional – only needed for a custom path). If you place prompts.json in the project root, no additional configuration is needed. To use a custom path:
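For example (the path shown is illustrative):

```bash
# Point the server at a prompts file outside the project root
PROMPTS_CONFIG_FILE=/absolute/path/to/my-prompts.json
```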
Use prompts in your AI assistant:
Claude Code:
VSCode:
Example Prompts
See prompts.example.json for ready-to-use prompts including:
- find_similar_docs – Semantic search with result explanation
- setup_rag_collection – Create RAG-optimized collections
- analyze_collection – Collection insights and recommendations
- bulk_add_documents – Guided bulk document insertion
- search_with_filter – Metadata filtering assistance
- compare_search_methods – Semantic vs hybrid search comparison
- collection_maintenance – Maintenance and cleanup workflows
- migrate_to_hybrid – Collection migration guide
Template Syntax
Templates use {{variable}} placeholders:
Required arguments must be provided
Optional arguments use defaults if not specified
Unknown variables are left as-is in the output
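A minimal prompts.json entry might look like this; the authoritative schema is in prompts.example.json, so treat the field names here as illustrative:

```json
{
  "prompts": [
    {
      "name": "find_similar_docs",
      "description": "Semantic search with result explanation",
      "arguments": [
        { "name": "collection", "required": true },
        { "name": "limit", "required": false, "default": "5" }
      ],
      "template": "Search {{collection}} for documents matching my query and explain the top {{limit}} results."
    }
  ]
}
```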
Code Vectorization
Intelligently index and search your codebase using semantic code search. Perfect for AI-assisted development, code exploration, and understanding large codebases.
Features
AST-Aware Chunking: Intelligent code splitting at function/class boundaries using tree-sitter
Multi-Language Support: 35+ file types including TypeScript, Python, Java, Go, Rust, C++, and more
Incremental Updates: Only re-index changed files for fast updates
Smart Ignore Patterns: Respects .gitignore, .dockerignore, and custom .contextignore files
Semantic Search: Natural language queries to find relevant code
Metadata Filtering: Filter by file type, path patterns, or language
Local-First: All processing happens locally - your code never leaves your machine
Quick Start
1. Index your codebase:
2. Search your code:
3. Update after changes:
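Each step is a plain request to your assistant; the phrasing below is just an example (index_codebase and reindex_changes are the tool names listed under Tools, the search tool is invoked by natural language):

```text
1. "Index the codebase at /path/to/project"       → index_codebase
2. "Where is user authentication implemented?"    → semantic code search
3. "Pick up my latest changes in the index"       → reindex_changes
```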
Usage Examples
Index a TypeScript Project
Search for Authentication Code
Search with Filters
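For example, a filtered search combines a natural-language query with the fileTypes and pathPattern options; the exact argument shape shown here is illustrative:

```json
{
  "query": "JWT token validation",
  "fileTypes": ["ts", "tsx"],
  "pathPattern": "src/auth/**"
}
```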
Incremental Re-indexing
Check Indexing Status
Supported Languages
Programming Languages (35+ file types):
Web: TypeScript, JavaScript, Vue, Svelte
Backend: Python, Java, Go, Rust, Ruby, PHP
Systems: C, C++, C#
Mobile: Swift, Kotlin, Dart
Functional: Scala, Clojure, Haskell, OCaml
Scripting: Bash, Shell, Fish
Data: SQL, GraphQL, Protocol Buffers
Config: JSON, YAML, TOML, XML, Markdown
See configuration for full list and customization options.
Custom Ignore Patterns
Create a .contextignore file in your project root to specify additional patterns to ignore:
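Patterns follow the familiar gitignore syntax; a typical file might look like:

```text
# .contextignore (example)
dist/
coverage/
*.min.js
*.generated.ts
fixtures/**/*.snap
```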
Best Practices
- Index Once, Update Incrementally: Use index_codebase for initial indexing, then reindex_changes for updates
- Use Filters: Narrow search scope with fileTypes and pathPattern for better results
- Meaningful Queries: Use natural language that describes what you're looking for (e.g., "database connection pooling" instead of "db")
- Check Status First: Use get_index_status to verify a codebase is indexed before searching
- Local Embedding: Use Ollama (default) to keep everything local and private
Git Metadata Enrichment
Enrich code search with git history information. When enabled, each code chunk is annotated with authorship, modification dates, and task IDs from commit messages.
Enable git metadata:
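The feature is off by default and is switched on via an environment variable; the name below is a placeholder, so check the Code Vectorization Configuration table for your build:

```bash
# Placeholder variable name - enables git blame enrichment (default: false)
ENABLE_GIT_METADATA=true
```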
What's captured (per chunk):
| Signal | Description | Use Case |
|---|---|---|
| Dominant author | Author with most lines in chunk | "Find code written by John" |
| All authors | All authors who touched the chunk | Team attribution |
| Last modified | Unix timestamp of latest change | "Code changed after 2024-01-01" |
| First created | Unix timestamp of oldest change | Code origin tracking |
| Age in days | Days since last modification | "Old code (>365 days)" |
| Commit count | Number of unique commits | Churn indicator (high = frequently changed) |
| Task IDs | Extracted from commit messages | "Find code for TD-1234" |
| Last commit SHA | Most recent commit SHA | Audit trail |
Search with git filters:
Task ID extraction:
Task IDs are automatically extracted from commit summary lines:
| Pattern | Example | Extracted |
|---|---|---|
| JIRA/Linear | JIRA-1234 | JIRA-1234 |
| GitHub | #567 | #567 |
| Azure DevOps | AB#123 | AB#123 |
| GitLab MR | !42 | !42 |
Algorithm details:
One git blame call per file (cached by content hash)
Aggregated signals only – no per-line storage overhead
Commit messages are NOT stored (only extracted task IDs)
Cache invalidates automatically when file content changes
L1 (memory) + L2 (disk) caching for performance
By Author
| Question | Filters |
|---|---|
| What code did John write? | |
| Who is the expert on the auth module? | |
| Who can help me understand this code? | Search → find the author with most contributions |
| Whose code needs review from last week? | |
| Whose code changes most frequently? | |
By Code Age
| Question | Filters |
|---|---|
| What code hasn't been touched in a while? | |
| What changed in the last week? | |
| What legacy code needs documentation? | |
| What was done in this sprint? | |
| What old code is still being used? | |
| Which components haven't been updated in a year? | |
By Change Frequency (Churn)
| Question | Filters |
|---|---|
| What code is frequently rewritten? (problematic) | |
| Where are there many hotfixes? | |
| Which modules are most unstable? | |
| What needs refactoring? | |
| Where do bugs appear most often? | |
By Task/Ticket ID
| Question | Filters |
|---|---|
| What code relates to JIRA-1234? | |
| What was done for GitHub issue #567? | |
| What code is linked to this requirement? | |
| Show everything related to feature X | |
| Which files were affected by this task? | |
By Date Range
| Question | Filters |
|---|---|
| What changed after release 1.0? | |
| What code existed before the refactoring? | |
| What changed between releases? | |
| What was done in Q1 2024? | |
Combined Queries
| Question | Filters |
|---|---|
| Complex code that hasn't changed and needs docs | |
| John's recent code in the payment module | |
| Old high-churn code (risk!) | |
| Code for a task that was frequently reworked | |
| What a specific author did for a task | |
| Legacy code in critical modules | |
| Recent changes in authentication | |
| Problematic areas in the last month | |
Analytical Questions
| Question | Approach |
|---|---|
| Where has technical debt accumulated? | |
| What code needs test coverage? | |
| Who owns which module? | Group by dominant author |
| What code lacks documentation? | |
| What needs code review? | |
Performance
Typical performance with GPU-accelerated embeddings (Ollama + CUDA/Metal):
| Codebase Size | Files | Indexing Time | Search Latency |
|---|---|---|---|
| Small (10k LOC) | ~30 | ~5s | <100ms |
| Medium (50k LOC) | ~150 | ~15s | <100ms |
| Large (100k LOC) | ~300 | ~30s | <200ms |
| Very Large (500k LOC) | ~1,500 | ~2min | <300ms |
| Enterprise (3.5M LOC) | ~10k | ~10min | <500ms |
Note: Benchmarked with Ollama nomic-embed-text on RTX 4090 / Apple M-series. CPU-only embedding is 5-10x slower.
Examples
See examples/ directory for detailed guides:
Basic Usage - Create collections, add documents, search
Hybrid Search - Combine semantic and keyword search
Knowledge Base - Structured documentation with metadata
Advanced Filtering - Complex boolean filters
Rate Limiting - Batch processing with cloud providers
Code Search - Index codebases and semantic code search
Advanced Configuration
Environment Variables
Core Configuration
| Variable | Description | Default |
|---|---|---|
| | "stdio" or "http" | stdio |
| | Port for HTTP transport | 3000 |
| | Request timeout for HTTP transport (ms) | 300000 |
| | "ollama", "openai", "cohere", "voyage" | ollama |
| | Qdrant server URL | http://localhost:6333 |
| | API key for Qdrant authentication | - |
| PROMPTS_CONFIG_FILE | Path to prompts configuration JSON | prompts.json |
Embedding Configuration
| Variable | Description | Default |
|---|---|---|
| | Model name | Provider-specific |
| | Custom API URL | Provider-specific |
| | Vector dimensions (auto-detected from model) | Auto |
| EMBEDDING_BATCH_SIZE | Texts per embedding request (Ollama native batch) | 64 |
| EMBEDDING_CONCURRENCY | Parallel embedding requests (for multiple GPUs) | 1 |
| | Rate limit | Provider-specific |
| | Retry count | 3 |
| | Initial retry delay (ms) | 1000 |
| | OpenAI API key | - |
| | Cohere API key | - |
| | Voyage AI API key | - |
Code Vectorization Configuration
| Variable | Description | Default |
|---|---|---|
| | Maximum chunk size in characters | 2500 |
| | Overlap between chunks in characters | 300 |
| | Enable AST-aware chunking (tree-sitter) | true |
| CODE_BATCH_SIZE | Number of chunks to embed in one batch | 100 |
| | Additional file extensions (comma-separated) | - |
| | Additional ignore patterns (comma-separated) | - |
| | Default search result limit | 5 |
| | Enrich chunks with git blame (author, dates, tasks) | false |
Qdrant Batch Pipeline Configuration
| Variable | Description | Default |
|---|---|---|
| | Auto-flush buffer interval (0 to disable timer) | 500 |
| | Ordering mode: "weak", "medium", or "strong" | weak |
| | Paths per delete batch (with a payload index, larger is more efficient) | 500 |
| | Parallel delete requests (Qdrant-bound, not embedding-bound) | 8 |
Note: CODE_BATCH_SIZE controls both embedding batch size and Qdrant upsert buffer size for simplified configuration.
Delete Optimization (v4 schema): Collections created with schema v4+ have a relativePath payload index for fast filter-based deletes. Existing collections are auto-migrated on first reindex_changes call.
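For reference, creating such an index by hand with the @qdrant/js-client-rest client looks roughly like this (the collection name is illustrative; the server performs the equivalent migration automatically):

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: "http://localhost:6333" });

// A keyword payload index on relativePath lets delete-by-path filters hit
// an index instead of scanning every point in the collection.
await client.createPayloadIndex("my-codebase", {
  field_name: "relativePath",
  field_schema: "keyword",
  wait: true,
});
```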
Performance & Debug Configuration
| Variable | Description | Default |
|---|---|---|
| MAX_IO_CONCURRENCY | Max parallel file I/O operations during cache sync | 50 |
| DEBUG | Enable debug timing logs (DEBUG=1) | false |
Performance Tuning Notes:
- MAX_IO_CONCURRENCY: Controls parallel file reads during reindex_changes. For a MacBook with an NVMe SSD, 50-100 is optimal. Too high (500+) can saturate the kernel I/O scheduler.
- DEBUG: When enabled, logs detailed timing for cache initialization, shard processing, and pipeline stages.
Data Directories
The server stores data in ~/.qdrant-mcp/:
| Directory | Purpose |
|---|---|
| snapshots/ | Sharded file hash snapshots for incremental indexing |
| logs/ | Debug logs when DEBUG=1 |
Snapshot Structure (v3):
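The on-disk layout is roughly the following (file names are illustrative; see the sharding notes above):

```text
~/.qdrant-mcp/snapshots/<codebase>/
├── meta.json        # meta root hash, shard count, format version
├── shard-00.json    # file path -> content hash, plus per-shard checksum
├── shard-01.json
└── ...
```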
Debug Logs:
When DEBUG=1, pipeline operations are logged to ~/.qdrant-mcp/logs/pipeline-<timestamp>.log:
Batch formation and processing times
Queue depth and backpressure events
Embedding and Qdrant call durations
Fallback triggers and error details
Provider Comparison
| Provider | Models | Dimensions | Rate Limit | Notes |
|---|---|---|---|---|
| Ollama | nomic-embed-text, jina-embeddings-v2-base-code, mxbai-embed-large | 768, 768, 1024 | None | Local, no API key |
| OpenAI | text-embedding-3-small, text-embedding-3-large | 1536, 3072 | 3500/min | Cloud API |
| Cohere | embed-multilingual-v3.0 | 1024 | 100/min | Multilingual support |
| Voyage | voyage-2, voyage-code-2 | 1024, 1536 | 300/min | Code-specialized |
Recommended: Jina Code Embeddings
For code search, we recommend jina-embeddings-v2-base-code over the default nomic-embed-text:
Why Jina Code Embeddings?
| Aspect | Benefit |
|---|---|
| Code-optimized | Trained specifically on source code, understands syntax and semantics |
| Multilingual | 30+ programming languages with consistent quality |
| Enterprise-proven | Battle-tested on 3.5M+ LOC codebases with excellent search relevance |
| Same dimensions | 768 dimensions – a drop-in replacement for nomic-embed-text |
Note: Ollama models require pulling before use:
- Podman: `podman exec ollama ollama pull <model-name>`
- Docker: `docker exec ollama ollama pull <model-name>`
Troubleshooting
| Issue | Solution |
|---|---|
| Qdrant not running | Start Qdrant (see Prerequisites: Podman/Docker with Compose) |
| Collection missing | Create the collection before adding documents |
| Ollama not running | Verify with |
| Model missing | Pull the model first (see the note under Provider Comparison) |
| Rate limit errors | Adjust |
| API key errors | Verify the correct API key is set in the environment configuration |
| Qdrant unauthorized | Set the Qdrant API key in your environment |
| Filter errors | Ensure the Qdrant filter format is correct and field names match the metadata |
| Codebase not indexed | Run index_codebase first |
| Slow indexing | Use Ollama (local) for faster indexing, or increase the batch/concurrency settings (see Performance Tuning) |
| Files not found | Check |
| Search returns no results | Try broader queries; check that the codebase is indexed with get_index_status |
| Out of memory during index | Reduce CODE_BATCH_SIZE |
Performance Tuning
Recommended Configurations
Optimal parameters depend on your hardware and deployment setup:
Remote Server (Qdrant + Ollama on separate host)
Best for: Dedicated GPU server, shared team infrastructure
MacBook M1 (8-core, 8GB+ RAM)
Best for: Light development, small-to-medium codebases (<50k files)
MacBook M3 Pro (12-core, 18GB+ RAM)
Best for: Professional development, medium codebases (<100k files)
MacBook M4 Max (16-core, 48GB+ RAM)
Best for: Large codebases, maximum local performance
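As a starting point, a GPU-class profile might look like the sketch below (values are illustrative and drawn from the Typical Optimal Values table; confirm with the diagnostic benchmark in the next section):

```bash
EMBEDDING_BATCH_SIZE=512   # texts per embedding request (GPU 8GB+ range: 512-1024)
EMBEDDING_CONCURRENCY=1    # parallel embedding requests (raise only for multiple GPUs)
CODE_BATCH_SIZE=512        # chunks per Qdrant upsert (GPU 8GB+ range: 512-768)
MAX_IO_CONCURRENCY=50      # parallel file reads (50-100 for NVMe SSDs)
```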
Quick Diagnostic
Run the diagnostic benchmark to automatically find optimal parameters for your setup:
The diagnostic will test and recommend optimal values for:
- EMBEDDING_BATCH_SIZE – texts per embedding API request
- CODE_BATCH_SIZE – chunks per Qdrant upsert
- EMBEDDING_CONCURRENCY – parallel embedding requests
Understanding Results
- Green bar (████): Performance close to the best
- Yellow bar: Slight degradation
- Degradation detected: Batch size too large for GPU memory
Benchmark Files
| File | Purpose |
|---|---|
| | Quick auto-tuning (~30s) |
| | Detailed EMBEDDING_BATCH_SIZE analysis |
| | Detailed CODE_BATCH_SIZE analysis |
| | Concurrency + batch size matrix |
| | Sequential vs pipelined comparison |
| | Qdrant wait/ordering options |
| | Buffer size + auto-flush optimization |
Batch Pipeline Optimization
The server uses an accumulator pattern for efficient Qdrant upserts:
How it works:
- Points are accumulated in a buffer until the CODE_BATCH_SIZE threshold is reached
- Intermediate batches use wait=false (fire-and-forget) for speed
- The final flush uses wait=true for consistency
- An auto-flush timer prevents data from being stuck in the buffer
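A condensed TypeScript sketch of the pattern (the real PointsAccumulator adds the auto-flush timer and error handling):

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

type Point = { id: string | number; vector: number[]; payload?: Record<string, unknown> };

class PointsAccumulator {
  private buffer: Point[] = [];

  constructor(
    private client: QdrantClient,
    private collection: string,
    private batchSize = 100 // CODE_BATCH_SIZE
  ) {}

  async add(point: Point): Promise<void> {
    this.buffer.push(point);
    // Intermediate batches: fire-and-forget for throughput.
    if (this.buffer.length >= this.batchSize) await this.flush(false);
  }

  // Callers finish with flush(true) so the last batch is durable.
  async flush(wait: boolean): Promise<void> {
    if (this.buffer.length === 0) return;
    const points = this.buffer.splice(0, this.buffer.length);
    await this.client.upsert(this.collection, { points, wait });
  }
}
```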
Run the accumulator benchmark to find optimal settings:
Typical Optimal Values
| Hardware | EMBEDDING_BATCH_SIZE | CODE_BATCH_SIZE |
|---|---|---|
| CPU only | 32-64 | 128-256 |
| GPU 4GB | 128-256 | 256-384 |
| GPU 8GB+ | 512-1024 | 512-768 |
| GPU 12GB+ | 1024-2048 | 768+ |
Development
Testing
Unit Tests (Mocked)
864 tests with 97%+ coverage:
Unit Tests: QdrantManager (56), Ollama (41), OpenAI (25), Cohere (29), Voyage (31), Factory (43), Prompts (50), Transport (15), MCP Server (19)
Integration Tests (Mocked): Code indexer (56), scanner (15), chunker (24), synchronizer (42), snapshot (26), merkle tree (28)
CI/CD: GitHub Actions runs build, type-check, and tests on Node.js 22 LTS for every push/PR.
Real Integration Tests
233 tests across 18 modular test suites, run against real Qdrant and Ollama:
Test Suites:
Embeddings (single, batch, parallel)
Qdrant Operations (CRUD, filters, batch delete)
PointsAccumulator (batch pipeline)
File Indexing Lifecycle
Hash & Snapshot Consistency
Ignore Patterns
Chunk Boundaries & Line Numbers
Multi-Language Support
Ruby AST Chunking (Rails patterns)
Search Accuracy
Edge Cases
Batch Pipeline in CodeIndexer
Concurrent Operations
Parallel File Sync & Sharded Snapshots
Pipeline & WorkerPool
Schema Migration & Delete Optimization
ForceReindex & Parallel Indexing
Git Metadata Integration
Requirements: Running Qdrant (default: http://localhost:6333) and Ollama (default: http://localhost:11434).
Contributing
Contributions welcome! See CONTRIBUTING.md for:
- Development workflow
- Conventional commit format (feat:, fix:, BREAKING CHANGE:)
- Testing requirements (run npm test, npm run type-check, npm run build)
Automated releases: Semantic versioning via conventional commits – feat: → minor, fix: → patch, BREAKING CHANGE: → major.
Acknowledgments
The code vectorization feature is inspired by and builds upon concepts from the excellent claude-context project (MIT License, Copyright 2025 Zilliz).
License
MIT - see LICENSE file.