# Qdrant MCP Server

A Model Context Protocol (MCP) server providing semantic search capabilities using the Qdrant vector database with multiple embedding providers.

## Features

- **Zero Setup**: Works out of the box with Ollama - no API keys required
- **Privacy-First**: Local embeddings and vector storage - data never leaves your machine
- **Code Vectorization**: Intelligent codebase indexing with AST-aware chunking and semantic code search
- **Multiple Providers**: Ollama (default), OpenAI, Cohere, and Voyage AI
- **Hybrid Search**: Combine semantic and keyword search for better results
- **Semantic Search**: Natural language search with metadata filtering
- **Incremental Indexing**: Efficient updates - only re-index changed files
- **Configurable Prompts**: Create custom prompts for guided workflows without code changes
- **Rate Limiting**: Intelligent throttling with exponential backoff
- **Full CRUD**: Create, search, and manage collections and documents
- **Flexible Deployment**: Run locally (stdio) or as a remote HTTP server
## Quick Start

### Prerequisites

- Node.js 20+
- Docker and Docker Compose
### Installation
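The repository's exact install commands aren't reproduced here; as a minimal sketch, assuming a from-source install with a bundled Docker Compose file (`npm run build` is named under Contributing; the compose setup is an assumption from the Prerequisites):

```bash
# Clone the repository, then from its root:
npm install            # install dependencies
npm run build          # compile the server (script named under Contributing)
docker compose up -d   # start the Qdrant (and Ollama) containers - compose file assumed
```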
### Configuration

#### Local Setup (stdio transport)

Add to `~/.claude/claude_code_config.json`:
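A minimal sketch of a stdio entry; the server name, install path, and entry-point file are illustrative, not confirmed values from this project:

```json
{
  "mcpServers": {
    "qdrant": {
      "command": "node",
      "args": ["/path/to/qdrant-mcp-server/dist/index.js"]
    }
  }
}
```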
#### Remote Setup (HTTP transport)

⚠️ **Security Warning**: When deploying the HTTP transport in production:

- Always run behind a reverse proxy (nginx, Caddy) with HTTPS (see the sketch after this list)
- Implement authentication/authorization at the proxy level
- Use firewalls to restrict access to trusted networks
- Never expose the server directly to the public internet without protection
- Consider implementing rate limiting at the proxy level
- Monitor server logs for suspicious activity
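As an illustrative sketch of the reverse-proxy guidance above (hostname, certificate paths, allowed network, and upstream port are all assumptions):

```nginx
server {
    listen 443 ssl;
    server_name mcp.example.com;

    ssl_certificate     /etc/ssl/certs/mcp.example.com.pem;
    ssl_certificate_key /etc/ssl/private/mcp.example.com.key;

    location / {
        allow 10.0.0.0/8;   # restrict access to a trusted network
        deny  all;

        proxy_pass http://127.0.0.1:3000;  # assumed upstream; HTTP transport defaults to port 3000
    }
}
```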
Start the server:
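A sketch assuming an npm start script; the environment variable names shown are placeholders, not confirmed names from this project:

```bash
# Placeholder variable names for illustration only
TRANSPORT=http PORT=3000 npm start
```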
Configure the client:
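A minimal sketch of an HTTP client entry (server name, URL, and endpoint path are illustrative):

```json
{
  "mcpServers": {
    "qdrant": {
      "type": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}
```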
Using a different provider:
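For a cloud provider, set the provider selection and API key in the server's environment. The variable names here are placeholders, not confirmed names from this project:

```bash
# Placeholder names shown for illustration only
EMBEDDING_PROVIDER=openai OPENAI_API_KEY=sk-... npm start
```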
Restart after making changes.
See the Advanced Configuration section below for all options.
## Tools

### Collection Management

| Tool | Description |
|------|-------------|
| | Create collection with specified distance metric (Cosine/Euclid/Dot) |
| | List all collections |
| | Get collection details and statistics |
| | Delete collection and all documents |
### Document Operations

| Tool | Description |
|------|-------------|
| | Add documents with automatic embedding (supports string/number IDs, metadata) |
| | Natural language search with optional metadata filtering |
| | Hybrid search combining semantic and keyword (BM25) search with RRF |
| | Delete specific documents by ID |
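The hybrid search row above merges semantic and BM25 rankings with Reciprocal Rank Fusion (RRF). As a generic sketch of the technique - not this server's implementation - RRF scores each document by summing `1/(k + rank)` across the result lists:

```typescript
// Generic RRF sketch: merge ranked result lists by summing 1/(k + rank).
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Fuse a semantic ranking with a keyword (BM25) ranking
const fused = rrfFuse([["docA", "docB", "docC"], ["docB", "docD"]]);
console.log([...fused.entries()].sort((a, b) => b[1] - a[1]));
```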
### Code Vectorization

| Tool | Description |
|------|-------------|
| `index_codebase` | Index a codebase for semantic code search with AST-aware chunking |
| | Search indexed codebase using natural language queries |
| `reindex_changes` | Incrementally re-index only changed files (detects added/modified/deleted) |
| `get_index_status` | Get indexing status and statistics for a codebase |
| | Delete all indexed data for a codebase |
## Resources

- `qdrant://collections` - List all collections
- `qdrant://collection/{name}` - Collection details
## Configurable Prompts

Create custom prompts tailored to your specific use cases without modifying code. Prompts provide guided workflows for common tasks.

Note: By default, the server looks for `prompts.json` in the project root directory. If the file exists, prompts are loaded automatically. You can specify a custom path using the `PROMPTS_CONFIG_FILE` environment variable.
### Setup

1. Create a prompts configuration file (e.g., `prompts.json` in the project root):
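   A hypothetical configuration illustrating the documented pieces - a name, a template with `{{variable}}` placeholders, and required/optional arguments. The exact schema may differ; copy the real shape from `prompts.example.json`:

   ```json
   {
     "prompts": [
       {
         "name": "find_similar_docs",
         "description": "Semantic search with result explanation",
         "template": "Search {{collection}} for documents similar to: {{query}}. Return {{limit}} results and explain each match.",
         "arguments": [
           { "name": "collection", "required": true },
           { "name": "query", "required": true },
           { "name": "limit", "required": false, "default": "5" }
         ]
       }
     ]
   }
   ```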
   See `prompts.example.json` for example configurations you can copy and customize.

2. Configure the server (optional - only needed for a custom path):
   If you place `prompts.json` in the project root, no additional configuration is needed. To use a custom path:
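   For example (the path is illustrative):

   ```bash
   PROMPTS_CONFIG_FILE=/path/to/my-prompts.json
   ```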
3. Use prompts in your AI assistant (Claude Code, VSCode, or another MCP client):
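   In Claude Code, MCP prompts generally surface as slash commands derived from the server and prompt names (a hypothetical example: `/mcp__qdrant__find_similar_docs`); in VSCode-based MCP clients they appear in the chat prompt picker.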
### Example Prompts

See `prompts.example.json` for ready-to-use prompts including:

- `find_similar_docs` - Semantic search with result explanation
- `setup_rag_collection` - Create RAG-optimized collections
- `analyze_collection` - Collection insights and recommendations
- `bulk_add_documents` - Guided bulk document insertion
- `search_with_filter` - Metadata filtering assistance
- `compare_search_methods` - Semantic vs hybrid search comparison
- `collection_maintenance` - Maintenance and cleanup workflows
- `migrate_to_hybrid` - Collection migration guide
### Template Syntax

Templates use `{{variable}}` placeholders (see the example after this list):

- Required arguments must be provided
- Optional arguments use defaults if not specified
- Unknown variables are left as-is in the output
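For instance, with `topic` required and `limit` optional (defaulting to, say, 5), a hypothetical template renders as:

```
Template:  Find the top {{limit}} documents about {{topic}}.
Rendered:  Find the top 5 documents about vector search.
```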
## Code Vectorization
Intelligently index and search your codebase using semantic code search. Perfect for AI-assisted development, code exploration, and understanding large codebases.
### Features

- **AST-Aware Chunking**: Intelligent code splitting at function/class boundaries using tree-sitter
- **Multi-Language Support**: 35+ file types including TypeScript, Python, Java, Go, Rust, C++, and more
- **Incremental Updates**: Only re-index changed files for fast updates
- **Smart Ignore Patterns**: Respects .gitignore, .dockerignore, and custom .contextignore files
- **Semantic Search**: Natural language queries to find relevant code
- **Metadata Filtering**: Filter by file type, path patterns, or language
- **Local-First**: All processing happens locally - your code never leaves your machine
### Quick Start
1. Index your codebase:
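   A hypothetical tool call - the `tool`/`arguments` wrapper and the argument name are illustrative:

   ```json
   { "tool": "index_codebase", "arguments": { "path": "/path/to/your/project" } }
   ```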
2. Search your code:
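   The search tool's exact name isn't given in this README; a hypothetical call:

   ```json
   { "tool": "search_codebase", "arguments": { "query": "where is user authentication handled?" } }
   ```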
3. Update after changes:
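   `reindex_changes` is the documented tool name; the argument is illustrative:

   ```json
   { "tool": "reindex_changes", "arguments": { "path": "/path/to/your/project" } }
   ```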
### Usage Examples
#### Index a TypeScript Project
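A hypothetical call (argument names are illustrative):

```json
{ "tool": "index_codebase", "arguments": { "path": "/path/to/ts-project" } }
```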
#### Search for Authentication Code
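A hypothetical call using the assumed `search_codebase` name:

```json
{ "tool": "search_codebase", "arguments": { "query": "user authentication and session handling" } }
```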
#### Search with Filters
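`fileTypes` and `pathPattern` are the filter names documented under Best Practices; the rest of the call is illustrative:

```json
{ "tool": "search_codebase", "arguments": { "query": "database connection pooling", "fileTypes": [".ts"], "pathPattern": "src/**" } }
```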
#### Incremental Re-indexing
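A hypothetical call to the documented `reindex_changes` tool:

```json
{ "tool": "reindex_changes", "arguments": { "path": "/path/to/ts-project" } }
```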
#### Check Indexing Status
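A hypothetical call to the documented `get_index_status` tool:

```json
{ "tool": "get_index_status", "arguments": { "path": "/path/to/ts-project" } }
```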
### Supported Languages

Programming languages (35+ file types):

- **Web**: TypeScript, JavaScript, Vue, Svelte
- **Backend**: Python, Java, Go, Rust, Ruby, PHP
- **Systems**: C, C++, C#
- **Mobile**: Swift, Kotlin, Dart
- **Functional**: Scala, Clojure, Haskell, OCaml
- **Scripting**: Bash, Shell, Fish
- **Data**: SQL, GraphQL, Protocol Buffers
- **Config**: JSON, YAML, TOML, XML, Markdown
See the configuration section below for the full list and customization options.
### Custom Ignore Patterns

Create a `.contextignore` file in your project root to specify additional patterns to ignore:
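An illustrative example - the entries are assumptions, using .gitignore-style patterns:

```
# .contextignore (illustrative)
dist/
coverage/
vendor/
*.min.js
docs/generated/
```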
### Best Practices

- **Index Once, Update Incrementally**: Use `index_codebase` for initial indexing, then `reindex_changes` for updates
- **Use Filters**: Narrow search scope with `fileTypes` and `pathPattern` for better results
- **Meaningful Queries**: Use natural language that describes what you're looking for (e.g., "database connection pooling" instead of "db")
- **Check Status First**: Use `get_index_status` to verify a codebase is indexed before searching
- **Local Embedding**: Use Ollama (default) to keep everything local and private
### Performance

Typical performance on a modern laptop (Apple M1/M2 or similar):

| Codebase Size | Files | Indexing Time | Search Latency |
|---------------|-------|---------------|----------------|
| Small (10k LOC) | 50 | ~10s | <100ms |
| Medium (100k LOC) | 500 | ~2min | <200ms |
| Large (500k LOC) | 2,500 | ~10min | <500ms |

Note: Indexing time varies based on the embedding provider. Ollama (local) is fastest for initial indexing.
## Examples

See the examples/ directory for detailed guides:

- **Basic Usage** - Create collections, add documents, search
- **Knowledge Base** - Structured documentation with metadata
- **Advanced Filtering** - Complex boolean filters
- **Rate Limiting** - Batch processing with cloud providers
- **Code Search** - Index codebases and semantic code search
## Advanced Configuration

### Environment Variables

#### Core Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| | `"stdio"` or `"http"` | stdio |
| | Port for HTTP transport | 3000 |
| | `"ollama"`, `"openai"`, `"cohere"`, or `"voyage"` | ollama |
| | Qdrant server URL | |
| `PROMPTS_CONFIG_FILE` | Path to prompts configuration JSON | prompts.json |
#### Embedding Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| | Model name | Provider-specific |
| | Custom API URL | Provider-specific |
| | Rate limit | Provider-specific |
| | Retry count | 3 |
| | Initial retry delay (ms) | 1000 |
| | OpenAI API key | - |
| | Cohere API key | - |
| | Voyage AI API key | - |
#### Code Vectorization Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| | Maximum chunk size in characters | 2500 |
| | Overlap between chunks in characters | 300 |
| | Enable AST-aware chunking (tree-sitter) | true |
| | Number of chunks to embed in one batch | 100 |
| | Additional file extensions (comma-separated) | - |
| | Additional ignore patterns (comma-separated) | - |
| | Default search result limit | 5 |
### Provider Comparison

| Provider | Models | Dimensions | Rate Limit | Notes |
|----------|--------|------------|------------|-------|
| Ollama | | 768, 1024, 384 | None | Local, no API key |
| OpenAI | | 1536, 3072 | 3500/min | Cloud API |
| Cohere | | 1024 | 100/min | Multilingual support |
| Voyage | | 1024, 1536 | 300/min | Code-specialized |

Note: Ollama models require `docker exec ollama ollama pull <model-name>` before use.
## Troubleshooting

| Issue | Solution |
|-------|----------|
| Qdrant not running | Start Qdrant (e.g., `docker compose up -d`) |
| Collection missing | Create the collection before adding documents |
| Ollama not running | Verify the Ollama container is running and start it if needed |
| Model missing | Pull it: `docker exec ollama ollama pull <model-name>` |
| Rate limit errors | Adjust the rate limit setting to match your provider tier |
| API key errors | Verify the correct API key is set in the environment configuration |
| Filter errors | Ensure the Qdrant filter format is used and field names match the stored metadata |
| Codebase not indexed | Run `index_codebase` before searching |
| Slow indexing | Use Ollama (local) for faster indexing, or increase the embedding batch size |
| Files not found | Check the additional file extension and ignore pattern settings |
| Search returns no results | Try broader queries; check that the codebase is indexed with `get_index_status` |
| Out of memory during indexing | Reduce the embedding batch size or the maximum chunk size |
## Development

### Testing

422 tests (376 unit + 46 functional) with 98%+ coverage:

- **Unit Tests**: QdrantManager (21), Ollama (31), OpenAI (25), Cohere (29), Voyage (31), Factory (32), MCP Server (19)
- **Functional Tests**: Live API integration, end-to-end workflows (46)
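Run the checks locally (the commands are those listed under Contributing):

```bash
npm test            # unit + functional tests
npm run type-check  # TypeScript type checking
npm run build       # production build
```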
CI/CD: GitHub Actions runs build, type-check, and tests on Node.js 20 & 22 for every push/PR.
## Contributing

Contributions welcome! See CONTRIBUTING.md for:

- Development workflow
- Conventional commit format (`feat:`, `fix:`, `BREAKING CHANGE:`)
- Testing requirements (run `npm test`, `npm run type-check`, `npm run build`)

Automated releases: semantic versioning via conventional commits - `feat:` → minor, `fix:` → patch, `BREAKING CHANGE:` → major.
## Acknowledgments

The code vectorization feature is inspired by and builds upon concepts from the excellent claude-context project (MIT License, Copyright 2025 Zilliz).

## License

MIT - see LICENSE file.