Qdrant MCP Server

README.md•5.41 KiB

# Rate Limiting Learn how the server handles embedding provider API rate limits automatically with intelligent throttling and retry mechanisms. **Time:** 10-15 minutes | **Difficulty:** Beginner to Intermediate ## Why It Matters | Provider | Rate Limits | Notes | | -------------------- | --------------- | ------------------------------- | | **Ollama** (default) | None | Local processing, no limits! | | **OpenAI** | 500-10,000+/min | Based on tier (Free/Tier 1/2/3) | | **Cohere** | ~100/min | Varies by plan | | **Voyage AI** | ~300/min | Varies by plan | Without rate limiting, batch operations with cloud providers can fail. **This is why Ollama is the default** - no rate limits to worry about! ## How It Works The server automatically: 1. **Throttles** - Queues API calls within limits 2. **Retries** - Exponential backoff (1s, 2s, 4s, 8s...) 3. **Respects Headers** - Follows provider retry guidance 4. **Provides Feedback** - Console shows retry progress ## Configuration ### Provider Defaults | Provider | Default Limit | Retry Attempts | Retry Delay | | --------- | ----------------- | -------------- | ----------- | | Ollama | 1000/min | 3 | 500ms | | OpenAI | 3500/min (Tier 1) | 3 | 1000ms | | Cohere | 100/min | 3 | 1000ms | | Voyage AI | 300/min | 3 | 1000ms | ### Custom Settings ```bash # Adjust for your provider tier EMBEDDING_MAX_REQUESTS_PER_MINUTE=500 # Free tier EMBEDDING_RETRY_ATTEMPTS=5 # More resilient EMBEDDING_RETRY_DELAY=2000 # Longer initial delay ``` ### Provider Examples **Ollama (Default):** ```bash EMBEDDING_PROVIDER=ollama EMBEDDING_BASE_URL=http://localhost:11434 # No rate limit config needed! ``` **OpenAI Free Tier:** ```bash EMBEDDING_PROVIDER=openai EMBEDDING_MAX_REQUESTS_PER_MINUTE=500 EMBEDDING_RETRY_ATTEMPTS=5 ``` **OpenAI Paid Tier:** ```bash EMBEDDING_PROVIDER=openai EMBEDDING_MAX_REQUESTS_PER_MINUTE=3500 # Tier 1 ``` ## Example: Batch Processing ``` # Create collection Create a collection named "rate-limit-test" # Add batch of documents (tests rate limiting) Add these documents to "rate-limit-test": - id: 1, text: "Introduction to machine learning algorithms", metadata: {"topic": "ml"} - id: 2, text: "Deep learning neural networks explained", metadata: {"topic": "dl"} - id: 3, text: "Natural language processing fundamentals", metadata: {"topic": "nlp"} - id: 4, text: "Computer vision and image recognition", metadata: {"topic": "cv"} - id: 5, text: "Reinforcement learning strategies", metadata: {"topic": "rl"} - id: 6, text: "Data preprocessing and feature engineering", metadata: {"topic": "data"} - id: 7, text: "Model evaluation and validation techniques", metadata: {"topic": "eval"} - id: 8, text: "Hyperparameter optimization methods", metadata: {"topic": "tuning"} - id: 9, text: "Transfer learning and fine-tuning", metadata: {"topic": "transfer"} - id: 10, text: "Ensemble methods and boosting", metadata: {"topic": "ensemble"} # Search Search "rate-limit-test" for "neural networks and deep learning" # Watch console for rate limit messages: # "Rate limit reached. Retrying in 1.0s (attempt 1/3)..." # "Rate limit reached. Retrying in 2.0s (attempt 2/3)..." # Cleanup Delete collection "rate-limit-test" ``` ## Retry Behavior ### Exponential Backoff With `EMBEDDING_RETRY_DELAY=1000`: | Attempt | Delay | Total Wait | | ------- | ----- | ---------- | | 1st | 1s | 1s | | 2nd | 2s | 3s | | 3rd | 4s | 7s | | 4th | 8s | 15s | ### Retry-After Header If provider sends `Retry-After` header (OpenAI): - Server uses exact delay - Ignores exponential backoff - Ensures optimal recovery ## Best Practices 1. **Match Your Tier** - Set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to your provider's limit 2. **Check Dashboards** - Verify limits at provider's dashboard 3. **Start Conservative** - Lower limits, increase if needed 4. **Monitor Console** - Watch for retry messages 5. **Use Ollama** - For unlimited local processing ### Batch Operation Tips - **OpenAI**: Up to 2048 texts per request - **Cohere**: Batch support available - **Voyage AI**: Batch support available - **Ollama**: Sequential processing (one at a time) Server automatically uses batch APIs when available. ## Error Messages ``` # Success Successfully added 10 document(s) to collection "rate-limit-test". # Retry (Normal) Rate limit reached. Retrying in 2.0s (attempt 1/3)... # Action: None needed, automatic retry # Max Retries Exceeded (Rare) Error: API rate limit exceeded after 3 retry attempts. # Action: Wait, reduce EMBEDDING_MAX_REQUESTS_PER_MINUTE, check dashboard ``` ## Troubleshooting | Issue | Solution | | ---------------------- | -------------------------------------------------- | | Persistent rate limits | Reduce `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20% | | Slow performance | Expected with rate limiting - better than failures | | Need faster processing | Upgrade provider tier or use Ollama | ## Next Steps - Explore [Knowledge Base](../knowledge-base/) for real-world usage patterns - Learn [Advanced Filtering](../filters/) for complex queries - Review [main README](../../README.md) for all configuration options

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mhalder/qdrant-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•5.41 KiB