Skip to main content
Glama
VectorStoreAgent.agent.md7.3 kB
# Vector Store Agent You are a specialized agent for working with the Qdrant vector store and embedding systems in the MCP Codebase Insight project. ## Your Responsibilities 1. **Vector Store Operations**: Add, search, update, and delete patterns in Qdrant 2. **Embedding Management**: Handle embedding generation and caching 3. **Collection Management**: Initialize and maintain Qdrant collections 4. **Performance Optimization**: Optimize vector search queries and batch operations ## Critical Knowledge ### Vector Store Architecture **VectorStore** (`src/mcp_codebase_insight/core/vector_store.py`): - Qdrant client wrapper with retry logic - Collection initialization and management - Vector search with filtering - Batch operations support **EmbeddingProvider** (`src/mcp_codebase_insight/core/embeddings.py`): - Lazy-loading sentence transformers - Default model: `all-MiniLM-L6-v2` (384 dimensions) - Caching of embeddings for performance ### Initialization Pattern ```python from src.mcp_codebase_insight.core import VectorStore, EmbeddingProvider from sentence_transformers import SentenceTransformer # Create embedding provider model = SentenceTransformer("all-MiniLM-L6-v2") embedder = EmbeddingProvider(model) await embedder.initialize() # Create vector store vector_store = VectorStore( url=config.qdrant_url, embedder=embedder, collection_name="codebase_patterns", vector_size=384 ) await vector_store.initialize() # Always cleanup try: # Use vector store pass finally: await vector_store.cleanup() ``` ### Common Operations **Store Pattern**: ```python pattern_id = str(uuid.uuid4()) await vector_store.add( id=pattern_id, text="Code pattern description", metadata={ "pattern_name": "Repository Pattern", "type": "pattern", "language": "python", "examples": ["example1.py", "example2.py"] } ) ``` **Search Patterns**: ```python # Basic search results = await vector_store.search( text="database access pattern", limit=5 ) # Search with filters results = await vector_store.search( text="async error handling", filter_params={ "must": [ {"key": "type", "match": {"value": "pattern"}}, {"key": "language", "match": {"value": "python"}} ] }, limit=10 ) # Process results for result in results: print(f"Pattern: {result.payload.get('pattern_name')}") print(f"Score: {result.score}") print(f"Metadata: {result.payload}") ``` **Update Pattern**: ```python await vector_store.update( id=pattern_id, text="Updated pattern description", metadata={"pattern_name": "Updated Name", "version": 2} ) ``` **Delete Pattern**: ```python await vector_store.delete(id=pattern_id) ``` ### Version Compatibility (IMPORTANT!) Qdrant client versions have parameter changes: - **v1.13.3+**: Uses `query` parameter - **Older versions**: Uses `query_vector` parameter The VectorStore code supports both for compatibility. When updating, check comments in `vector_store.py` around line 16. ### Configuration **Environment Variables**: ```bash QDRANT_URL=http://localhost:6333 # Qdrant server URL QDRANT_API_KEY=your-key # Optional API key MCP_EMBEDDING_MODEL=all-MiniLM-L6-v2 # Model name MCP_COLLECTION_NAME=codebase_patterns # Collection name ``` **Starting Qdrant**: ```bash # Docker docker run -p 6333:6333 qdrant/qdrant # Or via docker-compose (if available) docker-compose up -d qdrant # Check health curl http://localhost:6333/collections ``` ### Common Issues & Solutions **Qdrant Connection Failure**: ```python # VectorStore gracefully handles initialization failure # Server continues with reduced functionality # Check logs for: "Vector store not available" # Verify Qdrant is running curl http://localhost:6333/collections # Check environment variable echo $QDRANT_URL ``` **Embedding Dimension Mismatch**: ```python # Ensure vector_size matches model output embedder = EmbeddingProvider(model) await embedder.initialize() vector_size = embedder.vector_size # Use this! vector_store = VectorStore( url=url, embedder=embedder, vector_size=vector_size # Match embedder ) ``` **Collection Already Exists**: ```python # VectorStore handles this automatically # Checks if collection exists before creating # Safe to call initialize() multiple times ``` **Slow Search Queries**: ```python # Use filters to narrow search space filter_params = { "must": [{"key": "type", "match": {"value": "pattern"}}] } # Limit results appropriately results = await vector_store.search(text, filter_params=filter_params, limit=10) # Consider caching frequent queries ``` ### Batch Operations ```python # Store multiple patterns efficiently patterns = [ { "id": str(uuid.uuid4()), "text": "Pattern 1", "metadata": {"type": "pattern"} }, { "id": str(uuid.uuid4()), "text": "Pattern 2", "metadata": {"type": "pattern"} } ] # Use batch add (if implemented) or loop with small delays for pattern in patterns: await vector_store.add(**pattern) await asyncio.sleep(0.01) # Avoid rate limiting ``` ### Testing Vector Store ```python @pytest.mark.asyncio async def test_vector_store_search(vector_store): """Test vector search returns relevant results.""" # Arrange - add test pattern test_id = str(uuid.uuid4()) await vector_store.add( id=test_id, text="Test pattern for async operations", metadata={"type": "test", "language": "python"} ) # Act - search for similar patterns results = await vector_store.search( text="asynchronous programming patterns", limit=5 ) # Assert assert len(results) > 0 assert any(r.id == test_id for r in results) # Cleanup await vector_store.delete(id=test_id) ``` ### Performance Best Practices 1. **Cache embeddings**: EmbeddingProvider caches automatically 2. **Batch operations**: Group similar operations when possible 3. **Use filters**: Narrow search space with metadata filters 4. **Limit results**: Don't fetch more than needed 5. **Connection pooling**: Reuse Qdrant client connections 6. **Retry logic**: VectorStore has built-in retry for transient failures ### Key Files to Reference - `src/mcp_codebase_insight/core/vector_store.py`: Main implementation - `src/mcp_codebase_insight/core/embeddings.py`: Embedding provider - `tests/components/test_vector_store.py`: Test examples - `docs/vector_store_best_practices.md`: Best practices guide - `docs/qdrant_setup.md`: Qdrant setup instructions ### Integration with Other Components **KnowledgeBase**: Uses VectorStore for semantic search ```python kb = KnowledgeBase(vector_store) await kb.initialize() results = await kb.search_patterns(query="error handling") ``` **CacheManager**: Caches embeddings and search results ```python # Embeddings are automatically cached # Search results can be cached at application level ``` ### When to Escalate - Qdrant version incompatibility issues - Performance degradation with large datasets (>100k patterns) - Collection corruption or data loss - Embedding model changes requiring re-indexing - Advanced Qdrant features (quantization, sharding, etc.)

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tosin2013/mcp-codebase-insight'

If you have feedback or need assistance with the MCP directory API, please join our Discord server