MCP Server by hburgoyne

EMBEDDINGS.md
# Vector Embeddings Implementation

## Overview

The Picard MCP system now includes comprehensive vector embeddings functionality for semantic search of memories. This implementation uses OpenAI's text-embedding-3-small model and PostgreSQL's pgvector extension for efficient similarity search.

## Features Implemented

### ✅ Core Infrastructure

- **OpenAI API v1.x Integration**: Upgraded from the legacy 0.28.0 API to the modern >=1.0.0 API
- **Automatic Embedding Generation**: All memories automatically receive vector embeddings on creation and update
- **Asynchronous Processing**: Embeddings are generated asynchronously for better performance
- **Strict Error Handling**: Comprehensive error handling for OpenAI API failures

### ✅ Database Integration

- **Vector Storage**: Uses the pgvector extension with 1536-dimension vectors (text-embedding-3-small)
- **Cosine Similarity Search**: Efficient semantic search using pgvector's native operators
- **Permission-Aware Search**: Respects memory permissions during vector search
- **Configurable Similarity Threshold**: Adjustable minimum similarity for search results

### ✅ API Implementation

- **query_memory Tool**: New MCP tool for semantic search
- **Enhanced Memory Operations**: Automatic embedding generation during memory CRUD operations
- **Scope-Based Access**: Requires appropriate OAuth scopes for embedding operations

## Architecture

### Embedding Generation Flow

1. **Memory Creation/Update**:
   ```
   User submits memory → Text processed → OpenAI API call → Vector stored → Database commit
   ```
2. **Semantic Search**:
   ```
   User query → Generate query embedding → PostgreSQL vector search → Similarity ranking → Results
   ```

### Database Schema

The `memories` table includes:

```sql
CREATE TABLE memories (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    text TEXT NOT NULL,
    permission VARCHAR(20) DEFAULT 'private',
    embedding VECTOR(1536), -- pgvector column for embeddings
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    expiration_date TIMESTAMP
);

-- Index for efficient vector similarity search
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops);
```

## API Usage

### Query Memory Tool

**Endpoint**: `/api/tools`
**Method**: POST
**Tool**: `query_memory`

**Request Parameters**:

- `query` (required): Search query text
- `limit` (optional): Maximum number of results (default: 10)
- `similarity_threshold` (optional): Minimum similarity score (default: 0.5)
- `permission_filter` (optional): Filter by permission level

**Example Request**:

```json
{
  "tool": "query_memory",
  "data": {
    "query": "artificial intelligence thoughts",
    "limit": 5,
    "similarity_threshold": 0.7,
    "permission_filter": "private"
  }
}
```

**Example Response**:

```json
{
  "data": {
    "memories": [
      {
        "id": "550e8400-e29b-41d4-a716-446655440000",
        "text": "I think AI will revolutionize how we work and interact with technology...",
        "permission": "private",
        "similarity": 0.85,
        "created_at": "2024-01-15T10:30:00Z",
        "updated_at": "2024-01-15T10:30:00Z",
        "expiration_date": null
      },
      {
        "id": "440e8400-e29b-41d4-a716-446655440001",
        "text": "Machine learning algorithms are becoming more sophisticated...",
        "permission": "private",
        "similarity": 0.72,
        "created_at": "2024-01-14T15:20:00Z",
        "updated_at": "2024-01-14T15:20:00Z",
        "expiration_date": null
      }
    ]
  }
}
```

### Enhanced Memory Operations

All memory creation and update operations now automatically generate embeddings.

**Submit Memory** (automatically generates an embedding):

```json
{
  "tool": "submit_memory",
  "data": {
    "text": "This is my new memory about AI",
    "permission": "private"
  }
}
```

**Update Memory** (regenerates the embedding if the text changes):

```json
{
  "tool": "update_memory",
  "data": {
    "memory_id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Updated memory content about AI applications"
  }
}
```

## Configuration

### Environment Variables

Add to your `.env` file:

```bash
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Override embedding model (default: text-embedding-3-small)
EMBEDDING_MODEL=text-embedding-3-small
```

### Dependencies

The following packages have been added or updated:

- `openai>=1.0.0` (upgraded from 0.28.0)
- `numpy>=1.24.0` (for vector calculations)
- `pgvector==0.2.1` (already present)

## Error Handling

The implementation includes comprehensive error handling.

### OpenAI API Errors

- **Rate Limiting**: Automatic retry with exponential backoff
- **Authentication Errors**: Clear error messages for invalid API keys
- **Service Unavailable**: Graceful degradation when the OpenAI API is down
- **Invalid Input**: Validation for empty or invalid text inputs

### Database Errors

- **Vector Dimension Mismatch**: Validation of embedding dimensions
- **Index Errors**: Graceful handling of vector index issues
- **Connection Failures**: Proper error propagation for database issues

### Example Error Response

```json
{
  "error": "Failed to generate embedding",
  "detail": "OpenAI API rate limit exceeded. Please try again in 60 seconds.",
  "error_code": "EMBEDDING_RATE_LIMIT"
}
```

## Performance Considerations

### Embedding Generation

- **Asynchronous Processing**: Embeddings are generated in a thread pool to avoid blocking
- **Batch Processing**: Future enhancement for bulk embedding generation
- **Caching**: Consider implementing an embedding cache for repeated queries

### Vector Search

- **Index Optimization**: pgvector uses an IVFFlat index for fast approximate search
- **Query Optimization**: The similarity threshold helps limit result set size
- **Memory Usage**: Vector storage adds ~6 kB per memory (1536 floats × 4 bytes)

### Scaling Recommendations

- **Connection Pooling**: Use connection pooling for high-traffic deployments
- **Embedding Cache**: Implement a Redis cache for frequently accessed embeddings
- **Batch Updates**: Process multiple embeddings in a single OpenAI API call

## Testing

### Unit Tests

The implementation includes comprehensive unit tests for:

- Embedding generation functions
- Vector similarity calculations
- Database vector operations
- Error handling scenarios

### Integration Tests

End-to-end tests verify:

- The complete memory-to-embedding workflow
- Semantic search functionality
- Permission enforcement in vector search
- Error handling across the stack

### Manual Testing Commands

Generate a test embedding:

```python
import asyncio

from app.utils.embeddings import get_embedding_async

# Test embedding generation
embedding = asyncio.run(get_embedding_async("test memory content"))
print(f"Generated embedding with {len(embedding)} dimensions")
```

Test semantic search:

```python
import asyncio

from app.utils.embeddings import search_memories_by_embedding

# Test semantic search (requires a database session)
results = asyncio.run(search_memories_by_embedding(
    db=db_session,
    user_id="user-uuid-here",
    query_embedding=test_embedding,
    limit=5,
))
print(f"Found {len(results)} similar memories")
```

## Future Enhancements

### Planned Features

- **Batch Embedding Processing**: Process multiple memories in a single API call
- **Embedding Model Options**: Support for different OpenAI embedding models
- **Hybrid Search**: Combine vector search with traditional text search
- **Clustering**: Group similar memories using vector clustering algorithms

### Performance Optimizations

- **Embedding Cache**: Redis cache for frequently accessed embeddings
- **Incremental Updates**: Only regenerate embeddings when the text changes significantly
- **Background Processing**: Queue-based background embedding generation
- **Vector Compression**: Explore dimension reduction techniques for storage efficiency

### Advanced Features

- **Cross-User Search**: Search across public memories from multiple users
- **Temporal Embeddings**: Weight embeddings by recency and relevance
- **Personalized Search**: Adapt search results based on user preferences
- **Multi-Modal Embeddings**: Support for image and audio content embeddings

## Troubleshooting

### Common Issues

1. **OpenAI API Key Invalid**:
   - Verify the API key is correctly set in environment variables
   - Check that the API key has sufficient credits and permissions
2. **Vector Index Missing**:
   - Ensure the pgvector extension is installed: `CREATE EXTENSION vector;`
   - Run database migrations: `alembic upgrade head`
3. **Embedding Dimension Mismatch**:
   - Verify you are using the text-embedding-3-small model (1536 dimensions)
   - Check that the database column definition matches the model output
4. **Performance Issues**:
   - Monitor OpenAI API rate limits
   - Check vector index statistics: `SELECT * FROM pg_stat_user_indexes WHERE relname = 'memories';`
   - Consider raising the similarity threshold for faster queries

### Debug Commands

Check the pgvector installation:

```sql
SELECT * FROM pg_extension WHERE extname = 'vector';
```

Verify embedding storage (note that `vector_dims()` is pgvector's dimension function; `array_length()` does not apply to the `vector` type):

```sql
SELECT id, vector_dims(embedding) AS dim_count
FROM memories
WHERE embedding IS NOT NULL
LIMIT 5;
```

Test vector similarity:

```sql
SELECT id, text, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM memories
WHERE user_id = 'your-user-id'
ORDER BY similarity DESC
LIMIT 5;
```

## Security Considerations

### Data Protection

- **API Key Security**: OpenAI API keys are stored securely in environment variables
- **Vector Data**: Embeddings don't expose the original text but do enable similarity search
- **Permission Enforcement**: Vector search respects memory permission levels
- **Rate Limiting**: Implement rate limiting for embedding generation endpoints

### Privacy Implications

- **Embedding Privacy**: Vector embeddings may leak some semantic information
- **Cross-User Search**: Ensure public memory search doesn't expose private data
- **Audit Logging**: Log embedding generation and search operations for compliance

This implementation provides a solid foundation for semantic search while maintaining security and performance standards.
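## Appendix: Illustrative Sketches

The similarity score returned by `query_memory` is cosine similarity, i.e. `1 - (embedding <=> query_embedding)` under pgvector's cosine-distance operator. The following is a minimal sketch (not the server's actual code) that reproduces the same rank-then-threshold logic in Python with NumPy, using toy 3-dimensional vectors in place of real 1536-dimensional embeddings:

```python
import numpy as np


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; equivalent to 1 - (a <=> b) with pgvector's cosine distance."""
    a_v = np.asarray(a, dtype=np.float32)
    b_v = np.asarray(b, dtype=np.float32)
    return float(np.dot(a_v, b_v) / (np.linalg.norm(a_v) * np.linalg.norm(b_v)))


# Toy "embeddings" standing in for real 1536-dimensional vectors
query = [1.0, 0.0, 0.0]
memories = {
    "m1": [1.0, 0.0, 0.0],  # same direction as the query -> similarity 1.0
    "m2": [0.0, 1.0, 0.0],  # orthogonal to the query -> similarity 0.0
    "m3": [0.7, 0.7, 0.0],  # partial overlap -> similarity ~0.71
}

# Rank by similarity, then apply the default similarity_threshold of 0.5
ranked = sorted(
    ((mid, cosine_similarity(query, vec)) for mid, vec in memories.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
results = [(mid, sim) for mid, sim in ranked if sim >= 0.5]
```

Here `m2` falls below the threshold and is dropped, mirroring how `similarity_threshold` trims weak matches from the tool's response.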
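The rate-limit handling described under Error Handling (automatic retry with exponential backoff) can be sketched as a generic retry helper. This is a hedged illustration rather than the server's implementation: `RuntimeError` stands in for `openai.RateLimitError`, and `flaky_embed` is a hypothetical stand-in for the real embedding call:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for openai.RateLimitError
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Demo: a fake embedding call that is rate-limited twice, then succeeds.
calls = {"n": 0}
delays: list[float] = []


def flaky_embed() -> list[float]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return [0.1] * 1536


result = with_backoff(flaky_embed, sleep=delays.append)
```

Injecting `sleep` makes the backoff schedule testable without actually waiting; in production the default `time.sleep` applies.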
