# External Embedding API Support
MCP Memory Service can use external OpenAI-compatible embedding APIs instead of running embedding models locally. This is useful for:
- **Shared infrastructure**: Run a single embedding service for multiple MCP instances
- **Resource efficiency**: Offload GPU/CPU-intensive embedding to dedicated servers
- **Model flexibility**: Use embedding models not available in SentenceTransformers
- **Hosted services**: Use OpenAI, Cohere, or other embedding APIs
## ⚠️ Storage Backend Compatibility
**External embedding APIs are currently only supported with the `sqlite_vec` storage backend.**
If you're using `hybrid` or `cloudflare` backends, the external API will NOT be used:
- **Hybrid**: The SQLite-vec layer falls back to local models; the Cloudflare layer uses Workers AI
- **Cloudflare**: Always uses Workers AI (`@cf/baai/bge-base-en-v1.5`)
To use external embedding APIs, configure:
```bash
export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings
```
Support for external APIs with Hybrid/Cloudflare backends is planned for a future release.
## Configuration
Set these environment variables to enable external embeddings:
```bash
# Required: API endpoint URL
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings
# Optional: Model name (default: nomic-embed-text)
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-embed-text
# Optional: API key for authenticated endpoints
export MCP_EXTERNAL_EMBEDDING_API_KEY=sk-xxx
```
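Before starting the service, you can sanity-check the endpoint with a plain `curl` request. This is a minimal sketch: the `Authorization` header follows the OpenAI `Bearer` convention and can be omitted for unauthenticated local endpoints.
```bash
# Smoke-test the configured endpoint with a one-sentence request
curl -sS "$MCP_EXTERNAL_EMBEDDING_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MCP_EXTERNAL_EMBEDDING_API_KEY" \
  -d "{\"input\": [\"hello world\"], \"model\": \"$MCP_EXTERNAL_EMBEDDING_MODEL\"}"
```
A JSON response containing a `data` array (see the API Compatibility section below) confirms the endpoint is reachable and the model name is accepted.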
## Supported Backends
### vLLM
[vLLM](https://docs.vllm.ai/) provides high-performance inference with an OpenAI-compatible API.
```bash
# Start vLLM with an embedding model
vllm serve nomic-ai/nomic-embed-text-v1.5 --port 8890
# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5
```
### Ollama
[Ollama](https://ollama.ai/) provides easy local model deployment.
```bash
# Pull the embedding model (the Ollama server must be running)
ollama pull nomic-embed-text
# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:11434/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-embed-text
```
### Text Embeddings Inference (TEI)
[TEI](https://github.com/huggingface/text-embeddings-inference) is HuggingFace's optimized embedding server.
```bash
# Start TEI
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id nomic-ai/nomic-embed-text-v1.5
# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8080/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5
```
### OpenAI
```bash
export MCP_EXTERNAL_EMBEDDING_URL=https://api.openai.com/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=text-embedding-3-small
export MCP_EXTERNAL_EMBEDDING_API_KEY=sk-xxx
```
## Embedding Dimension Compatibility
⚠️ **Important**: The embedding dimension must match your database schema.
| Model | Dimensions |
|-------|------------|
| nomic-embed-text | 768 |
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| all-MiniLM-L6-v2 | 384 |
| all-mpnet-base-v2 | 768 |
If you're migrating from a local model to an external API (or vice versa), ensure the dimensions match or you'll need to re-embed your memories.
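If you're unsure what a given endpoint returns, count the vector length directly. A quick check, assuming `jq` is installed and the variables from the Configuration section are set:
```bash
# Print the dimension of an embedding returned by the endpoint (requires jq)
curl -sS "$MCP_EXTERNAL_EMBEDDING_URL" \
  -H "Content-Type: application/json" \
  -d "{\"input\": [\"dimension check\"], \"model\": \"$MCP_EXTERNAL_EMBEDDING_MODEL\"}" \
  | jq '.data[0].embedding | length'
```
Compare the printed number against the table above before pointing the service at a populated database.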
## Fallback Behavior
If the external API is unavailable at startup, MCP Memory Service will fall back to local embedding models (ONNX → SentenceTransformer → Hash embeddings).
To require external embeddings without fallback, you can set:
```bash
export MCP_MEMORY_USE_ONNX=false  # Disable the ONNX fallback
# Leave sentence-transformers uninstalled to disable the SentenceTransformer fallback
```
## Performance Considerations
- **Batching**: The adapter batches requests (default 32 sentences per call) for efficiency; see the request sketch below
- **Caching**: Embedding models are cached per API URL + model combination
- **Timeout**: Default 30-second timeout per request (configurable)
- **Retry**: Currently no automatic retry; failures fall back to local models
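At the API level, a batch is simply a multi-element `input` array, with one embedding returned per element, matched by `index`. A sketch of what a batched request looks like on the wire (the sentences are illustrative):
```bash
# One request carrying a batch of three sentences
curl -sS "$MCP_EXTERNAL_EMBEDDING_URL" \
  -H "Content-Type: application/json" \
  -d "{\"input\": [\"first sentence\", \"second sentence\", \"third sentence\"], \"model\": \"$MCP_EXTERNAL_EMBEDDING_MODEL\"}"
```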
## Troubleshooting
### Connection refused
```
ConnectionError: Cannot connect to external embedding API at http://localhost:8890/v1/embeddings
```
- Verify the embedding service is running (see the quick check below)
- Check that the port in the URL is correct
- Ensure no firewall is blocking the connection
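A quick way to rule out all three at once (a sketch using the example URL from the error above; substitute your own endpoint and model):
```bash
# Prints an HTTP status line if something is listening; fails fast otherwise
curl -sS -o /dev/null -w "HTTP %{http_code}\n" \
  http://localhost:8890/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["ping"], "model": "nomic-embed-text"}'
```
`Connection refused` from `curl` itself means nothing is listening on that host and port; any HTTP status means the service is up and the problem lies elsewhere (model name, authentication, or dimensions).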
### Dimension mismatch
```
RuntimeError: Dimension mismatch for inserted vector. Expected 768 dimensions but received 384.
```
- The external model produces embeddings with a different dimension than your database schema expects
- Either switch to a model with matching dimensions or migrate (re-embed) your database; the dimension check in the Embedding Dimension Compatibility section shows what a given endpoint returns
### Authentication error
```
ConnectionError: API returned status 401: Unauthorized
```
- Set `MCP_EXTERNAL_EMBEDDING_API_KEY` environment variable
- Verify the API key is valid
## API Compatibility
The adapter expects an OpenAI-compatible `/v1/embeddings` endpoint:
**Request:**
```json
{
"input": ["text to embed", "another text"],
"model": "model-name"
}
```
**Response:**
```json
{
"data": [
{"index": 0, "embedding": [0.1, 0.2, ...]},
{"index": 1, "embedding": [0.3, 0.4, ...]}
]
}
```