# Vector Stores
Vector stores are databases optimized for storing and searching high-dimensional vectors (embeddings). Cipher supports multiple vector database providers for flexible deployment options.
## Supported Vector Stores
Cipher supports the following vector database types:
- **Qdrant** - High-performance vector search engine
- **Milvus** - Open-source vector database with cloud options
- **ChromaDB** - Developer-friendly open-source embedding database
- **Pinecone** - Managed vector database service
- **Pgvector** - PostgreSQL extension with ACID compliance and enterprise features
- **Faiss** - Local, file-based vector index storage (FaissDB)
- **Redis** - Redis Stack with RediSearch for vector similarity search
- **In-Memory** - Built-in solution for development/testing
## Vector Store Configurations
<details>
<summary><strong>Qdrant Configuration</strong></summary>
[Qdrant](https://qdrant.tech/) is a high-performance vector search engine with excellent performance and features.
### Qdrant Cloud (Managed)
The easiest way to get started with Qdrant:
```env
VECTOR_STORE_TYPE=qdrant
VECTOR_STORE_URL=https://your-cluster.qdrant.io
VECTOR_STORE_API_KEY=your-qdrant-api-key
```
**Setup Steps:**
1. Create account at [Qdrant Cloud](https://cloud.qdrant.io/)
2. Create a new cluster
3. Copy your cluster URL and API key
4. Add to your `.env` file or your `json` mcp config
### Qdrant Local (Docker)
Run Qdrant locally using Docker:
```bash
# Basic setup (data is lost when the container is removed)
docker run -d --name qdrant-basic -p 6333:6333 qdrant/qdrant

# With persistent storage
docker run -d --name qdrant-storage -v "$(pwd)/qdrant-data:/qdrant/storage" -p 6333:6333 qdrant/qdrant
```
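Before wiring it into Cipher, you can confirm the instance is reachable (a quick sanity check against Qdrant's REST API; a fresh instance returns an empty collection list):
```bash
# List collections on the local Qdrant instance
curl http://localhost:6333/collections
```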
```bash
# .env configuration
VECTOR_STORE_TYPE=qdrant
VECTOR_STORE_HOST=localhost
VECTOR_STORE_PORT=6333
VECTOR_STORE_URL=http://localhost:6333
```
### Qdrant Docker Compose
Add to your `docker-compose.yml`:
```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      - QDRANT__SERVICE__HTTP_PORT=6333

volumes:
  qdrant_data:
```
</details>
<details>
<summary><strong>Milvus Configuration</strong></summary>
[Milvus](https://milvus.io/) is an open-source vector database with excellent scalability.
### Zilliz Cloud (Managed Milvus)
[Zilliz Cloud](https://zilliz.com/) provides managed Milvus hosting:
```bash
# .env configuration
VECTOR_STORE_TYPE=milvus
VECTOR_STORE_URL=your-milvus-cluster-endpoint
VECTOR_STORE_USERNAME=your-zilliz-username
VECTOR_STORE_PASSWORD=your-zilliz-password
```
**Setup Steps:**
1. Create account at [Zilliz Cloud](https://cloud.zilliz.com/)
2. Create a new cluster
3. Get your cluster endpoint and credentials
4. Add to your `.env` file or your `json` mcp config
### Milvus Local (Docker)
Run Milvus locally using the official installation script:
```bash
# Download the official installation script
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
# Start the Docker container
bash standalone_embed.sh start
```
```bash
# .env configuration
VECTOR_STORE_TYPE=milvus
VECTOR_STORE_HOST=localhost
VECTOR_STORE_PORT=19530
```
**Services Started:**
- **Milvus server**: Port 19530
- **Embedded etcd**: Port 2379
- **Web UI**: http://127.0.0.1:9091/webui/
- **Data volume**: `volumes/milvus`
**Service Management:**
```bash
# Restart Milvus
bash standalone_embed.sh restart
# Stop Milvus
bash standalone_embed.sh stop
# Upgrade Milvus
bash standalone_embed.sh upgrade
# Delete Milvus (removes all data)
bash standalone_embed.sh delete
```
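To confirm the standalone instance is healthy, you can query the health endpoint Milvus exposes on its metrics port (assuming the default ports started by the script above):
```bash
# Returns an OK response once the Milvus server is ready
curl -f http://localhost:9091/healthz
```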
</details>
<details>
<summary><strong>ChromaDB Configuration</strong></summary>
[ChromaDB](https://www.trychroma.com/) is a developer-friendly open-source embedding database designed for AI applications.
### ChromaDB Cloud (Managed)
ChromaDB offers managed cloud hosting for production deployments:
```bash
# .env configuration
VECTOR_STORE_TYPE=chroma
VECTOR_STORE_URL=https://your-chroma-instance.chroma.dev
VECTOR_STORE_API_KEY=your-chroma-api-key
```
**Setup Steps:**
1. Create account at [ChromaDB Cloud](https://www.trychroma.com/)
2. Create a new database instance
3. Copy your instance URL and API key
4. Add to your `.env` file or your `json` mcp config
### ChromaDB Local (Docker)
Run ChromaDB locally using Docker:
```bash
# Basic setup (data is lost when the container is removed)
docker run -d --name chroma-basic -p 8000:8000 chromadb/chroma

# With persistent storage
docker run -d --name chroma-storage -v "$(pwd)/chroma-data:/data" -p 8000:8000 chromadb/chroma
```
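To confirm the server is responding, you can hit ChromaDB's heartbeat endpoint (the exact path depends on your ChromaDB version):
```bash
# Returns a nanosecond heartbeat timestamp on success (ChromaDB 1.x)
curl http://localhost:8000/api/v2/heartbeat
# On older releases the endpoint is /api/v1/heartbeat
```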
```bash
# .env configuration
VECTOR_STORE_TYPE=chroma
VECTOR_STORE_HOST=localhost
VECTOR_STORE_PORT=8000
VECTOR_STORE_URL=http://localhost:8000
```
**Important:** For production deployments, review the [ChromaDB deployment guide](https://docs.trychroma.com/deployment) and [security considerations](https://docs.trychroma.com/deployment#security).
### ChromaDB Docker Compose
Add to your `docker-compose.yml`:
```yaml
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma
      - ANONYMIZED_TELEMETRY=FALSE

volumes:
  chroma_data:
```
### ChromaDB Connection Settings
```bash
# Basic setup
VECTOR_STORE_TYPE=chroma
VECTOR_STORE_URL=http://localhost:8000
# With SSL/TLS
VECTOR_STORE_TYPE=chroma
VECTOR_STORE_HOST=localhost
VECTOR_STORE_PORT=8000
VECTOR_STORE_SSL=true
```
**Distance Metrics:** Cipher automatically converts user-friendly terms:
- `euclidean` β `l2`
- `dot` β `ip`
- `cosine` β `cosine`
**Compatibility:** Use ChromaDB 1.10.5 for best results. Array fields in metadata are automatically converted to strings.
</details>
<details>
<summary><strong>Pinecone Configuration</strong></summary>
[Pinecone](https://www.pinecone.io/) is a fully managed vector database service optimized for machine learning applications with excellent performance and scalability.
### Pinecone Cloud (Managed)
Pinecone is a cloud-native service that provides serverless vector search:
```bash
# Basic configuration
VECTOR_STORE_TYPE=pinecone
VECTOR_STORE_API_KEY=your-pinecone-api-key
VECTOR_STORE_COLLECTION=your-index-name # Collection names are used as indexes in Pinecone
```
**Setup Steps:**
1. Create account at [Pinecone](https://app.pinecone.io/)
2. Generate an API key from your project settings
3. Choose your preferred region (us-east-1, us-west-2, etc.)
4. Add configuration to your `.env` file or your `json` mcp config
### Pinecone Index Settings
Pinecone automatically creates indexes with these settings:
```bash
VECTOR_STORE_TYPE=pinecone
VECTOR_STORE_API_KEY=your-pinecone-api-key
PINECONE_NAMESPACE=production
PINECONE_PROVIDER=aws
PINECONE_REGION=us-east-1
```
**Index Specifications:**
- **Serverless deployment** with automatic scaling
- **Cloud provider**: AWS (default)
- **Region**: us-east-1 (default, configurable)
- **Automatic index creation** if not exists
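To sanity-check your API key outside of Cipher, you can list the indexes in your project through Pinecone's control-plane API (endpoint as documented by Pinecone; adjust if your account uses a different API version):
```bash
# List existing indexes in the Pinecone project tied to this API key
curl -s https://api.pinecone.io/indexes \
  -H "Api-Key: $VECTOR_STORE_API_KEY"
```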
</details>
<details>
<summary><strong>PgVector Configuration</strong></summary>
[PgVector](https://github.com/pgvector/pgvector) is a PostgreSQL extension for vector similarity search, combining the reliability of PostgreSQL with vector search capabilities.
### PgVector Local (Docker)
Run a pgvector-enabled PostgreSQL container locally:
```bash
docker run --name pgvector \
-e POSTGRES_PASSWORD=password \
-e POSTGRES_USER=user \
-e POSTGRES_DB=cipherDB \
-p 5432:5432 \
pgvector/pgvector:pg16
```
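Optionally, verify that the `vector` extension can be enabled inside the container (assuming the container name, user, and database from the command above):
```bash
# Enable pgvector in the target database and print its version
docker exec -it pgvector psql -U user -d cipherDB \
  -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname = 'vector';"
```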
Then point Cipher at the database via a connection URL in your `.env` file:
```bash
# Connection URL format
VECTOR_STORE_TYPE=pgvector
VECTOR_STORE_URL=postgresql://user:password@localhost:5432/cipherDB
```
### Managed PostgreSQL Services
Most managed PostgreSQL services support the pgvector extension; use the service's connection string:
```bash
VECTOR_STORE_TYPE=pgvector
VECTOR_STORE_URL=postgresql://<service-endpoint>
```
**Index Specifications:**
- **Index types**: HNSW (default) for better recall, IVFFlat for speed
- **ACID compliance**: Full PostgreSQL transaction support
- **Automatic table/index creation** if not exists
**Setup Steps:**
1. Install PostgreSQL with pgvector extension
2. Create database and user with appropriate permissions
3. Add configuration to your `.env` file or `json` mcp config
4. Tables and indexes are created automatically on first use
</details>
<details>
<summary><strong>FaissDB Configuration</strong></summary>
#### FaissDB Configuration
Faiss stores its index files on the local filesystem; point Cipher at a base storage folder:
```bash
# Connection format
VECTOR_STORE_TYPE=faiss
FAISS_BASE_STORAGE_PATH=path/to/your/folder
```
**Specifications:**
- **Index types**: Based on metric selection
- **Automatic folder and index creation** if not exists
</details>
<details>
<summary><strong>Redis Vector Store</strong></summary>
Redis Stack is supported as a vector storage backend in Cipher, using RediSearch for fast similarity search and metadata filtering. This backend is ideal for scalable deployments and supports multiple distance metrics.
### Redis Local (Docker)
Run Redis Stack with Docker:
```sh
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
```
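You can confirm that the RediSearch module, which provides the vector search capability, is loaded:
```bash
# The module list should include an entry named "search"
docker exec -it redis-stack redis-cli MODULE LIST
```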
---
### Configuration (for local and cloud)
Add to your `.env` file:
```bash
VECTOR_STORE_TYPE=redis
# Connect using host and port
VECTOR_STORE_HOST=localhost
VECTOR_STORE_PORT=6379
VECTOR_STORE_USERNAME=<your-username> # optional
VECTOR_STORE_PASSWORD=<your-password> # optional
# Or connect using a URL
VECTOR_STORE_URL=redis://localhost:6379
VECTOR_STORE_DISTANCE=COSINE # Options: COSINE, L2, IP
```
For workspace-specific configuration:
```bash
WORKSPACE_VECTOR_STORE_TYPE=redis
# Connect using host and port
WORKSPACE_VECTOR_STORE_HOST=localhost
WORKSPACE_VECTOR_STORE_PORT=6379
WORKSPACE_VECTOR_STORE_USERNAME=<your-username> # optional
WORKSPACE_VECTOR_STORE_PASSWORD=<your-password> # optional
# Connect using url
WORKSPACE_VECTOR_STORE_URL=redis://localhost:6379
WORKSPACE_VECTOR_STORE_DISTANCE=COSINE # Options: COSINE, L2, IP
```
---
### Supported Features
- **Distance Metrics:** COSINE, L2 (Euclidean), IP (Inner Product)
- **Vector Operations:** insert, update, search, delete, list
- **Metadata Filtering:** range queries, any filters
- **Pagination & Sorting:** via RediSearch
</details>
<details>
<summary><strong>In-Memory Vector Store</strong></summary>
For development and testing, Cipher includes a built-in in-memory vector store:
```bash
# .env configuration
VECTOR_STORE_TYPE=in-memory
# No additional configuration needed
```
**Features:**
- No external dependencies
- Fast for small datasets
- Data is lost when application restarts
- Perfect for development and testing
</details>
## Configuration Settings
**Note**: All the configuration variables below have default values. By default, only **knowledge memory** is enabled; to enable **reflection memory** and **workspace memory**, set `USE_WORKSPACE_MEMORY=true` and `DISABLE_REFLECTION_MEMORY=false`.
<details>
<summary><strong>Knowledge and Reflection Collections</strong></summary>
### Collection Configuration
```bash
# Set the name for knowledge memory collection - default: "knowledge_memory"
VECTOR_STORE_COLLECTION=knowledge_memory
# Vector dimensions (must match your embedding model)
VECTOR_STORE_DIMENSION=1536
# Distance metric for similarity calculations
VECTOR_STORE_DISTANCE=Cosine # Options: Cosine, Euclidean, Dot (Qdrant/Milvus)
# VECTOR_STORE_DISTANCE=cosine # Options: cosine, l2, euclidean, ip, dot (ChromaDB)
```
### Reflection Memory (Optional)
Cipher supports a separate collection for reflection memory:
```bash
# Set the name for reflection memory collection - default: "reflection_memory"
REFLECTION_VECTOR_STORE_COLLECTION=reflection_memory
# Disable reflection memory entirely
DISABLE_REFLECTION_MEMORY=true # default: true
```
### Performance Settings
```bash
# Maximum number of vectors to store (in-memory only)
VECTOR_STORE_MAX_VECTORS=10000
# Search parameters
VECTOR_STORE_SEARCH_LIMIT=50
VECTOR_STORE_SIMILARITY_THRESHOLD=0.7
```
</details>
<details>
<summary><strong>Workspace Memory Collections</strong></summary>
When using [workspace memory](./workspace-memory.md), you can configure separate vector store settings:
```bash
# Enable workspace memory
USE_WORKSPACE_MEMORY=true # default: false
# Workspace-specific collection
WORKSPACE_VECTOR_STORE_COLLECTION=workspace_memory
# Use separate vector store for workspace (optional)
WORKSPACE_VECTOR_STORE_TYPE=qdrant # or: milvus, chroma, in-memory
WORKSPACE_VECTOR_STORE_HOST=localhost
WORKSPACE_VECTOR_STORE_PORT=6333
WORKSPACE_VECTOR_STORE_URL=http://localhost:6333
WORKSPACE_VECTOR_STORE_API_KEY=your-qdrant-api-key
# Workspace search settings
WORKSPACE_SEARCH_THRESHOLD=0.4
WORKSPACE_VECTOR_STORE_DIMENSION=1536
WORKSPACE_VECTOR_STORE_MAX_VECTORS=10000
```
</details>
## Troubleshooting
<details>
<summary><strong>Common Issues</strong></summary>
### Dimension Mismatch
**Dimension Error**
```
Error: Vector dimension mismatch
```
**Solution:**
- Check your embedding model dimensions
- Update `VECTOR_STORE_DIMENSION` to match
- Recreate collections if dimensions changed
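For reference, a few common embedding models and the dimensions they produce (always verify against your embedding provider's documentation):
```bash
# OpenAI text-embedding-3-small / text-embedding-ada-002
VECTOR_STORE_DIMENSION=1536
# OpenAI text-embedding-3-large
# VECTOR_STORE_DIMENSION=3072
# sentence-transformers/all-MiniLM-L6-v2
# VECTOR_STORE_DIMENSION=384
```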
### Performance Issues
**Slow Search Performance**
- Increase `VECTOR_STORE_SEARCH_LIMIT` for more results
- Adjust `VECTOR_STORE_SIMILARITY_THRESHOLD` (lower = more results)
- Consider upgrading to cloud-hosted solutions for better performance
**Memory Usage (In-Memory Store)**
- Reduce `VECTOR_STORE_MAX_VECTORS` if memory is limited
- Switch to external vector store for larger datasets
### ChromaDB Issues
**Common Errors:**
- `Cannot find package '@chroma-core/default-embed'` β Use ChromaDB 1.10.5
- `HTTP 422: Unprocessable Entity` β Metadata must be primitive types only
- `Invalid distance metric` β Use `cosine`, `l2`, or `ip` (auto-converted from `euclidean`/`dot`)
</details>
## Related Documentation
- [Configuration](./configuration.md) - Main configuration guide
- [Embedding Configuration](./embedding-configuration.md) - Embedding setup
- [Workspace Memory](./workspace-memory.md) - Team-aware memory system