Chroma MCP Server
A Model Context Protocol (MCP) server that provides semantic search and document management capabilities using ChromaDB. This server enables LLMs to perform natural language queries over document collections with intuitive similarity metrics, making it ideal for RAG (Retrieval Augmented Generation) applications.
Features
- Semantic Search: Find documents based on meaning, using ChromaDB's default embedding model
- Intuitive Similarity Metrics: Results include human-friendly similarity scores (0-100%)
- Document Management: Full CRUD operations for documents and collections
- Rich Metadata Support: Attach and search by custom metadata fields
- Persistent Storage: Reliable document storage with SQLite backend
- Security: Configurable access controls and input validation
- Error Handling: Comprehensive error messages and graceful failure recovery
Requirements
- Python 3.12 or higher
- ChromaDB 0.4.22 or higher
- MCP Python SDK 1.1.2 or higher
- uv package manager (recommended) or pip
Quick Start
For Claude Desktop integration, see Installation.
Architecture
The server is built on:
- ChromaDB for vector storage and search
- MCP Python SDK for server implementation
- SQLite for persistent storage
Data Flow
- Documents are embedded using ChromaDB's default embedding model
- Embeddings and metadata are stored in ChromaDB's SQLite backend
- Queries are processed through the same embedding model
- Results are normalized to a 0-100% similarity scale
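The exact normalization formula is not given here. As a minimal sketch, assume the collection is configured for cosine distance (which ranges from 0 to 2) and that the score falls off linearly. The function name `distance_to_similarity` and the mapping itself are illustrative assumptions, not the server's confirmed implementation. ChromaDB's default metric is squared L2, which is unbounded, so it would need a different mapping.

```python
def distance_to_similarity(distance: float, max_distance: float = 2.0) -> float:
    """Map a distance in [0, max_distance] to a similarity percentage in [0, 100].

    Assumes a bounded metric such as cosine distance (0 = identical, 2 = opposite).
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # Clamp first so floating-point noise cannot push the score out of range.
    clamped = min(max(distance, 0.0), max_distance)
    return round((1.0 - clamped / max_distance) * 100.0, 1)
```

Under these assumptions a distance of 0 maps to 100% and a distance of 2 maps to 0%.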
Components
Collections and Documents
The server manages two main resource types:
- Collections: Containers for related documents with shared embedding settings
- Documents: Text content with metadata and automatically generated embeddings
Tools
Collection Management
- list-collections: List all available collections
- create-collection: Create a new collection with optional settings
- delete-collection: Delete a collection and its documents
Document Operations
- add-document: Add a new document with content and metadata
- get-document: Retrieve a specific document by ID
- update-document: Modify document content or metadata
- delete-document: Remove a document from a collection
- search-documents: Semantic search with normalized similarity scores
Installation
Prerequisites
- Python 3.12+
- uv package manager (recommended) or pip
Setup
- Clone the repository:
- Create and activate virtual environment:
- Install dependencies:
Claude Desktop Integration
Add the server to your Claude Desktop configuration:
Windows (%APPDATA%/Claude/claude_desktop_config.json):
macOS (~/Library/Application Support/Claude/claude_desktop_config.json):
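Claude Desktop reads server definitions from the `mcpServers` key of this file, where each entry has a `command` and an `args` list. The sketch below builds such an entry and prints it as JSON. The launch command (`uv --directory ... run chroma`), the server name `chroma`, and the path `/path/to/chroma` are placeholders assumed for illustration. Substitute the values that match your checkout.

```python
import json


def build_server_entry(project_dir: str, server_name: str = "chroma") -> dict:
    """Return a config fragment with one entry under "mcpServers"."""
    return {
        "mcpServers": {
            server_name: {
                # Hypothetical launch command: run the project through uv.
                "command": "uv",
                "args": ["--directory", project_dir, "run", server_name],
            }
        }
    }


config = build_server_entry("/path/to/chroma")
print(json.dumps(config, indent=2))
```

Merge the printed fragment into the existing file rather than overwriting it, so other configured servers are preserved.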
Usage Examples
Managing Collections
Create a collection:
List collections:
Working with Documents
Add a document:
Get a specific document:
Update a document:
Search documents:
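The argument names of the MCP tools are not listed in this section, so the sketch below does not show tool calls. Instead it shows the ChromaDB client calls that the document operations above presumably correspond to. The helper names `open_collection` and `demo`, the collection name `docs`, and the sample text are assumptions made for illustration.

```python
def open_collection(path: str = ".chroma", name: str = "docs"):
    """Open (or create) a persistent collection; requires the chromadb package."""
    import chromadb  # imported here so demo() below can be read on its own

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=name)


def demo(collection):
    """Run add, get, update, and query against a ChromaDB collection object."""
    collection.add(
        ids=["doc1"],
        documents=["Vector databases store embeddings for similarity search."],
        metadatas=[{"topic": "databases"}],
    )
    fetched = collection.get(ids=["doc1"])
    # Only the metadata changes here; the stored text is left as is.
    collection.update(ids=["doc1"], metadatas=[{"topic": "vector-databases"}])
    results = collection.query(
        query_texts=["How are embeddings stored?"],
        n_results=1,
    )
    return fetched, results
```

Calling `demo(open_collection())` exercises the whole sequence against a local database. The first query may download ChromaDB's default embedding model.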
Understanding Similarity Scores
Search results include normalized similarity scores from 0-100%:
- 90-100%: Nearly identical content or very strong semantic match
- 70-89%: Highly relevant with strong semantic similarity
- 50-69%: Moderately related with partial semantic overlap
- 30-49%: Somewhat related with minimal semantic connection
- 0-29%: Likely unrelated or very weak semantic connection
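A client that consumes search results could turn these bands into labels. The following is a small sketch of that lookup. The function name `describe_score` and the shortened label strings are illustrative, and fractional scores such as 89.5 are assumed to fall into the lower band.

```python
def describe_score(score: float) -> str:
    """Return the band label from the table above for a 0-100 similarity score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "nearly identical or very strong match"
    if score >= 70:
        return "highly relevant"
    if score >= 50:
        return "moderately related"
    if score >= 30:
        return "somewhat related"
    return "likely unrelated"
```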
Troubleshooting
Common Issues
- Database Connection Errors
  - Ensure the database path is writable
  - Check if another process is using the database
  - Try deleting the .chroma directory and restarting (this discards all stored collections)
- Memory Issues
  - Large collections may require more RAM
  - Consider using smaller batch sizes
  - Monitor memory usage with --log-level DEBUG
- Slow Search Performance
  - Large collections may need index optimization
  - Consider requesting fewer results by lowering n_results
  - Check system resource usage
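The advice on smaller batch sizes can be sketched as follows when loading documents through the ChromaDB client directly. The helper names `batched` and `add_in_batches` and the default batch size of 100 are assumptions chosen for illustration, not settings taken from the server.

```python
from typing import Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def add_in_batches(collection, ids, documents, batch_size: int = 100) -> int:
    """Add documents to a collection in chunks and return the number of batches."""
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    count = 0
    for id_batch, doc_batch in zip(batched(ids, batch_size), batched(documents, batch_size)):
        # Each call embeds and stores one chunk, which bounds peak memory use.
        collection.add(ids=list(id_batch), documents=list(doc_batch))
        count += 1
    return count
```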
Debug Mode
Run the server in debug mode:
Getting Help
- Check ChromaDB Documentation
- Open an issue on GitHub
- Join MCP Community Discussions
Development
Running Tests
Run the test suite:
Run with coverage:
Debugging
For debugging, use the MCP Inspector:
The inspector provides:
- Real-time request/response monitoring
- Tool testing interface
- Performance metrics
- Error tracking
Error Handling
The server provides detailed error messages for common scenarios:
- Invalid collection names or IDs
- Missing or malformed documents
- Database connection issues
- Invalid search parameters
- Authentication/authorization failures
Security Considerations
- Input validation on all parameters
- Configurable access controls
- Safe handling of file paths
- Protection against injection attacks
- Rate limiting support
- Secure error messages
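As an illustration of the input validation point, the sketch below checks a collection name and a result count before they reach the database. The name rules are a simplified approximation of ChromaDB's constraints (3 to 63 characters, alphanumeric at both ends, no consecutive dots), and the cap of 100 results is a hypothetical limit. Neither is taken from this server's code.

```python
import re

# First and last characters alphanumeric, 1 to 61 characters in between.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{1,61}[A-Za-z0-9]")


def validate_collection_name(name: str) -> str:
    """Return name unchanged if it looks like a valid collection name."""
    if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
        raise ValueError("collection name must be 3-63 characters of letters, "
                         "digits, '.', '_' or '-', starting and ending alphanumeric")
    if ".." in name:
        raise ValueError("collection name must not contain consecutive dots")
    return name


def validate_n_results(n_results: int, limit: int = 100) -> int:
    """Return n_results if it is an integer between 1 and limit."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_results, bool) or not isinstance(n_results, int):
        raise ValueError("n_results must be an integer")
    if not 1 <= n_results <= limit:
        raise ValueError(f"n_results must be between 1 and {limit}")
    return n_results
```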
Configuration
Database Location
Set custom database path:
Default: .chroma in the server directory
Environment Variables
- CHROMA_DB_PATH: Override database location
- CHROMA_LOG_LEVEL: Set logging verbosity (default: INFO)
- CHROMA_MAX_CONNECTIONS: Database connection pool size (default: 10)
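One way these variables and their defaults could be read at startup is sketched below. The `ServerConfig` class and `load_config` function are hypothetical names used for illustration. The variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    db_path: str = ".chroma"
    log_level: str = "INFO"
    max_connections: int = 10


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Build a ServerConfig from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        db_path=env.get("CHROMA_DB_PATH", ".chroma"),
        # Accept lowercase values such as "debug" for convenience.
        log_level=env.get("CHROMA_LOG_LEVEL", "INFO").upper(),
        # int() raises ValueError if the value is not a whole number.
        max_connections=int(env.get("CHROMA_MAX_CONNECTIONS", "10")),
    )
```

Passing a plain dictionary instead of the real environment, as in `load_config({"CHROMA_LOG_LEVEL": "debug"})`, makes the function easy to exercise in tests.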
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
Please read our Contributing Guidelines for more details.
License
MIT License
Copyright (c) 2024 privetin
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.