Chroma MCP Server


A Model Context Protocol (MCP) server that provides semantic search and document management on top of ChromaDB. It lets LLMs run natural-language queries over document collections and returns results with normalized similarity scores, which makes it well suited to RAG (Retrieval-Augmented Generation) applications.

Features

  • Semantic Search: Find documents by meaning, using embeddings from ChromaDB's default embedding model
  • Intuitive Similarity Metrics: Results include human-friendly similarity scores (0-100%)
  • Document Management: Create, read, update, and delete documents; create, list, and delete collections
  • Rich Metadata Support: Attach and search by custom metadata fields
  • Persistent Storage: Documents are stored on disk in ChromaDB's SQLite backend
  • Security: Configurable access controls and input validation
  • Error Handling: Comprehensive error messages and graceful failure recovery

Requirements

  • Python 3.12 or higher
  • ChromaDB 0.4.22 or higher
  • MCP Python SDK 1.1.2 or higher
  • uv package manager (recommended) or pip

Quick Start

# Clone the repository
git clone https://github.com/privetin/mcp-server-chroma.git
cd mcp-server-chroma

# Install with uv (recommended)
uv venv
.venv\Scripts\activate      # Windows
source .venv/bin/activate   # Unix
uv pip install -e .

# Or with pip
python -m venv .venv
.venv\Scripts\activate      # Windows
source .venv/bin/activate   # Unix
pip install -e .

# Run the server
mcp-server-chroma

For Claude Desktop integration, see Installation.

Architecture

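The server is built on ChromaDB, which handles embedding and storage, and the MCP Python SDK, which exposes those operations as MCP tools.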

Data Flow

  1. Documents are embedded using ChromaDB's default embedding model
  2. Embeddings and metadata are stored in ChromaDB's SQLite backend
  3. Queries are processed through the same embedding model
  4. Results are normalized to a 0-100% similarity scale
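Step 4 does not state which formula turns ChromaDB's raw distances into percentages. As a minimal sketch, assuming unit-length embeddings (so cosine distance lies in 0 to 2 and squared L2 distance in 0 to 4), a linear rescale could look like this; `distance_to_similarity` is a hypothetical helper, not the server's confirmed code.

```python
def distance_to_similarity(distance: float, space: str = "cosine") -> float:
    """Map a raw ChromaDB distance to a 0-100 similarity percentage.

    Hypothetical helper: assumes unit-length embeddings, so cosine distance
    lies in [0, 2] and squared L2 distance lies in [0, 4].
    """
    if space == "cosine":
        max_distance = 2.0  # cosine distance = 1 - cosine similarity
    elif space == "l2":
        max_distance = 4.0  # squared L2 between unit vectors = 2 - 2 * cosine
    else:
        raise ValueError(f"unsupported distance space: {space!r}")
    # Linear rescale: distance 0 -> 100%, max_distance -> 0%, clamped to [0, 100].
    similarity = (1.0 - distance / max_distance) * 100.0
    return max(0.0, min(100.0, similarity))


print(round(distance_to_similarity(0.25), 1))       # 87.5
print(round(distance_to_similarity(1.0, "l2"), 1))  # 75.0
```

A linear mapping preserves ChromaDB's original ranking, so normalization changes only how scores are presented, not which documents come first.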

Components

Collections and Documents

The server manages two main resource types:

  • Collections: Containers for related documents with shared embedding settings
  • Documents: Text content with metadata and automatically generated embeddings

Tools

Collection Management

  • list-collections: List all available collections
  • create-collection: Create a new collection with optional settings
  • delete-collection: Delete a collection and its documents

Document Operations

  • add-document: Add a new document with content and metadata
  • get-document: Retrieve a specific document by ID
  • update-document: Modify document content or metadata
  • delete-document: Remove a document from a collection
  • search-documents: Semantic search with normalized similarity scores
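MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a request for `search-documents`. The argument names come from the usage examples in this README; `build_tool_call` is a hypothetical helper.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(
    1,
    "search-documents",
    {"collection": "research-papers", "query": "transformer efficiency", "n_results": 3},
)
print(request)
```

In normal use, Claude Desktop or the MCP Inspector sends these messages for you over stdio. Building them by hand is mainly useful for scripted tests.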

Installation

Prerequisites

  • Python 3.12+
  • uv package manager (recommended) or pip

Setup

  1. Clone the repository:
git clone https://github.com/privetin/mcp-server-chroma.git
cd mcp-server-chroma
  2. Create and activate a virtual environment:
uv venv
# On Windows:
.venv\Scripts\activate
# On Unix:
source .venv/bin/activate
  3. Install dependencies:
uv pip install -e .

Claude Desktop Integration

Add the server to your Claude Desktop configuration:

Windows (%APPDATA%/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "chroma": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\path\\to\\mcp-server-chroma",
        "run",
        "mcp-server-chroma"
      ]
    }
  }
}

macOS (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "chroma": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-server-chroma",
        "run",
        "mcp-server-chroma"
      ]
    }
  }
}
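Because `claude_desktop_config.json` may already list other MCP servers, editing it by hand can overwrite them. Here is a minimal sketch using only the standard library that merges a `chroma` entry into an existing file. `add_chroma_server` is a hypothetical helper.

```python
import json
from pathlib import Path


def add_chroma_server(config_file: Path, repo_dir: str) -> dict:
    """Insert or replace the "chroma" entry under "mcpServers", keeping other servers."""
    config = json.loads(config_file.read_text()) if config_file.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command and arguments as the configuration shown above.
    servers["chroma"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "mcp-server-chroma"],
    }
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps(config, indent=2))
    return config
```

Call it with your platform's path, for example `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"` on macOS or `Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"` on Windows, then restart Claude Desktop. Serializing with `json.dumps` also escapes Windows backslashes automatically, so you don't have to double them as in the hand-written example.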

Usage Examples

Managing Collections

Create a collection:

Tool: create-collection
Args: {"name": "research-papers"}

List collections:

Tool: list-collections
Args: {}

Working with Documents

Add a document:

Tool: add-document
Args: {
  "collection": "research-papers",
  "content": "Recent advances in transformer architectures have led to significant improvements in natural language processing tasks.",
  "metadata": {
    "title": "Transformer Architectures",
    "year": 2024,
    "category": "ML"
  }
}

Get a specific document:

Tool: get-document
Args: {
  "collection": "research-papers",
  "document_id": "doc_123"
}

Update a document:

Tool: update-document
Args: {
  "collection": "research-papers",
  "document_id": "doc_123",
  "content": "Updated findings on transformer architectures show improvements in both efficiency and accuracy.",
  "metadata": {
    "title": "Transformer Architectures - Updated",
    "year": 2024,
    "category": "ML",
    "status": "updated"
  }
}

Search documents:

Tool: search-documents
Args: {
  "collection": "research-papers",
  "query": "What are the latest developments in transformers?",
  "n_results": 3
}
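ChromaDB's `Collection.query` returns parallel lists (`ids`, `documents`, `metadatas`, `distances`), with one inner list per query text. Here is a sketch of how a `search-documents` handler might flatten that into ranked, scored results. The `format_results` helper and its cosine-based score conversion are assumptions, not the server's confirmed code, and the sample values are illustrative only.

```python
from typing import Any


def format_results(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the first query's results into dicts sorted by similarity (hypothetical)."""
    rows = []
    # Index 0 selects the results for the first (and here only) query text.
    for doc_id, text, meta, dist in zip(
        result["ids"][0],
        result["documents"][0],
        result["metadatas"][0],
        result["distances"][0],
    ):
        # Assumes cosine distance in [0, 2]; maps it linearly onto 0-100%.
        score = max(0.0, min(100.0, (1.0 - dist / 2.0) * 100.0))
        rows.append({"id": doc_id, "content": text, "metadata": meta, "similarity": round(score, 1)})
    return sorted(rows, key=lambda row: row["similarity"], reverse=True)


sample = {  # Shape mirrors a ChromaDB query result; the values are illustrative.
    "ids": [["doc_2", "doc_1"]],
    "documents": [["Notes on attention variants", "Transformer efficiency improvements"]],
    "metadatas": [[{"year": 2023}, {"year": 2024}]],
    "distances": [[0.6, 0.2]],
}
for row in format_results(sample):
    print(row["id"], f'{row["similarity"]}%')
```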

Understanding Similarity Scores

Search results include normalized similarity scores from 0-100%:

  • 90-100%: Nearly identical content or very strong semantic match
  • 70-89%: Highly relevant with strong semantic similarity
  • 50-69%: Moderately related with partial semantic overlap
  • 30-49%: Somewhat related with minimal semantic connection
  • 0-29%: Likely unrelated or very weak semantic connection
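A client can apply the bands above programmatically, for example to label search results. Here is a minimal sketch. `describe_score` is a hypothetical helper. Scores that fall between the listed integer ranges (such as 89.5) go to the lower band.

```python
BANDS = [  # (lower bound in %, label) from the list above, highest band first
    (90, "Nearly identical or very strong semantic match"),
    (70, "Highly relevant"),
    (50, "Moderately related"),
    (30, "Somewhat related"),
    (0, "Likely unrelated"),
]


def describe_score(score: float) -> str:
    """Return the band label for a 0-100 similarity score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return BANDS[-1][1]  # not reached for valid scores; the 0 band catches everything


print(describe_score(72.4))  # Highly relevant
```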

Troubleshooting

Common Issues

  1. Database Connection Errors
    • Ensure the database path is writable
    • Check if another process is using the database
    • Try deleting the database directory (.chroma by default) and restarting; this permanently deletes all stored collections and documents
  2. Memory Issues
    • Large collections may require more RAM
    • Consider using smaller batch sizes
    • Monitor memory usage with --log-level DEBUG
  3. Slow Search Performance
    • Large collections may need index optimization
    • Request fewer results with a smaller n_results
    • Check system resource usage
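For memory issues, "smaller batch sizes" means splitting a large ingest into chunks instead of adding everything at once. Here is a minimal sketch. `chunked` and `add_in_batches` are hypothetical helpers, and the `add_batch` callback stands in for whatever actually stores documents, such as repeated `add-document` calls.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list means the input is exhausted.
    while chunk := list(islice(iterator, size)):
        yield chunk


def add_in_batches(
    documents: Iterable[str],
    add_batch: Callable[[list[str]], None],
    batch_size: int = 100,
) -> int:
    """Pass documents to add_batch in chunks; return the number of chunks sent."""
    sent = 0
    for chunk in chunked(documents, batch_size):
        add_batch(chunk)
        sent += 1
    return sent
```

When `documents` is a generator (for example, lines read from a file), only one chunk is held in memory at a time. On Python 3.12+, `itertools.batched` does the same chunking but yields tuples.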

Debug Mode

Run the server in debug mode:

mcp-server-chroma --log-level DEBUG

Getting Help

Development

Running Tests

Run the test suite:

pytest -v

Run with coverage (requires the pytest-cov plugin):

pytest --cov=chroma tests/

Debugging

For debugging, use the MCP Inspector:

# Install the inspector
npm install -g @modelcontextprotocol/inspector

# Run the server with the inspector
mcp-inspector uv --directory /path/to/mcp-server-chroma run mcp-server-chroma

The inspector provides:

  • Real-time request/response monitoring
  • Tool testing interface
  • Performance metrics
  • Error tracking

Error Handling

The server provides detailed error messages for common scenarios:

  • Invalid collection names or IDs
  • Missing or malformed documents
  • Database connection issues
  • Invalid search parameters
  • Authentication/authorization failures

Security Considerations

  • Input validation on all parameters
  • Configurable access controls
  • Safe handling of file paths
  • Protection against injection attacks
  • Rate limiting support
  • Secure error messages

Configuration

Database Location

Set custom database path:

mcp-server-chroma --db-path /path/to/db

Default: .chroma in the server directory

Environment Variables

  • CHROMA_DB_PATH: Override database location
  • CHROMA_LOG_LEVEL: Set logging verbosity (default: INFO)
  • CHROMA_MAX_CONNECTIONS: Database connection pool size (default: 10)
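This section does not say which setting wins when both `--db-path` and `CHROMA_DB_PATH` are set. A common convention is command-line flag first, then environment variable, then built-in default, and the sketch below assumes that order using `argparse`. The flag and variable names and defaults come from this section. `resolve_settings` and the precedence itself are assumptions.

```python
import argparse
from collections.abc import Mapping, Sequence


def resolve_settings(argv: Sequence[str], environ: Mapping[str, str]) -> dict[str, object]:
    """Combine CLI flags, CHROMA_* variables, and defaults (assumed order: CLI > env > default)."""
    parser = argparse.ArgumentParser(prog="mcp-server-chroma")
    # default=None distinguishes "flag not given" from an explicit value.
    parser.add_argument("--db-path", default=None)
    parser.add_argument("--log-level", default=None)
    args = parser.parse_args(list(argv))
    return {
        # Simplification: ".chroma" here is relative to the working directory,
        # whereas the documented default is .chroma in the server directory.
        "db_path": args.db_path or environ.get("CHROMA_DB_PATH", ".chroma"),
        "log_level": (args.log_level or environ.get("CHROMA_LOG_LEVEL", "INFO")).upper(),
        "max_connections": int(environ.get("CHROMA_MAX_CONNECTIONS", "10")),
    }


print(resolve_settings(["--log-level", "debug"], {"CHROMA_DB_PATH": "/data/chroma"}))
```

In a real entry point you would pass `sys.argv[1:]` and `os.environ`. Taking them as parameters keeps the function easy to test.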

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

Please read our Contributing Guidelines for more details.

License

MIT License

Copyright (c) 2024 privetin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.