Files-DB-MCP

Integrations

  • Uses Docker volumes for persistent model caching and deployment of the vector search service

  • Monitors Git-managed projects for file changes and provides real-time search updates as code evolves

  • Supports installation and deployment from GitHub repositories, with direct integration for source code access
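As a rough illustration of what watching a project for file changes involves, the sketch below compares two snapshots of a directory tree and reports what was added, removed, or modified. It is not taken from the Files-DB-MCP source: the `snapshot` and `diff` helpers are hypothetical names, and polling file metadata is only one possible way to detect changes.

```python
import os
import tempfile


def snapshot(root):
    """Map each file under root to its (mtime_ns, size), skipping .git."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git's own metadata directory so it is never walked.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (added, removed, modified) relative paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.py"), "w") as f:
            f.write("x = 1\n")
        before = snapshot(root)
        with open(os.path.join(root, "b.py"), "w") as f:
            f.write("y = 2\n")
        print(diff(before, snapshot(root)))
```

A watcher built this way would re-index only the paths in the `added` and `modified` lists and drop the `removed` ones from the index.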

Files-DB-MCP: Vector Search for Code Projects

A local vector database system that provides LLM coding agents with fast, efficient search capabilities for software projects via the Model Context Protocol (MCP).

Features

  • Zero Configuration - Auto-detects project structure with sensible defaults
  • Real-Time Monitoring - Continuously watches for file changes
  • Vector Search - Semantic search for finding relevant code
  • MCP Interface - Compatible with Claude Code and other LLM tools
  • Open Source Models - Uses Hugging Face models for code embeddings
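To make the idea of semantic vector search concrete, here is a minimal sketch that ranks files by cosine similarity between a query vector and stored vectors. The three-dimensional vectors, the file paths, and the `search` helper are invented for illustration. A real deployment would get its vectors from an embedding model and store them in a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=2):
    """Return the top_k paths whose vectors are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), path) for path, vec in index.items()]
    scored.sort(reverse=True)
    return [path for _, path in scored[:top_k]]


# Toy index: hypothetical paths mapped to made-up embedding vectors.
INDEX = {
    "auth/login.py": [0.9, 0.1, 0.0],
    "db/models.py": [0.1, 0.9, 0.2],
    "README.md": [0.3, 0.3, 0.9],
}

if __name__ == "__main__":
    print(search([1.0, 0.0, 0.1], INDEX))
```

Because ranking depends on vector direction rather than shared keywords, a query can match code that uses different identifiers but has similar meaning, provided the embedding model places them close together.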

Installation

Option 1: Clone and Setup (Recommended)

# Using SSH (recommended if you have SSH keys set up with GitHub)
git clone git@github.com:randomm/files-db-mcp.git ~/.files-db-mcp && bash ~/.files-db-mcp/install/setup.sh

# Using HTTPS (if you don't have SSH keys set up)
git clone https://github.com/randomm/files-db-mcp.git ~/.files-db-mcp && bash ~/.files-db-mcp/install/setup.sh

Option 2: Automated Installation Script

curl -fsSL https://raw.githubusercontent.com/randomm/files-db-mcp/main/install/install.sh | bash

Usage

After installation, run in any project directory:

files-db-mcp

The service will:

  1. Detect your project files
  2. Start indexing in the background
  3. Begin responding to MCP search queries immediately
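The three steps above can be pictured as an index that fills in on a background thread while queries are answered from whatever has been indexed so far. The sketch below is an assumption about the general pattern, not the project's implementation: `BackgroundIndex` is a hypothetical class, and a plain substring match stands in for the vector search.

```python
import threading


class BackgroundIndex:
    """Indexes files on a worker thread; searches see partial results."""

    def __init__(self, files):
        self._files = files          # mapping of path -> file text
        self._indexed = {}           # filled in by the worker thread
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        """Begin indexing without blocking the caller."""
        self._thread.start()

    def wait(self):
        """Block until indexing has finished (call after start)."""
        self._thread.join()

    def _run(self):
        for path, text in self._files.items():
            with self._lock:
                self._indexed[path] = text.lower()

    def search(self, term):
        """Return paths indexed so far whose text contains term."""
        with self._lock:
            return sorted(p for p, t in self._indexed.items() if term.lower() in t)


if __name__ == "__main__":
    idx = BackgroundIndex({"a.py": "def parse(x): ...", "b.py": "class User: ..."})
    idx.start()
    idx.wait()
    print(idx.search("parse"))
```

A query issued early may return fewer results than the same query issued once indexing completes, which is the trade-off of responding immediately.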

Requirements

  • Docker
  • Docker Compose

Configuration

Files-DB-MCP works without configuration, but you can customize it with environment variables:

  • EMBEDDING_MODEL - Change the embedding model (default: 'jinaai/jina-embeddings-v2-base-code' or project-specific model)
  • FAST_STARTUP - Set to 'true' to use a smaller model for faster startup (default: 'false')
  • QUANTIZATION - Enable/disable quantization (default: 'true')
  • BINARY_EMBEDDINGS - Enable/disable binary embeddings (default: 'false')
  • IGNORE_PATTERNS - Comma-separated list of files/dirs to ignore
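As an illustration of how these variables could be read, the sketch below parses them from an environment mapping and applies the defaults listed above. The `load_config` and `is_ignored` helpers are hypothetical, and treating `IGNORE_PATTERNS` entries as glob patterns is an assumption. The actual parsing in Files-DB-MCP may differ.

```python
import fnmatch

DEFAULT_MODEL = "jinaai/jina-embeddings-v2-base-code"


def _flag(env, name, default):
    """Interpret an environment value as a boolean ('true' means enabled)."""
    return env.get(name, default).strip().lower() == "true"


def load_config(env):
    """Build a settings dict from an environment-like mapping."""
    raw = env.get("IGNORE_PATTERNS", "")
    patterns = [p.strip() for p in raw.split(",") if p.strip()]
    return {
        "embedding_model": env.get("EMBEDDING_MODEL", DEFAULT_MODEL),
        "fast_startup": _flag(env, "FAST_STARTUP", "false"),
        "quantization": _flag(env, "QUANTIZATION", "true"),
        "binary_embeddings": _flag(env, "BINARY_EMBEDDINGS", "false"),
        "ignore_patterns": patterns,
    }


def is_ignored(path, patterns):
    """True if path matches any pattern (glob matching is an assumption)."""
    return any(fnmatch.fnmatch(path, p) for p in patterns)


if __name__ == "__main__":
    cfg = load_config({"IGNORE_PATTERNS": "node_modules/*, *.min.js"})
    print(cfg)
    print(is_ignored("node_modules/a/b.js", cfg["ignore_patterns"]))
```

In practice the mapping passed in would be `os.environ`, so that a command such as `QUANTIZATION=false files-db-mcp` overrides a single default.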

First-Time Startup

On first run, Files-DB-MCP downloads the embedding models, which may take several minutes depending on:

  • The size of the selected model (300-500MB for high-quality models)
  • Your internet connection speed

Subsequent startups will be much faster as models are cached in a persistent Docker volume. For faster initial startup, you can:

# Use a smaller, faster model (90MB)
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 files-db-mcp

# Or enable fast startup mode
FAST_STARTUP=true files-db-mcp

Model Caching

Files-DB-MCP automatically persists downloaded embedding models, so you only need to download them once:

  • Models are stored in a Docker volume called model_cache
  • This volume persists between container restarts and across different projects
  • The cache is shared for all projects using Files-DB-MCP on your machine
  • You don't need to download the model again for each project

Claude Code Integration

Add to your Claude Code configuration:

{
  "mcpServers": {
    "files-db-mcp": {
      "command": "python",
      "args": ["/path/to/src/claude_mcp_server.py", "--host", "localhost", "--port", "6333"]
    }
  }
}

For details, see Claude MCP Integration.
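If you would rather generate this entry than write it by hand, a small helper can build the same structure and print it as JSON. The `mcp_server_entry` function below is a hypothetical convenience, not part of Files-DB-MCP. The script path is the same placeholder used above and needs to be replaced with the real location on your machine.

```python
import json


def mcp_server_entry(script_path, host="localhost", port=6333):
    """Build the mcpServers entry for Files-DB-MCP as a plain dict."""
    return {
        "mcpServers": {
            "files-db-mcp": {
                "command": "python",
                # Arguments are passed as strings, so the port is converted.
                "args": [script_path, "--host", host, "--port", str(port)],
            }
        }
    }


if __name__ == "__main__":
    entry = mcp_server_entry("/path/to/src/claude_mcp_server.py")
    print(json.dumps(entry, indent=2))
```

The printed JSON can then be merged into an existing Claude Code configuration that already defines other servers under `mcpServers`.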

Documentation

Repository Structure

  • /src - Source code
  • /tests - Unit and integration tests
  • /docs - Documentation
  • /scripts - Utility scripts
  • /install - Installation scripts
  • /.docker - Docker configuration
  • /config - Configuration files
  • /ai-assist - AI assistance files

License

MIT License

Contributing

Contributions welcome! Please feel free to submit a pull request.

