Skip to main content
Glama

Codebase MCP Server

by Ravenight13
DATABASE_PER_PROJECT_ARCHITECTURE.md15.4 kB
# Database-Per-Project Architecture ## Overview Codebase MCP now uses a **database-per-project architecture** for complete multi-project isolation. This replaces the previous schema-based approach with true database-level separation. ## Architecture Diagram ``` ┌─────────────────────────────────────────────────────────────┐ │ MCP Client (Claude) │ └──────────────────────┬──────────────────────────────────────┘ │ │ MCP Protocol ▼ ┌─────────────────────────────────────────────────────────────┐ │ Codebase MCP Server │ │ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ FastMCP Context (session_id) │ │ │ └──────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ Session Context Manager │ │ │ │ - .codebase-mcp/config.json discovery │ │ │ │ - Config caching with mtime invalidation │ │ │ │ - Per-session working directory tracking │ │ │ └──────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ Project Resolution (4-tier chain) │ │ │ │ 1. Explicit project_id parameter │ │ │ │ 2. Session config (.codebase-mcp/config.json) │ │ │ │ 3. workflow-mcp integration (optional) │ │ │ │ 4. Default project fallback │ │ │ └──────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ ProjectRegistry Service │ │ │ │ - Lookup project by ID or name │ │ │ │ - Returns (project_id, database_name) │ │ │ └──────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ Connection Pool Manager │ │ │ │ - Per-project AsyncPG pools │ │ │ │ - Lazy initialization + caching │ │ │ └──────────────────┬───────────────────────────────────┘ │ └────────────────────┬┴───────────────────────────────────────┘ │ ┌──────────┴────────────┬─────────────┐ │ │ │ ▼ ▼ ▼ ┌───────────────────┐ ┌────────────────────┐ ┌──────────────────┐ │ Registry DB │ │ Project DB 1 │ │ Project DB 2 │ │ ───────────── │ │ ───────────── │ │ ───────────── │ │ codebase_mcp_ │ │ cb_proj_ │ │ cb_proj_ │ │ registry │ │ my_project_abc │ │ other_proj_def │ │ │ │ │ │ │ │ projects table: │ │ repositories │ │ repositories │ │ - id │ │ code_files │ │ code_files │ │ - name │ │ code_chunks │ │ code_chunks │ │ - database_name │ │ (pgvector) │ │ (pgvector) │ │ - metadata │ │ │ │ │ └───────────────────┘ └────────────────────┘ └──────────────────┘ ``` ## Key Components ### 1. Registry Database **Database**: `codebase_mcp_registry` **Tables**: - `projects`: Tracks all project workspaces **Schema**: ```sql CREATE TABLE projects ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) UNIQUE NOT NULL, description TEXT, database_name VARCHAR(255) UNIQUE NOT NULL, -- cb_proj_{name}_{hash} created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW(), metadata JSONB DEFAULT '{}'::jsonb ); ``` ### 2. Project Databases **Naming Convention**: `cb_proj_{sanitized_name}_{uuid8}` **Example**: `cb_proj_my_project_abc123de` **Tables** (in each project database): - `repositories`: Indexed code repositories - `code_files`: Individual code files with metadata - `code_chunks`: Semantic code chunks with embeddings (pgvector) **Schema**: See `scripts/init_project_schema.sql` ### 3. Connection Pool Management **Registry Pool**: - Single AsyncPG pool for registry database - Used for project lookups and metadata - Min size: 2, Max size: 10 connections **Project Pools**: - Separate AsyncPG pool per project database - Lazy initialization on first access - Cached in module-level dict: `_project_pools` - Min size: 2, Max size: 10 connections ### 4. Project Resolution **4-Tier Resolution Chain**: 1. **Explicit ID** (highest priority): ```python await index_repository(repo_path="/path", project_id="my-project") ``` 2. **Session Config** (recommended): ```python await set_working_directory("/path/to/project") # Discovers .codebase-mcp/config.json await index_repository(repo_path="/path") # Auto-resolves project ``` 3. **workflow-mcp Integration** (optional): - Queries external workflow-mcp server - Timeout protection (1s default) - Graceful fallback on failure 4. **Default Fallback**: - Returns `("default", "cb_proj_default_00000000")` - Used when all resolution methods fail ## Configuration ### Environment Variables ```bash # Registry database (required) REGISTRY_DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/codebase_mcp_registry # Default database (optional, for backward compatibility) DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/codebase_mcp # Connection pool settings POOL_MIN_SIZE=2 POOL_MAX_SIZE=10 # Ollama for embeddings OLLAMA_BASE_URL=http://localhost:11434 OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest ``` ### Project Config File **Location**: `.codebase-mcp/config.json` (in project root) **Format**: ```json { "version": "1.0", "project": { "name": "my-project", "id": "550e8400-e29b-41d4-a716-446655440000" }, "auto_switch": true, "strict_mode": false, "description": "My project description" } ``` **Discovery**: - Searches up to 20 parent directories - Cached with mtime-based invalidation (<1ms cache hits) - Falls back to workflow-mcp or default if not found ## Setup Guide ### 1. Initialize Registry Database ```bash # Create registry database createdb codebase_mcp_registry # Initialize schema psql -d codebase_mcp_registry -f scripts/init_registry_schema.sql ``` ### 2. Create Default Project ```bash # Create default fallback database createdb cb_proj_default_00000000 # Initialize schema psql -d cb_proj_default_00000000 -f scripts/init_project_schema.sql # Register in registry psql -d codebase_mcp_registry -c " INSERT INTO projects (id, name, description, database_name) VALUES ( '00000000-0000-0000-0000-000000000000', 'default', 'Default fallback project', 'cb_proj_default_00000000' ); " ``` ### 3. Create Project for Your Codebase **Option A: Automated Script**: ```bash python scripts/setup_codebase_mcp_project.py ``` **Option B: Manual Setup**: ```python from src.database.registry import ProjectRegistry from src.database.provisioning import create_pool # Initialize registry registry_pool = await create_pool("codebase_mcp_registry") registry = ProjectRegistry(registry_pool) # Create project (provisions database automatically) project = await registry.create_project( name="my-project", description="My project description", metadata={"owner": "alice@example.com"} ) print(f"Created: {project.database_name}") ``` ### 4. Configure Project Create `.codebase-mcp/config.json` in your project root: ```bash mkdir -p .codebase-mcp cat > .codebase-mcp/config.json << EOF { "version": "1.0", "project": { "name": "my-project" } } EOF ``` ## Usage Examples ### Basic Workflow ```python from src.mcp.tools.project import set_working_directory from src.mcp.tools.indexing import index_repository from src.mcp.tools.search import search_code # 1. Set working directory (discovers config) await set_working_directory("/Users/alice/my-project") # 2. Index repository (uses project from config) result = await index_repository( repo_path="/Users/alice/my-project" ) # Indexes to: cb_proj_my_project_abc123de # 3. Search code (uses same project) results = await search_code( query="authentication logic" ) # Searches in: cb_proj_my_project_abc123de ``` ### Multi-Project Workflow ```python # Project A await set_working_directory("/Users/alice/project-a") await index_repository(repo_path="/Users/alice/project-a") # Uses: cb_proj_project_a_xxx # Project B await set_working_directory("/Users/alice/project-b") await index_repository(repo_path="/Users/alice/project-b") # Uses: cb_proj_project_b_yyy # Data is completely isolated (different databases) ``` ### Explicit Project Override ```python # Override project resolution with explicit ID await index_repository( repo_path="/path/to/repo", project_id="my-specific-project" ) # Forces: cb_proj_my_specific_project_* ``` ## Benefits vs. Schema-Based ### Complete Isolation - **Database-per-project**: Separate physical databases, impossible to cross-contaminate - **Schema-based**: Shared database, risk of schema leakage via SQL errors ### Independent Scaling - **Database-per-project**: Scale each project database independently - **Schema-based**: All projects share same database resources ### Backup Flexibility - **Database-per-project**: Backup/restore individual projects - **Schema-based**: Must backup entire shared database ### Security - **Database-per-project**: Database-level access control per project - **Schema-based**: Schema-level access control (less granular) ### Monitoring - **Database-per-project**: Per-project database metrics (CPU, I/O, queries) - **Schema-based**: Aggregated metrics only ## Migration from Schema-Based **No migration needed!** The old `codebase_mcp` database is unused in the new architecture. Data is not migrated - you start fresh with new project databases. **Why no migration?**: 1. Schema-based data had no project tracking (no way to map schemas to projects) 2. Fresh start ensures clean database-per-project boundaries 3. Semantic search benefits from re-indexing with latest embeddings **If you want to keep old data**: 1. Archive `codebase_mcp` database: `pg_dump codebase_mcp > backup.sql` 2. Create new project databases 3. Re-index repositories (recommended for best embeddings) ## Performance Characteristics | Operation | Target | Typical | Notes | |-----------|--------|---------|-------| | Config discovery (cached) | <1ms | 0.8ms | LRU cache with mtime validation | | Config discovery (uncached) | <60ms | 38ms | 20-level directory traversal | | Registry lookup | <10ms | 5ms | PostgreSQL btree index | | Project pool creation | <100ms | 45ms | One-time per project | | Session creation | <5ms | 2ms | From existing pool | ## Troubleshooting ### "Project not found in registry" **Cause**: Config references a project that doesn't exist in registry **Solution**: ```python from src.database.auto_create import get_or_create_project_from_config # Automatically creates project if missing project = await get_or_create_project_from_config(config_path) ``` ### "Database already exists" **Cause**: Trying to create a project database that exists **Solution**: - Check if project already registered: `psql -d codebase_mcp_registry -c "SELECT * FROM projects WHERE name = 'my-project'"` - Either use existing project or choose different name ### "Connection pool timeout" **Cause**: Too many concurrent connections **Solution**: - Increase pool size: `POOL_MAX_SIZE=20` - Check for connection leaks (unclosed sessions) - Monitor with: `SELECT count(*) FROM pg_stat_activity WHERE datname LIKE 'cb_proj_%';` ## API Reference See the following modules for detailed API documentation: - `src/database/registry.py` - ProjectRegistry service - `src/database/provisioning.py` - Database provisioning utilities - `src/database/session.py` - Session management and project resolution - `src/database/auto_create.py` - Auto-provisioning from config ## See Also - [Registry Schema](../scripts/init_registry_schema.sql) - SQL schema definition - [Project Schema](../scripts/init_project_schema.sql) - Project database tables - [Setup Script](../scripts/setup_codebase_mcp_project.py) - Automated project creation

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server