# Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
## What's New in v2.0
Version 2.0 represents a major architectural refactoring focused exclusively on semantic code search capabilities. This release removes project management, entity tracking, and work item features to maintain single-responsibility focus.
**Breaking Changes**:
- 14 tools removed (project management, entity tracking, work item features extracted to workflow-mcp)
- 3 tools remaining: `start_indexing_background`, `get_indexing_status`, and `search_code` with multi-project support
- Foreground `index_repository` removed (all indexing now uses background jobs to prevent timeouts)
- Database schema simplified (9 tables dropped, `project_id` parameter added)
- New environment variables for optional workflow-mcp integration
**Migration Required**: Existing v1.x users must follow the migration guide to upgrade safely. See [Migration Guide](docs/migration/v1-to-v2-migration.md) for complete upgrade and rollback procedures.
**What's Preserved**: All indexed repositories and code embeddings remain searchable after migration.
**What's Discarded**: All v1.x project management data, entities, and work items are permanently removed.
---
## Features
The Codebase MCP Server provides exactly 3 MCP tools for semantic code search with multi-project workspace support:
1. **`start_indexing_background`**: Start a background indexing job for a repository
   - Returns job_id immediately to prevent MCP client timeouts
   - Accepts optional `project_id` parameter for workspace isolation
   - Default behavior: indexes to default project workspace if `project_id` not specified
   - Performance target: 60-second indexing for 10,000 files
2. **`get_indexing_status`**: Poll the status of a background indexing job
   - Query job progress using job_id from start_indexing_background
   - Returns files_indexed, chunks_created, and completion status
   - Enables responsive UIs with progress indicators
3. **`search_code`**: Semantic code search with natural language queries
   - Accepts optional `project_id` parameter to restrict search scope
   - Default behavior: searches default project workspace if `project_id` not specified
   - Performance target: 500ms p95 search latency
### Multi-Project Support
The v2.0 architecture supports isolated project workspaces through the optional `project_id` parameter:
**Single Project Workflow** (default):
```python
import asyncio

# Start background indexing job - uses default workspace
job = await start_indexing_background(repo_path="/path/to/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search without project_id - searches default workspace
results = await search_code(query="authentication logic")
```
**Multi-Project Workflow**:
```python
import asyncio

# Index to specific project workspace
job = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a"
)
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id, project_id="client-a")
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search specific project workspace
results = await search_code(query="authentication logic", project_id="client-a")
```
**Use Cases**:
- **Single Project**: Individual developers or small teams working on one codebase
- **Multi-Project**: Consultants managing multiple client codebases, organizations with separate product lines, or multi-tenant deployments requiring workspace isolation
**Optional Integration**: The `project_id` can be automatically resolved from Git repository context when the optional [workflow-mcp](https://github.com/cliffclarke/workflow-mcp) server is configured. Without workflow-mcp, all operations default to a single shared workspace.
## Quick Start
### 1. Database Setup
```bash
# Create database
createdb codebase_mcp
# Initialize schema
psql -d codebase_mcp -f db/init_tables.sql
```
### 2. Install Dependencies
```bash
# Install dependencies including FastMCP framework
uv sync
# Or with pip
pip install -r requirements.txt
```
**Key Dependencies:**
- `fastmcp>=0.1.0` - Modern MCP framework with decorator-based tools
- `anthropic-mcp` - MCP protocol implementation
- `sqlalchemy>=2.0` - Async ORM
- `pgvector` - PostgreSQL vector extension
- `ollama` - Embedding generation
### 3. Configure Claude Desktop
Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "codebase-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "fastmcp",
        "python",
        "/absolute/path/to/codebase-mcp/server_fastmcp.py"
      ]
    }
  }
}
```
**Important:**
- Use absolute paths!
- Server uses FastMCP framework with decorator-based tool definitions
- All logs go to `/tmp/codebase-mcp.log` (no stdout/stderr pollution)
### 4. Start Ollama
```bash
ollama serve
ollama pull nomic-embed-text
```
### 5. Test
```bash
# Test database and tools
uv run python tests/test_tool_handlers.py
# Test repository indexing
uv run python tests/test_embeddings.py
```
## Current Status
### Working Tools (3/3) ✅
| Tool | Status | Description |
|------|--------|-------------|
| `start_indexing_background` | ✅ Working | Start background indexing job, returns job_id immediately |
| `get_indexing_status` | ✅ Working | Poll indexing job status with files_indexed/chunks_created |
| `search_code` | ✅ Working | Semantic code search with pgvector similarity |
### Recent Fixes (Oct 6, 2025)
- ✅ Parameter passing architecture (Pydantic models)
- ✅ MCP schema mismatches (status enums, missing parameters)
- ✅ Timezone/datetime compatibility (PostgreSQL)
- ✅ Binary file filtering (images, cache dirs)
### Test Results
```
✅ Task Management: 7/7 tests passed
✅ Repository Indexing: 2 files indexed, 6 chunks created
✅ Embeddings: 100% coverage (768-dim vectors)
✅ Database: Connection pool, async operations working
```
## Tool Usage Examples
### Index a Repository (Background Job)
In Claude Desktop:
```
Index the repository at /Users/username/projects/myapp
```
Initial Response (immediate):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Indexing job started",
  "project_id": "default",
  "database_name": "cb_proj_default_00000000"
}
```
Poll for Status:
```
Check the status of indexing job 550e8400-e29b-41d4-a716-446655440000
```
Completed Response:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "repo_path": "/Users/username/projects/myapp",
  "files_indexed": 234,
  "chunks_created": 1456,
  "error_message": null,
  "created_at": "2025-10-18T10:30:00Z",
  "started_at": "2025-10-18T10:30:01Z",
  "completed_at": "2025-10-18T10:30:15Z"
}
```
### Search Code
```
Search for "authentication middleware" in Python files
```
Response:
```json
{
  "results": [
    {
      "file_path": "src/middleware/auth.py",
      "content": "def authenticate_request(request):\n    ...",
      "start_line": 45,
      "similarity_score": 0.92
    }
  ],
  "total_count": 5,
  "latency_ms": 250
}
```
## Architecture
```
Claude Desktop ↔ FastMCP Server ↔ Tool Handlers ↔ Services ↔ PostgreSQL
                                                      ↓
                                             Ollama (embeddings)
```
**MCP Framework**: Built with [FastMCP](https://github.com/jlowin/fastmcp) - a modern, decorator-based framework for building MCP servers with:
- Type-safe tool definitions via `@mcp.tool()` decorators
- Automatic JSON Schema generation from Pydantic models
- Dual logging (file + MCP protocol) without stdout pollution
- Async/await support throughout
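A minimal, self-contained sketch of this pattern (the tool body here is a placeholder, not the server's actual implementation):

```python
from typing import Any

from fastmcp import FastMCP

mcp = FastMCP("codebase-mcp")

@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # FastMCP derives the tool's JSON Schema from these type hints;
    # the docstring becomes the tool description shown to clients.
    return {"results": [], "total_count": 0}  # placeholder body

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```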
See [Multi-Project Architecture](docs/architecture/multi-project-design.md) for detailed component diagrams.
## Documentation
- **[Multi-Project Architecture](docs/architecture/multi-project-design.md)** - System architecture and data flow
- **[Auto-Switch Architecture](docs/architecture/AUTO_SWITCH.md)** - Config-based project switching internals
- **[Configuration Guide](docs/configuration/production-config.md)** - Production deployment and tuning
- **[API Reference](docs/api/tool-reference.md)** - Complete MCP tool documentation
- **[CLAUDE.md](CLAUDE.md)** - Specify workflow for AI-assisted development
## Database Schema
11 tables with pgvector for semantic search:
**Core Tables:**
- `repositories` - Indexed repositories
- `code_files` - Source files with metadata
- `code_chunks` - Semantic chunks with embeddings (vector(768))
- `tasks` - Development tasks with git tracking
- `task_status_history` - Audit trail
See [Multi-Project Architecture](docs/architecture/multi-project-design.md) for complete schema documentation.
## Technology Stack
- **MCP Framework:** FastMCP 0.1+ (decorator-based tool definitions)
- **Server:** Python 3.11+ (3.13 compatible), FastAPI patterns, async/await
- **Database:** PostgreSQL 14+ with pgvector extension
- **Embeddings:** Ollama (nomic-embed-text, 768 dimensions)
- **ORM:** SQLAlchemy 2.0 (async), Pydantic V2 for validation
- **Type Safety:** Full mypy --strict compliance
## Development
### Running Tests
```bash
# Tool handlers
uv run python tests/test_tool_handlers.py
# Repository indexing
uv run python tests/test_embeddings.py
# Unit tests
uv run pytest tests/ -v
```
### Code Structure
```
codebase-mcp/
├── server_fastmcp.py              # FastMCP server entry point (NEW)
├── src/
│   ├── mcp/
│   │   └── tools/                 # Tool handlers with service integration
│   │       ├── tasks.py           # Task management
│   │       ├── indexing.py        # Repository indexing
│   │       └── search.py          # Semantic search
│   ├── services/                  # Business logic layer
│   │   ├── tasks.py               # Task CRUD + git tracking
│   │   ├── indexer.py             # Indexing orchestration
│   │   ├── scanner.py             # File discovery
│   │   ├── chunker.py             # AST-based chunking
│   │   ├── embedder.py            # Ollama integration
│   │   └── searcher.py            # pgvector similarity search
│   └── models/                    # Database models + Pydantic schemas
│       ├── task.py                # Task, TaskCreate, TaskUpdate
│       ├── code_chunk.py          # CodeChunk
│       └── ...
└── tests/
    ├── test_tool_handlers.py      # Integration tests
    └── test_embeddings.py         # Embedding validation
```
**FastMCP Server Architecture:**
- `server_fastmcp.py` - Main entry point using `@mcp.tool()` decorators
- Tool handlers in `src/mcp/tools/` provide service integration
- Services in `src/services/` contain all business logic
- Dual logging: file (`/tmp/codebase-mcp.log`) + MCP protocol
## Installation
### Prerequisites
Before installing Codebase MCP Server v2.0, ensure the following requirements are met:
**Required Software:**
- **PostgreSQL 14+** - Database with pgvector extension for vector similarity search
- **Python 3.11+** - Runtime environment (Python 3.13 compatible)
- **Ollama** - Local embedding model server with nomic-embed-text model
**System Requirements:**
- 4GB+ RAM recommended for typical workloads
- SSD storage for optimal performance (database and embedding operations are I/O intensive)
- Network access to Ollama server (default: localhost:11434)
### Installation Commands
Install Codebase MCP Server v2.0 using pip:
```bash
# Install latest v2.0 release
pip install codebase-mcp
```
**Alternative Installation Methods:**
```bash
# Install specific v2.0 version
pip install codebase-mcp==2.0.0
# Install from source (for development)
git clone https://github.com/cliffclarke/codebase-mcp.git
cd codebase-mcp
pip install -e .
```
**Key Dependencies Installed Automatically:**
- `fastmcp>=0.1.0` - Modern MCP framework
- `sqlalchemy>=2.0` - Async database ORM
- `pgvector` - PostgreSQL vector extension Python bindings
- `ollama` - Embedding generation client
- `pydantic>=2.0` - Data validation and settings
### Verification Steps
After installation, verify the setup is correct:
```bash
# Verify codebase-mcp is installed
codebase-mcp --version
# Expected output: codebase-mcp 2.0.0
# Check PostgreSQL is accessible
psql --version
# Expected output: psql (PostgreSQL) 14.x or higher
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Expected output: JSON response with available models
# Confirm embedding model is available
ollama list | grep nomic-embed-text
# Expected output: nomic-embed-text model listed
```
**Setup Complete**: If all verification steps pass, Codebase MCP Server v2.0 is ready for use. Proceed to the Quick Start section for first-time indexing and search operations.
## Multi-Project Configuration
The Codebase MCP server supports automatic project switching based on your working directory using `.codebase-mcp/config.json` files.
### Quick Start
1. **Create a config file** in your project root:
   ```bash
   mkdir -p .codebase-mcp
   cat > .codebase-mcp/config.json <<EOF
   {
     "version": "1.0",
     "project": {
       "name": "my-project",
       "id": "optional-uuid-here"
     },
     "auto_switch": true
   }
   EOF
   ```
2. **Set your working directory** (via MCP client):
   ```javascript
   await mcpClient.callTool("set_working_directory", {
     directory: "/absolute/path/to/your/project"
   });
   ```
3. **Use tools normally** - they'll automatically use your project:
   ```javascript
   // Automatically uses "my-project" workspace
   const result = await mcpClient.callTool("start_indexing_background", {
     repo_path: "/path/to/repo"
   });
   const jobId = result.job_id;

   // Poll for completion
   while (true) {
     const status = await mcpClient.callTool("get_indexing_status", {
       job_id: jobId
     });
     if (status.status === "completed" || status.status === "failed") {
       break;
     }
     await sleep(2000);
   }
   ```
### Config File Format
```json
{
  "version": "1.0",
  "project": {
    "name": "my-project-name",
    "id": "optional-project-uuid",
    "database_name": "optional-database-override"
  },
  "auto_switch": true,
  "strict_mode": false,
  "dry_run": false,
  "description": "Optional project description"
}
```
**Fields:**
- `version` (required): Config version (currently "1.0")
- `project.name` (required): Project identifier (used if no ID provided)
- `project.id` (optional): Explicit project UUID (takes priority over name)
- `project.database_name` (optional): Override computed database name (see Database Name Resolution below)
- `auto_switch` (optional, default true): Enable automatic project switching
- `strict_mode` (optional, default false): Reject operations if project mismatch
- `dry_run` (optional, default false): Log intended switches without executing
**Database Name Resolution:**
The server determines which database to use in this order:
1. **Explicit `database_name` in config** - Uses exact database name specified
   ```json
   {"project": {"database_name": "cb_proj_my_project_550e8400"}}
   ```
2. **Computed from `name` + `id`** - Automatically generates database name
   ```
   Format:  cb_proj_{sanitized_name}_{id_prefix}
   Example: cb_proj_my_project_550e8400
   ```
**Use Cases for `database_name` Override:**
- Recovering from database name mismatches
- Migrating from old database naming schemes
- Explicit control over database selection
- Debugging and troubleshooting
**Example - Auto-generated (default):**
```json
{
  "version": "1.0",
  "project": {
    "name": "my-project",
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```
Database used: `cb_proj_my_project_550e8400` (auto-computed)
**Example - Explicit override:**
```json
{
  "version": "1.0",
  "project": {
    "name": "my-project",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "database_name": "cb_proj_legacy_database_12345678"
  }
}
```
Database used: `cb_proj_legacy_database_12345678` (explicit override)
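A minimal sketch of how that computed name can be derived (the server's exact sanitization rules may differ; this simply reproduces the documented example):

```python
import re

def compute_database_name(name: str, project_id: str) -> str:
    """Compute cb_proj_{sanitized_name}_{id_prefix} (illustrative sketch)."""
    # Lowercase, replace anything non-alphanumeric with "_", trim stray "_"
    sanitized = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    # First 8 hex characters of the UUID
    id_prefix = project_id.replace("-", "")[:8]
    return f"cb_proj_{sanitized}_{id_prefix}"

# "my-project" + 550e8400-e29b-41d4-a716-446655440000
# -> cb_proj_my_project_550e8400
assert compute_database_name(
    "my-project", "550e8400-e29b-41d4-a716-446655440000"
) == "cb_proj_my_project_550e8400"
```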
### Project Resolution Priority
When you call MCP tools, the server resolves the project workspace using this 4-tier priority system:
1. **Explicit `project_id` parameter** (highest priority)
   ```javascript
   await mcpClient.callTool("start_indexing_background", {
     repo_path: "/path/to/repo",
     project_id: "explicit-project-id"  // Always takes priority
   });
   ```
2. **Session-based config file** (via `set_working_directory`)
   - Server searches up to 20 directory levels for `.codebase-mcp/config.json`
   - Cached with mtime-based invalidation for performance
   - Isolated per MCP session (multiple clients stay independent)
3. **workflow-mcp integration** (external project tracking)
   - Queries workflow-mcp server for active project context
   - Configurable timeout and caching
4. **Default workspace** (fallback)
   - Uses `project_default` schema when no other resolution succeeds
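In pseudocode, that resolution order looks roughly like the following sketch (the inputs stand in for the server's internal lookups, which are not shown here):

```python
def resolve_project_id(
    explicit_project_id: str | None,
    config: dict | None,          # parsed .codebase-mcp/config.json, or None
    workflow_active: str | None,  # active project from workflow-mcp, or None
) -> str:
    """Illustrative sketch of the 4-tier resolution order."""
    # 1. Explicit parameter always wins
    if explicit_project_id is not None:
        return explicit_project_id
    # 2. Session config file (if found and auto_switch enabled)
    if config is not None and config.get("auto_switch", True):
        return config["project"]["name"]
    # 3. workflow-mcp's active project (if the integration is configured)
    if workflow_active is not None:
        return workflow_active
    # 4. Default workspace
    return "default"
```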
### Multi-Session Isolation
The server maintains separate working directories for each MCP session (client connection):
```javascript
// Session 1 (Claude Code instance A)
await mcpClient1.callTool("set_working_directory", {
  directory: "/Users/alice/project-a"
});

// Session 2 (Claude Code instance B)
await mcpClient2.callTool("set_working_directory", {
  directory: "/Users/bob/project-b"
});

// Each session independently resolves its own project
// No cross-contamination between sessions
```
### Config File Discovery
The server searches for `.codebase-mcp/config.json` by:
1. Starting from your working directory
2. Searching up to 20 parent directories
3. Stopping at the first config file found
4. Caching the result (with automatic invalidation on file modification)
**Example directory structure:**
```
/Users/alice/projects/my-app/          <- .codebase-mcp/config.json here
├── .codebase-mcp/
│   └── config.json
├── src/
│   └── components/                    <- Working directory
│       └── Button.tsx
```
If you set working directory to `/Users/alice/projects/my-app/src/components/`, the server will find the config at `/Users/alice/projects/my-app/.codebase-mcp/config.json`.
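A minimal sketch of that upward walk (caching and mtime invalidation omitted for brevity; the server's real implementation may differ):

```python
from pathlib import Path

MAX_LEVELS = 20

def find_config(working_dir: str) -> Path | None:
    """Walk up at most 20 directory levels looking for .codebase-mcp/config.json."""
    current = Path(working_dir).resolve()
    for _ in range(MAX_LEVELS):
        candidate = current / ".codebase-mcp" / "config.json"
        if candidate.is_file():
            return candidate  # first config found wins
        if current.parent == current:
            return None  # reached the filesystem root
        current = current.parent
    return None
```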
### Performance
- **Config discovery**: <50ms (with upward traversal)
- **Cache hit**: <5ms
- **Session lookup**: <1ms
- **Background cleanup**: Hourly (removes sessions inactive >24h)
## Database Setup
### 1. Create Database
```bash
# Connect to PostgreSQL
psql -U postgres
# Create database
CREATE DATABASE codebase_mcp;
# Enable pgvector extension
\c codebase_mcp
CREATE EXTENSION IF NOT EXISTS vector;
\q
```
### 2. Initialize Schema
```bash
# Run database initialization script
python scripts/init_db.py
# Verify schema creation
alembic current
```
The initialization script will:
- Create all required tables (repositories, files, chunks, tasks)
- Set up vector indexes for similarity search
- Configure connection pooling
- Apply all database migrations
### 3. Verify Setup
```bash
# Check database connectivity
python -c "from src.database import Database; import asyncio; asyncio.run(Database.create_pool())"
# Run migration status check
alembic current
```
### 4. Database Reset & Cleanup
During development, you may need to reset your database. Three reset scripts are available:
- **scripts/clear_data.sh** - Clear all data, keep schema (fastest, no restart needed)
- **scripts/reset_database.sh** - Drop and recreate all tables (recommended for schema changes)
- **scripts/nuclear_reset.sh** - Drop entire database (requires Claude Desktop restart)
```bash
# Quick data wipe (keeps schema)
./scripts/clear_data.sh
# Full table reset (recommended)
./scripts/reset_database.sh
# Nuclear option (drops database)
./scripts/nuclear_reset.sh
```
## Running the Server
### FastMCP Server (Recommended)
The primary way to run the server is via Claude Desktop or other MCP clients:
```bash
# Via Claude Desktop (configured in claude_desktop_config.json)
# Server starts automatically when Claude Desktop launches
# Manual testing with FastMCP CLI
uv run --with fastmcp python server_fastmcp.py
# With custom log level
LOG_LEVEL=DEBUG uv run --with fastmcp python server_fastmcp.py
```
**Server Entry Point**: `server_fastmcp.py` in repository root
**Logging**: All output goes to `/tmp/codebase-mcp.log` (configurable via `LOG_FILE` env var)
### Development Mode (Legacy FastAPI)
```bash
# Start with auto-reload (if FastAPI server exists)
uvicorn src.main:app --reload --host 127.0.0.1 --port 3000
# With custom log level
LOG_LEVEL=DEBUG uvicorn src.main:app --reload
```
### Production Mode (Legacy)
```bash
# Start production server
uvicorn src.main:app --host 0.0.0.0 --port 3000 --workers 4
# With gunicorn (recommended for production)
gunicorn src.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:3000
```
### stdio Transport (Legacy CLI Mode)
The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.
```bash
# Start stdio server (reads JSON-RPC from stdin)
python -m src.mcp.stdio_server
# Echo a single request
echo '{"jsonrpc":"2.0","id":1,"method":"search_code","params":{"query":"async def","limit":5}}' | python -m src.mcp.stdio_server
# Pipe requests from a file (one JSON-RPC request per line)
cat requests.jsonl | python -m src.mcp.stdio_server
# Interactive mode (type JSON-RPC requests manually)
python -m src.mcp.stdio_server
{"jsonrpc":"2.0","id":1,"method":"get_task","params":{"task_id":"..."}}
```
**JSON-RPC 2.0 Request Format:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "search_code",
  "params": {
    "query": "async def",
    "limit": 10
  }
}
```
**JSON-RPC 2.0 Response Format:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "results": [...],
    "total_count": 42,
    "latency_ms": 250
  }
}
```
**Available Methods:**
- `search_code` - Semantic code search
- `start_indexing_background` - Start background indexing job
- `get_indexing_status` - Poll indexing job status
**Logging:**
All logs go to `/tmp/codebase-mcp.log` (configurable via `LOG_FILE` env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
### Health Check
```bash
# Check server health
curl http://localhost:3000/health
# Expected response:
{
  "status": "healthy",
  "database": "connected",
  "ollama": "connected",
  "version": "0.1.0"
}
```
## Usage Examples
### 1. Index a Repository (Background Job)
```
# Start indexing job via MCP protocol
{
  "tool": "start_indexing_background",
  "arguments": {
    "repo_path": "/path/to/your/repo"
  }
}

# Immediate response
{
  "job_id": "uuid-here",
  "status": "pending",
  "message": "Indexing job started",
  "project_id": "default",
  "database_name": "cb_proj_default_00000000"
}

# Poll for status
{
  "tool": "get_indexing_status",
  "arguments": {
    "job_id": "uuid-here"
  }
}

# Completed response
{
  "job_id": "uuid-here",
  "status": "completed",
  "repo_path": "/path/to/your/repo",
  "files_indexed": 150,
  "chunks_created": 1200,
  "error_message": null,
  "created_at": "2025-10-18T10:30:00Z",
  "started_at": "2025-10-18T10:30:01Z",
  "completed_at": "2025-10-18T10:30:45Z"
}
```
### 2. Search Code
```
# Search for authentication logic
{
  "tool": "search_code",
  "arguments": {
    "query": "user authentication password validation",
    "limit": 10,
    "file_type": "py"
  }
}

# Response includes ranked code chunks with context
{
  "results": [...],
  "total_count": 25,
  "latency_ms": 230
}
```
```
## Architecture
```
┌─────────────────────────────────────────────────┐
│                 MCP Client (AI)                 │
└─────────────────┬───────────────────────────────┘
                  │ SSE Protocol
┌─────────────────▼───────────────────────────────┐
│                MCP Server Layer                 │
│  ┌──────────────────────────────────────────┐   │
│  │       Tool Registration & Routing        │   │
│  └──────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────┐   │
│  │        Request/Response Handling         │   │
│  └──────────────────────────────────────────┘   │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│                  Service Layer                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │  Indexer   │  │  Searcher  │  │Task Manager│ │
│  └──────┬─────┘  └──────┬─────┘  └──────┬─────┘ │
│         │               │               │       │
│  ┌──────▼───────────────▼───────────────▼────┐  │
│  │            Repository Service             │  │
│  └──────┬─────────────────────────────────────┘ │
│         │                                       │
│  ┌──────▼─────────────────────────────────────┐ │
│  │         Embedding Service (Ollama)         │ │
│  └─────────────────────────────────────────────┘│
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│                   Data Layer                    │
│  ┌──────────────────────────────────────────┐   │
│  │         PostgreSQL with pgvector         │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
│  │  │Repository│ │  Files   │ │  Chunks  │  │   │
│  │  └──────────┘ └──────────┘ └──────────┘  │   │
│  │  ┌──────────┐ ┌──────────────────────┐   │   │
│  │  │  Tasks   │ │  Vector Embeddings   │   │   │
│  │  └──────────┘ └──────────────────────┘   │   │
│  └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
```
### Component Overview
- **MCP Layer**: Handles protocol compliance, tool registration, SSE transport
- **Service Layer**: Business logic for indexing, searching, task management
- **Repository Service**: File system operations, git integration, .gitignore handling
- **Embedding Service**: Ollama integration for generating text embeddings
- **Data Layer**: PostgreSQL with pgvector for storage and similarity search
### Data Flow
1. **Indexing**: Repository → Parse → Chunk → Embed → Store
2. **Searching**: Query → Embed → Vector Search → Rank → Return
3. **Task Tracking**: Create → Update → Git Integration → Query
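A hedged sketch of the indexing flow, written with callables that mirror the services under `src/services/` (the real orchestration lives in `indexer.py` and differs in detail):

```python
async def index_repository(
    repo_path: str,
    scan_files,    # scanner: file discovery honoring .gitignore
    chunk_file,    # chunker: AST-based splitting into chunks
    embed_batch,   # embedder: Ollama -> 768-dim vectors
    store_chunks,  # data layer: pgvector inserts
) -> tuple[int, int]:
    """Illustrative pipeline: scan -> chunk -> embed -> store."""
    files = await scan_files(repo_path)
    chunks = [chunk for f in files for chunk in chunk_file(f)]
    embeddings = await embed_batch([c.content for c in chunks])
    await store_chunks(chunks, embeddings)
    return len(files), len(chunks)
```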
## Testing
### Run All Tests
```bash
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=term-missing
# Run specific test categories
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
pytest tests/contract/ -v # Contract tests
```
### Test Categories
- **Unit Tests**: Fast, isolated component tests
- **Integration Tests**: Database and service integration
- **Contract Tests**: MCP protocol compliance validation
- **Performance Tests**: Latency and throughput benchmarks
### Coverage Requirements
- Minimum coverage: 95%
- Critical paths: 100%
- View HTML report: `open htmlcov/index.html`
## Performance Tuning
### Database Optimization
```sql
-- Optimize vector searches
CREATE INDEX ON code_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Adjust work_mem for large result sets
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();
```
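For reference, a top-k similarity query against this index orders by pgvector's cosine distance operator (`<=>`). A sketch using asyncpg; the `id`/`content` column names are assumed from the schema section above:

```python
import asyncpg

async def search_similar(
    pool: asyncpg.Pool,
    query_embedding: list[float],
    k: int = 10,
) -> list[asyncpg.Record]:
    """Top-k cosine similarity over code_chunks.embedding (columns assumed)."""
    # pgvector accepts vectors as '[x1,x2,...]' text literals cast to ::vector
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return await pool.fetch(
        """
        SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
        FROM code_chunks
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        vector_literal,
        k,
    )
```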
### Connection Pool Settings
```bash
# In .env
DATABASE_POOL_SIZE=20 # Connection pool size
DATABASE_MAX_OVERFLOW=10 # Max overflow connections
DATABASE_POOL_TIMEOUT=30 # Connection timeout in seconds
```
### Embedding Batch Size
```bash
# Adjust based on available memory
EMBEDDING_BATCH_SIZE=100 # For systems with 8GB+ RAM
EMBEDDING_BATCH_SIZE=50 # Default for 4GB RAM
EMBEDDING_BATCH_SIZE=25 # For constrained environments
```
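Batching amortizes per-request overhead when generating embeddings. A sketch using the `ollama` Python client, assuming a client version whose `embed` call accepts a list input and returns an `embeddings` field:

```python
import os

import ollama

BATCH_SIZE = int(os.getenv("EMBEDDING_BATCH_SIZE", "50"))

def embed_texts(texts: list[str]) -> list[list[float]]:
    """Embed texts in batches of EMBEDDING_BATCH_SIZE."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = texts[i : i + BATCH_SIZE]
        # Batch call assumed per recent ollama client versions
        response = ollama.embed(model="nomic-embed-text", input=batch)
        vectors.extend(response["embeddings"])
    return vectors
```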
## Troubleshooting
### Common Issues
1. **Database Connection Failed**
- Check PostgreSQL is running: `pg_ctl status`
- Verify DATABASE_URL in .env
- Ensure database exists: `psql -U postgres -l`
2. **Ollama Connection Error**
- Check Ollama is running: `curl http://localhost:11434/api/tags`
- Verify model is installed: `ollama list`
- Check OLLAMA_BASE_URL in .env
3. **Slow Performance**
- Check database indexes: `\di` in psql
- Monitor query performance: See logs at LOG_FILE path
- Adjust batch sizes and connection pool
For detailed troubleshooting, see the Configuration Guide troubleshooting section.
## Contributing
We follow a specification-driven development workflow using the Specify framework.
### Development Workflow
1. **Feature Specification**: Use `/specify` command to create feature specs
2. **Planning**: Generate implementation plan with `/plan`
3. **Task Breakdown**: Create tasks with `/tasks`
4. **Implementation**: Execute tasks with `/implement`
### Git Workflow
```bash
# Create feature branch
git checkout -b 001-feature-name
# Make atomic commits
git add .
git commit -m "feat(component): add specific feature"
# Push and create PR
git push origin 001-feature-name
```
### Code Quality Standards
- **Type Safety**: `mypy --strict` must pass
- **Linting**: `ruff check` with no errors
- **Testing**: All tests must pass with 95%+ coverage
- **Documentation**: Update relevant docs with changes
### Constitutional Principles
1. **Simplicity Over Features**: Focus on core semantic search
2. **Local-First Architecture**: No cloud dependencies
3. **Protocol Compliance**: Strict MCP adherence
4. **Performance Guarantees**: Meet stated benchmarks
5. **Production Quality**: Comprehensive error handling
See [.specify/memory/constitution.md](.specify/memory/constitution.md) for full principles.
## FastMCP Migration (Oct 2025)
**Migration Complete**: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.
### What Changed
**Before (MCP SDK):**
```python
# Old: Manual tool registration with JSON schemas
class MCPServer:
    def __init__(self):
        self.tools = {
            "search_code": {
                "name": "search_code",
                "description": "...",
                "inputSchema": {...}
            }
        }
```
**After (FastMCP):**
```python
# New: Decorator-based tool definitions
@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # Implementation
```
### Key Benefits
1. **Simpler Tool Definitions**: Decorators replace manual JSON schema creation
2. **Type Safety**: Automatic schema generation from Pydantic models
3. **Dual Logging**: File logging + MCP protocol without stdout pollution
4. **Better Error Handling**: Structured error responses with context
5. **Cleaner Architecture**: Separation of tool interface from business logic
### Server Files
- **New Entry Point**: `server_fastmcp.py` (root directory)
- **Legacy Server**: `src/mcp/mcp_stdio_server_v3.py` (deprecated, will be removed)
- **Tool Handlers**: `src/mcp/tools/*.py` (unchanged, reused by FastMCP)
- **Services**: `src/services/*.py` (unchanged, business logic intact)
### Configuration Update Required
**Update your Claude Desktop config** to use the new server:
```json
{
  "mcpServers": {
    "codebase-mcp": {
      "command": "uv",
      "args": ["run", "--with", "fastmcp", "python", "/path/to/server_fastmcp.py"]
    }
  }
}
```
### Migration Notes
- All MCP tools remain functional (100% backward compatible)
- No database schema changes required
- Tool signatures and responses unchanged
- Logging now goes exclusively to `/tmp/codebase-mcp.log`
- All tests pass with FastMCP implementation
### Performance
FastMCP maintains performance targets:
- Repository indexing: <60 seconds for 10K files
- Code search: <500ms p95 latency
- Async/await throughout for optimal concurrency
## License
MIT License (LICENSE file pending).
## Support
- **Issues**: [GitHub Issues](https://github.com/cliffclarke/codebase-mcp/issues)
- **Documentation**: [Full documentation](docs/)
- **Logs**: Check `/tmp/codebase-mcp.log` for detailed debugging
## Quick Start
### Basic Usage (Default Project)
For most users, the default project workspace is sufficient. All indexing now uses background jobs to prevent MCP client timeouts:
```python
import asyncio

# Start background indexing job (returns immediately)
job = await start_indexing_background(repo_path="/path/to/your/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Check result
if status["status"] == "completed":
    print(f"✅ Indexed {status['files_indexed']} files, {status['chunks_created']} chunks")
else:
    print(f"❌ Indexing failed: {status['error_message']}")

# Search code
results = await search_code(query="function to handle authentication")

# Search with filters
results = await search_code(
    query="database query",
    file_type="py",
    limit=20
)
```
The server automatically uses a default project workspace (`project_default`) if no project ID is specified.
### Multi-Project Usage
For users managing multiple codebases or client projects, use the `project_id` parameter to isolate repositories:
```python
import asyncio

# Index repositories with project_id
job_a = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a"
)
job_b = await start_indexing_background(
    repo_path="/path/to/client-b-repo",
    project_id="client-b"
)

# Poll both jobs
for job in [job_a, job_b]:
    while True:
        status = await get_indexing_status(job_id=job["job_id"])
        if status["status"] in ["completed", "failed"]:
            break
        await asyncio.sleep(2)

# Search within specific project
results_a = await search_code(
    query="authentication logic",
    project_id="client-a"
)
results_b = await search_code(
    query="payment processing",
    project_id="client-b"
)
```
Each project has its own isolated database schema, ensuring repositories and embeddings are completely separated.
## workflow-mcp Integration (Optional)
The Codebase MCP Server can **optionally** integrate with [workflow-mcp](https://github.com/cliffclarke/workflow-mcp) for automatic project context resolution. This is an advanced feature and not required for basic usage.
### Standalone Usage (Default)
By default, Codebase MCP operates independently:
```python
# Works out of the box without workflow-mcp
job = await start_indexing_background(repo_path="/path/to/repo")
results = await search_code(query="search query")
```
### Integration with workflow-mcp
If you're using workflow-mcp to manage development projects, Codebase MCP can automatically resolve project context:
```bash
# Set workflow-mcp URL in environment
export WORKFLOW_MCP_URL=http://localhost:8001
```
```python
# Now project_id is automatically resolved from workflow-mcp's active project
job = await start_indexing_background(repo_path="/path/to/repo") # Uses active project
results = await search_code(query="search query") # Searches in active project's context
```
**How It Works:**
1. Codebase MCP queries workflow-mcp for the active project
2. If an active project exists, it's used as the `project_id`
3. If no active project or workflow-mcp is unavailable, falls back to default project
4. You can still override with an explicit `project_id` parameter
**Configuration:**
```bash
# In .env file
WORKFLOW_MCP_URL=http://localhost:8001 # Optional, enables integration
```
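Conceptually, the fallback is a guarded HTTP lookup with a short timeout. A sketch using httpx; the endpoint path and response shape here are hypothetical, so consult workflow-mcp's documentation for the real API:

```python
import os

import httpx

WORKFLOW_MCP_URL = os.getenv("WORKFLOW_MCP_URL")

async def resolve_active_project() -> str:
    """Return workflow-mcp's active project, or 'default' on any failure."""
    if not WORKFLOW_MCP_URL:
        return "default"  # integration not configured
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            # Hypothetical endpoint for illustration only
            resp = await client.get(f"{WORKFLOW_MCP_URL}/active-project")
            resp.raise_for_status()
            return resp.json().get("project_id", "default")
    except httpx.HTTPError:
        return "default"  # workflow-mcp unavailable -> default workspace
```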
**See Also:** [workflow-mcp repository](https://github.com/cliffclarke/workflow-mcp) for details on project workspace management.
## Documentation
Comprehensive documentation is available for different use cases:
- **[Migration Guide](docs/migration/v1-to-v2-migration.md)** - Upgrading from v1.x to v2.x with multi-project support
- **[Configuration Guide](docs/configuration/production-config.md)** - Production deployment and tuning
- **[Architecture Documentation](docs/architecture/multi-project-design.md)** - System design and multi-project isolation
- **[API Reference](docs/api/tool-reference.md)** - Complete MCP tool documentation
- **[Glossary](docs/glossary.md)** - Canonical terminology definitions
For quick setup, refer to the Installation section above.
## Contributing
We welcome contributions to the Codebase MCP Server. This project follows a specification-driven development workflow.
### Getting Started
1. **Read the Architecture**: Start with [docs/architecture/multi-project-design.md](docs/architecture/multi-project-design.md) to understand the system design
2. **Review the Constitution**: See [.specify/memory/constitution.md](.specify/memory/constitution.md) for project principles
3. **Follow the Workflow**: Use the Specify workflow documented in [CLAUDE.md](CLAUDE.md)
### Development Process
1. **Create a feature specification** using `/specify` command
2. **Plan the implementation** with `/plan`
3. **Generate tasks** using `/tasks`
4. **Implement incrementally** with atomic commits
### Code Standards
- **Type Safety**: Full mypy --strict compliance
- **Testing**: 95%+ test coverage, contract tests for MCP protocol
- **Performance**: Meet benchmarks (60s indexing, 500ms search p95)
- **Documentation**: Update docs with all changes
### Code of Conduct
This project adheres to a code of conduct that promotes a welcoming, inclusive environment. We expect:
- Respectful communication in issues and PRs
- Constructive feedback focused on code and ideas
- Recognition that contributors volunteer their time
- Patience with maintainers and fellow contributors
By participating, you agree to uphold these standards.
## Acknowledgments
- MCP framework powered by [FastMCP](https://github.com/jlowin/fastmcp)
- Built with FastAPI, SQLAlchemy, and Pydantic
- Vector search powered by [pgvector](https://github.com/pgvector/pgvector)
- Embeddings via [Ollama](https://ollama.com/) and nomic-embed-text
- Code parsing with tree-sitter
- MCP protocol by [Anthropic](https://modelcontextprotocol.io/)