# Implementation Phases for Codebase MCP Refactor
## Overview
This document provides a phase-by-phase breakdown of the refactoring effort to transform codebase-mcp into a pure semantic code search MCP with multi-project support. Each phase has clear objectives, acceptance criteria, git strategy, and testing requirements.
---
## Development Approach
**Strategy**: Option B (sequential)
- workflow-mcp core exists first (prerequisite)
- codebase-mcp is then refactored to integrate with workflow-mcp
- The two servers are deployed and tested independently
**Git Strategy**:
- **Branch**: `002-refactor-pure-search`
- **Commit Approach**: Micro-commits after each phase
- **Commit Format**: Conventional Commits (`type(scope): description`)
- **Working State**: All tests pass at every commit
**Testing Strategy**:
- **Unit Tests**: Test individual functions in isolation
- **Integration Tests**: Test database operations, Ollama interactions
- **Protocol Tests**: Validate MCP compliance via mcp-inspector
- **Performance Tests**: Benchmark search latency, indexing throughput
- **Multi-Project Tests**: Validate isolation (no cross-contamination)
---
## Phase 0: Prerequisites and Planning
### Objective
Ensure workflow-mcp is deployed and validate planning artifacts
### Tasks
1. **Verify workflow-mcp Deployment**
```bash
# Test workflow-mcp is running
curl http://localhost:3000/mcp/workflow-mcp/health
# Expected: {"status": "ok", "version": "1.0.0"}
```
2. **Validate Planning Artifacts**
- ✅ README.md (this document's parent)
- ✅ constitution.md
- ✅ user-stories.md
- ✅ specify-prompt.txt
- ✅ tech-stack.md
- ✅ refactoring-plan.md
- ✅ implementation-phases.md (this document)
3. **Review Current Codebase State**
```bash
# Count lines of code
cloc src/ tests/
# List all MCP tools
grep "@mcp.tool" src/ -r
# Run baseline tests
pytest tests/ -v --cov=src/codebase_mcp
```
### Acceptance Criteria
- [ ] workflow-mcp is deployed and responding
- [ ] All planning documents reviewed and approved
- [ ] Baseline metrics captured (LOC, test count, coverage)
- [ ] Feature branch name confirmed: `002-refactor-pure-search` (created in Phase 1)
### Git Strategy
- No commits in this phase (preparation only)
### Time Estimate
- **Duration**: 1 hour
- **Dependencies**: workflow-mcp deployment complete
---
## Phase 1: Create Feature Branch and Baseline
### Objective
Establish refactor branch and document current state
### Tasks
1. **Create Feature Branch**
```bash
git checkout main
git pull origin main
git checkout -b 002-refactor-pure-search
```
2. **Capture Baseline Metrics**
```bash
# Lines of code
cloc src/ tests/ > docs/baseline/loc-before.txt
# Test results
pytest tests/ -v --cov=src/codebase_mcp --cov-report=html > docs/baseline/tests-before.txt
# Database schema
pg_dump --schema-only codebase_mcp > docs/baseline/schema-before.sql
# Tool list
grep "@mcp.tool" src/ -r > docs/baseline/tools-before.txt
```
3. **Document Baseline State**
Create `docs/baseline/baseline-state.md`:
```markdown
# Baseline State (Before Refactor)
**Date**: 2025-10-11
**Branch**: 002-refactor-pure-search
**Commit**: [current commit hash]
## Metrics
- Lines of Code: X
- Test Count: Y
- Test Coverage: Z%
- MCP Tools: 16
## Tool Surface
- Search: index_repository, search_code
- Work Items: create_work_item, list_work_items, query_work_item, update_work_item
- Tasks: create_task, get_task, list_tasks, update_task
- Deployments: record_deployment
- Vendors: create_vendor, query_vendor_status, update_vendor_status
- Configuration: get_project_configuration, update_project_configuration
## Database Schema
- Tables: repositories, code_chunks, work_items, tasks, vendors, deployments, project_configuration
```
### Acceptance Criteria
- [ ] Branch created: `002-refactor-pure-search`
- [ ] Baseline metrics captured in `docs/baseline/`
- [ ] Baseline state documented in `baseline-state.md`
- [ ] All baseline tests pass (100%)
### Git Strategy
```bash
# After documenting baseline
git add docs/baseline/
git commit -m "chore(refactor): establish baseline for pure-search refactor"
```
### Testing
- Run existing test suite to ensure clean starting state
- Expected: 100% tests passing
### Time Estimate
- **Duration**: 1 hour
- **Dependencies**: None
---
## Phase 2: Database Schema Refactoring
### Objective
Remove non-search database tables and add multi-project support
### Tasks
1. **Create Migration Script**
Create `migrations/002_remove_non_search_tables.sql`:
```sql
-- Backup warning
-- Run: pg_dump codebase_mcp > backup-before-002.sql
-- Drop non-search tables
DROP TABLE IF EXISTS deployment_work_items CASCADE;
DROP TABLE IF EXISTS deployment_vendors CASCADE;
DROP TABLE IF EXISTS deployments CASCADE;
DROP TABLE IF EXISTS work_item_dependencies CASCADE;
DROP TABLE IF EXISTS work_items CASCADE;
DROP TABLE IF EXISTS task_planning_references CASCADE;
DROP TABLE IF EXISTS tasks CASCADE;
DROP TABLE IF EXISTS vendors CASCADE;
DROP TABLE IF EXISTS project_configuration CASCADE;
-- Add project_id to remaining tables
ALTER TABLE repositories ADD COLUMN project_id TEXT;
ALTER TABLE code_chunks ADD COLUMN project_id TEXT;
-- Backfill project_id (default project for existing data)
UPDATE repositories SET project_id = 'default' WHERE project_id IS NULL;
UPDATE code_chunks SET project_id = 'default' WHERE project_id IS NULL;
-- Make project_id NOT NULL
ALTER TABLE repositories ALTER COLUMN project_id SET NOT NULL;
ALTER TABLE code_chunks ALTER COLUMN project_id SET NOT NULL;
-- Update unique constraints
ALTER TABLE repositories DROP CONSTRAINT IF EXISTS repositories_path_key;
ALTER TABLE repositories ADD CONSTRAINT repositories_project_path_key UNIQUE(project_id, path);
-- Create indexes for performance
CREATE INDEX IF NOT EXISTS idx_repositories_project_id ON repositories(project_id);
CREATE INDEX IF NOT EXISTS idx_code_chunks_project_id ON code_chunks(project_id);
-- Verify migration
SELECT 'Migration 002 complete' AS status;
```
2. **Update Schema File**
Update `src/codebase_mcp/database/schema.sql`:
- Remove all non-search table definitions
- Add `project_id` to repositories and code_chunks
- Update indexes and constraints
3. **Test Migration on Test Database**
```bash
# Create test database
createdb codebase_mcp_test
psql codebase_mcp_test < src/codebase_mcp/database/schema.sql
# Populate with test data
psql codebase_mcp_test < tests/fixtures/test_data.sql
# Run migration
psql codebase_mcp_test < migrations/002_remove_non_search_tables.sql
# Verify result
psql codebase_mcp_test -c "\dt"
# Expected: Only repositories and code_chunks
```
### Acceptance Criteria
- [ ] Migration script created and tested
- [ ] Schema file updated (only 2 tables: repositories, code_chunks)
- [ ] Migration tested on test database
- [ ] No data loss (test data preserved in remaining tables)
- [ ] Indexes created for performance
### Git Strategy
```bash
git add migrations/002_remove_non_search_tables.sql
git add src/codebase_mcp/database/schema.sql
git commit -m "refactor(db): remove non-search tables, add project_id for multi-project support"
```
### Testing
```bash
# Test migration on fresh database
pytest tests/test_migrations.py::test_migration_002
# Test schema integrity
pytest tests/test_database.py::test_schema_integrity
```
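These tests do not exist yet. A minimal sketch of `test_migration_002`, assuming a hypothetical `migration_db` fixture that yields an asyncpg pool connected to a scratch database preloaded with the pre-migration schema and fixture data:
```python
# tests/test_migrations.py -- sketch; migration_db is a hypothetical fixture (see above)
import pathlib

import pytest

MIGRATION = pathlib.Path("migrations/002_remove_non_search_tables.sql")


@pytest.mark.asyncio
async def test_migration_002(migration_db):
    """Migration leaves only the two search tables and backfills project_id."""
    async with migration_db.acquire() as conn:
        await conn.execute(MIGRATION.read_text())

        tables = {
            row["tablename"]
            for row in await conn.fetch(
                "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"
            )
        }
        assert {"repositories", "code_chunks"} <= tables
        assert not tables & {
            "work_items", "tasks", "vendors", "deployments", "project_configuration"
        }

        # Pre-existing rows must have been backfilled with the default project
        nulls = await conn.fetchval(
            "SELECT count(*) FROM repositories WHERE project_id IS NULL"
        )
        assert nulls == 0
```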
### Time Estimate
- **Duration**: 2-3 hours
- **Dependencies**: Phase 1 complete
---
## Phase 3: Remove Non-Search Tool Implementations
### Objective
Delete all tool implementations except search
### Tasks
1. **Delete Tool Files**
```bash
# Remove work item tools
git rm src/codebase_mcp/tools/work_items.py
# Remove task tools
git rm src/codebase_mcp/tools/tasks.py
# Remove deployment tools
git rm src/codebase_mcp/tools/deployments.py
# Remove vendor tools
git rm src/codebase_mcp/tools/vendors.py
# Remove project configuration tools
git rm src/codebase_mcp/tools/project_config.py
```
2. **Update Tool Module Initialization**
Edit `src/codebase_mcp/tools/__init__.py`:
```python
# Remove all imports except search
from .search import index_repository, search_code
__all__ = [
"index_repository",
"search_code",
]
```
3. **Fix Import Errors Across Codebase**
```bash
# Find all import references to removed tools
grep -r "from .tools import" src/
# Update imports in server.py and other files
```
### Acceptance Criteria
- [ ] 5 tool files deleted
- [ ] `tools/__init__.py` updated (only search tools exported)
- [ ] No import errors in codebase
- [ ] Server starts successfully (even if tools not registered yet)
### Git Strategy
```bash
git add src/codebase_mcp/tools/
git commit -m "refactor(tools): remove non-search tool implementations"
```
### Testing
```bash
# Test imports
python -c "from codebase_mcp.tools import index_repository, search_code"
# Expected: No ImportError
```
### Time Estimate
- **Duration**: 1 hour
- **Dependencies**: Phase 2 complete
---
## Phase 4: Remove Non-Search Database Operations
### Objective
Remove CRUD functions for non-search tables
### Tasks
1. **Edit Database Operations File**
Edit `src/codebase_mcp/database/operations.py`:
- Delete: `create_work_item_db`, `list_work_items_db`, `query_work_item_db`, `update_work_item_db`
- Delete: `create_task_db`, `get_task_db`, `list_tasks_db`, `update_task_db`
- Delete: `record_deployment_db`
- Delete: `create_vendor_db`, `query_vendor_status_db`, `update_vendor_status_db`
- Delete: `get_project_configuration_db`, `update_project_configuration_db`
- Keep: `index_repository_db`, `search_code_db`, connection utilities
2. **Update Search Functions for Multi-Project**
```python
async def index_repository_db(
    pool: asyncpg.Pool,
    repo_path: str,
    project_id: str,  # NEW
    force_reindex: bool,
) -> dict:
    """Index a repository with project isolation."""
    # Add project_id to all INSERT statements
    async with pool.acquire() as conn:
        # Check if already indexed for this project
        existing = await conn.fetchrow(
            "SELECT id FROM repositories WHERE project_id = $1 AND path = $2",
            project_id, repo_path
        )
        if existing and not force_reindex:
            return {"status": "skipped", "message": "Already indexed"}
        # Index repository...
        # INSERT INTO repositories (project_id, path, ...) VALUES ($1, $2, ...)
        # INSERT INTO code_chunks (project_id, repository_id, ...) VALUES (...)


async def search_code_db(
    pool: asyncpg.Pool,
    query_embedding: list[float],
    project_id: str,  # NEW
    filters: dict,
    limit: int,
) -> list[dict]:
    """Search code within a single project."""
    # query_embedding must be passed in a pgvector-compatible form
    # (e.g. register the pgvector codec on the pool's connections).
    async with pool.acquire() as conn:
        # Scope the SELECT to the project; optional file_type/directory entries
        # in `filters` are appended as additional WHERE clauses.
        results = await conn.fetch(
            """
            SELECT
                project_id,
                file_path,
                content,
                start_line,
                end_line,
                1 - (embedding <=> $2) AS similarity
            FROM code_chunks
            WHERE project_id = $1
            ORDER BY embedding <=> $2
            LIMIT $3
            """,
            project_id, query_embedding, limit
        )
        return [dict(row) for row in results]
```
### Acceptance Criteria
- [ ] All non-search database functions removed
- [ ] Search functions updated with `project_id` parameter
- [ ] Database queries include `WHERE project_id = $X` filters
- [ ] No unused imports or dead code
### Git Strategy
```bash
git add src/codebase_mcp/database/operations.py
git commit -m "refactor(db): remove non-search operations, add project_id filters"
```
### Testing
```bash
# Test search database operations
pytest tests/test_database.py::test_index_repository_with_project_id
pytest tests/test_database.py::test_search_code_with_project_id
pytest tests/test_database.py::test_multi_project_isolation
```
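A sketch of the first of these tests, assuming the `db_pool` fixture from `tests/conftest.py` and a hypothetical `insert_chunk` helper that writes a row with a pre-computed embedding (and that `search_code_db` returns `project_id` per row, as updated above):
```python
# tests/test_database.py -- sketch; insert_chunk is a hypothetical helper/fixture
import pytest

from codebase_mcp.database.operations import search_code_db


@pytest.mark.asyncio
async def test_search_code_with_project_id(db_pool, insert_chunk):
    embedding = [0.1] * 768  # same dimension used by the other test fixtures

    # Identical content indexed under two different projects
    await insert_chunk(db_pool, project_id="project-a", file_path="auth.py", embedding=embedding)
    await insert_chunk(db_pool, project_id="project-b", file_path="auth.py", embedding=embedding)

    results = await search_code_db(
        pool=db_pool,
        query_embedding=embedding,
        project_id="project-a",
        filters={},
        limit=10,
    )

    # Only rows belonging to project-a may come back
    assert results, "expected at least one hit for project-a"
    assert all(row["project_id"] == "project-a" for row in results)
```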
### Time Estimate
- **Duration**: 2 hours
- **Dependencies**: Phase 3 complete
---
## Phase 5: Update MCP Server Tool Registration
### Objective
Register only search tools with MCP server
### Tasks
1. **Update Server File**
Edit `src/codebase_mcp/server.py`:
```python
from fastmcp import FastMCP
from .tools import index_repository, search_code
mcp = FastMCP("codebase-mcp")
# Register search tools only
mcp.tool()(index_repository)
mcp.tool()(search_code)
if __name__ == "__main__":
mcp.run()
```
2. **Test Server Startup**
```bash
# Start server
python -m codebase_mcp.server
# In another terminal, test with mcp-inspector
mcp-inspector http://localhost:8000
```
3. **Validate MCP Protocol Compliance**
```bash
# List available tools
mcp-inspector http://localhost:8000 --list-tools
# Expected output:
# Tools:
# - index_repository
# - search_code
```
### Acceptance Criteria
- [ ] Only 2 tools registered with MCP server
- [ ] Server starts without errors
- [ ] mcp-inspector validation passes
- [ ] Tool schemas are valid (Pydantic → JSON Schema)
### Git Strategy
```bash
git add src/codebase_mcp/server.py
git commit -m "refactor(server): register only search tools (index_repository, search_code)"
```
### Testing
```bash
# Test server startup
pytest tests/test_server.py::test_server_startup
# Test tool registration
pytest tests/test_server.py::test_tool_list
# Test MCP protocol compliance
pytest tests/test_protocol.py::test_mcp_compliance
```
### Time Estimate
- **Duration**: 1 hour
- **Dependencies**: Phase 4 complete
---
## Phase 6: Remove Non-Search Tests
### Objective
Delete test files for removed functionality
### Tasks
1. **Delete Test Files**
```bash
git rm tests/test_work_items.py
git rm tests/test_tasks.py
git rm tests/test_deployments.py
git rm tests/test_vendors.py
git rm tests/test_project_config.py
```
2. **Update Test Fixtures**
Edit `tests/conftest.py`:
- Remove fixtures for non-search features
- Keep: `db_pool`, `test_repo_path`, `ollama_mock`
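For reference, the trimmed `conftest.py` could reduce to something like the sketch below (the pool settings and the patch target for the Ollama mock are assumptions; keep whatever the existing fixtures actually use):
```python
# tests/conftest.py -- trimmed sketch; only search-related fixtures remain
import asyncpg
import pytest
import pytest_asyncio


@pytest_asyncio.fixture
async def db_pool():
    """Connection pool against the test database (connection settings are placeholders)."""
    pool = await asyncpg.create_pool(database="codebase_mcp_test", min_size=1, max_size=5)
    try:
        yield pool
    finally:
        await pool.close()


@pytest.fixture
def test_repo_path(tmp_path):
    """Small throwaway repository used by indexing tests."""
    (tmp_path / "main.py").write_text("def hello():\n    return 'world'\n")
    return str(tmp_path)


@pytest.fixture
def ollama_mock(monkeypatch):
    """Stub embedding generation so tests do not need a running Ollama."""
    async def fake_embedding(text: str) -> list[float]:
        return [0.1] * 768

    # Patch target is an assumption; point it at wherever generate_embedding lives.
    monkeypatch.setattr(
        "codebase_mcp.tools.search.generate_embedding", fake_embedding, raising=False
    )
    return fake_embedding
```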
3. **Run Remaining Tests**
```bash
pytest tests/ -v --cov=src/codebase_mcp
```
### Acceptance Criteria
- [ ] 5 test files deleted
- [ ] Remaining tests pass (100%)
- [ ] Coverage remains >80%
- [ ] No unused fixtures in `conftest.py`
### Git Strategy
```bash
git add tests/
git commit -m "test(refactor): remove tests for non-search functionality"
```
### Testing
```bash
# Run all remaining tests
pytest tests/ -v
# Check coverage
pytest tests/ --cov=src/codebase_mcp --cov-report=term
# Expected: >80% coverage
```
### Time Estimate
- **Duration**: 1 hour
- **Dependencies**: Phase 5 complete
---
## Phase 7: Add Multi-Project Support to Search Tools
### Objective
Implement project_id parameter and workflow-mcp integration
### Tasks
1. **Create Project Context Module**
Create `src/codebase_mcp/utils/project_context.py`:
```python
import logging
import os
from typing import Optional

import aiohttp

logger = logging.getLogger(__name__)


async def get_active_project_id() -> Optional[str]:
    """
    Query workflow-mcp for the active project ID.

    Returns None if workflow-mcp is unavailable.
    """
    workflow_mcp_url = os.getenv(
        "WORKFLOW_MCP_URL",
        "http://localhost:3000/mcp/workflow-mcp"
    )
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{workflow_mcp_url}/get_active_project",
                timeout=aiohttp.ClientTimeout(total=2.0)
            ) as response:
                if response.status == 200:
                    data = await response.json()
                    return data.get("project_id")
                return None
    except Exception as exc:
        # workflow-mcp not available; log and let the caller fall back to an explicit project_id
        logger.warning("Failed to get active project from workflow-mcp: %s", exc)
        return None
```
2. **Update Search Tool Parameters**
Edit `src/codebase_mcp/tools/search.py`:
```python
from pydantic import BaseModel, Field

from ..database.connection import get_db_pool
from ..database.operations import index_repository_db, search_code_db
from ..utils.project_context import get_active_project_id

# `mcp` (the FastMCP instance used by @mcp.tool) and generate_embedding (the
# Ollama embedding helper) are assumed to already be available in this module.

class IndexRepositoryParams(BaseModel):
repo_path: str = Field(description="Absolute path to repository")
project_id: str = Field(description="Project ID for isolation")
force_reindex: bool = Field(default=False)
class SearchCodeParams(BaseModel):
query: str = Field(description="Natural language search query")
project_id: str | None = Field(
default=None,
description="Project ID (uses active project if None)"
)
file_type: str | None = Field(default=None, description="File type filter")
directory: str | None = Field(default=None, description="Directory filter")
limit: int = Field(default=10, ge=1, le=50)
@mcp.tool()
async def index_repository(params: IndexRepositoryParams) -> dict:
"""Index a code repository for semantic search"""
pool = await get_db_pool(params.project_id)
try:
result = await index_repository_db(
pool=pool,
repo_path=params.repo_path,
project_id=params.project_id,
force_reindex=params.force_reindex
)
return result
finally:
await pool.close()
@mcp.tool()
async def search_code(params: SearchCodeParams) -> dict:
"""Search code using semantic similarity"""
# Resolve project_id
project_id = params.project_id or await get_active_project_id()
if not project_id:
raise ValueError(
"project_id required when workflow-mcp is unavailable"
)
# Get project database pool
pool = await get_db_pool(project_id)
try:
# Generate query embedding
query_embedding = await generate_embedding(params.query)
# Search database
results = await search_code_db(
pool=pool,
query_embedding=query_embedding,
project_id=project_id,
filters={
"file_type": params.file_type,
"directory": params.directory,
},
limit=params.limit
)
return {
"results": results,
"project_id": project_id,
"total_count": len(results),
}
finally:
await pool.close()
```
3. **Write Multi-Project Tests**
Create `tests/test_multi_project.py`:
```python
import pytest
import pytest_asyncio

from codebase_mcp.database.operations import index_repository_db, search_code_db
from codebase_mcp.tools.search import SearchCodeParams, search_code

@pytest.mark.asyncio
async def test_multi_project_isolation(db_pool_project_a, db_pool_project_b):
"""Test that projects are isolated (no cross-contamination)"""
# Index same file in two projects
await index_repository_db(db_pool_project_a, "/repo", "project-a", False)
await index_repository_db(db_pool_project_b, "/repo", "project-b", False)
# Search in project-a
results_a = await search_code_db(
db_pool_project_a, [0.1]*768, "project-a", {}, 10
)
# Search in project-b
results_b = await search_code_db(
db_pool_project_b, [0.1]*768, "project-b", {}, 10
)
# Verify no overlap
assert len(results_a) > 0
assert len(results_b) > 0
assert all(r["project_id"] == "project-a" for r in results_a)
assert all(r["project_id"] == "project-b" for r in results_b)
@pytest.mark.asyncio
async def test_search_with_workflow_mcp_integration(mock_workflow_mcp):
"""Test search uses active project from workflow-mcp"""
# Mock workflow-mcp to return "project-alpha"
mock_workflow_mcp.set_active_project("project-alpha")
# Search without explicit project_id
params = SearchCodeParams(query="authentication", project_id=None)
result = await search_code(params)
# Verify used project-alpha
assert result["project_id"] == "project-alpha"
```
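The `mock_workflow_mcp` fixture used above is not defined anywhere yet. One minimal way to provide it, assuming it only needs to patch `get_active_project_id` rather than stand up a real workflow-mcp server:
```python
# tests/conftest.py -- hypothetical fixture backing test_search_with_workflow_mcp_integration
import pytest


class _WorkflowMCPStub:
    """Pretends workflow-mcp reported a given active project."""

    def __init__(self, monkeypatch):
        self._monkeypatch = monkeypatch

    def set_active_project(self, project_id: str) -> None:
        async def fake_get_active_project_id() -> str:
            return project_id

        # Patch the name where search.py imported it, so the tool sees the stub.
        self._monkeypatch.setattr(
            "codebase_mcp.tools.search.get_active_project_id",
            fake_get_active_project_id,
        )


@pytest.fixture
def mock_workflow_mcp(monkeypatch):
    return _WorkflowMCPStub(monkeypatch)
```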
### Acceptance Criteria
- [ ] `project_id` parameter added to both tools
- [ ] `get_active_project_id()` helper implemented
- [ ] When workflow-mcp is unavailable, search requires an explicit `project_id` (clear error if it is missing)
- [ ] Multi-project isolation tests pass (no cross-contamination)
- [ ] Integration tests with workflow-mcp pass
### Git Strategy
```bash
git add src/codebase_mcp/utils/project_context.py
git add src/codebase_mcp/tools/search.py
git add tests/test_multi_project.py
git commit -m "feat(search): add multi-project support with workflow-mcp integration"
```
### Testing
```bash
# Test multi-project isolation
pytest tests/test_multi_project.py -v
# Test workflow-mcp integration
pytest tests/test_workflow_integration.py -v
```
### Time Estimate
- **Duration**: 3 hours
- **Dependencies**: Phase 6 complete
---
## Phase 8: Update Database Connection Management
### Objective
Implement per-project database connection pools
### Tasks
1. **Refactor Connection Module**
Edit `src/codebase_mcp/database/connection.py`:
```python
import asyncpg
import os
from typing import Dict
# Per-project connection pools
_pools: Dict[str, asyncpg.Pool] = {}
async def get_db_pool(project_id: str) -> asyncpg.Pool:
"""Get or create connection pool for project database"""
if project_id not in _pools:
database_name = f"codebase_{project_id}"
_pools[project_id] = await asyncpg.create_pool(
host=os.getenv("POSTGRES_HOST", "localhost"),
port=int(os.getenv("POSTGRES_PORT", "5432")),
database=database_name,
user=os.getenv("POSTGRES_USER", "postgres"),
password=os.getenv("POSTGRES_PASSWORD", ""),
min_size=5,
max_size=20,
)
return _pools[project_id]
async def close_all_pools():
"""Close all database connection pools (for shutdown)"""
for pool in _pools.values():
await pool.close()
_pools.clear()
async def ensure_database_exists(project_id: str):
"""Create project database if it doesn't exist"""
database_name = f"codebase_{project_id}"
# Connect to default postgres database
conn = await asyncpg.connect(
host=os.getenv("POSTGRES_HOST", "localhost"),
port=int(os.getenv("POSTGRES_PORT", "5432")),
database="postgres",
user=os.getenv("POSTGRES_USER", "postgres"),
password=os.getenv("POSTGRES_PASSWORD", ""),
)
try:
# Check if database exists
exists = await conn.fetchval(
"SELECT 1 FROM pg_database WHERE datname = $1",
database_name
)
        if not exists:
            # Create the database. Quote the identifier: project IDs such as
            # "project-a" produce names that are invalid unquoted, and identifiers
            # cannot be bound as query parameters, so validate project_id before
            # interpolating it.
            await conn.execute(f'CREATE DATABASE "{database_name}"')
# Connect to new database and create schema
project_conn = await asyncpg.connect(
host=os.getenv("POSTGRES_HOST", "localhost"),
port=int(os.getenv("POSTGRES_PORT", "5432")),
database=database_name,
user=os.getenv("POSTGRES_USER", "postgres"),
password=os.getenv("POSTGRES_PASSWORD", ""),
)
try:
# Load schema from schema.sql
with open("src/codebase_mcp/database/schema.sql") as f:
schema_sql = f.read()
await project_conn.execute(schema_sql)
finally:
await project_conn.close()
finally:
await conn.close()
```
2. **Update Tool Implementations**
- Ensure all tools call `ensure_database_exists(project_id)` before operations
- Update `index_repository` and `search_code` to use `get_db_pool(project_id)` (see the call-order sketch below)
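A minimal sketch of that call order (an excerpt, not a full implementation; the names come from Phases 7-8 and imports are omitted). Note that once pools are cached per project, the per-call `pool.close()` shown in Phase 7 should be dropped so the cached pool is reused:
```python
# tools/search.py -- sketch of the updated call order (error handling elided)
@mcp.tool()
async def index_repository(params: IndexRepositoryParams) -> dict:
    # Create the project database and schema on first use...
    await ensure_database_exists(params.project_id)
    # ...then reuse the cached per-project pool (do not close it here).
    pool = await get_db_pool(params.project_id)
    return await index_repository_db(
        pool=pool,
        repo_path=params.repo_path,
        project_id=params.project_id,
        force_reindex=params.force_reindex,
    )
```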
3. **Write Connection Pool Tests**
```python
# tests/test_connection_pooling.py
import pytest

from codebase_mcp.database.connection import ensure_database_exists, get_db_pool


@pytest.mark.asyncio
async def test_per_project_pools():
"""Test that each project has its own connection pool"""
pool_a = await get_db_pool("project-a")
pool_b = await get_db_pool("project-b")
assert pool_a is not pool_b
assert pool_a is await get_db_pool("project-a") # Same pool on second call
@pytest.mark.asyncio
async def test_database_auto_creation():
"""Test that databases are created automatically"""
project_id = "test-auto-create"
await ensure_database_exists(project_id)
# Verify database exists
pool = await get_db_pool(project_id)
async with pool.acquire() as conn:
result = await conn.fetchval("SELECT 1")
assert result == 1
```
### Acceptance Criteria
- [ ] One connection pool per project
- [ ] Pools created lazily on first access
- [ ] Databases auto-created if missing
- [ ] Cleanup function closes all pools
- [ ] Tests validate pool isolation
### Git Strategy
```bash
git add src/codebase_mcp/database/connection.py
git add tests/test_connection_pooling.py
git commit -m "feat(db): implement per-project connection pools with auto-creation"
```
### Testing
```bash
# Test connection pooling
pytest tests/test_connection_pooling.py -v
# Test multi-project with real databases
pytest tests/test_multi_project.py --integration -v
```
### Time Estimate
- **Duration**: 2 hours
- **Dependencies**: Phase 7 complete
---
## Phase 9: Update Documentation
### Objective
Update all documentation to reflect search-only scope
### Tasks
1. **Update README.md**
- Remove: Work Items, Tasks, Vendors, Deployments sections
- Add: Multi-Project Support section
- Update: Tool list (only 2 tools)
- Update: Architecture diagram (simplified)
2. **Update API Documentation**
Edit `docs/api.md`:
- Remove: All non-search tool documentation
- Update: `index_repository` with `project_id` parameter
- Update: `search_code` with `project_id` parameter
- Add: Examples with multi-project usage (one candidate example is sketched below)
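One candidate usage example for `docs/api.md` (illustrative only; the repository path and project IDs are placeholders, and the calls assume an async context):
```python
from codebase_mcp.tools.search import (
    IndexRepositoryParams, SearchCodeParams, index_repository, search_code,
)


async def demo() -> None:
    # Index the same repository under two isolated projects
    await index_repository(IndexRepositoryParams(repo_path="/home/dev/acme-api", project_id="acme"))
    await index_repository(IndexRepositoryParams(repo_path="/home/dev/acme-api", project_id="acme-fork"))

    # Explicit project_id: only chunks indexed under "acme" are searched
    hits = await search_code(SearchCodeParams(query="token refresh logic", project_id="acme"))

    # No project_id: the active project is resolved via workflow-mcp
    hits = await search_code(SearchCodeParams(query="token refresh logic"))
    print(hits["project_id"], hits["total_count"])
```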
3. **Update Architecture Documentation**
Edit `docs/architecture.md`:
- Update: Database schema diagram (only 2 tables)
- Add: Multi-project architecture diagram
- Add: workflow-mcp integration diagram
4. **Update CHANGELOG.md**
```markdown
# Changelog
## [2.0.0] - 2025-10-XX
### BREAKING CHANGES
- Removed all non-search functionality (work items, tasks, vendors, deployments, project configuration)
- Moved removed functionality to `workflow-mcp` (install separately)
- Added `project_id` parameter to `index_repository` and `search_code`
- Database schema simplified (only `repositories` and `code_chunks` tables)
### Added
- Multi-project support (one database per project)
- Integration with workflow-mcp for active project context
- Automatic database creation for new projects
- Per-project connection pooling
### Removed
- `create_work_item`, `list_work_items`, `query_work_item`, `update_work_item`
- `create_task`, `get_task`, `list_tasks`, `update_task`
- `record_deployment`
- `create_vendor`, `query_vendor_status`, `update_vendor_status`
- `get_project_configuration`, `update_project_configuration`
### Migration
- See `docs/migration-guide.md` for migration instructions
- Database migration script: `migrations/002_remove_non_search_tables.sql`
```
### Acceptance Criteria
- [ ] README.md updated (search-only scope)
- [ ] API docs updated (only 2 tools)
- [ ] Architecture docs updated (simplified diagrams)
- [ ] CHANGELOG.md documents breaking changes
- [ ] All documentation reviewed for accuracy
### Git Strategy
```bash
git add README.md docs/ CHANGELOG.md
git commit -m "docs(refactor): update documentation for search-only scope and multi-project support"
```
### Testing
- Manual review of all documentation
- Verify links work (no broken references)
### Time Estimate
- **Duration**: 2 hours
- **Dependencies**: Phase 8 complete
---
## Phase 10: Create Migration Guide
### Objective
Provide clear migration path for existing users
### Tasks
1. **Create Migration Guide**
Create `docs/migration-guide.md` (see refactoring-plan.md for full content)
2. **Test Migration Instructions**
- Follow migration guide step-by-step on test database
- Verify all steps work as documented
- Update guide based on testing results
3. **Create Migration Checklist**
```markdown
# Migration Checklist
- [ ] Backup existing database: `pg_dump codebase_mcp > backup.sql`
- [ ] Install workflow-mcp: `pip install workflow-mcp`
- [ ] Update MCP client config (add workflow-mcp server)
- [ ] Run migration script: `psql codebase_mcp < migrations/002_remove_non_search_tables.sql`
- [ ] Verify migration: `psql codebase_mcp -c "\dt"`
- [ ] Update tool calls (add `project_id` parameter)
- [ ] Test search functionality
- [ ] Migrate work items/tasks data to workflow-mcp (see workflow-mcp docs)
```
### Acceptance Criteria
- [ ] Migration guide created and comprehensive
- [ ] Migration tested on real database
- [ ] Checklist provided for users
- [ ] Common issues documented with solutions
### Git Strategy
```bash
git add docs/migration-guide.md
git commit -m "docs(migration): add comprehensive migration guide for v2.0.0"
```
### Testing
- Manual testing of migration steps
- Validate backup/restore process
### Time Estimate
- **Duration**: 2 hours
- **Dependencies**: Phase 9 complete
---
## Phase 11: Performance Testing and Optimization
### Objective
Validate performance meets constitutional requirements
### Tasks
1. **Create Performance Test Suite**
Create `tests/test_performance.py`:
```python
import time

import pytest

from codebase_mcp.tools.search import (
    IndexRepositoryParams,
    SearchCodeParams,
    index_repository,
    search_code,
)

@pytest.mark.performance
@pytest.mark.asyncio
async def test_search_latency_p95(large_indexed_repo):
"""Test search latency meets <500ms p95 target"""
latencies = []
# Run 100 searches
for i in range(100):
start = time.time()
result = await search_code(
SearchCodeParams(
query=f"test query {i}",
project_id="large-repo",
limit=10
)
)
latency = (time.time() - start) * 1000 # Convert to ms
latencies.append(latency)
# Calculate p95
latencies.sort()
p95 = latencies[int(0.95 * len(latencies))]
p50 = latencies[int(0.50 * len(latencies))]
print(f"Search latency - p50: {p50:.2f}ms, p95: {p95:.2f}ms")
assert p95 < 500, f"p95 latency {p95:.2f}ms exceeds 500ms target"
assert p50 < 300, f"p50 latency {p50:.2f}ms exceeds 300ms target"
@pytest.mark.performance
@pytest.mark.asyncio
async def test_indexing_throughput(temp_repo_10k_files):
"""Test indexing meets <60s for 10k files target"""
start = time.time()
result = await index_repository(
IndexRepositoryParams(
repo_path=temp_repo_10k_files,
project_id="perf-test",
force_reindex=True
)
)
duration = time.time() - start
print(f"Indexed {result['files_indexed']} files in {duration:.2f}s")
assert duration < 60, f"Indexing took {duration:.2f}s, exceeds 60s target"
assert result["status"] == "success"
```
2. **Run Performance Benchmarks**
```bash
# Run performance tests
pytest tests/test_performance.py -v -m performance
# Generate performance report
pytest tests/test_performance.py --benchmark-only --benchmark-json=perf-results.json
```
3. **Optimize if Needed**
- If performance targets are not met, profile and optimize (see the concurrency sketch after this list):
- HNSW index parameters (m, ef_construction)
- Connection pool size
- Batch sizes for indexing
- Ollama embedding generation (concurrent requests)
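For the last bullet, a sketch of bounding concurrent embedding requests (the embedding callable and the concurrency limit are assumptions to tune against Ollama's capacity):
```python
import asyncio
from typing import Awaitable, Callable


async def generate_embeddings(
    chunks: list[str],
    embed_one: Callable[[str], Awaitable[list[float]]],  # e.g. the existing single-text Ollama helper
    max_concurrency: int = 8,  # assumption: tune against Ollama throughput
) -> list[list[float]]:
    """Embed chunks concurrently, never issuing more than max_concurrency requests at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> list[float]:
        async with semaphore:
            return await embed_one(text)

    return await asyncio.gather(*(bounded(chunk) for chunk in chunks))
```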
### Acceptance Criteria
- [ ] Search latency <500ms (p95)
- [ ] Indexing throughput <60s for 10k files
- [ ] Performance tests pass consistently (3+ runs)
- [ ] Performance results documented
### Git Strategy
```bash
git add tests/test_performance.py
git commit -m "test(performance): add performance benchmarks for search and indexing"
```
### Testing
```bash
# Run performance suite multiple times
for i in {1..3}; do
pytest tests/test_performance.py -v -m performance
done
```
### Time Estimate
- **Duration**: 3 hours (includes optimization if needed)
- **Dependencies**: Phase 10 complete
---
## Phase 12: Final Validation and Release Preparation
### Objective
Validate all requirements met and prepare for release
### Tasks
1. **Run Full Test Suite**
```bash
# All tests
pytest tests/ -v --cov=src/codebase_mcp --cov-report=html
# Verify coverage >80%
open htmlcov/index.html
```
2. **Run mypy Type Checking**
```bash
mypy --strict src/codebase_mcp/
# Expected: Success: no issues found
```
3. **Run MCP Protocol Compliance Tests**
```bash
# Start server
python -m codebase_mcp.server &
# Run mcp-inspector
mcp-inspector http://localhost:8000 --full-validation
# Expected: 100% protocol compliance
```
4. **Validate Against Constitution**
- [ ] Principle I: Simplicity Over Features (only search tools)
- [ ] Principle II: Local-First Architecture (Ollama, PostgreSQL local)
- [ ] Principle III: Protocol Compliance (mcp-inspector passes)
- [ ] Principle IV: Performance Guarantees (benchmarks pass)
- [ ] Principle V: Production Quality (mypy passes, coverage >80%)
- [ ] Principle VI: Specification-First (spec.md exists, approved)
- [ ] Principle VII: TDD (all tests pass)
- [ ] Principle VIII: Pydantic Type Safety (all models use Pydantic)
- [ ] Principle IX: Orchestrated Subagents (N/A for refactor)
- [ ] Principle X: Git Micro-Commits (verified in commit history)
- [ ] Principle XI: FastMCP Foundation (FastMCP used)
5. **Compare Baseline vs Final State**
```bash
# Final metrics
cloc src/ tests/ > docs/baseline/loc-after.txt
pytest tests/ --cov=src/codebase_mcp > docs/baseline/tests-after.txt
# Generate comparison report
echo "## Refactor Results" > docs/baseline/comparison.md
echo "" >> docs/baseline/comparison.md
echo "### Before:" >> docs/baseline/comparison.md
cat docs/baseline/loc-before.txt >> docs/baseline/comparison.md
echo "" >> docs/baseline/comparison.md
echo "### After:" >> docs/baseline/comparison.md
cat docs/baseline/loc-after.txt >> docs/baseline/comparison.md
```
6. **Update Version Number**
```bash
# Update version in pyproject.toml
sed -i 's/version = "1.0.0"/version = "2.0.0"/' pyproject.toml
# Update version in __init__.py
sed -i 's/__version__ = "1.0.0"/__version__ = "2.0.0"/' src/codebase_mcp/__init__.py
```
7. **Create Release Tag**
Tag `v2.0.0` after the release commit; the tag command is included in the Git Strategy below, so it is not repeated here.
### Acceptance Criteria
- [ ] All tests pass (100%)
- [ ] Coverage >80%
- [ ] mypy --strict passes (0 errors)
- [ ] mcp-inspector validation passes (100%)
- [ ] All constitutional principles satisfied
- [ ] Version bumped to 2.0.0
- [ ] Release tag created
### Git Strategy
```bash
git add pyproject.toml src/codebase_mcp/__init__.py docs/baseline/
git commit -m "chore(release): prepare v2.0.0 release"
git tag -a v2.0.0 -m "Release v2.0.0: Pure semantic search MCP"
```
### Testing
- Full regression testing (all tests)
- Manual smoke testing (index + search)
### Time Estimate
- **Duration**: 2 hours
- **Dependencies**: Phase 11 complete
---
## Post-Implementation: Merge and Deploy
### Tasks
1. **Create Pull Request**
```bash
git push origin 002-refactor-pure-search
git push origin v2.0.0
# Create PR via GitHub CLI
gh pr create \
--title "Refactor: Pure semantic search MCP with multi-project support" \
--body "$(cat docs/mcp-split-plan/01-codebase-mcp/refactoring-plan.md)"
```
2. **Code Review**
- Review all changes systematically
- Verify constitutional compliance
- Check documentation completeness
- Validate test coverage
3. **Merge to Main**
```bash
# After approval
gh pr merge 002-refactor-pure-search --squash
```
4. **Deploy Documentation**
- Update README on repository homepage
- Publish migration guide
- Announce breaking changes (v2.0.0)
### Time Estimate
- **Duration**: 1-2 hours (review time varies)
- **Dependencies**: Phase 12 complete
---
## Timeline Summary
| Phase | Description | Duration | Dependencies |
|-------|-------------|----------|--------------|
| 0 | Prerequisites | 1h | None |
| 1 | Branch & Baseline | 1h | Phase 0 |
| 2 | Database Schema | 2-3h | Phase 1 |
| 3 | Remove Tool Files | 1h | Phase 2 |
| 4 | Remove DB Operations | 2h | Phase 3 |
| 5 | Update Server | 1h | Phase 4 |
| 6 | Remove Tests | 1h | Phase 5 |
| 7 | Multi-Project Support | 3h | Phase 6 |
| 8 | Connection Pooling | 2h | Phase 7 |
| 9 | Documentation | 2h | Phase 8 |
| 10 | Migration Guide | 2h | Phase 9 |
| 11 | Performance Testing | 3h | Phase 10 |
| 12 | Final Validation | 2h | Phase 11 |
| Post | PR & Merge | 1-2h | Phase 12 |
**Total Estimated Time**: 24-26 hours (3-4 days with testing and validation)
---
## Success Criteria (Overall)
### Functional
- [ ] Only 2 MCP tools: `index_repository`, `search_code`
- [ ] Multi-project support works (isolation validated)
- [ ] workflow-mcp integration works (active project detection)
- [ ] Requires an explicit `project_id` (with a clear error) when workflow-mcp is unavailable
### Non-Functional
- [ ] Search latency <500ms (p95)
- [ ] Indexing throughput <60s for 10k files
- [ ] 100% MCP protocol compliance
- [ ] Type-safe: mypy --strict passes
- [ ] Test coverage >80%
### Process
- [ ] All phases completed in order
- [ ] Micro-commits after each phase (working state)
- [ ] Documentation updated and comprehensive
- [ ] Migration guide tested and accurate
### Constitutional Compliance
- [ ] All 11 principles satisfied
- [ ] No scope creep (only search)
- [ ] Local-first (no cloud dependencies)
- [ ] Production quality (error handling, logging, types)
---
## Risk Management
### Risk: Performance Regression
**Mitigation**: Performance tests in Phase 11, optimize before release
### Risk: Data Loss During Migration
**Mitigation**: Migration guide emphasizes backup, migration tested on test DB first
### Risk: Breaking Changes Impact Users
**Mitigation**: Clear migration guide, semantic versioning (v2.0.0), deprecation notices
### Risk: workflow-mcp Integration Failures
**Mitigation**: Fallback to explicit project_id, clear error messages, integration tests
---
## Next Steps After Completion
1. **Monitor Production Usage**
- Track error rates, latency percentiles
- Collect user feedback on migration experience
2. **Future Enhancements** (v2.1.0+)
- Add incremental indexing (only changed files)
- Support for additional languages (via tree-sitter)
- Query suggestions (based on indexed code)
- Search analytics (popular queries, result relevance)
3. **Integration with Other MCPs**
- Collaborate with workflow-mcp for deeper integration
- Support project templates (auto-index on project creation)