Codebase MCP Server

codebase-mcp
specs
006-database-schema-refactoring

plan.md•20.9 KiB

# Implementation Plan: Database Schema Refactoring for Multi-Project Support **Branch**: `006-database-schema-refactoring` | **Date**: 2025-10-11 | **Spec**: [spec.md](./spec.md) **Input**: Feature specification from `/Users/cliffclarke/Claude_Code/codebase-mcp/specs/006-database-schema-refactoring/spec.md` ## Execution Flow (/plan command scope) ``` 1. Load feature spec from Input path → ✅ COMPLETE: Spec loaded successfully 2. Fill Technical Context (scan for NEEDS CLARIFICATION) → ✅ COMPLETE: Context filled, single Python project detected 3. Fill the Constitution Check section based on the content of the constitution document. → IN PROGRESS 4. Evaluate Constitution Check section below → PENDING 5. Execute Phase 0 → research.md → PENDING 6. Execute Phase 1 → contracts, data-model.md, quickstart.md, CLAUDE.md → PENDING 7. Re-evaluate Constitution Check section → PENDING 8. Plan Phase 2 → Describe task generation approach (DO NOT create tasks.md) → PENDING 9. STOP - Ready for /tasks command → PENDING ``` **IMPORTANT**: The /plan command STOPS at step 8. Phases 2-4 are executed by other commands: - Phase 2: /tasks command creates tasks.md - Phase 3-4: Implementation execution (manual or via tools) ## Summary Refactor the Codebase MCP Server database schema to remove 9 unused tables from non-search features (work tracking, vendor management, deployments) and add `project_id` columns to `repositories` and `code_chunks` tables. This establishes the foundation for multi-project support (Phase 03) while simplifying the database to focus exclusively on semantic code search functionality. **Key Technical Approach**: - Alembic migration with atomic transaction execution - Database-level validation via CHECK constraints (regex pattern `^[a-z0-9-]{1,50}$`) - Referential integrity preservation (code_chunks copy project_id from parent repository) - Performance target: < 5 minutes migration duration - Safety: Single transaction with automatic rollback, comprehensive logging ## Technical Context **Language/Version**: Python 3.11+ (confirmed in constitution) **Primary Dependencies**: PostgreSQL 14+ with pgvector, SQLAlchemy (async), psycopg2/asyncpg **Storage**: PostgreSQL 14+ (existing database, migration-based schema changes) **Testing**: pytest with contract tests, integration tests, performance validation **Target Platform**: Linux/macOS server (local-first architecture) **Project Type**: Single Python project (MCP server) **Performance Goals**: Migration < 5 minutes, no impact to search query performance (< 500ms p95) **Constraints**: - Single atomic transaction (all-or-nothing rollback) - Zero data loss for repositories/code_chunks - Database-level validation enforcement - Transactional DDL support (PostgreSQL standard) **Scale/Scope**: - Testing: 100 repositories, 10,000 code chunks minimum - Production: Supports fresh start with workflow-mcp integration - 9 tables dropped, 2 tables modified, 3 SQL scripts created ## Constitution Check *GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.* ### Principle I: Simplicity Over Features **Status**: ✅ PASS - **Rationale**: This feature REMOVES complexity (9 unused tables) while adding minimal foundation (project_id columns) for future multi-project support. Directly serves the "Simplicity Over Features" principle by cleaning up schema cruft. ### Principle II: Local-First Architecture **Status**: ✅ PASS - **Rationale**: Database migration is purely local. No cloud dependencies, no external API calls. PostgreSQL remains the single source of truth. ### Principle III: Protocol Compliance (MCP via SSE) **Status**: ✅ PASS - **Rationale**: Schema changes are orthogonal to MCP protocol. No changes to protocol communication, logging remains file-based (`/tmp/codebase-mcp.log`). ### Principle IV: Performance Guarantees **Status**: ✅ PASS (with monitoring) - **Rationale**: - Migration performance explicitly specified (< 5 minutes - FR-031) - Index added on (project_id, repository_id) for search performance (FR-008) - No degradation to existing 500ms search p95 target (verified through testing) ### Principle V: Production Quality Standards **Status**: ✅ PASS - **Rationale**: - Comprehensive error handling: Single transaction with automatic rollback (FR-018) - Clear error messages: Fail fast with explicit errors (FR-019) - Logging: All major migration steps logged (FR-020) - Validation: CHECK constraints at database level (FR-024) - Testing: Requires testing on database copy before production (FR-027) ### Principle VI: Specification-First Development **Status**: ✅ PASS - **Rationale**: Spec completed with clarifications before planning. 31 functional requirements defined with measurable acceptance criteria. ### Principle VII: Test-Driven Development **Status**: ✅ PASS - **Rationale**: Migration requires validation script (FR-015), schema validation tests (FR-030), and rollback testing (FR-029). Tests verify correctness before and after migration. ### Principle VIII: Pydantic-Based Type Safety **Status**: ✅ PASS (post-migration) - **Rationale**: While Alembic migrations don't use Pydantic directly, Phase 02 will update Pydantic models (Repository, CodeChunk) to include project_id field with validation. This phase lays database foundation. ### Principle IX: Orchestrated Subagent Execution **Status**: ✅ PASS - **Rationale**: Implementation phase (Phase 02+) will use subagents for parallel test writing and script creation where appropriate. ### Principle X: Git Micro-Commit Strategy **Status**: ✅ PASS - **Rationale**: Feature branch `006-database-schema-refactoring` already created. Implementation will use micro-commits per task with Conventional Commits format. ### Principle XI: FastMCP and Python SDK Foundation **Status**: ✅ PASS (no changes required) - **Rationale**: Schema changes don't affect FastMCP tool registration. MCP server continues using FastMCP decorators unchanged. **Initial Assessment**: NO VIOLATIONS - All constitutional principles satisfied. ## Project Structure ### Documentation (this feature) ``` specs/006-database-schema-refactoring/ ├── spec.md # Feature specification (complete) ├── plan.md # This file (/plan command output) ├── research.md # Phase 0 output (in progress) ├── data-model.md # Phase 1 output (pending) ├── quickstart.md # Phase 1 output (pending) ├── contracts/ # Phase 1 output (pending) └── tasks.md # Phase 2 output (/tasks command - NOT created by /plan) ``` ### Source Code (repository root) ``` src/ ├── models/ # Pydantic models (Phase 02: update for project_id) ├── services/ # Business logic (Phase 02: update queries) ├── database/ # Database connection & schema └── mcp/ # FastMCP server implementation migrations/ # Alembic database migration scripts ├── versions/ # Alembic migration versions │ ├── 001_initial_schema.py # Existing baseline (Alembic) │ ├── 003_project_tracking.py # Existing (Alembic) │ ├── 005_case_insensitive_vendor.py # Existing (Alembic) │ └── 002_remove_non_search_tables.py # NEW: This migration (Alembic) ├── env.py # Alembic environment configuration └── script.py.mako # Alembic template tests/ ├── contract/ # Contract tests (minimal for schema changes) ├── integration/ # Migration integration tests │ ├── test_migration_002_upgrade.py # NEW: Test upgrade execution │ ├── test_migration_002_downgrade.py # NEW: Test downgrade execution │ └── test_migration_002_validation.py # NEW: Validation test suite ├── unit/ # Unit tests for validation logic └── performance/ # Migration performance tests └── test_migration_002_performance.py # NEW: Performance test tests/fixtures/ └── generate_test_data.py # NEW: Test data generation utility ``` **Structure Decision**: Single Python project with standard src/tests layout. Database migrations use **Alembic** (existing migration tool), with migrations in `migrations/versions/` following Alembic's auto-generated revision IDs. This feature adds migration 002 as an Alembic Python migration that revises 005. ## Phase 0: Research & Design Decisions **Research Complete**: ✅ ### R1: Migration Tool Selection **Decision**: Use Alembic (existing migration framework) with transactional DDL execution **Rationale**: - Project already uses Alembic (001_initial_schema.py, 003, 005 exist) - Alembic handles transactions automatically (single transaction per upgrade/downgrade) - Built-in migration tracking via `alembic_version` table - Better integration with SQLAlchemy (existing ORM) - Automatic rollback on any error (connection loss, constraint violation, etc.) - Standard tooling: `alembic upgrade head`, `alembic downgrade -1` **Alternatives Considered**: - Raw SQL scripts: Rejected (creates parallel migration system, manual tracking required) - Manual transactions: Rejected (Alembic handles this correctly) ### R2: project_id Validation Strategy **Decision**: Multi-layer validation (database CHECK constraint + Pydantic validator) **Rationale**: - Database CHECK constraint: Last line of defense, prevents SQL injection, enforces at insert/update - Pydantic validator (Phase 02): Early validation in application code, better error messages - Regex pattern `^[a-z0-9-]{1,50}$`: Simple, database-native, no extensions needed **Alternatives Considered**: - Application-only validation: Rejected (can be bypassed via direct DB access) - Custom PostgreSQL domain type: Rejected (unnecessary complexity for simple pattern) - ENUM type: Rejected (inflexible, requires DDL for each new project_id) ### R3: Foreign Key Verification Approach **Decision**: Query `information_schema.table_constraints` and `information_schema.referential_constraints` before dropping tables **Rationale**: - Standard SQL approach, works across PostgreSQL versions - Explicit verification prevents accidental data corruption - Clear error message identifies problematic foreign keys - Fails fast before any destructive operations **Alternatives Considered**: - `DROP TABLE ... CASCADE`: Rejected (too aggressive, hides potential issues) - Manual inspection: Rejected (error-prone, not automatable) - Rely on PostgreSQL error: Rejected (less clear error messages, fails after starting drops) ### R4: Migration Idempotency Strategy **Decision**: Rely on Alembic's built-in tracking with defensive IF EXISTS/IF NOT EXISTS clauses **Rationale**: - Alembic's `alembic_version` table tracks applied migrations automatically - `alembic upgrade head` only runs unapplied migrations - Use `IF EXISTS` / `IF NOT EXISTS` as defense-in-depth (protects against manual schema changes) - Clear status: `alembic current` shows current migration state - Idempotency built into framework: running `alembic upgrade head` twice is safe (no-op on second run) **Alternatives Considered**: - Manual version tracking: Rejected (Alembic provides this) - No idempotency checks: Rejected (defensive programming principle) - Schema comparison: Rejected (Alembic tracks at migration level, not schema level) ### R5: Code Chunk project_id Population Strategy **Decision**: Three-step ADD COLUMN approach with UPDATE between steps **Rationale**: ```sql -- Step 1: Add nullable column ALTER TABLE code_chunks ADD COLUMN project_id VARCHAR(50); -- Step 2: Populate from parent repository UPDATE code_chunks SET project_id = ( SELECT project_id FROM repositories WHERE repositories.id = code_chunks.repository_id ); -- Step 3: Add NOT NULL constraint + CHECK constraint ALTER TABLE code_chunks ALTER COLUMN project_id SET NOT NULL, ADD CONSTRAINT check_project_id_pattern CHECK (project_id ~ '^[a-z0-9-]{1,50}$'); ``` - Approach: Separate steps within single transaction - Handles existing data gracefully (copies from parent) - Validates referential integrity (UPDATE fails if orphan chunks exist) - NOT NULL constraint applied after population ensures no NULLs **Alternatives Considered**: - Direct ADD COLUMN with NOT NULL + UPDATE: Rejected (PostgreSQL doesn't allow UPDATE between constraint checks) - Default value: Rejected (loses referential integrity, assigns 'default' to all chunks) - Two separate transactions: Rejected (violates single-transaction requirement) ### R6: Migration Performance Optimization **Decision**: Create index AFTER data population, use CONCURRENTLY for production (optional step) **Rationale**: - Index creation on existing data: Faster to populate first, then index - `CREATE INDEX CONCURRENTLY`: Optional for production (allows reads during index build, but not transactional) - For initial migration (minimal data): Standard index creation acceptable (< 1 second) - Performance target: < 5 minutes easily met with up to 100K rows **Alternatives Considered**: - Index before populating: Rejected (slower, each UPDATE incurs index maintenance cost) - No performance optimization: Rejected (violates FR-031 performance requirement) ### R7: Logging and Observability Strategy **Decision**: Use Python logging within Alembic migration + SQL RAISE NOTICE statements **Rationale**: - Python `logger.info()` statements in migration code: Structured, timestamped, configurable - `RAISE NOTICE` for SQL-level progress: Inline visibility during SQL execution - Alembic CLI captures all output by default (stdout/stderr) - Log to `/tmp/codebase-mcp-migration.log` via Python logging configuration - Better exception context: Python stack traces + SQL error messages **Alternatives Considered**: - SQL-only logging: Rejected (less structured, harder to parse) - No logging: Rejected (violates FR-020 logging requirement) - Custom logging table: Rejected (overkill, complicates transaction isolation) ### R8: Rollback Design Pattern (Alembic Downgrade) **Decision**: Implement `downgrade()` function with inverse operations + data restoration caveat **Rationale**: - Alembic's `downgrade()` function mirrors `upgrade()`: Inverse DDL operations - `DROP COLUMN` mirrors `ADD COLUMN`, `CREATE TABLE` mirrors `DROP TABLE` - Schema-only rollback: Structure restored, data in dropped tables lost (acceptable per spec) - Data restoration requires backup: Documented in migration docstring - Fast execution: `alembic downgrade -1` runs in < 30 seconds - Version tracking: Alembic updates `alembic_version` table automatically **Alternatives Considered**: - Separate rollback script: Rejected (Alembic provides downgrade function) - No downgrade implementation: Rejected (violates FR-014 reversibility requirement) - Data restoration in downgrade: Rejected (slow, requires backup in migration code) ### R9: Database Connection Configuration (Alembic) **Decision**: Use existing Alembic `env.py` configuration with `DATABASE_URL` environment variable **Rationale**: - Alembic already configured in project (`migrations/env.py` exists) - Follows standard pattern: `DATABASE_URL=postgresql://user:pass@host/dbname` - Existing config likely reads from environment or application config - For testing: Override via `alembic -x dbname=codebase_mcp_test upgrade head` - Production: Use existing `DATABASE_URL` from application deployment **Alternatives Considered**: - Hardcoded connection strings: Rejected (insecure, inflexible) - Custom config file: Rejected (Alembic env.py is standard) - Command-line arguments: Rejected (env vars more secure, standard practice) **Implementation Pattern**: ```python # migrations/env.py (already exists, use existing pattern) config.set_main_option("sqlalchemy.url", os.getenv("DATABASE_URL")) ``` ## Phase 1: Design & Contracts ### Data Model Changes See [data-model.md](./data-model.md) for complete entity definitions. **Summary of Changes**: - **repositories table**: Add `project_id VARCHAR(50) NOT NULL DEFAULT 'default'` with CHECK constraint - **code_chunks table**: Add `project_id VARCHAR(50) NOT NULL` with CHECK constraint - **Index**: Create `idx_project_repository` on `(project_id, repository_id)` for multi-project queries - **9 tables dropped**: work_items, work_item_dependencies, tasks, task_branches, task_commits, vendors, vendor_test_results, deployments, deployment_vendors ### API Contracts **Note**: This feature is database schema-only. No API changes. Phase 02 will update MCP tool implementations to use project_id. See [contracts/](./contracts/) for: - `migration_execution.md`: Manual execution contract (SQL script interface) - `validation_contract.md`: Post-migration validation expectations ### Integration Test Scenarios See [quickstart.md](./quickstart.md) for complete test execution flow. **Key Scenarios**: 1. **Fresh database migration**: Execute on empty database, verify schema changes 2. **Migration with existing data**: Execute on database with sample repositories/chunks, verify data preservation 3. **Idempotency test**: Run migration twice, verify second run skips gracefully 4. **Rollback test**: Apply migration, execute rollback, verify schema restoration 5. **Performance test**: Execute on 100 repos + 10K chunks, verify < 5 minute duration 6. **Validation test**: Attempt invalid project_id insertions, verify CHECK constraint rejection ## Phase 2: Task Planning Approach *This section describes what the /tasks command will do - DO NOT execute during /plan* **Task Generation Strategy**: 1. Load `.specify/templates/tasks-template.md` as base structure 2. Generate tasks from Phase 1 artifacts: - **Alembic migration task**: Create `002_remove_non_search_tables.py` with upgrade/downgrade functions - **Test utility task**: Create test data generation script - **Test tasks**: Validation tests, integration tests (upgrade/downgrade), performance tests - **Documentation tasks**: Update migration docs, add troubleshooting guide 3. Task ordering: - Phase 01A: Create test data generation utility - Phase 01B: Write Alembic migration (upgrade + downgrade functions) - Phase 01C: Write validation test suite [P] - Phase 01D: Write integration tests (upgrade) [P] - Phase 01E: Write integration tests (downgrade) [P] - Phase 01F: Write performance tests [P] - Phase 01G: Execute migration on test database - Phase 01H: Run all tests and validate - Phase 01I: Update documentation **Ordering Strategy**: - **TDD order**: Tests before migration execution (but migration code needed for testing) - **Dependency order**: Test data utility first, then migration code, then tests, then execution - **Parallel execution**: All test writing can happen in parallel ([P] marker) after migration code exists - **Safety order**: All tests must pass before marking migration complete **Estimated Output**: 10-12 tasks in tasks.md - 1 utility creation task - 1 Alembic migration creation task (upgrade + downgrade in one file) - 5 test creation tasks (parallel) - 1 execution task (run alembic upgrade on test DB) - 1 validation task (run all tests) - 1 documentation task **IMPORTANT**: This phase is executed by the /tasks command, NOT by /plan ## Phase 3+: Future Implementation *These phases are beyond the scope of the /plan command* **Phase 01 (this phase)**: Alembic migration for database schema changes **Phase 02**: Update Python code (models, services, MCP tools) to use project_id **Phase 03**: Implement multi-project search/indexing logic **Phase 04**: Connection pool for database-per-project architecture ## Complexity Tracking **No constitutional violations requiring justification**. All complexity in this feature is essential and serves constitutional principles: - Alembic migrations: Required for schema evolution (Production Quality) - Transaction wrapping: Required for atomic operation (Production Quality) - Three-step column addition: Required for data integrity (Production Quality) - Validation layers: Required for security (Production Quality) ## Progress Tracking **Phase Status**: - [X] Phase 0: Research complete (/plan command) - [X] Phase 1: Design complete (/plan command) - [X] Phase 2: Task planning approach described (/plan command) - [ ] Phase 3: Tasks generated (/tasks command) - [ ] Phase 4: Implementation complete - [ ] Phase 5: Validation passed **Gate Status**: - [X] Initial Constitution Check: PASS - [X] Post-Design Constitution Check: PASS (re-evaluated after Phase 1) - [X] All NEEDS CLARIFICATION resolved (none existed after /clarify) - [X] Complexity deviations documented (none required) **Execution Flow Status**: 1. ✅ Load feature spec from Input path 2. ✅ Fill Technical Context 3. ✅ Fill Constitution Check section 4. ✅ Evaluate Constitution Check (PASS) 5. ✅ Execute Phase 0 (research decisions documented) 6. ✅ Execute Phase 1 (artifacts generated) 7. ✅ Re-evaluate Constitution Check (PASS) 8. ✅ Plan Phase 2 (approach described) 9. ✅ STOP - Ready for /tasks command --- *Based on Constitution v2.2.0 - See `.specify/memory/constitution.md`* *Feature Specification: `specs/006-database-schema-refactoring/spec.md`* *Branch: `006-database-schema-refactoring` | Created: 2025-10-11*

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

plan.md•20.9 KiB