# codebase-mcp Planning Review
**Date**: 2025-10-11
**Reviewer**: Senior Planning Reviewer
**Plan Version**: Initial (Pre-Implementation)
---
## Executive Summary
The codebase-mcp refactoring plan demonstrates **strong architectural thinking** with clear separation of concerns, comprehensive documentation, and adherence to constitutional principles. The plan successfully addresses the core objective: transforming a monolithic MCP into a pure semantic code search service with multi-project support.
**Overall Quality**: 8/10 (Good to Excellent)
**Strengths**:
- Clear constitutional framework with non-negotiable principles
- Comprehensive user stories with testable acceptance criteria
- Detailed phase-by-phase implementation roadmap
- Strong emphasis on testing (TDD, performance, protocol compliance)
- Well-documented migration path for existing users
**Critical Concerns**:
1. **Missing database naming strategy** - No clear specification for how project databases are named/discovered
2. **Incomplete error handling specification** - workflow-mcp integration failure scenarios need more detail
3. **No rollback strategy** - Missing plan for reverting if refactor fails mid-implementation
4. **Performance baseline missing** - No current performance metrics to compare against
5. **Concurrent access patterns undefined** - Multiple clients accessing same project database
**Recommendation**: **APPROVED WITH CHANGES**
Address critical issues #1-3 before implementation begins. Issues #4-5 can be addressed during Phase 0-1.
---
## Critical Issues (Must Fix Before Implementation)
### C1: Database Naming and Discovery Strategy Incomplete
**Location**: `refactoring-plan.md` Phase 8, `specify-prompt.txt` Questions section
**Problem**: The plan mentions `database_name = f"codebase_{project_id}"` but doesn't specify:
- How project_id format is validated (alphanumeric only? length limits?)
- How database names are sanitized (what if project_id contains spaces/special chars?)
- How the system discovers what databases exist (list all `codebase_*` databases?)
- What happens if database creation fails due to permissions
**Impact**: Runtime errors, SQL injection risks, unclear operational model
**Required Changes**:
1. Add validation rules for project_id:
```python
# Example specification needed
project_id: str = Field(
    pattern=r'^[a-z0-9-]{1,50}$',
    description="Lowercase alphanumeric with hyphens, max 50 chars"
)
```
2. Document database naming convention explicitly:
```markdown
## Database Naming Convention
- Format: `codebase_<project_id>`
- Example: project_id="my-app" → database="codebase_my-app"
- Invalid characters replaced with hyphens
- Length limit: 63 chars (PostgreSQL identifier limit)
```
3. Add database discovery endpoint specification:
```markdown
### Future Tool (v2.1.0)
- `list_indexed_projects()` → returns list of project_ids with databases
```
4. Specify permission handling in `tech-stack.md` (a combined sketch of items 1, 2, and 4 follows this list):
```markdown
## Database Permissions
- User must have CREATEDB privilege
- Error message: "Insufficient permissions to create database. Grant CREATEDB: ALTER USER postgres CREATEDB;"
```
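As referenced above, a minimal Python sketch combining the validation from item 1, the name derivation from item 2, and the permission handling from item 4. Function names are illustrative; asyncpg maps PostgreSQL's permission error (42501) to `InsufficientPrivilegeError`:
```python
import re
import asyncpg

PROJECT_ID_RE = re.compile(r'^[a-z0-9-]{1,50}$')

def database_name_for(project_id: str) -> str:
    """Validate project_id and derive the database name (items 1-2)."""
    if not PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(
            "Invalid project_id format. Use lowercase alphanumeric with hyphens."
        )
    # "codebase_" prefix (9 chars) + 50-char project_id stays under the
    # 63-char PostgreSQL identifier limit
    return f"codebase_{project_id}"

async def create_project_database(admin_conn: asyncpg.Connection, project_id: str) -> None:
    """Create the per-project database, surfacing the permission error from item 4."""
    db_name = database_name_for(project_id)
    try:
        # Identifier is safe to interpolate: validated against PROJECT_ID_RE above
        await admin_conn.execute(f'CREATE DATABASE "{db_name}"')
    except asyncpg.exceptions.InsufficientPrivilegeError:
        raise RuntimeError(
            "Insufficient permissions to create database. "
            "Grant CREATEDB: ALTER USER postgres CREATEDB;"
        )
```
Because the identifier is validated against a strict character set before interpolation, the `CREATE DATABASE` statement is not injectable.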
### C2: workflow-mcp Integration Error Handling Underspecified
**Location**: `refactoring-plan.md` Phase 7, `user-stories.md` Story 3
**Problem**: The plan shows basic error handling (`return None` if workflow-mcp unavailable) but doesn't specify:
- How to distinguish between "workflow-mcp not installed" vs "workflow-mcp crashed" vs "network timeout"
- Whether to cache active_project_id (and for how long?)
- What happens if workflow-mcp returns invalid project_id (doesn't exist, no database)
- Whether to retry on transient failures
**Impact**: Poor user experience, silent failures, unclear debugging
**Required Changes**:
1. Add error categorization in `refactoring-plan.md`:
```python
class WorkflowMCPError(Exception):
    """Base class for workflow-mcp integration errors"""
    pass

class WorkflowMCPUnavailable(WorkflowMCPError):
    """workflow-mcp not running or unreachable"""
    message = "workflow-mcp unavailable. Provide explicit project_id parameter."

class WorkflowMCPTimeout(WorkflowMCPError):
    """workflow-mcp request timed out"""
    message = "workflow-mcp timed out. Retrying with explicit project_id..."
```
2. Specify caching strategy in `implementation-phases.md` Phase 7 (a Python sketch follows this list):
```markdown
### Active Project Caching
- Cache active_project_id for 60 seconds
- Invalidate on explicit project_id parameter
- Log cache hits/misses for observability
```
3. Add validation in `user-stories.md` Story 3:
```markdown
### Scenario 5: Invalid project_id from workflow-mcp
workflow-mcp returns project_id="nonexistent-project"
search_code(query="test", project_id=None)
# Expected: Error "Project 'nonexistent-project' not indexed. Run index_repository first."
```
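As referenced in item 2, a minimal sketch of the caching behavior, assuming a module-level `logger` and the `get_active_project_id()` helper from Phase 7:
```python
import time

_CACHE_TTL_SECONDS = 60
_cached_project_id: str | None = None
_cached_at: float = 0.0

async def resolve_project_id(explicit_project_id: str | None) -> str | None:
    """Resolve project_id with a 60s cache over workflow-mcp lookups (sketch)."""
    global _cached_project_id, _cached_at
    if explicit_project_id is not None:
        # Explicit parameter wins and invalidates the cached value
        _cached_project_id, _cached_at = None, 0.0
        return explicit_project_id
    if _cached_project_id and time.monotonic() - _cached_at < _CACHE_TTL_SECONDS:
        logger.debug("active_project_id cache hit")
        return _cached_project_id
    logger.debug("active_project_id cache miss")
    _cached_project_id = await get_active_project_id()  # Phase 7 helper
    _cached_at = time.monotonic()
    return _cached_project_id
```
An explicit `project_id` always wins and clears the cache, so stale workflow-mcp context can never override a user's choice.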
### C3: No Rollback Strategy Defined
**Location**: Missing from `refactoring-plan.md` and `implementation-phases.md`
**Problem**: The plan assumes linear success through all 12 phases. No specification for:
- What to do if Phase 7 (multi-project support) fails after Phase 2-6 deleted code
- How to revert database migrations if tests fail
- Whether to maintain a "last known good" branch
- How to handle partial migration state (some projects migrated, others not)
**Impact**: High risk of getting stuck in broken state mid-refactor
**Required Changes**:
1. Add "Rollback Strategy" section to `implementation-phases.md`:
````markdown
## Rollback Strategy
### Per-Phase Rollback
- Each phase has its own commit (micro-commits)
- Rollback: `git revert <phase-commit-hash>`
- Test rollback after each phase completion
### Database Migration Rollback
Create `migrations/002_rollback.sql`:
```sql
-- Restore non-search tables from backup
-- Remove project_id columns
-- Restore original unique constraints
```
### Emergency Rollback (Full Refactor)
If refactor must be abandoned:
1. Checkout main branch: `git checkout main`
2. Restore database backup: `psql codebase_mcp < backup-before-002.sql`
3. Document lessons learned for next attempt
````
2. Add phase-specific rollback instructions in each phase:
```markdown
### Phase 2: Database Schema Refactoring
**Rollback Plan**:
- If migration fails: `psql codebase_mcp < backup-before-002.sql`
- If tests fail after migration: Revert schema.sql, revert migration script
```
### C4: Performance Baseline Missing
**Location**: `implementation-phases.md` Phase 1, `user-stories.md` Story 5
**Problem**: Phase 11 defines performance targets (<500ms search, <60s indexing for 10k files), but the Phase 1 baseline doesn't capture current performance, so there is no way to prove the refactor avoids regressions.
**Impact**: Risk of shipping slower implementation, no data-driven optimization
**Required Changes**:
1. Add to Phase 1 baseline tasks in `implementation-phases.md`:
````markdown
### Phase 1: Additional Baseline Metrics
4. **Capture Performance Baseline**
```bash
# Create test repository with 10k files
python scripts/generate_test_repo.py --files=10000 --output=/tmp/test-repo
# Benchmark indexing (current implementation)
time python -m codebase_mcp.tools.search index /tmp/test-repo
# Benchmark search (100 queries, measure p50/p95)
python scripts/benchmark_search.py --queries=100 --output=docs/baseline/search-latency.json
```
5. **Document Baseline Performance**
Create `docs/baseline/performance-before.md`:
```markdown
# Performance Baseline (Before Refactor)
## Indexing
- 10,000 files: X seconds
- Throughput: Y files/second
## Search
- p50 latency: X ms
- p95 latency: Y ms
- p99 latency: Z ms
```
````
2. Add comparison requirement to Phase 11 acceptance criteria:
```markdown
### Acceptance Criteria
- [ ] Search latency <500ms (p95) AND no worse than 20% regression from baseline
- [ ] Indexing time <60s for 10k files AND no worse than 10% throughput regression from baseline
```
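The `scripts/benchmark_search.py` helper referenced in the baseline tasks doesn't exist yet; a minimal sketch of the percentile computation it needs (the `search_code` call stands in for whatever the current v1 entry point is):
```python
import json
import statistics
import time

async def benchmark_search(queries: list[str], output_path: str) -> None:
    """Run queries sequentially and write p50/p95/p99 latency to JSON (sketch)."""
    latencies_ms: list[float] = []
    for query in queries:
        start = time.perf_counter()
        await search_code(query)  # v1 entry point; adjust to the current signature
        latencies_ms.append((time.perf_counter() - start) * 1000)
    # statistics.quantiles(n=100) returns 99 cut points:
    # index 49 -> p50, index 94 -> p95, index 98 -> p99
    cuts = statistics.quantiles(latencies_ms, n=100)
    report = {"p50_ms": cuts[49], "p95_ms": cuts[94], "p99_ms": cuts[98]}
    with open(output_path, "w") as f:
        json.dump(report, f, indent=2)
```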
### C5: Concurrent Access Patterns Undefined
**Location**: `refactoring-plan.md` Phase 8, `tech-stack.md` AsyncPG section
**Problem**: Plan specifies connection pooling (5-20 connections per project) but doesn't address:
- What happens when multiple Claude Code instances search same project simultaneously?
- Whether connection pools are shared across MCP server instances (if running multiple servers)
- How indexing locks repositories (prevent concurrent index operations on same repo)
- Whether search during indexing returns partial/stale results
**Impact**: Data races, connection pool exhaustion, unclear concurrency model
**Required Changes**:
1. Add "Concurrency Model" section to `tech-stack.md`:
```markdown
## Concurrency Model
### Single MCP Server Instance (Primary Use Case)
- Connection pool shared by all requests to that server
- 20 concurrent searches per project (pool max_size)
- Indexing operations use advisory locks (prevents concurrent indexing of same repo)
### Multiple MCP Server Instances (Advanced)
- Each server has independent connection pools
- PostgreSQL handles concurrent connections (default max_connections=100)
- Advisory locks shared across servers (PostgreSQL-level)
### Search During Indexing
- Search returns results from current index state
- No blocking (search and index are concurrent)
- Stale results possible during re-indexing (acceptable tradeoff)
```
2. Add advisory lock implementation to `refactoring-plan.md` Phase 7:
```python
import hashlib

async def index_repository_db(pool, repo_path, project_id, force_reindex):
    """Index repository with advisory lock to prevent concurrent indexing"""
    async with pool.acquire() as conn:
        # Derive a stable lock id: built-in hash() is randomized per process
        # (PYTHONHASHSEED), so it cannot coordinate locks across server instances
        digest = hashlib.sha256(f"{project_id}:{repo_path}".encode()).digest()
        lock_id = int.from_bytes(digest[:4], "big") % (2**31)
        acquired = await conn.fetchval(
            "SELECT pg_try_advisory_lock($1)",
            lock_id
        )
        if not acquired:
            raise RuntimeError(
                f"Repository {repo_path} is currently being indexed. "
                "Wait for completion or use force_reindex=True."
            )
        try:
            # Indexing logic here...
            pass
        finally:
            # Release lock
            await conn.execute("SELECT pg_advisory_unlock($1)", lock_id)
```
---
## Important Issues (Should Fix)
### I1: Migration Guide Lacks Detailed Data Migration Instructions
**Location**: `refactoring-plan.md` Phase 10
**Problem**: Migration guide tells users to "restore to workflow-mcp's database schema" for work items/tasks data but doesn't provide concrete steps or SQL scripts.
**Suggested Fix**: Add data migration section:
````markdown
## Data Migration to workflow-mcp
### Step 1: Export Non-Search Data
```bash
# Export work items
pg_dump codebase_mcp \
--table=work_items \
--table=work_item_dependencies \
--data-only \
--file=work_items_export.sql
# Export tasks
pg_dump codebase_mcp \
--table=tasks \
--table=task_planning_references \
--data-only \
--file=tasks_export.sql
```
### Step 2: Transform Schema (if needed)
```sql
-- workflow-mcp may have different column names
-- Manual review and adjustment required
```
### Step 3: Import to workflow-mcp
```bash
psql workflow_mcp < work_items_export.sql
psql workflow_mcp < tasks_export.sql
```
````
### I2: No Specification for Database Cleanup/Archival
**Location**: Missing from all documents
**Problem**: Multi-project design means one database per project. No plan for:
- How to delete project databases when project is removed
- Whether to archive databases before deletion (backup old projects)
- How to reclaim disk space from unused projects
- Whether to warn users about orphaned databases
**Suggested Fix**: Add to `refactoring-plan.md`:
```markdown
## Future Tool: Database Lifecycle Management (v2.2.0)
### delete_project_index(project_id: str, backup: bool = True)
- Optionally backup database to `backups/codebase_<project_id>_<timestamp>.sql`
- Drop database `codebase_<project_id>`
- Remove connection pool entry
### list_orphaned_databases()
- Query PostgreSQL for `codebase_*` databases
- Compare against workflow-mcp's active projects
- Return list of orphaned databases (no matching project)
```
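A sketch of the orphan-detection query behind `list_orphaned_databases()`, assuming an asyncpg admin connection and a set of active project IDs fetched from workflow-mcp:
```python
import asyncpg

async def list_orphaned_databases(
    admin_conn: asyncpg.Connection,
    active_project_ids: set[str],
) -> list[str]:
    """Return codebase_* databases with no matching active project (sketch)."""
    rows = await admin_conn.fetch(
        "SELECT datname FROM pg_database WHERE datname LIKE 'codebase_%'"
    )
    return [
        row["datname"]
        for row in rows
        if row["datname"].removeprefix("codebase_") not in active_project_ids
    ]
```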
### I3: Error Messages Not Fully Specified
**Location**: `user-stories.md` Story 8 has examples but incomplete
**Problem**: Story 8 shows 4 error scenarios but many more exist:
- Invalid file_type parameter (e.g., "python" instead of "py")
- Invalid directory parameter (e.g., "../../../etc/passwd")
- Embedding generation fails (Ollama model unloaded from memory)
- Database connection pool exhausted (all 20 connections busy)
- Repository path is a file, not a directory
**Suggested Fix**: Create comprehensive error catalog in new document `docs/error-catalog.md`:
```markdown
# Error Catalog
## E001: Invalid Repository Path
**Trigger**: `repo_path` does not exist or is not a directory
**Error**: `ValueError("Repository path does not exist: /path")`
**Action**: "Verify the path is correct and points to a directory"
## E002: Invalid Project ID Format
**Trigger**: `project_id` contains invalid characters
**Error**: `ValueError("Invalid project_id format. Use lowercase alphanumeric with hyphens.")`
**Action**: "Example valid project_id: my-project-123"
## E003: Connection Pool Exhausted
**Trigger**: All 20 connections in use
**Error**: `TimeoutError("Database connection pool exhausted for project <project_id>")`
**Action**: "Wait for active queries to complete or increase max_size in connection.py"
[... continue for all error scenarios]
```
### I4: No Specification for Incremental Updates
**Location**: `implementation-phases.md` mentions "Future Enhancements" but no detail
**Problem**: Plan mentions "incremental indexing (only changed files)" as future work but doesn't specify:
- How to detect changed files (git diff? filesystem timestamps? content hash?)
- Whether to store file hashes in database for change detection
- How to handle file deletions (remove chunks? mark as deleted?)
**Suggested Fix**: Not blocking for v2.0.0, but add placeholder in `user-stories.md`:
```markdown
## Future Story: Incremental Indexing (v2.1.0)
**As a** developer re-indexing a large repository
**I need** only changed files to be re-indexed
**So that** re-indexing is fast (seconds instead of minutes)
### Acceptance Criteria
1. Store file content hash (SHA-256) in repositories table
2. On re-index, compare current hash vs stored hash
3. Skip unchanged files (hash match)
4. Update chunks only for changed files
5. Remove chunks for deleted files
[NEEDS CLARIFICATION: How to detect file renames?]
```
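A sketch of the hash-based change detection from acceptance criteria 1-3 (helper names are illustrative):
```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """SHA-256 of file contents, streamed to handle large files (sketch)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()

def changed_files(paths: list[Path], stored_hashes: dict[str, str]) -> list[Path]:
    """Files whose current hash differs from the hash stored at last index time."""
    return [p for p in paths if stored_hashes.get(str(p)) != file_sha256(p)]
```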
### I5: Test Data Generation Not Specified
**Location**: `implementation-phases.md` Phase 11 mentions "large_indexed_repo" fixture but no details
**Problem**: Performance tests need realistic test data (10k files) but plan doesn't specify:
- What languages to generate (Python, JS, TypeScript mix?)
- File size distribution (realistic: small configs, large source files)
- How to ensure deterministic generation (for reproducible benchmarks)
**Suggested Fix**: Add test data specification to `implementation-phases.md` Phase 1:
````markdown
### Phase 1: Create Test Data Generators
Create `tests/fixtures/generate_test_repo.py`:
```python
def generate_test_repo(
    num_files: int,
    output_dir: Path,
    languages: list[str] = ["py", "js", "ts"],
    seed: int = 42,  # For reproducibility
) -> Path:
    """
    Generate realistic test repository with:
    - 70% small files (<1KB): configs, tests
    - 25% medium files (1-10KB): source code
    - 5% large files (>10KB): generated code, data
    - Mixture of languages (weighted by usage)
    """
    pass
```
````
### I6: No Multi-Language Testing Strategy
**Location**: `tech-stack.md` mentions tree-sitter supports 50+ languages, but no test plan
**Problem**: Plan assumes tree-sitter "just works" for all languages but doesn't specify:
- Which languages are tested in integration tests (Python? JS? Rust?)
- Whether to test chunking quality per language (functions split correctly?)
- How to handle languages without tree-sitter parser (fallback to line-based chunking?)
**Suggested Fix**: Add to `implementation-phases.md` Phase 11:
````markdown
### Phase 11: Multi-Language Testing
Create `tests/test_language_support.py`:
```python
@pytest.mark.parametrize("language,file_path", [
    ("python", "tests/fixtures/sample.py"),
    ("javascript", "tests/fixtures/sample.js"),
    ("typescript", "tests/fixtures/sample.ts"),
    ("rust", "tests/fixtures/sample.rs"),
    ("go", "tests/fixtures/sample.go"),
])
@pytest.mark.asyncio
async def test_language_chunking_quality(language, file_path):
    """Test that tree-sitter chunks code correctly per language"""
    chunks = await chunk_file(file_path, language)
    # Verify functions not split across chunks
    # Verify classes kept together
    # Verify imports in separate chunks
    pass
```
````
---
## Suggestions (Nice to Have)
### S1: Add Architectural Decision Records (ADRs)
**Location**: New document `docs/adr/`
**Suggestion**: Document key decisions with rationale:
- ADR-001: Why pgvector instead of dedicated vector store (Pinecone, Weaviate)?
- ADR-002: Why one database per project instead of multi-tenancy in single database?
- ADR-003: Why FastMCP instead of raw MCP Python SDK?
- ADR-004: Why AsyncPG instead of SQLAlchemy async?
**Benefit**: Future maintainers understand tradeoffs, avoid relitigating decisions
### S2: Add Observability/Metrics Specification
**Location**: New section in `tech-stack.md`
**Suggestion**: Specify what metrics to track:
```markdown
## Observability
### Metrics (Prometheus-compatible)
- `codebase_search_latency_seconds{project_id}` - Search latency histogram (percentiles derived at query time via `histogram_quantile`)
- `codebase_indexing_duration_seconds{project_id}` - Indexing time histogram
- `codebase_embeddings_generated_total{project_id}` - Counter of embedding requests
- `codebase_database_connections{project_id, state}` - Connection pool usage
### Logs (Structured JSON)
- Every search: `{"op": "search", "project_id": "...", "query": "...", "latency_ms": 250}`
- Every index: `{"op": "index", "project_id": "...", "files": 1234, "duration_s": 45.2}`
- Errors: `{"op": "search", "error": "...", "project_id": "...", "traceback": "..."}`
```
**Benefit**: Production debugging, performance monitoring, SLA tracking
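A sketch of how these metrics could be registered with the `prometheus_client` package (metric names as specified above; the placement comments are assumptions):
```python
from prometheus_client import Counter, Gauge, Histogram

SEARCH_LATENCY = Histogram(
    "codebase_search_latency_seconds", "Search latency", ["project_id"]
)
EMBEDDINGS_GENERATED = Counter(
    "codebase_embeddings_generated_total", "Embedding requests", ["project_id"]
)
DB_CONNECTIONS = Gauge(
    "codebase_database_connections", "Connection pool usage", ["project_id", "state"]
)

# Inside search_code, once the request completes (assumed placement):
# SEARCH_LATENCY.labels(project_id=project_id).observe(latency_seconds)
# EMBEDDINGS_GENERATED.labels(project_id=project_id).inc()
```
Note that Prometheus derives percentiles at query time with `histogram_quantile`, so the histogram itself only needs the `project_id` label.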
### S3: Add Configuration File Examples
**Location**: New file `docs/configuration-examples.md`
**Suggestion**: Show example configurations for different scenarios:
```yaml
# Example 1: Local development (single developer)
POSTGRES_HOST: localhost
POSTGRES_PORT: 5432
OLLAMA_URL: http://localhost:11434
WORKFLOW_MCP_URL: http://localhost:3000/mcp/workflow-mcp
# Example 2: Team setup (shared PostgreSQL, local Ollama)
POSTGRES_HOST: postgres.team.local
POSTGRES_PORT: 5432
OLLAMA_URL: http://localhost:11434 # Each dev runs own Ollama
# Example 3: No workflow-mcp (standalone)
POSTGRES_HOST: localhost
POSTGRES_PORT: 5432
OLLAMA_URL: http://localhost:11434
# WORKFLOW_MCP_URL not set (explicit project_id required)
```
**Benefit**: Easier onboarding, reduced support questions
### S4: Add Troubleshooting Guide
**Location**: New file `docs/troubleshooting.md`
**Suggestion**: Common problems and solutions:
````markdown
## Troubleshooting
### Problem: Search is slow (>1s latency)
**Possible Causes**:
- HNSW index not created (check `\d code_chunks` in psql)
- Connection pool size too small (increase max_size in connection.py)
- Ollama model not loaded in memory (first query loads model)
**Solution**:
```sql
-- Verify HNSW index exists
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'code_chunks';
-- Should show: idx_code_chunks_embedding using hnsw
```
### Problem: Indexing fails with "Out of memory"
**Possible Causes**:
- Batch size too large (50 files at once)
- Large files (>1MB) cause embedding generation to fail
**Solution**: Reduce batch size in `indexing.py`:
```python
BATCH_SIZE = 25 # Reduce from 50
```
````
### S5: Add Diagram: Multi-Project Architecture
**Location**: `README.md` or `docs/architecture.md`
**Suggestion**: Visual diagram showing:
```
┌─────────────────────────────────────────────┐
│           Claude Code (MCP Client)          │
└──────────────────────┬──────────────────────┘
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
     ┌─────────┐  ┌─────────┐  ┌─────────┐
     │workflow │  │codebase │  │  other  │
     │  -mcp   │  │  -mcp   │  │  MCPs   │
     └────┬────┘  └────┬────┘  └─────────┘
          │            │
          │ get_active_project_id()
          └───────────>│
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
     ┌─────────┐  ┌─────────┐  ┌─────────┐
     │codebase_│  │codebase_│  │codebase_│
     │project_a│  │project_b│  │project_c│
     └─────────┘  └─────────┘  └─────────┘
```
**Benefit**: Clearer mental model of system architecture
### S6: Add Upgrade Path from v1.x to v2.0 in Code
**Location**: New migration tool `scripts/migrate_v1_to_v2.py`
**Suggestion**: Automated migration script:
```python
#!/usr/bin/env python3
"""
Automated migration from codebase-mcp v1.x to v2.0
Usage: python scripts/migrate_v1_to_v2.py --backup --project-id=default
"""
import argparse
import subprocess

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--backup", action="store_true", help="Backup database before migration")
    parser.add_argument("--project-id", required=True, help="Project ID for existing data")
    args = parser.parse_args()
    if args.backup:
        print("Creating backup...")
        subprocess.run(["pg_dump", "codebase_mcp", "-f", "backup-v1.sql"], check=True)
    print("Running migration...")
    subprocess.run(["psql", "codebase_mcp", "-f", "migrations/002_remove_non_search_tables.sql"], check=True)
    print("Migration complete!")
    print(f"All existing data tagged with project_id: {args.project_id}")

if __name__ == "__main__":
    main()
```
**Benefit**: Reduces migration friction, fewer support requests
---
## Completeness Analysis
### Requirements Coverage: 90%
**Well Covered**:
- ✅ Functional requirements: Search, indexing, multi-project support
- ✅ Non-functional requirements: Performance, local-first, protocol compliance
- ✅ User stories: 8 comprehensive stories with acceptance criteria
- ✅ Technical stack: Fully specified with rationale
- ✅ Constitutional principles: 11 principles, clear enforcement
**Missing/Incomplete**:
- ⚠️ Database naming/discovery strategy (Critical C1)
- ⚠️ Error handling for workflow-mcp integration (Critical C2)
- ⚠️ Rollback strategy (Critical C3)
- ⚠️ Concurrent access patterns (Critical C5)
- ⚠️ Data migration to workflow-mcp (Important I1)
### Success Criteria Coverage: 85%
**Measurable Criteria**:
- ✅ Functional: "Index and search across 3+ projects simultaneously"
- ✅ Performance: "<500ms search latency (p95)", "<60s indexing for 10k files"
- ✅ Quality: "mypy --strict passes", "coverage >80%", "mcp-inspector 100%"
**Incomplete Criteria**:
- ⚠️ No baseline comparison (how to prove no regression?) (Critical C4)
- ⚠️ Operational criteria vague ("Clear error messages" - what's the standard?)
### Edge Cases Coverage: 70%
**Well Covered**:
- ✅ Repository doesn't exist (Story 8, Scenario 1)
- ✅ Invalid project_id (Story 8, Scenario 2)
- ✅ Ollama not running (Story 8, Scenario 3)
- ✅ Empty search results (Story 1, Scenario 2)
**Missing Edge Cases**:
- ⚠️ Project has no repositories indexed (search returns empty, or error?)
- ⚠️ Repository path is a file, not directory
- ⚠️ Repository contains only binary files (no text files to index)
- ⚠️ Project_id contains SQL injection attempt
- ⚠️ Two repositories with same path in different projects
- ⚠️ Database disk full during indexing
- ⚠️ Ollama model unloaded from memory mid-operation
### Error Scenarios Coverage: 75%
**Documented Errors**:
- ✅ Missing repository path
- ✅ Invalid project_id
- ✅ Ollama unavailable
- ✅ PostgreSQL unavailable
- ✅ Schema version mismatch
**Missing Error Scenarios**:
- ⚠️ Invalid file_type parameter format
- ⚠️ Directory traversal attack via directory parameter
- ⚠️ Connection pool exhaustion
- ⚠️ workflow-mcp returns malformed response
- ⚠️ Database migration partially completed (half tables dropped)
---
## Correctness Analysis
### Phase Dependency Order: Correct ✅
The 12 phases follow logical dependency order:
- Phase 2 (schema) before Phase 4 (database operations) ✅
- Phase 3 (remove tools) before Phase 5 (update server) ✅
- Phase 7 (multi-project) before Phase 8 (connection pooling) ✅
- Phase 11 (performance) before Phase 12 (final validation) ✅
**No circular dependencies detected**.
### Git Strategy: Correct ✅
- Micro-commits after each phase (working state guaranteed) ✅
- Conventional commit format specified ✅
- Feature branch isolation (`002-refactor-pure-search`) ✅
- Tag for release (v2.0.0) ✅
**Concern**: No mention of squashing commits before merge. With 12+ commits, PR review may be difficult.
**Suggestion**: Add to Phase 12:
````markdown
### Optional: Squash Commits for Clean History
```bash
# Interactive rebase to squash into ~3-4 logical commits
git rebase -i main
# Group by: 1) Remove non-search features, 2) Add multi-project, 3) Documentation
```
````
### Testing Strategy: Mostly Correct ✅
- TDD approach (tests before implementation) ✅
- Unit, integration, protocol, performance tests specified ✅
- mcp-inspector validation included ✅
**Concern**: Test execution order not specified. Should run:
1. Fast unit tests (on every commit)
2. Integration tests (before push)
3. Performance tests (before merge)
4. Protocol tests (before merge)
**Suggestion**: Add to `implementation-phases.md`:
````markdown
## Test Execution Strategy
### Pre-Commit (Fast, <10s)
```bash
pytest tests/unit/ -v
```
### Pre-Push (Medium, <2min)
```bash
pytest tests/ -v --ignore=tests/performance/
```
### Pre-Merge (Full, <10min)
```bash
pytest tests/ -v --cov=src/codebase_mcp --cov-report=html
pytest tests/test_performance.py -v -m performance
mcp-inspector http://localhost:8000 --full-validation
```
````
### Performance Targets: Realistic ✅
- <500ms search latency (p95): Realistic for pgvector HNSW indexes ✅
- <60s indexing for 10k files: Realistic at ~166 files/second ✅
- 20 concurrent connections per project: Realistic for AsyncPG pooling ✅
**Validation**: Targets align with pgvector benchmarks (sub-ms vector search on millions of vectors) and Ollama embedding generation (~200ms per query).
---
## Clarity Analysis
### Developer Implementability: 85%
**Clear Sections**:
- ✅ Phase-by-phase breakdown with specific files to modify
- ✅ Code examples showing "before" and "after" states
- ✅ Acceptance criteria for each phase (clear success definition)
- ✅ Git commit messages specified per phase
**Ambiguous Sections**:
- ⚠️ "Update imports across codebase" (Phase 3) - Which files? How to find them systematically?
- ⚠️ "Optimize if needed" (Phase 11) - What's the optimization procedure? Profile with what tool?
- ⚠️ "Follow migration guide step-by-step" (Phase 10) - Guide hasn't been tested yet, may have errors
**Suggestions**:
1. Add grep commands for finding all import references:
```bash
# Find all files importing removed tools
grep -r "from.*work_items import" src/
grep -r "from.*tasks import" src/
```
2. Specify profiling tool:
```bash
# Profile search_code function
python -m cProfile -o search.prof -m pytest tests/test_performance.py::test_search_latency
snakeviz search.prof # Visualize profile
```
### Terminology Consistency: 90%
**Consistent Terms**:
- ✅ "project_id" used throughout (not "projectId" or "project-id")
- ✅ "repository" vs "project" distinction clear
- ✅ "embedding" vs "vector" used consistently
**Inconsistent Terms**:
- ⚠️ "workflow-mcp" vs "workflow_mcp" (hyphen vs underscore)
- ⚠️ "code_chunks" vs "chunks" interchangeably
**Suggestion**: Add glossary to `README.md`:
```markdown
## Glossary
- **project**: A workspace containing one or more repositories (workflow-mcp concept)
- **project_id**: Unique identifier for a project (e.g., "my-app")
- **repository**: A single code repository indexed for search
- **chunk**: A segment of code (function, class, or ~500 lines) with embedding
- **embedding**: 768-dimensional vector representing semantic meaning of code
- **workflow-mcp**: Separate MCP server handling project/task management
```
### Scope Clarity: 95%
**Clear In-Scope**:
- ✅ Semantic code search
- ✅ Multi-project support
- ✅ Repository indexing
**Clear Out-of-Scope**:
- ✅ Work item tracking (moved to workflow-mcp)
- ✅ Task management (moved to workflow-mcp)
- ✅ Vendor tracking (moved to workflow-mcp)
- ✅ Code analysis/linting
**Ambiguous**:
- ⚠️ "Context-aware results" (Story 1) - Is this just lines before/after, or semantic context?
- ⚠️ "Metadata preservation" (README) - What metadata? File path, language, git commit hash?
**Suggestion**: Clarify in `user-stories.md` Story 1:
```markdown
### Context Lines
- context_before: 3 lines immediately before matched chunk
- context_after: 3 lines immediately after matched chunk
- NOT semantic context (that would be "similar code" feature, out of scope)
```
---
## Risk Analysis
### High-Risk Areas
#### R1: workflow-mcp Integration Brittleness (HIGH)
**Risk**: codebase-mcp depends on workflow-mcp for project context. If workflow-mcp changes API or becomes unavailable, codebase-mcp breaks.
**Impact**: Users can't search without explicitly providing project_id every time
**Mitigation Specified**: Fallback to explicit project_id parameter ✅
**Additional Mitigation Needed**:
1. Version workflow-mcp API in URL:
```python
WORKFLOW_MCP_URL = "http://localhost:3000/api/v1/mcp/workflow-mcp"
# If v1 fails, try v2, then fall back to explicit project_id
```
2. Add workflow-mcp health check at startup:
```python
async def check_workflow_mcp_health():
    """Warn if workflow-mcp unavailable at startup"""
    try:
        project_id = await get_active_project_id()
        if project_id:
            logger.info(f"workflow-mcp integration: OK (active project: {project_id})")
        else:
            logger.warning("workflow-mcp integration: UNAVAILABLE (explicit project_id required)")
    except Exception as e:
        logger.error(f"workflow-mcp integration: ERROR ({e})")
```
#### R2: Database Proliferation (MEDIUM)
**Risk**: One database per project means users with 50 projects have 50 databases. Operational overhead (backups, monitoring, disk space).
**Impact**: High disk usage, PostgreSQL connection limit hit, slow backups
**Mitigation Specified**: None ❌
**Mitigation Needed**:
1. Document in `README.md`:
````markdown
## Multi-Project Considerations
### Database Count
- One database per project (project isolation guarantee)
- Example: 20 projects = 20 databases (codebase_project-1, codebase_project-2, ...)
### PostgreSQL Configuration
Increase `max_connections` if you have many projects:
```sql
ALTER SYSTEM SET max_connections = 200; -- Default 100
```
### Disk Space
Each database requires:
- Small project (1k files): ~100 MB
- Medium project (10k files): ~1 GB
- Large project (100k files): ~10 GB
````
2. Add database archival strategy (Important I2)
#### R3: Performance Regression Risk (MEDIUM)
**Risk**: Refactoring introduces performance regressions (slower search, slower indexing). No baseline means can't detect regression.
**Impact**: Users downgrade back to v1.x
**Mitigation Specified**: Performance tests in Phase 11 ✅
**Missing**: Baseline comparison (Critical C4) ❌
**Mitigation Needed**: Add baseline capture in Phase 1 (see Critical C4)
#### R4: Tree-sitter Language Support Gaps (LOW)
**Risk**: tree-sitter doesn't support some languages (e.g., proprietary DSLs). Indexing fails for those files.
**Impact**: Incomplete index for mixed-language repositories
**Mitigation Specified**: None ❌
**Mitigation Needed**:
1. Add fallback chunking strategy in `refactoring-plan.md`:
```python
async def chunk_file(file_path: Path, language: str) -> list[Chunk]:
    """Chunk file with tree-sitter, fallback to line-based chunking"""
    try:
        # Try tree-sitter parser
        chunks = await chunk_with_tree_sitter(file_path, language)
    except LanguageNotSupported:
        logger.warning(f"tree-sitter parser unavailable for {language}, using line-based chunking")
        chunks = await chunk_by_lines(file_path, max_lines=500)
    return chunks
```
2. Document supported languages in `README.md`:
```markdown
## Supported Languages
- **Full Support** (tree-sitter AST-aware): Python, JavaScript, TypeScript, Rust, Go, Java, C, C++, C#, Ruby, PHP
- **Basic Support** (line-based chunking): Any text file
- **Not Supported**: Binary files (images, compiled binaries)
```
#### R5: Data Loss During Migration (LOW)
**Risk**: Migration script bugs cause data loss. Users lose indexed repositories or code chunks.
**Impact**: Users must re-index all repositories (time-consuming)
**Mitigation Specified**: Backup instructions in migration guide ✅
**Additional Mitigation**:
1. Add migration validation script in `migrations/002_validate.sql`:
```sql
-- Validate migration completed successfully
DO $$
DECLARE
    table_count INT;
BEGIN
    -- Check only 2 tables remain
    SELECT COUNT(*) INTO table_count
    FROM information_schema.tables
    WHERE table_schema = 'public';
    IF table_count != 2 THEN
        RAISE EXCEPTION 'Expected 2 tables (repositories, code_chunks), found %', table_count;
    END IF;
    -- Check project_id column exists
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.columns
        WHERE table_name = 'repositories' AND column_name = 'project_id'
    ) THEN
        RAISE EXCEPTION 'Migration failed: project_id column missing from repositories';
    END IF;
    RAISE NOTICE 'Migration validation: PASSED';
END $$;
```
---
## Gap Analysis
### Requirements Gaps
1. **Database Lifecycle Management** (Important I2)
- Missing: How to delete project databases
- Missing: How to archive old projects
- Missing: How to reclaim disk space
2. **Search Result Ranking Customization** (Nice to Have)
- Missing: Can users adjust similarity threshold? (default: no minimum, returns top N)
- Missing: Can users boost results from specific directories?
- Missing: Can users exclude certain file patterns from search?
3. **Authentication/Authorization** (Acknowledged Out of Scope)
- Missing: Multi-user environments (who can search which projects?)
- Acceptable: Relies on filesystem permissions ✅
### Testing Gaps
1. **Load Testing** (Important)
- Missing: What happens with 1000 concurrent searches? (the performance test runs 100 queries sequentially; see the load-test sketch after the security tests below)
- Missing: Connection pool behavior under load
- Missing: PostgreSQL query queue behavior
2. **Chaos Testing** (Nice to Have)
- Missing: What if PostgreSQL crashes during indexing?
- Missing: What if Ollama crashes during embedding generation?
- Missing: What if disk fills up during indexing?
3. **Security Testing** (Critical)
- Missing: SQL injection testing (even though using parameterized queries)
- Missing: Path traversal testing (directory parameter)
- Missing: Resource exhaustion testing (giant file crashes indexing?)
**Suggestion**: Add security testing to Phase 11:
```python
@pytest.mark.security
@pytest.mark.asyncio
async def test_sql_injection_in_project_id():
    """Test that SQL injection attempts are rejected"""
    malicious_project_id = "'; DROP TABLE repositories; --"
    with pytest.raises(ValueError, match="Invalid project_id format"):
        await search_code(
            SearchCodeParams(
                query="test",
                project_id=malicious_project_id
            )
        )

@pytest.mark.security
@pytest.mark.asyncio
async def test_path_traversal_in_directory_filter():
    """Test that path traversal is blocked"""
    malicious_directory = "../../../etc/"
    result = await search_code(
        SearchCodeParams(
            query="test",
            project_id="test-project",
            directory=malicious_directory
        )
    )
    # Should return empty results (no files in that directory within project)
    assert len(result["results"]) == 0
```
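For the load-testing gap flagged above, a sketch using `asyncio.gather` to fire 1000 concurrent searches (the `load` marker and the graceful-degradation assertion are assumptions about desired behavior):
```python
import asyncio
import pytest

@pytest.mark.load  # hypothetical marker, registered in pytest.ini
@pytest.mark.asyncio
async def test_concurrent_search_load():
    """Fire 1000 concurrent searches; pool exhaustion must degrade gracefully."""
    params = SearchCodeParams(query="test", project_id="test-project")
    results = await asyncio.gather(
        *(search_code(params) for _ in range(1000)),
        return_exceptions=True,
    )
    # Exhausted pools should surface as TimeoutError, never crash the server
    errors = [r for r in results if isinstance(r, Exception)]
    assert all(isinstance(e, TimeoutError) for e in errors)
```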
### Performance Gaps
1. **Latency Breakdown** (Important)
- Missing: How much of <500ms is embedding generation vs database query?
- Missing: If search is slow, how to diagnose bottleneck?
**Suggestion**: Add latency breakdown logging:
```python
import time

async def search_code(params: SearchCodeParams) -> dict:
    start = time.time()
    # Generate embedding
    embed_start = time.time()
    query_embedding = await generate_embedding(params.query)
    embed_duration = time.time() - embed_start
    # Database query
    db_start = time.time()
    results = await search_code_db(...)
    db_duration = time.time() - db_start
    total_duration = time.time() - start
    logger.info(
        "search_latency",
        total_ms=total_duration * 1000,
        embedding_ms=embed_duration * 1000,
        database_ms=db_duration * 1000,
        project_id=params.project_id,
    )
    return {
        "results": results,
        "latency_ms": total_duration * 1000,
        "latency_breakdown": {
            "embedding_ms": embed_duration * 1000,
            "database_ms": db_duration * 1000,
        }
    }
```
2. **Indexing Throughput by Language** (Nice to Have)
- Missing: Does Python index faster than Rust? (tree-sitter performance varies)
- Missing: Baseline throughput by file type
### Documentation Gaps
1. **Deployment Guide** (Important)
- Missing: How to deploy in production? (Docker? systemd service?)
- Missing: How to configure PostgreSQL for production? (connection limits, memory)
- Missing: How to configure Ollama for production? (GPU vs CPU, memory limits)
**Suggestion**: Create `docs/deployment-guide.md`:
````markdown
# Production Deployment Guide
## System Requirements
- CPU: 4 cores minimum, 8 cores recommended
- RAM: 8 GB minimum, 16 GB recommended
- Disk: SSD preferred, 100 GB per 10 projects
- GPU: Optional, speeds up Ollama embeddings by 5-10x
## PostgreSQL Configuration
```sql
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET work_mem = '20MB';
```
## Ollama Configuration
```bash
# Set memory limit (prevents OOM)
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_NUM_PARALLEL=1
# Start Ollama service
ollama serve
```
## codebase-mcp as systemd service
```ini
[Unit]
Description=Codebase MCP Server
After=network.target postgresql.service
[Service]
Type=simple
User=codebase
WorkingDirectory=/opt/codebase-mcp
ExecStart=/opt/codebase-mcp/venv/bin/python -m codebase_mcp.server
Restart=on-failure
Environment="POSTGRES_HOST=localhost"
Environment="POSTGRES_PORT=5432"
Environment="OLLAMA_URL=http://localhost:11434"
[Install]
WantedBy=multi-user.target
```
````
2. **API Reference Documentation** (Important)
- Missing: Full API documentation (only examples in user-stories.md)
- Missing: OpenAPI/Swagger spec for HTTP API (if applicable)
3. **Development Setup Guide** (Important)
- Missing: Step-by-step setup for contributors
- Missing: How to run tests locally
- Missing: How to build documentation
**Suggestion**: Create `CONTRIBUTING.md`:
````markdown
# Contributing to codebase-mcp
## Development Setup
1. **Clone Repository**
```bash
git clone https://github.com/yourusername/codebase-mcp.git
cd codebase-mcp
```
2. **Install Dependencies**
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
```
3. **Setup PostgreSQL**
```bash
# Start PostgreSQL (Docker)
docker compose up -d postgres
# Create test database
createdb codebase_mcp_test
psql codebase_mcp_test < src/codebase_mcp/database/schema.sql
```
4. **Setup Ollama**
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull embedding model
ollama pull nomic-embed-text
```
5. **Run Tests**
```bash
pytest tests/ -v
```
## Code Standards
- Black for formatting: `black src/ tests/`
- Ruff for linting: `ruff check src/ tests/`
- mypy for type checking: `mypy --strict src/`
- Coverage >80%: `pytest --cov=src/codebase_mcp --cov-report=term`
## Pull Request Process
1. Create feature branch: `git checkout -b feature-name`
2. Write tests (TDD approach)
3. Implement feature
4. Run full test suite: `pytest tests/`
5. Update documentation
6. Submit PR with descriptive title and description
````
---
## Positive Aspects
### Excellent Constitutional Framework ⭐⭐⭐⭐⭐
The 11 constitutional principles are **exceptionally well-defined**:
- Clear statement, rationale, constraints for each principle
- Non-negotiable principles explicitly marked
- Enforcement mechanisms specified (CI/CD, code review, /analyze)
- Acceptable deviations documented (shows thoughtfulness)
**Example of Excellence**: Principle IV (Performance Guarantees) specifies:
- Exact targets (<500ms, <60s)
- Batching strategies (100 files/batch, 50 embeddings/batch)
- Index types (HNSW, not IVFFlat)
- Connection pooling (min 5, max 20)
This level of specificity is rare and will prevent ambiguity during implementation.
### Comprehensive User Stories ⭐⭐⭐⭐⭐
User stories demonstrate **strong requirements engineering**:
- Follows "As a/I need/So that" format consistently
- Testable acceptance criteria (not vague "should work" statements)
- Concrete test scenarios with expected outcomes
- Edge cases documented (empty results, errors, offline mode)
**Example of Excellence**: Story 5 (Performance) includes:
- Specific scenario: "50,000 files (250,000 chunks)"
- Measurable outcome: "p95 latency <500ms"
- Failure mode: "Mock Ollama to return embedding in 250ms"
- Degradation handling: "Warning logged, total time <550ms, degraded but acceptable"
This is production-quality requirements specification.
### Detailed Implementation Roadmap ⭐⭐⭐⭐
The 12-phase implementation plan is **thorough and practical**:
- Clear objectives per phase
- Specific files to modify (with before/after code examples)
- Acceptance criteria for each phase
- Time estimates (17-25 hours total, realistic)
- Git strategy with commit messages specified
**Example of Excellence**: Phase 2 (Database Schema) includes:
- SQL migration script (with rollback warning)
- Before/after schema comparison
- Test instructions (`psql codebase_mcp_test < migrations/002_*.sql`)
- Acceptance criteria ("Migration tested on test database")
This level of detail enables a developer to execute the plan with minimal guidance.
### Strong Testing Strategy ⭐⭐⭐⭐
The plan emphasizes **test-driven development** throughout:
- TDD approach (tests before implementation) specified in constitution
- Multiple testing layers: unit, integration, protocol, performance
- Specific test scenarios in user stories
- mcp-inspector validation for protocol compliance
- Performance benchmarks with p50/p95 targets
**Example of Excellence**: Phase 11 includes:
- Performance test suite with latency histograms
- 100 searches to calculate p95
- Assertion: `assert p95 < 500`
- Optimization procedure if targets not met
This rigor will prevent quality regressions.
### Clear Migration Strategy ⭐⭐⭐⭐
The migration guide shows **empathy for users**:
- Explicit breaking changes documented
- Tool mapping (old → new)
- Database migration script provided
- Configuration examples (before/after)
- FAQ section addresses common questions
**Example of Excellence**: Migration guide includes:
- Backup instructions **before** destructive operations
- Step-by-step tool call updates (with code examples)
- Checklist for migration validation
This will reduce support burden and user frustration.
### Technology Stack Justification ⭐⭐⭐⭐
`tech-stack.md` demonstrates **informed technical decisions**:
- Each technology has "Purpose" and "Justification" sections
- Version constraints specified (e.g., `>=3.11,<4.0`)
- Key features explained (why AsyncPG over psycopg2?)
- Non-negotiable stack marked clearly
**Example of Excellence**: PostgreSQL section specifies:
- Version constraint: `>=14.0` (with rationale: pgvector performance)
- Extensions required: `vector`, `pg_trgm`
- Index type: HNSW with specific parameters (`m = 16, ef_construction = 64`)
This prevents bikeshedding and ensures consistent decisions.
---
## Overall Recommendation
### Recommendation: APPROVED WITH CHANGES ✅
**Rationale**:
The codebase-mcp refactoring plan is **well-architected and thoroughly documented**. The constitutional framework, user stories, and implementation roadmap demonstrate strong software engineering practices. However, **five critical issues** must be addressed before implementation to prevent runtime failures and operational confusion.
### Required Changes Before Implementation
**Must Fix (Critical)**:
1. ✅ **C1**: Specify database naming and discovery strategy (SQL injection risk, unclear operations)
2. ✅ **C2**: Detail workflow-mcp integration error handling (poor UX, silent failures)
3. ✅ **C3**: Define rollback strategy (high risk of getting stuck in broken state)
4. ✅ **C4**: Capture performance baseline in Phase 1 (can't prove no regression)
5. ✅ **C5**: Document concurrent access patterns (data races, connection exhaustion)
**Should Fix (Important)**:
- I1: Add data migration SQL scripts (user friction)
- I2: Plan database lifecycle management (disk space issues)
- I3: Create comprehensive error catalog (debugging difficulty)
**Nice to Have (Suggestions)**:
- S1: Add Architectural Decision Records (future maintainability)
- S2: Specify observability/metrics (production debugging)
- S3: Provide configuration examples (easier onboarding)
### Confidence Level: High (85%)
**Confidence in Plan Success**:
- Architecture: 95% (excellent separation of concerns, local-first design)
- Requirements: 85% (comprehensive user stories, some edge cases missing)
- Implementation: 80% (detailed phases, but lacks rollback strategy)
- Testing: 90% (strong TDD approach, performance benchmarks)
- Migration: 85% (clear guide, but data migration needs more detail)
**Risk Areas**:
- workflow-mcp integration brittleness (mitigated by fallback to explicit project_id)
- Database proliferation (20+ databases for power users, needs monitoring)
- Performance regression (mitigated by performance tests, but needs baseline)
### Next Steps
1. **Address Critical Issues** (C1-C5) - 4-6 hours of planning work
2. **Review Updated Plan** - Stakeholder approval of changes
3. **Begin Phase 0** (Prerequisites) - Validate workflow-mcp integration
4. **Execute Phases 1-12** - Follow implementation roadmap
5. **Monitor Phase 1-2** - High-risk phases (database schema changes)
---
## Conclusion
This is a **high-quality refactoring plan** that follows industry best practices:
- Constitutional principles establish guardrails
- User stories provide clear requirements
- Implementation phases are actionable
- Testing strategy ensures quality
- Migration guide reduces user friction
With the **five critical issues addressed**, this plan has a **high probability of success** (85%+). The plan demonstrates maturity in software architecture, requirements engineering, and operational thinking.
**Approval Status**: ✅ **APPROVED WITH CHANGES**
**Conditional on**: Addressing critical issues C1-C3 before implementation begins, with C4-C5 addressed during Phases 0-1
---
**Reviewer Signature**: Senior Planning Reviewer
**Date**: 2025-10-11