Codebase MCP Server

codebase-mcp
specs
010-v2-documentation

research.md•63.9 KiB

# Research: Documentation Structure & Best Practices ## Decision Summary | Decision | Choice | Rationale | |----------|--------|-----------| | **Markdown Documentation Structure** | GitHub-flavored Markdown with hierarchical TOC navigation | GitHub native rendering, ubiquitous tooling, accessibility friendly, version control friendly | | **Migration Guide Structure** | Breaking Changes First + Step-by-Step + Rollback pattern | Risk communication upfront, procedural clarity, safety-first approach matches production quality principle | | **Configuration Documentation** | Environment Variable Tables + PostgreSQL Calculation Examples | Scannable reference format, explicit defaults, concrete calculation examples prevent misconfiguration | | **Architecture Documentation** | Mermaid Diagrams + Prose + Code Examples | Native GitHub rendering, maintainable text-based diagrams, visual-textual balance | | **API Reference Documentation** | OpenAPI-Inspired Structure + Explicit Deprecation Notices | Industry-standard parameter documentation, clear migration path for removed tools | | **Documentation Validation** | Manual Checklist + Documented Procedures | Appropriate for Phase 05 scope, defers automation to Phase 07, ensures accuracy through structured process | --- ## 1. Markdown Documentation Structure ### Decision Use **GitHub-flavored Markdown (GFM)** with the following structure pattern for all documentation artifacts: **Standard Structure Template**: ```markdown # Title (H1 - single per document) > Brief description (1-2 sentences explaining purpose) ## Table of Contents (H2 - for documents >500 lines) - [Section 1](#section-1) - [Section 2](#section-2) ## Overview (H2) High-level introduction to the content ## Core Content Sections (H2) Detailed content organized hierarchically ### Subsections (H3) Specific topics within sections #### Details (H4 - sparingly) Fine-grained details when necessary ## Examples (H2) Practical usage examples with code blocks ## Troubleshooting (H2 - where applicable) Common issues and solutions ## Related Resources (H2) Links to other relevant documentation ``` **Typography Conventions**: - **Bold** for emphasis on key terms first mention - `Code font` for all technical terms (tool names, parameters, file paths, commands) - *Italic* for document references and annotations - > Blockquotes for warnings, important notes, and callouts - Tables for structured data (configuration variables, parameter lists) - Ordered lists for sequential procedures - Unordered lists for non-sequential information **Navigation Patterns**: - README.md acts as the entry point with links to specialized guides - Each guide starts with a brief description and links back to README - Cross-references use relative paths: `[Migration Guide](docs/migration/v1-to-v2-migration.md)` - Anchor links for intra-document navigation: `[Configuration](#configuration)` ### Rationale GitHub-flavored Markdown is the de facto standard for developer documentation because: 1. **Native Rendering**: GitHub/GitLab render GFM without tooling, making docs immediately accessible 2. **Version Control Friendly**: Plain text format enables meaningful diffs, conflict resolution, and attribution 3. **Accessibility**: Screen readers handle Markdown well, semantic HTML output supports WCAG compliance 4. **Universal Tooling**: Every code editor, IDE, and static site generator supports Markdown 5. **Low Friction**: No build step, no special tooling, immediate editing and preview 6. **Search Friendly**: Plain text enables grep, IDE search, and GitHub's code search The hierarchical structure with explicit TOC navigation addresses the documented scale (5 artifacts, 38 functional requirements) by providing: - Quick orientation (overview sections) - Direct navigation to relevant sections (TOC) - Scannable structure (H2-H4 hierarchy) - Progressive disclosure (high-level → detailed) ### Alternatives Considered - **Alternative 1: Sphinx/reStructuredText** - Rejected because it requires Python tooling, build step, and specialized markup. Adds complexity for marginal benefit (better cross-referencing). Violates "simplicity over features" principle. - **Alternative 2: AsciiDoc** - Rejected because GitHub rendering support is limited, tooling less universal than Markdown. Better semantic markup doesn't justify adoption barrier for contributors. - **Alternative 3: Static Site Generator (Docusaurus, MkDocs)** - Rejected for Phase 05 because it requires build infrastructure, deployment process, and maintenance overhead. Documentation-only phase should not introduce new tooling dependencies. Future consideration for Phase 07. - **Alternative 4: Wiki-style documentation** - Rejected because version control is weaker (separate wiki repo), no PR workflow for changes, harder to keep in sync with code. ### Implementation Guidance **For this project, apply these specific patterns**: 1. **README.md Structure**: ```markdown # Codebase MCP Server > Brief description ## What's New in v2.0 [Breaking Changes Summary] → link to migration guide ## Features - 2 tools: index_repository, search_code - Multi-project support with project_id parameter - workflow-mcp integration (optional) ## Installation Prerequisites → Installation → Verification ## Quick Start Basic usage → Multi-project usage ## Documentation Links to 4 specialized guides ## Contributing Link to architecture docs ``` 2. **File Naming Convention**: - Use lowercase with hyphens: `v1-to-v2-migration.md` - Use descriptive names: `production-config.md` not `config.md` - Version-specific files include version: `v1-to-v2-migration.md` 3. **Code Block Standards**: - Always specify language: ` ```bash `, ` ```python `, ` ```yaml ` - Include comments explaining non-obvious commands - Show expected output where helpful: ```bash # Command $ pg_dump -h localhost -d codebase_mcp > backup.sql # Expected output pg_dump: dumping contents of table "repositories" pg_dump: dumping contents of table "code_chunks" ``` 4. **Link Validation**: - Use relative paths for internal links: `docs/migration/v1-to-v2-migration.md` - Use absolute URLs for external links: `https://github.com/modelcontextprotocol/specification` - Maintain a link inventory during authoring (spreadsheet or checklist) - Test all links manually before PR submission (FR-032) 5. **Cross-Reference Pattern**: - First mention of another guide: full link + brief description - Subsequent mentions: inline link only - Example: "See the [Configuration Guide](docs/configuration/production-config.md) for PostgreSQL tuning details." --- ## 2. Migration Guide Structure ### Decision Use **Breaking Changes First + Step-by-Step Procedures + Rollback** pattern: ```markdown # Migration Guide: Upgrading from v1.x to v2.0 > This guide walks through upgrading codebase-mcp from v1.x to v2.0, including breaking changes, data preservation, and rollback procedures. ## Overview - Who should use this guide - Migration scope (what's preserved, what's discarded) - Estimated time (deferred to Phase 06 per FR-020) ## Breaking Changes Summary [Upfront, before any procedures - FR-009] ### Tool API Changes - 14 tools removed (explicit list - FR-010) - 2 tools retained with project_id parameter added (FR-012) ### Database Schema Changes - 9 tables dropped (FR-011) - project_id column added to repositories, code_chunks ### Environment Variable Changes - New required variables (FR-013) - Changed defaults ### Data Preservation Policy - ✅ Preserved: Indexed code repositories - ❌ Discarded: All v1.x project management data (FR-019) ## Prerequisites - v1.x installation running - PostgreSQL access for backup - Disk space for backups - Downtime window (production systems) ## Pre-Migration Checklist - [ ] Review breaking changes section - [ ] Verify PostgreSQL version (14+) - [ ] Verify Python version (3.11+) - [ ] Test rollback procedure in non-production environment - [ ] Schedule maintenance window ## Backup Procedures (FR-015) Exact commands with explanations ### Database Backup ```bash # Command with parameters pg_dump -h localhost -d codebase_mcp > backup_$(date +%Y%m%d_%H%M%S).sql ``` ### Configuration Backup ```bash cp .env .env.backup ``` ## Upgrade Procedure (FR-014, FR-018) ### Step 1: Stop v1.x Server [Command, validation] ### Step 2: Update Dependencies [Command, validation] ### Step 3: Run Migration Script [Command, expected output, duration guidance] ### Step 4: Update Configuration [New environment variables, changed defaults] ### Step 5: Restart Server [Command, validation] ### Step 6: Verify Migration Success [Validation commands - FR-017] ## Post-Migration Validation (FR-017) ### Verify v2.0 Functionality - Test index_repository with project_id - Test search_code with project_id - Verify existing repositories still searchable ### Diagnostic Commands (FR-036) Commands to detect partial migration state ## Rollback Procedure (FR-016) ### When to Roll Back - Migration script fails - v2.0 functionality not working - Unacceptable data loss discovered ### Rollback Steps 1. Stop v2.0 server 2. Restore database backup 3. Restore configuration backup 4. Reinstall v1.x version 5. Start v1.x server 6. Verify v1.x functionality restored ### Rollback Validation (FR-017) Commands to confirm v1.x restoration ## Troubleshooting ### Common Issues - Migration script fails midway (FR-036) - Missing environment variables - PostgreSQL connection errors - Data validation failures ### Support Resources - GitHub Issues - Discussion forum link ## Frequently Asked Questions ### What data is lost during migration? [Explicit answer from FR-019] ### Can I migrate incrementally? [Answer: no, explain why] ### How long does migration take? [Deferred to Phase 06 per FR-020] ``` ### Rationale This structure addresses the documented user persona needs: 1. **Breaking Changes First (FR-009)**: Risk communication upfront allows users to make informed go/no-go decisions before investing time in procedures. Matches "production quality standards" constitutional principle by prioritizing safety. 2. **Step-by-Step Procedures (FR-014)**: Sequential numbered steps reduce cognitive load, enable checkpoint validation, and support partial completion. Each step has: - Action description - Exact command - Expected output - Validation criteria - Recovery procedure if step fails 3. **Rollback Procedure (FR-016)**: Complete restoration path provides safety net for production upgrades. Reduces risk perception and increases confidence in attempting migration. 4. **Validation Commands (FR-017)**: Explicit validation steps throughout (pre-migration, post-step, post-migration, post-rollback) enable users to detect problems early and confirm success. 5. **Troubleshooting Section**: Addresses edge cases from spec (11 documented edge cases) with diagnostic procedures and solutions. ### Alternatives Considered - **Alternative 1: Procedures First, Breaking Changes Later** - Rejected because users need to understand impact before committing to procedures. Burying breaking changes deep in docs violates production quality principle. - **Alternative 2: Automated Migration Script Only** - Rejected because interactive scripts require Phase 07 tooling development. Manual procedures with validation commands are appropriate for Phase 05 documentation-only scope. - **Alternative 3: FAQ-Style Migration Guide** - Rejected because sequential procedures require ordered presentation. FAQs work for reference but not for procedural workflows. - **Alternative 4: Rollback as Separate Document** - Rejected because users need rollback context during upgrade planning. Separating it increases risk of users missing critical safety information. ### Implementation Guidance **For this project, apply these specific patterns**: 1. **Breaking Changes Section**: - Lead with highest-impact changes (tool removals) - Use tables for comparison (v1.x vs v2.0) - Explicit data preservation policy callout box - Link to detailed API changes in API reference doc 2. **Command Documentation**: - Always include comments explaining what command does - Show expected output for validation - Provide error recovery instructions - Use consistent prompt notation (`$` for user commands) 3. **Validation Commands**: - Provide SQL queries to check schema state - Provide Python scripts to verify tool functionality - Provide PostgreSQL commands to check database existence - Include success criteria for each validation 4. **Rollback Procedure**: - Mirror upgrade procedure structure (same step count) - Reference backup locations from backup section - Include validation for each rollback step - Provide troubleshooting for partial rollback scenarios 5. **Data Preservation Communication**: - Use visual indicators (✅ ❌) for scannable lists - Be explicit about what's discarded (FR-019) - Explain why data is not migrated (architectural decision) - Provide export options if users need v1.x data archived --- ## 3. Configuration Documentation ### Decision Use **Environment Variable Reference Tables + PostgreSQL Calculation Examples + Production Tuning Guidance** pattern: ```markdown # Configuration Guide: Production Deployment > Comprehensive configuration reference for production deployments of codebase-mcp v2.0, covering environment variables, connection pool tuning, and PostgreSQL optimization. ## Configuration Overview Quick links to sections: - [Environment Variables Reference](#environment-variables-reference) - [Connection Pool Tuning](#connection-pool-tuning) - [PostgreSQL Configuration](#postgresql-configuration) - [workflow-mcp Integration](#workflow-mcp-integration) - [Monitoring & Health Checks](#monitoring--health-checks) ## Environment Variables Reference (FR-021) ### Core Configuration | Variable | Required | Default | Description | Example | |----------|----------|---------|-------------|---------| | `DATABASE_URL` | Yes | - | PostgreSQL connection string | `postgresql://user:pass@localhost/codebase_mcp` | | `OLLAMA_BASE_URL` | Yes | `http://localhost:11434` | Ollama API endpoint | `http://ollama:11434` | | `EMBEDDING_MODEL` | No | `nomic-embed-text` | Ollama embedding model | `all-minilm` | ### Multi-Project Configuration (FR-022, FR-023) | Variable | Required | Default | Description | Impact | |----------|----------|---------|-------------|--------| | `MAX_PROJECTS` | No | `10` | Maximum concurrent project connection pools | 1 pool = MAX_CONNECTIONS_PER_POOL connections | | `MAX_CONNECTIONS_PER_POOL` | No | `20` | PostgreSQL connections per project pool | Total = MAX_PROJECTS × MAX_CONNECTIONS_PER_POOL | | `PROJECT_POOL_TIMEOUT` | No | `300` | Pool idle timeout (seconds) before LRU eviction | Higher = less churn, more memory | ### workflow-mcp Integration (FR-026) | Variable | Required | Default | Description | Example | |----------|----------|---------|-------------|---------| | `WORKFLOW_MCP_URL` | No | - | workflow-mcp SSE endpoint for project detection | `http://localhost:8080/sse` | | `WORKFLOW_MCP_TIMEOUT` | No | `5` | Request timeout (seconds) | `10` | ## Connection Pool Tuning (FR-022, FR-023) ### Understanding the Calculation (FR-024) Total PostgreSQL connections consumed: ``` Total Connections = MAX_PROJECTS × MAX_CONNECTIONS_PER_POOL ``` **Example Scenarios**: | Scenario | MAX_PROJECTS | MAX_CONNECTIONS_PER_POOL | Total Connections | Use Case | |----------|--------------|---------------------------|-------------------|----------| | Small Deployment | 5 | 10 | 50 | Single-user development | | Medium Deployment | 10 | 20 | 200 | Small team (default) | | Large Deployment | 20 | 25 | 500 | Multi-tenant production | ### PostgreSQL max_connections Requirement (FR-024) Your PostgreSQL `max_connections` setting MUST exceed total connection pool consumption plus overhead: ``` PostgreSQL max_connections ≥ (MAX_PROJECTS × MAX_CONNECTIONS_PER_POOL) + 10 ``` The +10 buffer accounts for: - Administrative connections (pg_dump, psql) - Monitoring tools - Connection pool initialization spikes **Configuration Validation**: ```bash # Check current PostgreSQL max_connections psql -h localhost -c "SHOW max_connections;" # Expected: Value ≥ (MAX_PROJECTS × MAX_CONNECTIONS_PER_POOL + 10) ``` ### Tuning MAX_PROJECTS (FR-022) **Tradeoffs**: - ✅ Higher value: More concurrent projects, less LRU eviction churn - ❌ Higher value: More PostgreSQL connections, more memory usage - ✅ Lower value: Less resource consumption, more predictable performance - ❌ Lower value: More frequent pool eviction, connection overhead **Recommendations**: - Development: 5 projects (minimal overhead) - Production (single-team): 10 projects (default, balanced) - Production (multi-tenant): 20-50 projects (requires PostgreSQL tuning) ### Tuning MAX_CONNECTIONS_PER_POOL (FR-023) **Tradeoffs**: - ✅ Higher value: More concurrent queries per project, better throughput - ❌ Higher value: More total connections, PostgreSQL overhead - ✅ Lower value: Fewer total connections, less memory - ❌ Lower value: Query queueing under load, slower response times **Recommendations**: - Development: 5 connections (low concurrency expected) - Production (small repos): 10-15 connections (moderate concurrency) - Production (large repos): 20-30 connections (high concurrency, large indexing jobs) ## PostgreSQL Configuration (FR-025) ### Production Tuning Parameters (FR-025) ```ini # postgresql.conf # Connection limits (adjust based on MAX_PROJECTS calculation) max_connections = 300 # Memory settings (tune based on available RAM) shared_buffers = 4GB effective_cache_size = 12GB maintenance_work_mem = 1GB work_mem = 64MB # Query performance random_page_cost = 1.1 # For SSD storage effective_io_concurrency = 200 # Write performance wal_buffers = 16MB checkpoint_completion_target = 0.9 # Logging (for production monitoring) log_min_duration_statement = 1000 # Log queries >1s log_line_prefix = '%m [%p] %q%u@%d ' ``` **Rationale for Settings**: - `shared_buffers = 4GB`: codebase-mcp performs frequent GIN index lookups on JSONB columns; larger buffer cache reduces disk I/O - `effective_cache_size = 12GB`: Hints query planner about available OS cache for embedding vector similarity queries - `work_mem = 64MB`: Embedding searches may require sorting; adequate work_mem prevents disk-based sorts ## Monitoring & Health Checks (FR-038) ### Connection Pool Metrics (FR-038) **Key Indicators**: 1. Active pool count (should not exceed MAX_PROJECTS) 2. Connections per pool (should not exceed MAX_CONNECTIONS_PER_POOL) 3. Pool eviction frequency (high frequency indicates MAX_PROJECTS too low) 4. Query latency (increasing latency may indicate connection exhaustion) **Monitoring Queries**: ```sql -- Check active databases (one per project) SELECT datname FROM pg_database WHERE datname LIKE 'codebase_%'; -- Check connection count per database SELECT datname, count(*) FROM pg_stat_activity WHERE datname LIKE 'codebase_%' GROUP BY datname; -- Check total connections SELECT count(*) FROM pg_stat_activity; ``` ### Health Check Endpoints [Document any health check endpoints if they exist in v2.0] ## Configuration Validation Checklist (FR-027) Before deploying to production: - [ ] `DATABASE_URL` connects successfully: `psql $DATABASE_URL -c "SELECT 1;"` - [ ] `OLLAMA_BASE_URL` responds: `curl $OLLAMA_BASE_URL/api/version` - [ ] Embedding model available: `curl $OLLAMA_BASE_URL/api/show -d '{"name":"nomic-embed-text"}'` - [ ] PostgreSQL `max_connections` sufficient: Use calculation formula - [ ] workflow-mcp reachable (if configured): `curl $WORKFLOW_MCP_URL` - [ ] Test indexing: Index small test repository - [ ] Test search: Perform test search query - [ ] Verify project isolation: Index two projects, confirm separate databases ## Troubleshooting ### "Too many connections" error - Symptom: PostgreSQL rejects connections - Cause: `max_connections` too low or connection leak - Solution: Increase PostgreSQL `max_connections` or reduce `MAX_PROJECTS × MAX_CONNECTIONS_PER_POOL` ### Slow query performance - Symptom: Search queries >1s response time - Cause: Insufficient PostgreSQL memory or disk I/O bottleneck - Solution: Increase `shared_buffers` and `effective_cache_size`, use SSD storage ### Frequent pool evictions - Symptom: Logs show "Evicting project pool X" - Cause: `MAX_PROJECTS` too low for working set - Solution: Increase `MAX_PROJECTS` (and PostgreSQL `max_connections`) ``` ### Rationale This structure addresses administrator persona needs (User Story 3) with: 1. **Reference Tables (FR-021)**: Scannable format for quick lookup during configuration. Columns include defaults to support "what happens when user doesn't set variable" edge case. 2. **Calculation Examples (FR-024)**: Concrete scenarios prevent misconfiguration. Formula + examples + validation commands enable administrators to verify their configuration before deployment. 3. **Tradeoff Documentation (FR-022, FR-023)**: Explicit pros/cons for tuning decisions help administrators make informed choices based on their deployment context (small team vs multi-tenant). 4. **Production Tuning Guidance (FR-025)**: PostgreSQL-specific recommendations address database performance requirements. Rationale column explains why settings matter for codebase-mcp workload. 5. **Monitoring Metrics (FR-038)**: SQL queries for connection pool monitoring enable administrators to validate deployment health and diagnose issues. 6. **Validation Checklist (FR-027)**: Pre-deployment checklist with commands prevents common configuration errors. Addresses "administrator deploys correctly on first attempt" success criterion (SC-010). ### Alternatives Considered - **Alternative 1: Prose-Only Configuration Docs** - Rejected because administrators need reference format, not narrative. Tables enable quick scanning during configuration. - **Alternative 2: Auto-Generated Config Docs from Schema** - Rejected because Phase 05 is documentation-only. Auto-generation requires Phase 07 tooling. - **Alternative 3: Separate Configuration File with Comments** - Rejected because environment variables are the configuration mechanism (12-factor app pattern). Documentation explains the variables, not replaces them. - **Alternative 4: Configuration Wizard/Script** - Rejected because interactive tooling requires Phase 07 development. Manual configuration with validation checklist is appropriate for Phase 05. ### Implementation Guidance **For this project, apply these specific patterns**: 1. **Environment Variable Tables**: - Group related variables (Core, Multi-Project, Integration) - Required column distinguishes mandatory from optional - Default column shows behavior when unset - Example column provides concrete value format 2. **Calculation Examples**: - Lead with formula - Provide 3-5 concrete scenarios - Include validation commands - Explain buffer/overhead calculations 3. **Tradeoff Documentation**: - Use ✅ ❌ symbols for scannable pros/cons - Provide context-specific recommendations - Link to monitoring section for validation 4. **PostgreSQL Tuning**: - Use actual configuration file syntax (postgresql.conf) - Include comments explaining rationale - Provide ranges, not single values - Link to PostgreSQL official docs for deeper tuning 5. **Validation Checklist**: - Checkbox format for copy-paste workflow - Include exact validation commands - Show expected output where helpful - Order by dependency (database before application) --- ## 4. Architecture Documentation ### Decision Use **Mermaid Diagrams + Component Prose + Code Examples** pattern: ```markdown # Architecture: Multi-Project Design > Technical overview of codebase-mcp v2.0 architecture, covering database-per-project strategy, connection pool management, and workflow-mcp integration. ## Architecture Overview High-level description of system components and design philosophy. ## Multi-Project Architecture Diagram (FR-028) ```mermaid graph TB Client[MCP Client Claude/Cursor/etc] Server[codebase-mcp Server FastMCP + Python SDK] PoolMgr[Connection Pool Manager LRU Cache] WorkflowMCP[workflow-mcp Server Optional] DB_Default[(PostgreSQL codebase_default)] DB_Proj1[(PostgreSQL codebase_proj1)] DB_Proj2[(PostgreSQL codebase_proj2)] Client -->|SSE| Server Server -->|index_repository search_code| PoolMgr Server -.->|get_active_project Optional| WorkflowMCP PoolMgr -->|Project: default| DB_Default PoolMgr -->|Project: proj1| DB_Proj1 PoolMgr -->|Project: proj2| DB_Proj2 PoolMgr -.->|LRU Eviction when MAX_PROJECTS exceeded| DB_Proj2 style Server fill:#e1f5ff style PoolMgr fill:#fff4e1 style WorkflowMCP fill:#f0f0f0,stroke-dasharray: 5 5 ``` **Component Responsibilities**: - **MCP Client**: AI coding assistant (Claude, Cursor) invoking tools via SSE - **codebase-mcp Server**: FastMCP-based server exposing 2 tools with multi-project routing - **Connection Pool Manager**: Maintains up to MAX_PROJECTS pools, LRU eviction policy - **workflow-mcp Server**: Optional automatic project detection via get_active_project - **PostgreSQL Databases**: One database per project for schema isolation ## Database-Per-Project Strategy (FR-029) ### Design Decision Each project gets an isolated PostgreSQL database with naming convention: ``` codebase_{sanitized_project_id} ``` **Examples**: | project_id | Database Name | Notes | |------------|---------------|-------| | `default` | `codebase_default` | Used when no project_id specified | | `my-app` | `codebase_my_app` | Hyphens converted to underscores | | `frontend.ui` | `codebase_frontend_ui` | Dots converted to underscores | | `client_a` | `codebase_client_a` | Underscores preserved | **Sanitization Rules**: - Convert non-alphanumeric characters to underscores - Lowercase all characters - Preserve underscores - Max length: 63 characters (PostgreSQL limit) minus `codebase_` prefix (9 chars) = 54 chars for project_id ### Rationale Database-per-project isolation provides: 1. **Schema Independence**: Each project has identical schema (repositories, code_chunks, embeddings tables) without cross-project contamination 2. **Query Performance**: No project_id filtering needed in queries; PostgreSQL query planner optimizes per-database 3. **Data Locality**: Related code chunks physically co-located, improving cache efficiency 4. **Backup Granularity**: Drop/backup/restore projects independently 5. **Multi-Tenancy**: Natural tenant isolation for SaaS deployments 6. **Resource Limits**: PostgreSQL database-level resource limits if needed **Tradeoffs**: - ❌ More PostgreSQL databases to manage (monitoring, backups) - ❌ More connections consumed (MAX_PROJECTS × MAX_CONNECTIONS_PER_POOL) - ✅ Better isolation, performance, and operational flexibility ### Schema Structure Each project database contains identical schema: ```sql -- Simplified schema (see full schema in migrations/) CREATE TABLE repositories ( id UUID PRIMARY KEY, repo_path TEXT NOT NULL, indexed_at TIMESTAMP DEFAULT NOW() ); CREATE TABLE code_chunks ( id UUID PRIMARY KEY, repository_id UUID REFERENCES repositories(id), file_path TEXT NOT NULL, content TEXT NOT NULL, start_line INT, end_line INT, embedding vector(768) -- pgvector type ); CREATE INDEX idx_embedding ON code_chunks USING ivfflat (embedding vector_cosine_ops); ``` ## Connection Pool Design (FR-030) ### Pool Lifecycle ```mermaid stateDiagram-v2 [*] --> Uninitialized Uninitialized --> Creating: First query for project Creating --> Active: Pool created Active --> Active: Queries served Active --> Idle: No queries for PROJECT_POOL_TIMEOUT Idle --> Active: New query received Idle --> Evicted: LRU eviction (MAX_PROJECTS limit) Active --> Evicted: LRU eviction (MAX_PROJECTS limit) Evicted --> [*] Evicted --> Creating: Query after eviction (recreates pool) ``` **State Descriptions**: - **Uninitialized**: No pool exists for project yet - **Creating**: Pool initialization in progress (database creation, connection opening) - **Active**: Pool serving queries, connections open - **Idle**: No active queries, connections kept warm for PROJECT_POOL_TIMEOUT seconds - **Evicted**: Pool closed due to LRU eviction when MAX_PROJECTS limit exceeded ### LRU Eviction Policy (FR-030) When a query arrives for project N+1 and MAX_PROJECTS pools already exist: 1. Identify least-recently-used pool (oldest last query timestamp) 2. Close all connections in LRU pool 3. Remove pool from cache 4. Create new pool for project N+1 5. Serve query **Example Timeline**: ``` MAX_PROJECTS=3 t0: Query project A → Pool A created (Pools: A) t1: Query project B → Pool B created (Pools: A, B) t2: Query project C → Pool C created (Pools: A, B, C) t3: Query project A → Pool A reused (Pools: A, B, C; LRU order: B, C, A) t4: Query project D → Pool B evicted (least recently used), Pool D created (Pools: A, C, D) ``` **Performance Implications**: - Pool creation: ~100-500ms (database creation, schema initialization) - Pool eviction: ~50-100ms (connection cleanup) - Frequent eviction indicates MAX_PROJECTS too low for working set ### Connection Pool Configuration Each pool maintains: - **Min connections**: 2 (kept warm) - **Max connections**: MAX_CONNECTIONS_PER_POOL (default 20) - **Timeout**: 30 seconds for acquiring connection - **Idle timeout**: PROJECT_POOL_TIMEOUT (default 300s) **Concurrency Example**: ``` MAX_CONNECTIONS_PER_POOL=20 - 5 concurrent indexing operations (5 connections) - 15 concurrent search queries (15 connections) Total: 20 connections in use 21st query waits for available connection (up to 30s timeout) ``` ## workflow-mcp Integration Architecture (FR-031) ```mermaid sequenceDiagram participant Client as MCP Client participant CBServer as codebase-mcp participant WFServer as workflow-mcp participant DB as PostgreSQL Client->>CBServer: search_code(query, project_id=None) CBServer->>WFServer: get_active_project() [timeout=5s] alt workflow-mcp responds WFServer-->>CBServer: {"active_project": "frontend-ui"} CBServer->>DB: Query codebase_frontend_ui DB-->>CBServer: Search results else workflow-mcp timeout CBServer->>DB: Query codebase_default (fallback) DB-->>CBServer: Search results end CBServer-->>Client: Search results ``` **Integration Behavior**: 1. If `project_id` parameter provided: Use specified project (no workflow-mcp call) 2. If `project_id=None` and `WORKFLOW_MCP_URL` set: Call `get_active_project()` with timeout 3. If workflow-mcp responds: Use returned project_id 4. If workflow-mcp timeout/error: Fallback to "default" project 5. If `WORKFLOW_MCP_URL` not set: Use "default" project directly **Fallback Rationale**: - Local-first architecture principle (Principle II) - System must function without external dependencies - Graceful degradation under network issues ## Component Interactions ### Indexing Flow ``` Client → index_repository(repo_path, project_id) ↓ Pool Manager → Get/create pool for project_id ↓ Database Creation → Create codebase_{project_id} if not exists ↓ Repository Scanning → Tree-sitter AST parsing ↓ Chunking → Extract code chunks with context ↓ Embedding Generation → Ollama API call (nomic-embed-text) ↓ Storage → Insert into code_chunks table with vector ↓ Response → {files_indexed, chunks_created, duration} ``` ### Search Flow ``` Client → search_code(query, project_id) ↓ Project Resolution → project_id or workflow-mcp or default ↓ Pool Manager → Get pool for resolved project_id ↓ Query Embedding → Ollama API call for query vector ↓ Vector Search → pgvector cosine similarity query ↓ Result Ranking → Top-k results with scores ↓ Context Retrieval → Fetch surrounding lines ↓ Response → {results, similarity_scores, context} ``` ## Design Rationale Summary **Why Multi-Project Architecture?** - Enables SaaS/multi-tenant deployments - Supports workflow-mcp ecosystem integration - Maintains simplicity (single codebase, generic tools) **Why Database-Per-Project?** - Better isolation, performance, and operational flexibility - Aligns with PostgreSQL strengths (per-database optimization) - Simplifies schema management (no discriminator columns) **Why Connection Pool Manager?** - Amortizes connection overhead across queries - LRU eviction balances resource usage and performance - Configurable limits adapt to deployment size **Why Optional workflow-mcp?** - Local-first architecture (works standalone) - Graceful degradation (fallback to default project) - Enables advanced workflows without coupling ``` ### Rationale This structure addresses maintainer persona needs (User Story 5) with: 1. **Mermaid Diagrams (FR-028, FR-031)**: Visual representation of architecture aids comprehension. GitHub renders Mermaid natively (no build tooling). Text-based format enables version control diffs and review. 2. **Component Prose**: Explains design decisions with rationale. Answers "why this design?" for contributors evaluating alternatives or debugging issues. 3. **Code Examples**: Concrete SQL, naming examples, and flow diagrams make abstract concepts tangible. Supports "maintainer understands architecture well enough to contribute" (SC-012). 4. **Database Naming Convention (FR-029)**: Explicit sanitization rules with examples prevent confusion. Documents implementation detail critical for debugging database connection issues. 5. **LRU Eviction Explanation (FR-030)**: Timeline example clarifies lifecycle. Performance implications guide tuning decisions and troubleshooting. ### Alternatives Considered - **Alternative 1: PlantUML Diagrams** - Rejected because GitHub doesn't render PlantUML natively (requires external tool or embedded images). Mermaid provides native GitHub support. - **Alternative 2: ASCII Diagrams** - Rejected because complexity of multi-project architecture exceeds ASCII diagram readability. Mermaid provides better visual clarity. - **Alternative 3: External Diagramming Tool (Lucidchart, Draw.io)** - Rejected because external tools create binary artifacts not version-controllable as text. Updates require manual export/import workflow. - **Alternative 4: Architecture Decision Records (ADRs)** - Considered complementary but not replacement. ADRs document decisions over time; architecture guide documents current state. Future consideration for Phase 07. ### Implementation Guidance **For this project, apply these specific patterns**: 1. **Mermaid Diagram Standards**: - Use `graph TB` (top-bottom) for component diagrams - Use `sequenceDiagram` for interaction flows - Use `stateDiagram-v2` for lifecycle diagrams - Include legend/annotations for symbol meanings - Keep diagrams focused (one concept per diagram) 2. **Database Naming Examples**: - Provide table with 5-10 examples - Cover edge cases (special characters, length limits) - Show sanitization rules explicitly - Link to code implementing sanitization 3. **Component Descriptions**: - Start with overview (what it does) - Explain responsibilities (what it's responsible for) - Provide rationale (why this design) - Show interactions (how it connects to other components) 4. **Flow Diagrams**: - Use indented ASCII for simple flows - Use Mermaid sequence diagrams for complex flows - Include failure paths, not just happy path - Annotate with timing/performance notes 5. **Design Rationale**: - Summarize key decisions at end - Link decisions to constitutional principles - Explain tradeoffs explicitly - Provide "Why X?" subsections for major choices --- ## 5. API Reference Documentation ### Decision Use **OpenAPI-Inspired Structure + Explicit Deprecation Notices** pattern: ```markdown # API Reference: MCP Tools > Complete reference for codebase-mcp v2.0 MCP tools, including parameters, return values, and examples. ## Overview codebase-mcp v2.0 provides **2 MCP tools** for semantic code search: 1. `index_repository` - Index a code repository for semantic search 2. `search_code` - Search indexed repositories using natural language **Breaking Changes from v1.x**: See [Removed Tools](#removed-tools) section below. ## index_repository Indexes a code repository into PostgreSQL with semantic embeddings for natural language search. ### Parameters | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `repo_path` | string | Yes | - | Absolute path to repository directory to index | | `project_id` | string | No | `"default"` | Project identifier for multi-project isolation. Defaults to "default" for standalone usage. | | `force_reindex` | boolean | No | `false` | Force full re-index even if already indexed | **Parameter Details**: **`repo_path`** (string, required) - Must be absolute path, not relative - Directory must exist and be readable - Supports git repositories (respects .gitignore) and plain directories - Example: `/Users/alice/projects/myapp` **`project_id`** (string, optional) - Identifies project for database isolation - Sanitized for PostgreSQL database name: lowercase, non-alphanumeric → underscore - Max length: 54 characters (PostgreSQL limit - `codebase_` prefix) - Example: `"frontend-ui"` → database `codebase_frontend_ui` - If omitted: Uses "default" project (database `codebase_default`) - If `WORKFLOW_MCP_URL` set and `project_id` omitted: Automatically resolved via `get_active_project()` **`force_reindex`** (boolean, optional) - `false` (default): Skip indexing if repository already indexed (checks by repo_path) - `true`: Delete existing index and re-index from scratch - Use case: Code changes since last index, corrupted index ### Return Value ```json { "repository_id": "uuid", "files_indexed": 1234, "chunks_created": 5678, "duration_seconds": 45.2, "project_id": "frontend-ui", "schema_name": "codebase_frontend_ui", "status": "success" } ``` | Field | Type | Description | |-------|------|-------------| | `repository_id` | string (UUID) | Unique identifier for indexed repository | | `files_indexed` | integer | Count of files successfully indexed | | `chunks_created` | integer | Count of code chunks extracted and embedded | | `duration_seconds` | float | Time taken for indexing operation | | `project_id` | string | Project identifier used (resolved or provided) | | `schema_name` | string | PostgreSQL database name used | | `status` | string | Operation status: "success", "partial", "failed" | ### Examples **Basic Usage (Default Project)**: ```json { "repo_path": "/Users/alice/projects/myapp" } ``` **Multi-Project Usage**: ```json { "repo_path": "/Users/alice/clients/acme-corp/backend", "project_id": "acme-backend" } ``` **Force Re-Index**: ```json { "repo_path": "/Users/alice/projects/myapp", "force_reindex": true } ``` ### Error Handling | Error | Cause | Resolution | |-------|-------|------------| | `ValueError: repo_path must be absolute` | Relative path provided | Provide absolute path starting with `/` | | `ValueError: Directory does not exist` | Invalid repo_path | Verify directory exists and is readable | | `RuntimeError: Database connection failed` | PostgreSQL unreachable | Check `DATABASE_URL` configuration | | `RuntimeError: Ollama API unavailable` | Ollama not running | Start Ollama server, verify `OLLAMA_BASE_URL` | --- ## search_code Searches indexed code repositories using natural language queries with semantic similarity matching. ### Parameters | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `query` | string | Yes | - | Natural language search query | | `project_id` | string | No | `"default"` | Project identifier for multi-project isolation | | `repository_id` | string (UUID) | No | - | Filter by specific repository | | `file_type` | string | No | - | Filter by file extension (e.g., "py", "js") | | `directory` | string | No | - | Filter by directory path (supports wildcards) | | `limit` | integer | No | `10` | Maximum results to return (1-50) | **Parameter Details**: **`query`** (string, required) - Natural language description of code to find - Examples: "authentication middleware", "database connection pooling", "error handling in API routes" - Minimum length: 2 characters **`project_id`** (string, optional) - Same behavior as `index_repository`: defaults to "default", auto-resolved if workflow-mcp configured **`repository_id`** (string, optional) - UUID returned from `index_repository` - Filters results to single repository - Useful when multiple repositories indexed in same project **`file_type`** (string, optional) - File extension without dot: "py", "js", "go", "rs" - Case-insensitive - Filters results to matching file types **`directory`** (string, optional) - Relative path within repository: "src/api", "tests" - Supports wildcards: "src/*/handlers" - Case-sensitive **`limit`** (integer, optional) - Range: 1-50 - Default: 10 - Returns top-k results by similarity score ### Return Value ```json { "results": [ { "chunk_id": "uuid", "file_path": "src/api/auth.py", "content": "def authenticate_user(username, password):\n ...", "start_line": 15, "end_line": 25, "similarity_score": 0.92, "context_before": "# Authentication module\n", "context_after": "\n# Password hashing utilities" } ], "total_count": 42, "project_id": "acme-backend", "schema_name": "codebase_acme_backend", "latency_ms": 250 } ``` | Field | Type | Description | |-------|------|-------------| | `results` | array | List of matching code chunks (see Result Object below) | | `total_count` | integer | Total matches before limit applied | | `project_id` | string | Project identifier used | | `schema_name` | string | PostgreSQL database queried | | `latency_ms` | float | Query execution time (milliseconds) | **Result Object**: | Field | Type | Description | |-------|------|-------------| | `chunk_id` | string (UUID) | Unique identifier for code chunk | | `file_path` | string | Relative path to file within repository | | `content` | string | Code chunk content | | `start_line` | integer | Starting line number in file | | `end_line` | integer | Ending line number in file | | `similarity_score` | float | Cosine similarity score (0.0-1.0, higher is better) | | `context_before` | string | Lines before chunk (for context) | | `context_after` | string | Lines after chunk (for context) | ### Examples **Basic Search**: ```json { "query": "database connection pooling" } ``` **Multi-Project Search**: ```json { "query": "authentication middleware", "project_id": "acme-backend" } ``` **Filtered Search**: ```json { "query": "error handling", "file_type": "py", "directory": "src/api", "limit": 20 } ``` ### Error Handling | Error | Cause | Resolution | |-------|-------|------------| | `ValueError: query too short` | Query <2 characters | Provide longer query | | `ValueError: limit out of range` | Limit <1 or >50 | Use limit between 1-50 | | `ValueError: project not found` | project_id has no database | Index repository first | | `RuntimeError: Ollama API unavailable` | Ollama not running | Start Ollama server | --- ## Removed Tools (v2.0) The following **14 tools** were removed in v2.0 as part of the architecture refactoring to focus exclusively on semantic code search. Users needing project management functionality should use the separate [workflow-mcp server](https://github.com/workflow-mcp/workflow-mcp). ### Removed Project Management Tools - ❌ **`create_project`** - Removed in v2.0. Use workflow-mcp for project management. - ❌ **`switch_project`** - Removed in v2.0. Use `project_id` parameter instead. - ❌ **`get_active_project`** - Removed in v2.0. Use workflow-mcp integration or explicit `project_id`. - ❌ **`list_projects`** - Removed in v2.0. Use workflow-mcp for project management. ### Removed Entity Management Tools - ❌ **`register_entity_type`** - Removed in v2.0. Use workflow-mcp for entity management. - ❌ **`create_entity`** - Removed in v2.0. Use workflow-mcp for entity management. - ❌ **`query_entities`** - Removed in v2.0. Use workflow-mcp for entity management. - ❌ **`update_entity`** - Removed in v2.0. Use workflow-mcp for entity management. - ❌ **`delete_entity`** - Removed in v2.0. Use workflow-mcp for entity management. - ❌ **`update_entity_type_schema`** - Removed in v2.0. Use workflow-mcp for entity management. ### Removed Work Item Management Tools - ❌ **`create_work_item`** - Removed in v2.0. Use workflow-mcp for work item management. - ❌ **`update_work_item`** - Removed in v2.0. Use workflow-mcp for work item management. - ❌ **`query_work_items`** - Removed in v2.0. Use workflow-mcp for work item management. - ❌ **`get_work_item_hierarchy`** - Removed in v2.0. Use workflow-mcp for work item management. **Migration Path**: See [v1-to-v2 Migration Guide](../migration/v1-to-v2-migration.md) for upgrade instructions. --- ## Tool Discovery MCP clients discover tools via SSE (Server-Sent Events) `tools/list` request. codebase-mcp v2.0 returns: ```json { "tools": [ { "name": "index_repository", "description": "Index a code repository for semantic search", "inputSchema": { ... } }, { "name": "search_code", "description": "Search indexed code using natural language", "inputSchema": { ... } } ] } ``` **JSON Schema Validation**: MCP clients validate parameters against `inputSchema` before tool invocation. See [MCP Protocol Specification](https://github.com/modelcontextprotocol/specification) for details. ``` ### Rationale This structure addresses multiple personas: 1. **OpenAPI-Inspired Structure**: Familiar format for developers accustomed to REST API documentation. Clear sections (Parameters, Return Value, Examples, Errors) provide complete reference. 2. **Explicit Deprecation Notices (FR-007, FR-008)**: Removed tools section prominently displays 14 removed tools by name with v2.0 notation. Includes migration path link. Addresses "v1.x user looks for removed tool" edge case. 3. **Parameter Documentation**: Table format for scannable reference. Detailed subsections explain behavior, validation, examples. Addresses `project_id` parameter default behavior (FR-005, FR-006). 4. **Return Value Documentation**: JSON schema + table description. Shows actual response structure, not prose description. 5. **Error Handling**: Table of common errors with causes and resolutions. Enables users to troubleshoot independently. 6. **Examples**: Multiple examples per tool showing common use cases. Addresses "new user completes first indexing/search" (User Story 2). ### Alternatives Considered - **Alternative 1: Pure OpenAPI YAML** - Rejected because OpenAPI is REST-focused, not MCP-focused. MCP uses JSON-RPC over SSE, not HTTP. OpenAPI structure inspires documentation but full spec doesn't apply. - **Alternative 2: Swagger UI Generation** - Rejected because it requires build tooling and doesn't match MCP protocol. Manual Markdown is more appropriate for Phase 05 documentation-only scope. - **Alternative 3: JSDoc/Pydantic Schema Generation** - Rejected because Phase 05 is documentation-only (no code changes). Auto-generation deferred to Phase 07. - **Alternative 4: Separate Page Per Tool** - Rejected because 2 tools don't justify multi-page structure. Single page with anchor links provides better UX for small API surface. ### Implementation Guidance **For this project, apply these specific patterns**: 1. **Tool Documentation Structure**: - Lead with overview (brief description) - Parameters section: table + detailed subsections - Return value section: JSON example + table description - Examples section: 3-5 common use cases - Error handling section: table of errors 2. **Parameter Tables**: - Columns: Parameter, Type, Required, Default, Description - Follow with detailed subsections for complex parameters - Show validation rules explicitly - Link to related configuration (e.g., workflow-mcp integration) 3. **Return Value Documentation**: - Show actual JSON structure (copy from implementation) - Table describing each field - Explain field semantics (not just types) 4. **Examples**: - Use JSON format (MCP tool parameters are JSON) - Progress from simple to complex - Cover multi-project usage - Show filters and advanced parameters 5. **Removed Tools Section**: - List all 14 tools by name (FR-010) - Use ❌ symbol for visual indicator - Group by category (Project, Entity, Work Item) - Link to migration guide - Link to workflow-mcp as alternative --- ## 6. Documentation Validation ### Decision Use **Manual Validation Checklist + Documented Procedures** pattern: ```markdown # Documentation Validation Procedures > Manual validation procedures for ensuring documentation accuracy, completeness, and correctness before publication. ## Validation Overview This document defines manual validation procedures for Phase 05 documentation artifacts. Automated validation (link checking, example testing) is deferred to Phase 07 CI integration per FR-032/FR-033. ## Validation Checklist Complete before PR submission: ### Completeness Validation - [ ] All 38 functional requirements addressed in documentation - [ ] All 5 user personas served (README, migration, config, architecture, API reference) - [ ] All 11 edge cases addressed in relevant docs - [ ] All 14 removed tools documented by name - [ ] All environment variables documented with defaults - [ ] All constitutional principles reflected accurately ### Accuracy Validation - [ ] Tool count is 2 (index_repository, search_code) - no other tools mentioned - [ ] Multi-project support explained with project_id parameter - [ ] workflow-mcp integration marked optional with standalone usage documented first - [ ] Database-per-project naming convention examples correct - [ ] Connection pool calculation formula correct - [ ] PostgreSQL version requirements accurate (14+) - [ ] Python version requirements accurate (3.11+) ### Link Validation (FR-032) - [ ] All internal links tested (relative paths resolve) - [ ] All external links tested (URLs return 200) - [ ] All anchor links tested (heading anchors exist) - [ ] Link inventory spreadsheet complete ### Code Example Validation (FR-033) - [ ] All bash commands tested and execute successfully - [ ] All SQL queries tested against actual database - [ ] All JSON examples validate against schemas - [ ] All configuration examples tested in staging environment ### Terminology Consistency (FR-034) - [ ] Glossary terms used consistently across all docs - [ ] Technical terms match implementation (codebase, not codebase-mcp) - [ ] Capitalization consistent (PostgreSQL, not Postgres) - [ ] Abbreviations defined on first use ### Formatting Consistency (FR-035) - [ ] Markdown style guide followed - [ ] Code blocks have language specifiers - [ ] Tables formatted consistently - [ ] Heading hierarchy correct (no skipped levels) - [ ] File names follow convention (lowercase-with-hyphens.md) ## Link Validation Procedure ### Tools Required - Web browser - Text editor - Spreadsheet for link inventory ### Procedure 1. **Extract All Links**: ```bash # Find all markdown links in docs/ grep -r '\[.*\](.*)' docs/ README.md | grep -v '.pyc' > links.txt ``` 2. **Create Link Inventory**: - Spreadsheet columns: Source File, Link Text, URL/Path, Status, Notes - Categorize: Internal (relative), External (absolute), Anchor (fragment) 3. **Test Internal Links**: - For each relative path link: - Verify target file exists at path - If anchor: verify heading exists in target - Record status: ✅ Pass / ❌ Fail 4. **Test External Links**: - For each absolute URL: - Visit URL in browser - Verify returns 200 (not 404, 403, etc.) - Record status and note if redirect occurred 5. **Test Anchor Links**: - For each fragment identifier (#heading): - Verify heading exists in same document - Verify anchor matches GitHub slug format (lowercase, hyphens) - Record status 6. **Document Failures**: - Create GitHub issue for each broken link - Fix before PR submission - Re-test after fix ### Expected Outcome - 0 broken links (SC-003) - Link inventory spreadsheet complete - All links marked ✅ Pass ## Code Example Validation Procedure ### Tools Required - v2.0 codebase-mcp installation - PostgreSQL 14+ - Test repository for indexing - Staging environment (for production config examples) ### Procedure 1. **Identify All Code Examples**: - Bash commands (` ```bash `) - SQL queries (` ```sql `) - JSON examples (` ```json `) - Configuration files (` ```ini `, ` ```yaml `) 2. **Test Bash Commands**: - Copy command from documentation - Execute in terminal - Verify exit code 0 (success) - Verify expected output matches documentation - Record: Command, Expected Output, Actual Output, Status 3. **Test SQL Queries**: - Copy query from documentation - Execute against test database: `psql $DATABASE_URL -c "QUERY"` - Verify result matches documentation - Record: Query, Expected Result, Actual Result, Status 4. **Test JSON Examples**: - Copy JSON from documentation - Validate syntax: `echo 'JSON' | jq .` - Verify against schema (if applicable) - Test with actual MCP client (if tool invocation example) - Record: Example, Validation Result, Status 5. **Test Configuration Examples**: - Copy configuration to staging environment - Apply configuration (environment variables, postgresql.conf) - Verify system behavior matches documentation - Record: Config, Expected Behavior, Actual Behavior, Status 6. **Document Failures**: - Create GitHub issue for each failed example - Fix documentation or note caveats - Re-test after fix ### Expected Outcome - 100% of code examples execute successfully (SC-004) - Code example test log complete - All examples marked ✅ Pass ## Migration Guide Validation Procedure ### Tools Required - v1.x codebase-mcp installation - v2.0 codebase-mcp installation - Test PostgreSQL database - Backup storage ### Procedure 1. **Setup Test Environment**: - Install v1.x in isolated environment - Create test data (indexed repositories) - Document baseline state 2. **Test Backup Procedures**: - Follow backup procedures from migration guide exactly - Verify backup files created - Verify backup files restorable - Record: Backup Step, Command, Result 3. **Test Upgrade Procedure**: - Follow upgrade procedure from migration guide step-by-step - Execute each command exactly as documented - Verify expected output at each step - Record: Upgrade Step, Expected Output, Actual Output 4. **Test Validation Commands**: - Run all validation commands from guide - Verify v2.0 functionality works - Record: Validation Command, Expected Result, Actual Result 5. **Test Rollback Procedure**: - Follow rollback procedure from migration guide step-by-step - Restore backups - Verify v1.x functionality restored - Record: Rollback Step, Expected Output, Actual Output 6. **Test Diagnostic Commands**: - Simulate partial migration failure - Run diagnostic commands (FR-036) - Verify commands detect state correctly - Record: Diagnostic Command, Scenario, Result 7. **Document Issues**: - Note any steps that fail or produce unexpected output - Update migration guide with corrections - Re-test full procedure after updates ### Expected Outcome - Upgrade procedure completes successfully - Rollback procedure restores v1.x functionality - Diagnostic commands detect migration state - Migration guide validated end-to-end ## Configuration Guide Validation Procedure ### Tools Required - Staging environment (not production) - PostgreSQL 14+ - Monitoring tools (psql, connection count queries) ### Procedure 1. **Test Environment Variable Examples**: - Copy each example value from table - Set environment variable - Verify application accepts value - Record: Variable, Example Value, Status 2. **Test Connection Calculation Examples**: - Copy calculation formula - Plug in example scenario values - Verify calculation result matches table - Record: Scenario, Formula, Calculated Result, Table Result 3. **Test PostgreSQL Tuning Examples**: - Apply postgresql.conf settings to staging database - Restart PostgreSQL - Verify settings applied: `SHOW setting_name;` - Record: Setting, Documented Value, Actual Value 4. **Test Validation Checklist**: - Follow validation checklist commands - Verify each command produces expected result - Record: Checklist Item, Command, Expected Result, Actual Result 5. **Test Monitoring Queries**: - Run connection pool monitoring SQL queries - Verify queries execute and return sensible data - Record: Query, Expected Columns, Actual Result 6. **Document Issues**: - Note any configuration that doesn't work as documented - Update configuration guide with corrections - Re-test after updates ### Expected Outcome - All environment variable examples valid - All calculations correct - All PostgreSQL settings apply successfully - Validation checklist commands work ## Review Checklist Final review before PR submission: ### Content Review - [ ] All validation procedures completed - [ ] All failures documented and fixed - [ ] Cross-reference consistency (no conflicting information) - [ ] Tone consistent (technical but approachable) - [ ] No placeholder text (`TODO`, `[FILL IN]`, etc.) ### Persona Coverage - [ ] New user can complete first installation and search (User Story 2) - [ ] Existing user can upgrade safely (User Story 1) - [ ] Administrator can configure production (User Story 3) - [ ] Developer can integrate workflow-mcp (User Story 4) - [ ] Maintainer understands architecture (User Story 5) ### Quality Metrics - [ ] 0 broken links (SC-003) - [ ] 100% code examples tested (SC-004) - [ ] 100% functional requirements addressed - [ ] 100% removed tools documented (SC-002) ### Approval - [ ] Author validation complete - [ ] Peer review completed (separate reviewer) - [ ] Reviewer confirms link validation - [ ] Reviewer confirms code example validation ``` ### Rationale This approach addresses Phase 05 scope constraints: 1. **Manual Validation (FR-032, FR-033)**: Appropriate for documentation-only phase. Automated testing deferred to Phase 07 CI integration per explicit scope boundary. Manual validation ensures accuracy without introducing new tooling dependencies. 2. **Documented Procedures**: Checklists and step-by-step procedures enable consistent validation across contributors. Reduces subjectivity in "documentation quality" assessment. 3. **Spreadsheet-Based Tracking**: Simple tool (Excel/Google Sheets) for link inventory and test logs. No custom tooling required. Enables progress tracking and issue identification. 4. **Staging Environment Testing**: Configuration and migration guides tested in non-production environment. Prevents documentation errors from impacting production systems. 5. **Peer Review Requirement**: Separate reviewer confirms validation completed. Catches author blind spots and ensures compliance with checklist. ### Alternatives Considered - **Alternative 1: Automated Link Checking (markdownlint, linkchecker)** - Rejected for Phase 05 because automation requires CI integration. Explicitly scoped to Phase 07. Manual validation sufficient for initial publication. - **Alternative 2: Automated Example Testing (doctest-style)** - Rejected for Phase 05 because it requires code annotations and test runner. Deferred to Phase 07 along with other automation. - **Alternative 3: Documentation Linter (Vale, write-good)** - Rejected because it requires tooling installation and configuration. Style consistency achieved through markdown style guide and manual review. - **Alternative 4: Third-Party Review Service** - Rejected because it introduces external dependency and cost. Peer review within team is sufficient for technical documentation. ### Implementation Guidance **For this project, apply these specific patterns**: 1. **Link Inventory Spreadsheet Structure**: - Columns: Source File, Link Text, URL/Path, Type (Internal/External/Anchor), Status, Tested By, Tested Date, Notes - Sort by source file for easier cross-referencing - Filter by status to identify failures quickly 2. **Code Example Test Log Structure**: - Columns: Doc File, Example Type (Bash/SQL/JSON/Config), Example Description, Command/Code, Expected Result, Actual Result, Status, Tested By, Tested Date - One row per example - Include screenshots for complex examples if helpful 3. **Validation Checklist Usage**: - Copy checklist to PR description - Check boxes as completed - Link to evidence (link inventory spreadsheet, test log) - Reviewer re-verifies critical items 4. **Migration Guide Testing**: - Use disposable VM or Docker container for test environment - Document exact steps taken (enables reproducibility) - Take database dumps at checkpoints (enables rollback testing) - Include timing information (informs duration guidance in future) 5. **Configuration Guide Testing**: - Use staging database separate from test/dev databases - Test both minimal and maximal configurations - Verify monitoring queries with actual data - Document any platform-specific issues (Linux vs macOS vs Docker) 6. **Review Process**: - Author completes validation, creates PR - Reviewer spot-checks 20% of links and examples - Reviewer runs migration guide in their own environment - Both author and reviewer sign off on checklist --- ## Key Takeaways 1. **GitHub-Flavored Markdown with Hierarchical Navigation**: Use GFM for universal compatibility. Structure docs with H1 title, H2 sections, TOC for long docs. Cross-reference with relative paths. Enables version control, accessibility, and no-build-tooling simplicity. 2. **Breaking Changes First + Step-by-Step + Rollback Pattern**: Migration guides must communicate risk upfront before procedures. Sequential steps with validation at each checkpoint. Complete rollback procedure provides safety net for production upgrades. 3. **Environment Variable Tables + Calculation Examples**: Configuration docs need scannable reference format (tables) with concrete examples. PostgreSQL connection calculation formula + scenarios prevent misconfiguration. Validation checklist enables pre-deployment verification. 4. **Mermaid Diagrams + Component Prose + Code Examples**: Architecture docs balance visual (Mermaid) and textual (prose) explanation. Text-based diagrams enable version control. Database naming examples with sanitization rules address implementation details critical for debugging. 5. **OpenAPI-Inspired Structure + Explicit Deprecation Notices**: API reference uses familiar structure (Parameters, Return Value, Examples, Errors) from REST API documentation. Removed tools section prominently lists 14 tools by name with migration path link. 6. **Manual Validation Checklist + Documented Procedures**: Phase 05 scope uses manual validation (link checking, example testing) with structured procedures. Automation deferred to Phase 07. Spreadsheet-based tracking and peer review ensure quality without new tooling dependencies. 7. **Multi-Persona Documentation Strategy**: 5 artifacts serve 5 distinct personas: README (new users), Migration Guide (upgrading users), Configuration Guide (administrators), Architecture Docs (maintainers), API Reference (developers). Each artifact has persona-specific content focus and technical depth. 8. **Production Quality Through Validation**: Manual validation procedures (link checking, example testing, migration testing, configuration testing) ensure documentation accuracy before publication. Validation checklists in PR process enforce quality standards. 9. **Progressive Disclosure Pattern**: Documentation structure moves from high-level (overview, quick start) to detailed (specific sections, troubleshooting). TOC and cross-references enable readers to drill down to relevant details without overwhelming initial presentation. 10. **Concrete Examples Over Abstract Descriptions**: Every concept illustrated with concrete example: database naming rules → example table; connection calculations → scenario table; migration procedure → exact commands with expected output. Reduces ambiguity and enables direct application.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

research.md•63.9 KiB