# Quickstart Integration Tests

**Feature**: Production-Grade MCP Server for Semantic Code Search
**Date**: 2025-10-06
**Purpose**: End-to-end validation scenarios derived from user stories

## Prerequisites

1. **PostgreSQL 14+** with pgvector extension installed
2. **Ollama** running locally with `nomic-embed-text` model
3. **Python 3.11+** with dependencies installed
4. **Test Repository** at `/tmp/test-repo` with sample code files

## Setup

```bash
# Create the test database (PostgreSQL must be running with the pgvector extension available)
createdb codebase_mcp_test

# Start Ollama (runs in the foreground; use a separate terminal)
ollama serve

# Pull embedding model
ollama pull nomic-embed-text

# Initialize database
python scripts/init_db.py --database-url postgresql+asyncpg://localhost/codebase_mcp_test

# Create test repository
mkdir -p /tmp/test-repo/src
cat > /tmp/test-repo/src/calculator.py << 'EOF'
def add(a: int, b: int) -> int:
    """Add two numbers and return the result."""
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiply two numbers and return the result."""
    return a * b


class Calculator:
    """A simple calculator class."""

    def __init__(self):
        self.history = []

    def calculate(self, operation: str, a: int, b: int) -> int:
        """Perform calculation and store in history."""
        if operation == "add":
            result = add(a, b)
        elif operation == "multiply":
            result = multiply(a, b)
        else:
            raise ValueError(f"Unknown operation: {operation}")
        self.history.append((operation, a, b, result))
        return result
EOF

cat > /tmp/test-repo/src/utils.py << 'EOF'
def format_result(value: int, decimals: int = 2) -> str:
    """Format a numeric result as a string."""
    return f"{value:.{decimals}f}"


def log_message(message: str, level: str = "INFO") -> None:
    """Log a message to console."""
    print(f"[{level}] {message}")
EOF

# Create .gitignore
cat > /tmp/test-repo/.gitignore << 'EOF'
__pycache__/
*.pyc
.env
EOF

# Start MCP server (runs in the foreground; use a separate terminal)
python -m src.main
```

---

## Scenario 1: Repository Indexing (FR-001 to FR-006)

**Given**: A code repository exists at `/tmp/test-repo`
**When**: The repository is indexed via MCP tool
**Then**: All source files are scanned, embeddings generated, and stored

### Test Steps

1. **Call `index_repository` tool**:
   ```json
   {
     "path": "/tmp/test-repo",
     "name": "Test Repository",
     "force_reindex": false
   }
   ```

2. **Verify response**:
   - `status`: "success"
   - `files_indexed`: 2 (calculator.py, utils.py)
   - `chunks_created`: ≥5 (functions and class)
   - `duration_seconds`: <60

3. **Verify database state**:
   ```sql
   -- Check repository created
   SELECT * FROM repositories WHERE path = '/tmp/test-repo';

   -- Check files indexed
   SELECT relative_path, language, is_deleted
   FROM code_files
   WHERE repository_id = '<repo_id>';
   -- Expected: 2 rows (calculator.py, utils.py), is_deleted=false

   -- Check chunks created
   SELECT chunk_type, start_line, end_line
   FROM code_chunks
   JOIN code_files ON code_chunks.code_file_id = code_files.id
   WHERE code_files.repository_id = '<repo_id>';
   -- Expected: function chunks for add, multiply, calculate; class chunk for Calculator

   -- Check embeddings generated
   SELECT COUNT(*) FROM code_chunks WHERE embedding IS NOT NULL;
   -- Expected: All chunks have embeddings
   ```

4. **Verify .gitignore respected**:
   ```sql
   SELECT relative_path FROM code_files
   WHERE relative_path LIKE '%__pycache__%'
      OR relative_path LIKE '%.pyc';
   -- Expected: 0 rows (ignored files not indexed)
   ```

**Success Criteria**: ✅ Repository indexed in <60s, all text files processed, ignore patterns respected

---

## Scenario 2: Incremental Updates (FR-007 to FR-010)

**Given**: Repository is already indexed
**When**: A file is modified
**Then**: Only the changed file is re-indexed

### Test Steps

1. **Modify file**:
   ```bash
   # Add new function to calculator.py
   cat >> /tmp/test-repo/src/calculator.py << 'EOF'


   def subtract(a: int, b: int) -> int:
       """Subtract b from a and return the result."""
       return a - b
   EOF
   ```

2. **Trigger incremental update** (automatic via file watching, or manually by calling `index_repository` again):
   ```json
   {
     "path": "/tmp/test-repo",
     "name": "Test Repository",
     "force_reindex": false
   }
   ```

3. **Verify response**:
   - `files_indexed`: 1 (only calculator.py)
   - `chunks_created`: +1 (new subtract function)
   - `duration_seconds`: <10 (much faster than full index)

4. **Verify change event recorded**:
   ```sql
   SELECT event_type, processed
   FROM change_events
   WHERE code_file_id IN (
     SELECT id FROM code_files WHERE relative_path = 'src/calculator.py'
   )
   ORDER BY detected_at DESC
   LIMIT 1;
   -- Expected: event_type='modified', processed=true
   ```

**Success Criteria**: ✅ Only modified file re-indexed, update completes in <10s

---

## Scenario 3: Semantic Code Search (FR-011 to FR-015)

**Given**: Repository is indexed with embeddings
**When**: A semantic search query is submitted
**Then**: Relevant code snippets are returned ranked by similarity

### Test Steps

1. **Search for addition functionality**:
   ```json
   {
     "query": "How do I add two numbers together?",
     "limit": 5
   }
   ```

2. **Verify response**:
   - `results`: Contains chunk from `add()` function with high similarity score (>0.8)
   - `latency_ms`: <500 (p95 target)
   - `context_before` and `context_after`: Provide surrounding lines

3. **Search with filters**:
   ```json
   {
     "query": "calculator class",
     "file_type": "py",
     "directory": "src",
     "limit": 3
   }
   ```

4. **Verify filtered results**:
   - All results from `.py` files
   - All results from `src/` directory
   - Calculator class chunk ranked highly

5. **Search with no results**:
   ```json
   {
     "query": "blockchain consensus algorithm",
     "limit": 10
   }
   ```

6. **Verify graceful handling**:
   - `results`: [] (empty array)
   - `total_count`: 0
   - No errors, valid response structure

**Success Criteria**: ✅ Search returns relevant results in <500ms, filters work correctly, handles no-match gracefully

---

## Scenario 4: Task Creation and Tracking (FR-016 to FR-023)

**Given**: MCP server is running
**When**: Developer creates and manages tasks
**Then**: Tasks are stored with metadata and status tracking

### Test Steps

1. **Create task**:
   ```json
   {
     "title": "Implement division operation",
     "description": "Add divide() function to calculator.py",
     "notes": "Remember to handle division by zero",
     "planning_references": ["specs/001-calculator/spec.md", "specs/001-calculator/plan.md"]
   }
   ```

2. **Verify task created**:
   - Response contains `id` (UUID)
   - `status`: "need to be done"
   - `planning_references`: Contains provided file paths
   - `branches`: [] (empty initially)
   - `commits`: [] (empty initially)

3. **Update task to in-progress with branch**:
   ```json
   {
     "task_id": "<task_id>",
     "status": "in-progress",
     "branch": "feature/division-operation"
   }
   ```

4. **Verify status transition**:
   ```sql
   SELECT from_status, to_status, changed_at
   FROM task_status_history
   WHERE task_id = '<task_id>'
   ORDER BY changed_at;
   -- Expected: Row with from_status='need to be done', to_status='in-progress'
   ```

5. **Verify branch link**:
   ```sql
   SELECT branch_name FROM task_branch_links WHERE task_id = '<task_id>';
   -- Expected: 'feature/division-operation'
   ```

6. **Complete task with commit** (the hash is a placeholder: 40 hexadecimal characters, as in a Git SHA-1):
   ```json
   {
     "task_id": "<task_id>",
     "status": "complete",
     "commit": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"
   }
   ```

7. **Verify commit link and history**:
   ```sql
   SELECT commit_hash FROM task_commit_links WHERE task_id = '<task_id>';
   -- Expected: 'a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0'

   SELECT COUNT(*) FROM task_status_history WHERE task_id = '<task_id>';
   -- Expected: 2 (need to be done → in-progress, in-progress → complete)
   ```

**Success Criteria**: ✅ Task CRUD operations work, status transitions tracked, git metadata stored

---

## Scenario 5: AI Assistant Integration (FR-024 to FR-028)

**Given**: MCP server exposes tools via SSE
**When**: AI assistant queries for code and task context
**Then**: Contextual information is returned

### Test Steps

1. **List tasks for AI context**:
   ```json
   {
     "status": "in-progress",
     "limit": 10
   }
   ```

2. **Verify AI-friendly response**:
   - `tasks`: Array of in-progress tasks with full context
   - Each task includes: title, description, notes, planning_references, branches, commits
   - AI can understand current development state

3. **Combined search + task context**:
   ```json
   {
     "query": "calculator implementation",
     "limit": 5
   }
   ```

4. **Verify results linked to tasks**:
   - Search returns code chunks from calculator.py
   - If a task exists that references this code, the task reference is included in the result metadata
   - AI understands which code relates to which tasks

5. **Get specific task details**:
   ```json
   {
     "task_id": "<task_id>"
   }
   ```

6. **Verify complete context**:
   - Full task details with history
   - Planning references point to spec/plan documents
   - Branch and commit information available
   - AI can provide informed assistance

**Success Criteria**: ✅ AI assistant receives complete project context, can query code and tasks seamlessly

---

## Scenario 6: File Deletion and Retention (FR-009, Clarification)

**Given**: Repository has indexed files
**When**: A file is deleted from the filesystem
**Then**: The file is marked deleted and retained for 90 days

### Test Steps

1. **Delete file**:
   ```bash
   rm /tmp/test-repo/src/utils.py
   ```

2. **Trigger change detection**:
   ```json
   {
     "path": "/tmp/test-repo",
     "name": "Test Repository",
     "force_reindex": false
   }
   ```

3. **Verify file marked deleted**:
   ```sql
   SELECT is_deleted, deleted_at
   FROM code_files
   WHERE relative_path = 'src/utils.py';
   -- Expected: is_deleted=true, deleted_at set to approximately the current time
   ```

4. **Verify chunks still exist**:
   ```sql
   SELECT COUNT(*) FROM code_chunks
   WHERE code_file_id IN (
     SELECT id FROM code_files WHERE relative_path = 'src/utils.py'
   );
   -- Expected: >0 (chunks retained)
   ```

5. **Simulate 90-day retention**:
   ```sql
   -- Manually set deleted_at to 91 days ago
   UPDATE code_files
   SET deleted_at = NOW() - INTERVAL '91 days'
   WHERE relative_path = 'src/utils.py';

   -- Run cleanup
   SELECT cleanup_deleted_files();
   ```

6. **Verify permanent deletion**:
   ```sql
   SELECT COUNT(*) FROM code_files WHERE relative_path = 'src/utils.py';
   -- Expected: 0 (permanently removed)
   ```

**Success Criteria**: ✅ Deleted files marked and retained for 90 days, cleanup removes old deletions

---

## Scenario 7: Performance Validation (FR-034, FR-035)

**Given**: Repository with 10,000 files
**When**: Indexing and searching operations execute
**Then**: Performance targets are met

### Test Steps

1. **Generate large test repository**:
   ```bash
   python scripts/generate_test_repo.py --files 10000 --output /tmp/large-repo
   # Creates 10,000 Python files with realistic code
   ```

2. **Index large repository**:
   ```json
   {
     "path": "/tmp/large-repo",
     "name": "Large Test Repository",
     "force_reindex": true
   }
   ```

3. **Verify indexing performance**:
   - `duration_seconds`: <60 (target met)
   - `files_indexed`: 10,000
   - `status`: "success"

4. **Run 100 search queries and measure latency**:
   ```python
   import asyncio
   import time

   # Assumes `mcp_client` is an already-connected MCP client exposing `search_code`.


   async def benchmark_search():
       latencies = []
       queries = [
           "function to parse JSON",
           "error handling for network requests",
           # ... 98 more diverse queries
       ]

       for query in queries:
           start = time.perf_counter()
           result = await mcp_client.search_code(query=query, limit=10)
           latency_ms = (time.perf_counter() - start) * 1000
           latencies.append(latency_ms)

       latencies.sort()
       # Index relative to the sample count (50, 95, 99 for 100 queries),
       # clamped so a shorter query list does not raise IndexError.
       n = len(latencies)
       p50 = latencies[min(n - 1, n * 50 // 100)]
       p95 = latencies[min(n - 1, n * 95 // 100)]
       p99 = latencies[min(n - 1, n * 99 // 100)]

       print(f"P50: {p50:.0f}ms, P95: {p95:.0f}ms, P99: {p99:.0f}ms")
       assert p95 < 500, f"P95 latency {p95:.0f}ms exceeds 500ms target"


   asyncio.run(benchmark_search())
   ```

5. **Verify search performance**:
   - P95 latency: <500ms
   - P99 latency: <1000ms
   - No timeouts or errors

**Success Criteria**: ✅ 10K file indexing in <60s, search p95 latency <500ms

---

## Cleanup

```bash
# Stop MCP server
# Ctrl+C

# Drop test database
dropdb codebase_mcp_test

# Remove test repositories
rm -rf /tmp/test-repo /tmp/large-repo

# Stop Ollama
# Ctrl+C
```

---

## Summary

All acceptance scenarios validated:

1. ✅ Repository indexing with ignore patterns
2. ✅ Incremental updates for modified files
3. ✅ Semantic search with filters and relevance
4. ✅ Task CRUD with git integration
5. ✅ AI assistant context queries
6. ✅ File deletion 90-day retention
7. ✅ Performance targets (60s indexing, 500ms search)

The system meets all functional and non-functional requirements from the specification.
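
The P50/P95/P99 figures checked in Scenario 7 can be computed with a small nearest-rank helper that works for any number of samples. The following is a minimal sketch, not part of the codebase: the name `percentile` is hypothetical, and the latency values are synthetic stand-ins for real measurements.

```python
def percentile(samples: list[float], q: int) -> float:
    """Return the q-th percentile (1 <= q <= 100) using the nearest-rank method.

    Hypothetical helper for illustration; not part of the MCP server.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= q <= 100:
        raise ValueError("q must be between 1 and 100")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(q / 100 * n), done in integer arithmetic
    rank = (q * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Synthetic latencies of 1..100 ms stand in for measured values.
latencies_ms = [float(ms) for ms in range(1, 101)]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"P50: {p50:.0f}ms, P95: {p95:.0f}ms, P99: {p99:.0f}ms")
```

Integer arithmetic is used for the rank so that floating-point rounding cannot shift the selected sample by one position.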