# Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
## What's New in v2.0
Version 2.0 represents a major architectural refactoring focused exclusively on semantic code search capabilities. This release removes project management, entity tracking, and work item features to maintain single-responsibility focus.
**Breaking Changes**:
- 14 tools removed (project management, entity tracking, work item features extracted to workflow-mcp)
- 3 tools remaining: `start_indexing_background`, `get_indexing_status`, and `search_code` with multi-project support
- Foreground `index_repository` removed (all indexing now uses background jobs to prevent timeouts)
- Database schema simplified (9 tables dropped, `project_id` parameter added)
- New environment variables for optional workflow-mcp integration
**Migration Required**: Existing v1.x users must follow the migration guide to upgrade safely. See [Migration Guide](docs/migration/v1-to-v2-migration.md) for complete upgrade and rollback procedures.
**What's Preserved**: All indexed repositories and code embeddings remain searchable after migration.
**What's Discarded**: All v1.x project management data, entities, and work items are permanently removed.
---
## Features
The Codebase MCP Server provides exactly 3 MCP tools for semantic code search with multi-project workspace support:
1. **`start_indexing_background`**: Start a background indexing job for a repository
   - Returns job_id immediately to prevent MCP client timeouts
   - Accepts optional `project_id` parameter for workspace isolation
   - Default behavior: indexes to default project workspace if `project_id` not specified
   - Performance target: 60-second indexing for 10,000 files
2. **`get_indexing_status`**: Poll the status of a background indexing job
   - Query job progress using job_id from start_indexing_background
   - Returns files_indexed, chunks_created, and completion status
   - Enables responsive UIs with progress indicators
3. **`search_code`**: Semantic code search with natural language queries
   - Accepts optional `project_id` parameter to restrict search scope
   - Default behavior: searches default project workspace if `project_id` not specified
   - Performance target: 500ms p95 search latency
### Multi-Project Support
The v2.0 architecture supports isolated project workspaces through the optional `project_id` parameter:
**Single Project Workflow** (default):
```python
import asyncio

# Start background indexing job - uses default workspace
job = await start_indexing_background(repo_path="/path/to/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search without project_id - searches default workspace
results = await search_code(query="authentication logic")
```
**Multi-Project Workflow**:
```python
import asyncio

# Index to specific project workspace
job = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a"
)
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id, project_id="client-a")
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search specific project workspace
results = await search_code(query="authentication logic", project_id="client-a")
```
**Use Cases**:
- **Single Project**: Individual developers or small teams working on one codebase
- **Multi-Project**: Consultants managing multiple client codebases, organizations with separate product lines, or multi-tenant deployments requiring workspace isolation
**Optional Integration**: The `project_id` can be automatically resolved from Git repository context when the optional [workflow-mcp](https://github.com/cliffclarke/workflow-mcp) server is configured. Without workflow-mcp, all operations default to a single shared workspace.
## Quick Start
### 1. Database Setup
```bash
# Create database
createdb codebase_mcp
# Initialize schema
psql -d codebase_mcp -f db/init_tables.sql
```
### 2. Install Dependencies
```bash
# Install dependencies including FastMCP framework
uv sync
# Or with pip
pip install -r requirements.txt
```
**Key Dependencies:**
- `fastmcp>=0.1.0` - Modern MCP framework with decorator-based tools
- `anthropic-mcp` - MCP protocol implementation
- `sqlalchemy>=2.0` - Async ORM
- `pgvector` - PostgreSQL vector extension
- `ollama` - Embedding generation
### 3. Configure Claude Desktop
Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "codebase-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "fastmcp",
        "python",
        "/absolute/path/to/codebase-mcp/server_fastmcp.py"
      ]
    }
  }
}
```
**Important:**
- Use absolute paths!
- Server uses FastMCP framework with decorator-based tool definitions
- All logs go to `/tmp/codebase-mcp.log` (no stdout/stderr pollution)
### 4. Start Ollama
```bash
ollama serve
ollama pull nomic-embed-text
```
### 5. Test
```bash
# Test database and tools
uv run python tests/test_tool_handlers.py
# Test repository indexing
uv run python tests/test_embeddings.py
```
## Current Status
### Working Tools (3/3) ✅
| Tool | Status | Description |
|------|--------|-------------|
| `start_indexing_background` | ✅ Working | Start background indexing job, returns job_id immediately |
| `get_indexing_status` | ✅ Working | Poll indexing job status with files_indexed/chunks_created |
| `search_code` | ✅ Working | Semantic code search with pgvector similarity |
### Recent Fixes (Oct 6, 2025)
- ✅ Parameter passing architecture (Pydantic models)
- ✅ MCP schema mismatches (status enums, missing parameters)
- ✅ Timezone/datetime compatibility (PostgreSQL)
- ✅ Binary file filtering (images, cache dirs)
### Test Results
```
✅ Task Management: 7/7 tests passed
✅ Repository Indexing: 2 files indexed, 6 chunks created
✅ Embeddings: 100% coverage (768-dim vectors)
✅ Database: Connection pool, async operations working
```
## Tool Usage Examples
### Index a Repository (Background Job)
In Claude Desktop:
```
Index the repository at /Users/username/projects/myapp
```
Initial Response (immediate):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Indexing job started",
  "project_id": "default",
  "database_name": "cb_proj_default_00000000"
}
```
Poll for Status:
```
Check the status of indexing job 550e8400-e29b-41d4-a716-446655440000
```
Completed Response:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "repo_path": "/Users/username/projects/myapp",
  "files_indexed": 234,
  "chunks_created": 1456,
  "error_message": null,
  "created_at": "2025-10-18T10:30:00Z",
  "started_at": "2025-10-18T10:30:01Z",
  "completed_at": "2025-10-18T10:30:15Z"
}
```
### Search Code
```
Search for "authentication middleware" in Python files
```
Response:
```json
{
  "results": [
    {
      "file_path": "src/middleware/auth.py",
      "content": "def authenticate_request(request):\n    ...",
      "start_line": 45,
      "similarity_score": 0.92
    }
  ],
  "total_count": 5,
  "latency_ms": 250
}
```
## Architecture
```
Claude Desktop ↔ FastMCP Server ↔ Tool Handlers ↔ Services ↔ PostgreSQL
                                                      ↓
                                             Ollama (embeddings)
```
**MCP Framework**: Built with [FastMCP](https://github.com/jlowin/fastmcp) - a modern, decorator-based framework for building MCP servers with:
- Type-safe tool definitions via `@mcp.tool()` decorators
- Automatic JSON Schema generation from Pydantic models
- Dual logging (file + MCP protocol) without stdout pollution
- Async/await support throughout
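A minimal, self-contained sketch of this pattern (the tool body here is a placeholder, not the server's actual implementation):

```python
from typing import Any

from fastmcp import FastMCP

mcp = FastMCP("codebase-mcp")

@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # FastMCP derives the tool's JSON Schema from these type hints;
    # the docstring becomes the tool description shown to clients.
    return {"results": [], "total_count": 0}  # placeholder body

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```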
See [Multi-Project Architecture](docs/architecture/multi-project-design.md) for detailed component diagrams.
## Documentation
- **[Multi-Project Architecture](docs/architecture/multi-project-design.md)** - System architecture and data flow
- **[Auto-Switch Architecture](docs/architecture/AUTO_SWITCH.md)** - Config-based project switching internals
- **[Configuration Guide](docs/configuration/production-config.md)** - Production deployment and tuning
- **[API Reference](docs/api/tool-reference.md)** - Complete MCP tool documentation
- **[CLAUDE.md](CLAUDE.md)** - Specify workflow for AI-assisted development
## Database Schema
11 tables with pgvector for semantic search:
**Core Tables:**
- `repositories` - Indexed repositories
- `code_files` - Source files with metadata
- `code_chunks` - Semantic chunks with embeddings (vector(768))
- `tasks` - Development tasks with git tracking
- `task_status_history` - Audit trail
See [Multi-Project Architecture](docs/architecture/multi-project-design.md) for complete schema documentation.
## Technology Stack
- **MCP Framework:** FastMCP 0.1+ (decorator-based tool definitions)
- **Server:** Python 3.11+ (3.13 compatible), FastAPI patterns, async/await
- **Database:** PostgreSQL 14+ with pgvector extension
- **Embeddings:** Ollama (nomic-embed-text, 768 dimensions)
- **ORM:** SQLAlchemy 2.0 (async), Pydantic V2 for validation
- **Type Safety:** Full mypy --strict compliance
## Development
### Running Tests
```bash
# Tool handlers
uv run python tests/test_tool_handlers.py
# Repository indexing
uv run python tests/test_embeddings.py
# Unit tests
uv run pytest tests/ -v
```
### Code Structure
```
codebase-mcp/
├── server_fastmcp.py              # FastMCP server entry point (NEW)
├── src/
│   ├── mcp/
│   │   └── tools/                 # Tool handlers with service integration
│   │       ├── tasks.py           # Task management
│   │       ├── indexing.py        # Repository indexing
│   │       └── search.py          # Semantic search
│   ├── services/                  # Business logic layer
│   │   ├── tasks.py               # Task CRUD + git tracking
│   │   ├── indexer.py             # Indexing orchestration
│   │   ├── scanner.py             # File discovery
│   │   ├── chunker.py             # AST-based chunking
│   │   ├── embedder.py            # Ollama integration
│   │   └── searcher.py            # pgvector similarity search
│   └── models/                    # Database models + Pydantic schemas
│       ├── task.py                # Task, TaskCreate, TaskUpdate
│       ├── code_chunk.py          # CodeChunk
│       └── ...
└── tests/
    ├── test_tool_handlers.py      # Integration tests
    └── test_embeddings.py         # Embedding validation
```
**FastMCP Server Architecture:**
- `server_fastmcp.py` - Main entry point using `@mcp.tool()` decorators
- Tool handlers in `src/mcp/tools/` provide service integration
- Services in `src/services/` contain all business logic
- Dual logging: file (`/tmp/codebase-mcp.log`) + MCP protocol
## Installation
### Prerequisites
Before installing Codebase MCP Server v2.0, ensure the following requirements are met:
**Required Software:**
- **PostgreSQL 14+** - Database with pgvector extension for vector similarity search
- **Python 3.11+** - Runtime environment (Python 3.13 compatible)
- **Ollama** - Local embedding model server with nomic-embed-text model
**System Requirements:**
- 4GB+ RAM recommended for typical workloads
- SSD storage for optimal performance (database and embedding operations are I/O intensive)
- Network access to Ollama server (default: localhost:11434)
### Installation Commands
Install Codebase MCP Server v2.0 using pip:
```bash
# Install latest v2.0 release
pip install codebase-mcp
```
**Alternative Installation Methods:**
```bash
# Install specific v2.0 version
pip install codebase-mcp==2.0.0
# Install from source (for development)
git clone https://github.com/cliffclarke/codebase-mcp.git
cd codebase-mcp
pip install -e .
```
**Key Dependencies Installed Automatically:**
- `fastmcp>=0.1.0` - Modern MCP framework
- `sqlalchemy>=2.0` - Async database ORM
- `pgvector` - PostgreSQL vector extension Python bindings
- `ollama` - Embedding generation client
- `pydantic>=2.0` - Data validation and settings
### Verification Steps
After installation, verify the setup is correct:
```bash
# Verify codebase-mcp is installed
codebase-mcp --version
# Expected output: codebase-mcp 2.0.0
# Check PostgreSQL is accessible
psql --version
# Expected output: psql (PostgreSQL) 14.x or higher
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Expected output: JSON response with available models
# Confirm embedding model is available
ollama list | grep nomic-embed-text
# Expected output: nomic-embed-text model listed
```
**Setup Complete**: If all verification steps pass, Codebase MCP Server v2.0 is ready for use. Proceed to the Quick Start section for first-time indexing and search operations.
## Multi-Project Configuration
The Codebase MCP server supports automatic project switching based on your working directory using `.codebase-mcp/config.json` files.
### Quick Start
1. **Create a config file** in your project root:
   ```bash
   mkdir -p .codebase-mcp
   cat > .codebase-mcp/config.json <<EOF
   {
     "version": "1.0",
     "project": {
       "name": "my-project",
       "id": "optional-uuid-here"
     },
     "auto_switch": true
   }
   EOF
   ```
2. **Set your working directory** (via MCP client):
   ```javascript
   await mcpClient.callTool("set_working_directory", {
     directory: "/absolute/path/to/your/project"
   });
   ```
3. **Use tools normally** - they'll automatically use your project:
   ```javascript
   // Automatically uses "my-project" workspace
   const result = await mcpClient.callTool("start_indexing_background", {
     repo_path: "/path/to/repo"
   });
   const jobId = result.job_id;

   // Poll for completion
   while (true) {
     const status = await mcpClient.callTool("get_indexing_status", {
       job_id: jobId
     });
     if (status.status === "completed" || status.status === "failed") {
       break;
     }
     await sleep(2000);
   }
   ```
### Config File Format
```json
{
  "version": "1.0",
  "project": {
    "name": "my-project-name",
    "id": "optional-project-uuid",
    "database_name": "optional-database-override"
  },
  "auto_switch": true,
  "strict_mode": false,
  "dry_run": false,
  "description": "Optional project description"
}
```
**Fields:**
- `version` (required): Config version (currently "1.0")
- `project.name` (required): Project identifier (used if no ID provided)
- `project.id` (optional): Explicit project UUID (takes priority over name)
- `project.database_name` (optional): Override computed database name (see Database Name Resolution below)
- `auto_switch` (optional, default true): Enable automatic project switching
- `strict_mode` (optional, default false): Reject operations if project mismatch
- `dry_run` (optional, default false): Log intended switches without executing
**Database Name Resolution:**
The server determines which database to use in this order:
1. **Explicit `database_name` in config** - Uses exact database name specified
   ```json
   {"project": {"database_name": "cb_proj_my_project_550e8400"}}
   ```
2. **Computed from `name` + `id`** - Automatically generates database name
   ```
   Format:  cb_proj_{sanitized_name}_{id_prefix}
   Example: cb_proj_my_project_550e8400
   ```
**Use Cases for `database_name` Override:**
- Recovering from database name mismatches
- Migrating from old database naming schemes
- Explicit control over database selection
- Debugging and troubleshooting
**Example - Auto-generated (default):**
```json
{
  "version": "1.0",
  "project": {
    "name": "my-project",
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```
Database used: `cb_proj_my_project_550e8400` (auto-computed)
**Example - Explicit override:**
```json
{
  "version": "1.0",
  "project": {
    "name": "my-project",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "database_name": "cb_proj_legacy_database_12345678"
  }
}
```
Database used: `cb_proj_legacy_database_12345678` (explicit override)
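A minimal sketch of how that computed name can be derived (the server's exact sanitization rules may differ; this simply reproduces the documented example):

```python
import re

def compute_database_name(name: str, project_id: str) -> str:
    """Compute cb_proj_{sanitized_name}_{id_prefix} (illustrative sketch)."""
    # Lowercase, replace anything non-alphanumeric with "_", trim stray "_"
    sanitized = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    # First 8 hex characters of the UUID
    id_prefix = project_id.replace("-", "")[:8]
    return f"cb_proj_{sanitized}_{id_prefix}"

# "my-project" + 550e8400-e29b-41d4-a716-446655440000
# -> cb_proj_my_project_550e8400
assert compute_database_name(
    "my-project", "550e8400-e29b-41d4-a716-446655440000"
) == "cb_proj_my_project_550e8400"
```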
### Project Resolution Priority
When you call MCP tools, the server resolves the project workspace using this 4-tier priority system:
1. **Explicit `project_id` parameter** (highest priority)
   ```javascript
   await mcpClient.callTool("start_indexing_background", {
     repo_path: "/path/to/repo",
     project_id: "explicit-project-id"  // Always takes priority
   });
   ```
2. **Session-based config file** (via `set_working_directory`)
   - Server searches up to 20 directory levels for `.codebase-mcp/config.json`
   - Cached with mtime-based invalidation for performance
   - Isolated per MCP session (multiple clients stay independent)
3. **workflow-mcp integration** (external project tracking)
   - Queries workflow-mcp server for active project context
   - Configurable timeout and caching
4. **Default workspace** (fallback)
   - Uses `project_default` schema when no other resolution succeeds
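In pseudocode, that resolution order looks roughly like the following sketch (the inputs stand in for the server's internal lookups, which are not shown here):

```python
def resolve_project_id(
    explicit_project_id: str | None,
    config: dict | None,          # parsed .codebase-mcp/config.json, or None
    workflow_active: str | None,  # active project from workflow-mcp, or None
) -> str:
    """Illustrative sketch of the 4-tier resolution order."""
    # 1. Explicit parameter always wins
    if explicit_project_id is not None:
        return explicit_project_id
    # 2. Session config file (if found and auto_switch enabled)
    if config is not None and config.get("auto_switch", True):
        return config["project"]["name"]
    # 3. workflow-mcp's active project (if the integration is configured)
    if workflow_active is not None:
        return workflow_active
    # 4. Default workspace
    return "default"
```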
### Multi-Session Isolation
The server maintains separate working directories for each MCP session (client connection):
```javascript
// Session 1 (Claude Code instance A)
await mcpClient1.callTool("set_working_directory", {
  directory: "/Users/alice/project-a"
});

// Session 2 (Claude Code instance B)
await mcpClient2.callTool("set_working_directory", {
  directory: "/Users/bob/project-b"
});

// Each session independently resolves its own project
// No cross-contamination between sessions
```
### Config File Discovery
The server searches for `.codebase-mcp/config.json` by:
1. Starting from your working directory
2. Searching up to 20 parent directories
3. Stopping at the first config file found
4. Caching the result (with automatic invalidation on file modification)
**Example directory structure:**
```
/Users/alice/projects/my-app/          <- .codebase-mcp/config.json here
├── .codebase-mcp/
│   └── config.json
├── src/
│   └── components/                    <- Working directory
│       └── Button.tsx
```
If you set working directory to `/Users/alice/projects/my-app/src/components/`, the server will find the config at `/Users/alice/projects/my-app/.codebase-mcp/config.json`.
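A minimal sketch of that upward walk (caching and mtime invalidation omitted for brevity; the server's real implementation may differ):

```python
from pathlib import Path

MAX_LEVELS = 20

def find_config(working_dir: str) -> Path | None:
    """Walk up at most 20 directory levels looking for .codebase-mcp/config.json."""
    current = Path(working_dir).resolve()
    for _ in range(MAX_LEVELS):
        candidate = current / ".codebase-mcp" / "config.json"
        if candidate.is_file():
            return candidate  # first config found wins
        if current.parent == current:
            return None  # reached the filesystem root
        current = current.parent
    return None
```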
### Performance
- **Config discovery**: <50ms (with upward traversal)
- **Cache hit**: <5ms
- **Session lookup**: <1ms
- **Background cleanup**: Hourly (removes sessions inactive >24h)
## Database Setup
### 1. Create Database
```bash
# Connect to PostgreSQL
psql -U postgres
# Create database
CREATE DATABASE codebase_mcp;
# Enable pgvector extension
\c codebase_mcp
CREATE EXTENSION IF NOT EXISTS vector;
\q
```
### 2. Initialize Schema
```bash
# Run database initialization script
python scripts/init_db.py
# Verify schema creation
alembic current
```
The initialization script will:
- Create all required tables (repositories, files, chunks, tasks)
- Set up vector indexes for similarity search
- Configure connection pooling
- Apply all database migrations
### 3. Verify Setup
```bash
# Check database connectivity
python -c "from src.database import Database; import asyncio; asyncio.run(Database.create_pool())"
# Run migration status check
alembic current
```
### 4. Database Reset & Cleanup
During development, you may need to reset your database. Three reset scripts are available:
- **scripts/clear_data.sh** - Clear all data, keep schema (fastest, no restart needed)
- **scripts/reset_database.sh** - Drop and recreate all tables (recommended for schema changes)
- **scripts/nuclear_reset.sh** - Drop entire database (requires Claude Desktop restart)
```bash
# Quick data wipe (keeps schema)
./scripts/clear_data.sh
# Full table reset (recommended)
./scripts/reset_database.sh
# Nuclear option (drops database)
./scripts/nuclear_reset.sh
```
## Running the Server
### FastMCP Server (Recommended)
The primary way to run the server is via Claude Desktop or other MCP clients:
```bash
# Via Claude Desktop (configured in claude_desktop_config.json)
# Server starts automatically when Claude Desktop launches
# Manual testing with FastMCP CLI
uv run --with fastmcp python server_fastmcp.py
# With custom log level
LOG_LEVEL=DEBUG uv run --with fastmcp python server_fastmcp.py
```
**Server Entry Point**: `server_fastmcp.py` in repository root
**Logging**: All output goes to `/tmp/codebase-mcp.log` (configurable via `LOG_FILE` env var)
### Development Mode (Legacy FastAPI)
```bash
# Start with auto-reload (if FastAPI server exists)
uvicorn src.main:app --reload --host 127.0.0.1 --port 3000
# With custom log level
LOG_LEVEL=DEBUG uvicorn src.main:app --reload
```
### Production Mode (Legacy)
```bash
# Start production server
uvicorn src.main:app --host 0.0.0.0 --port 3000 --workers 4
# With gunicorn (recommended for production)
gunicorn src.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:3000
```
### stdio Transport (Legacy CLI Mode)
The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.
```bash
# Start stdio server (reads JSON-RPC from stdin)
python -m src.mcp.stdio_server
# Echo a single request
echo '{"jsonrpc":"2.0","id":1,"method":"search_code","params":{"query":"async def","limit":5}}' | python -m src.mcp.stdio_server
# Pipe requests from a file (one JSON-RPC request per line)
cat requests.jsonl | python -m src.mcp.stdio_server
# Interactive mode (type JSON-RPC requests manually)
python -m src.mcp.stdio_server
{"jsonrpc":"2.0","id":1,"method":"get_task","params":{"task_id":"..."}}
```
**JSON-RPC 2.0 Request Format:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "search_code",
  "params": {
    "query": "async def",
    "limit": 10
  }
}
```
**JSON-RPC 2.0 Response Format:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "results": [...],
    "total_count": 42,
    "latency_ms": 250
  }
}
```
**Available Methods:**
- `search_code` - Semantic code search
- `start_indexing_background` - Start background indexing job
- `get_indexing_status` - Poll indexing job status
**Logging:**
All logs go to `/tmp/codebase-mcp.log` (configurable via `LOG_FILE` env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
### Health Check
```bash
# Check server health
curl http://localhost:3000/health
# Expected response:
{
  "status": "healthy",
  "database": "connected",
  "ollama": "connected",
  "version": "0.1.0"
}
```
## Usage Examples
### 1. Index a Repository (Background Job)
```
# Start indexing job via MCP protocol
{
  "tool": "start_indexing_background",
  "arguments": {
    "repo_path": "/path/to/your/repo"
  }
}

# Immediate response
{
  "job_id": "uuid-here",
  "status": "pending",
  "message": "Indexing job started",
  "project_id": "default",
  "database_name": "cb_proj_default_00000000"
}

# Poll for status
{
  "tool": "get_indexing_status",
  "arguments": {
    "job_id": "uuid-here"
  }
}

# Completed response
{
  "job_id": "uuid-here",
  "status": "completed",
  "repo_path": "/path/to/your/repo",
  "files_indexed": 150,
  "chunks_created": 1200,
  "error_message": null,
  "created_at": "2025-10-18T10:30:00Z",
  "started_at": "2025-10-18T10:30:01Z",
  "completed_at": "2025-10-18T10:30:45Z"
}
```
### 2. Search Code
```
# Search for authentication logic
{
  "tool": "search_code",
  "arguments": {
    "query": "user authentication password validation",
    "limit": 10,
    "file_type": "py"
  }
}

# Response includes ranked code chunks with context
{
  "results": [...],
  "total_count": 25,
  "latency_ms": 230
}
```
```
## Architecture
```
┌─────────────────────────────────────────────────┐
│                 MCP Client (AI)                 │
└─────────────────┬───────────────────────────────┘
                  │ SSE Protocol
┌─────────────────▼───────────────────────────────┐
│                MCP Server Layer                 │
│  ┌──────────────────────────────────────────┐   │
│  │       Tool Registration & Routing        │   │
│  └──────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────┐   │
│  │        Request/Response Handling         │   │
│  └──────────────────────────────────────────┘   │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│                  Service Layer                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │  Indexer   │  │  Searcher  │  │Task Manager│ │
│  └──────┬─────┘  └──────┬─────┘  └──────┬─────┘ │
│         │               │               │       │
│  ┌──────▼───────────────▼───────────────▼────┐  │
│  │            Repository Service             │  │
│  └──────┬─────────────────────────────────────┘ │
│         │                                       │
│  ┌──────▼─────────────────────────────────────┐ │
│  │         Embedding Service (Ollama)         │ │
│  └─────────────────────────────────────────────┘│
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│                   Data Layer                    │
│  ┌──────────────────────────────────────────┐   │
│  │         PostgreSQL with pgvector         │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
│  │  │Repository│ │  Files   │ │  Chunks  │  │   │
│  │  └──────────┘ └──────────┘ └──────────┘  │   │
│  │  ┌──────────┐ ┌──────────────────────┐   │   │
│  │  │  Tasks   │ │  Vector Embeddings   │   │   │
│  │  └──────────┘ └──────────────────────┘   │   │
│  └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
```
### Component Overview
- **MCP Layer**: Handles protocol compliance, tool registration, SSE transport
- **Service Layer**: Business logic for indexing, searching, task management
- **Repository Service**: File system operations, git integration, .gitignore handling
- **Embedding Service**: Ollama integration for generating text embeddings
- **Data Layer**: PostgreSQL with pgvector for storage and similarity search
### Data Flow
1. **Indexing**: Repository → Parse → Chunk → Embed → Store
2. **Searching**: Query → Embed → Vector Search → Rank → Return
3. **Task Tracking**: Create → Update → Git Integration → Query
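A hedged sketch of the indexing flow, written with callables that mirror the services under `src/services/` (the real orchestration lives in `indexer.py` and differs in detail):

```python
async def index_repository(
    repo_path: str,
    scan_files,    # scanner: file discovery honoring .gitignore
    chunk_file,    # chunker: AST-based splitting into chunks
    embed_batch,   # embedder: Ollama -> 768-dim vectors
    store_chunks,  # data layer: pgvector inserts
) -> tuple[int, int]:
    """Illustrative pipeline: scan -> chunk -> embed -> store."""
    files = await scan_files(repo_path)
    chunks = [chunk for f in files for chunk in chunk_file(f)]
    embeddings = await embed_batch([c.content for c in chunks])
    await store_chunks(chunks, embeddings)
    return len(files), len(chunks)
```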
## Testing
### Run All Tests
```bash
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=term-missing
# Run specific test categories
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
pytest tests/contract/ -v # Contract tests
```
### Test Categories
- **Unit Tests**: Fast, isolated component tests
- **Integration Tests**: Database and service integration
- **Contract Tests**: MCP protocol compliance validation
- **Performance Tests**: Latency and throughput benchmarks
### Coverage Requirements
- Minimum coverage: 95%
- Critical paths: 100%
- View HTML report: `open htmlcov/index.html`
## Performance Tuning
### Database Optimization
```sql
-- Optimize vector searches
CREATE INDEX ON code_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Adjust work_mem for large result sets
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();
```
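For reference, a top-k similarity query against this index orders by pgvector's cosine distance operator (`<=>`). A sketch using asyncpg; the `id`/`content` column names are assumed from the schema section above:

```python
import asyncpg

async def search_similar(
    pool: asyncpg.Pool,
    query_embedding: list[float],
    k: int = 10,
) -> list[asyncpg.Record]:
    """Top-k cosine similarity over code_chunks.embedding (columns assumed)."""
    # pgvector accepts vectors as '[x1,x2,...]' text literals cast to ::vector
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return await pool.fetch(
        """
        SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
        FROM code_chunks
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        vector_literal,
        k,
    )
```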
### Connection Pool Settings
```bash
# In .env
DATABASE_POOL_SIZE=20 # Connection pool size
DATABASE_MAX_OVERFLOW=10 # Max overflow connections
DATABASE_POOL_TIMEOUT=30 # Connection timeout in seconds
```
### Embedding Batch Size
```bash
# Adjust based on available memory
EMBEDDING_BATCH_SIZE=100 # For systems with 8GB+ RAM
EMBEDDING_BATCH_SIZE=50 # Default for 4GB RAM
EMBEDDING_BATCH_SIZE=25 # For constrained environments
```
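Batching amortizes per-request overhead when generating embeddings. A sketch using the `ollama` Python client, assuming a client version whose `embed` call accepts a list input and returns an `embeddings` field:

```python
import os

import ollama

BATCH_SIZE = int(os.getenv("EMBEDDING_BATCH_SIZE", "50"))

def embed_texts(texts: list[str]) -> list[list[float]]:
    """Embed texts in batches of EMBEDDING_BATCH_SIZE."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = texts[i : i + BATCH_SIZE]
        # Batch call assumed per recent ollama client versions
        response = ollama.embed(model="nomic-embed-text", input=batch)
        vectors.extend(response["embeddings"])
    return vectors
```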
## Troubleshooting
### Common Issues
1. **Database Connection Failed**
- Check PostgreSQL is running: `pg_ctl status`
- Verify DATABASE_URL in .env
- Ensure database exists: `psql -U postgres -l`
2. **Ollama Connection Error**
- Check Ollama is running: `curl http://localhost:11434/api/tags`
- Verify model is installed: `ollama list`
- Check OLLAMA_BASE_URL in .env
3. **Slow Performance**
- Check database indexes: `\di` in psql
- Monitor query performance: See logs at LOG_FILE path
- Adjust batch sizes and connection pool
For detailed troubleshooting, see the Configuration Guide troubleshooting section.
## Contributing
We follow a specification-driven development workflow using the Specify framework.
### Development Workflow
1. **Feature Specification**: Use `/specify` command to create feature specs
2. **Planning**: Generate implementation plan with `/plan`
3. **Task Breakdown**: Create tasks with `/tasks`
4. **Implementation**: Execute tasks with `/implement`
### Git Workflow
```bash
# Create feature branch
git checkout -b 001-feature-name
# Make atomic commits
git add .
git commit -m "feat(component): add specific feature"
# Push and create PR
git push origin 001-feature-name
```
### Code Quality Standards
- **Type Safety**: `mypy --strict` must pass
- **Linting**: `ruff check` with no errors
- **Testing**: All tests must pass with 95%+ coverage
- **Documentation**: Update relevant docs with changes
### Constitutional Principles
1. **Simplicity Over Features**: Focus on core semantic search
2. **Local-First Architecture**: No cloud dependencies
3. **Protocol Compliance**: Strict MCP adherence
4. **Performance Guarantees**: Meet stated benchmarks
5. **Production Quality**: Comprehensive error handling
See [.specify/memory/constitution.md](.specify/memory/constitution.md) for full principles.
## FastMCP Migration (Oct 2025)
**Migration Complete**: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.
### What Changed
**Before (MCP SDK):**
```python
# Old: Manual tool registration with JSON schemas
class MCPServer:
    def __init__(self):
        self.tools = {
            "search_code": {
                "name": "search_code",
                "description": "...",
                "inputSchema": {...}
            }
        }
```
**After (FastMCP):**
```python
# New: Decorator-based tool definitions
@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # Implementation
```
### Key Benefits
1. **Simpler Tool Definitions**: Decorators replace manual JSON schema creation
2. **Type Safety**: Automatic schema generation from Pydantic models
3. **Dual Logging**: File logging + MCP protocol without stdout pollution
4. **Better Error Handling**: Structured error responses with context
5. **Cleaner Architecture**: Separation of tool interface from business logic
### Server Files
- **New Entry Point**: `server_fastmcp.py` (root directory)
- **Legacy Server**: `src/mcp/mcp_stdio_server_v3.py` (deprecated, will be removed)
- **Tool Handlers**: `src/mcp/tools/*.py` (unchanged, reused by FastMCP)
- **Services**: `src/services/*.py` (unchanged, business logic intact)
### Configuration Update Required
**Update your Claude Desktop config** to use the new server:
```json
{
  "mcpServers": {
    "codebase-mcp": {
      "command": "uv",
      "args": ["run", "--with", "fastmcp", "python", "/path/to/server_fastmcp.py"]
    }
  }
}
```
### Migration Notes
- All MCP tools remain functional (100% backward compatible)
- No database schema changes required
- Tool signatures and responses unchanged
- Logging now goes exclusively to `/tmp/codebase-mcp.log`
- All tests pass with FastMCP implementation
### Performance
FastMCP maintains performance targets:
- Repository indexing: <60 seconds for 10K files
- Code search: <500ms p95 latency
- Async/await throughout for optimal concurrency
## License
MIT License (LICENSE file pending).
## Support
- **Issues**: [GitHub Issues](https://github.com/cliffclarke/codebase-mcp/issues)
- **Documentation**: [Full documentation](docs/)
- **Logs**: Check `/tmp/codebase-mcp.log` for detailed debugging
## Quick Start
### Basic Usage (Default Project)
For most users, the default project workspace is sufficient. All indexing now uses background jobs to prevent MCP client timeouts:
```python
import asyncio

# Start background indexing job (returns immediately)
job = await start_indexing_background(repo_path="/path/to/your/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Check result
if status["status"] == "completed":
    print(f"✅ Indexed {status['files_indexed']} files, {status['chunks_created']} chunks")
else:
    print(f"❌ Indexing failed: {status['error_message']}")

# Search code
results = await search_code(query="function to handle authentication")

# Search with filters
results = await search_code(
    query="database query",
    file_type="py",
    limit=20
)
```
The server automatically uses a default project workspace (`project_default`) if no project ID is specified.
### Multi-Project Usage
For users managing multiple codebases or client projects, use the `project_id` parameter to isolate repositories:
```python
import asyncio

# Index repositories with project_id
job_a = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a"
)
job_b = await start_indexing_background(
    repo_path="/path/to/client-b-repo",
    project_id="client-b"
)

# Poll both jobs
for job in [job_a, job_b]:
    while True:
        status = await get_indexing_status(job_id=job["job_id"])
        if status["status"] in ["completed", "failed"]:
            break
        await asyncio.sleep(2)

# Search within specific project
results_a = await search_code(
    query="authentication logic",
    project_id="client-a"
)
results_b = await search_code(
    query="payment processing",
    project_id="client-b"
)
```
Each project has its own isolated database schema, ensuring repositories and embeddings are completely separated.
## workflow-mcp Integration (Optional)
The Codebase MCP Server can **optionally** integrate with [workflow-mcp](https://github.com/cliffclarke/workflow-mcp) for automatic project context resolution. This is an advanced feature and not required for basic usage.
### Standalone Usage (Default)
By default, Codebase MCP operates independently:
```python
# Works out of the box without workflow-mcp
job = await start_indexing_background(repo_path="/path/to/repo")
results = await search_code(query="search query")
```
### Integration with workflow-mcp
If you're using workflow-mcp to manage development projects, Codebase MCP can automatically resolve project context:
```bash
# Set workflow-mcp URL in environment
export WORKFLOW_MCP_URL=http://localhost:8001
```
```python
# Now project_id is automatically resolved from workflow-mcp's active project
job = await start_indexing_background(repo_path="/path/to/repo") # Uses active project
results = await search_code(query="search query") # Searches in active project's context
```
**How It Works:**
1. Codebase MCP queries workflow-mcp for the active project
2. If an active project exists, it's used as the `project_id`
3. If no active project or workflow-mcp is unavailable, falls back to default project
4. You can still override with an explicit `project_id` parameter
**Configuration:**
```bash
# In .env file
WORKFLOW_MCP_URL=http://localhost:8001 # Optional, enables integration
```
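Conceptually, the fallback is a guarded HTTP lookup with a short timeout. A sketch using httpx; the endpoint path and response shape here are hypothetical, so consult workflow-mcp's documentation for the real API:

```python
import os

import httpx

WORKFLOW_MCP_URL = os.getenv("WORKFLOW_MCP_URL")

async def resolve_active_project() -> str:
    """Return workflow-mcp's active project, or 'default' on any failure."""
    if not WORKFLOW_MCP_URL:
        return "default"  # integration not configured
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            # Hypothetical endpoint for illustration only
            resp = await client.get(f"{WORKFLOW_MCP_URL}/active-project")
            resp.raise_for_status()
            return resp.json().get("project_id", "default")
    except httpx.HTTPError:
        return "default"  # workflow-mcp unavailable -> default workspace
```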
**See Also:** [workflow-mcp repository](https://github.com/cliffclarke/workflow-mcp) for details on project workspace management.
## Documentation
Comprehensive documentation is available for different use cases:
- **[Migration Guide](docs/migration/v1-to-v2-migration.md)** - Upgrading from v1.x to v2.x with multi-project support
- **[Configuration Guide](docs/configuration/production-config.md)** - Production deployment and tuning
- **[Architecture Documentation](docs/architecture/multi-project-design.md)** - System design and multi-project isolation
- **[API Reference](docs/api/tool-reference.md)** - Complete MCP tool documentation
- **[Glossary](docs/glossary.md)** - Canonical terminology definitions
For quick setup, refer to the Installation section above.
## Contributing
We welcome contributions to the Codebase MCP Server. This project follows a specification-driven development workflow.
### Getting Started
1. **Read the Architecture**: Start with [docs/architecture/multi-project-design.md](docs/architecture/multi-project-design.md) to understand the system design
2. **Review the Constitution**: See [.specify/memory/constitution.md](.specify/memory/constitution.md) for project principles
3. **Follow the Workflow**: Use the Specify workflow documented in [CLAUDE.md](CLAUDE.md)
### Development Process
1. **Create a feature specification** using `/specify` command
2. **Plan the implementation** with `/plan`
3. **Generate tasks** using `/tasks`
4. **Implement incrementally** with atomic commits
### Code Standards
- **Type Safety**: Full mypy --strict compliance
- **Testing**: 95%+ test coverage, contract tests for MCP protocol
- **Performance**: Meet benchmarks (60s indexing, 500ms search p95)
- **Documentation**: Update docs with all changes
### Code of Conduct
This project adheres to a code of conduct that promotes a welcoming, inclusive environment. We expect:
- Respectful communication in issues and PRs
- Constructive feedback focused on code and ideas
- Recognition that contributors volunteer their time
- Patience with maintainers and fellow contributors
By participating, you agree to uphold these standards.
## Acknowledgments
- MCP framework powered by [FastMCP](https://github.com/jlowin/fastmcp)
- Built with FastAPI, SQLAlchemy, and Pydantic
- Vector search powered by [pgvector](https://github.com/pgvector/pgvector)
- Embeddings via [Ollama](https://ollama.com/) and nomic-embed-text
- Code parsing with tree-sitter
- MCP protocol by [Anthropic](https://modelcontextprotocol.io/)