Codebase MCP Server

A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.

Overview

The Codebase MCP Server provides semantic code search capabilities through a focused, local-first architecture. It enables AI assistants to understand and navigate codebases efficiently by combining tree-sitter AST parsing with vector embeddings.

Key Features

  • Semantic Code Search: Natural language queries across indexed repositories

  • Repository Indexing: Fast scanning and chunking with tree-sitter parsers

  • Task Management: Development task tracking with git integration

  • MCP Protocol: Six focused tools via Server-Sent Events (SSE) and stdio (JSON-RPC)

  • Performance Guaranteed: Indexes 10K files in under 60 seconds, with p95 search latency under 500 ms

  • Production Ready: Comprehensive error handling, structured logging, type safety

MCP Tools

  1. search_code: Semantic search across indexed code

  2. index_repository: Index a repository for searching

  3. get_task: Retrieve a specific development task

  4. list_tasks: List tasks with filtering options

  5. create_task: Create a new development task

  6. update_task: Update task status with git integration

Quick Start

1. Database Setup

# Create database
createdb codebase_mcp

# Initialize schema
psql -d codebase_mcp -f db/init_tables.sql

2. Install Dependencies

# Install dependencies including FastMCP framework
uv sync

# Or with pip
pip install -r requirements.txt

Key Dependencies:

  • fastmcp>=0.1.0 - Modern MCP framework with decorator-based tools

  • anthropic-mcp - MCP protocol implementation

  • sqlalchemy>=2.0 - Async ORM

  • pgvector - PostgreSQL vector extension

  • ollama - Embedding generation

3. Configure Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{ "mcpServers": { "codebase-mcp": { "command": "uv", "args": [ "run", "--with", "fastmcp", "python", "/absolute/path/to/codebase-mcp/server_fastmcp.py" ] } } }

Important:

  • Use absolute paths!

  • Server uses FastMCP framework with decorator-based tool definitions

  • All logs go to /tmp/codebase-mcp.log (no stdout/stderr pollution)

4. Start Ollama

ollama serve
ollama pull nomic-embed-text

5. Test

# Test database and tools
uv run python tests/test_tool_handlers.py

# Test repository indexing
uv run python tests/test_embeddings.py

Current Status

Working Tools (6/6) ✅

| Tool             | Status     | Description                                        |
|------------------|------------|----------------------------------------------------|
| create_task      | ✅ Working | Create development tasks with planning references  |
| get_task         | ✅ Working | Retrieve task by ID                                 |
| list_tasks       | ✅ Working | List tasks with filters (status, branch)            |
| update_task      | ✅ Working | Update tasks with git tracking (branch, commit)     |
| index_repository | ✅ Working | Index code repositories with semantic chunking      |
| search_code      | ✅ Working | Semantic code search with pgvector similarity       |

Recent Fixes (Oct 6, 2025)

  • ✅ Parameter passing architecture (Pydantic models)

  • ✅ MCP schema mismatches (status enums, missing parameters)

  • ✅ Timezone/datetime compatibility (PostgreSQL)

  • ✅ Binary file filtering (images, cache dirs)

Test Results

✅ Task Management: 7/7 tests passed
✅ Repository Indexing: 2 files indexed, 6 chunks created
✅ Embeddings: 100% coverage (768-dim vectors)
✅ Database: Connection pooling, async operations working

Tool Usage Examples

Create a Task

In Claude Desktop:

Create a task called "Implement user authentication" with description "Add JWT-based authentication to the API"

Response:

{ "id": "550e8400-e29b-41d4-a716-446655440000", "title": "Implement user authentication", "description": "Add JWT-based authentication to the API", "status": "need to be done", "created_at": "2025-10-06T21:30:00", "planning_references": [] }

Index a Repository

Index the repository at /Users/username/projects/myapp

Response:

{ "repository_id": "abc123...", "files_indexed": 234, "chunks_created": 1456, "duration_seconds": 12.5, "status": "success" }

Search Code

Search for "authentication middleware" in Python files

Response:

{ "results": [ { "file_path": "src/middleware/auth.py", "content": "def authenticate_request(request):\n ...", "start_line": 45, "similarity_score": 0.92 } ], "total_count": 5, "latency_ms": 250 }

Track Task with Git

Update task abc123 to status "in-progress" and link it to branch "feature/auth"

Response:

{ "id": "abc123...", "status": "in-progress", "branches": ["feature/auth"], "commits": [] }

Architecture

Claude Desktop ↔ FastMCP Server ↔ Tool Handlers ↔ Services ↔ PostgreSQL
                                                     ↓
                                             Ollama (embeddings)

MCP Framework: Built with FastMCP - a modern, decorator-based framework for building MCP servers with:

  • Type-safe tool definitions via @mcp.tool() decorators

  • Automatic JSON Schema generation from Pydantic models

  • Dual logging (file + MCP protocol) without stdout pollution

  • Async/await support throughout

See ARCHITECTURE.md for detailed component diagrams.
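
As a rough illustration of the dual-logging point above, a file-only logging setup keeps stdout clean for protocol traffic. This is a minimal sketch; the logger name and format are assumptions, not the project's actual code.

import logging

LOG_FILE = "/tmp/codebase-mcp.log"  # matches the default documented above

logger = logging.getLogger("codebase_mcp")
logger.setLevel(logging.INFO)

handler = logging.FileHandler(LOG_FILE)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
logger.addHandler(handler)

# Deliberately no StreamHandler: anything written to stdout would corrupt the
# JSON-RPC stream when the server runs over stdio.
logger.info("server started")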

Documentation

Database Schema

11 tables with pgvector for semantic search:

Core Tables:

  • repositories - Indexed repositories

  • code_files - Source files with metadata

  • code_chunks - Semantic chunks with embeddings (vector(768))

  • tasks - Development tasks with git tracking

  • task_status_history - Audit trail

See docs/ARCHITECTURE.md for complete schema documentation.
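
For orientation, a code_chunks row could be modeled roughly like this with SQLAlchemy 2.0 and the pgvector bindings. Only the vector(768) embedding column is taken from the schema notes above; the other column names are illustrative assumptions.

from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class CodeChunk(Base):
    __tablename__ = "code_chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    file_id: Mapped[int] = mapped_column(ForeignKey("code_files.id"))  # assumed FK
    content: Mapped[str] = mapped_column(Text)
    start_line: Mapped[int] = mapped_column()
    embedding: Mapped[list[float]] = mapped_column(Vector(768))  # nomic-embed-text dimensions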

Technology Stack

  • MCP Framework: FastMCP 0.1+ (decorator-based tool definitions)

  • Server: Python 3.13+, FastAPI patterns, async/await

  • Database: PostgreSQL 14+ with pgvector extension

  • Embeddings: Ollama (nomic-embed-text, 768 dimensions)

  • ORM: SQLAlchemy 2.0 (async), Pydantic V2 for validation

  • Type Safety: Full mypy --strict compliance

Development

Running Tests

# Tool handlers
uv run python tests/test_tool_handlers.py

# Repository indexing
uv run python tests/test_embeddings.py

# Unit tests
uv run pytest tests/ -v

Code Structure

codebase-mcp/
├── server_fastmcp.py            # FastMCP server entry point (NEW)
├── src/
│   ├── mcp/
│   │   └── tools/               # Tool handlers with service integration
│   │       ├── tasks.py         # Task management
│   │       ├── indexing.py      # Repository indexing
│   │       └── search.py        # Semantic search
│   ├── services/                # Business logic layer
│   │   ├── tasks.py             # Task CRUD + git tracking
│   │   ├── indexer.py           # Indexing orchestration
│   │   ├── scanner.py           # File discovery
│   │   ├── chunker.py           # AST-based chunking
│   │   ├── embedder.py          # Ollama integration
│   │   └── searcher.py          # pgvector similarity search
│   └── models/                  # Database models + Pydantic schemas
│       ├── task.py              # Task, TaskCreate, TaskUpdate
│       ├── code_chunk.py        # CodeChunk
│       └── ...
└── tests/
    ├── test_tool_handlers.py    # Integration tests
    └── test_embeddings.py       # Embedding validation

FastMCP Server Architecture:

  • server_fastmcp.py - Main entry point using @mcp.tool() decorators

  • Tool handlers in src/mcp/tools/ provide service integration

  • Services in src/services/ contain all business logic

  • Dual logging: file (/tmp/codebase-mcp.log) + MCP protocol

Prerequisites

System Requirements

  • Python 3.11+ (3.13 compatible)

  • PostgreSQL 14+ with pgvector extension

  • Ollama for embedding generation

  • 4GB+ RAM recommended

  • SSD storage for optimal performance

PostgreSQL with pgvector

# Install PostgreSQL 14+
# macOS
brew install postgresql@14
brew services start postgresql@14

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql-14 postgresql-contrib-14

# Install pgvector extension
# macOS
brew install pgvector

# Ubuntu/Debian
sudo apt install postgresql-14-pgvector

# Enable pgvector in your database
psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS vector;"

Ollama Setup

# Install Ollama
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve

# Pull required embedding model
ollama pull nomic-embed-text

Installation

1. Clone the Repository

git clone https://github.com/cliffclarke/codebase-mcp.git
cd codebase-mcp

2. Create Virtual Environment

# Create virtual environment
python3.11 -m venv .venv

# Activate virtual environment
# macOS/Linux
source .venv/bin/activate

# Windows
.venv\Scripts\activate

3. Install Dependencies

# Install production dependencies (includes FastMCP)
pip install -r requirements.txt

# For development (includes testing and linting tools)
pip install -r requirements-dev.txt

# Or with uv (recommended)
uv sync

Key Dependencies Installed:

  • fastmcp>=0.1.0 - Modern MCP framework

  • sqlalchemy>=2.0 - Async database ORM

  • pgvector - PostgreSQL vector extension Python bindings

  • ollama - Embedding generation client

  • pydantic>=2.0 - Data validation and settings

4. Configure Environment

# Copy example environment file
cp .env.example .env

# Edit .env with your configuration
nano .env

Environment Variables:

# Database Configuration
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/codebase_mcp

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Performance Tuning
EMBEDDING_BATCH_SIZE=50          # Batch size for embedding generation
MAX_CONCURRENT_REQUESTS=10       # Max parallel Ollama requests

# Logging
LOG_LEVEL=INFO                   # DEBUG, INFO, WARNING, ERROR
LOG_FILE=/tmp/codebase-mcp.log   # Log file location
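
A minimal sketch of loading these variables with pydantic-settings (a separate package from pydantic>=2.0; whether the project actually uses it is an assumption, but the field names mirror the example above):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    ollama_base_url: str = "http://localhost:11434"
    ollama_embedding_model: str = "nomic-embed-text"
    embedding_batch_size: int = 50
    max_concurrent_requests: int = 10
    log_level: str = "INFO"
    log_file: str = "/tmp/codebase-mcp.log"

settings = Settings()  # reads DATABASE_URL, OLLAMA_BASE_URL, ... from .env or the environment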

Database Setup

1. Create Database

# Connect to PostgreSQL
psql -U postgres

# Create database
CREATE DATABASE codebase_mcp;

# Enable pgvector extension
\c codebase_mcp
CREATE EXTENSION IF NOT EXISTS vector;
\q

2. Initialize Schema

# Run database initialization script
python scripts/init_db.py

# Verify schema creation
alembic current

The initialization script will:

  • Create all required tables (repositories, files, chunks, tasks)

  • Set up vector indexes for similarity search

  • Configure connection pooling

  • Apply all database migrations

3. Verify Setup

# Check database connectivity
python -c "from src.database import Database; import asyncio; asyncio.run(Database.create_pool())"

# Run migration status check
alembic current

4. Database Reset & Cleanup

During development, you may need to reset your database. See DATABASE_RESET.md for three reset options:

  • scripts/clear_data.sh - Clear all data, keep schema (fastest, no restart needed)

  • scripts/reset_database.sh - Drop and recreate all tables (recommended for schema changes)

  • scripts/nuclear_reset.sh - Drop entire database (requires Claude Desktop restart)

# Quick data wipe (keeps schema)
./scripts/clear_data.sh

# Full table reset (recommended)
./scripts/reset_database.sh

# Nuclear option (drops database)
./scripts/nuclear_reset.sh

Running the Server

FastMCP Server (Recommended)

The primary way to run the server is via Claude Desktop or other MCP clients:

# Via Claude Desktop (configured in claude_desktop_config.json)
# Server starts automatically when Claude Desktop launches

# Manual testing with FastMCP CLI
uv run --with fastmcp python server_fastmcp.py

# With custom log level
LOG_LEVEL=DEBUG uv run --with fastmcp python server_fastmcp.py

Server Entry Point: server_fastmcp.py in repository root

Logging: All output goes to /tmp/codebase-mcp.log (configurable via LOG_FILE env var)

Development Mode (Legacy FastAPI)

# Start with auto-reload (if FastAPI server exists)
uvicorn src.main:app --reload --host 127.0.0.1 --port 3000

# With custom log level
LOG_LEVEL=DEBUG uvicorn src.main:app --reload

Production Mode (Legacy)

# Start production server
uvicorn src.main:app --host 0.0.0.0 --port 3000 --workers 4

# With gunicorn (recommended for production)
gunicorn src.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:3000

stdio Transport (Legacy CLI Mode)

The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.

# Start stdio server (reads JSON-RPC from stdin)
python -m src.mcp.stdio_server

# Echo a single request
echo '{"jsonrpc":"2.0","id":1,"method":"list_tasks","params":{"limit":5}}' | python -m src.mcp.stdio_server

# Pipe requests from a file (one JSON-RPC request per line)
cat requests.jsonl | python -m src.mcp.stdio_server

# Interactive mode (type JSON-RPC requests manually)
python -m src.mcp.stdio_server
{"jsonrpc":"2.0","id":1,"method":"get_task","params":{"task_id":"..."}}

JSON-RPC 2.0 Request Format:

{ "jsonrpc": "2.0", "id": 1, "method": "search_code", "params": { "query": "async def", "limit": 10 } }

JSON-RPC 2.0 Response Format:

{ "jsonrpc": "2.0", "id": 1, "result": { "results": [...], "total_count": 42, "latency_ms": 250 } }

Available Methods:

  • search_code - Semantic code search

  • index_repository - Index a repository

  • get_task - Get task by ID

  • list_tasks - List tasks with filters

  • create_task - Create new task

  • update_task - Update task status

Logging: All logs go to /tmp/codebase-mcp.log (configurable via LOG_FILE env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
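
The same request/response cycle can be driven from Python. A hedged sketch that simply pipes one request into the stdio server shown above and reads the single-line response back:

import json
import subprocess

request = {"jsonrpc": "2.0", "id": 1, "method": "list_tasks", "params": {"limit": 5}}

proc = subprocess.run(
    ["python", "-m", "src.mcp.stdio_server"],
    input=json.dumps(request) + "\n",
    capture_output=True,
    text=True,
)

# stdout carries only protocol messages; logs go to /tmp/codebase-mcp.log
response = json.loads(proc.stdout.splitlines()[0])
print(response.get("result", response.get("error")))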

Health Check

# Check server health
curl http://localhost:3000/health

# Expected response:
{
  "status": "healthy",
  "database": "connected",
  "ollama": "connected",
  "version": "0.1.0"
}

Usage Examples

1. Index a Repository

# Via MCP protocol
{
  "tool": "index_repository",
  "arguments": {
    "path": "/path/to/your/repo",
    "name": "My Project",
    "force_reindex": false
  }
}

# Response
{
  "repository_id": "uuid-here",
  "files_indexed": 150,
  "chunks_created": 1200,
  "duration_seconds": 45.3,
  "status": "success"
}

2. Search Code

# Search for authentication logic
{
  "tool": "search_code",
  "arguments": {
    "query": "user authentication password validation",
    "limit": 10,
    "file_type": "py"
  }
}

# Response includes ranked code chunks with context
{
  "results": [...],
  "total_count": 25,
  "latency_ms": 230
}

3. Task Management

# Create a task
{
  "tool": "create_task",
  "arguments": {
    "title": "Implement rate limiting",
    "description": "Add rate limiting to API endpoints",
    "planning_references": ["specs/rate-limiting.md"]
  }
}

# Update task with git integration
{
  "tool": "update_task",
  "arguments": {
    "task_id": "task-uuid",
    "status": "complete",
    "branch": "feature/rate-limiting",
    "commit": "abc123..."
  }
}

Architecture

MCP Client (AI)
      │  SSE Protocol
      ▼
MCP Server Layer
  • Tool Registration & Routing
  • Request/Response Handling
      │
      ▼
Service Layer
  • Indexer, Searcher, Task Manager
  • Repository Service
  • Embedding Service (Ollama)
      │
      ▼
Data Layer
  • PostgreSQL with pgvector
  • Repositories, Files, Chunks, Tasks, Vector Embeddings

Component Overview

  • MCP Layer: Handles protocol compliance, tool registration, SSE transport

  • Service Layer: Business logic for indexing, searching, task management

  • Repository Service: File system operations, git integration, .gitignore handling (sketched after this list)

  • Embedding Service: Ollama integration for generating text embeddings

  • Data Layer: PostgreSQL with pgvector for storage and similarity search
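
As a sketch of the Repository Service's .gitignore handling: the pathspec dependency and function shape here are assumptions, and the real scanner in src/services/scanner.py may differ.

from pathlib import Path
import pathspec

def discover_files(repo_root: str) -> list[Path]:
    root = Path(repo_root)
    ignore_file = root / ".gitignore"
    patterns = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    spec = pathspec.PathSpec.from_lines("gitwildmatch", patterns)

    files: list[Path] = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip the .git directory and anything matched by .gitignore patterns
        if path.is_file() and ".git" not in rel.parts and not spec.match_file(str(rel)):
            files.append(path)
    return files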

Data Flow

  1. Indexing: Repository → Parse → Chunk → Embed → Store

  2. Searching: Query → Embed → Vector Search → Rank → Return (see the sketch after this list)

  3. Task Tracking: Create → Update → Git Integration → Query
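
A hedged sketch of the search path: embed the query, then rank chunks by pgvector cosine distance. Table and column names follow the schema section; the Ollama client call, the asyncpg usage, and the DSN are assumptions about how the services are wired, not the project's actual code.

import asyncio
import asyncpg
import ollama

async def search(query: str, limit: int = 10) -> list[dict]:
    # Query → Embed: 768-dim vector from nomic-embed-text
    emb = ollama.embeddings(model="nomic-embed-text", prompt=query)["embedding"]
    vec = "[" + ",".join(f"{x:.6f}" for x in emb) + "]"

    # Embed → Vector Search → Rank: pgvector's <=> operator is cosine distance
    conn = await asyncpg.connect("postgresql://user:password@localhost:5432/codebase_mcp")
    try:
        rows = await conn.fetch(
            """
            SELECT content, start_line, 1 - (embedding <=> $1::vector) AS similarity
            FROM code_chunks
            ORDER BY embedding <=> $1::vector
            LIMIT $2
            """,
            vec, limit,
        )
    finally:
        await conn.close()
    return [dict(r) for r in rows]

# asyncio.run(search("authentication middleware"))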

Testing

Run All Tests

# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=term-missing

# Run specific test categories
pytest tests/unit/ -v          # Unit tests only
pytest tests/integration/ -v   # Integration tests
pytest tests/contract/ -v      # Contract tests

Test Categories

  • Unit Tests: Fast, isolated component tests

  • Integration Tests: Database and service integration

  • Contract Tests: MCP protocol compliance validation

  • Performance Tests: Latency and throughput benchmarks

Coverage Requirements

  • Minimum coverage: 95%

  • Critical paths: 100%

  • View HTML report: open htmlcov/index.html

Performance Tuning

Database Optimization

-- Optimize vector searches
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Adjust work_mem for large result sets
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();

Connection Pool Settings

# In .env
DATABASE_POOL_SIZE=20        # Connection pool size
DATABASE_MAX_OVERFLOW=10     # Max overflow connections
DATABASE_POOL_TIMEOUT=30     # Connection timeout in seconds
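
These settings map directly onto SQLAlchemy's async engine; a small sketch (the engine URL is a placeholder):

from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost:5432/codebase_mcp",
    pool_size=20,      # DATABASE_POOL_SIZE
    max_overflow=10,   # DATABASE_MAX_OVERFLOW
    pool_timeout=30,   # DATABASE_POOL_TIMEOUT, in seconds
)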

Embedding Batch Size

# Adjust based on available memory
EMBEDDING_BATCH_SIZE=100   # For systems with 8GB+ RAM
EMBEDDING_BATCH_SIZE=50    # Default for 4GB RAM
EMBEDDING_BATCH_SIZE=25    # For constrained environments
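
How the batch size interacts with MAX_CONCURRENT_REQUESTS, as a hedged sketch: it assumes the ollama package's AsyncClient, and the project's embedder in src/services/embedder.py may look different.

import asyncio
import ollama

EMBEDDING_BATCH_SIZE = 50
MAX_CONCURRENT_REQUESTS = 10

async def embed_chunks(chunks: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    client = ollama.AsyncClient()
    sem = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)  # cap parallel Ollama requests

    async def embed_one(text: str) -> list[float]:
        async with sem:
            resp = await client.embeddings(model=model, prompt=text)
            return resp["embedding"]

    vectors: list[list[float]] = []
    for i in range(0, len(chunks), EMBEDDING_BATCH_SIZE):
        batch = chunks[i : i + EMBEDDING_BATCH_SIZE]   # one batch per iteration
        vectors.extend(await asyncio.gather(*(embed_one(t) for t in batch)))
    return vectors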

Troubleshooting

Common Issues

  1. Database Connection Failed

    • Check PostgreSQL is running: pg_ctl status

    • Verify DATABASE_URL in .env

    • Ensure database exists: psql -U postgres -l

  2. Ollama Connection Error

    • Check Ollama is running: curl http://localhost:11434/api/tags

    • Verify model is installed: ollama list

    • Check OLLAMA_BASE_URL in .env

  3. Slow Performance

    • Check database indexes: \di in psql

    • Monitor query performance: See logs at LOG_FILE path

    • Adjust batch sizes and connection pool

For detailed troubleshooting, see docs/troubleshooting.md and docs/guides/SETUP_GUIDE.md.
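
A quick connectivity check covering the first two issues above (a sketch; adjust the DSN to your .env, and note that asyncpg wants a plain postgresql:// URL without the +asyncpg driver suffix):

import asyncio
import json
import urllib.request

import asyncpg

async def check() -> None:
    # 1. Database connection
    conn = await asyncpg.connect("postgresql://user:password@localhost:5432/codebase_mcp")
    print("postgres ok:", await conn.fetchval("SELECT 1") == 1)
    await conn.close()

    # 2. Ollama connection, mirroring `curl http://localhost:11434/api/tags`
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        models = [m["name"] for m in json.load(resp)["models"]]
    print("ollama models:", models)

asyncio.run(check())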

Contributing

We follow a specification-driven development workflow using the Specify framework.

Development Workflow

  1. Feature Specification: Use /specify command to create feature specs

  2. Planning: Generate implementation plan with /plan

  3. Task Breakdown: Create tasks with /tasks

  4. Implementation: Execute tasks with /implement

Git Workflow

# Create feature branch
git checkout -b 001-feature-name

# Make atomic commits
git add .
git commit -m "feat(component): add specific feature"

# Push and create PR
git push origin 001-feature-name

Code Quality Standards

  • Type Safety: mypy --strict must pass

  • Linting: ruff check with no errors

  • Testing: All tests must pass with 95%+ coverage

  • Documentation: Update relevant docs with changes

Constitutional Principles

  1. Simplicity Over Features: Focus on core semantic search

  2. Local-First Architecture: No cloud dependencies

  3. Protocol Compliance: Strict MCP adherence

  4. Performance Guarantees: Meet stated benchmarks

  5. Production Quality: Comprehensive error handling

See .specify/memory/constitution.md for full principles.

FastMCP Migration (Oct 2025)

Migration Complete: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.

What Changed

Before (MCP SDK):

# Old: Manual tool registration with JSON schemas
class MCPServer:
    def __init__(self):
        self.tools = {
            "search_code": {
                "name": "search_code",
                "description": "...",
                "inputSchema": {...}
            }
        }

After (FastMCP):

# New: Decorator-based tool definitions
@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # Implementation
    ...

Key Benefits

  1. Simpler Tool Definitions: Decorators replace manual JSON schema creation

  2. Type Safety: Automatic schema generation from Pydantic models

  3. Dual Logging: File logging + MCP protocol without stdout pollution

  4. Better Error Handling: Structured error responses with context

  5. Cleaner Architecture: Separation of tool interface from business logic
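
To make the type-safety point concrete, a hedged sketch of a tool whose input is a Pydantic model: the model and field names are illustrative, and FastMCP derives the JSON Schema from the annotations as described above.

from typing import Any

from fastmcp import FastMCP
from pydantic import BaseModel, Field

mcp = FastMCP("codebase-mcp")

class SearchRequest(BaseModel):
    query: str = Field(description="Natural language search query")
    limit: int = Field(default=10, ge=1, le=100)

@mcp.tool()
async def search_code(request: SearchRequest) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # The real handler delegates to the service layer (src/services/searcher.py)
    return {"results": [], "total_count": 0, "latency_ms": 0}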

Server Files

  • New Entry Point: server_fastmcp.py (root directory)

  • Legacy Server: src/mcp/mcp_stdio_server_v3.py (deprecated, will be removed)

  • Tool Handlers: src/mcp/tools/*.py (unchanged, reused by FastMCP)

  • Services: src/services/*.py (unchanged, business logic intact)

Configuration Update Required

Update your Claude Desktop config to use the new server:

{ "mcpServers": { "codebase-mcp": { "command": "uv", "args": ["run", "--with", "fastmcp", "python", "/path/to/server_fastmcp.py"] } } }

Migration Notes

  • All 6 MCP tools remain functional (100% backward compatible)

  • No database schema changes required

  • Tool signatures and responses unchanged

  • Logging now goes exclusively to /tmp/codebase-mcp.log

  • All tests pass with FastMCP implementation

Performance

FastMCP maintains performance targets:

  • Repository indexing: <60 seconds for 10K files

  • Code search: <500ms p95 latency

  • Async/await throughout for optimal concurrency

License

MIT License - see LICENSE file for details.

Support

Acknowledgments

  • MCP framework powered by FastMCP

  • Built with FastAPI, SQLAlchemy, and Pydantic

  • Vector search powered by pgvector

  • Embeddings via Ollama and nomic-embed-text

  • Code parsing with tree-sitter

  • MCP protocol by Anthropic
