Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
What's New in v2.0
Version 2.0 represents a major architectural refactoring focused exclusively on semantic code search capabilities. This release removes project management, entity tracking, and work item features to maintain single-responsibility focus.
Breaking Changes:
14 tools removed (project management, entity tracking, work item features extracted to workflow-mcp)
3 tools remaining: start_indexing_background, get_indexing_status, and search_code with multi-project support
Foreground index_repository removed (all indexing now uses background jobs to prevent timeouts)
Database schema simplified (9 tables dropped, project_id parameter added)
New environment variables for optional workflow-mcp integration
Migration Required: Existing v1.x users must follow the migration guide to upgrade safely. See Migration Guide for complete upgrade and rollback procedures.
What's Preserved: All indexed repositories and code embeddings remain searchable after migration.
What's Discarded: All v1.x project management data, entities, and work items are permanently removed.
Features
The Codebase MCP Server provides exactly 3 MCP tools for semantic code search with multi-project workspace support:
start_indexing_background: Start a background indexing job for a repository
  Returns job_id immediately to prevent MCP client timeouts
  Accepts optional project_id parameter for workspace isolation
  Default behavior: indexes to default project workspace if project_id not specified
  Performance target: 60-second indexing for 10,000 files
get_indexing_status: Poll the status of a background indexing job
  Query job progress using job_id from start_indexing_background
  Returns files_indexed, chunks_created, and completion status
  Enables responsive UIs with progress indicators
search_code: Semantic code search with natural language queries
  Accepts optional project_id parameter to restrict search scope
  Default behavior: searches default project workspace if project_id not specified
  Performance target: 500ms p95 search latency
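The start-then-poll pattern described for start_indexing_background and get_indexing_status can be sketched roughly as follows. This is a minimal illustration, not the server's own client code: call_tool is a hypothetical callable standing in for whatever MCP client you use, and the "completed" and "failed" status values are assumptions rather than the documented enum.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical: any callable that sends a tool call to the server
# and returns the decoded result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def index_and_wait(
    call_tool: ToolCaller,
    repo_path: str,
    project_id: Optional[str] = None,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Start a background indexing job and poll until it finishes."""
    args: Dict[str, Any] = {"repo_path": repo_path}
    if project_id is not None:
        args["project_id"] = project_id  # omit to use the default workspace

    job = call_tool("start_indexing_background", args)
    job_id = job["job_id"]  # returned immediately, before indexing completes

    deadline = time.monotonic() + timeout
    while True:
        status = call_tool("get_indexing_status", {"job_id": job_id})
        # Assumed terminal states; check the API reference for the real enum.
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing job {job_id} did not finish in {timeout}s")
        time.sleep(poll_interval)
```

Polling with a deadline keeps each individual tool call short, which is the reason the server returns job_id immediately instead of blocking.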
Multi-Project Support
The v2.0 architecture supports isolated project workspaces through the optional project_id parameter:
Single Project Workflow (default):
Multi-Project Workflow:
Use Cases:
Single Project: Individual developers or small teams working on one codebase
Multi-Project: Consultants managing multiple client codebases, organizations with separate product lines, or multi-tenant deployments requiring workspace isolation
Optional Integration: The project_id can be automatically resolved from Git repository context when the optional workflow-mcp server is configured. Without workflow-mcp, all operations default to a single shared workspace.
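The difference between the single-project and multi-project workflows comes down to whether project_id is present in the tool arguments. The sketch below only builds the argument payloads; the query parameter name and the "client-a" project identifier are illustrative assumptions, not values taken from the server's schema.

```python
import json
from typing import Any, Dict, Optional


def tool_args(base: Dict[str, Any], project_id: Optional[str] = None) -> Dict[str, Any]:
    """Return tool arguments, adding project_id only when one is given."""
    args = dict(base)
    if project_id is not None:
        args["project_id"] = project_id
    return args


# Single project workflow: no project_id, so the default workspace is used.
single = tool_args({"query": "authentication middleware"})

# Multi-project workflow: the same search restricted to one workspace.
multi = tool_args({"query": "authentication middleware"}, project_id="client-a")

print(json.dumps(single))
print(json.dumps(multi))
```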
Quick Start
1. Database Setup
2. Install Dependencies
Key Dependencies:
fastmcp>=0.1.0 - Modern MCP framework with decorator-based tools
anthropic-mcp - MCP protocol implementation
sqlalchemy>=2.0 - Async ORM
pgvector - PostgreSQL vector extension
ollama - Embedding generation
3. Configure Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
Important:
Use absolute paths!
Server uses FastMCP framework with decorator-based tool definitions
All logs go to /tmp/codebase-mcp.log (no stdout/stderr pollution)
4. Start Ollama
5. Test
Current Status
Working Tools (3/3) ✅
| Tool | Status | Description |
| start_indexing_background | ✅ Working | Start background indexing job, returns job_id immediately |
| get_indexing_status | ✅ Working | Poll indexing job status with files_indexed/chunks_created |
| search_code | ✅ Working | Semantic code search with pgvector similarity |
Recent Fixes (Oct 6, 2025)
✅ Parameter passing architecture (Pydantic models)
✅ MCP schema mismatches (status enums, missing parameters)
✅ Timezone/datetime compatibility (PostgreSQL)
✅ Binary file filtering (images, cache dirs)
Test Results
Tool Usage Examples
Index a Repository (Background Job)
In Claude Desktop:
Initial Response (immediate):
Poll for Status:
Completed Response:
Search Code
Response:
Architecture
MCP Framework: Built with FastMCP - a modern, decorator-based framework for building MCP servers with:
Type-safe tool definitions via @mcp.tool() decorators
Automatic JSON Schema generation from Pydantic models
Dual logging (file + MCP protocol) without stdout pollution
Async/await support throughout
See Multi-Project Architecture for detailed component diagrams.
Documentation
Multi-Project Architecture - System architecture and data flow
Auto-Switch Architecture - Config-based project switching internals
Configuration Guide - Production deployment and tuning
API Reference - Complete MCP tool documentation
CLAUDE.md - Specify workflow for AI-assisted development
Database Schema
11 tables with pgvector for semantic search:
Core Tables:
repositories - Indexed repositories
code_files - Source files with metadata
code_chunks - Semantic chunks with embeddings (vector(768))
tasks - Development tasks with git tracking
task_status_history - Audit trail
See Multi-Project Architecture for complete schema documentation.
Technology Stack
MCP Framework: FastMCP 0.1+ (decorator-based tool definitions)
Server: Python 3.13+, FastAPI patterns, async/await
Database: PostgreSQL 14+ with pgvector extension
Embeddings: Ollama (nomic-embed-text, 768 dimensions)
ORM: SQLAlchemy 2.0 (async), Pydantic V2 for validation
Type Safety: Full mypy --strict compliance
Development
Running Tests
Code Structure
FastMCP Server Architecture:
server_fastmcp.py - Main entry point using @mcp.tool() decorators
Tool handlers in src/mcp/tools/ provide service integration
Services in src/services/ contain all business logic
Dual logging: file (/tmp/codebase-mcp.log) + MCP protocol
Installation
Prerequisites
Before installing Codebase MCP Server v2.0, ensure the following requirements are met:
Required Software:
PostgreSQL 14+ - Database with pgvector extension for vector similarity search
Python 3.11+ - Runtime environment (Python 3.13 compatible)
Ollama - Local embedding model server with nomic-embed-text model
System Requirements:
4GB+ RAM recommended for typical workloads
SSD storage for optimal performance (database and embedding operations are I/O intensive)
Network access to Ollama server (default: localhost:11434)
Installation Commands
Install Codebase MCP Server v2.0 using pip:
Alternative Installation Methods:
Key Dependencies Installed Automatically:
fastmcp>=0.1.0 - Modern MCP framework
sqlalchemy>=2.0 - Async database ORM
pgvector - PostgreSQL vector extension Python bindings
ollama - Embedding generation client
pydantic>=2.0 - Data validation and settings
Verification Steps
After installation, verify the setup is correct:
Setup Complete: If all verification steps pass, Codebase MCP Server v2.0 is ready for use. Proceed to the Quick Start section for first-time indexing and search operations.
Multi-Project Configuration
The Codebase MCP server supports automatic project switching based on your working directory using .codebase-mcp/config.json files.
Quick Start
Create a config file in your project root:
Set your working directory (via MCP client):
Use tools normally - they'll automatically use your project:
Config File Format
Fields:
version (required): Config version (currently "1.0")
project.name (required): Project identifier (used if no ID provided)
project.id (optional): Explicit project UUID (takes priority over name)
project.database_name (optional): Override computed database name (see Database Name Resolution below)
auto_switch (optional, default true): Enable automatic project switching
strict_mode (optional, default false): Reject operations if project mismatch
dry_run (optional, default false): Log intended switches without executing
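As a rough illustration of the field rules listed above, the following sketch parses a config document, checks the two required fields, and fills in the documented defaults. It is not the server's loader: parse_config is a hypothetical helper, and placing auto_switch, strict_mode, and dry_run at the top level of the JSON is an assumption based on how the fields are named.

```python
import json
from typing import Any, Dict

# Defaults as documented for the optional fields.
DEFAULTS: Dict[str, Any] = {"auto_switch": True, "strict_mode": False, "dry_run": False}


def parse_config(text: str) -> Dict[str, Any]:
    """Parse a .codebase-mcp/config.json document and apply defaults."""
    raw = json.loads(text)
    if "version" not in raw:
        raise ValueError("missing required field: version")
    project = raw.get("project")
    if not isinstance(project, dict) or not project.get("name"):
        raise ValueError("missing required field: project.name")
    # Values present in the file win over the defaults.
    return {**DEFAULTS, **raw}


# Smallest valid config: only the required fields.
example = json.dumps({"version": "1.0", "project": {"name": "my-project"}})
config = parse_config(example)
```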
Database Name Resolution:
The server determines which database to use in this order:
Explicit project.database_name - Uses exact database name specified
  {"project": {"database_name": "cb_proj_my_project_550e8400"}}
Computed from project.name and project.id - Automatically generates database name
  Format: cb_proj_{sanitized_name}_{id_prefix}
  Example: cb_proj_my_project_550e8400
Use Cases for project.database_name:
Recovering from database name mismatches
Migrating from old database naming schemes
Explicit control over database selection
Debugging and troubleshooting
Example - Auto-generated (default):
Database used: cb_proj_my_project_550e8400 (auto-computed)
Example - Explicit override:
Database used: cb_proj_legacy_database_12345678 (explicit override)
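The resolution order and the cb_proj_{sanitized_name}_{id_prefix} format can be sketched as below. The exact sanitization rule is not stated here, so the lowercase-and-underscore rule and the 8-character ID prefix are assumptions chosen to reproduce the documented example; the sketch also assumes project.id is present when no explicit name is given.

```python
import re
from typing import Any, Dict


def compute_database_name(name: str, project_id: str) -> str:
    """Build cb_proj_{sanitized_name}_{id_prefix} (sanitization rule assumed)."""
    # Assumed rule: lowercase, collapse anything non-alphanumeric to "_".
    sanitized = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    # Assumed rule: first 8 hex characters of the UUID.
    id_prefix = project_id.replace("-", "")[:8]
    return f"cb_proj_{sanitized}_{id_prefix}"


def resolve_database_name(project: Dict[str, Any]) -> str:
    """An explicit database_name wins; otherwise compute from name and id."""
    explicit = project.get("database_name")
    if explicit:
        return explicit
    return compute_database_name(project["name"], project["id"])
```

With these assumptions, a project named "my-project" with an ID starting 550e8400 resolves to cb_proj_my_project_550e8400, matching the auto-generated example.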
Project Resolution Priority
When you call MCP tools, the server resolves the project workspace using this 4-tier priority system:
Explicit project_id parameter (highest priority)
  await mcpClient.callTool("start_indexing_background", {
    repo_path: "/path/to/repo",
    project_id: "explicit-project-id"  // Always takes priority
  });
Session-based config file (via set_working_directory)
  Server searches up to 20 directory levels for .codebase-mcp/config.json
  Cached with mtime-based invalidation for performance
  Isolated per MCP session (multiple clients stay independent)
workflow-mcp integration (external project tracking)
  Queries workflow-mcp server for active project context
  Configurable timeout and caching
Default workspace (fallback)
  Uses project_default schema when no other resolution succeeds
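The 4-tier priority above amounts to a fall-through chain. The sketch below models it with plain callables; resolve_project and the two lookup callables are hypothetical names, and treating any exception from a tier as "not available" is an assumption about the failure behavior.

```python
from typing import Callable, Optional, Tuple

# Hypothetical: a lookup returns a project identifier or None.
Lookup = Callable[[], Optional[str]]


def resolve_project(
    explicit_id: Optional[str],
    session_config: Lookup,
    workflow_mcp: Lookup,
    default: str = "project_default",
) -> Tuple[str, str]:
    """Return (project, source) using the first tier that yields a value."""
    if explicit_id:
        return explicit_id, "explicit"
    for source, lookup in (("session_config", session_config), ("workflow_mcp", workflow_mcp)):
        try:
            found = lookup()
        except Exception:
            found = None  # an unavailable tier falls through to the next one
        if found:
            return found, source
    return default, "default"
```

Returning the source alongside the project makes it easy to log which tier was used when debugging an unexpected workspace.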
Multi-Session Isolation
The server maintains separate working directories for each MCP session (client connection):
Config File Discovery
The server searches for .codebase-mcp/config.json by:
Starting from your working directory
Searching up to 20 parent directories
Stopping at the first config file found
Caching the result (with automatic invalidation on file modification)
Example directory structure:
If you set working directory to /Users/alice/projects/my-app/src/components/, the server will find the config at /Users/alice/projects/my-app/.codebase-mcp/config.json.
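The upward search can be sketched with pathlib as shown below. This is an illustration of the described behavior rather than the server's code: find_config is a hypothetical name, caching is left out, and "up to 20 parent directories" is read as the starting directory plus at most 20 ancestors.

```python
from pathlib import Path
from typing import Optional

CONFIG_RELATIVE_PATH = Path(".codebase-mcp") / "config.json"


def find_config(start: Path, max_levels: int = 20) -> Optional[Path]:
    """Walk upward from start and return the first config file found."""
    current = start.resolve()
    # Check the starting directory, then up to max_levels parents.
    for _ in range(max_levels + 1):
        candidate = current / CONFIG_RELATIVE_PATH
        if candidate.is_file():
            return candidate
        if current.parent == current:
            return None  # reached the filesystem root
        current = current.parent
    return None
```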
Performance
Config discovery: <50ms (with upward traversal)
Cache hit: <5ms
Session lookup: <1ms
Background cleanup: Hourly (removes sessions inactive >24h)
Database Setup
1. Create Database
2. Initialize Schema
The initialization script will:
Create all required tables (repositories, files, chunks, tasks)
Set up vector indexes for similarity search
Configure connection pooling
Apply all database migrations
3. Verify Setup
4. Database Reset & Cleanup
During development, you may need to reset your database. Three scripts are available, from least to most destructive:
scripts/clear_data.sh - Clear all data, keep schema (fastest, no restart needed)
scripts/reset_database.sh - Drop and recreate all tables (recommended for schema changes)
scripts/nuclear_reset.sh - Drop entire database (requires Claude Desktop restart)
Running the Server
FastMCP Server (Recommended)
The primary way to run the server is via Claude Desktop or other MCP clients:
Server Entry Point: server_fastmcp.py in repository root
Logging: All output goes to /tmp/codebase-mcp.log (configurable via LOG_FILE env var)
Development Mode (Legacy FastAPI)
Production Mode (Legacy)
stdio Transport (Legacy CLI Mode)
The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.
JSON-RPC 2.0 Request Format:
JSON-RPC 2.0 Response Format:
Available Methods:
search_code - Semantic code search
start_indexing_background - Start background indexing job
get_indexing_status - Poll indexing job status
Logging:
All logs go to /tmp/codebase-mcp.log (configurable via LOG_FILE env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
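For the legacy stdio transport, a request and response in JSON-RPC 2.0 form can be built and checked as in the sketch below. The envelope fields (jsonrpc, id, method, params, result, error) come from the JSON-RPC 2.0 specification; make_request and parse_response are hypothetical helpers, and the parameter names you pass are up to the tool's actual schema.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)  # each request needs a distinct id


def make_request(method: str, params: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    )


def parse_response(line: str) -> Any:
    """Return the result of a JSON-RPC 2.0 response, raising on an error object."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return message["result"]
```

Because stdout carries only protocol messages, a client can read it line by line and hand each line to a parser like this without filtering log output.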
Health Check
Usage Examples
1. Index a Repository (Background Job)
2. Search Code
Architecture
Component Overview
MCP Layer: Handles protocol compliance, tool registration, SSE transport
Service Layer: Business logic for indexing, searching, task management
Repository Service: File system operations, git integration, .gitignore handling
Embedding Service: Ollama integration for generating text embeddings
Data Layer: PostgreSQL with pgvector for storage and similarity search
Data Flow
Indexing: Repository → Parse → Chunk → Embed → Store
Searching: Query → Embed → Vector Search → Rank → Return
Task Tracking: Create → Update → Git Integration → Query
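The "Vector Search → Rank" steps of the search flow can be illustrated in plain Python. In the server this work is done inside PostgreSQL by pgvector, so the sketch below is only a conceptual stand-in: cosine similarity as the metric is an assumption, and the 3-dimensional vectors are toy values in place of 768-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(
    query_vec: Sequence[float],
    chunks: Dict[str, Sequence[float]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the best matches first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```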
Testing
Run All Tests
Test Categories
Unit Tests: Fast, isolated component tests
Integration Tests: Database and service integration
Contract Tests: MCP protocol compliance validation
Performance Tests: Latency and throughput benchmarks
Coverage Requirements
Minimum coverage: 95%
Critical paths: 100%
View HTML report:
open htmlcov/index.html
Performance Tuning
Database Optimization
Connection Pool Settings
Embedding Batch Size
Troubleshooting
Common Issues
Database Connection Failed
Check PostgreSQL is running: pg_ctl status
Verify DATABASE_URL in .env
Ensure database exists: psql -U postgres -l
Ollama Connection Error
Check Ollama is running: curl http://localhost:11434/api/tags
Verify model is installed: ollama list
Check OLLAMA_BASE_URL in .env
Slow Performance
Check database indexes: \di in psql
Monitor query performance: See logs at LOG_FILE path
Adjust batch sizes and connection pool
For detailed troubleshooting, see the Configuration Guide troubleshooting section.
Contributing
We follow a specification-driven development workflow using the Specify framework.
Development Workflow
Feature Specification: Use /specify command to create feature specs
Planning: Generate implementation plan with /plan
Task Breakdown: Create tasks with /tasks
Implementation: Execute tasks with /implement
Git Workflow
Code Quality Standards
Type Safety: mypy --strict must pass
Linting: ruff check with no errors
Testing: All tests must pass with 95%+ coverage
Documentation: Update relevant docs with changes
Constitutional Principles
Simplicity Over Features: Focus on core semantic search
Local-First Architecture: No cloud dependencies
Protocol Compliance: Strict MCP adherence
Performance Guarantees: Meet stated benchmarks
Production Quality: Comprehensive error handling
See .specify/memory/constitution.md for full principles.
FastMCP Migration (Oct 2025)
Migration Complete: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.
What Changed
Before (MCP SDK):
After (FastMCP):
Key Benefits
Simpler Tool Definitions: Decorators replace manual JSON schema creation
Type Safety: Automatic schema generation from Pydantic models
Dual Logging: File logging + MCP protocol without stdout pollution
Better Error Handling: Structured error responses with context
Cleaner Architecture: Separation of tool interface from business logic
Server Files
New Entry Point: server_fastmcp.py (root directory)
Legacy Server: src/mcp/mcp_stdio_server_v3.py (deprecated, will be removed)
Tool Handlers: src/mcp/tools/*.py (unchanged, reused by FastMCP)
Services: src/services/*.py (unchanged, business logic intact)
Configuration Update Required
Update your Claude Desktop config to use the new server:
Migration Notes
All 6 MCP tools remain functional (100% backward compatible)
No database schema changes required
Tool signatures and responses unchanged
Logging now goes exclusively to /tmp/codebase-mcp.log
All tests pass with FastMCP implementation
Performance
FastMCP maintains performance targets:
Repository indexing: <60 seconds for 10K files
Code search: <500ms p95 latency
Async/await throughout for optimal concurrency
License
MIT License (LICENSE file pending).
Support
Issues: GitHub Issues
Documentation: Full documentation
Logs: Check /tmp/codebase-mcp.log for detailed debugging
Quick Start
Basic Usage (Default Project)
For most users, the default project workspace is sufficient. All indexing now uses background jobs to prevent MCP client timeouts:
The server automatically uses a default project workspace (project_default) if no project ID is specified.
Multi-Project Usage
For users managing multiple codebases or client projects, use the project_id parameter to isolate repositories:
Each project has its own isolated database schema, ensuring repositories and embeddings are completely separated.
workflow-mcp Integration (Optional)
The Codebase MCP Server can optionally integrate with workflow-mcp for automatic project context resolution. This is an advanced feature and not required for basic usage.
Standalone Usage (Default)
By default, Codebase MCP operates independently:
Integration with workflow-mcp
If you're using workflow-mcp to manage development projects, Codebase MCP can automatically resolve project context:
How It Works:
Codebase MCP queries workflow-mcp for the active project
If an active project exists, it's used as the project_id
If no active project or workflow-mcp is unavailable, falls back to default project
You can still override with --project-id flag
Configuration:
See Also: workflow-mcp repository for details on project workspace management.
Documentation
Comprehensive documentation is available for different use cases:
Migration Guide - Upgrading from v1.x to v2.x with multi-project support
Configuration Guide - Production deployment and tuning
Architecture Documentation - System design and multi-project isolation
API Reference - Complete MCP tool documentation
Glossary - Canonical terminology definitions
For quick setup, refer to the Installation section above.
Contributing
We welcome contributions to the Codebase MCP Server. This project follows a specification-driven development workflow.
Getting Started
Read the Architecture: Start with docs/architecture/multi-project-design.md to understand the system design
Review the Constitution: See .specify/memory/constitution.md for project principles
Follow the Workflow: Use the Specify workflow documented in CLAUDE.md
Development Process
Create a feature specification using /specify command
Plan the implementation with /plan
Generate tasks using /tasks
Implement incrementally with atomic commits
Code Standards
Type Safety: Full mypy --strict compliance
Testing: 95%+ test coverage, contract tests for MCP protocol
Performance: Meet benchmarks (60s indexing, 500ms search p95)
Documentation: Update docs with all changes
Code of Conduct
This project adheres to a code of conduct that promotes a welcoming, inclusive environment. We expect:
Respectful communication in issues and PRs
Constructive feedback focused on code and ideas
Recognition that contributors volunteer their time
Patience with maintainers and fellow contributors
By participating, you agree to uphold these standards.
Acknowledgments