Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
Overview
The Codebase MCP Server provides semantic code search capabilities through a focused, local-first architecture. It enables AI assistants to understand and navigate codebases efficiently by combining tree-sitter AST parsing with vector embeddings.
Key Features
Semantic Code Search: Natural language queries across indexed repositories
Repository Indexing: Fast scanning and chunking with tree-sitter parsers
Task Management: Development task tracking with git integration
MCP Protocol: Six focused tools via Server-Sent Events (SSE) and stdio (JSON-RPC)
Performance Targets: Under 60 seconds to index 10K files, under 500ms p95 search latency
Production Ready: Comprehensive error handling, structured logging, type safety
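To make the idea of syntax-aware chunking concrete, here is a minimal sketch. It uses Python's standard `ast` module as a stand-in for tree-sitter (which is what the server is described as using), and it only handles Python source. The `Chunk` type and `chunk_source` function are hypothetical names chosen for illustration, not part of this project.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str        # function or class name
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str        # exact source text of the definition


def chunk_source(source: str) -> list[Chunk]:
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            text = ast.get_source_segment(source, node) or ""
            chunks.append(Chunk(node.name, node.lineno, node.end_lineno, text))
    return chunks


SAMPLE = '''import os

def load(path):
    return open(path).read()

class Index:
    def add(self, item):
        pass
'''

chunks = chunk_source(SAMPLE)
for c in chunks:
    print(c.name, c.start_line, c.end_line)
```

Splitting on definitions rather than fixed line counts keeps each chunk a self-contained unit, which tends to make the resulting embeddings more meaningful for search.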
MCP Tools
search_code: Semantic search across indexed code
index_repository: Index a repository for searching
get_task: Retrieve a specific development task
list_tasks: List tasks with filtering options
create_task: Create a new development task
update_task: Update task status with git integration
Quick Start
1. Database Setup
2. Install Dependencies
Key Dependencies:
fastmcp>=0.1.0 - Modern MCP framework with decorator-based tools
anthropic-mcp - MCP protocol implementation
sqlalchemy>=2.0 - Async ORM
pgvector - PostgreSQL vector extension
ollama - Embedding generation
3. Configure Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
Important:
Use absolute paths!
Server uses FastMCP framework with decorator-based tool definitions
All logs go to /tmp/codebase-mcp.log (no stdout/stderr pollution)
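As an illustration of the absolute-path requirement, the following sketch builds a config entry and rejects relative paths. The `mcpServers` layout follows the Claude Desktop config format; the server name `codebase-mcp`, the install directory, and the interpreter path are assumptions made for this example, so substitute your own.

```python
import json
from pathlib import PurePosixPath


def build_config(repo_dir: str, python_bin: str) -> dict:
    """Build an mcpServers entry that uses absolute paths only."""
    repo = PurePosixPath(repo_dir)
    python = PurePosixPath(python_bin)
    if not (repo.is_absolute() and python.is_absolute()):
        raise ValueError("Claude Desktop config needs absolute paths")
    return {
        "mcpServers": {
            "codebase-mcp": {  # hypothetical server name
                "command": str(python),
                "args": [str(repo / "server_fastmcp.py")],
            }
        }
    }


# Hypothetical install location and virtualenv interpreter.
config = build_config("/opt/codebase-mcp", "/opt/codebase-mcp/.venv/bin/python")
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into claude_desktop_config.json by hand; the sketch deliberately does not write to the file.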
4. Start Ollama
5. Test
Current Status
Working Tools (6/6) ✅
| Tool | Status | Description |
|------|--------|-------------|
| create_task | ✅ Working | Create development tasks with planning references |
| get_task | ✅ Working | Retrieve task by ID |
| list_tasks | ✅ Working | List tasks with filters (status, branch) |
| update_task | ✅ Working | Update tasks with git tracking (branch, commit) |
| index_repository | ✅ Working | Index code repositories with semantic chunking |
| search_code | ✅ Working | Semantic code search with pgvector similarity |
Recent Fixes (Oct 6, 2025)
✅ Parameter passing architecture (Pydantic models)
✅ MCP schema mismatches (status enums, missing parameters)
✅ Timezone/datetime compatibility (PostgreSQL)
✅ Binary file filtering (images, cache dirs)
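The binary file filtering mentioned above could look roughly like the sketch below. The suffix and directory lists are assumptions for illustration; the server's actual lists are not shown in this README.

```python
from pathlib import PurePosixPath

# Hypothetical skip lists, not taken from the project.
BINARY_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".pdf", ".zip", ".pyc"}
CACHE_DIRS = {"__pycache__", ".git", "node_modules", ".mypy_cache", ".pytest_cache"}


def should_index(path: str) -> bool:
    """Return False for files inside cache directories or with binary suffixes."""
    p = PurePosixPath(path)
    # Check every directory component, but not the file name itself.
    if any(part in CACHE_DIRS for part in p.parts[:-1]):
        return False
    return p.suffix.lower() not in BINARY_SUFFIXES


for candidate in ["src/app.py", "docs/logo.PNG", "src/__pycache__/app.cpython-313.pyc"]:
    print(candidate, should_index(candidate))
```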
Test Results
Tool Usage Examples
Create a Task
In Claude Desktop:
Response:
Index a Repository
Response:
Search Code
Response:
Track Task with Git
Response:
Architecture
MCP Framework: Built with FastMCP - a modern, decorator-based framework for building MCP servers with:
Type-safe tool definitions via @mcp.tool() decorators
Automatic JSON Schema generation from Pydantic models
Dual logging (file + MCP protocol) without stdout pollution
Async/await support throughout
See ARCHITECTURE.md for detailed component diagrams.
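To show what a decorator-based tool definition buys you, here is a standard-library sketch of the pattern. It is not FastMCP and does not use Pydantic: `ToolRegistry` is a hypothetical stand-in that derives a small JSON Schema from a function signature, and the `search_code` parameters shown are assumed rather than taken from the real tool.

```python
import inspect
from typing import Any, Callable

# Maps Python annotations to JSON Schema type names (small subset).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    """Tiny stand-in for a decorator-based tool framework (not FastMCP)."""

    def __init__(self) -> None:
        self.tools: dict[str, dict[str, Any]] = {}

    def tool(self) -> Callable:
        def decorator(fn: Callable) -> Callable:
            props, required = {}, []
            for name, param in inspect.signature(fn).parameters.items():
                props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
                # Parameters without a default value are required.
                if param.default is inspect.Parameter.empty:
                    required.append(name)
            self.tools[fn.__name__] = {
                "handler": fn,
                "schema": {"type": "object", "properties": props, "required": required},
            }
            return fn
        return decorator


registry = ToolRegistry()


@registry.tool()
def search_code(query: str, limit: int = 10) -> list[str]:
    """Hypothetical signature; the real tool's parameters may differ."""
    return [f"result for {query!r}"][:limit]


print(registry.tools["search_code"]["schema"])
```

The point of the pattern is that the schema is generated from the signature, so the tool's interface and its documentation cannot drift apart.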
Documentation
docs/status/MCP_SERVER_STATUS.md - Current status, test results, configuration
docs/status/SESSION_HANDOFF.md - Recent problems solved, current working state
docs/guides/SETUP_GUIDE.md - Complete setup instructions with troubleshooting
docs/ARCHITECTURE.md - System architecture and data flow
CLAUDE.md - Specify workflow for AI-assisted development
Database Schema
11 tables with pgvector for semantic search:
Core Tables:
repositories - Indexed repositories
code_files - Source files with metadata
code_chunks - Semantic chunks with embeddings (vector(768))
tasks - Development tasks with git tracking
task_status_history - Audit trail
See docs/ARCHITECTURE.md for complete schema documentation.
Technology Stack
MCP Framework: FastMCP 0.1+ (decorator-based tool definitions)
Server: Python 3.13+, FastAPI patterns, async/await
Database: PostgreSQL 14+ with pgvector extension
Embeddings: Ollama (nomic-embed-text, 768 dimensions)
ORM: SQLAlchemy 2.0 (async), Pydantic V2 for validation
Type Safety: Full mypy --strict compliance
Development
Running Tests
Code Structure
FastMCP Server Architecture:
server_fastmcp.py - Main entry point using @mcp.tool() decorators
Tool handlers in src/mcp/tools/ provide service integration
Services in src/services/ contain all business logic
Dual logging: file (/tmp/codebase-mcp.log) + MCP protocol
Prerequisites
System Requirements
Python 3.11+ (3.13 compatible)
PostgreSQL 14+ with pgvector extension
Ollama for embedding generation
4GB+ RAM recommended
SSD storage for optimal performance
PostgreSQL with pgvector
Ollama Setup
Installation
1. Clone the Repository
2. Create Virtual Environment
3. Install Dependencies
Key Dependencies Installed:
fastmcp>=0.1.0 - Modern MCP framework
sqlalchemy>=2.0 - Async database ORM
pgvector - PostgreSQL vector extension Python bindings
ollama - Embedding generation client
pydantic>=2.0 - Data validation and settings
4. Configure Environment
Environment Variables:
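This README refers to three variables by name: DATABASE_URL, OLLAMA_BASE_URL, and LOG_FILE. A minimal sketch of loading them follows, assuming the log path and Ollama address quoted elsewhere in this document as defaults. The `Settings` class, the `load_settings` helper, and the example connection string are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    ollama_base_url: str
    log_file: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    if "DATABASE_URL" not in env:
        raise RuntimeError("DATABASE_URL is not set")
    return Settings(
        database_url=env["DATABASE_URL"],
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        log_file=env.get("LOG_FILE", "/tmp/codebase-mcp.log"),
    )


# Hypothetical connection string, for illustration only.
settings = load_settings({"DATABASE_URL": "postgresql+asyncpg://localhost/codebase_mcp"})
print(settings)
```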
Database Setup
1. Create Database
2. Initialize Schema
The initialization script will:
Create all required tables (repositories, files, chunks, tasks)
Set up vector indexes for similarity search
Configure connection pooling
Apply all database migrations
3. Verify Setup
4. Database Reset & Cleanup
During development, you may need to reset your database. See DATABASE_RESET.md for three reset options:
scripts/clear_data.sh - Clear all data, keep schema (fastest, no restart needed)
scripts/reset_database.sh - Drop and recreate all tables (recommended for schema changes)
scripts/nuclear_reset.sh - Drop entire database (requires Claude Desktop restart)
Running the Server
FastMCP Server (Recommended)
The primary way to run the server is via Claude Desktop or other MCP clients:
Server Entry Point: server_fastmcp.py in repository root
Logging: All output goes to /tmp/codebase-mcp.log (configurable via LOG_FILE env var)
Development Mode (Legacy FastAPI)
Production Mode (Legacy)
stdio Transport (Legacy CLI Mode)
The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.
JSON-RPC 2.0 Request Format:
JSON-RPC 2.0 Response Format:
Available Methods:
search_code - Semantic code search
index_repository - Index a repository
get_task - Get task by ID
list_tasks - List tasks with filters
create_task - Create new task
update_task - Update task status
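For reference, the sketch below builds and parses messages that follow the JSON-RPC 2.0 envelope (`jsonrpc`, `id`, `method`, `params`, and either `result` or `error`). The envelope fields come from the JSON-RPC 2.0 specification; the `query` and `limit` parameters and the shape of the result are assumptions, since the server's actual payloads are not shown here.

```python
import json
from itertools import count

_ids = count(1)  # simple monotonically increasing request ids


def make_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    )


def parse_response(line: str):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error object."""
    msg = json.loads(line)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return msg["result"]


# Hypothetical parameters for the search_code method.
print(make_request("search_code", {"query": "database connection", "limit": 5}))
print(parse_response('{"jsonrpc": "2.0", "id": 1, "result": {"results": []}}'))
```

Because stdout carries only protocol messages, a client can read one line per response and hand it straight to a parser like this.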
Logging:
All logs go to /tmp/codebase-mcp.log (configurable via LOG_FILE env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
Health Check
Usage Examples
1. Index a Repository
2. Search Code
3. Task Management
Architecture
Component Overview
MCP Layer: Handles protocol compliance, tool registration, SSE transport
Service Layer: Business logic for indexing, searching, task management
Repository Service: File system operations, git integration, .gitignore handling
Embedding Service: Ollama integration for generating text embeddings
Data Layer: PostgreSQL with pgvector for storage and similarity search
Data Flow
Indexing: Repository → Parse → Chunk → Embed → Store
Searching: Query → Embed → Vector Search → Rank → Return
Task Tracking: Create → Update → Git Integration → Query
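The indexing and searching flows can be sketched end to end in a few lines. This is a toy under stated assumptions: `make_embedder` replaces the Ollama embedding model with normalized word counts, an in-memory dict replaces PostgreSQL with pgvector, and the chunk names and texts are invented for the example.

```python
import math


def make_embedder(corpus):
    """Toy stand-in for an embedding model: normalized word counts over a fixed vocabulary."""
    words = sorted({w for text in corpus for w in text.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}

    def embed(text):
        vec = [0.0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:
                vec[vocab[word]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    return embed


def cosine(a, b):
    # Inputs are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def search(query, store, embed, limit=3):
    """Query -> Embed -> Vector Search -> Rank -> Return."""
    q = embed(query)
    ranked = sorted(store.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [(name, round(cosine(q, vec), 3)) for name, vec in ranked[:limit]]


# Invented chunks standing in for parsed repository content.
chunks = {
    "auth.py:login": "verify user password and create session",
    "db.py:connect": "open database connection pool",
    "search.py:query": "embed query and run vector search",
}
embed = make_embedder(chunks.values())
store = {name: embed(text) for name, text in chunks.items()}  # Embed -> Store
print(search("database connection", store, embed, limit=1))  # [('db.py:connect', 0.707)]
```

In the real system the ranking step would be a pgvector similarity query rather than a Python sort, but the order of operations is the same.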
Testing
Run All Tests
Test Categories
Unit Tests: Fast, isolated component tests
Integration Tests: Database and service integration
Contract Tests: MCP protocol compliance validation
Performance Tests: Latency and throughput benchmarks
Coverage Requirements
Minimum coverage: 95%
Critical paths: 100%
View HTML report:
open htmlcov/index.html
Performance Tuning
Database Optimization
Connection Pool Settings
Embedding Batch Size
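Batch size controls how many chunks are sent to the embedding model per request. As a hedged illustration of the idea only, the sketch below splits a list of chunk texts into fixed-size batches; the `batched` helper and the batch size of 3 are assumptions for this example, not settings from this project.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"chunk {i}" for i in range(7)]
sizes = [len(batch) for batch in batched(texts, batch_size=3)]
print(sizes)  # [3, 3, 1]
```

Larger batches mean fewer round trips to the embedding service at the cost of more memory per request, so the right value depends on the hardware.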
Troubleshooting
Common Issues
Database Connection Failed
Check PostgreSQL is running: pg_ctl status
Verify DATABASE_URL in .env
Ensure database exists: psql -U postgres -l
Ollama Connection Error
Check Ollama is running: curl http://localhost:11434/api/tags
Verify model is installed: ollama list
Check OLLAMA_BASE_URL in .env
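The same checks can be scripted. The sketch below fetches the `/api/tags` endpoint used in the curl command above and looks for the nomic-embed-text model named in the Technology Stack section. The helper names are hypothetical, and the sample payload is a hand-written example shaped like Ollama's response (a `models` list whose entries have a `name`), not captured output.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str = "nomic-embed-text") -> bool:
    # Ollama reports names with a tag suffix such as ":latest".
    return any(name.split(":")[0] == wanted for name in installed_models(tags_payload))


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Query a running Ollama instance; raises URLError if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline example with a hand-written payload.
sample = {"models": [{"name": "nomic-embed-text:latest"}, {"name": "llama3:8b"}]}
print(has_model(sample))
```

Against a live instance, `has_model(fetch_tags())` combines both troubleshooting steps: a connection error means Ollama is not running, and `False` means the model still needs to be pulled.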
Slow Performance
Check database indexes: \di in psql
Monitor query performance: See logs at LOG_FILE path
Adjust batch sizes and connection pool
For detailed troubleshooting, see docs/troubleshooting.md and docs/guides/SETUP_GUIDE.md.
Contributing
We follow a specification-driven development workflow using the Specify framework.
Development Workflow
Feature Specification: Use /specify command to create feature specs
Planning: Generate implementation plan with /plan
Task Breakdown: Create tasks with /tasks
Implementation: Execute tasks with /implement
Git Workflow
Code Quality Standards
Type Safety: mypy --strict must pass
Linting: ruff check with no errors
Testing: All tests must pass with 95%+ coverage
Documentation: Update relevant docs with changes
Constitutional Principles
Simplicity Over Features: Focus on core semantic search
Local-First Architecture: No cloud dependencies
Protocol Compliance: Strict MCP adherence
Performance Guarantees: Meet stated benchmarks
Production Quality: Comprehensive error handling
See .specify/memory/constitution.md for full principles.
FastMCP Migration (Oct 2025)
Migration Complete: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.
What Changed
Before (MCP SDK):
After (FastMCP):
Key Benefits
Simpler Tool Definitions: Decorators replace manual JSON schema creation
Type Safety: Automatic schema generation from Pydantic models
Dual Logging: File logging + MCP protocol without stdout pollution
Better Error Handling: Structured error responses with context
Cleaner Architecture: Separation of tool interface from business logic
Server Files
New Entry Point: server_fastmcp.py (root directory)
Legacy Server: src/mcp/mcp_stdio_server_v3.py (deprecated, will be removed)
Tool Handlers: src/mcp/tools/*.py (unchanged, reused by FastMCP)
Services: src/services/*.py (unchanged, business logic intact)
Configuration Update Required
Update your Claude Desktop config to use the new server:
Migration Notes
All 6 MCP tools remain functional (100% backward compatible)
No database schema changes required
Tool signatures and responses unchanged
Logging now goes exclusively to /tmp/codebase-mcp.log
All tests pass with FastMCP implementation
Performance
FastMCP maintains performance targets:
Repository indexing: <60 seconds for 10K files
Code search: <500ms p95 latency
Async/await throughout for optimal concurrency
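To check a run against the p95 search target, a percentile has to be computed from measured latencies. The sketch below uses the nearest-rank definition of the 95th percentile; the `p95` helper is hypothetical and the latency samples are synthetic, not benchmark results from this server.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latencies in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


latencies = list(range(1, 101))  # synthetic data: 1..100 ms
value = p95(latencies)
print(value, "ms; meets 500 ms target:", value < 500)
```

A p95 target tolerates occasional slow queries while still bounding what most requests experience, which is why it is a more useful figure than the mean.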
License
MIT License - see LICENSE file for details.
Support
Issues: GitHub Issues
Documentation: Full documentation
Logs: Check /tmp/codebase-mcp.log for detailed debugging
Acknowledgments