Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
Overview
The Codebase MCP Server provides semantic code search capabilities through a focused, local-first architecture. It enables AI assistants to understand and navigate codebases efficiently by combining tree-sitter AST parsing with vector embeddings.
Key Features
Semantic Code Search: Natural language queries across indexed repositories
Repository Indexing: Fast scanning and chunking with tree-sitter parsers
Task Management: Development task tracking with git integration
MCP Protocol: Six focused tools via Server-Sent Events (SSE) and stdio (JSON-RPC)
Performance Targets: Index 10K files in under 60 seconds; p95 search latency under 500ms
Production Ready: Comprehensive error handling, structured logging, type safety
MCP Tools
search_code: Semantic search across indexed code
index_repository: Index a repository for searching
get_task: Retrieve a specific development task
list_tasks: List tasks with filtering options
create_task: Create a new development task
update_task: Update task status with git integration
Quick Start
1. Database Setup
2. Install Dependencies
3. Configure Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
Important: Use absolute paths!
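As a rough illustration of the absolute-path requirement, the sketch below builds an `mcpServers` entry and rejects relative paths before anything is written to the config file. The server name `codebase-mcp` and both example paths are hypothetical placeholders, not values taken from this project; substitute the locations of your own virtualenv and checkout.

```python
import json
from pathlib import PurePosixPath


def build_server_entry(python_bin: str, server_script: str) -> dict:
    """Return a Claude Desktop config fragment, refusing relative paths."""
    for path in (python_bin, server_script):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"path must be absolute: {path}")
    return {
        "mcpServers": {
            "codebase-mcp": {  # hypothetical server name
                "command": python_bin,
                "args": [server_script],
            }
        }
    }


# Hypothetical paths for illustration only.
config = build_server_entry(
    "/Users/you/codebase-mcp/.venv/bin/python",
    "/Users/you/codebase-mcp/server.py",
)
print(json.dumps(config, indent=2))
```

Claude Desktop launches the server from its own working directory, which is why a relative `command` or script path tends to fail silently.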
4. Start Ollama
5. Test
Current Status
Working Tools (6/6) ✅
| Tool | Status | Description |
| create_task | ✅ Working | Create development tasks with planning references |
| get_task | ✅ Working | Retrieve task by ID |
| list_tasks | ✅ Working | List tasks with filters (status, branch) |
| update_task | ✅ Working | Update tasks with git tracking (branch, commit) |
| index_repository | ✅ Working | Index code repositories with semantic chunking |
| search_code | ✅ Working | Semantic code search with pgvector similarity |
Recent Fixes (Oct 6, 2025)
✅ Parameter passing architecture (Pydantic models)
✅ MCP schema mismatches (status enums, missing parameters)
✅ Timezone/datetime compatibility (PostgreSQL)
✅ Binary file filtering (images, cache dirs)
Test Results
Tool Usage Examples
Create a Task
In Claude Desktop:
Response:
Index a Repository
Response:
Search Code
Response:
Track Task with Git
Response:
Architecture
See ARCHITECTURE.md for detailed component diagrams.
Documentation
docs/status/MCP_SERVER_STATUS.md - Current status, test results, configuration
docs/status/SESSION_HANDOFF.md - Recent problems solved, current working state
docs/guides/SETUP_GUIDE.md - Complete setup instructions with troubleshooting
docs/ARCHITECTURE.md - System architecture and data flow
CLAUDE.md - Specify workflow for AI-assisted development
Database Schema
11 tables with pgvector for semantic search:
Core Tables:
repositories - Indexed repositories
code_files - Source files with metadata
code_chunks - Semantic chunks with embeddings (vector(768))
tasks - Development tasks with git tracking
task_status_history - Audit trail
See docs/ARCHITECTURE.md for complete schema documentation.
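To make the role of the `vector(768)` column concrete, here is a minimal sketch of how a similarity query against `code_chunks` might be assembled. The column names (`id`, `content`, `embedding`) and the `%(name)s` parameter style are assumptions for illustration; the real schema is in docs/ARCHITECTURE.md. The `<=>` operator is pgvector's cosine distance.

```python
EMBEDDING_DIM = 768  # matches vector(768) in the schema above


def to_vector_literal(embedding):
    """Format a list of floats as a pgvector text literal such as '[0.1,0.2]'."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Column names are assumed; smaller cosine distance means a closer match.
SEARCH_SQL = """
SELECT id, content, embedding <=> %(query)s::vector AS distance
FROM code_chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s
"""

params = {"query": to_vector_literal([0.5] * EMBEDDING_DIM), "limit": 10}
```

Rejecting vectors of the wrong length before they reach the database gives a clearer error than the one PostgreSQL would raise on a dimension mismatch.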
Technology Stack
Server: Python 3.11+ (3.13 compatible), MCP SDK, FastAPI patterns
Database: PostgreSQL 14+ with pgvector extension
Embeddings: Ollama (nomic-embed-text, 768 dimensions)
ORM: SQLAlchemy 2.0 (async), Pydantic for validation
Type Safety: Full mypy --strict compliance
Development
Running Tests
Code Structure
Prerequisites
System Requirements
Python 3.11+ (3.13 compatible)
PostgreSQL 14+ with pgvector extension
Ollama for embedding generation
4GB+ RAM recommended
SSD storage for optimal performance
PostgreSQL with pgvector
Ollama Setup
Installation
1. Clone the Repository
2. Create Virtual Environment
3. Install Dependencies
4. Configure Environment
Environment Variables:
Database Setup
1. Create Database
2. Initialize Schema
The initialization script will:
Create all required tables (repositories, files, chunks, tasks)
Set up vector indexes for similarity search
Configure connection pooling
Apply all database migrations
3. Verify Setup
4. Database Reset & Cleanup
During development, you may need to reset your database. See DATABASE_RESET.md for three reset options:
scripts/clear_data.sh - Clear all data, keep schema (fastest, no restart needed)
scripts/reset_database.sh - Drop and recreate all tables (recommended for schema changes)
scripts/nuclear_reset.sh - Drop entire database (requires Claude Desktop restart)
Running the Server
Development Mode
Production Mode
stdio Transport (CLI Mode)
The MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout. This is ideal for command-line tools and scripted interactions.
JSON-RPC 2.0 Request Format:
JSON-RPC 2.0 Response Format:
Available Methods:
search_code - Semantic code search
index_repository - Index a repository
get_task - Get task by ID
list_tasks - List tasks with filters
create_task - Create new task
update_task - Update task status
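The JSON-RPC 2.0 framing used by the stdio transport can be sketched as follows. The envelope fields (`jsonrpc`, `id`, `method`, `params`, `result`, `error`) come from the JSON-RPC 2.0 specification; the `search_code` parameter names `query` and `limit` are assumptions, so check the tool schema for the real ones.

```python
import json
from itertools import count

_request_ids = count(1)  # monotonically increasing request ids


def make_request(method, params):
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    )


def parse_response(line):
    """Return the result of a JSON-RPC 2.0 response, raising on an error object."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return message["result"]


# Parameter names below are hypothetical.
print(make_request("search_code", {"query": "database connection pooling", "limit": 5}))
```

A client would write the request line to the server's stdin and pass each line read from stdout to `parse_response`.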
Logging:
All logs go to /tmp/codebase-mcp.log (configurable via the LOG_FILE env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
Health Check
Usage Examples
1. Index a Repository
2. Search Code
3. Task Management
Architecture
Component Overview
MCP Layer: Handles protocol compliance, tool registration, SSE transport
Service Layer: Business logic for indexing, searching, task management
Repository Service: File system operations, git integration, .gitignore handling
Embedding Service: Ollama integration for generating text embeddings
Data Layer: PostgreSQL with pgvector for storage and similarity search
Data Flow
Indexing: Repository → Parse → Chunk → Embed → Store
Searching: Query → Embed → Vector Search → Rank → Return
Task Tracking: Create → Update → Git Integration → Query
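The searching flow above (Query → Embed → Vector Search → Rank → Return) can be illustrated with a toy in-memory version. Everything here is a stand-in: `toy_embed` is a bag-of-words counter over a four-word vocabulary rather than the Ollama model, and the ranking loop plays the role that pgvector plays in the real server.

```python
import math

VOCAB = ["parse", "file", "task", "search"]  # toy vocabulary, illustration only


def toy_embed(text):
    """Stand-in for the embedding service: count vocabulary words."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, index, limit=3):
    """Embed the query, score every chunk, and return the best matches first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Indexing step: embed each chunk once and store it next to its text.
chunks = ["parse file header", "create task record", "search task list"]
index = [(text, toy_embed(text)) for text in chunks]
results = search("parse file", index)
```

The same shape holds at scale: embeddings are computed once at indexing time, and only the query is embedded per search.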
Testing
Run All Tests
Test Categories
Unit Tests: Fast, isolated component tests
Integration Tests: Database and service integration
Contract Tests: MCP protocol compliance validation
Performance Tests: Latency and throughput benchmarks
Coverage Requirements
Minimum coverage: 95%
Critical paths: 100%
View HTML report:
open htmlcov/index.html
Performance Tuning
Database Optimization
Connection Pool Settings
Embedding Batch Size
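As a rough sketch of what batch-size tuning controls, the helper below splits chunks into fixed-size groups so that each group can be sent to the embedding service together. The `EMBEDDING_BATCH_SIZE` variable name and the default of 50 are assumptions for illustration; check your .env template for the actual setting.

```python
import os


def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical variable name and default value.
batch_size = int(os.environ.get("EMBEDDING_BATCH_SIZE", "50"))
chunks = [f"chunk {i}" for i in range(120)]
batches = list(batched(chunks, batch_size))
```

Larger batches reduce per-request overhead but raise peak memory use, so the best value depends on the machine running Ollama.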
Troubleshooting
Common Issues
Database Connection Failed
Check PostgreSQL is running: pg_ctl status
Verify DATABASE_URL in .env
Ensure database exists: psql -U postgres -l
Ollama Connection Error
Check Ollama is running: curl http://localhost:11434/api/tags
Verify model is installed: ollama list
Check OLLAMA_BASE_URL in .env
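The Ollama checks above can also be scripted. The sketch below queries the `/api/tags` endpoint and looks for the embedding model in the returned `models` list; the function names are hypothetical, and the fallback URL is simply the local address used in the curl command.

```python
import json
import os
import urllib.request


def model_installed(tags_payload, model="nomic-embed-text"):
    """Return True if the /api/tags payload lists the model, with or without a tag."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return any(name == model or name.startswith(model + ":") for name in names)


def check_ollama(base_url=None, timeout=5.0):
    """Fetch /api/tags from Ollama and report whether the embedding model is present."""
    base_url = base_url or os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_installed(json.load(response))
```

If `check_ollama()` raises a connection error, Ollama is not reachable at the configured URL; if it returns False, running `ollama pull nomic-embed-text` should install the model.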
Slow Performance
Check database indexes: \di in psql
Monitor query performance: See logs at LOG_FILE path
Adjust batch sizes and connection pool
For detailed troubleshooting, see docs/troubleshooting.md and docs/guides/SETUP_GUIDE.md.
Contributing
We follow a specification-driven development workflow using the Specify framework.
Development Workflow
Feature Specification: Use the /specify command to create feature specs
Planning: Generate implementation plan with /plan
Task Breakdown: Create tasks with /tasks
Implementation: Execute tasks with /implement
Git Workflow
Code Quality Standards
Type Safety: mypy --strict must pass
Linting: ruff check with no errors
Testing: All tests must pass with 95%+ coverage
Documentation: Update relevant docs with changes
Constitutional Principles
Simplicity Over Features: Focus on core semantic search
Local-First Architecture: No cloud dependencies
Protocol Compliance: Strict MCP adherence
Performance Guarantees: Meet stated benchmarks
Production Quality: Comprehensive error handling
See .specify/memory/constitution.md for full principles.
License
MIT License - see LICENSE file for details.
Support
Issues: GitHub Issues
Documentation: Full documentation
Logs: Check /tmp/codebase-mcp.log for detailed debugging
Acknowledgments
Built with FastAPI, SQLAlchemy, and Pydantic
Vector search powered by pgvector
Embeddings via Ollama and nomic-embed-text
Code parsing with tree-sitter