Code-Index-MCP (Local-first Code Indexer)
Modular, extensible local-first code indexer designed to enhance Claude Code and other LLMs with deep code understanding capabilities. Built on the Model Context Protocol (MCP) for seamless integration with AI assistants.
Implementation Status
Version: 1.0.0 (MVP Release)
Core Features: Stable - Local indexing, symbol/text search, 48-language support
Optional Features: Semantic search (requires Voyage AI), Index sync (beta)
Performance: Sub-100ms queries, <10s indexing for cached repositories
New to Code-Index-MCP? Check out our Getting Started Guide for a quick walkthrough.
Key Features
Local-First Architecture: All indexing happens locally for speed and privacy
Local Index Storage: All indexes stored at .indexes/ (relative to MCP server)
Plugin-Based Design: Easily extensible with language-specific plugins
48-Language Support: Complete tree-sitter integration with semantic search
Real-Time Updates: File system monitoring for instant index updates
Semantic Search: AI-powered code search with Voyage AI embeddings
Rich Code Intelligence: Symbol resolution, type inference, dependency tracking
Enhanced Performance: Sub-100ms queries with timeout protection and BM25 bypass
Git Synchronization: Automatic index updates tracking repository changes
Portable Index Management: Zero-cost index sharing via GitHub Artifacts
Automatic Index Sync: Pull indexes on clone, push on changes
Smart Result Reranking: Multi-strategy reranking for improved relevance
Security-Aware Export: Automatic filtering of sensitive files from shared indexes
Hybrid Search: BM25 + semantic search with configurable fusion
Index Everything Locally: Search .env files and secrets on your machine
Smart Filtering on Share: .gitignore and .mcp-index-ignore patterns applied only during export
Multi-Language Indexing: Index entire repositories with mixed languages
Architecture
The Code-Index-MCP follows a modular, plugin-based architecture designed for extensibility and performance:
System Layers
System Context (Level 1)
Developer interacts with Claude Code or other LLMs
MCP protocol provides standardized tool interface
Local-first processing with optional cloud features
Performance SLAs: <100ms symbol lookup, <500ms search
Container Architecture (Level 2)
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│   API Gateway   │────▶│  Dispatcher  │────▶│   Plugins   │
│   (FastAPI)     │     │              │     │ (Language)  │
└─────────────────┘     └──────────────┘     └─────────────┘
         │                      │                     │
         ▼                      ▼                     ▼
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│  Local Index    │     │ File Watcher │     │  Embedding  │
│ (SQLite+FTS5)   │     │  (Watchdog)  │     │   Service   │
└─────────────────┘     └──────────────┘     └─────────────┘
Component Details (Level 3)
Gateway Controller: RESTful API endpoints
Dispatcher Core: Plugin routing and lifecycle
Plugin Base: Standard interface for all plugins
Language Plugins: Specialized parsers and analyzers
Index Manager: SQLite with FTS5 for fast searches
Watcher Service: Real-time file monitoring
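As a rough illustration of the Index Manager idea (SQLite with FTS5), the sketch below builds a tiny full-text table and ranks matches with FTS5's built-in bm25() function. The table name code_fts and its columns are assumptions made for this example, not the project's actual schema, and it only runs where Python's sqlite3 module was built with FTS5 enabled.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table and load (path, symbol, body) rows.

    The table and column names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE code_fts USING fts5(path UNINDEXED, symbol, body)"
    )
    conn.executemany(
        "INSERT INTO code_fts (path, symbol, body) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn, query, limit=10):
    """Return (path, symbol) pairs, best BM25 match first."""
    cur = conn.execute(
        "SELECT path, symbol FROM code_fts "
        "WHERE code_fts MATCH ? ORDER BY bm25(code_fts) LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()


conn = build_index([
    ("a.py", "load", "def load(path): return open(path).read()"),
    ("b.py", "render", "def render(template): return template"),
])
print(search(conn, "render"))
```

bm25() returns lower values for better matches, which is why the query sorts in ascending order.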
Project Structure
The project follows a clean, organized structure. See docs/PROJECT_STRUCTURE.md for detailed layout.
Key directories:
mcp_server/ - Core MCP server implementation
scripts/ - Development and utility scripts
tests/ - Comprehensive test suite with fixtures
docs/ - Documentation and guides
architecture/ - System design and diagrams
docker/ - Docker configurations and compose files
data/ - Database files and indexes
logs/ - Application and test logs
reports/ - Generated performance reports and analysis
analysis_archive/ - Historical analysis and archived research
Language Support
Fully Supported Languages (46+ Total)
Production-Ready Features:
Dynamic Plugin Loading: Languages are loaded on-demand for optimal performance
Tree-sitter Parsing: Accurate AST-based symbol extraction with language-specific queries
Query Caching: Improved performance with cached tree-sitter queries
Semantic Search: Optional AI-powered code search (when Qdrant is available)
Cross-Language Search: Find symbols and patterns across all supported languages
Language Categories:
Category | Languages | Features |
Dedicated Plugins | Python, JavaScript, TypeScript, C, C++, Dart, HTML/CSS | Enhanced analysis, framework support |
Systems Languages | Go, Rust, C, C++, Zig, Nim, D, V | Memory safety, performance analysis |
JVM Languages | Java, Kotlin, Scala, Clojure | Package analysis, build tool integration |
Web Technologies | JavaScript, TypeScript, HTML, CSS, SCSS, PHP | Framework detection, bundler support |
Scripting Languages | Python, Ruby, Perl, Lua, R, Julia | Dynamic typing, REPL integration |
Functional Languages | Haskell, Elixir, Erlang, F#, OCaml | Pattern matching, type inference |
Mobile Development | Swift, Kotlin, Dart, Objective-C | Platform-specific APIs |
Infrastructure | Dockerfile, Bash, PowerShell, Makefile, CMake | Build automation, CI/CD |
Data Formats | JSON, YAML, TOML, XML, GraphQL, SQL | Schema validation, query optimization |
Documentation | Markdown, LaTeX, reStructuredText | Cross-references, formatting |
Implementation Status: Production-Ready - All languages supported via the enhanced dispatcher with:
Dynamic plugin loading (lazy initialization)
Robust error handling and fallback mechanisms
Path resolution for complex project structures
Graceful degradation when external services are unavailable
Quick Start
Automatic Setup for Claude Code/Desktop (Recommended)
This automatically detects your environment and creates the appropriate .mcp.json configuration.
Docker Setup by Environment
Option 1: Basic Search (No API Keys) - 2 Minutes
Option 2: AI-Powered Search
Environment-Specific Setup
Windows (Native)
macOS
Linux
WSL2 (Windows Subsystem for Linux)
Nested Containers (Dev Containers)
MCP.json Configuration Examples
The setup script creates the appropriate .mcp.json for your environment. Manual examples:
Native Python (Dev Container/Local)
Docker (Windows/Mac/Linux)
Costs & Features
Feature | Minimal | Standard | Full | Cost |
Code Search | β | β | β | Free |
48 Languages | β | β | β | Free |
Semantic Search | β | β | β | ~$0.05/1M tokens |
GitHub Sync | β | β | β | Free |
Monitoring | β | β | β | Free |
Quickstart (Python)
Prerequisites
Python 3.8+
Git
Installation
Option 1: Install via pip (Recommended)
Option 2: Install from Source
Quick Start After Installation
Configuration
Create a .env file for configuration:
Privacy & GitHub Artifact Sync
Control how your code index is shared:
Privacy Features:
Indexes filtered by .gitignore automatically
Additional patterns via .mcp-index-ignore
Audit logs show what was excluded
Sync disabled by default in Docker minimal version
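The export-time filtering described above can be pictured with a small sketch. It assumes simple glob patterns matched with fnmatch against the whole path and each path component; real .gitignore semantics (negation, anchoring, `**`) are richer, and the function names here are hypothetical rather than part of the project.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(text):
    """Parse ignore-file text, skipping blank lines and '#' comments."""
    lines = (ln.strip() for ln in text.splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def is_excluded(path, patterns):
    """True if the path, or any single component of it, matches a pattern."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # treat 'dir/' like 'dir'
        if fnmatchcase(path, pattern):
            return True
        if any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False


def filter_for_export(paths, patterns):
    """Split paths into (kept, excluded); the excluded list can feed an audit log."""
    kept, excluded = [], []
    for path in paths:
        (excluded if is_excluded(path, patterns) else kept).append(path)
    return kept, excluded


patterns = load_patterns("# secrets\n.env\n*.pem\nnode_modules/\n")
paths = ["src/app.py", ".env", "certs/server.pem", "node_modules/x/index.js"]
kept, excluded = filter_for_export(paths, patterns)
print("kept:", kept)
print("excluded:", excluded)
```

Because the filter runs only when an index is shared, the local index can still contain every file.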
Advanced Features
Search Result Reranking
The system includes multiple reranking strategies to improve search relevance:
Available Rerankers:
TF-IDF: Fast, local reranking using term frequency
Cohere: Cloud-based neural reranking (requires API key)
Cross-Encoder: Local transformer-based reranking
Hybrid: Combines multiple rerankers with fallback
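To make the TF-IDF option concrete, here is a minimal, dependency-free sketch of local reranking. It is not the project's reranker: the tokenizer, the smoothed IDF formula, and the function name tfidf_rerank are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rerank(query, documents):
    """Return document positions ordered by descending TF-IDF score."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    scored = []
    for position, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += (tf[term] / len(tokens)) * idf
        scored.append((score, position))

    # Highest score first; ties keep the original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [position for _, position in scored]


docs = [
    "def open_file(path): return open(path)",
    "class Parser: def parse(self, source): pass",
    "def parse_json(text): return json.loads(text)  # parse json text",
]
print(tfidf_rerank("parse json", docs))
```

A reranker like this needs no network access, which is the main trade-off against the cloud-based and transformer-based options.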
Security-Aware Index Sharing
Prevent accidental sharing of sensitive files:
BM25 Hybrid Search
Combines traditional full-text search with semantic search:
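The text does not say which fusion method is used, so the sketch below picks one common choice, weighted reciprocal rank fusion, purely as an illustration of "configurable fusion". The weights, the constant k=60, and the function name are assumptions.

```python
def reciprocal_rank_fusion(bm25_ranked, semantic_ranked,
                           bm25_weight=1.0, semantic_weight=1.0, k=60):
    """Merge two ranked lists of document IDs into one ranking.

    Each list contributes weight / (k + rank) per document, so items that
    rank well in both lists rise to the top.
    """
    scores = {}
    weighted_lists = ((bm25_weight, bm25_ranked), (semantic_weight, semantic_ranked))
    for weight, ranking in weighted_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a.py", "b.py", "c.py"]
semantic_hits = ["b.py", "d.py", "a.py"]
print(reciprocal_rank_fusion(bm25_hits, semantic_hits))
```

Setting semantic_weight to 0 reduces the result to the BM25 ordering, which mirrors the graceful degradation described elsewhere in this document.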
Dispatcher Configuration
Enhanced Dispatcher (Default)
The enhanced dispatcher includes timeout protection and automatic fallback:
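One way to picture timeout protection with automatic fallback is the sketch below. It is not the dispatcher's actual code: search_with_fallback and the two stand-in search functions are hypothetical, and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def search_with_fallback(primary, fallback, query, timeout_s=0.5):
    """Run primary(query); on timeout or error, answer with fallback(query).

    Returns (results, source), where source names the path that answered.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(primary, query)
    try:
        return future.result(timeout=timeout_s), "primary"
    except FutureTimeout:
        return fallback(query), "fallback-timeout"
    except Exception:
        return fallback(query), "fallback-error"
    finally:
        # Do not block on the slow call; the worker thread finishes on its own.
        executor.shutdown(wait=False)


# Hypothetical stand-ins for a slow semantic backend and a fast BM25 path.
def slow_semantic_search(query):
    time.sleep(0.3)
    return [f"semantic:{query}"]


def bm25_search(query):
    return [f"bm25:{query}"]


print(search_with_fallback(slow_semantic_search, bm25_search, "parse", timeout_s=0.05))
```

The caller always gets an answer within roughly the timeout plus the cost of the fallback, at the price of possibly discarding a slower, richer result.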
Simple Dispatcher (Lightweight Alternative)
For maximum performance with BM25-only search:
Configuration Options
Configure dispatcher behavior via environment variables:
Index Management
Centralized Index Storage
All indexes are now stored centrally at .indexes/ (relative to the MCP project) for better organization and to prevent accidental commits:
Benefits:
Indexes never accidentally committed to git
Reusable across multiple clones of same repository
Clear separation between code and indexes
Automatic discovery based on git remote
Migration: For existing repositories with local indexes:
For This Repository
This project uses GitHub Actions Artifacts for efficient index sharing, eliminating reindexing time while keeping the repository lean.
For ANY Repository (MCP Index Kit)
Enable portable index management in any repository with zero GitHub compute costs:
Quick Install
How It Works
Zero-Cost Architecture:
All indexing happens on developer machines
Indexes stored as GitHub Artifacts (free for public repos)
Automatic download on clone, upload on push
No GitHub Actions compute required
Portable Design:
Single command setup for any repository
Auto-detected by MCP servers and tools
Works with all 48 supported languages
Enable/disable per repository
Usage:
# Initialize in your repo
cd your-repo
mcp-index init
# Build index locally
mcp-index build
# Push to GitHub Artifacts
mcp-index push
# Pull latest index
mcp-index pull
# Auto sync
mcp-index sync
Configuration
Semantic Search Configuration
To enable semantic search capabilities, you need a Voyage AI API key. Get one from https://www.voyageai.com/.
Method 1: Claude Code Configuration (Recommended)
Create or edit .mcp.json in your project root:
Method 2: Claude Code CLI
Method 3: Environment Variables
Method 4: .env File
Create a .env file in your project root:
Check Configuration
Verify your semantic search setup:
Index Configuration
Edit .mcp-index.json in your repository:
See mcp-index-kit for full documentation
View artifact details
python scripts/cli/mcp_cli.py artifact info 12345
GitHub Actions Integration
Pull Requests: Validates developer-provided indexes (no rebuilding)
Merges to Main: Promotes validated indexes to artifacts
Cost-Efficient: Uses free GitHub Actions Artifacts storage
Auto-Cleanup: Old artifacts cleaned up after 30 days
Storage & Cost
GitHub Actions Artifacts: FREE for public repos, included in private repo quotas
Retention: 7 days for PR artifacts, 30 days for main branch
Size Limits: 500MB per artifact (compressed)
Automatic Compression: ~70% size reduction with tar.gz
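For a sense of what tar.gz packaging involves, the sketch below archives a directory with Python's tarfile module and reports the sizes before and after. The function name pack_index is hypothetical, and the compression ratio you see depends entirely on the index contents; the ~70% figure above is the project's own claim, not something this snippet demonstrates.

```python
import tarfile
from pathlib import Path


def pack_index(index_dir, archive_path):
    """Write index_dir into a gzip-compressed tar archive.

    Returns (raw_bytes, packed_bytes) so callers can check the result
    against an artifact size limit. The archive must live outside index_dir.
    """
    index_dir = Path(index_dir)
    raw_bytes = sum(f.stat().st_size for f in index_dir.rglob("*") if f.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(index_dir, arcname=index_dir.name)
    packed_bytes = Path(archive_path).stat().st_size
    return raw_bytes, packed_bytes
```

A caller could compare packed_bytes with the 500MB per-artifact limit before attempting an upload.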
Developer Workflow
Clone Repository
git clone https://github.com/yourusername/Code-Index-MCP.git
cd Code-Index-MCP
Get Latest Indexes
python scripts/cli/mcp_cli.py artifact pull --latest
Make Your Changes
Edit code as normal
Indexes update automatically via file watcher
Share Updates
# Your indexes are already updated locally
python scripts/cli/mcp_cli.py artifact push
Embedding Model Compatibility
The system tracks embedding model versions to ensure compatibility:
Current model: voyage-code-3 (1024 dimensions)
Distance metric: Cosine similarity
Auto-detection: System checks compatibility before download
If you use a different embedding model, the system will detect incompatibility and rebuild locally with your configuration.
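The compatibility check can be thought of as comparing a small metadata record before deciding whether to download or rebuild. The sketch below assumes the remote index ships a metadata dictionary with model, dimensions, and distance keys; those key names and the functions are hypothetical, while the values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    model: str
    dimensions: int
    distance: str


# Values taken from the "Current model" description above.
LOCAL_SPEC = EmbeddingSpec(model="voyage-code-3", dimensions=1024, distance="cosine")


def is_compatible(local, remote_metadata):
    """True only if model, vector size, and distance metric all match."""
    remote = EmbeddingSpec(
        model=str(remote_metadata.get("model", "")),
        dimensions=int(remote_metadata.get("dimensions", 0)),
        distance=str(remote_metadata.get("distance", "")).lower(),
    )
    return remote == local


def choose_action(local, remote_metadata):
    """Download a compatible index; otherwise rebuild locally."""
    return "download" if is_compatible(local, remote_metadata) else "rebuild"


print(choose_action(LOCAL_SPEC, {"model": "voyage-code-3", "dimensions": 1024, "distance": "Cosine"}))
```

Checking dimensions as well as the model name matters because vectors of different sizes cannot be compared at all.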
Development
Creating a New Language Plugin
Create plugin structure
mkdir -p mcp_server/plugins/my_language_plugin
cd mcp_server/plugins/my_language_plugin
touch __init__.py plugin.py
Implement the plugin interface
from typing import Dict, List

from mcp_server.plugin_base import PluginBase


class MyLanguagePlugin(PluginBase):
    def __init__(self):
        # Name of the tree-sitter grammar this plugin parses with
        self.tree_sitter_language = "my_language"

    def index(self, file_path: str) -> Dict:
        # Parse and index the file
        pass

    def getDefinition(self, symbol: str, context: Dict) -> Dict:
        # Find symbol definition
        pass

    def getReferences(self, symbol: str, context: Dict) -> List[Dict]:
        # Find symbol references
        pass
Register the plugin
# In dispatcher.py
from .plugins.my_language_plugin import MyLanguagePlugin

self.plugins['my_language'] = MyLanguagePlugin()
Running Tests
Architecture Visualization
API Reference
Core Endpoints
GET /symbol
Get symbol definition
Query parameters:
symbol_name (required): Name of the symbol to find
file_path (optional): Specific file to search in
GET /search
Search for code patterns
Query parameters:
query (required): Search pattern (regex supported)
file_extensions (optional): Comma-separated list of extensions
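A client might assemble request URLs for the two endpoints above as in this sketch. Only the paths and parameter names come from this reference; the base URL http://localhost:8000 and the helper names are assumptions, so adjust them to wherever your server actually listens.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: not stated in this document


def symbol_url(symbol_name, file_path=None, base_url=BASE_URL):
    """Build a GET /symbol URL; file_path is added only when given."""
    params = {"symbol_name": symbol_name}
    if file_path is not None:
        params["file_path"] = file_path
    return f"{base_url}/symbol?{urlencode(params)}"


def search_url(query, file_extensions=None, base_url=BASE_URL):
    """Build a GET /search URL; extensions are joined with commas."""
    params = {"query": query}
    if file_extensions:
        params["file_extensions"] = ",".join(file_extensions)
    return f"{base_url}/search?{urlencode(params)}"


print(symbol_url("parse_config"))
print(search_url(r"def \w+", [".py", ".ts"]))
```

urlencode percent-encodes regex metacharacters, so patterns can be passed through unchanged.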
Response Format
All API responses follow a consistent JSON structure:
Success Response:
Error Response:
Deployment
Docker Deployment Options
The project includes multiple Docker configurations for different environments:
Development (Default):
Production:
Enhanced Development:
Container Restart Behavior
Important: By default, docker-compose restart uses the DEVELOPMENT configuration:
docker-compose restart → Uses docker-compose.yml (Development)
docker-compose -f docker-compose.production.yml restart → Uses Production
Production Deployment
For production environments, we provide:
Multi-stage Docker builds with security hardening
PostgreSQL database with async support
Redis caching for performance optimization
Qdrant vector database for semantic search
Prometheus + Grafana monitoring stack
Kubernetes manifests in k8s/ directory
nginx reverse proxy configuration
See our Deployment Guide for detailed instructions including:
Kubernetes deployment configurations
Auto-scaling setup
Database optimization
Security best practices
Monitoring and observability
System Requirements
Minimum: 2GB RAM, 2 CPU cores, 10GB storage
Recommended: 8GB RAM, 4 CPU cores, 50GB SSD storage
Large codebases: 16GB+ RAM, 8+ CPU cores, 100GB+ SSD storage
Releases & Pre-built Indexes
Using Pre-built Indexes
For quick setup, download pre-built indexes from our GitHub releases:
Creating Releases
Maintainers can create new releases with pre-built indexes:
Automatic Index Synchronization
The project includes Git hooks for automatic index synchronization:
Pre-push: Uploads index changes to GitHub artifacts
Post-merge: Downloads compatible indexes after pulling
Install hooks with: mcp-index hooks install
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Process
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Add tests (aim for 90%+ coverage)
Update documentation
Submit a pull request
Code Style
Follow PEP 8 for Python code
Use type hints for all functions
Write descriptive docstrings
Keep functions small and focused
Performance
Benchmarks
Operation | Performance Target | Current Status |
Symbol Lookup | <100ms (p95) | Achieved - All queries < 100ms |
Code Search | <500ms (p95) | Achieved - BM25 search < 50ms |
File Indexing | 10K files/min | Achieved - 152K files indexed |
Architecture Overview
The system follows C4 model architecture patterns:
Workspace Definition: 100% implemented (architecture/workspace.dsl) - Validated with CLI tools
System Context (L1): Claude Code integration with MCP sub-agent support fully operational
Container Level (L2): 8 main containers including enhanced MCP server and user documentation
Component Level (L3): Plugin system with 48 languages, memory management, and cross-repo coordination
Code Level (L4): 43 PlantUML diagrams documenting all system components and flows
For detailed architectural documentation, see the architecture/ directory.
Development Roadmap
See ROADMAP.md for detailed development plans and current progress.
Current Status: v1.0.0 MVP Release
Core Indexing: SQLite + FTS5 for fast local search
Multi-Language: 48 languages via tree-sitter integration
MCP Protocol: Full compatibility with Claude Code and other MCP clients
Performance: Sub-100ms queries with BM25 optimization
Index Sync: Beta support via GitHub Artifacts
Semantic Search: Optional feature requiring Voyage AI API
Recent Improvements:
Dispatcher Optimization: Timeout protection and BM25 bypass for reliability
Hybrid Search: BM25 + semantic search with graceful degradation
Result Ranking: Improved relevance with score normalization
CLI Tools: Full-featured mcp-index command for index management
Optimization Tips
Performance optimization features are implemented and available:
Enable caching: Redis caching is implemented and configurable via environment variables
Adjust batch size: Configurable via INDEXING_BATCH_SIZE environment variable
Use SSD storage: Improves indexing speed significantly
Limit file size: Configurable via INDEXING_MAX_FILE_SIZE environment variable
Parallel processing: Multi-worker indexing configurable via INDEXING_MAX_WORKERS
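As an illustration of how these three variables might be read, the sketch below parses them into a small config object. The variable names come from the list above; the default values, the assumption that the file size is in bytes, and the helper names are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexingConfig:
    batch_size: int
    max_file_size: int  # assumed to be in bytes
    max_workers: int


def _positive_int(env, name, default):
    """Read a positive integer from env, falling back to default when unset."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_indexing_config(env=None):
    """Build the config from os.environ, or from a dict passed in for testing."""
    env = os.environ if env is None else env
    return IndexingConfig(
        # The defaults below are placeholders, not the project's real defaults.
        batch_size=_positive_int(env, "INDEXING_BATCH_SIZE", 100),
        max_file_size=_positive_int(env, "INDEXING_MAX_FILE_SIZE", 1_048_576),
        max_workers=_positive_int(env, "INDEXING_MAX_WORKERS", os.cpu_count() or 1),
    )


print(load_indexing_config({"INDEXING_BATCH_SIZE": "250"}))
```

Failing fast on a non-positive value surfaces a typo at startup instead of during a long indexing run.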
Security
Local-first: All processing happens locally by default
Path validation: Prevents directory traversal attacks
Input sanitization: All queries are sanitized
Secret detection: Automatic redaction of detected secrets
Plugin isolation: Plugins run in restricted environments
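Path validation of the kind listed above usually means resolving the requested path and confirming it still sits under an allowed root. The sketch below shows that general technique with pathlib; resolve_inside is a hypothetical helper, not the project's implementation.

```python
from pathlib import Path


def resolve_inside(root, requested):
    """Resolve requested relative to root and reject anything that escapes it.

    Raises PermissionError for '..' traversal, absolute paths outside root,
    and symlinks that point elsewhere (resolve() follows them).
    """
    root_path = Path(root).resolve()
    candidate = (root_path / requested).resolve()
    try:
        candidate.relative_to(root_path)
    except ValueError:
        raise PermissionError(f"path escapes root: {requested}") from None
    return candidate


print(resolve_inside(".", "src/app.py"))
```

Resolving before comparing is the important step: a plain string prefix check would accept inputs like `src/../../etc/passwd`.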
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Tree-sitter for language parsing
Jedi for Python analysis
FastAPI for the API framework
Voyage AI for embeddings
Anthropic for the MCP protocol
Contact
Issues: GitHub Issues
Discussions: GitHub Discussions