MCP Indexer
Semantic code search indexer for AI tools via the Model Context Protocol (MCP).
For AI Coding Agents
If you're an AI agent working on this project, please read AGENTS.MD first. It contains instructions for using Beads issue tracking to manage tasks systematically across sessions.
Overview
MCP Indexer provides intelligent code search capabilities to any MCP-compatible LLM (Claude, etc.). It indexes your repositories using semantic embeddings, enabling natural language code search, symbol lookups, and cross-repo dependency analysis.
Features
Semantic Search: Natural language queries find relevant code by meaning, not just keywords
Multi-Language Support: Python, JavaScript, TypeScript, Ruby, Go
Cross-Repo Analysis: Detect dependencies and suggest missing repos
Incremental Updates: Track git commits and reindex only when needed
MCP Integration: Works with any MCP-compatible LLM client
Stack Management: Persistent configuration for repo collections
Installation
Prerequisites
Python 3.8 or higher
pip
Steps
Clone the repository:
Install dependencies:
Set up environment variables:
Configure MCP integration (for Claude Code or other MCP clients):
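As a rough sketch of this last step, the snippet below builds a config entry in the mcpServers shape that Claude Code reads from .mcp.json. The server name "mcpindexer", the launch command, and the module path mcpindexer.server are hypothetical placeholders, not taken from this project. Substitute the entry point your installation actually provides.

```python
import json
from pathlib import Path


def build_mcp_config(install_dir: str, db_path: str) -> dict:
    """Return a .mcp.json-style dict registering a stdio MCP server.

    The server name, command, and args are HYPOTHETICAL placeholders;
    replace them with the real entry point of your installation.
    """
    src_dir = str(Path(install_dir) / "src")
    return {
        "mcpServers": {
            "mcpindexer": {  # hypothetical server name
                "command": "python",
                # hypothetical entry point: adjust to the real server module
                "args": ["-m", "mcpindexer.server"],
                "env": {
                    # Both variables are described under Configuration.
                    "PYTHONPATH": src_dir,
                    "MCP_INDEXER_DB_PATH": db_path,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_mcp_config("/path/to/mcpIndexer", str(Path.home() / ".mcpindexer" / "db"))
    print(json.dumps(config, indent=2))
```

Writing the printed JSON to .mcp.json at the project root is enough for the client to launch the server on demand.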
Quick Start
1. Try the Demo
Run the demo to see mcpIndexer in action:
2. Index Your Repositories
3. Use with MCP Clients
Once configured in .mcp.json, the MCP server automatically starts when you use an MCP client like Claude Code.
The MCP server exposes 12 tools:
Search Tools:
semantic_search - Natural language code search
find_definition - Find where symbols are defined
find_references - Find where symbols are used
find_related_code - Find architecturally related files
Repository Management:
add_repo_to_stack - Add a new repository
remove_repo - Remove a repository
list_repos - List all indexed repos
get_repo_stats - Get detailed repo statistics
reindex_repo - Force reindex a repository
Cross-Repo Analysis:
get_cross_repo_dependencies - Find dependencies between repos
suggest_missing_repos - Suggest repos to add based on imports
Stack Management:
get_stack_status - Get overall indexing status
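MCP clients invoke these tools with JSON-RPC 2.0 "tools/call" requests. The sketch below only builds the request messages so the wire format is visible. The argument names (query, n_results, symbol) are assumptions for illustration. Check the input schema the server reports through "tools/list" before relying on them.

```python
import itertools
import json

# JSON-RPC requests each carry an id; a simple counter keeps them unique.
_ids = itertools.count(1)

# The 12 tool names exposed by the server, as listed above.
KNOWN_TOOLS = {
    "semantic_search", "find_definition", "find_references", "find_related_code",
    "add_repo_to_stack", "remove_repo", "list_repos", "get_repo_stats", "reindex_repo",
    "get_cross_repo_dependencies", "suggest_missing_repos", "get_stack_status",
}


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for one of the tools above."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument names below are assumptions, not the server's documented schema.
print(tool_call("semantic_search", {"query": "JWT token validation", "n_results": 5}))
print(tool_call("find_definition", {"symbol": "authenticate_user"}))
```

In practice an MCP client library handles this framing; the point is that every tool above is reached through the same tools/call method, differing only in name and arguments.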
CLI Commands
Check for Updates
Check which repos need reindexing:
Reindex Changed Repos
Automatically reindex repos with new commits:
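Both commands rest on one comparison: the commit a repo was last indexed at versus its current HEAD. A minimal sketch of that check follows, assuming a hypothetical stack layout in which each repo records a path and a last_indexed_commit. The real stack.json fields may differ.

```python
import subprocess
from typing import Callable, List


def git_head(repo_path: str) -> str:
    """Return the commit hash HEAD points to in the given repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def repos_needing_reindex(stack: dict, get_head: Callable[[str], str] = git_head) -> List[str]:
    """List repos whose HEAD differs from the commit recorded at last index.

    `stack` uses a HYPOTHETICAL layout:
    {"repos": {name: {"path": ..., "last_indexed_commit": ...}}}
    """
    stale = []
    for name, repo in stack.get("repos", {}).items():
        # A repo with no recorded commit has never been indexed, so it is stale.
        if repo.get("last_indexed_commit") != get_head(repo["path"]):
            stale.append(name)
    return sorted(stale)
```

Passing a stub as get_head (for example a dict's __getitem__) lets you exercise the logic without a git checkout; reindexing only the returned names is what keeps updates incremental.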
Stack Status
View current stack status:
Install Git Hooks
Auto-reindex on git pull:
This installs a post-merge hook that triggers reindexing after pulls.
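A post-merge hook is an executable script at .git/hooks/post-merge that Git runs after a successful merge, including the merge performed by git pull. A sketch of installing one is shown below. The reindex_cmd argument is a placeholder for whatever reindex command your installation uses, and the function name is hypothetical.

```python
import stat
from pathlib import Path

HOOK_TEMPLATE = """#!/bin/sh
# Reindex code after merges/pulls (installed by a sketch of the hook setup).
{command}
"""


def install_post_merge_hook(repo_path: str, reindex_cmd: str) -> Path:
    """Write .git/hooks/post-merge running `reindex_cmd` and make it executable."""
    hooks_dir = Path(repo_path) / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook = hooks_dir / "post-merge"
    # Note: this overwrites any existing post-merge hook.
    hook.write_text(HOOK_TEMPLATE.format(command=reindex_cmd))
    # Git ignores hooks that lack the executable bit.
    mode = hook.stat().st_mode
    hook.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

This sketch assumes a standard .git directory; repositories that set core.hooksPath, or worktrees where .git is a file, keep their hooks elsewhere.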
Usage Examples
Semantic Search
Find Symbol Definitions
Cross-Repo Dependencies
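Cross-repo analysis starts from the imports each file declares. The sketch below is a rough, Python-only illustration of the idea behind get_cross_repo_dependencies and suggest_missing_repos. It uses the standard ast module rather than the project's Tree-sitter parser, and the package-to-repo mapping passed in as provided is a hypothetical index.

```python
import ast
import sys
from typing import Dict, Set


def top_level_imports(source: str) -> Set[str]:
    """Collect the top-level package names a Python module imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and not node.level:
            # Relative imports (level > 0) stay inside the repo, so skip them.
            names.add(node.module.split(".")[0])
    return names


def cross_repo_deps(imports: Set[str], provided: Dict[str, str]) -> Set[str]:
    """Return the indexed repos whose packages this code imports."""
    return {provided[name] for name in imports if name in provided}


def suggest_missing(imports: Set[str], provided: Dict[str, str]) -> Set[str]:
    """Return imported packages that no indexed repo provides.

    `provided` maps package name -> repo name (HYPOTHETICAL index).
    Stdlib modules are excluded where sys.stdlib_module_names exists (3.10+).
    """
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in imports if name not in provided and name not in stdlib}
```

A package that shows up in suggest_missing and belongs to one of your organization's repositories is a good candidate for add_repo_to_stack.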
Configuration
Environment Variables
MCP_INDEXER_DB_PATH - Database path (default: ~/.mcpindexer/db)
PYTHONPATH - Must include the src/ directory of your installation
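The sketch below shows how a tool might resolve these two variables, mirroring the default listed above. The function names are hypothetical and not part of mcpIndexer's API.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DB_PATH = "~/.mcpindexer/db"  # default documented above


def resolve_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return MCP_INDEXER_DB_PATH if set, else the default, with ~ expanded."""
    env = os.environ if env is None else env
    raw = env.get("MCP_INDEXER_DB_PATH") or DEFAULT_DB_PATH
    return Path(raw).expanduser()


def src_on_pythonpath(install_dir: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Check whether <install_dir>/src is one of the PYTHONPATH entries."""
    env = os.environ if env is None else env
    entries = [Path(p).resolve() for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    return (Path(install_dir) / "src").resolve() in entries
```

Accepting an explicit env mapping keeps both checks testable without touching the real process environment.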
Stack Configuration
Configuration is stored at ~/.mcpindexer/stack.json.
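The sketch below reads and updates that file. The schema it uses, a repos map with a path and last_indexed_commit per repo, is an assumption for illustration. Inspect your own stack.json for the real fields.

```python
import json
import os
import tempfile
from pathlib import Path

STACK_PATH = Path.home() / ".mcpindexer" / "stack.json"


def load_stack(path: Path = STACK_PATH) -> dict:
    """Load the stack config, returning an empty stack if the file is absent."""
    if not path.exists():
        return {"repos": {}}
    return json.loads(path.read_text())


def save_stack(stack: dict, path: Path = STACK_PATH) -> None:
    """Write to a temp file, then rename, so readers never see half-written JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(stack, f, indent=2)
    os.replace(tmp, path)  # atomic rename on POSIX


def record_indexed(stack: dict, name: str, repo_path: str, commit: str) -> None:
    """Add or update a repo entry after a (re)index. Field names are HYPOTHETICAL."""
    stack.setdefault("repos", {})[name] = {"path": repo_path, "last_indexed_commit": commit}
```

Recording the indexed commit per repo is what lets a later run skip repos whose HEAD has not moved.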
Architecture
Components
Parser (parser.py) - Tree-sitter based multi-language AST parsing
Chunker (chunker.py) - Intelligent code chunking respecting AST boundaries
Embeddings (embeddings.py) - ChromaDB + sentence-transformers for semantic search
Indexer (indexer.py) - Orchestrates parsing → chunking → embedding → storage
Dependency Analyzer (dependency_analyzer.py) - Tracks imports and dependencies
Stack Config (stack_config.py) - Persistent configuration management
MCP Server (server.py) - Exposes tools via Model Context Protocol
CLI (cli.py) - Command-line interface
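To illustrate what "chunking respecting AST boundaries" means, the sketch below uses Python's built-in ast module as a stand-in for Tree-sitter, which the Parser actually uses. Each top-level function or class becomes one chunk. An oversized class is split into one chunk per direct member, so cuts always fall on syntax nodes rather than arbitrary line counts. The max_lines threshold is an assumption, not the project's setting.

```python
import ast
from typing import List, Tuple

# (label, start line, end line), 1-based and inclusive
Chunk = Tuple[str, int, int]


def chunk_module(source: str, max_lines: int = 40) -> List[Chunk]:
    """Split Python source into chunks aligned with top-level definitions."""
    chunks: List[Chunk] = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # module-level statements (imports, constants) skipped here
        # Decorators sit above the def line, so start the chunk at the first one.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno
        if not isinstance(node, ast.ClassDef) or end - start + 1 <= max_lines:
            chunks.append((node.name, start, end))
            continue
        # Oversized class: one chunk per direct member of the class body.
        for child in node.body:
            label = f"{node.name}.{child.name}" if hasattr(child, "name") else node.name
            chunks.append((label, child.lineno, child.end_lineno))
    return chunks
```

Keeping whole definitions together means a search hit returns a complete function or method instead of a fragment cut mid-body.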
Indexing Pipeline
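The pipeline (parsing → chunking → embedding → storage, then similarity search at query time) can be sketched end to end in a few lines. To stay dependency-free, this uses a toy hashed bag-of-words embedding and an in-memory list in place of sentence-transformers and ChromaDB. It therefore only matches shared tokens and does not capture meaning the way real embeddings do. Chunking here splits on blank lines, a simplification of the AST-aware Chunker.

```python
import hashlib
import math
import re
from typing import List, Tuple

DIM = 1024  # toy embedding size


def embed(text: str) -> List[float]:
    """Toy embedding: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Stands in for a vector store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def add_file(self, source: str) -> None:
        # "Chunking": blank-line-separated blocks (simplified, not AST-aware).
        for chunk in (c.strip() for c in source.split("\n\n")):
            if chunk:
                self.items.append((chunk, embed(chunk)))

    def search(self, query: str, n_results: int = 3) -> List[Tuple[float, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), chunk) for chunk, v in self.items]
        return sorted(scored, reverse=True)[:n_results]
```

Swapping embed for a real sentence-embedding model and the list for a persistent vector collection gives the shape described in the components list, with the same add-then-query flow.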
Performance
Based on testing with real-world repos:
Speed: ~56 files/sec (on the 3-repo run below)
Zendesk App Framework: 162 files, 302 chunks in 1.86s
3 Repos: 255 files, 595 chunks in 4.58s
Search Latency: ~100-200ms per query
Troubleshooting
Issue: "ModuleNotFoundError: No module named 'tree_sitter'"
Solution: Install dependencies
Issue: Slow indexing
Causes:
Large files with many symbols
Complex nested structures
First-time embedding generation
Solutions:
Use file filters to skip test/build directories
Increase chunk size target
Use GPU-accelerated embeddings (if available)
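One way to apply the file-filter suggestion is to prune directories before any file reaches the parser. A sketch follows. The skip list and extension set are illustrative defaults, not mcpIndexer's own settings.

```python
import os
from pathlib import Path
from typing import Iterator, Set

SKIP_DIRS = {".git", "node_modules", "build", "dist", "__pycache__", "tests", "test"}
EXTENSIONS = {".py", ".js", ".ts", ".rb", ".go"}  # languages listed under Features


def iter_source_files(root: str, skip_dirs: Set[str] = SKIP_DIRS,
                      extensions: Set[str] = EXTENSIONS) -> Iterator[Path]:
    """Yield indexable files under root, never descending into skipped dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering these dirs at all.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if Path(name).suffix in extensions:
                yield Path(dirpath) / name
```

Pruning at the directory level matters more than filtering by extension: a single node_modules tree can hold more files than the project itself.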
Issue: Poor search results
Causes:
Query too generic
Code not indexed
Wrong language filter
Solutions:
Use more specific queries ("JWT token validation" vs "auth")
Check list_repos to verify indexing
Try without language filter
Increase n_results parameter
Issue: Out of memory
Causes:
Indexing too many repos at once
Very large monoliths
Solutions:
Index repos individually
Increase system memory
Use incremental indexing (git commit-based)
Issue: Git hooks not triggering
Causes:
Hook not executable
PYTHONPATH not set
Hook overwritten
Solutions:
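The three causes listed above can be checked mechanically. The diagnostic below is a sketch. Its expected_marker argument, text that your reindex hook should contain, is a hypothetical convention for detecting an overwritten hook.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def diagnose_post_merge_hook(repo_path: str, expected_marker: str,
                             env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return likely reasons the post-merge hook is not running (empty if none)."""
    env = os.environ if env is None else env
    hook = Path(repo_path) / ".git" / "hooks" / "post-merge"
    if not hook.exists():
        return ["hook missing: .git/hooks/post-merge does not exist"]
    problems = []
    if not os.access(hook, os.X_OK):
        problems.append("hook not executable: run chmod +x on it")
    if expected_marker not in hook.read_text(errors="replace"):
        problems.append("hook overwritten: expected reindex command not found")
    # Only checks the current environment; the hook may run with a different one.
    if not env.get("PYTHONPATH"):
        problems.append("PYTHONPATH not set in this environment")
    return problems
```

If PYTHONPATH is set in your shell but not where Git runs the hook, exporting it inside the hook script itself avoids depending on the caller's environment.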
Issue: Stale results after code changes
Solutions:
Example Queries
Finding Implementations
"password hashing"
"JWT token validation"
"database connection pool"
"API rate limiting"
Finding Patterns
"error handling"
"logging configuration"
"caching strategy"
"retry logic"
Finding Components
"user authentication"
"payment processing"
"email sending"
"file upload handling"
Architecture Understanding
"dependency injection setup"
"middleware configuration"
"router registration"
"database migration"
Testing
See the examples/ directory for more usage examples.
Contributing
The codebase is organized by component:
src/mcpindexer/ - Main source code
tests/ - Test suite (130+ tests)
test_*.py - Integration test scripts
All components are independently tested with comprehensive coverage.
License
MIT License - see LICENSE file for details.
Support
For issues or questions, please open an issue on the repository.