Code-Index-MCP

multi_repository_architecture.md•8.15 KiB

# Multi-Repository and Smart Plugin Loading Architecture

## Overview

This document describes the architecture for enhancing MCP to support multiple use cases through intelligent plugin loading and multi-repository search capabilities.

## Use Cases

### Use Case 1: Single Repository Development

**Scenario**: Developer using Claude Code to build/enhance a single repository.

**Current Behavior**:
- MCP loads all 47 plugins regardless of repository content
- High memory usage (~1.4GB) and slow startup
- Wasteful for repos with only 2-3 languages

**Enhanced Behavior**:
- Detect languages in repository index on startup
- Load only required plugins (e.g., 7 plugins for Python/HTML/CSS repo)
- 85% reduction in memory usage
- Faster startup and search operations

### Use Case 2: Multi-Repository Reference

**Scenario**: Developer refactoring code using multiple repositories as reference.

**Current Behavior**:
- MCP only searches current working directory repository
- Other repos require manual Grep/Read tools
- Loss of MCP benefits for reference repositories

**Enhanced Behavior**:
- Optional `repository` parameter in MCP tools
- Dynamic loading of plugins for referenced repositories
- Unified search across multiple repositories
- Memory-aware caching of repository indexes

### Use Case 3: Comprehensive Analysis

**Scenario**: Testing MCP capabilities across many languages and repositories.

**Current Behavior**:
- Must manually switch between repositories
- No built-in support for cross-repository analysis
- Benchmarking requires custom scripts

**Enhanced Behavior**:
- Analysis mode flag loads all plugins upfront
- Support for querying multiple repositories
- Comprehensive metrics collection
- Higher memory limits for testing scenarios

## Architecture Components

### 1. Repository Language Detection

```python
class RepositoryLanguageDetector:
    """Detects languages present in a repository index."""

    def detect_languages(self, db_path: str) -> Set[str]:
        """Query index database for distinct languages."""
        # Query files table for language distribution
        # Map file extensions to plugin languages
        # Return set of required plugin languages
```

### 2. Multi-Repository Index Manager

```python
class MultiRepoIndexManager:
    """Manages indexes across multiple repositories."""

    def __init__(self, primary_repo_id: str, storage_path: str):
        self.primary_repo_id = primary_repo_id
        self.loaded_indexes = {}  # repo_id -> SQLiteStore
        self.repo_languages = {}  # repo_id -> Set[str]

    def get_index(self, repo_id: Optional[str] = None) -> SQLiteStore:
        """Get index for specified or primary repository."""

    def search_across_repos(self, query: str, repo_ids: List[str]) -> List[SearchResult]:
        """Search multiple repositories in parallel."""
```

### 3. Memory-Aware Plugin Manager

```python
class MemoryAwarePluginManager:
    """Manages plugins with memory constraints and LRU eviction."""

    def __init__(self, max_memory_mb: int = 1024):
        self.plugins: OrderedDict[str, PluginInfo] = OrderedDict()
        self.max_memory_mb = max_memory_mb
        self.access_counts = defaultdict(int)
        self.last_access = {}

    async def get_plugin(self, language: str, timeout: float = 5.0) -> IPlugin:
        """Get plugin with memory management and timeout."""

    def _evict_lru_plugins(self, target_free_mb: int = 100):
        """Evict least recently used plugins to free memory."""
```

### 4. Enhanced MCP Tools

```python
# Updated tool schema with repository parameter
types.Tool(
    name="search_code",
    inputSchema={
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "repository": {
                "type": "string",
                "description": "Repository ID or path. Defaults to current repo.",
                "optional": True
            },
            "semantic": {"type": "boolean", "default": False},
            "limit": {"type": "integer", "default": 20}
        }
    }
)
```

## Repository Resolution

### Repository ID Generation

1. Primary method: SHA256 hash of git remote URL (first 12 chars)
2. Fallback: SHA256 hash of absolute path for non-git directories

### Index Storage Structure

```
.indexes/
├── f7b49f5d0ae0/          # Current repo
│   ├── current.db
│   └── metadata.json
├── d72d7e1e17d2/          # Reference repo 1
│   ├── current.db
│   └── metadata.json
└── e3acd2328eea/          # Reference repo 2
    ├── current.db
    └── metadata.json
```

## Configuration Strategy

### Environment Variables

```bash
# Plugin loading strategy
MCP_PLUGIN_STRATEGY=auto|all|minimal
  - auto: Load based on detected languages (default)
  - all: Load all plugins (analysis mode)
  - minimal: Load only on-demand

# Multi-repository support
MCP_ENABLE_MULTI_REPO=true|false
  - Enable/disable multi-repo features
MCP_REFERENCE_REPOS=repo1,repo2,repo3
  - Pre-authorized repository IDs for cross-repo search

# Memory management
MCP_MAX_MEMORY_MB=1024
  - Maximum memory for plugin manager
MCP_MIN_FREE_MB=256
  - Minimum free memory to maintain

# Performance tuning
MCP_PLUGIN_LOAD_TIMEOUT=5.0
  - Timeout for loading individual plugins
MCP_CACHE_HIGH_PRIORITY_LANGS=python,javascript,typescript
  - Languages to keep in memory
```

### Language Priority System

```python
LANGUAGE_PRIORITIES = {
    "python": 100,      # Most common
    "javascript": 95,
    "typescript": 95,
    "java": 85,
    "go": 80,
    "rust": 75,
    "c": 70,
    "cpp": 70,
    # ... others get default priority of 50
}
```

## Implementation Flow

### Startup Sequence

1. Determine repository ID from git remote or path
2. Load primary repository index
3. Detect languages in primary repository
4. Load plugins based on strategy:
   - auto: Load detected languages only
   - all: Load all 47 plugins
   - minimal: Load nothing, wait for requests
5. Initialize multi-repo manager if enabled
6. Pre-cache authorized reference repositories

### Query Processing

1. Parse tool request for repository parameter
2. If repository specified:
   - Resolve repository ID
   - Check authorization
   - Load repository index if not cached
   - Detect new languages
   - Load required plugins with memory check
3. Route query to appropriate repository index
4. Return results with repository context

### Memory Management Flow

1. Monitor process memory usage
2. When approaching limit:
   - Calculate eviction candidates
   - Score by: priority, frequency, recency
   - Evict lowest scoring plugins
   - Free plugin resources properly
3. Log eviction events for monitoring

## Performance Considerations

### Optimization Strategies

1. **Lazy Loading**: Plugins load only when needed
2. **Parallel Search**: Multi-repo queries execute in parallel
3. **Index Caching**: Recently used indexes stay in memory
4. **Query Caching**: Results cached per repository
5. **Batch Operations**: Group plugin loads together

### Expected Performance

- Single repo startup: < 2 seconds (vs 10+ seconds)
- Plugin load time: < 5 seconds per language
- Multi-repo search: < 3 seconds for 3 repos
- Memory usage: 200-400MB typical (vs 1.4GB)

## Backward Compatibility

### Preserved Behavior

1. Default mode works exactly as before
2. No repository parameter = current directory
3. All existing APIs remain unchanged
4. Configuration is opt-in via environment

### Migration Path

1. Existing installations work without changes
2. Enable features gradually via environment
3. Monitor performance improvements
4. Adjust configuration based on usage

## Security Considerations

### Repository Authorization

- Only allow searching pre-authorized repos
- Validate repository IDs before access
- Log all cross-repository operations
- Respect existing index access controls

### Resource Limits

- Enforce memory limits strictly
- Timeout long-running operations
- Limit concurrent repository loads
- Monitor for resource exhaustion

## Future Extensions

### Potential Enhancements

1. Repository groups for project sets
2. Distributed index synchronization
3. Cloud-based repository indexes
4. Automatic language detection from file content
5. Plugin performance profiling and optimization
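
### Example: Repository ID Generation

The two rules under Repository ID Generation (hash the git remote URL, otherwise hash the absolute path, and keep the first 12 hex characters) can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `git_remote_url` and `repository_id`, the choice of the `origin` remote, and the absence of any URL normalization are all assumptions.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Optional


def git_remote_url(repo_path: str) -> Optional[str]:
    """Return the URL of the 'origin' remote, or None if it cannot be read.

    Using 'origin' is an assumption; the document only says "git remote URL".
    """
    try:
        result = subprocess.run(
            ["git", "-C", repo_path, "config", "--get", "remote.origin.url"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed or could not be executed.
        return None
    url = result.stdout.strip()
    return url if result.returncode == 0 and url else None


def repository_id(repo_path: str, remote_url: Optional[str] = None) -> str:
    """Hypothetical helper: first 12 hex chars of a SHA256 digest.

    Hashes the remote URL when one is known, otherwise the absolute path.
    """
    source = (
        remote_url
        or git_remote_url(repo_path)
        or str(Path(repo_path).resolve())
    )
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
```

Passing `remote_url` explicitly skips the git lookup, which keeps the function deterministic in tests. A real implementation would probably also need to decide whether SSH and HTTPS URLs for the same repository should map to the same ID.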

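### Example: Eviction Scoring

The Memory Management Flow says eviction candidates are scored by priority, frequency, and recency, and that the lowest-scoring plugins are evicted first. One way that could look is sketched below. The `PluginUsage` record, the equal weighting of the three terms, the 60-second recency scale, and the cap of 100 accesses are all illustrative assumptions; only the priority values and the default of 50 come from the Language Priority System above.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_PRIORITY = 50
LANGUAGE_PRIORITIES = {
    "python": 100, "javascript": 95, "typescript": 95, "java": 85,
    "go": 80, "rust": 75, "c": 70, "cpp": 70,
}


@dataclass
class PluginUsage:
    """Hypothetical per-plugin bookkeeping record."""
    language: str
    memory_mb: float
    access_count: int
    last_access: float  # timestamp in seconds


def eviction_score(usage: PluginUsage, now: float) -> float:
    """Higher score means more worth keeping. Weights are assumptions."""
    priority = LANGUAGE_PRIORITIES.get(usage.language, DEFAULT_PRIORITY) / 100.0
    frequency = min(usage.access_count, 100) / 100.0
    idle_seconds = max(now - usage.last_access, 0.0)
    recency = 1.0 / (1.0 + idle_seconds / 60.0)
    return priority + frequency + recency


def choose_evictions(
    plugins: List[PluginUsage], target_free_mb: float, now: float
) -> List[str]:
    """Pick lowest-scoring plugins until target_free_mb would be freed."""
    freed = 0.0
    victims: List[str] = []
    for usage in sorted(plugins, key=lambda u: eviction_score(u, now)):
        if freed >= target_free_mb:
            break
        victims.append(usage.language)
        freed += usage.memory_mb
    return victims
```

For example, with a recently used Python plugin, a Rust plugin idle for a minute, and a Ruby plugin idle for ten minutes, a request to free 100 MB would select Ruby and then Rust before touching Python. The function only selects candidates; releasing plugin resources and logging the eviction remain separate steps.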