Skip to main content
Glama

CodeGraph CLI MCP Server

by Jakedismo
MCP_IMPROVEMENTS.md•18.5 kB
# MCP Server Improvements - Multi-Codebase Support & Protocol Compliance ## Overview This document details the improvements made to CodeGraph's MCP (Model Context Protocol) server implementation to support multi-codebase indexing with isolated storage and ensure protocol compliance. ## Problem Statement ### Before: Centralized Storage Issues The original implementation had several critical issues: 1. **Hardcoded Storage Paths**: All indexes and databases were hardcoded to `.codegraph` in the current working directory 2. **Single Codebase Limitation**: No proper support for indexing multiple projects with separate isolated indexes 3. **Working Directory Dependency**: MCP server assumed it was always running from the project root 4. **Storage Conflicts**: Multiple projects couldn't maintain independent indexes Example of the problem: ```bash # User wants to index two projects cd /home/user/project-a codegraph index . # Creates ./.codegraph/ cd /home/user/project-b codegraph index . # Creates ./.codegraph/ (good so far) # But what if MCP server starts from a different directory? cd /home/user codegraph start stdio # Would look for ./.codegraph/ (wrong location!) ``` ### Architecture Issues **Hardcoded paths throughout the stack:** - `indexer.rs`: `Path::new(".codegraph")` - `server.rs`: `Path::new(".codegraph/faiss.index")` - `official_server.rs`: `CodeGraph::new()` → hardcoded `./.codegraph/db` - `bin/codegraph.rs`: Clean command used `std::path::Path::new(".codegraph")` ## Solution: Project-Relative Storage ### Architecture Changes #### 1. **IndexerConfig Enhancement** Added `project_root` field to explicitly track which project is being indexed: ```rust pub struct IndexerConfig { pub languages: Vec<String>, pub exclude_patterns: Vec<String>, pub include_patterns: Vec<String>, pub recursive: bool, pub force_reindex: bool, pub watch: bool, pub workers: usize, pub batch_size: usize, pub vector_dimension: usize, pub device: Option<String>, pub max_seq_len: usize, /// Root directory of the project being indexed (where .codegraph/ will be created) /// Defaults to current directory if not specified pub project_root: PathBuf, // NEW! } ``` **Default behavior:** ```rust impl Default for IndexerConfig { fn default() -> Self { Self { // ... other fields ... project_root: std::env::current_dir().unwrap_or_else(|_| PathBuf::from(".")), } } } ``` #### 2. **ProjectIndexer Enhancement** Added `project_root` field and updated initialization to use project-specific database paths: ```rust pub struct ProjectIndexer { config: IndexerConfig, progress: MultiProgress, parser: TreeSitterParser, graph: Option<CodeGraph>, vector_dim: usize, project_root: PathBuf, // NEW! #[cfg(feature = "embeddings")] embedder: codegraph_vector::EmbeddingGenerator, } impl ProjectIndexer { pub async fn new(config: IndexerConfig, multi_progress: MultiProgress) -> Result<Self> { let parser = TreeSitterParser::new(); let project_root = config.project_root.clone(); // Use project-specific database path let db_path = project_root.join(".codegraph/db"); let graph = CodeGraph::new_with_path(db_path.to_str().unwrap())?; // ... rest of initialization ... Ok(Self { config, progress: multi_progress, parser, graph: Some(graph), vector_dim, project_root, // Store for later use #[cfg(feature = "embeddings")] embedder, }) } } ``` #### 3. **Path Updates Throughout Indexer** Updated all storage operations to use `self.project_root`: **FAISS Index Creation:** ```rust // BEFORE let out_dir = Path::new(".codegraph"); // AFTER let out_dir = self.project_root.join(".codegraph"); ``` **Embeddings Storage:** ```rust // BEFORE let out_path = Path::new(".codegraph").join("embeddings.json"); // AFTER let out_path = self.project_root.join(".codegraph/embeddings.json"); ``` **Metadata Management:** ```rust // BEFORE let metadata_path = Path::new(".codegraph").join("index.json"); // AFTER let metadata_path = self.project_root.join(".codegraph/index.json"); ``` #### 4. **CLI Updates** Updated `bin/codegraph.rs` to properly set `project_root` when indexing: **Index Command:** ```rust async fn handle_index(path: PathBuf, ...) -> Result<()> { // ... setup code ... let config = IndexerConfig { languages: languages_list.clone(), exclude_patterns: exclude, include_patterns: include, recursive, force_reindex: force, watch, workers: optimized_workers, batch_size: optimized_batch_size, device, max_seq_len, project_root: path.clone().canonicalize().unwrap_or(path.clone()), // NEW! ..Default::default() }; let mut indexer = ProjectIndexer::new(config, multi_progress.clone()).await?; // ... } ``` **Clean Command Fix:** ```rust // BEFORE if clean && std::path::Path::new(".codegraph").exists() { let _ = std::fs::remove_dir_all(".codegraph"); } // AFTER let project_root = path.clone().canonicalize().unwrap_or(path.clone()); if clean { let codegraph_dir = project_root.join(".codegraph"); if codegraph_dir.exists() { let _ = std::fs::remove_dir_all(&codegraph_dir); } } ``` ### Multi-Codebase Workflow #### How It Works Now **1. Index Multiple Projects (each with isolated storage):** ```bash # Index Project A cd /home/user/project-a codegraph index . # Creates: /home/user/project-a/.codegraph/ # ā”œā”€ā”€ db/ (RocksDB for project A) # ā”œā”€ā”€ faiss.index (FAISS index for project A) # ā”œā”€ā”€ faiss_ids.json # ā”œā”€ā”€ shards/ # │ ā”œā”€ā”€ lang/ (Language-based shards) # │ └── path/ (Path-based shards) # └── index.json (Metadata) # Index Project B cd /home/user/project-b codegraph index . # Creates: /home/user/project-b/.codegraph/ # ā”œā”€ā”€ db/ (RocksDB for project B - isolated!) # ā”œā”€ā”€ faiss.index (FAISS index for project B - isolated!) # └── ... (independent index) # Index from absolute path codegraph index /path/to/project-c # Creates: /path/to/project-c/.codegraph/ ``` **2. Serve Projects (MCP server uses current directory's index):** ```bash # Serve Project A cd /home/user/project-a codegraph start stdio # Uses: /home/user/project-a/.codegraph/ # Serve Project B (in separate terminal/session) cd /home/user/project-b codegraph start stdio # Uses: /home/user/project-b/.codegraph/ ``` **3. Multiple MCP Servers (one per project):** You can run multiple MCP servers simultaneously, each serving a different codebase: ```bash # Terminal 1: Serve Project A cd ~/project-a && codegraph start stdio # Terminal 2: Serve Project B cd ~/project-b && codegraph start stdio # Terminal 3: Serve Project C cd ~/project-c && codegraph start stdio ``` Configure multiple MCP servers in Claude Desktop: ```json { "mcpServers": { "codegraph-project-a": { "command": "codegraph", "args": ["start", "stdio"], "cwd": "/home/user/project-a" }, "codegraph-project-b": { "command": "codegraph", "args": ["start", "stdio"], "cwd": "/home/user/project-b" } } } ``` ### Directory Structure Each project maintains its own isolated `.codegraph/` directory: ``` project-a/ ā”œā”€ā”€ src/ ā”œā”€ā”€ lib/ ā”œā”€ā”€ .codegraph/ # Project A's index (isolated) │ ā”œā”€ā”€ db/ # RocksDB for Project A │ ā”œā”€ā”€ faiss.index # FAISS index for Project A │ ā”œā”€ā”€ faiss_ids.json # Node ID mappings for Project A │ ā”œā”€ā”€ shards/ │ │ ā”œā”€ā”€ lang/ │ │ │ ā”œā”€ā”€ rust.index │ │ │ ā”œā”€ā”€ rust_ids.json │ │ │ ā”œā”€ā”€ typescript.index │ │ │ └── typescript_ids.json │ │ └── path/ │ │ ā”œā”€ā”€ src.index │ │ ā”œā”€ā”€ src_ids.json │ │ ā”œā”€ā”€ lib.index │ │ └── lib_ids.json │ ā”œā”€ā”€ embeddings.json # Embedding backup │ ā”œā”€ā”€ index.json # Project A metadata │ └── cache/ # Qwen response cache └── .gitignore project-b/ ā”œā”€ā”€ app/ ā”œā”€ā”€ tests/ ā”œā”€ā”€ .codegraph/ # Project B's index (completely isolated) │ ā”œā”€ā”€ db/ # Different RocksDB instance │ ā”œā”€ā”€ faiss.index # Different FAISS index │ ā”œā”€ā”€ shards/ │ │ └── ... # Project B specific shards │ └── index.json # Project B metadata └── .gitignore ``` ## MCP Protocol Compliance ### Official rmcp SDK Usage CodeGraph uses the official `rmcp` Rust MCP SDK for protocol-compliant server implementation: **Server Structure:** ```rust use rmcp::{ handler::server::{router::tool::ToolRouter, wrapper::Parameters}, model::{CallToolResult, Content, ServerCapabilities, ServerInfo}, tool, tool_handler, tool_router, ErrorData as McpError, ServerHandler, }; #[derive(Clone)] pub struct CodeGraphMCPServer { graph: Arc<tokio::sync::Mutex<codegraph_graph::CodeGraph>>, counter: Arc<Mutex<i32>>, #[cfg(feature = "qwen-integration")] qwen_client: Arc<Mutex<Option<QwenClient>>>, tool_router: ToolRouter<Self>, // Official MCP router } #[tool_router] // Official macro for tool registration impl CodeGraphMCPServer { // Tool implementations using #[tool(...)] macro } ``` ### Tool Definitions All tools follow MCP protocol specifications: **Example - Vector Search Tool:** ```rust #[derive(Deserialize, JsonSchema)] struct VectorSearchRequest { /// Search query text for vector similarity matching query: String, /// Optional file paths to restrict search (e.g., ["src/", "lib/"]) #[serde(default)] paths: Option<Vec<String>>, /// Optional programming languages to filter (e.g., ["rust", "typescript"]) #[serde(default)] langs: Option<Vec<String>>, /// Maximum number of results to return #[serde(default = "default_limit")] limit: usize, } #[tool(description = "Search code using vector similarity with FAISS")] async fn vector_search( &self, params: Parameters<VectorSearchRequest>, ) -> Result<CallToolResult, McpError> { let request = params.parse()?; // Implementation... } ``` ### Error Handling Proper MCP error responses: ```rust // Example error handling if query.is_empty() { return Err(McpError { code: -32602, // Invalid params message: "Query cannot be empty".to_string(), data: None, }); } // Graph operation errors let results = match self.perform_search(&query).await { Ok(results) => results, Err(e) => { return Err(McpError { code: -32603, // Internal error message: format!("Search failed: {}", e), data: None, }); } }; ``` ### Transport Support **STDIO Transport (Primary):** ```rust use rmcp::{transport::stdio, ServiceExt}; let server = CodeGraphMCPServer::new_with_graph().await?; stdio().serve(server).await?; ``` **HTTP Transport (Extended in server.rs):** ```rust // Custom implementation for HTTP endpoints pub async fn serve_http(addr: &str) -> Result<()> { // Axum-based HTTP server } ``` ## Performance Optimizations ### 1. Read-Only Mode Fallback MCP server gracefully falls back to read-only mode for concurrent access: ```rust let graph = match codegraph_graph::CodeGraph::new() { Ok(graph) => graph, Err(err) => { eprintln!("āš ļø Primary CodeGraph open failed. Falling back to read-only mode."); match codegraph_graph::CodeGraph::new_read_only() { Ok(read_only_graph) => read_only_graph, Err(ro_err) => panic!("Failed to initialize CodeGraph database"), } } }; ``` This allows: - Multiple MCP servers to read the same index (read-only mode) - One writer + multiple readers pattern - No lock conflicts during concurrent queries ### 2. Sharded FAISS Indexes FAISS indexes are sharded by language and path for faster searches: ```rust // Language shards .codegraph/shards/lang/ ā”œā”€ā”€ rust.index # Only Rust code ā”œā”€ā”€ typescript.index # Only TypeScript code └── python.index # Only Python code // Path shards .codegraph/shards/path/ ā”œā”€ā”€ src.index # Only src/ directory ā”œā”€ā”€ lib.index # Only lib/ directory └── tests.index # Only tests/ directory ``` **Search optimization:** ```rust // If user specifies langs=["rust"], only search rust.index // If user specifies paths=["src/"], only search src.index // Otherwise search all shards + main index ``` ### 3. Batch Embedding Processing Already implemented in Phase 1 of improvements: ```rust // Batched embedding generation (10-50x faster) let batch_texts: Vec<String> = batch.iter().map(|s| s.to_string()).collect(); match embedder.embed_texts_batched(&batch_texts).await { Ok(batch_embeddings) => { for (symbol, embedding) in batch.iter().zip(batch_embeddings.into_iter()) { embeddings.insert(symbol.to_string(), embedding); } } // Fallback to one-by-one if batch fails } ``` ## Testing Multi-Codebase Support ### Test Scenario 1: Index Multiple Projects ```bash # Create test projects mkdir -p /tmp/test-project-a /tmp/test-project-b # Index Project A cd /tmp/test-project-a echo "fn main() { println!(\"Project A\"); }" > main.rs codegraph index . # Verify Project A index exists ls -la .codegraph/ # Should see: db/, faiss.index, faiss_ids.json, shards/, index.json # Index Project B cd /tmp/test-project-b echo "fn main() { println!(\"Project B\"); }" > main.rs codegraph index . # Verify Project B index exists independently ls -la .codegraph/ # Should see separate index for Project B ``` ### Test Scenario 2: Serve Multiple Projects ```bash # Terminal 1: Serve Project A cd /tmp/test-project-a codegraph start stdio # MCP server uses /tmp/test-project-a/.codegraph/ # Terminal 2: Serve Project B cd /tmp/test-project-b codegraph start stdio # MCP server uses /tmp/test-project-b/.codegraph/ (isolated!) ``` ### Test Scenario 3: Search Different Projects ```bash # Search Project A cd /tmp/test-project-a codegraph search "main function" --limit 5 # Results from Project A's index # Search Project B cd /tmp/test-project-b codegraph search "main function" --limit 5 # Results from Project B's index (different results!) ``` ## Migration Guide ### For Existing Users If you have existing indexes from the previous version: **Option 1: Re-index (recommended)** ```bash cd /path/to/your/project codegraph index . --force ``` **Option 2: Move existing index** The index should already be in the correct location (`./.codegraph/`) if you indexed from the project root. No action needed! ### For New Users Just index your project - everything works automatically: ```bash cd /path/to/your/project codegraph index . codegraph start stdio ``` ## Benefits Summary ### āœ… Multi-Codebase Support - Each project has isolated `.codegraph/` storage - No cross-contamination between projects - Clear project boundaries ### āœ… Protocol Compliance - Uses official `rmcp` SDK - Proper tool parameter schemas with `JsonSchema` - Standard MCP error codes and responses - STDIO transport (official MCP protocol) ### āœ… Performance - Batch embedding processing (10-50x faster) - Sharded FAISS indexes for targeted searches - Read-only mode for concurrent access - No RocksDB lock conflicts ### āœ… Scalability - Run multiple MCP servers (one per project) - Each server serves independent codebase - Claude Desktop can connect to multiple CodeGraph instances ### āœ… Developer Experience - Simple workflow: `cd project && codegraph index .` - Automatic project-relative storage - Clear directory structure - Easy to understand and debug ## Future Enhancements ### Potential Improvements 1. **Workspace Support**: Federate multiple projects under a workspace ```bash codegraph workspace create my-workspace codegraph workspace add-project ./project-a codegraph workspace add-project ./project-b codegraph serve-workspace my-workspace ``` 2. **Project Path Override**: Add `--project-path` flag to MCP server ```bash # Serve a different project from current directory codegraph start stdio --project-path /path/to/project ``` 3. **Cross-Project Search**: Search across multiple indexed projects ```bash codegraph search "authentication" --projects project-a,project-b,project-c ``` 4. **Remote Index Support**: Support remote/shared indexes via network storage ```bash codegraph index . --remote s3://bucket/project-indexes/project-a ``` ## Files Modified ### Core Indexer - `crates/codegraph-mcp/src/indexer.rs` - Added `project_root: PathBuf` to `IndexerConfig` - Added `project_root: PathBuf` to `ProjectIndexer` - Updated `ProjectIndexer::new()` to use `CodeGraph::new_with_path()` - Updated all `.codegraph` paths to use `self.project_root` - Fixed `is_indexed()`, `save_index_metadata()` to use project-relative paths ### CLI - `crates/codegraph-mcp/src/bin/codegraph.rs` - Updated `handle_index()` to set `project_root` from `path` parameter - Fixed clean command to use project-relative `.codegraph` directory - Updated benchmark command to use project-relative paths ### MCP Server - `crates/codegraph-mcp/src/official_server.rs` - Already uses current directory's `.codegraph/` (correct behavior) - Graceful fallback to read-only mode for concurrent access - `crates/codegraph-mcp/src/server.rs` - Uses current directory's `.codegraph/` (correct for project-specific serving) - Loads FAISS indexes from current directory's shards ## Conclusion These improvements transform CodeGraph from a single-codebase tool into a scalable multi-project system with proper isolation and protocol compliance. Each project maintains its own independent index, and multiple MCP servers can run simultaneously to serve different codebases to AI assistants like Claude. The architecture is clean, intuitive, and follows the principle of least surprise: indexes live with the code they represent.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Jakedismo/codegraph-rust'

If you have feedback or need assistance with the MCP directory API, please join our Discord server