MCP_IMPROVEMENTS.mdā¢18.5 kB
# MCP Server Improvements - Multi-Codebase Support & Protocol Compliance
## Overview
This document details the improvements made to CodeGraph's MCP (Model Context Protocol) server implementation to support multi-codebase indexing with isolated storage and ensure protocol compliance.
## Problem Statement
### Before: Centralized Storage Issues
The original implementation had several critical issues:
1. **Hardcoded Storage Paths**: All indexes and databases were hardcoded to `.codegraph` in the current working directory
2. **Single Codebase Limitation**: No proper support for indexing multiple projects with separate isolated indexes
3. **Working Directory Dependency**: MCP server assumed it was always running from the project root
4. **Storage Conflicts**: Multiple projects couldn't maintain independent indexes
Example of the problem:
```bash
# User wants to index two projects
cd /home/user/project-a
codegraph index . # Creates ./.codegraph/
cd /home/user/project-b
codegraph index . # Creates ./.codegraph/ (good so far)
# But what if MCP server starts from a different directory?
cd /home/user
codegraph start stdio # Would look for ./.codegraph/ (wrong location!)
```
### Architecture Issues
**Hardcoded paths throughout the stack:**
- `indexer.rs`: `Path::new(".codegraph")`
- `server.rs`: `Path::new(".codegraph/faiss.index")`
- `official_server.rs`: `CodeGraph::new()` ā hardcoded `./.codegraph/db`
- `bin/codegraph.rs`: Clean command used `std::path::Path::new(".codegraph")`
## Solution: Project-Relative Storage
### Architecture Changes
#### 1. **IndexerConfig Enhancement**
Added `project_root` field to explicitly track which project is being indexed:
```rust
pub struct IndexerConfig {
pub languages: Vec<String>,
pub exclude_patterns: Vec<String>,
pub include_patterns: Vec<String>,
pub recursive: bool,
pub force_reindex: bool,
pub watch: bool,
pub workers: usize,
pub batch_size: usize,
pub vector_dimension: usize,
pub device: Option<String>,
pub max_seq_len: usize,
/// Root directory of the project being indexed (where .codegraph/ will be created)
/// Defaults to current directory if not specified
pub project_root: PathBuf, // NEW!
}
```
**Default behavior:**
```rust
impl Default for IndexerConfig {
fn default() -> Self {
Self {
// ... other fields ...
project_root: std::env::current_dir().unwrap_or_else(|_| PathBuf::from(".")),
}
}
}
```
#### 2. **ProjectIndexer Enhancement**
Added `project_root` field and updated initialization to use project-specific database paths:
```rust
pub struct ProjectIndexer {
config: IndexerConfig,
progress: MultiProgress,
parser: TreeSitterParser,
graph: Option<CodeGraph>,
vector_dim: usize,
project_root: PathBuf, // NEW!
#[cfg(feature = "embeddings")]
embedder: codegraph_vector::EmbeddingGenerator,
}
impl ProjectIndexer {
pub async fn new(config: IndexerConfig, multi_progress: MultiProgress) -> Result<Self> {
let parser = TreeSitterParser::new();
let project_root = config.project_root.clone();
// Use project-specific database path
let db_path = project_root.join(".codegraph/db");
let graph = CodeGraph::new_with_path(db_path.to_str().unwrap())?;
// ... rest of initialization ...
Ok(Self {
config,
progress: multi_progress,
parser,
graph: Some(graph),
vector_dim,
project_root, // Store for later use
#[cfg(feature = "embeddings")]
embedder,
})
}
}
```
#### 3. **Path Updates Throughout Indexer**
Updated all storage operations to use `self.project_root`:
**FAISS Index Creation:**
```rust
// BEFORE
let out_dir = Path::new(".codegraph");
// AFTER
let out_dir = self.project_root.join(".codegraph");
```
**Embeddings Storage:**
```rust
// BEFORE
let out_path = Path::new(".codegraph").join("embeddings.json");
// AFTER
let out_path = self.project_root.join(".codegraph/embeddings.json");
```
**Metadata Management:**
```rust
// BEFORE
let metadata_path = Path::new(".codegraph").join("index.json");
// AFTER
let metadata_path = self.project_root.join(".codegraph/index.json");
```
#### 4. **CLI Updates**
Updated `bin/codegraph.rs` to properly set `project_root` when indexing:
**Index Command:**
```rust
async fn handle_index(path: PathBuf, ...) -> Result<()> {
// ... setup code ...
let config = IndexerConfig {
languages: languages_list.clone(),
exclude_patterns: exclude,
include_patterns: include,
recursive,
force_reindex: force,
watch,
workers: optimized_workers,
batch_size: optimized_batch_size,
device,
max_seq_len,
project_root: path.clone().canonicalize().unwrap_or(path.clone()), // NEW!
..Default::default()
};
let mut indexer = ProjectIndexer::new(config, multi_progress.clone()).await?;
// ...
}
```
**Clean Command Fix:**
```rust
// BEFORE
if clean && std::path::Path::new(".codegraph").exists() {
let _ = std::fs::remove_dir_all(".codegraph");
}
// AFTER
let project_root = path.clone().canonicalize().unwrap_or(path.clone());
if clean {
let codegraph_dir = project_root.join(".codegraph");
if codegraph_dir.exists() {
let _ = std::fs::remove_dir_all(&codegraph_dir);
}
}
```
### Multi-Codebase Workflow
#### How It Works Now
**1. Index Multiple Projects (each with isolated storage):**
```bash
# Index Project A
cd /home/user/project-a
codegraph index .
# Creates: /home/user/project-a/.codegraph/
# āāā db/ (RocksDB for project A)
# āāā faiss.index (FAISS index for project A)
# āāā faiss_ids.json
# āāā shards/
# ā āāā lang/ (Language-based shards)
# ā āāā path/ (Path-based shards)
# āāā index.json (Metadata)
# Index Project B
cd /home/user/project-b
codegraph index .
# Creates: /home/user/project-b/.codegraph/
# āāā db/ (RocksDB for project B - isolated!)
# āāā faiss.index (FAISS index for project B - isolated!)
# āāā ... (independent index)
# Index from absolute path
codegraph index /path/to/project-c
# Creates: /path/to/project-c/.codegraph/
```
**2. Serve Projects (MCP server uses current directory's index):**
```bash
# Serve Project A
cd /home/user/project-a
codegraph start stdio
# Uses: /home/user/project-a/.codegraph/
# Serve Project B (in separate terminal/session)
cd /home/user/project-b
codegraph start stdio
# Uses: /home/user/project-b/.codegraph/
```
**3. Multiple MCP Servers (one per project):**
You can run multiple MCP servers simultaneously, each serving a different codebase:
```bash
# Terminal 1: Serve Project A
cd ~/project-a && codegraph start stdio
# Terminal 2: Serve Project B
cd ~/project-b && codegraph start stdio
# Terminal 3: Serve Project C
cd ~/project-c && codegraph start stdio
```
Configure multiple MCP servers in Claude Desktop:
```json
{
"mcpServers": {
"codegraph-project-a": {
"command": "codegraph",
"args": ["start", "stdio"],
"cwd": "/home/user/project-a"
},
"codegraph-project-b": {
"command": "codegraph",
"args": ["start", "stdio"],
"cwd": "/home/user/project-b"
}
}
}
```
### Directory Structure
Each project maintains its own isolated `.codegraph/` directory:
```
project-a/
āāā src/
āāā lib/
āāā .codegraph/ # Project A's index (isolated)
ā āāā db/ # RocksDB for Project A
ā āāā faiss.index # FAISS index for Project A
ā āāā faiss_ids.json # Node ID mappings for Project A
ā āāā shards/
ā ā āāā lang/
ā ā ā āāā rust.index
ā ā ā āāā rust_ids.json
ā ā ā āāā typescript.index
ā ā ā āāā typescript_ids.json
ā ā āāā path/
ā ā āāā src.index
ā ā āāā src_ids.json
ā ā āāā lib.index
ā ā āāā lib_ids.json
ā āāā embeddings.json # Embedding backup
ā āāā index.json # Project A metadata
ā āāā cache/ # Qwen response cache
āāā .gitignore
project-b/
āāā app/
āāā tests/
āāā .codegraph/ # Project B's index (completely isolated)
ā āāā db/ # Different RocksDB instance
ā āāā faiss.index # Different FAISS index
ā āāā shards/
ā ā āāā ... # Project B specific shards
ā āāā index.json # Project B metadata
āāā .gitignore
```
## MCP Protocol Compliance
### Official rmcp SDK Usage
CodeGraph uses the official `rmcp` Rust MCP SDK for protocol-compliant server implementation:
**Server Structure:**
```rust
use rmcp::{
handler::server::{router::tool::ToolRouter, wrapper::Parameters},
model::{CallToolResult, Content, ServerCapabilities, ServerInfo},
tool, tool_handler, tool_router, ErrorData as McpError, ServerHandler,
};
#[derive(Clone)]
pub struct CodeGraphMCPServer {
graph: Arc<tokio::sync::Mutex<codegraph_graph::CodeGraph>>,
counter: Arc<Mutex<i32>>,
#[cfg(feature = "qwen-integration")]
qwen_client: Arc<Mutex<Option<QwenClient>>>,
tool_router: ToolRouter<Self>, // Official MCP router
}
#[tool_router] // Official macro for tool registration
impl CodeGraphMCPServer {
// Tool implementations using #[tool(...)] macro
}
```
### Tool Definitions
All tools follow MCP protocol specifications:
**Example - Vector Search Tool:**
```rust
#[derive(Deserialize, JsonSchema)]
struct VectorSearchRequest {
/// Search query text for vector similarity matching
query: String,
/// Optional file paths to restrict search (e.g., ["src/", "lib/"])
#[serde(default)]
paths: Option<Vec<String>>,
/// Optional programming languages to filter (e.g., ["rust", "typescript"])
#[serde(default)]
langs: Option<Vec<String>>,
/// Maximum number of results to return
#[serde(default = "default_limit")]
limit: usize,
}
#[tool(description = "Search code using vector similarity with FAISS")]
async fn vector_search(
&self,
params: Parameters<VectorSearchRequest>,
) -> Result<CallToolResult, McpError> {
let request = params.parse()?;
// Implementation...
}
```
### Error Handling
Proper MCP error responses:
```rust
// Example error handling
if query.is_empty() {
return Err(McpError {
code: -32602, // Invalid params
message: "Query cannot be empty".to_string(),
data: None,
});
}
// Graph operation errors
let results = match self.perform_search(&query).await {
Ok(results) => results,
Err(e) => {
return Err(McpError {
code: -32603, // Internal error
message: format!("Search failed: {}", e),
data: None,
});
}
};
```
### Transport Support
**STDIO Transport (Primary):**
```rust
use rmcp::{transport::stdio, ServiceExt};
let server = CodeGraphMCPServer::new_with_graph().await?;
stdio().serve(server).await?;
```
**HTTP Transport (Extended in server.rs):**
```rust
// Custom implementation for HTTP endpoints
pub async fn serve_http(addr: &str) -> Result<()> {
// Axum-based HTTP server
}
```
## Performance Optimizations
### 1. Read-Only Mode Fallback
MCP server gracefully falls back to read-only mode for concurrent access:
```rust
let graph = match codegraph_graph::CodeGraph::new() {
Ok(graph) => graph,
Err(err) => {
eprintln!("ā ļø Primary CodeGraph open failed. Falling back to read-only mode.");
match codegraph_graph::CodeGraph::new_read_only() {
Ok(read_only_graph) => read_only_graph,
Err(ro_err) => panic!("Failed to initialize CodeGraph database"),
}
}
};
```
This allows:
- Multiple MCP servers to read the same index (read-only mode)
- One writer + multiple readers pattern
- No lock conflicts during concurrent queries
### 2. Sharded FAISS Indexes
FAISS indexes are sharded by language and path for faster searches:
```rust
// Language shards
.codegraph/shards/lang/
āāā rust.index # Only Rust code
āāā typescript.index # Only TypeScript code
āāā python.index # Only Python code
// Path shards
.codegraph/shards/path/
āāā src.index # Only src/ directory
āāā lib.index # Only lib/ directory
āāā tests.index # Only tests/ directory
```
**Search optimization:**
```rust
// If user specifies langs=["rust"], only search rust.index
// If user specifies paths=["src/"], only search src.index
// Otherwise search all shards + main index
```
### 3. Batch Embedding Processing
Already implemented in Phase 1 of improvements:
```rust
// Batched embedding generation (10-50x faster)
let batch_texts: Vec<String> = batch.iter().map(|s| s.to_string()).collect();
match embedder.embed_texts_batched(&batch_texts).await {
Ok(batch_embeddings) => {
for (symbol, embedding) in batch.iter().zip(batch_embeddings.into_iter()) {
embeddings.insert(symbol.to_string(), embedding);
}
}
// Fallback to one-by-one if batch fails
}
```
## Testing Multi-Codebase Support
### Test Scenario 1: Index Multiple Projects
```bash
# Create test projects
mkdir -p /tmp/test-project-a /tmp/test-project-b
# Index Project A
cd /tmp/test-project-a
echo "fn main() { println!(\"Project A\"); }" > main.rs
codegraph index .
# Verify Project A index exists
ls -la .codegraph/
# Should see: db/, faiss.index, faiss_ids.json, shards/, index.json
# Index Project B
cd /tmp/test-project-b
echo "fn main() { println!(\"Project B\"); }" > main.rs
codegraph index .
# Verify Project B index exists independently
ls -la .codegraph/
# Should see separate index for Project B
```
### Test Scenario 2: Serve Multiple Projects
```bash
# Terminal 1: Serve Project A
cd /tmp/test-project-a
codegraph start stdio
# MCP server uses /tmp/test-project-a/.codegraph/
# Terminal 2: Serve Project B
cd /tmp/test-project-b
codegraph start stdio
# MCP server uses /tmp/test-project-b/.codegraph/ (isolated!)
```
### Test Scenario 3: Search Different Projects
```bash
# Search Project A
cd /tmp/test-project-a
codegraph search "main function" --limit 5
# Results from Project A's index
# Search Project B
cd /tmp/test-project-b
codegraph search "main function" --limit 5
# Results from Project B's index (different results!)
```
## Migration Guide
### For Existing Users
If you have existing indexes from the previous version:
**Option 1: Re-index (recommended)**
```bash
cd /path/to/your/project
codegraph index . --force
```
**Option 2: Move existing index**
The index should already be in the correct location (`./.codegraph/`) if you indexed from the project root. No action needed!
### For New Users
Just index your project - everything works automatically:
```bash
cd /path/to/your/project
codegraph index .
codegraph start stdio
```
## Benefits Summary
### ā
Multi-Codebase Support
- Each project has isolated `.codegraph/` storage
- No cross-contamination between projects
- Clear project boundaries
### ā
Protocol Compliance
- Uses official `rmcp` SDK
- Proper tool parameter schemas with `JsonSchema`
- Standard MCP error codes and responses
- STDIO transport (official MCP protocol)
### ā
Performance
- Batch embedding processing (10-50x faster)
- Sharded FAISS indexes for targeted searches
- Read-only mode for concurrent access
- No RocksDB lock conflicts
### ā
Scalability
- Run multiple MCP servers (one per project)
- Each server serves independent codebase
- Claude Desktop can connect to multiple CodeGraph instances
### ā
Developer Experience
- Simple workflow: `cd project && codegraph index .`
- Automatic project-relative storage
- Clear directory structure
- Easy to understand and debug
## Future Enhancements
### Potential Improvements
1. **Workspace Support**: Federate multiple projects under a workspace
```bash
codegraph workspace create my-workspace
codegraph workspace add-project ./project-a
codegraph workspace add-project ./project-b
codegraph serve-workspace my-workspace
```
2. **Project Path Override**: Add `--project-path` flag to MCP server
```bash
# Serve a different project from current directory
codegraph start stdio --project-path /path/to/project
```
3. **Cross-Project Search**: Search across multiple indexed projects
```bash
codegraph search "authentication" --projects project-a,project-b,project-c
```
4. **Remote Index Support**: Support remote/shared indexes via network storage
```bash
codegraph index . --remote s3://bucket/project-indexes/project-a
```
## Files Modified
### Core Indexer
- `crates/codegraph-mcp/src/indexer.rs`
- Added `project_root: PathBuf` to `IndexerConfig`
- Added `project_root: PathBuf` to `ProjectIndexer`
- Updated `ProjectIndexer::new()` to use `CodeGraph::new_with_path()`
- Updated all `.codegraph` paths to use `self.project_root`
- Fixed `is_indexed()`, `save_index_metadata()` to use project-relative paths
### CLI
- `crates/codegraph-mcp/src/bin/codegraph.rs`
- Updated `handle_index()` to set `project_root` from `path` parameter
- Fixed clean command to use project-relative `.codegraph` directory
- Updated benchmark command to use project-relative paths
### MCP Server
- `crates/codegraph-mcp/src/official_server.rs`
- Already uses current directory's `.codegraph/` (correct behavior)
- Graceful fallback to read-only mode for concurrent access
- `crates/codegraph-mcp/src/server.rs`
- Uses current directory's `.codegraph/` (correct for project-specific serving)
- Loads FAISS indexes from current directory's shards
## Conclusion
These improvements transform CodeGraph from a single-codebase tool into a scalable multi-project system with proper isolation and protocol compliance. Each project maintains its own independent index, and multiple MCP servers can run simultaneously to serve different codebases to AI assistants like Claude.
The architecture is clean, intuitive, and follows the principle of least surprise: indexes live with the code they represent.