# SACL MCP Server
**Semantic-Augmented Reranking and Localization for Code Retrieval**
A Model Context Protocol (MCP) server that implements the SACL research framework to provide bias-aware code retrieval for AI coding assistants like Claude Code, Cursor, and other MCP-enabled tools.
## Overview
SACL addresses the critical problem of **textual bias** in code retrieval systems. Traditional systems over-rely on surface-level features like docstrings, comments, and variable names, leading to biased results that favor well-documented code regardless of functional relevance.
### Key Features
- **Bias Detection**: Identifies over-reliance on textual features
- **Semantic Augmentation**: Enriches code understanding beyond surface text
- **Intelligent Reranking**: Prioritizes functional relevance over documentation
- **Code Localization**: Pinpoints functionally relevant code segments
- **Relationship Analysis**: Maps code dependencies and relationships
- **Context-Aware Retrieval**: Returns results with related components
- **Agent-Controlled Updates**: Re-analysis is triggered explicitly by the agent (`update_file`, `update_files`), which keeps the server Docker-compatible
- **Knowledge Graph**: Persistent semantic storage with Graphiti/Neo4j
- **MCP Integration**: Works with Claude Code, Cursor, and other AI tools
## Architecture
```
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   AI Assistant   │◄──►│  SACL MCP Server │◄──►│  Graphiti/Neo4j  │
│ (Claude, Cursor) │    │                  │    │  Knowledge Graph │
└──────────────────┘    └────────┬─────────┘    └──────────────────┘
                                 │
                        ┌────────┴─────────┐
                        │  SACL Framework  │
                        │                  │
                        │ • Bias Detection │
                        │ • Semantic Aug.  │
                        │ • Reranking      │
                        │ • Localization   │
                        │ • Relationships  │
                        │ • Context-Aware  │
                        └──────────────────┘
```
## Quick Start
### Prerequisites
- Node.js 18+
- Neo4j database
- OpenAI API key
### Installation
```bash
# Clone the repository
git clone <repository-url>
cd sacl
# Install dependencies
npm install
# Copy environment configuration
cp .env.example .env
# Edit .env with your settings, for example:
#   OPENAI_API_KEY=your_key_here
#   NEO4J_URI=bolt://localhost:7687
#   NEO4J_USER=neo4j
#   NEO4J_PASSWORD=your_password
```
### Using Docker (Recommended)
```bash
# Start Neo4j and SACL server
docker-compose up -d
# Check logs
docker-compose logs -f sacl-mcp-server
```
### Manual Setup
```bash
# Build the project
npm run build
# Start the server
npm start
```
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key (required) | - |
| `SACL_REPO_PATH` | Repository to analyze | Current directory |
| `SACL_NAMESPACE` | Unique namespace | Auto-generated |
| `SACL_LLM_MODEL` | LLM model for analysis | `gpt-4` |
| `SACL_EMBEDDING_MODEL` | Embedding model | `text-embedding-3-small` |
| `SACL_BIAS_THRESHOLD` | Bias detection sensitivity (0-1) | `0.5` |
| `SACL_MAX_RESULTS` | Maximum search results | `10` |
| `SACL_CACHE_ENABLED` | Enable embedding cache | `true` |
| `NEO4J_URI` | Neo4j connection URI | `bolt://localhost:7687` |
| `NEO4J_USER` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `password` |
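The variables above map naturally onto a typed configuration object. The sketch below shows one way to read them with the documented defaults. `SaclConfig` and `loadConfig` are hypothetical names rather than the server's actual code, and the range checks are assumptions drawn from the descriptions in the table.

```typescript
// Hypothetical configuration shape mirroring the environment variables table.
interface SaclConfig {
  openaiApiKey: string;
  repoPath: string;
  namespace?: string; // undefined means "let the server auto-generate one"
  llmModel: string;
  embeddingModel: string;
  biasThreshold: number;
  maxResults: number;
  cacheEnabled: boolean;
  neo4jUri: string;
  neo4jUser: string;
  neo4jPassword: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env, cwd: string): SaclConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  // Bias detection sensitivity must stay within 0..1.
  const biasThreshold = Number(env.SACL_BIAS_THRESHOLD ?? "0.5");
  if (!Number.isFinite(biasThreshold) || biasThreshold < 0 || biasThreshold > 1) {
    throw new Error("SACL_BIAS_THRESHOLD must be a number between 0 and 1");
  }

  const maxResults = Number.parseInt(env.SACL_MAX_RESULTS ?? "10", 10);
  if (!Number.isInteger(maxResults) || maxResults < 1) {
    throw new Error("SACL_MAX_RESULTS must be a positive integer");
  }

  return {
    openaiApiKey,
    repoPath: env.SACL_REPO_PATH ?? cwd,
    namespace: env.SACL_NAMESPACE,
    llmModel: env.SACL_LLM_MODEL ?? "gpt-4",
    embeddingModel: env.SACL_EMBEDDING_MODEL ?? "text-embedding-3-small",
    biasThreshold,
    maxResults,
    // Anything other than "false" (case-insensitive) keeps the cache on.
    cacheEnabled: (env.SACL_CACHE_ENABLED ?? "true").toLowerCase() !== "false",
    neo4jUri: env.NEO4J_URI ?? "bolt://localhost:7687",
    neo4jUser: env.NEO4J_USER ?? "neo4j",
    neo4jPassword: env.NEO4J_PASSWORD ?? "password",
  };
}
```

In a Node entry point this would typically be called as `loadConfig(process.env, process.cwd())`. Failing fast on an invalid threshold surfaces configuration mistakes at startup rather than as odd rankings later.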
## Usage
### MCP Tools
The SACL server exposes nine MCP tools for bias-aware code analysis:
#### 1. `analyze_repository`
Performs full SACL analysis of a repository:
```json
{
  "repositoryPath": "/path/to/repo",
  "incremental": false
}
```
#### 2. `query_code`
Bias-aware code search. Set `includeContext` to `true` to add relationship context:
```json
{
  "query": "function that sorts arrays efficiently",
  "repositoryPath": "/path/to/repo",
  "maxResults": 10,
  "includeContext": false
}
```
#### 3. `query_code_with_context`
Enhanced search with relationship context and related components:
```json
{
  "query": "authentication middleware",
  "repositoryPath": "/path/to/repo",
  "maxResults": 10,
  "includeRelated": true
}
```
#### 4. `update_file`
Explicitly re-analyzes a single file after it changes. `changeType` is one of `"created"`, `"modified"`, or `"deleted"`:
```json
{
  "filePath": "src/services/auth.js",
  "changeType": "modified"
}
```
#### 5. `update_files`
Batch update multiple files:
```json
{
  "files": [
    { "filePath": "src/index.js", "changeType": "modified" },
    { "filePath": "src/utils/new.js", "changeType": "created" }
  ]
}
```
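When an agent edits several files in one step, the same path can appear more than once. The sketch below collapses such events into a single `update_files` payload. `buildUpdateFilesArgs` is a hypothetical client-side helper, and its last-event-wins rule is a simplifying assumption (for example, a file created and then modified would be sent as `modified`; how the server treats that case is not documented here).

```typescript
// The three change types accepted by update_file / update_files.
type ChangeType = "created" | "modified" | "deleted";

interface FileChange {
  filePath: string;
  changeType: ChangeType;
}

const CHANGE_TYPES: readonly ChangeType[] = ["created", "modified", "deleted"];

function isChangeType(value: string): value is ChangeType {
  return (CHANGE_TYPES as readonly string[]).includes(value);
}

// Hypothetical helper: one entry per file, last event wins, and the output
// order follows each file's most recent event.
function buildUpdateFilesArgs(
  events: { filePath: string; changeType: string }[],
): { files: FileChange[] } {
  const latest = new Map<string, ChangeType>();
  for (const { filePath, changeType } of events) {
    if (!isChangeType(changeType)) {
      throw new Error(`Unsupported changeType "${changeType}" for ${filePath}`);
    }
    // Delete before set so a repeated path moves to the end of the Map.
    latest.delete(filePath);
    latest.set(filePath, changeType);
  }
  return {
    files: Array.from(latest, ([filePath, changeType]) => ({ filePath, changeType })),
  };
}

const args = buildUpdateFilesArgs([
  { filePath: "src/index.js", changeType: "modified" },
  { filePath: "src/utils/new.js", changeType: "created" },
  { filePath: "src/index.js", changeType: "modified" },
]);
// args.files: src/utils/new.js (created), then src/index.js (modified)
```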
#### 6. `get_relationships`
Analyze code relationships and dependencies. `relationshipTypes` is an optional filter:
```json
{
  "filePath": "src/controllers/UserController.js",
  "maxDepth": 3,
  "relationshipTypes": ["imports", "calls", "extends"]
}
```
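`maxDepth` bounds how many hops are followed from `filePath`. The sketch below illustrates a depth-limited breadth-first traversal over relationship edges with the optional type filter. The `Relationship` shape and the `traverseRelationships` function are assumptions for illustration, not the server's implementation.

```typescript
// Hypothetical edge shape; the project's own types live in src/types/relationships.ts.
interface Relationship {
  from: string;
  to: string;
  type: string; // e.g. "imports", "calls", "extends"
}

interface Reached {
  filePath: string;
  depth: number;
}

// Breadth-first traversal from `start`, following outgoing edges up to
// `maxDepth` hops, optionally restricted to the given relationship types.
function traverseRelationships(
  edges: Relationship[],
  start: string,
  maxDepth: number,
  relationshipTypes?: string[],
): Reached[] {
  const allowed = relationshipTypes ? new Set(relationshipTypes) : undefined;
  const adjacency = new Map<string, string[]>();
  for (const edge of edges) {
    if (allowed && !allowed.has(edge.type)) continue;
    const targets = adjacency.get(edge.from) ?? [];
    targets.push(edge.to);
    adjacency.set(edge.from, targets);
  }

  const visited = new Set<string>([start]);
  const result: Reached[] = [];
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const target of adjacency.get(node) ?? []) {
        if (visited.has(target)) continue; // skips cycles and repeated paths
        visited.add(target);
        result.push({ filePath: target, depth });
        next.push(target);
      }
    }
    frontier = next;
  }
  return result;
}

const edges: Relationship[] = [
  { from: "UserController.js", to: "UserService.js", type: "imports" },
  { from: "UserService.js", to: "User.js", type: "imports" },
  { from: "User.js", to: "BaseModel.js", type: "extends" },
  { from: "User.js", to: "UserController.js", type: "calls" }, // cycle back to the start
];
const reached = traverseRelationships(edges, "UserController.js", 3);
// UserService.js at depth 1, User.js at depth 2, BaseModel.js at depth 3
```

Tracking visited nodes keeps the traversal finite when files import or call each other in a cycle.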
#### 7. `get_file_context`
Get comprehensive context for a file. Set `includeSnippets` to `true` to include code previews:
```json
{
  "filePath": "src/models/User.js",
  "includeSnippets": true
}
```
#### 8. `get_bias_analysis`
Returns detailed bias metrics for debugging. `filePath` is optional:
```json
{
  "filePath": "src/utils/sort.js"
}
```
#### 9. `get_system_stats`
Returns system performance statistics. It takes no arguments:
```json
{}
```
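MCP clients invoke each tool above through a JSON-RPC 2.0 `tools/call` request, sent after the client has completed the `initialize` handshake. The sketch below builds such a request for `query_code` and frames it for the stdio transport, where each message is a single newline-terminated line of JSON. `toolCallRequest` and `toStdioLine` are hypothetical helper names; in practice an MCP client library usually handles this framing.

```typescript
// JSON-RPC 2.0 request shape used by the MCP tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCallRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// On stdio, messages are newline-delimited. JSON.stringify without an indent
// argument escapes newlines inside strings, so the only raw newline is the terminator.
function toStdioLine(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = toStdioLine(
  toolCallRequest(1, "query_code", {
    query: "function that sorts arrays efficiently",
    repositoryPath: "/path/to/repo",
    maxResults: 10,
    includeContext: false,
  }),
);
```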
### MCP Client Configuration
#### Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "sacl": {
      "command": "node",
      "args": ["/path/to/sacl/dist/index.js"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USER": "neo4j",
        "NEO4J_PASSWORD": "password"
      }
    }
  }
}
```
#### Cursor IDE
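Cursor reads MCP server definitions from `.cursor/mcp.json` in a project or from `~/.cursor/mcp.json` globally. Both use the same `mcpServers` structure as the Claude Desktop example.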
## SACL Framework
### Stage 1: Bias Detection
Identifies three types of textual bias:
- **Docstring Dependency**: Over-reliance on documentation
- **Identifier Name Bias**: Focusing on variable/function names
- **Comment Over-reliance**: Prioritizing commented code
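As a rough illustration of the docstring and comment indicators, the sketch below measures what share of a JavaScript/TypeScript file's non-blank lines are JSDoc blocks or other comments, then flags ratios above a threshold. It is a hypothetical line-based heuristic, not the logic in `BiasDetector.ts`. The 0.15 threshold is an arbitrary assumption, identifier name bias is not covered, and comment markers inside string literals are not handled.

```typescript
interface TextualFeatureRatios {
  docstringRatio: number; // share of non-blank lines inside /** ... */ blocks
  commentRatio: number; // share of non-blank lines in // or /* ... */ comments
}

// Hypothetical line-based heuristic for JS/TS sources. It does not parse
// strings, so comment markers inside string literals are miscounted.
function textualFeatureRatios(source: string): TextualFeatureRatios {
  const lines = source
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  let docLines = 0;
  let commentLines = 0;
  let openBlock: "doc" | "comment" | null = null;

  for (const line of lines) {
    if (openBlock === null && line.startsWith("/*")) {
      openBlock = line.startsWith("/**") ? "doc" : "comment";
    }
    if (openBlock !== null) {
      if (openBlock === "doc") docLines++;
      else commentLines++;
      if (line.includes("*/")) openBlock = null; // block closes on this line
      continue;
    }
    if (line.startsWith("//")) commentLines++;
  }

  const total = Math.max(lines.length, 1);
  return { docstringRatio: docLines / total, commentRatio: commentLines / total };
}

// Assumed threshold; the project's own sensitivity setting is SACL_BIAS_THRESHOLD,
// and its detector may compute bias differently.
function flagTextualBias(ratios: TextualFeatureRatios, threshold = 0.15): string[] {
  const flags: string[] = [];
  if (ratios.docstringRatio > threshold) flags.push("docstring_dependency");
  if (ratios.commentRatio > threshold) flags.push("comment_over_reliance");
  return flags;
}
```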
### Stage 2: Semantic Augmentation
Enriches code representations with:
- **Functional Signatures**: What the code actually does
- **Behavior Patterns**: Computational patterns (iteration, recursion, etc.)
- **Structural Features**: Complexity metrics, AST analysis
- **Augmented Embeddings**: Bias-adjusted semantic vectors
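Behavior patterns can be approximated with simple structural checks on a function body. The sketch below tags iteration, direct recursion (the function calling its own name), and higher-order array calls. `detectBehaviorPatterns` is a hypothetical helper, the `higher-order` tag is an extra category added for illustration, and the body passed in is assumed to exclude the function's own declaration line. Regex matching will misfire on keywords inside strings or comments, which AST-based analysis avoids.

```typescript
type BehaviorPattern = "iteration" | "recursion" | "higher-order";

// Escapes characters that have special meaning in a RegExp pattern.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// `body` should be the function body only, without the declaration line,
// otherwise the declaration itself would look like a self-call.
function detectBehaviorPatterns(fnName: string, body: string): BehaviorPattern[] {
  const patterns: BehaviorPattern[] = [];
  if (/\b(for|while)\b|\bdo\s*\{/.test(body)) patterns.push("iteration");
  // A call to the function's own name suggests direct recursion.
  const selfCall = new RegExp(`\\b${escapeRegExp(fnName)}\\s*\\(`);
  if (selfCall.test(body)) patterns.push("recursion");
  if (/\.(map|filter|reduce|forEach)\s*\(/.test(body)) patterns.push("higher-order");
  return patterns;
}

const quicksortBody =
  "if (a.length < 2) return a; const [p, ...rest] = a; " +
  "return [...quicksort(rest.filter(x => x < p)), p, ...quicksort(rest.filter(x => x >= p))];";
const quicksortPatterns = detectBehaviorPatterns("quicksort", quicksortBody);
// ["recursion", "higher-order"]
```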
### Stage 3: Reranking & Localization
- **Bias-Aware Ranking**: Reduces textual weight based on bias score
- **Code Localization**: Identifies functionally relevant segments
- **Semantic Similarity**: Uses augmented embeddings
- **Functional Relevance**: Considers computational patterns
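One way to read "reduces textual weight based on bias score" is a weighted blend whose textual weight shrinks as the bias score rises. The sketch below implements that reading, with cosine similarity for comparing embeddings. The linear formula, the `baseTextWeight` default of 0.5, and all names are assumptions for illustration, not the formula used by `SACLReranker.ts` or the paper.

```typescript
interface Candidate {
  filePath: string;
  textualScore: number; // similarity on raw text, assumed 0..1
  semanticScore: number; // similarity on augmented embeddings, assumed 0..1
  biasScore: number; // 0..1, higher means more reliance on docstrings, comments, names
}

interface RankedResult extends Candidate {
  finalScore: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical blend: the textual weight shrinks linearly with the bias score.
function rerank(candidates: Candidate[], baseTextWeight = 0.5, maxResults = 10): RankedResult[] {
  return candidates
    .map((c) => {
      const textWeight = baseTextWeight * (1 - c.biasScore);
      const finalScore = textWeight * c.textualScore + (1 - textWeight) * c.semanticScore;
      return { ...c, finalScore };
    })
    .sort((a, b) => b.finalScore - a.finalScore)
    .slice(0, maxResults);
}

const ranked = rerank([
  { filePath: "documented.js", textualScore: 0.9, semanticScore: 0.4, biasScore: 0.8 },
  { filePath: "functional.js", textualScore: 0.5, semanticScore: 0.8, biasScore: 0.1 },
]);
// functional.js ranks first (about 0.665 vs. 0.45)
```

Because the heavily documented candidate has a bias score of 0.8, its high textual score contributes little, and the candidate with stronger semantic similarity rises to the top.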
### Stage 4: Relationship Analysis
Maps code relationships and dependencies:
- **Import/Export Analysis**: Module dependencies and exports
- **Function Call Mapping**: Call graphs and method invocations
- **Class Inheritance**: Extends/implements relationships
- **Dependency Tracking**: External and internal dependencies
- **Context-Aware Results**: Related components with each query result
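The project describes full AST analysis for JavaScript/TypeScript. The simpler regex-based sketch below only illustrates the kind of import edges that relationship analysis produces: static imports, re-exports, `require`, and dynamic `import()`. `extractImports` and the `ImportEdge` shape are hypothetical. Unlike an AST parser, these patterns can also match text inside strings or comments.

```typescript
interface ImportEdge {
  from: string;
  to: string;
  type: "imports";
}

// Regex approximations of module references. `[^'";]*?` lets multi-line
// named imports match while stopping at the end of a statement.
const IMPORT_PATTERNS: RegExp[] = [
  /\b(?:import|export)\s+(?:[^'";]*?\s+from\s+)?['"]([^'"]+)['"]/g, // import/export ... from "x", import "x"
  /\brequire\s*\(\s*['"]([^'"]+)['"]\s*\)/g, // require("x")
  /\bimport\s*\(\s*['"]([^'"]+)['"]\s*\)/g, // dynamic import("x")
];

function extractImports(filePath: string, source: string): ImportEdge[] {
  const targets = new Set<string>();
  for (const pattern of IMPORT_PATTERNS) {
    for (const match of source.matchAll(pattern)) {
      if (match[1]) targets.add(match[1]);
    }
  }
  return Array.from(targets, (to) => ({ from: filePath, to, type: "imports" as const }));
}

const sampleSource = [
  'import { a, b } from "./x";',
  'import "./side-effect";',
  "const fs = require('fs');",
  'export * from "./y";',
  'const lazy = () => import("./lazy");',
  'export const s = "not an import";',
].join("\n");
const importEdges = extractImports("src/index.js", sampleSource);
// targets: ./x, ./side-effect, ./y, fs, ./lazy
```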
## Example Workflow
1. **Repository Analysis**:
   ```
   AI Assistant → analyze_repository → SACL processes all files → Knowledge graph populated
   ```
2. **Code Query with Context**:
   ```
   AI Assistant → query_code_with_context("authentication") → SACL retrieval → Context-aware results
   ```
3. **File Updates**:
   ```
   AI modifies code → update_file("src/auth.js", "modified") → SACL re-analyzes → Relationships updated
   ```
4. **Relationship Exploration**:
   ```
   AI Assistant → get_relationships("UserController.js") → Dependency graph → Related components
   ```
5. **Results Include**:
   - Original textual similarity score
   - Semantic similarity score
   - Bias-adjusted final score
   - Localized code regions
   - Related components and dependencies
   - Context explanation with relationship importance
   - Explanation of ranking decisions
## Performance
Based on SACL research benchmarks:
- **12.8%** improvement in Recall@1 on HumanEval
- **9.4%** improvement on MBPP
- **7.0%** improvement on SWE-Bench-Lite
- **P95 latency**: <300ms for retrieval operations
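For reference, Recall@k counts a query as a hit when a relevant result appears among the top k, and P95 latency is the 95th-percentile value of the measured latencies. The sketch below computes both under simplifying assumptions: one relevant item per query and the nearest-rank percentile method. The benchmarks cited above may define their evaluation differently, and the function names are illustrative.

```typescript
// Recall@k with one relevant item per query: the fraction of queries whose
// relevant item appears in the first k ranked results.
function recallAtK(ranked: string[][], relevant: string[], k: number): number {
  if (ranked.length !== relevant.length) throw new Error("length mismatch");
  if (ranked.length === 0) return 0;
  let hits = 0;
  ranked.forEach((results, i) => {
    if (results.slice(0, k).includes(relevant[i])) hits++;
  });
  return hits / ranked.length;
}

// Nearest-rank percentile, e.g. p = 95 for P95 latency.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const recall1 = recallAtK([["a", "b"], ["c", "d"], ["e", "f"]], ["a", "d", "x"], 1); // 1/3
const p95 = percentile([120, 80, 300, 95, 110], 95); // 300
```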
## Bias Analysis Example
```
SACL Bias Analysis
File: src/algorithms/quicksort.js
Bias Metrics:
  • Overall Bias Score: 73.2%
  • Semantic Pattern: Recursive divide-and-conquer sorting
  • Functional Signature: Array input → sorted array output
Bias Indicators:
  • docstring_dependency: High docstring dependency (15.3% of code)
  • identifier_name_bias: High reliance on descriptive names
  • comment_over_reliance: Excessive comments (18.7% of code)
Improvement Suggestions:
  • Reduce reliance on variable naming for semantic understanding
  • Focus on structural patterns over comments
  • Improve functional signature extraction
```
## Development
### Project Structure
```
src/
├── core/                    # SACL framework implementation
│   ├── BiasDetector.ts      # Textual bias detection
│   ├── SemanticAugmenter.ts # Semantic enhancement
│   ├── SACLReranker.ts      # Reranking and localization with context
│   └── SACLProcessor.ts     # Main orchestrator with relationship support
├── mcp/                     # MCP server implementation
│   └── SACLMCPServer.ts     # MCP protocol handlers (9 tools)
├── graphiti/                # Knowledge graph integration
│   └── GraphitiClient.ts    # Graphiti/Neo4j interface with relationships
├── utils/                   # Utility modules
│   └── CodeAnalyzer.ts      # AST analysis and relationship extraction
├── types/                   # TypeScript type definitions
│   ├── index.ts             # Core types and interfaces
│   └── relationships.ts     # Relationship type definitions
└── index.ts                 # Application entry point
```
### Building
```bash
npm run build # Build TypeScript
npm run dev # Development with auto-reload
npm run lint # Code linting
npm run format # Code formatting
npm test # Run tests
```
### Contributing
1. Fork the repository
2. Create a feature branch
3. Implement changes following SACL methodology
4. Add tests for new functionality
5. Submit a pull request
## Research Background
This implementation is based on the research paper:
**"SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization"**
- Authors: Dhruv Gupta, Gayathri Ganesh Lakshmy, Yiqing Xie
- arXiv: 2506.20081v2
### Key Research Contributions
1. **Systematic Bias Detection**: Identifies textual bias through feature masking
2. **Semantic Augmentation**: Enhances code understanding beyond text
3. **Bias-Aware Ranking**: Reduces surface-level feature dependency
4. **Localization**: Pinpoints functionally relevant code regions
## Integration
### Supported AI Tools
- **Claude Code**: Direct MCP integration
- **Cursor**: MCP server connection
- **VS Code Extensions**: Via MCP protocol
- **Custom Tools**: Any MCP-compatible client
### Language Support
- **JavaScript/TypeScript**: Full AST analysis with relationship extraction
  - Import/export tracking
  - Function call analysis
  - Class inheritance detection
  - Dynamic import support
- **Python**: Regex-based analysis
  - Import statement parsing
  - Class inheritance detection
  - Function call patterns
- **Other Languages** (Java, C++, C#, Go, Rust): Basic analysis
  - Import/include statements
  - Class declarations
  - Function definitions
- **Extensible**: Easy to add new language analyzers
## License
MIT License - see LICENSE file for details.
## Support
- **Issues**: GitHub Issues
- **Documentation**: See `/docs` directory
- **Research Paper**: [arXiv:2506.20081v2](https://arxiv.org/abs/2506.20081v2)
## Future Enhancements
- [ ] Multi-language AST parsing for all supported languages
- [ ] Real-time Graphiti integration (currently uses mock methods)
- [ ] Semantic relationship detection beyond syntactic analysis
- [ ] Visual relationship graphs in MCP responses
- [ ] Custom bias threshold configuration per project
- [ ] Integration with Language Server Protocol (LSP)
- [ ] Advanced localization algorithms with machine learning
- [ ] Performance optimizations for large codebases (>10k files)
- [ ] Real-time bias notifications during code writing
- [ ] Custom relationship type definitions
---
**SACL MCP Server** - Bringing research-backed bias-aware code retrieval to AI coding assistants.