# SACL MCP Server

**Semantic-Augmented Reranking and Localization for Code Retrieval**

A Model Context Protocol (MCP) server that implements the SACL research framework to provide bias-aware code retrieval for AI coding assistants like Claude Code, Cursor, and other MCP-enabled tools.

## 🎯 Overview

SACL addresses the problem of **textual bias** in code retrieval systems. Traditional systems over-rely on surface-level features like docstrings, comments, and variable names, which skews results toward well-documented code regardless of functional relevance.

### Key Features

- **🧠 Bias Detection**: Identifies over-reliance on textual features
- **🔍 Semantic Augmentation**: Enriches code understanding beyond surface text
- **📊 Intelligent Reranking**: Prioritizes functional relevance over documentation
- **🎯 Code Localization**: Pinpoints functionally relevant code segments
- **🔗 Relationship Analysis**: Maps code dependencies and relationships
- **🎨 Context-Aware Retrieval**: Returns results with related components
- **🚀 Agent-Controlled Updates**: Explicit file updates for Docker compatibility
- **🗄️ Knowledge Graph**: Persistent semantic storage with Graphiti/Neo4j
- **🔧 MCP Integration**: Works with Claude Code, Cursor, and other AI tools

## 🏗️ Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   AI Assistant  │────│  SACL MCP Server│────│  Graphiti/Neo4j │
│ (Claude, Cursor)│    │                 │    │  Knowledge Graph│
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                       ┌─────────────────┐
                       │  SACL Framework │
                       │                 │
                       │ • Bias Detection│
                       │ • Semantic Aug. │
                       │ • Reranking     │
                       │ • Localization  │
                       │ • Relationships │
                       │ • Context-Aware │
                       └─────────────────┘
```

## 🚀 Quick Start

### Prerequisites

- Node.js 18+
- Neo4j database
- OpenAI API key

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd sacl

# Install dependencies
npm install

# Copy environment configuration
cp .env.example .env

# Edit .env with your settings, for example:
#   OPENAI_API_KEY=your_key_here
#   NEO4J_URI=bolt://localhost:7687
#   NEO4J_USER=neo4j
#   NEO4J_PASSWORD=your_password
```

### Using Docker (Recommended)

```bash
# Start Neo4j and SACL server
docker-compose up -d

# Check logs
docker-compose logs -f sacl-mcp-server
```

### Manual Setup

```bash
# Build the project
npm run build

# Start the server
npm start
```

## 🔧 Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key (required) | - |
| `SACL_REPO_PATH` | Repository to analyze | Current directory |
| `SACL_NAMESPACE` | Unique namespace | Auto-generated |
| `SACL_LLM_MODEL` | LLM model for analysis | `gpt-4` |
| `SACL_EMBEDDING_MODEL` | Embedding model | `text-embedding-3-small` |
| `SACL_BIAS_THRESHOLD` | Bias detection sensitivity (0-1) | `0.5` |
| `SACL_MAX_RESULTS` | Maximum search results | `10` |
| `SACL_CACHE_ENABLED` | Enable embedding cache | `true` |
| `NEO4J_URI` | Neo4j connection URI | `bolt://localhost:7687` |
| `NEO4J_USER` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `password` |

## 🎮 Usage

### MCP Tools

The SACL server provides nine MCP tools for bias-aware code analysis:

#### 1. `analyze_repository`

Performs full SACL analysis of a repository:

```json
{
  "repositoryPath": "/path/to/repo",
  "incremental": false
}
```

#### 2. `query_code`

Bias-aware code search with optional context. Set `includeContext` to `true` to include relationship context:

```json
{
  "query": "function that sorts arrays efficiently",
  "repositoryPath": "/path/to/repo",
  "maxResults": 10,
  "includeContext": false
}
```

#### 3. `query_code_with_context` 🆕

Enhanced search with relationship context and related components:

```json
{
  "query": "authentication middleware",
  "repositoryPath": "/path/to/repo",
  "maxResults": 10,
  "includeRelated": true
}
```

#### 4. `update_file` 🆕

Explicitly updates the analysis of a single file after it changes. `changeType` is one of `"created"`, `"modified"`, or `"deleted"`:

```json
{
  "filePath": "src/services/auth.js",
  "changeType": "modified"
}
```

#### 5. `update_files` 🆕

Batch update multiple files:

```json
{
  "files": [
    { "filePath": "src/index.js", "changeType": "modified" },
    { "filePath": "src/utils/new.js", "changeType": "created" }
  ]
}
```

#### 6. `get_relationships` 🆕

Analyze code relationships and dependencies. `relationshipTypes` is an optional filter:

```json
{
  "filePath": "src/controllers/UserController.js",
  "maxDepth": 3,
  "relationshipTypes": ["imports", "calls", "extends"]
}
```

#### 7. `get_file_context` 🆕

Get comprehensive context for a file. Set `includeSnippets` to `true` to include code previews:

```json
{
  "filePath": "src/models/User.js",
  "includeSnippets": true
}
```

#### 8. `get_bias_analysis`

Detailed bias metrics and debugging. `filePath` is optional:

```json
{
  "filePath": "src/utils/sort.js"
}
```

#### 9. `get_system_stats`

System performance and statistics:

```json
{}
```

### MCP Client Configuration

#### Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "sacl": {
      "command": "node",
      "args": ["/path/to/sacl/dist/index.js"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USER": "neo4j",
        "NEO4J_PASSWORD": "password"
      }
    }
  }
}
```

#### Cursor IDE

Configure in your Cursor settings to connect to the SACL MCP server.
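For clients that talk to the server directly rather than through Claude Desktop or Cursor, the tool inputs shown under MCP Tools travel as JSON-RPC 2.0 `tools/call` requests, as defined by the MCP specification. The sketch below only builds those request objects; it does not open a transport or depend on any SDK. The helper names `buildToolCall` and `buildUpdateFiles` and the `ChangeType` alias are hypothetical and not part of this project.

```typescript
// Hypothetical helpers for illustration; not part of the SACL codebase.
type ChangeType = "created" | "modified" | "deleted";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wraps a tool name and its input in a JSON-RPC 2.0 "tools/call" envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Builds the batch payload shape documented for `update_files`.
function buildUpdateFiles(
  id: number,
  changes: Array<[string, ChangeType]>,
): ToolCallRequest {
  const files = changes.map(([filePath, changeType]) => ({ filePath, changeType }));
  return buildToolCall(id, "update_files", { files });
}

// A search request using the `query_code` input from the section above.
const query = buildToolCall(1, "query_code", {
  query: "function that sorts arrays efficiently",
  repositoryPath: "/path/to/repo",
  maxResults: 10,
  includeContext: false,
});

// A batch update after an agent has edited one file and created another.
const update = buildUpdateFiles(2, [
  ["src/index.js", "modified"],
  ["src/utils/new.js", "created"],
]);

console.log(JSON.stringify(query, null, 2));
console.log(JSON.stringify(update, null, 2));
```

Restricting `changeType` to a union type lets the compiler reject values outside the three documented options before a request is ever sent.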
## 📊 SACL Framework

### Stage 1: Bias Detection

Identifies three types of textual bias:

- **Docstring Dependency**: Over-reliance on documentation
- **Identifier Name Bias**: Focusing on variable/function names
- **Comment Over-reliance**: Prioritizing commented code

### Stage 2: Semantic Augmentation

Enriches code representations with:

- **Functional Signatures**: What the code actually does
- **Behavior Patterns**: Computational patterns (iteration, recursion, etc.)
- **Structural Features**: Complexity metrics, AST analysis
- **Augmented Embeddings**: Bias-adjusted semantic vectors

### Stage 3: Reranking & Localization

- **Bias-Aware Ranking**: Reduces textual weight based on bias score
- **Code Localization**: Identifies functionally relevant segments
- **Semantic Similarity**: Uses augmented embeddings
- **Functional Relevance**: Considers computational patterns

### Stage 4: Relationship Analysis 🆕

Maps code relationships and dependencies:

- **Import/Export Analysis**: Module dependencies and exports
- **Function Call Mapping**: Call graphs and method invocations
- **Class Inheritance**: Extends/implements relationships
- **Dependency Tracking**: External and internal dependencies
- **Context-Aware Results**: Related components with each query result

## 🧪 Example Workflow

1. **Repository Analysis**:

   ```
   AI Assistant → analyze_repository → SACL processes all files → Knowledge graph populated
   ```

2. **Code Query with Context**:

   ```
   AI Assistant → query_code_with_context("authentication") → SACL retrieval → Context-aware results
   ```

3. **File Updates**:

   ```
   AI modifies code → update_file("src/auth.js", "modified") → SACL re-analyzes → Relationships updated
   ```

4. **Relationship Exploration**:

   ```
   AI Assistant → get_relationships("UserController.js") → Dependency graph → Related components
   ```

5. **Results Include**:
   - Original textual similarity score
   - Semantic similarity score
   - Bias-adjusted final score
   - Localized code regions
   - Related components and dependencies
   - Context explanation with relationship importance
   - Explanation of ranking decisions

## 📈 Performance

Based on SACL research benchmarks:

- **12.8%** improvement in Recall@1 on HumanEval
- **9.4%** improvement on MBPP
- **7.0%** improvement on SWE-Bench-Lite
- **P95 latency**: <300ms for retrieval operations

## 🔍 Bias Analysis Example

```
🧠 SACL Bias Analysis

File: src/algorithms/quicksort.js

Bias Metrics:
• Overall Bias Score: 73.2% 🔴
• Semantic Pattern: Recursive divide-and-conquer sorting
• Functional Signature: Array input → sorted array output

Bias Indicators:
• docstring_dependency: High docstring dependency (15.3% of code)
• identifier_name_bias: High reliance on descriptive names
• comment_over_reliance: Excessive comments (18.7% of code)

💡 Improvement Suggestions:
• Reduce reliance on variable naming for semantic understanding
• Focus on structural patterns over comments
• Improve functional signature extraction
```

## 🛠️ Development

### Project Structure

```
src/
├── core/                     # SACL framework implementation
│   ├── BiasDetector.ts       # Textual bias detection
│   ├── SemanticAugmenter.ts  # Semantic enhancement
│   ├── SACLReranker.ts       # Reranking and localization with context
│   └── SACLProcessor.ts      # Main orchestrator with relationship support
├── mcp/                      # MCP server implementation
│   └── SACLMCPServer.ts      # MCP protocol handlers (9 tools)
├── graphiti/                 # Knowledge graph integration
│   └── GraphitiClient.ts     # Graphiti/Neo4j interface with relationships
├── utils/                    # Utility modules
│   └── CodeAnalyzer.ts       # AST analysis and relationship extraction
├── types/                    # TypeScript type definitions
│   ├── index.ts              # Core types and interfaces
│   └── relationships.ts      # Relationship type definitions
└── index.ts                  # Application entry point
```

### Building

```bash
npm run build   # Build TypeScript
npm run dev     # Development with auto-reload
npm run lint    # Code linting
npm run format  # Code formatting
npm test        # Run tests
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Implement changes following SACL methodology
4. Add tests for new functionality
5. Submit a pull request

## 📚 Research Background

This implementation is based on the research paper:

**"SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization"**

- Authors: Dhruv Gupta, Gayathri Ganesh Lakshmy, Yiqing Xie
- arXiv: 2506.20081v2

### Key Research Contributions

1. **Systematic Bias Detection**: Identifies textual bias through feature masking
2. **Semantic Augmentation**: Enhances code understanding beyond text
3. **Bias-Aware Ranking**: Reduces surface-level feature dependency
4. **Localization**: Pinpoints functionally relevant code regions

## 🔗 Integration

### Supported AI Tools

- **Claude Code**: Direct MCP integration
- **Cursor**: MCP server connection
- **VS Code Extensions**: Via MCP protocol
- **Custom Tools**: Any MCP-compatible client

### Language Support

- **JavaScript/TypeScript**: Full AST analysis with relationship extraction
  - Import/export tracking
  - Function call analysis
  - Class inheritance detection
  - Dynamic imports support
- **Python**: Regex-based analysis
  - Import statement parsing
  - Class inheritance detection
  - Function call patterns
- **Other Languages** (Java, C++, C#, Go, Rust): Basic analysis
  - Import/include statements
  - Class declarations
  - Function definitions
- **Extensible**: Easy to add new language analyzers

## 📄 License

MIT License - see LICENSE file for details.
## 🆘 Support

- **Issues**: GitHub Issues
- **Documentation**: See `/docs` directory
- **Research Paper**: [arXiv:2506.20081v2](https://arxiv.org/abs/2506.20081v2)

## 🔮 Future Enhancements

- [ ] Multi-language AST parsing for all supported languages
- [ ] Real-time Graphiti integration (currently uses mock methods)
- [ ] Semantic relationship detection beyond syntactic analysis
- [ ] Visual relationship graphs in MCP responses
- [ ] Custom bias threshold configuration per project
- [ ] Integration with Language Server Protocol (LSP)
- [ ] Advanced localization algorithms with machine learning
- [ ] Performance optimizations for large codebases (>10k files)
- [ ] Real-time bias notifications during code writing
- [ ] Custom relationship type definitions

---

**SACL MCP Server** - Bringing research-backed bias-aware code retrieval to AI coding assistants.
