SourceSage: Efficient Code Memory for LLMs
SourceSage is an MCP (Model Context Protocol) server that efficiently memorizes key aspects of a codebase—logic, style, and standards—while allowing dynamic updates and fast retrieval. It's designed to be language-agnostic, leveraging the LLM's understanding of code across multiple languages.
Features
- Language Agnostic: Works with any programming language the LLM understands
- Knowledge Graph Storage: Efficiently stores code entities, relationships, patterns, and style conventions
- LLM-Driven Analysis: Relies on the LLM to analyze code and provide insights
- Token-Efficient Storage: Optimizes for minimal token usage while maximizing memory capacity
- Incremental Updates: Updates knowledge when code changes without redundant storage
- Fast Retrieval: Enables quick and accurate retrieval of relevant information
How It Works
SourceSage uses an approach in which:
1. The LLM analyzes code files (in any language)
2. The LLM uses MCP tools to register entities, relationships, patterns, and style conventions
3. SourceSage stores this knowledge in a token-efficient graph structure
4. The LLM can later query this knowledge when needed
This approach leverages the LLM's inherent language understanding while focusing the MCP server on efficient memory management.
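The division of labor described above can be illustrated with a minimal in-memory sketch. This is not SourceSage's actual storage code: the `Entity` and `KnowledgeGraph` classes and their method bodies are hypothetical, and only the tool names `register_entity` and `register_relationship` come from this README. The sketch shows one plausible way to get incremental updates without redundant storage: entities are keyed by name, and relationships are kept as a set of triples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    entity_type: str
    summary: str
    observations: list[str] = field(default_factory=list)


class KnowledgeGraph:
    """Hypothetical in-memory store; an illustration, not SourceSage's code."""

    def __init__(self) -> None:
        # Entities are keyed by name, so re-registering updates in place.
        self.entities: dict[str, Entity] = {}
        # Relationships are (from, type, to) triples in a set, so
        # registering the same edge twice does not duplicate it.
        self.relationships: set[tuple[str, str, str]] = set()

    def register_entity(self, name: str, entity_type: str, summary: str) -> bool:
        """Insert or update an entity; return True only if something changed."""
        existing = self.entities.get(name)
        if existing is None:
            self.entities[name] = Entity(name, entity_type, summary)
            return True
        if (existing.entity_type, existing.summary) == (entity_type, summary):
            return False  # nothing new to store
        existing.entity_type = entity_type
        existing.summary = summary
        return True

    def register_relationship(
        self, from_entity: str, relationship_type: str, to_entity: str
    ) -> None:
        """Record a directed edge between two already-registered entities."""
        for name in (from_entity, to_entity):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.add((from_entity, relationship_type, to_entity))

    def relationships_of(self, name: str) -> list[tuple[str, str, str]]:
        """Return every edge that touches the named entity, sorted."""
        return sorted(r for r in self.relationships if name in (r[0], r[2]))
```

In this sketch, registering an unchanged entity a second time is a no-op, which is the behavior the "Incremental Updates" feature suggests; how the real server detects changes is not stated in this document.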
Installation
Usage
Running the MCP Server
Connecting to Claude for Desktop
Open Claude for Desktop
Go to Settings > Developer > Edit Config
Add the following to your claude_desktop_config.json:
If you've installed the package:
If you're running from a local directory without installing:
Restart Claude for Desktop
Available Tools
SourceSage provides the following MCP tools:
register_entity: Register a code entity in the knowledge graph
Input:
- name: Name of the entity (e.g., class name, function name)
- entity_type: Type of entity (class, function, module, etc.)
- summary: Brief description of the entity
- signature: Entity signature (optional)
- language: Programming language (optional)
- observations: List of observations about the entity (optional)
- metadata: Additional metadata (optional)
Output: Confirmation message with entity ID

register_relationship: Register a relationship between entities
Input:
- from_entity: Name of the source entity
- to_entity: Name of the target entity
- relationship_type: Type of relationship (calls, inherits, imports, etc.)
- metadata: Additional metadata (optional)
Output: Confirmation message with relationship ID

register_pattern: Register a code pattern
Input:
- name: Name of the pattern
- description: Description of the pattern
- language: Programming language (optional)
- example: Example code demonstrating the pattern (optional)
- metadata: Additional metadata (optional)
Output: Confirmation message with pattern ID

register_style_convention: Register a coding style convention
Input:
- name: Name of the convention
- description: Description of the convention
- language: Programming language (optional)
- examples: Example code snippets demonstrating the convention (optional)
- metadata: Additional metadata (optional)
Output: Confirmation message with convention ID

add_entity_observation: Add an observation to an entity
Input:
- entity_name: Name of the entity
- observation: Observation to add
Output: Confirmation message

query_entities: Query entities in the knowledge graph
Input:
- entity_type: Filter by entity type (optional)
- language: Filter by programming language (optional)
- name_pattern: Filter by name pattern (regex, optional)
- limit: Maximum number of results to return (optional)
Output: List of matching entities

get_entity_details: Get detailed information about an entity
Input:
- entity_name: Name of the entity
Output: Detailed information about the entity

query_patterns: Query code patterns in the knowledge graph
Input:
- language: Filter by programming language (optional)
- pattern_name: Filter by pattern name (optional)
Output: List of matching patterns

query_style_conventions: Query coding style conventions
Input:
- language: Filter by programming language (optional)
- convention_name: Filter by convention name (optional)
Output: List of matching style conventions

get_knowledge_statistics: Get statistics about the knowledge graph
Input: None
Output: Statistics about the knowledge graph

clear_knowledge: Clear all knowledge from the graph
Input: None
Output: Confirmation message
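
For readers curious what a call to one of these tools looks like on the wire, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client sends. The helper `build_tool_call` is a hypothetical name introduced here for illustration, and the argument values are made up; the tool name and parameter names are taken from the list above. In normal use Claude for Desktop constructs these requests itself, so this is only meant to make the input shapes concrete.

```python
import json
from itertools import count

# Simple monotonically increasing request ids (an illustrative choice).
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (hypothetical helper).

    Optional arguments passed as None are dropped so that only the
    fields actually provided are sent.
    """
    provided = {key: value for key, value in arguments.items() if value is not None}
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": provided},
    }


if __name__ == "__main__":
    request = build_tool_call(
        "register_entity",
        {
            "name": "User",  # example values, not from a real codebase
            "entity_type": "class",
            "summary": "Represents an application user",
            "signature": None,  # optional, omitted from the request
            "language": "python",
        },
    )
    print(json.dumps(request, indent=2))
```

The same helper would work for the query tools, for example `build_tool_call("query_entities", {"entity_type": "class", "limit": 10})`.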
Example Workflow with Claude
1. Analyze Code: Ask Claude to analyze your code files
   "Please analyze this Python file and register the key entities and relationships."
2. Register Entities: Claude will use the register_entity tool to store code entities
   "I'll register the main class in this file."
3. Register Relationships: Claude will use the register_relationship tool to store relationships
   "I'll register the inheritance relationship between these classes."
4. Query Knowledge: Later, ask Claude about your codebase
   "What classes are defined in my codebase?"
   "Show me the details of the User class."
   "What's the relationship between the User and Profile classes?"
5. Get Coding Patterns: Ask Claude about coding patterns
   "What design patterns are used in my codebase?"
   "Show me examples of the Factory pattern in my code."
How It's Different
Unlike traditional code analysis tools, SourceSage:
- Leverages LLM Understanding: Uses the LLM's ability to understand code semantics across languages
- Stores Semantic Knowledge: Focuses on meaning and relationships, not just syntax
- Is Language Agnostic: Works with any programming language the LLM understands
- Optimizes for Token Efficiency: Stores knowledge in a way that minimizes token usage
- Evolves with LLM Capabilities: As LLMs improve, so does code understanding
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.