Elasticsearch Knowledge Graph for MCP

by j3k0
# Elasticsearch Knowledge Graph for MCP A scalable knowledge graph implementation for Model Context Protocol (MCP) using Elasticsearch as the backend. This implementation is designed to replace the previous JSON file-based approach with a more scalable, performant solution. ## Key Features - **Scalable Storage**: Elasticsearch provides distributed, scalable storage for knowledge graph entities and relations - **Advanced Search**: Full-text search with fuzzy matching and relevancy ranking - **Memory-like Behavior**: Tracks access patterns to prioritize recently viewed and important entities - **Import/Export Tools**: Easy migration from existing JSON-based knowledge graphs - **Rich Query API**: Advanced querying capabilities not possible with the previous implementation - **Admin Tools**: Management CLI for inspecting and maintaining the knowledge graph - **Complete CRUD Operations**: Full create, read, update, and delete capabilities for entities and relations - **Elasticsearch Query Support**: Native support for Elasticsearch query DSL for advanced search capabilities - **Multi-Zone Architecture**: Separate memory zones for organizing domain-specific knowledge - **Cross-Zone Relations**: Relations between entities in different memory zones ## Architecture The knowledge graph system consists of: 1. **Elasticsearch Cluster**: Core data store for entities and relations 2. **Knowledge Graph Library**: TypeScript interface to Elasticsearch with all core operations 3. **MCP Server**: Protocol-compliant server for AI models to interact with the knowledge graph 4. **Admin CLI**: Command-line tools for maintenance and management 5. **Import/Export Tools**: Utilities for data migration and backup 6. **Multiple Memory Zones**: Ability to partition knowledge into separate zones/indices ## Getting Started ### Prerequisites - Node.js 18+ - Docker and Docker Compose ### Installation 1. Clone the repository: ```bash git clone https://github.com/mcp-servers/mcp-servers.git cd mcp-servers/memory ``` 2. Install dependencies: ```bash npm install ``` 3. Start the Elasticsearch cluster: ```bash npm run es:start ``` 4. Build the project: ```bash npm run build ``` ### Migration from JSON If you have an existing JSON-based knowledge graph, you can import it: ```bash node dist/admin-cli.js init node dist/admin-cli.js import memory.json ``` ### Running the MCP Server Start the MCP server that connects to Elasticsearch: ```bash npm start ``` ## Configuration The system can be configured via environment variables: - `ES_NODE`: Elasticsearch node URL (default: `http://localhost:9200`) - `ES_USERNAME`: Elasticsearch username (if authentication is enabled) - `ES_PASSWORD`: Elasticsearch password (if authentication is enabled) - `MEMORY_FILE_PATH`: Path to memory JSON file (for import/export) - `KG_DEFAULT_ZONE`: Default memory zone to use (default: `default`) - `KG_INDEX_PREFIX`: Prefix for Elasticsearch indices (default: `knowledge-graph`) ## Admin CLI Commands The admin CLI provides tools for managing the knowledge graph: ```bash # Initialize Elasticsearch index node dist/admin-cli.js init # Import data from JSON file to a specific zone node dist/admin-cli.js import memory.json [zone] # Export data from a specific zone to JSON file node dist/admin-cli.js export backup.json [zone] # Backup all zones and relations node dist/admin-cli.js backup full-backup.json # Restore from a full backup node dist/admin-cli.js restore full-backup.json [--yes] # Show statistics about all zones or a specific zone node dist/admin-cli.js stats [zone] # Search the knowledge graph with optional zone parameter node dist/admin-cli.js search "search query" [zone] # Show details about a specific entity node dist/admin-cli.js entity "John Smith" [zone] # Show relations for a specific entity node dist/admin-cli.js relations "John Smith" [zone] # List all memory zones node dist/admin-cli.js zones list # Add a new memory zone node dist/admin-cli.js zones add projectX "Project X knowledge zone" # Delete a memory zone node dist/admin-cli.js zones delete projectX [--yes] # Show statistics for a specific zone node dist/admin-cli.js zones stats projectX # Reset all zones or a specific zone node dist/admin-cli.js reset [zone] [--yes] # Show help node dist/admin-cli.js help ``` ## Memory Zones The knowledge graph supports multiple memory zones to organize domain-specific knowledge. This allows you to: 1. **Partition Knowledge**: Separate data into different domains (projects, departments, etc.) 2. **Improve Query Performance**: Search within specific zones for faster and more relevant results 3. **Maintain Context**: Keep context-specific information isolated but connected ### Working with Zones ```bash # Create a new zone node dist/admin-cli.js zones add projectX "Project X knowledge zone" # List all zones node dist/admin-cli.js zones list # Import data into a specific zone node dist/admin-cli.js import project-data.json projectX # Search within a specific zone node dist/admin-cli.js search "feature" projectX ``` ### Cross-Zone Relations Entities in different zones can be related to each other. When creating a relation, you can specify the zones for both entities: ```json { "type": "relation", "from": "Project Feature", "fromZone": "projectX", "to": "General Concept", "toZone": "default", "relationType": "implements" } ``` ### Automation Support For scripting and automation, you can use the `--yes` or `-y` flag to skip confirmation prompts: ```bash # Reset without confirmation node dist/admin-cli.js reset --yes # Delete a zone without confirmation node dist/admin-cli.js zones delete projectX --yes # Restore from backup without confirmation node dist/admin-cli.js restore backup.json --yes ``` ### Search Examples The Elasticsearch-backed knowledge graph provides powerful search capabilities: ```bash # Basic search node dist/admin-cli.js search "cordova plugin" # Search in a specific zone node dist/admin-cli.js search "feature" projectX # Fuzzy search (will find "subscription" even with typo) node dist/admin-cli.js search "subscrption" # Person search node dist/admin-cli.js search "Jean" ``` Search results include: - Relevancy scoring - Highlighted matches showing where the terms were found - Entity types and observation counts - Sorted by most relevant first ## MCP Server Tools The MCP server exposes the following tools for interacting with the knowledge graph: ### Entity Operations | Tool | Description | |------|-------------| | `create_entities` | Create one or more entities in the knowledge graph | | `update_entities` | Update properties of existing entities | | `delete_entities` | Delete one or more entities from the knowledge graph | | `add_observations` | Add observations to an existing entity | | `mark_important` | Mark an entity as important or not | ### Relation Operations | Tool | Description | |------|-------------| | `create_relations` | Create relations between entities | | `delete_relations` | Delete relations between entities | ### Query Operations | Tool | Description | |------|-------------| | `search_nodes` | Search for entities using Elasticsearch query capabilities | | `open_nodes` | Get details about specific entities by name | | `get_recent` | Get recently accessed entities | Each tool can include an optional `memory_zone` parameter to specify which zone to operate on. ## Relevancy Ranking The knowledge graph implements a sophisticated relevancy ranking system that considers: 1. **Text Relevance**: How well entities match the search query 2. **Recency**: Prioritizes recently accessed entities 3. **Importance**: Entities marked as important receive higher ranking 4. **Usage Frequency**: Entities accessed more frequently rank higher This approach simulates memory-like behavior where important, recent, and frequently accessed information is prioritized. ## Benefits Over JSON Implementation - **Scalability**: Handles millions of entities efficiently - **Performance**: Optimized for fast queries even with large datasets - **Rich Queries**: Advanced search capabilities like fuzzy matching and relevancy ranking - **Resiliency**: Better handling of concurrent operations - **Observability**: Built-in monitoring and diagnostics - **Complete CRUD**: Full lifecycle management for entities and relations ## License MIT