Skip to main content
Glama
orneryd

M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
DOCKER_SYNC_RAG_STRATEGY.mdβ€’55.7 kB
# Docker Bind Mount RAG Strategy (Strategy 1 - Revised) **Date:** 2025-10-17 **Status:** Design Phase **Version:** 2.0 **Related:** [DOCKER_FILE_INDEXING_STRATEGY.md](DOCKER_FILE_INDEXING_STRATEGY.md), [FILE_INDEXING_SYSTEM.md](FILE_INDEXING_SYSTEM.md) --- ## πŸ“‹ Executive Summary **Strategy:** Use Docker bind mounts with polling-based file watching combined with Neo4j full-text search and graph traversal to automatically enrich query responses and node retrievals with relevant file content, code snippets, and contextual information. Vector embeddings are **optional** and can be added later without changing the core architecture. **Key Innovation:** When an agent queries for "orchestration", the system automatically: 1. **Searches** indexed codebase for related files/nodes (full-text + graph traversal) 2. **Ranks** results by relevance (keyword match, graph distance, *optionally* vector similarity) 3. **Enriches** response with file content, function signatures, and relationships 4. **Returns** comprehensive context without additional tool calls **Phase 1 (Current):** Full-text + Graph traversal only **Phase 2 (Future):** Add vector embeddings for semantic search (no architecture changes needed) **Goal:** Zero-friction RAG where every query automatically includes relevant codebase context. --- ## 🎯 Architecture Overview ### High-Level Flow ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ User/Agent Query β”‚ β”‚ "Show me nodes related to orchestration" β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ MCP Graph Search Tool β”‚ β”‚ memory_search_nodes() β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Neo4j Multi-Strategy Search β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Full-Text β”‚ Graph Traversal β”‚ [Vector - Future] β”‚ β”‚ β”‚ β”‚ Index β”‚ (RELATES_TO edges) β”‚ (optional Phase 2) β”‚ β”‚ β”‚ β”‚ (Phase 1) β”‚ (Phase 1) β”‚ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Result Ranking & Fusion β”‚ β”‚ β€’ Phase 1: Combine keyword + graph signals β”‚ β”‚ β€’ Score: 0.6*text + 0.4*graph β”‚ β”‚ β€’ Phase 2: Add vector (0.4*text + 0.3*vector + 0.3*graph) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Context Enrichment Pipeline β”‚ β”‚ 1. Fetch matched nodes (files, functions, TODOs) β”‚ β”‚ 2. Traverse graph for related entities (1-2 hops) β”‚ β”‚ 3. Load file content from synced volume β”‚ β”‚ 4. Extract relevant code snippets β”‚ β”‚ 5. Include import relationships β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Enriched Response β”‚ β”‚ { β”‚ β”‚ "query": "orchestration", β”‚ β”‚ "nodes": [...], β”‚ β”‚ "related_files": [ β”‚ β”‚ { β”‚ β”‚ "path": "src/orchestration/WorkflowManager.ts", β”‚ β”‚ "content": "...", β”‚ β”‚ "relevance": 0.95, β”‚ β”‚ "reason": "keyword_match + imports_orchestration" β”‚ β”‚ } β”‚ β”‚ ], β”‚ β”‚ "related_functions": [...], β”‚ β”‚ "related_concepts": [...] β”‚ β”‚ } β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` --- ## πŸ”„ Docker Bind Mount Strategy ### Why Bind Mounts? (Revised Decision) **Simplicity:** - βœ… No external dependencies (no mutagen/docker-sync) - βœ… Native Docker feature - βœ… Simpler configuration and deployment - βœ… Works out-of-the-box **File Access:** - βœ… Direct host filesystem access - βœ… No sync latency (immediate visibility) - βœ… Easy to mount multiple directories **File Watching:** - ⚠️ Events don't propagate on macOS/Windows - βœ… Solved with polling (chokidar `usePolling: true`) - βœ… 1s polling interval = acceptable latency - βœ… 2-5% CPU overhead (reasonable trade-off) **Trade-offs:** - ⚠️ Polling required for file events - ⚠️ Slightly higher CPU usage (~2-5%) - ⚠️ 1-2s event detection latency - βœ… But: No external tools, simpler setup **Decision:** For a development/single-machine tool, simplicity > performance. Bind mounts are the right choice. --- ### Docker Compose Configuration ```yaml # docker compose.yml version: '3.8' services: mcp-server: build: . container_name: mcp-graph-rag ports: - "3000:3000" volumes: # Bind mount: Root directory containing all projects (read-only) # ${WORKSPACE_ROOT} comes from .env file or environment variable # Example: WORKSPACE_ROOT=/Users/timothysweet/src # This mounts your entire src directory into /workspace/src in the container - ${WORKSPACE_ROOT:~/src}:/workspace/src:ro # Persistent config and data (read-write) - ./data:/app/data environment: # File indexing config DOCKER_ENV: "true" FILE_WATCH_ENABLED: "true" FILE_WATCH_POLLING: "true" # Required for bind mounts FILE_WATCH_INTERVAL: "1000" # Poll every 1s WORKSPACE_PATH: "/workspace/src" WATCH_CONFIG_PATH: "/app/data/watch-config.json" # RAG config RAG_AUTO_ENRICH: "true" RAG_MAX_RELATED_FILES: "10" RAG_MAX_DEPTH: "2" RAG_MIN_RELEVANCE: "0.3" # Neo4j connection NEO4J_URI: bolt://neo4j:7687 NEO4J_USER: neo4j NEO4J_PASSWORD: ${NEO4J_PASSWORD} depends_on: neo4j: condition: service_healthy networks: - app-network neo4j: image: neo4j:5.15-community ports: - "7474:7474" - "7687:7687" environment: NEO4J_AUTH: neo4j/${NEO4J_PASSWORD} NEO4J_PLUGINS: '["apoc"]' # Enable full-text indexes NEO4J_dbms_security_procedures_unrestricted: apoc.* volumes: - ./data/neo4j:/data - ./data/neo4j-logs:/logs healthcheck: test: ["CMD", "cypher-shell", "-u", "neo4j", "-p", "${NEO4J_PASSWORD}", "RETURN 1"] interval: 10s timeout: 5s retries: 5 networks: - app-network networks: app-network: ``` **Concrete Example (What Actually Happens):** ```yaml # If your .env file contains: # WORKSPACE_ROOT=/Users/timothysweet/src # Then Docker resolves this: volumes: - ${WORKSPACE_ROOT:~/src}:/workspace/src:ro # To this actual mount: volumes: - /Users/timothysweet/src:/workspace/src:ro # Meaning: # Host path: /Users/timothysweet/src # Container path: /workspace/src # Mode: :ro (read-only) ``` **Alternative: Hardcoded Path (No Environment Variable)** ```yaml # If you prefer not to use environment variables: volumes: - /Users/timothysweet/src:/workspace/src:ro - ./data:/app/data # This works too! Just less flexible across different machines. ``` ### Real-World Example: Mount Root, Configure What to Index **Scenario:** You have multiple projects under `/Users/timothysweet/src/` but only want to index specific ones. **Strategy:** 1. Mount the entire `/src/` directory (read-only) 2. Use `watch-config.json` to specify which subdirectories to index 3. Add/remove folders via MCP tools (`watch_folder` / `unwatch_folder`) **Host Path:** `/Users/timothysweet/src` **Container Path:** `/workspace/src` ```yaml # docker compose.yml services: mcp-server: volumes: # Mount entire src directory (read-only) - /Users/timothysweet/src:/workspace/src:ro # Data and config (read-write) - ./data:/app/data ``` **Inside the container, the file structure will be:** ``` /workspace/src/ β”œβ”€β”€ GRAPH-RAG-TODO-main/ β”‚ β”œβ”€β”€ src/ β”‚ β”œβ”€β”€ docs/ β”‚ └── package.json β”œβ”€β”€ my-web-app/ β”‚ β”œβ”€β”€ components/ β”‚ └── pages/ β”œβ”€β”€ research-project/ β”‚ └── experiments/ └── personal-notes/ └── ideas/ ``` **Configuration Storage: Neo4j (Not JSON Files)** Instead of flat file config, watch configurations are stored as Neo4j nodes: ```cypher // Watch configuration nodes in Neo4j CREATE (w:WatchConfig { id: 'watch-1-1234567890', path: '/workspace/src/GRAPH-RAG-TODO-main', recursive: true, debounce_ms: 500, generate_embeddings: false, added_date: '2025-10-17T10:30:00Z', last_indexed: '2025-10-17T10:35:00Z', file_patterns: ['*.ts', '*.js', '*.md'], ignore_patterns: ['node_modules/', 'dist/', '*.test.ts'], status: 'active', files_indexed: 147 }); CREATE (w:WatchConfig { id: 'watch-2-1234567891', path: '/workspace/src/my-web-app', recursive: true, debounce_ms: 500, generate_embeddings: false, added_date: '2025-10-17T11:45:00Z', last_indexed: '2025-10-17T11:50:00Z', file_patterns: ['*.tsx', '*.ts', '*.jsx', '*.js'], ignore_patterns: ['node_modules/', '.next/', 'build/'], status: 'active', files_indexed: 89 }); ``` **Benefits of Neo4j Storage:** - βœ… Single source of truth (no file sync issues) - βœ… Transactional updates (no race conditions) - βœ… Query watch status alongside indexed files - βœ… Automatic persistence (Neo4j volume) - βœ… Simple startup query (no complex relationships needed) **To add a folder for indexing via MCP tool:** ```typescript // Agent calls: await mcp.call('watch_folder', { path: '/workspace/src/research-project', // Container path recursive: true, file_patterns: ['*.py', '*.ipynb', '*.md'] }); // System validates and executes: // 1. βœ… Check if path exists on filesystem // 2. βœ… Create WatchConfig node in Neo4j // 3. βœ… Start chokidar watcher // 4. βœ… Return watch_id: "watch-3-xxx" // Response: { "watch_id": "watch-3-xxx", "path": "/workspace/src/research-project", "status": "active", "message": "Folder is now being watched. Indexing will begin." } ``` **To manually trigger indexing (with validation):** ```typescript // Agent calls: await mcp.call('index_folder', { path: '/workspace/src/research-project' }); // System validates: // 1. βœ… Check if path exists on filesystem // 2. βœ… Check if path is in WatchConfig (must be watched first!) // 3. βœ… If valid: index all matching files // 4. ❌ If invalid: return error with instructions // Success response: { "status": "success", "path": "/workspace/src/research-project", "files_indexed": 47, "elapsed_ms": 1250 } // Error response (not watched): { "status": "error", "error": "folder_not_watched", "message": "Path '/workspace/src/research-project' is not in watch list. Call watch_folder first.", "path": "/workspace/src/research-project" } // Error response (doesn't exist): { "status": "error", "error": "path_not_found", "message": "Path '/workspace/src/nonexistent' does not exist.", "path": "/workspace/src/nonexistent" } ``` **To remove a folder from indexing:** ```typescript // Agent calls: await mcp.call('unwatch_folder', { path: '/workspace/src/personal-notes' }); // System: // 1. Stops chokidar watcher // 2. Updates WatchConfig node: status = 'inactive' // 3. Optionally: Removes indexed files from Neo4j // Response: { "status": "success", "path": "/workspace/src/personal-notes", "files_removed": 23, "message": "Folder watch stopped and indexed data removed." } ``` **Query example after indexing:** ```typescript // Agent searches across ALL indexed projects: await mcp.call('memory_search_nodes', { query: 'GraphManager Neo4j' }); // Response includes results from: // - /workspace/src/GRAPH-RAG-TODO-main/src/managers/GraphManager.ts // - /workspace/src/my-web-app/lib/graphql-client.ts (if related) // - Cross-project relationships automatically discovered ``` **Benefits:** - βœ… Single mount point for all projects - βœ… Selective indexing via config file - βœ… Easy to add/remove projects dynamically - βœ… Config persists across restarts - βœ… Cross-project search and relationships --- ### Environment Configuration ```bash # .env (Real Example) NEO4J_PASSWORD=your_secure_password # Mount root directory containing all projects WORKSPACE_ROOT=/Users/timothysweet/src # File watching config FILE_WATCH_ENABLED=true FILE_WATCH_POLLING=true FILE_WATCH_INTERVAL=1000 WATCH_CONFIG_PATH=/app/data/watch-config.json # RAG config RAG_AUTO_ENRICH=true RAG_MAX_RELATED_FILES=10 RAG_MAX_DEPTH=2 RAG_MIN_RELEVANCE=0.3 ``` **Key Concept:** - Mount ONE broad directory (`WORKSPACE_ROOT`) - Config file controls WHICH subdirectories to index - Add/remove via MCP tools, not environment variables ### Starting the System **Example 1: Mount Root Directory** ```bash # Set environment variable (mount entire src directory) export WORKSPACE_ROOT=/Users/timothysweet/src # Start Docker services docker compose up -d # Verify the mount worked (all projects visible) docker exec mcp-graph-rag ls -la /workspace/src # You should see ALL your projects: # drwxr-xr-x GRAPH-RAG-TODO-main/ # drwxr-xr-x my-web-app/ # drwxr-xr-x research-project/ # drwxr-xr-x personal-notes/ # Check logs docker compose logs -f mcp-server # Expected output (NO watches yet - empty config): # πŸ“‚ Loading persisted watches... # πŸ“‚ Loading 0 persisted watches... # 🎯 File watching initialized (0 active) ``` **Example 2: Add Projects to Index (via MCP)** ```typescript // Agent adds first project await mcp.call('watch_folder', { path: '/workspace/src/GRAPH-RAG-TODO-main', recursive: true, file_patterns: ['*.ts', '*.js', '*.md'] }); // Response: { "status": "success", "watch_id": "watch-1-xxx", "folder": "/workspace/src/GRAPH-RAG-TODO-main", "files_indexed": 147, "persisted": true } // Agent adds second project await mcp.call('watch_folder', { path: '/workspace/src/my-web-app', recursive: true, file_patterns: ['*.tsx', '*.ts', '*.jsx'] }); // Both entries now in ./data/watch-config.json // On restart: Both automatically resume watching ``` **Example 3: Check What's Being Watched** ```typescript // Agent checks current watches await mcp.call('list_folders'); // Response: { "watches": [ { "watch_id": "watch-1-xxx", "folder": "/workspace/src/GRAPH-RAG-TODO-main", "recursive": true, "files_indexed": 147, "last_update": "2025-10-17T10:30:00Z", "active": true }, { "watch_id": "watch-2-xxx", "folder": "/workspace/src/my-web-app", "recursive": true, "files_indexed": 89, "last_update": "2025-10-17T11:45:00Z", "active": true } ], "total": 2 } ``` **Example 4: Remove a Folder from Indexing** ```typescript // Agent removes folder await mcp.call('unwatch_folder', { path: '/workspace/src/personal-notes' }); // System: // 1. Stops file watching // 2. Removes from ./data/watch-config.json // 3. Optionally: Purges indexed data from Neo4j // Next restart: Only remaining folders in config auto-resume ``` ### Adding New Folders Dynamically **Option 1: Via MCP Tool (Recommended)** ```typescript // Agent calls this tool await mcp.call('watch_folder', { path: '/workspace/project1', // Inside container recursive: true }); // System: // 1. Starts watching /workspace/project1 // 2. Saves to ./data/watch-config.json // 3. On restart: Automatically resumes watching ``` **Option 2: Update docker compose.yml (Requires Restart)** ```yaml # Add new bind mount volumes: - /Users/username/new-project:/workspace/project3:ro ``` ```bash docker compose down docker compose up -d ``` --- ### Complete Workflow Example **Step 1: Setup and Mount Root Directory** ```bash # Terminal 1: Set up environment cd /Users/timothysweet/src/GRAPH-RAG-TODO-main # Create .env file cat > .env << EOF NEO4J_PASSWORD=mysecurepassword WORKSPACE_ROOT=/Users/timothysweet/src EOF # Verify docker compose.yml volumes section # volumes: # - ${WORKSPACE_ROOT}:/workspace/src:ro # - ./data:/app/data # Start services docker compose up -d # Verify ALL projects are visible (but not indexed yet) docker exec mcp-graph-rag ls /workspace/src # Output: GRAPH-RAG-TODO-main/ my-web-app/ research-project/ # Check specific project docker exec mcp-graph-rag ls /workspace/src/GRAPH-RAG-TODO-main/src # Output: index.ts managers/ tools/ types/ utils/ # Check logs - no watches yet docker compose logs mcp-server # Output: πŸ“‚ Loading 0 persisted watches... ``` **Step 2: Index Specific Projects via MCP Tool** ```typescript // Agent (via Cursor MCP) adds GRAPH-RAG-TODO for indexing: await mcp.call('watch_folder', { path: '/workspace/src/GRAPH-RAG-TODO-main', recursive: true, file_patterns: ['*.ts', '*.js', '*.md'] }); // Response: { "status": "success", "watch_id": "watch-1-1234567890", "folder": "/workspace/src/GRAPH-RAG-TODO-main", "files_indexed": 147, "persisted": true, "message": "Watching folder. Will re-index on file changes." } // Agent adds another project: await mcp.call('watch_folder', { path: '/workspace/src/my-web-app', recursive: true, file_patterns: ['*.tsx', '*.ts', '*.jsx'] }); // Response: { "status": "success", "watch_id": "watch-2-1234567891", "folder": "/workspace/src/my-web-app", "files_indexed": 89, "persisted": true } // Both entries saved to ./data/watch-config.json // Both will auto-resume on restart ``` **Step 3: Query for Context** ```typescript // Agent searches for "GraphManager" await mcp.call('memory_search_nodes', { query: 'GraphManager Neo4j implementation', max_results: 5, include_content: true }); // Response (auto-enriched): { "query": "GraphManager Neo4j implementation", "results": [ { "id": "file-1-xxx", "type": "file", "properties": { "path": "src/managers/GraphManager.ts", "language": "typescript", "line_count": 605 }, "relevance": 0.95 } ], "related_files": [ { "path": "src/managers/GraphManager.ts", "content": "import neo4j from 'neo4j-driver';\n\nexport class GraphManager implements IGraphManager {\n private driver: neo4j.Driver;\n \n async addNode(type: string, properties: any): Promise<Node> {\n // ... full implementation ...\n }\n // ... 600 lines of code ...\n}", "language": "typescript", "relevance": 0.95, "reason": "RELATES_TO β†’ keyword_match", "distance": 1 }, { "path": "src/types/IGraphManager.ts", "content": "export interface IGraphManager {\n addNode(type: string, properties: any): Promise<Node>;\n // ...\n}", "relevance": 0.82, "reason": "IMPORTS", "distance": 1 } ], "related_functions": [ { "name": "addNode", "signature": "addNode(type: string, properties: any): Promise<Node>", "file_path": "src/managers/GraphManager.ts", "line_start": 45, "line_end": 78, "body": "async addNode(type: string, properties: any): Promise<Node> {\n const session = this.driver.session();\n // ...\n}", "relevance": 0.90 } ], "execution_time_ms": 145 } ``` **Step 4: Agent Uses Context** ```typescript // Agent now has: // βœ… Full GraphManager.ts source code // βœ… Interface definition // βœ… Related functions with implementations // βœ… Import relationships // Agent can answer questions like: // - "How does addNode work?" β†’ See full implementation // - "What methods are available?" β†’ See interface // - "Where is Neo4j driver used?" β†’ See imports and usage // All from a SINGLE tool call! ``` **Step 5: File Changes Auto-Detected** ```bash # Terminal 2: Make a change to an indexed project echo "// New feature" >> /Users/timothysweet/src/GRAPH-RAG-TODO-main/src/managers/GraphManager.ts # System automatically (within 1-2 seconds): # 1. Detects file change (via polling) # 2. Re-indexes GraphManager.ts # 3. Updates Neo4j graph # 4. Next query includes updated content # Check logs: docker compose logs -f mcp-server # πŸ“ File changed: /workspace/src/GRAPH-RAG-TODO-main/src/managers/GraphManager.ts # βœ… reindex: /workspace/src/GRAPH-RAG-TODO-main/src/managers/GraphManager.ts (87ms) # Make a change to non-indexed project (ignored) echo "// Comment" >> /Users/timothysweet/src/personal-notes/ideas.txt # No logs - not in watch-config.json, so not monitored ``` **Step 6: Remove a Project from Indexing** ```typescript // Agent decides to stop indexing my-web-app await mcp.call('unwatch_folder', { path: '/workspace/src/my-web-app' }); // Response: { "status": "success", "folder": "/workspace/src/my-web-app", "removed": true, "message": "Folder watch removed and config updated" } // System: // 1. Stops watching /workspace/src/my-web-app // 2. Removes entry from ./data/watch-config.json // 3. On restart: Only GRAPH-RAG-TODO-main auto-resumes ``` --- ## πŸ“Š Neo4j Graph Schema for RAG ### Node Types ```cypher // File nodes - Source code files CREATE (f:File { id: "file-1-xxx", type: "file", path: "src/orchestration/WorkflowManager.ts", name: "WorkflowManager.ts", language: "typescript", content: "...", // Full file content size_bytes: 15234, line_count: 456, last_modified: datetime(), indexed_date: datetime(), hash: "sha256-xxx" }) // Function nodes - Extracted functions CREATE (fn:Function { id: "func-1-xxx", type: "function", name: "orchestrateWorkflow", signature: "orchestrateWorkflow(config: WorkflowConfig): Promise<void>", file_path: "src/orchestration/WorkflowManager.ts", line_start: 45, line_end: 78, body: "...", // Full function body docstring: "Orchestrates multi-agent workflow execution", complexity: 12, async: true }) // Class nodes CREATE (c:Class { id: "class-1-xxx", type: "class", name: "WorkflowOrchestrator", file_path: "src/orchestration/WorkflowManager.ts", line_start: 10, line_end: 150, docstring: "Manages workflow orchestration and agent coordination", methods: ["start", "stop", "pause", "resume"] }) // Concept nodes - Abstract concepts CREATE (concept:Concept { id: "concept-1-xxx", type: "concept", name: "orchestration", description: "Multi-agent workflow coordination and execution", keywords: ["workflow", "orchestration", "agent", "coordination"], created: datetime() }) // TODO nodes (existing) CREATE (todo:Todo { id: "todo-1-xxx", type: "todo", title: "Implement workflow orchestration", description: "Add multi-agent orchestration support", status: "in_progress", priority: "high" }) ``` ### Relationship Types ```cypher // File contains function CREATE (file:File)-[:CONTAINS {line_start: 45}]->(func:Function) // File imports another file CREATE (file1:File)-[:IMPORTS { import_statement: "import { Orchestrator } from './orchestration'" }]->(file2:File) // Function calls another function CREATE (func1:Function)-[:CALLS { line: 56, call_type: "direct" }]->(func2:Function) // TODO references file CREATE (todo:Todo)-[:REFERENCES { line: 42, context: "Implementation needed here" }]->(file:File) // Node relates to concept (for semantic search) CREATE (file:File)-[:RELATES_TO { relevance: 0.85, reason: "keyword_match" }]->(concept:Concept) CREATE (func:Function)-[:RELATES_TO { relevance: 0.95, reason: "name_match + docstring" }]->(concept:Concept) // Class extends another class CREATE (class1:Class)-[:EXTENDS]->(class2:Class) // Function implements interface CREATE (func:Function)-[:IMPLEMENTS]->(interface:Interface) ``` --- ## πŸ” RAG Search Implementation ### Multi-Strategy Search ```typescript // src/rag/RAGSearchEngine.ts import { Driver, Session } from 'neo4j-driver'; export interface RAGSearchOptions { query: string; maxResults?: number; minRelevance?: number; includeContent?: boolean; maxDepth?: number; strategies?: ('fulltext' | 'graph' | 'vector')[]; // Phase 1: fulltext+graph, Phase 2: add vector enableVectorSearch?: boolean; // Future: enable when embeddings are added } export interface RAGSearchResult { query: string; results: EnrichedNode[]; related_files: FileContext[]; related_functions: FunctionContext[]; related_concepts: ConceptContext[]; execution_time_ms: number; } export class RAGSearchEngine { constructor( private driver: Driver, private enableVectorSearch: boolean = false // Phase 1: disabled by default ) {} async search(options: RAGSearchOptions): Promise<RAGSearchResult> { const startTime = Date.now(); // 1. Execute multi-strategy search (Phase 1: fulltext + graph only) const searchPromises = [ this.fulltextSearch(options.query), this.graphSearch(options.query) ]; // Phase 2: Add vector search when enabled if (this.enableVectorSearch && options.enableVectorSearch !== false) { searchPromises.push(this.vectorSearch(options.query)); } const results = await Promise.all(searchPromises); const [fulltextResults, graphResults, vectorResults] = results; // 2. Fuse and rank results const rankedNodes = this.fuseResults( fulltextResults, graphResults, vectorResults || [], // Empty array if vector search disabled options.minRelevance || 0.3 ); // 3. Enrich with context const enrichedResults = await this.enrichWithContext( rankedNodes.slice(0, options.maxResults || 10), options.maxDepth || 2 ); return { query: options.query, results: enrichedResults.nodes, related_files: enrichedResults.files, related_functions: enrichedResults.functions, related_concepts: enrichedResults.concepts, execution_time_ms: Date.now() - startTime }; } // Strategy 1: Full-text search (keyword matching) private async fulltextSearch(query: string): Promise<SearchMatch[]> { const session = this.driver.session(); try { const result = await session.run(` // Search file content CALL db.index.fulltext.queryNodes('file_content', $query) YIELD node, score WHERE node:File RETURN node.id AS id, 'file' AS type, score AS relevance, 'fulltext_content' AS match_type UNION // Search function names and docstrings CALL db.index.fulltext.queryNodes('function_search', $query) YIELD node, score WHERE node:Function RETURN node.id AS id, 'function' AS type, score AS relevance, 'fulltext_function' AS match_type UNION // Search class names and docstrings CALL db.index.fulltext.queryNodes('class_search', $query) YIELD node, score WHERE node:Class RETURN node.id AS id, 'class' AS type, score AS relevance, 'fulltext_class' AS match_type ORDER BY relevance DESC LIMIT 50 `, { query }); return result.records.map(r => ({ id: r.get('id'), type: r.get('type'), relevance: r.get('relevance'), matchType: r.get('match_type') })); } finally { await session.close(); } } // Strategy 2: Vector similarity search (PHASE 2 - FUTURE) // This method is prepared but not used in Phase 1 private async vectorSearch(query: string): Promise<SearchMatch[]> { // Phase 1: Return empty results (vector search disabled) if (!this.enableVectorSearch) { return []; } // Phase 2: Implement when embeddings are added const session = this.driver.session(); try { // Generate embedding for query (would use actual embedding service) const queryEmbedding = await this.generateEmbedding(query); const result = await session.run(` // Vector similarity search using cosine distance CALL db.index.vector.queryNodes('file_embeddings', $k, $embedding) YIELD node, score RETURN node.id AS id, node.type AS type, score AS relevance, 'vector_similarity' AS match_type ORDER BY score DESC `, { k: 50, // Top 50 results embedding: queryEmbedding }); return result.records.map(r => ({ id: r.get('id'), type: r.get('type'), relevance: r.get('relevance'), matchType: r.get('match_type') })); } finally { await session.close(); } } // Strategy 3: Graph traversal (relationship-based) private async graphSearch(query: string): Promise<SearchMatch[]> { const session = this.driver.session(); try { // Find concept nodes matching query, then traverse to related entities const result = await session.run(` // Find matching concepts MATCH (c:Concept) WHERE c.name CONTAINS toLower($query) OR any(kw IN c.keywords WHERE kw CONTAINS toLower($query)) // Traverse to related nodes OPTIONAL MATCH (c)<-[r:RELATES_TO]-(entity) WHERE entity:File OR entity:Function OR entity:Class WITH entity, avg(r.relevance) AS avg_relevance, count(r) AS relation_count WHERE entity IS NOT NULL RETURN entity.id AS id, entity.type AS type, (avg_relevance * log(1 + relation_count)) AS relevance, 'memory_traversal' AS match_type ORDER BY relevance DESC LIMIT 50 `, { query }); return result.records.map(r => ({ id: r.get('id'), type: r.get('type'), relevance: r.get('relevance'), matchType: r.get('match_type') })); } finally { await session.close(); } } // Fuse results from multiple strategies private fuseResults( fulltext: SearchMatch[], graph: SearchMatch[], vector: SearchMatch[], // Empty array in Phase 1 minRelevance: number ): RankedNode[] { const nodeScores = new Map<string, { id: string; type: string; scores: { [key: string]: number }; matchTypes: string[]; }>(); // Determine weights based on whether vector search is enabled const hasVector = vector.length > 0; const weights = hasVector ? { fulltext: 0.4, graph: 0.3, vector: 0.3 } // Phase 2: All strategies : { fulltext: 0.6, graph: 0.4, vector: 0.0 }; // Phase 1: No vector // Aggregate scores by node ID [ ...fulltext.map(m => ({ ...m, strategy: 'fulltext', weight: weights.fulltext })), ...graph.map(m => ({ ...m, strategy: 'graph', weight: weights.graph })), ...vector.map(m => ({ ...m, strategy: 'vector', weight: weights.vector })) ].forEach(match => { const existing = nodeScores.get(match.id) || { id: match.id, type: match.type, scores: {}, matchTypes: [] }; existing.scores[match.strategy] = match.relevance * match.weight; existing.matchTypes.push(match.matchType); nodeScores.set(match.id, existing); }); // Calculate final scores and rank const ranked = Array.from(nodeScores.values()) .map(node => ({ id: node.id, type: node.type, relevance: Object.values(node.scores).reduce((a, b) => a + b, 0), matchTypes: [...new Set(node.matchTypes)], scoreBreakdown: node.scores })) .filter(node => node.relevance >= minRelevance) .sort((a, b) => b.relevance - a.relevance); return ranked; } // Enrich results with context private async enrichWithContext( nodes: RankedNode[], maxDepth: number ): Promise<EnrichedResults> { const session = this.driver.session(); try { const nodeIds = nodes.map(n => n.id); const result = await session.run(` // Get primary nodes MATCH (n) WHERE n.id IN $nodeIds // Traverse to related entities (up to maxDepth hops) OPTIONAL MATCH path = (n)-[*1..${maxDepth}]-(related) WHERE related:File OR related:Function OR related:Class OR related:Concept WITH n, related, length(path) AS distance, relationships(path) AS rels // Calculate relevance based on distance WITH n, related, (1.0 / (distance + 1)) AS proximity_score, [r IN rels | type(r)] AS relationship_chain // Group results RETURN n, collect(DISTINCT { node: related, relevance: proximity_score, relationships: relationship_chain, distance: distance }) AS context `, { nodeIds, maxDepth }); // Parse and structure results const enrichedNodes: EnrichedNode[] = []; const relatedFiles: FileContext[] = []; const relatedFunctions: FunctionContext[] = []; const relatedConcepts: ConceptContext[] = []; for (const record of result.records) { const primaryNode = record.get('n'); const context = record.get('context'); // Add primary node enrichedNodes.push(await this.nodeToEnriched(primaryNode)); // Process context for (const ctx of context) { const related = ctx.node; if (!related) continue; const relatedType = related.properties.type; if (relatedType === 'file') { relatedFiles.push({ path: related.properties.path, content: related.properties.content, language: related.properties.language, relevance: ctx.relevance, reason: ctx.relationships.join(' β†’ '), distance: ctx.distance }); } else if (relatedType === 'function') { relatedFunctions.push({ name: related.properties.name, signature: related.properties.signature, file_path: related.properties.file_path, line_start: related.properties.line_start, line_end: related.properties.line_end, body: related.properties.body, docstring: related.properties.docstring, relevance: ctx.relevance, reason: ctx.relationships.join(' β†’ ') }); } else if (relatedType === 'concept') { relatedConcepts.push({ name: related.properties.name, description: related.properties.description, keywords: related.properties.keywords, relevance: ctx.relevance }); } } } // Deduplicate and sort by relevance return { nodes: enrichedNodes, files: this.deduplicateAndSort(relatedFiles, 'path'), functions: this.deduplicateAndSort(relatedFunctions, 'signature'), concepts: this.deduplicateAndSort(relatedConcepts, 'name') }; } finally { await session.close(); } } private async nodeToEnriched(node: any): Promise<EnrichedNode> { const props = node.properties; return { id: props.id, type: props.type, properties: props, created: props.created, updated: props.updated }; } private deduplicateAndSort<T extends { relevance: number }>( items: T[], key: keyof T ): T[] { const seen = new Set<any>(); return items .filter(item => { const k = item[key]; if (seen.has(k)) return false; seen.add(k); return true; }) .sort((a, b) => b.relevance - a.relevance) .slice(0, 10); // Limit to top 10 } private async generateEmbedding(text: string): Promise<number[]> { // PHASE 2 - FUTURE: Integrate with embedding service // Options: OpenAI text-embedding-3-small, Voyage Code, local Sentence-Transformers // For now, this method is not called (enableVectorSearch = false) throw new Error('Vector embeddings not yet implemented. Enable in Phase 2.'); } } ``` --- ## πŸ› οΈ Neo4j Index Setup ### Phase 1: Full-Text Indexes (Required) ```cypher // File content search CREATE FULLTEXT INDEX file_content FOR (n:File) ON EACH [n.content, n.path, n.name]; // Function search CREATE FULLTEXT INDEX function_search FOR (n:Function) ON EACH [n.name, n.signature, n.docstring, n.body]; // Class search CREATE FULLTEXT INDEX class_search FOR (n:Class) ON EACH [n.name, n.docstring]; // Concept search CREATE FULLTEXT INDEX concept_search FOR (n:Concept) ON EACH [n.name, n.description, n.keywords]; ``` ### Phase 2: Vector Indexes (Future - Optional) **⚠️ Not implemented yet. Add these when embedding support is ready.** ```cypher // File embeddings CREATE VECTOR INDEX file_embeddings FOR (n:File) ON n.embedding OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } }; // Function embeddings CREATE VECTOR INDEX function_embeddings FOR (n:Function) ON n.embedding OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } }; ``` --- ## 🎯 MCP Tool Integration ### Updated `memory_search_nodes` Tool ```typescript // src/tools/graph.tools.ts export const GRAPH_SEARCH_TOOL = { name: "memory_search_nodes", description: ` Search the knowledge graph and automatically retrieve related files, functions, and context. **RAG-Enhanced Search:** Results automatically include: - Matched nodes (files, functions, classes, TODOs) - Related file content (full source code) - Related functions (signatures + bodies) - Related concepts and keywords - Graph relationships explaining relevance **Example Usage:** query = "orchestration" Response includes: - All files containing "orchestration" keyword - Functions with "orchestrate" in name/docstring - Classes related to workflow management - TODOs referencing orchestration - Import chains and call graphs **Search Strategies:** 1. Full-text: Keyword matching in content 2. Vector: Semantic similarity (if embeddings enabled) 3. Graph: Relationship traversal from concept nodes **When to use:** - Exploring codebase: "Show me authentication code" - Finding implementations: "Where is error handling?" - Understanding architecture: "How does caching work?" - Context for TODO: "What files relate to this bug?" **Memory Tip:** Use this instead of manually listing files. The graph knows relationships! `, inputSchema: { type: "object", properties: { query: { type: "string", description: "Search query (keywords, concepts, or natural language)" }, max_results: { type: "number", description: "Maximum number of primary results (default: 10)", default: 10 }, min_relevance: { type: "number", description: "Minimum relevance score 0-1 (default: 0.3)", default: 0.3 }, include_content: { type: "boolean", description: "Include full file content in response (default: true)", default: true }, max_depth: { type: "number", description: "Max graph traversal depth for context (default: 2)", default: 2 } }, required: ["query"] } }; // Tool handler export async function handleGraphSearch( args: any, ragEngine: RAGSearchEngine ): Promise<ToolResponse> { const result = await ragEngine.search({ query: args.query, maxResults: args.max_results || 10, minRelevance: args.min_relevance || 0.3, includeContent: args.include_content !== false, maxDepth: args.max_depth || 2 }); return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] }; } ``` ### Updated `memory_get_node` Tool (Auto-Enriched) ```typescript export const GRAPH_GET_NODE_TOOL = { name: "memory_get_node", description: ` Get a node by ID with automatic RAG context enrichment. **Auto-Enriched Response:** Includes: - Primary node data - Related files (imports, references) - Related functions (calls, implementations) - Related TODOs (if any) - Graph neighborhood (1-2 hops) **Example:** get_node({ id: "todo-1-xxx" }) Response includes: - The TODO itself - Files referenced in TODO context - Functions mentioned or modified - Related TODOs (same component/feature) - Recent commits touching same files **Memory Tip:** Single tool call gets comprehensive context. No need to manually traverse graph! `, inputSchema: { type: "object", properties: { id: { type: "string", description: "Node ID to retrieve" }, include_context: { type: "boolean", description: "Include related nodes (default: true)", default: true }, max_depth: { type: "number", description: "Max graph traversal depth (default: 2)", default: 2 } }, required: ["id"] } }; // Tool handler export async function handleGetNode( args: any, graphManager: GraphManager, ragEngine: RAGSearchEngine ): Promise<ToolResponse> { // Get primary node const node = await graphManager.getNode(args.id); if (!node) { return { content: [{ type: "text", text: JSON.stringify({ error: "Node not found", id: args.id }) }] }; } // Enrich with context if requested if (args.include_context !== false) { const context = await ragEngine.enrichWithContext( [{ id: node.id, type: node.type, relevance: 1.0, matchTypes: [] }], args.max_depth || 2 ); return { content: [{ type: "text", text: JSON.stringify({ node, related_files: context.files, related_functions: context.functions, related_concepts: context.concepts }, null, 2) }] }; } return { content: [{ type: "text", text: JSON.stringify({ node }, null, 2) }] }; } ``` --- ## πŸ“ Example Query: "orchestration" ### Request ```json { "tool": "memory_search_nodes", "arguments": { "query": "orchestration", "max_results": 5, "min_relevance": 0.4, "include_content": true, "max_depth": 2 } } ``` ### Response (Enriched) ```json { "query": "orchestration", "results": [ { "id": "file-1-xxx", "type": "file", "properties": { "path": "src/orchestration/WorkflowManager.ts", "name": "WorkflowManager.ts", "language": "typescript", "line_count": 456 }, "relevance": 0.95, "match_types": ["fulltext_content", "memory_traversal"], "score_breakdown": { "fulltext": 0.38, "vector": 0.27, "graph": 0.30 } }, { "id": "func-2-xxx", "type": "function", "properties": { "name": "orchestrateWorkflow", "signature": "orchestrateWorkflow(config: WorkflowConfig): Promise<void>", "file_path": "src/orchestration/WorkflowManager.ts" }, "relevance": 0.89, "match_types": ["fulltext_function", "vector_similarity"] }, { "id": "class-3-xxx", "type": "class", "properties": { "name": "WorkflowOrchestrator", "docstring": "Manages workflow orchestration and agent coordination" }, "relevance": 0.87, "match_types": ["fulltext_class", "memory_traversal"] }, { "id": "todo-4-xxx", "type": "todo", "properties": { "title": "Implement workflow orchestration", "description": "Add multi-agent orchestration support", "status": "in_progress" }, "relevance": 0.75, "match_types": ["fulltext_content"] } ], "related_files": [ { "path": "src/orchestration/WorkflowManager.ts", "content": "import { Agent } from '../agents/Agent';\nimport { WorkflowConfig } from './types';\n\n/**\n * Manages workflow orchestration and agent coordination\n */\nexport class WorkflowOrchestrator {\n private agents: Map<string, Agent> = new Map();\n \n async orchestrateWorkflow(config: WorkflowConfig): Promise<void> {\n // Initialize agents\n for (const agentConfig of config.agents) {\n const agent = await this.createAgent(agentConfig);\n this.agents.set(agent.id, agent);\n }\n \n // Execute workflow steps\n for (const step of config.steps) {\n await this.executeStep(step);\n }\n }\n \n private async executeStep(step: WorkflowStep): Promise<void> {\n const agent = this.agents.get(step.agentId);\n if (!agent) throw new Error(`Agent not found: ${step.agentId}`);\n \n const result = await agent.execute(step.task);\n return result;\n }\n}\n", "language": "typescript", "relevance": 0.95, "reason": "RELATES_TO β†’ contains", "distance": 1 }, { "path": "src/agents/Agent.ts", "content": "export interface Agent {\n id: string;\n name: string;\n execute(task: Task): Promise<Result>;\n}\n\nexport class BaseAgent implements Agent {\n constructor(\n public id: string,\n public name: string\n ) {}\n \n async execute(task: Task): Promise<Result> {\n // Base implementation\n return { success: true };\n }\n}\n", "language": "typescript", "relevance": 0.72, "reason": "IMPORTS β†’ CONTAINS", "distance": 2 }, { "path": "src/orchestration/types.ts", "content": "export interface WorkflowConfig {\n name: string;\n agents: AgentConfig[];\n steps: WorkflowStep[];\n}\n\nexport interface WorkflowStep {\n id: string;\n agentId: string;\n task: Task;\n dependencies: string[];\n}\n", "language": "typescript", "relevance": 0.68, "reason": "IMPORTS", "distance": 1 } ], "related_functions": [ { "name": "orchestrateWorkflow", "signature": "orchestrateWorkflow(config: WorkflowConfig): Promise<void>", "file_path": "src/orchestration/WorkflowManager.ts", "line_start": 45, "line_end": 78, "body": "async orchestrateWorkflow(config: WorkflowConfig): Promise<void> {\n // Initialize agents\n for (const agentConfig of config.agents) {\n const agent = await this.createAgent(agentConfig);\n this.agents.set(agent.id, agent);\n }\n \n // Execute workflow steps\n for (const step of config.steps) {\n await this.executeStep(step);\n }\n}", "docstring": "Orchestrates multi-agent workflow execution", "relevance": 0.95, "reason": "CONTAINS" }, { "name": "executeStep", "signature": "executeStep(step: WorkflowStep): Promise<void>", "file_path": "src/orchestration/WorkflowManager.ts", "line_start": 80, "line_end": 90, "body": "private async executeStep(step: WorkflowStep): Promise<void> {\n const agent = this.agents.get(step.agentId);\n if (!agent) throw new Error(`Agent not found: ${step.agentId}`);\n \n const result = await agent.execute(step.task);\n return result;\n}", "docstring": null, "relevance": 0.85, "reason": "CALLS β†’ CONTAINS" }, { "name": "createAgent", "signature": "createAgent(config: AgentConfig): Promise<Agent>", "file_path": "src/orchestration/WorkflowManager.ts", "line_start": 30, "line_end": 43, "body": "private async createAgent(config: AgentConfig): Promise<Agent> {\n const agent = new BaseAgent(config.id, config.name);\n // Configure agent\n return agent;\n}", "docstring": "Creates and configures a new agent instance", "relevance": 0.78, "reason": "CALLS" } ], "related_concepts": [ { "name": "orchestration", "description": "Multi-agent workflow coordination and execution", "keywords": ["workflow", "orchestration", "agent", "coordination", "multi-agent"], "relevance": 1.0 }, { "name": "agent-architecture", "description": "Agent-based system design patterns", "keywords": ["agent", "architecture", "design-pattern"], "relevance": 0.65 } ], "execution_time_ms": 145 } ``` --- ## πŸš€ Implementation Checklist ### **PHASE 1: Core RAG (No Vector Embeddings) - CURRENT FOCUS** #### 1.1: Docker Bind Mount Setup (Week 1) - [ ] Update `docker compose.yml` with bind mount volumes - [ ] Configure environment variables for watched folders - [ ] Test polling performance (CPU usage, event latency) - [ ] Configure chokidar with `usePolling: true` - [ ] Document setup for team #### 1.2: Neo4j Schema & Indexes (Week 1-2) - [ ] Create node types (File, Function, Class, Concept) - [ ] Create relationship types (CONTAINS, IMPORTS, CALLS, RELATES_TO) - [ ] βœ… Set up full-text indexes (REQUIRED) - [ ] ❌ Skip vector indexes (Phase 2) - [ ] Create concept nodes for common terms - [ ] Test index performance #### 1.3: RAG Search Engine - Full-Text + Graph Only (Week 2-3) - [ ] Implement `RAGSearchEngine` class with `enableVectorSearch = false` - [ ] βœ… Implement full-text search strategy - [ ] βœ… Implement graph traversal strategy - [ ] ❌ Skip vector search (Phase 2) - [ ] Implement result fusion (60% fulltext + 40% graph) - [ ] Implement context enrichment - [ ] Add caching for performance - [ ] Test with real queries #### 1.4: File Indexing Pipeline (Week 3-4) - [ ] Implement file watcher with polling - [ ] Implement file parser (TypeScript/JavaScript) - [ ] Extract functions, classes, imports - [ ] Generate concept relationships - [ ] ❌ Skip embeddings (Phase 2) - [ ] Batch indexing for existing files - [ ] Incremental updates on file changes #### 1.5: MCP Tool Integration (Week 4) - [ ] Update `memory_search_nodes` tool - [ ] Update `memory_get_node` tool - [ ] Add RAG config environment variables - [ ] Test end-to-end workflow - [ ] Performance optimization - [ ] Documentation and examples --- ### **PHASE 2: Add Vector Embeddings (Future - Optional Enhancement)** **Prerequisites:** Phase 1 complete and working #### 2.1: Embedding Service Integration - [ ] Choose embedding model (OpenAI text-embedding-3-small recommended) - [ ] Implement `generateEmbedding()` method - [ ] Add environment variables for embedding API - [ ] Test embedding generation speed #### 2.2: Neo4j Vector Indexes - [ ] Create vector indexes for File nodes - [ ] Create vector indexes for Function nodes - [ ] Test vector similarity queries - [ ] Benchmark performance vs full-text #### 2.3: Enable Vector Search - [ ] Set `enableVectorSearch = true` in RAGSearchEngine - [ ] Update result fusion (40% text + 30% vector + 30% graph) - [ ] Re-index existing files with embeddings - [ ] Test semantic search quality #### 2.4: Performance Tuning - [ ] Optimize embedding batch size - [ ] Add embedding caching - [ ] Compare quality with/without embeddings - [ ] Document cost implications (API usage) **Note:** This phase can be skipped entirely. Phase 1 provides excellent results using full-text + graph only. --- ## πŸ“Š Performance Targets | Metric | Target | Notes | |--------|--------|-------| | File Event Latency | 1-2s | Polling interval + detection | | Search Latency | <200ms | Query to results (p95) | | Indexing Speed | >100 files/s | Initial bulk indexing | | Re-index Speed | <100ms | Single file update | | Context Enrichment | <100ms | Adding related files to response | | Memory Overhead | <1GB | For 100k indexed files | | CPU Usage (Polling) | 2-5% | Per watched folder at 1s interval | | CPU Usage (Idle) | <1% | No file changes | **Scalability:** 100k+ indexed files, 10 watched folders --- ## οΏ½οΏ½ Success Criteria **Agent Experience:** 1. βœ… Single tool call (`memory_search_nodes`) returns comprehensive context 2. βœ… No manual file listing or traversal needed 3. βœ… Results explain relevance (why this file/function matters) 4. βœ… Response includes actual code, not just references 5. βœ… Follow-up queries use cached context (fast) **Technical:** 1. βœ… Multi-strategy search (keyword + semantic + graph) 2. βœ… Sub-200ms query latency 3. βœ… Automatic updates on file changes (via polling) 4. βœ… Scales to 100k+ files 5. βœ… Bind mounts work reliably across platforms (Linux/macOS/Windows) --- **Last Updated:** 2025-10-17 **Version:** 2.0 **Status:** Design Complete β†’ Ready for Implementation

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/orneryd/Mimir'

If you have feedback or need assistance with the MCP directory API, please join our Discord server