EliteMCP

EliteMCP
docs

nsccn_architecture.md•6.68 KiB

# NSCCN Architecture Documentation ## Overview NSCCN (Neuro-Symbolic Causal Code Navigator) is a context-efficient code navigation system that replaces naive file operations with a research-backed architecture achieving ~84% token reduction on input context. ## Architecture Layers NSCCN implements a "Map vs. Territory" cognitive model with six layers: ``` Layer 6: FastMCP Interface (expose tools via stdio) Layer 5: Intent Resolution Engine (NL → graph traversals + Hybrid RRF) Layer 4: Skeletonization Engine (Telegraphic Semantic Compression) Layer 3: Heterogeneous Code Graph (structural nodes + causal edges) Layer 2: Neural-Lexical Index (sqlite-vec + ripgrep) Layer 1: Incremental Graph Builder (watchdog + tree-sitter) ``` ## Components ### 1. Database Layer (`database.py`) **Purpose**: Persistent storage for code graph, embeddings, and cached skeletons. **Key Features**: - SQLite database with three tables: entities, edges, skeletons - Efficient indexing for graph queries - Vector storage for embeddings (numpy arrays as BLOBs) - CRUD operations for entities, edges, and skeletons **Schema**: ```sql -- Entities: code elements (functions, classes, modules) entities (id, type, file_path, name, start_line, end_line, signature, docstring, embedding, last_updated) -- Edges: causal relationships (CALLS, INHERITS) edges (source_id, relation, target_id, context) -- Skeletons: cached compressed views skeletons (file_path, content, last_modified) ``` ### 2. Parser (`parser.py`) **Purpose**: Extract code structure and generate compressed views using Tree-sitter. **Key Features**: - Tree-sitter based Python parsing - Extract functions, classes, methods with metadata - Generate entity IDs: `{type}:{file_path}:{name}` - Extract CALLS and INHERITS edges - Telegraphic Semantic Compression (skeleton generation) - Incremental parsing support (cached parse trees) **Skeleton Format**: - Keep: signatures, type hints, docstrings, structure - Remove: implementation details (replaced with `...`) - Target: 70-90% token reduction ### 3. Embedding Engine (`embeddings.py`) **Purpose**: Generate semantic embeddings for code entities. **Key Features**: - Uses fastembed with nomic-ai/nomic-embed-text-v1.5 - Matryoshka Representation Learning (MRL) for 256-dim vectors - Batch embedding for efficiency - Async embedding queue for incremental updates - Embeds signature + docstring (not full code) ### 4. Search Engine (`search.py`) **Purpose**: Hybrid search combining lexical and semantic approaches. **Key Features**: - **Lexical Stream**: ripgrep for fast keyword search - **Semantic Stream**: embedding similarity search - **RRF Fusion**: Reciprocal Rank Fusion (k=60) **RRF Formula**: ``` Score(d) = Σ 1/(k + rank(d)) ``` ### 5. Graph Engine (`graph.py`) **Purpose**: Causal flow analysis and multi-hop reasoning. **Key Features**: - Upstream traversal: find callers (who calls this?) - Downstream traversal: find callees (what does this call?) - Inheritance chain: class hierarchy - Depth limiting (default 3 hops) - Returns subgraphs as adjacency lists ### 6. File Watcher (`watcher.py`) **Purpose**: Real-time incremental graph updates. **Key Features**: - Monitors `.py` files using watchdog - Debounces events (100ms default) - On file change: 1. Re-parse with Tree-sitter 2. Compare with database, update delta 3. Re-extract edges 4. Queue for re-embedding 5. Invalidate skeleton cache - Background thread execution - Graceful shutdown ### 7. Tools Interface (`tools.py`) **Purpose**: Expose NSCCN capabilities as FastMCP tools. **Four Primary Tools**: 1. **`search_and_rank`** (Locate) - Find code entities using Hybrid RRF - Input: natural language query - Output: compact JSON with entity IDs, scores, signatures 2. **`read_skeleton`** (Orient) - Get Telegraphic Semantic Compression view - Input: file path - Output: compressed file view (signatures + structure) 3. **`trace_causal_path`** (Trace) - Trace causal graph from entity - Input: entity ID, direction (upstream/downstream/inheritance), depth - Output: JSON adjacency list of subgraph 4. **`open_surgical_window`** (Examine) - Read specific entity implementation - Input: entity ID, context lines - Output: entity source code with line numbers ### 8. Main Server (`server.py`) **Purpose**: Orchestrate all components and expose via FastMCP. **Key Features**: - Initialize all components - Load configuration - Build initial index (`--init` flag) - Start file watcher - Register tools with FastMCP - Graceful shutdown - Tool discovery (`--info` flag) ## Data Flow ### Initial Indexing ``` 1. Scan directory for Python files 2. Parse each file with Tree-sitter 3. Extract entities and edges 4. Embed entities (batch) 5. Store in database 6. Generate and cache skeletons ``` ### Incremental Updates ``` 1. File change detected by watchdog 2. Debounce (100ms) 3. Re-parse file 4. Compare entities (added/removed/updated) 5. Update database (entities + edges) 6. Queue changed entities for re-embedding 7. Invalidate skeleton cache ``` ### Query Flow ``` 1. User query → search_and_rank 2. Lexical search (ripgrep) + Semantic search (embeddings) 3. RRF fusion 4. Return ranked entity IDs 5. User selects entity → trace_causal_path 6. Graph traversal (upstream/downstream) 7. Return subgraph 8. User identifies target → open_surgical_window 9. Return entity source with context ``` ## Performance Characteristics **Initial Indexing**: - ~1 minute for 10k LOC codebase - Memory: ~500MB for 100k entity index **Incremental Updates**: - <100ms per file change - Background processing **Search**: - <50ms for hybrid RRF query - Combines lexical and semantic results **Skeleton Generation**: - <20ms per file - 70-90% token reduction - Cached for performance ## Configuration Configuration file: `config/nsccn_config.json` ```json { "database_path": "nsccn.db", "embedding_model": "nomic-ai/nomic-embed-text-v1.5", "embedding_dim": 256, "rrf_k": 60, "max_traversal_depth": 3, "skeleton_cache_enabled": true, "watch_debounce_ms": 100, "supported_languages": ["python"], "ignore_patterns": ["**/test_*", "**/__pycache__/**", "**/.*"] } ``` ## Design Principles 1. **Zero-cost**: No external APIs, all local execution 2. **Context-efficient**: 84% token reduction via skeletons 3. **Incremental**: Real-time updates with minimal overhead 4. **Causal**: Graph-based reasoning over naive search 5. **Hybrid**: Combine lexical and semantic approaches 6. **Surgical**: Navigate to exact code location efficiently ## Future Enhancements - Multi-language support (JavaScript, TypeScript, Go) - Additional edge types (PROPAGATES_ERROR) - Cross-file call resolution - IDE integration - Performance optimizations (parallel indexing)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/itstanner5216/EliteMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

nsccn_architecture.md•6.68 KiB