# Nabu

**Code Intelligence Ecosystem for LLM Agents**

<p align="center">
  <em>Ancient wisdom meets modern AI cognition</em>
</p>

---

## What Is This?

Semantic code analysis via queryable graphs (KuzuDB).

**Status:** v0.1-alpha | MIT Licensed | Research prototype

---

## The Problem

Traditional LLM coding agents waste enormous token budgets on sequential operations:

```python
# Typical workflow (expensive):
glob("**/*.py")          # Find 1000 files
→ read(file1.py)         # Read entire file (5000 tokens)
→ grep("class Foo")      # Search for class
→ read(file2.py)         # Read another file (3000 tokens)
→ grep("import")         # Search again
→ ... repeat 20 times ...
# 100K+ tokens burned
```

**Measured impact:** massive context consumption, poor spatial awareness.

---

## The Solution

**Semantic-first navigation with persistent workspace state:**

```python
# Nabu + Nisaba workflow (efficient):
map_codebase()                      # Get project overview (2K tokens)
→ search("authentication logic")    # Semantic search (500 tokens)
→ show_structure("AuthService")     # Skeleton only (100 tokens)
→ query_relationships(...)          # Graph query (300 tokens)
```

**Measured savings:** surgical token usage, persistent spatial memory.
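Summing the per-step estimates from the two workflow sketches gives a rough sense of scale. The figures below are the illustrative numbers from the code comments (not new measurements), and using 100K as the sequential baseline is an assumption taken from the "100K+" comment, so the result is a floor on the ratio rather than a benchmark.

```python
# Illustrative per-step token estimates copied from the workflow comments.
# These are the README's rough figures, not new measurements.
SEMANTIC_WORKFLOW = {
    "map_codebase": 2_000,
    "search": 500,
    "show_structure": 100,
    "query_relationships": 300,
}

# Lower bound quoted for the sequential glob/read/grep workflow ("100K+").
SEQUENTIAL_LOWER_BOUND = 100_000


def total_tokens(steps: dict[str, int]) -> int:
    """Sum the per-step token estimates of a workflow."""
    return sum(steps.values())


if __name__ == "__main__":
    semantic = total_tokens(SEMANTIC_WORKFLOW)
    ratio = SEQUENTIAL_LOWER_BOUND / semantic
    print(f"semantic workflow: {semantic} tokens")
    print(f"sequential workflow: at least {SEQUENTIAL_LOWER_BOUND} tokens")
    print(f"reduction: at least {ratio:.0f}x")
```

With these figures the semantic workflow totals 2,900 tokens, roughly 34x below the 100K floor; actual savings depend on the codebase and on which queries an agent really issues.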
---

## Quick Start

### Installation

```bash
git clone https://github.com/yourusername/nabu_nisaba.git
cd nabu_nisaba
pip install -r requirements.txt
```

### Parse Your Codebase

```bash
# Create database
nabu db reindex --db-path ./my_project.kuzu --repo-path /path/to/your/code
```

### Query the Graph (Python API)

```python
from nabu.db import KuzuConnectionManager

db = KuzuConnectionManager.get_instance("my_project.kuzu")

# Find all callers of a function
result = db.execute("""
    MATCH (caller:Frame)-[e:Edge {type: 'CALLS'}]->(target:Frame)
    WHERE target.qualified_name = 'AuthService.validate_token'
      AND e.confidence >= 0.7
    RETURN caller.qualified_name, caller.file_path, e.confidence
    ORDER BY e.confidence DESC
""")
print(result.get_as_df())
```

### Use MCP Tools (LLM Agents)

Available when running the unified environment:

```python
# === NABU TOOLS (Code Analysis) ===

# Get project overview
map_codebase()

# Semantic search
search(query="authentication logic", k=10)

# Get class structure (no implementation - 53x token savings!)
show_structure(target="AuthService", detail_level="minimal")

# Graph query
query_relationships(cypher_query="...")

# Impact analysis
check_impact(target="critical_function", max_depth=2)

# === NISABA TOOLS (Workspace Management) ===

# Load knowledge domains
activate_augments(patterns=["architecture/*", "refactoring/*"])

# Open file windows (persistent across turns)
file_windows.open_frame(frame_path="AuthService")

# Navigate code tree
structural_view(operation="search", query="database connection")

# Track TODOs
todo_write(operation="add", todos=[{"content": "Refactor auth"}])
```

---

## Architecture

### Nabu: Semantic Code Analysis

**Pipeline:**

```
Source Code
    ↓ tree-sitter parsing
Raw AST Nodes
    ↓ multi-pass resolution
Frame Hierarchy (with confidence)
    ↓ relationship extraction
Graph (Frames + Edges)
    ↓ KuzuDB export
Queryable Database
```

**Key Components:**

- **CodebaseParser**: Multi-language parsing (Python, C++, Java, Perl)
- **Frame Types**: CODEBASE, LANGUAGE, PACKAGE, CLASS, CALLABLE, IF_BLOCK, FOR_LOOP, TRY_BLOCK, ...
- **Edge Types**: CONTAINS, CALLS, INHERITS, IMPLEMENTS, IMPORTS, USES
- **Confidence System**: 4-tier probabilistic scoring
- **Hybrid Search**: BM25 (keyword) + UniXcoder + CodeBERT (semantic)
- **KuzuDB**: Embedded graph database with Cypher queries

**Schema:**

```cypher
// Frame node (20 properties)
(f:Frame {
  id: STRING,               // Stable SHA256-based
  type: STRING,             // Frame type
  qualified_name: STRING,   // Fully qualified
  confidence: FLOAT,        // 0.0-1.0
  confidence_tier: STRING,  // HIGH/MEDIUM/LOW/SPECULATIVE
  language: STRING,         // python, cpp, java, perl
  file_path: STRING,
  start_line: INT32,
  end_line: INT32,
  content: STRING,
  ...
})

// Edge relationship (4 properties)
-[e:Edge {
  type: STRING,             // CALLS, CONTAINS, INHERITS, ...
  confidence: FLOAT,
  confidence_tier: STRING,
  metadata: STRING          // JSON extra data
}]->
```

---

## Use Cases

### For LLM Agents

**Efficient code exploration:**

```python
# Instead of reading 50 files sequentially...
map_codebase()                     # Understand structure
→ search("error handling")         # Find relevant code
→ show_structure("ErrorHandler")   # See skeleton (no impl)
→ check_impact("ErrorHandler")     # Understand dependencies
# 10x faster, 3x fewer tokens
```

### For Developers

**Architecture analysis:**

```cypher
// Find most-called functions (hotspots)
MATCH (f:Frame)<-[e:Edge {type: 'CALLS'}]-()
RETURN f.qualified_name, count(e) AS calls
ORDER BY calls DESC
LIMIT 20
```

**Refactoring prep:**

```cypher
// Find all code that depends on this function (up to 3 call hops away)
MATCH path = (start:Frame {qualified_name: 'old_function'})
      <-[:Edge*1..3 {type: 'CALLS'}]-(dependent:Frame)
RETURN DISTINCT dependent.qualified_name, dependent.file_path
```

**Code review:**

```cypher
// Find high-complexity functions
MATCH (func:Frame {type: 'CALLABLE'})
      -[:Edge {type: 'CONTAINS'}]->(control:Frame)
WHERE control.type IN ['IF_BLOCK', 'FOR_LOOP', 'TRY_BLOCK']
WITH func, count(control) + 1 AS complexity
WHERE complexity > 10
RETURN func.qualified_name, complexity
ORDER BY complexity DESC
```

### For Researchers

- **Graph algorithms**: PageRank for central classes, community detection
- **Code metrics**: Coupling, cohesion, complexity via graph queries
- **ML training data**: Structured code graphs with confidence scores
- **Cognitive models**: Study LLM agent behavior with workspace state logs

---

## What Works ✅

- ✅ Multi-language parsing (Python, C++, Java, Perl)
- ✅ Unified frame abstraction (15 types, cross-language queries)
- ✅ Multi-pass confidence scoring (4 tiers)
- ✅ KuzuDB graph export with full schema
- ✅ Hybrid search (BM25 + UniXcoder + CodeBERT)
- ✅ Incremental updates (stable IDs, surgical changes)
- ✅ Dynamic system prompt injection (augments)
- ✅ Persistent workspace state (file windows, structural view)
- ✅ MCP server integration (nabu + nisaba)
- ✅ Workflow guidance (pattern-based suggestions)

---

## Known Limitations 🚧

- Control flow is structural only (no data flow analysis yet)
- External libraries are referenced but not parsed (the stdlib can be pre-indexed)
- Dynamic calls have limited support (Python `getattr`, C++ function pointers)
- Local variables are not tracked yet (only class fields)
- Renames are catastrophic (`class Foo → Bar` changes the stable_id)
- Database size grows exponentially when file watching and incremental updates are enabled

---

## Philosophy

> **"Ideally abstract, without compromising too much."**

### Core Principles

- **Heuristics over perfection** - 95% accurate fast > 60% slow & complex
- **Confidence over certainty** - Probabilistic understanding; model uncertainty explicitly
- **Relationships over entities** - Graph connections are the insight
- **Abstraction over detail** - Unified model; details in `content` property
- **Token efficiency matters** - Measured savings, not claims
- **Spatial over sequential** - Navigate state space, don't replay history
- **Perception as mutable** - Augments change how agents think, not just what they know

### Design Decisions

**Why frames instead of language-specific types?**

- Simplifies schema (15 types vs 80+)
- Enables cross-language queries
- Acceptable false positives (heuristic approach)

**Why confidence tiers?**

- Not all analysis is certain (especially in dynamic languages)
- Lets users filter by precision/recall tradeoff
- Progressive enhancement across passes

**Why dynamic system prompt injection?**

- Enables perceptual filtering (augments change cognition)
- Persistent workspace state (spatial memory)
- Avoids token waste (don't repeat context)

**Why KuzuDB?**

- Embedded (no external server)
- Fast graph queries with Cypher
- Native Python integration
- **Note:** Project archived Oct 2025; v0.11.3 is the final stable release

---

## Name Etymology

### Nabu (Akkadian: 𒀭𒀝)

Mesopotamian god of writing, wisdom, and scribes (2nd millennium BCE).
**His role:**

- Records the fates of gods and mortals
- Patron of scribal arts
- Symbol: clay tablet and stylus

**The parallel:**

```
Ancient Nabu          →  Modern Nabu
Clay tablets          →  KuzuDB database
Stylus inscriptions   →  Tree-sitter parsing
Divine decrees        →  Code structure
Permanent records     →  Persistent graphs
```

### Nisaba (Sumerian: 𒀭𒉀)

Mesopotamian goddess of writing, accounting, and harvest.

**Her role:**

- Organization and record-keeping
- Patron of scribes
- Sister of Nabu

**The parallel:**

```
Ancient Nisaba        →  Modern Nisaba
Grain organization    →  Tool registry organization
Record maintenance    →  Workspace state persistence
Scribe support        →  MCP framework for developers
```

When you use this ecosystem, you invoke ancient wisdom keepers who preserved knowledge across millennia. 📜

---

<p align="center">
  <em>"The stylus of wisdom inscribes the tablets of understanding."</em><br>
  <em>— Ancient Nabu hymn, modernized</em>
</p>

<p align="center">
  <strong>Token efficiency measured. Confidence explicitly modeled. Workspace persistently navigable.</strong>
</p>

<p align="center">📜 ✨ 🤖</p>
