M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

Overview Schema Related Servers Score Discussions

llm-ast-security.md•14.6 KiB

# LLM & AST Security Patterns **Safe patterns for integrating Large Language Models with NornicDB's query system.** ## Overview NornicDB uses a **stream parse-execute** architecture where queries are parsed and executed in a single pass, with a **lazy AST** built separately for LLM features. This document covers: 1. Why stream parse-execute is fast 2. Security considerations for this approach 3. Safe LLM integration patterns 4. Plugin security with AST ## Architecture: Stream Parse-Execute + Lazy AST ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ NornicDB Query Architecture │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ Traditional DB (Full Parse → AST → Execute): │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ Query → [Lexer] → [Parser] → AST → [Optimizer] → [Executor] │ │ │ │ ↑ │ │ │ │ Full tree in memory │ │ │ │ Multiple passes │ │ │ │ ~10-50µs overhead │ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ │ NornicDB (Stream Parse-Execute + Lazy AST): │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ Query → [Stream Parser+Executor] ─────────────────→ Result │ │ │ │ ↓ (async/lazy) │ │ │ │ [AST Builder] → Cached AST (for LLM features) │ │ │ │ │ │ │ │ • Single pass through query │ │ │ │ • Execute as we parse │ │ │ │ • No intermediate allocations for simple queries │ │ │ │ • ~1-3µs for simple queries (10-50x faster) │ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` ## Why Stream Parse-Execute is Fast ### Performance Benefits | Aspect | Traditional AST | Stream Parse-Execute | |--------|----------------|---------------------| | Memory allocations | Full tree (~100+ nodes) | Minimal (on-demand) | | Passes over query | 2-4 (lex, parse, optimize, execute) | 1 (combined) | | Latency to first byte | After full parse | Immediate | | Simple query overhead | ~10-50µs | ~1-3µs | | Complex query overhead | ~50-200µs | ~10-50µs | ### Why This Works ```go // Traditional: Parse everything, then execute ast := parser.Parse(query) // Allocate full AST optimized := optimizer.Optimize(ast) // Another pass result := executor.Execute(optimized) // Finally execute // Stream: Execute as we recognize tokens // MATCH (n:Person) WHERE n.age > 21 RETURN n.name // ↓ // See MATCH → start pattern matching // See (n:Person) → find nodes with label // See WHERE → filter in-place // See RETURN → project results // No intermediate AST needed! ``` ### Benchmarks ``` BenchmarkSimpleQuery/Traditional-16 50000 25000 ns/op 12000 B/op 150 allocs/op BenchmarkSimpleQuery/StreamExecute-16 500000 2500 ns/op 1200 B/op 15 allocs/op ↑ 10x faster ↑ 10x less memory ``` ## Security Considerations: Stream Parse-Execute ### ✅ Benefits | Property | Explanation | |----------|-------------| | **No TOCTOU** | Check and use happen atomically - no race between validation and execution | | **Smaller attack surface** | No intermediate AST to manipulate | | **Consistent parsing** | Same code parses AND executes - no semantic drift | | **Memory safety** | Less allocation = less chance of buffer issues | ### ⚠️ Considerations | Concern | Risk | Mitigation | |---------|------|------------| | **Partial execution on error** | Side effects before error detected | Transaction rollback, implicit transactions | | **No global semantic check** | Can't validate entire query before starting | Validate syntax first, use explicit transactions for critical ops | | **Error recovery** | Harder to provide good error messages | Store context during parse for error reporting | | **Optimization opportunities** | Can't reorder operations | Accept trade-off for latency; complex queries can use AST path | ### Partial Execution Risk ```cypher // Risk: What if error occurs mid-query? CREATE (a:Node) CREATE (b:Node) CREATE (c:Invalid!) // ← Syntax error here // Without protection: a and b created, c fails // With implicit transaction: all rolled back ``` **Our mitigation:** ```go // Implicit transactions wrap non-explicit queries func (e *Executor) Execute(ctx, query, params) { // For write operations without explicit transaction if isWriteQuery && !inExplicitTransaction { tx := e.storage.BeginTransaction() defer tx.Rollback() // Rollback on any error result, err := e.executeWithTransaction(tx, query) if err != nil { return nil, err // Transaction rolled back } tx.Commit() // Only commit if fully successful return result, nil } // ... } ``` ## Safe LLM Integration Patterns ### Pattern 1: Read-Only AST Analysis (SAFE) ```go // ✅ SAFE: LLM only reads AST, doesn't generate queries func AnalyzeQueryComplexity(query string) (*Analysis, error) { info := analyzer.Analyze(query) ast := info.GetAST() // LLM analyzes structure complexity := llm.AnalyzeComplexity(ast) suggestions := llm.SuggestIndexes(ast) return &Analysis{ Complexity: complexity, Suggestions: suggestions, }, nil } ``` **Why safe:** LLM output is informational only, never executed. ### Pattern 2: Query Correction with Validation (SAFE with care) ```go // ⚠️ REQUIRES VALIDATION: LLM generates corrected query func CorrectQuery(originalQuery string, error error) (string, error) { info := analyzer.Analyze(originalQuery) ast := info.GetAST() // LLM suggests correction correctedQuery := llm.SuggestCorrection(ast, error) // ⚠️ CRITICAL: Validate the corrected query if err := validateQuerySafety(correctedQuery); err != nil { return "", fmt.Errorf("LLM generated unsafe query: %w", err) } // ⚠️ CRITICAL: User must approve before execution return correctedQuery, nil // Return for user approval, don't auto-execute } func validateQuerySafety(query string) error { // 1. Parse with our parser (not LLM's interpretation) info := analyzer.Analyze(query) // 2. Check for dangerous patterns if info.HasDelete && !userHasDeletePermission { return errors.New("DELETE not permitted") } // 3. Validate all identifiers for _, label := range info.Labels { if !isValidIdentifier(label) { return fmt.Errorf("invalid label: %s", label) } } return nil } ``` ### Pattern 3: Query Generation from Natural Language (HIGH RISK) ```go // ❌ DANGEROUS: Direct execution of LLM-generated queries func DangerousNLToQuery(naturalLanguage string) (*Result, error) { query := llm.GenerateCypher(naturalLanguage) return executor.Execute(ctx, query, nil) // ❌ NO VALIDATION! } // ✅ SAFE: Validated execution with constraints func SafeNLToQuery(naturalLanguage string, constraints QueryConstraints) (*Result, error) { query := llm.GenerateCypher(naturalLanguage) // 1. Parse and analyze info := analyzer.Analyze(query) // 2. Enforce constraints if !constraints.AllowWrites && info.IsWriteQuery { return nil, errors.New("write operations not allowed") } if !constraints.AllowDelete && info.HasDelete { return nil, errors.New("delete operations not allowed") } // 3. Whitelist labels and relationships for _, label := range info.Labels { if !constraints.AllowedLabels.Contains(label) { return nil, fmt.Errorf("label %s not in whitelist", label) } } // 4. Use read-only transaction for safety if !info.IsWriteQuery { return executor.ExecuteReadOnly(ctx, query, nil) } // 5. Require explicit user approval for writes return nil, errors.New("write query requires user approval") } ``` ### Pattern 4: Plugin Query Execution (REQUIRES SANDBOXING) ```go // Plugin-generated queries need strict sandboxing type PluginQueryConstraints struct { MaxResults int TimeoutMs int AllowedLabels []string AllowedTypes []string ReadOnly bool MaxDepth int // For path queries } func ExecutePluginQuery(plugin Plugin, query string, constraints PluginQueryConstraints) (*Result, error) { // 1. Validate plugin has permission for this query type info := analyzer.Analyze(query) if constraints.ReadOnly && info.IsWriteQuery { return nil, errors.New("plugin attempted write in read-only mode") } // 2. Check labels against plugin's allowed set for _, label := range info.Labels { if !contains(constraints.AllowedLabels, label) { return nil, fmt.Errorf("plugin not authorized for label: %s", label) } } // 3. Inject constraints into query constrainedQuery := injectConstraints(query, constraints) // 4. Execute with timeout ctx, cancel := context.WithTimeout(ctx, time.Duration(constraints.TimeoutMs)*time.Millisecond) defer cancel() return executor.Execute(ctx, constrainedQuery, nil) } func injectConstraints(query string, c PluginQueryConstraints) string { // Add LIMIT if not present if c.MaxResults > 0 && !strings.Contains(strings.ToUpper(query), "LIMIT") { query = query + fmt.Sprintf(" LIMIT %d", c.MaxResults) } return query } ``` ## AST Cache Security ### Cache Key Security ```go // Cache keys include normalized query + parameter hash type CacheKey struct { NormalizedQuery string ParamHash uint64 } // This prevents: // 1. Cache confusion between different parameter values // 2. Cache poisoning from similar queries ``` ### Cache Isolation ```go // Per-user cache isolation (if multi-tenant) type UserScopedCache struct { userID string cache *QueryCache } func (c *UserScopedCache) Get(query string, params map[string]any) (*QueryInfo, bool) { key := c.makeKey(c.userID, query, params) return c.cache.Get(key) } ``` ### Cache Invalidation Security ```go // Write operations invalidate relevant caches func (e *Executor) invalidateCachesAfterWrite(info *QueryInfo) { // Don't trust the query to tell us what it modified // Use actual affected labels from execution affectedLabels := e.getActualAffectedLabels() e.cache.InvalidateLabels(affectedLabels) } ``` ## Heimdall Plugin Security ### Plugin Query Constraints ```yaml # Plugin manifest defines allowed operations plugin: name: analytics-plugin permissions: queries: read_only: true allowed_labels: [Event, User, Session] allowed_relationships: [TRIGGERED, BELONGS_TO] max_results: 10000 timeout_ms: 5000 ast_access: can_read: true can_generate: false # Cannot generate new queries ``` ### Plugin AST Access ```go // Plugins get read-only AST view type PluginASTView struct { Clauses []ASTClauseView // Sanitized view IsReadOnly bool Labels []string } func (ast *AST) ToPluginView() *PluginASTView { return &PluginASTView{ Clauses: sanitizeClauses(ast.Clauses), IsReadOnly: ast.IsReadOnly, Labels: ast.Labels, } } // Plugins cannot: // - Modify AST // - Generate queries from AST // - Access raw query text (potential injection source) ``` ## Security Checklist ### For LLM Integration - [ ] LLM output is NEVER directly executed - [ ] All LLM-generated queries are re-parsed by our parser - [ ] Write operations require explicit user approval - [ ] Label/relationship whitelisting enforced - [ ] Timeout and result limits applied - [ ] Audit logging for all LLM-generated queries ### For Plugin Integration - [ ] Plugin permissions declared in manifest - [ ] Read-only mode enforced where declared - [ ] Label/relationship access controlled - [ ] Query timeout enforced - [ ] Result count limited - [ ] AST access is read-only view ### For AST Cache - [ ] Cache keys include parameter hash - [ ] Per-user isolation (if multi-tenant) - [ ] Write operations invalidate affected caches - [ ] Cache TTL prevents stale data ## Summary | Component | Security Model | |-----------|---------------| | **Stream Parse-Execute** | Atomic parse+execute, no intermediate attack surface | | **Lazy AST** | Observation only, never in execution path | | **LLM Integration** | Re-parse all output, whitelist, require approval | | **Plugin Queries** | Sandbox with permissions, timeouts, limits | | **AST Cache** | Keyed by query+params, per-user isolation | --- **See Also:** - [Query Cache Security](query-cache-security.md) - Cache-specific security - [HTTP Security](http-security.md) - Network-level protections - [Plugin Development Guide](../development/plugin-guide.md) - Building secure plugins

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/orneryd/Mimir'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

llm-ast-security.md•14.6 KiB