Skip to main content
Glama
orneryd

M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
query-cache-security.md14.8 kB
# Query Analysis & Cache Security **How NornicDB safely analyzes and caches Cypher queries.** ## Overview NornicDB uses a `QueryAnalyzer` to extract metadata from Cypher queries for caching optimization. This document explains the security model, what protections exist, and edge cases. ## Architecture ``` ┌─────────────────────────────────────────────────────────────────────┐ │ Query Execution Flow │ ├─────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │ │ Client │───▶│ QueryAnalyzer │───▶│ Cache Check │ │ │ │ Query │ │ (Metadata only) │ │ (Read-only?) │ │ │ └─────────────┘ └──────────────────┘ └──────────────────┘ │ │ │ │ │ │ ▼ ▼ │ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │ Query Executor │◀───│ Cache Hit/Miss │ │ │ │ (Full parsing) │ │ │ │ │ └──────────────────┘ └──────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────┐ │ │ │ Storage Engine │ │ │ │ (Data access) │ │ │ └──────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────┘ ``` ## Security Model ### What QueryAnalyzer Does The `QueryAnalyzer` performs **lightweight keyword detection** to determine: - Is this query read-only? (cacheable) - Is this query a write operation? (requires cache invalidation) - What labels are affected? (for smart cache invalidation) ### What QueryAnalyzer Does NOT Do - **NOT used for access control** - permissions are checked separately - **NOT used for query validation** - the executor validates syntax - **NOT a full parser** - just keyword detection for performance ## Threat Analysis ### ✅ Protected: Write Operations Hidden as Reads **Threat:** An attacker crafts a query that deletes/modifies data but is marked as "read-only" by the analyzer. **Protection:** This is **NOT POSSIBLE** because: 1. Cypher keywords must be whole words with boundaries 2. We search ALL occurrences of each keyword 3. There's no way to "hide" a keyword in valid Cypher syntax **Example attacks that are blocked:** ```cypher -- Hiding DELETE - NOT POSSIBLE MATCH (n) WHERE n.name = 'safe' DELETE n -- ✅ DELETE detected, marked as write -- Splitting keywords - NOT POSSIBLE (invalid Cypher) MATCH (n) DE/*comment*/LETE n -- ✅ Invalid syntax, rejected by executor -- Unicode tricks - NOT POSSIBLE MATCH (n) DELETE n -- ✅ Not ASCII "DELETE", not valid Cypher ``` ### ⚡ Accepted: Read Operations Marked as Writes (False Positives) **Scenario:** A read-only query is incorrectly marked as a write operation. **Impact:** Performance only (not security) - Query is not cached - May trigger unnecessary cache invalidation - Query still executes correctly **When this happens:** | Scenario | Example Query | What Happens | |----------|---------------|--------------| | Property named like keyword | `RETURN n.delete` | Marked as write | | Keyword in string literal | `WHERE name = 'DELETE me'` | Marked as write | | Keyword in comment | `RETURN n // TODO: delete` | Marked as write | **Why we accept this:** 1. **Conservative is safer** - Better to not cache than to cache incorrectly 2. **Rare occurrence** - Few users name properties "delete" or "create" 3. **No security impact** - Query executes correctly, just not cached 4. **Simple implementation** - Complex parsing adds attack surface ### ✅ Protected: False Positives Cannot Leak Data **Concern:** Could an attacker use a false positive to gain unauthorized data access? **Answer:** No. The QueryAnalyzer is completely separate from access control. ``` ┌─────────────────────────────────────────────────────────────┐ │ Security Boundary │ ├─────────────────────────────────────────────────────────────┤ │ │ │ QueryAnalyzer Access Control (RBAC) │ │ ┌────────────────┐ ┌────────────────┐ │ │ │ Determines: │ │ Determines: │ │ │ │ • Caching │ │ • Can user │ │ │ │ • Invalidation │ │ read nodes? │ │ │ │ │ │ • Can user │ │ │ │ Does NOT │ │ write nodes? │ │ │ │ affect: │ │ │ │ │ │ • Permissions │ │ ALWAYS │ │ │ │ • Authorization│ │ ENFORCED │ │ │ └────────────────┘ └────────────────┘ │ │ │ │ │ │ │ Performance │ Security │ │ ▼ ▼ │ │ ┌────────────────┐ ┌────────────────┐ │ │ │ Cache miss │ │ Query allowed │ │ │ │ (slower, but │ │ or denied │ │ │ │ still works) │ │ (enforced) │ │ │ └────────────────┘ └────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘ ``` **Attack scenarios that DO NOT work:** | Attack | Why It Fails | |--------|--------------| | "Read secret data by triggering write path" | Write path doesn't grant extra permissions | | "Learn label exists via cache invalidation" | Query must succeed first (already requires access) | | "Bypass RBAC by confusing the analyzer" | Analyzer has no role in authorization | | "Force stale data via false positive" | False positive = no caching = always fresh data | **Example:** ```cypher -- Attacker tries to read :Secret nodes they shouldn't access MATCH (n:Secret) RETURN n.delete ``` What happens: 1. QueryAnalyzer sees "delete" → marks as write (false positive) 2. **Access control checks if user can read :Secret** → DENIED 3. Query rejected - attacker gets nothing 4. Cache invalidation never happens (query failed) The false positive only affects step 1 (caching decision). The security check in step 2 is completely independent and always enforced. ## Cache Security ### Result Caching Only **read-only queries** are cached: ```go // Cached (read-only): MATCH (n:Person) RETURN n.name // ✅ Cached CALL db.labels() // ✅ Cached (db.* procedures) SHOW INDEXES // ✅ Cached // NOT cached (write operations): CREATE (n:Person {name: 'Alice'}) // ❌ Not cached MATCH (n) SET n.updated = timestamp() // ❌ Not cached CALL gds.graph.drop('myGraph') // ❌ Not cached (non-db.* procedure) ``` ### Cache Invalidation Write operations invalidate relevant caches: ```go // Smart invalidation by label: CREATE (n:Person) // Invalidates :Person cache MATCH (n:Company) DELETE n // Invalidates :Company cache // Full invalidation (no label detected): MATCH (n) DELETE n // Invalidates ALL caches ``` ### Cache Key Security Cache keys include: - Normalized query string - Parameter values (hashed) This prevents: - Cache poisoning across different parameters - Cache confusion between similar queries ## Procedure Call Security ### Read-Only Procedures (Cached) Only `CALL db.*` procedures are marked as read-only: ```cypher CALL db.labels() -- ✅ Read-only, cached CALL db.propertyKeys() -- ✅ Read-only, cached CALL db.relationshipTypes() -- ✅ Read-only, cached ``` ### Write Procedures (Not Cached) All other procedures are conservatively treated as writes: ```cypher CALL gds.graph.project(...) -- ❌ May be write, not cached CALL gds.graph.drop(...) -- ❌ Is a write, not cached CALL apoc.create.node(...) -- ❌ Is a write, not cached CALL custom.myProcedure(...) -- ❌ Unknown, not cached ``` ## Implementation Details ### Keyword Detection ```go // containsKeyword searches ALL occurrences, checking word boundaries func containsKeyword(upper, keyword string) bool { // Iterates through string finding each occurrence // Checks character before: must not be alphanumeric or _ // Checks character after: must not be alphanumeric or _ // Returns true if ANY occurrence is a valid keyword } ``` ### Word Boundary Rules A keyword match requires: - **Before:** Start of string, or non-alphanumeric (except `_`) - **After:** End of string, or non-alphanumeric (except `_`) ``` "DELETE n" → DELETE matched (space before, space after) "n.delete" → delete matched (. is not alphanumeric) - FALSE POSITIVE "ToDelete" → DELETE not matched (o before) "DELETED" → DELETE not matched (D after) ``` ## Query Types Reference ### Write Operations (Never Cached) | Keyword | Detection | Example | |---------|-----------|---------| | `CREATE` | Word boundary | `CREATE (n:Node)` | | `MERGE` | Word boundary | `MERGE (n:Node {id: 1})` | | `DELETE` | Word boundary | `MATCH (n) DELETE n` | | `DETACH DELETE` | Phrase | `MATCH (n) DETACH DELETE n` | | `SET` | Word boundary | `MATCH (n) SET n.x = 1` | | `REMOVE` | Word boundary | `MATCH (n) REMOVE n.label` | ### Read Operations (Cacheable) | Pattern | Requirement | Example | |---------|-------------|---------| | `MATCH ... RETURN` | No write keywords | `MATCH (n) RETURN n` | | `CALL db.*` | Starts with `CALL db.` | `CALL db.labels()` | | `SHOW` | Starts with `SHOW` | `SHOW INDEXES` | ### Schema Operations (Special Handling) | Operation | Cached | Invalidates | |-----------|--------|-------------| | `CREATE INDEX` | No | Schema cache | | `DROP INDEX` | No | Schema cache | | `CREATE CONSTRAINT` | No | Schema cache | | `DROP CONSTRAINT` | No | Schema cache | ## Testing & Verification ### Security Tests The following attack patterns are verified in tests: ```go // All write operations detected: "MATCH (n) DELETE n" // HasDelete = true ✅ "MATCH (n) WHERE type = 'safe' DELETE n" // HasDelete = true ✅ "CREATE (n:Node)" // HasCreate = true ✅ "MERGE (n:Node)" // HasMerge = true ✅ "MATCH (n) SET n.x = 1" // HasSet = true ✅ "MATCH (n) REMOVE n.label" // HasRemove = true ✅ // False positives (safe, not cached): "RETURN n.delete" // HasDelete = true (false positive) "WHERE name = 'DELETE'" // HasDelete = true (false positive) ``` ### Running Security Tests ```bash # Run all cypher tests including security checks cd nornicdb go test ./pkg/cypher/... -v # Run specific cache/security tests go test ./pkg/cypher/... -run "Cache|Security" -v ``` ## Best Practices ### For Query Authors 1. **Avoid keyword-named properties** - Don't name properties `delete`, `create`, `set`, etc. 2. **Use parameters for values** - `WHERE name = $name` instead of `WHERE name = 'DELETE'` 3. **Explicit transactions for writes** - Use `BEGIN/COMMIT` for critical write operations ### For Administrators 1. **Monitor cache hit rates** - Low hit rates may indicate false positives 2. **Review slow query logs** - Uncached queries appear more frequently 3. **Use production mode** - Ensures strict security settings ## Summary | Aspect | Protection Level | Notes | |--------|-----------------|-------| | Write ops hidden as reads | ✅ **Fully Protected** | Not possible in valid Cypher | | Read ops marked as writes | ⚡ **Accepted** | Performance impact only | | False positive data leakage | ✅ **Protected** | Analyzer doesn't affect access control | | Cache poisoning | ✅ **Protected** | Keys include parameters | | Procedure writes | ✅ **Protected** | Only `db.*` cached | | Access control bypass | ✅ **N/A** | Analyzer not used for auth | | Cache timing side-channel | ✅ **Protected** | Query must succeed before cache action | --- **See Also:** - [HTTP Security](http-security.md) - Network-level protections - [Compliance Guide](../compliance/) - Regulatory requirements

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/orneryd/Mimir'

If you have feedback or need assistance with the MCP directory API, please join our Discord server