# Implementation Plan: Exact-Match Query Cache
**Inspired by**: DeepSeek Engram's O(1) deterministic addressing pattern
**Priority**: High value, Low effort
**Status**: Proposed
## Overview
Add a fast hash-based cache layer that bypasses expensive vector similarity search for repeated or near-identical queries. This implements Engram's core insight: **static patterns don't need neural/embedding computation**.
## Problem Statement
Currently, Session Buddy performs full vector similarity search for every query:
```python
# Current: always compute an embedding and run a full vector search
embedding = await self._generate_embedding(query)
results = self.conn.execute(
    "SELECT ... array_cosine_similarity(embedding, $1) ...",
    [embedding],
).fetchall()
```
This is wasteful when:
- The same query is asked multiple times in a session
- Near-identical queries differ only in whitespace/punctuation
- Common patterns like "how do I..." or "what is the..." appear frequently
## Proposed Solution
### Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                        Query Pipeline                         │
├──────────────────────────────────────────────────────────────┤
│ 1. Normalize query (lowercase, strip, collapse whitespace)   │
│ 2. Compute hash key                                           │
│ 3. Check L1 cache (in-memory dict)                            │
│ 4. Check L2 cache (DuckDB table)                              │
│ 5. If miss: full vector search → populate caches              │
└──────────────────────────────────────────────────────────────┘
```
### Components
#### 1. Query Normalizer
```python
import unicodedata


def normalize_query(query: str) -> str:
    """Normalize query for cache key generation.

    Applies the same normalizations as Engram:
    - NFKC Unicode normalization
    - Lowercase
    - Strip accents (optional, configurable)
    - Collapse whitespace
    - Remove punctuation (optional, configurable)
    """
    # NFKC normalization (compatibility decomposition + canonical composition)
    text = unicodedata.normalize("NFKC", query)
    # Lowercase
    text = text.lower()
    # Collapse whitespace (also trims leading/trailing whitespace)
    return " ".join(text.split())
```
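The optional accent and punctuation passes named in the docstring are not implemented above. A minimal sketch of the accent pass, which would run only when the (assumed) `query_cache_normalize_accents` setting is enabled:

```python
def strip_accents(text: str) -> str:
    """Optional pass: decompose with NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```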
#### 2. Cache Key Generator
```python
import xxhash


def compute_cache_key(normalized_query: str, project: str | None = None) -> str:
    """Compute a deterministic cache key using xxhash for speed.

    Uses xxhash (not cryptographic, but fast) like Engram's XOR hashing.
    """
    # Include the project in the key for per-project isolation
    key_input = f"{project or 'global'}:{normalized_query}"
    return xxhash.xxh64(key_input.encode()).hexdigest()
```
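For illustration, two queries that differ only in case and whitespace should normalize identically and therefore map to the same cache key:

```python
q1 = normalize_query("  How do I   reset the cache? ")
q2 = normalize_query("how do i reset the cache?")
assert q1 == q2 == "how do i reset the cache?"
assert compute_cache_key(q1, "my-project") == compute_cache_key(q2, "my-project")
```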
#### 3. Two-Level Cache Structure
**L1: In-Memory LRU Cache**
- Fast dict with LRU eviction (see the sketch below)
- Configurable max size (default: 1000 entries)
- Cleared on adapter close
- TTL: session duration
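A minimal sketch of the eviction policy, using `OrderedDict` so the plain `_query_cache` dict could be swapped for it without API changes (class name is illustrative):

```python
from collections import OrderedDict


class LRUQueryCache:
    """Bounded L1 cache: OrderedDict insertion order doubles as recency order."""

    def __init__(self, max_size: int = 1000) -> None:
        self._store: OrderedDict[str, QueryCacheEntry] = OrderedDict()
        self._max_size = max_size

    def get(self, key: str) -> QueryCacheEntry | None:
        entry = self._store.get(key)
        if entry is not None:
            self._store.move_to_end(key)  # mark as most recently used
        return entry

    def put(self, key: str, entry: QueryCacheEntry) -> None:
        self._store[key] = entry
        self._store.move_to_end(key)
        if len(self._store) > self._max_size:
            self._store.popitem(last=False)  # evict the least recently used
```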
**L2: Persistent DuckDB Table**
- Survives restarts
- Configurable TTL (default: 7 days)
- Automatic cleanup of stale entries
```sql
CREATE TABLE IF NOT EXISTS query_cache (
    cache_key TEXT PRIMARY KEY,
    normalized_query TEXT NOT NULL,
    project TEXT,
    result_ids TEXT[],  -- Array of conversation/reflection IDs
    hit_count INTEGER DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    ttl_seconds INTEGER DEFAULT 604800  -- 7 days
);

-- IF NOT EXISTS keeps this idempotent when _ensure_tables() runs on every start
CREATE INDEX IF NOT EXISTS idx_query_cache_accessed ON query_cache (last_accessed);
```
### Integration Points
#### File: `session_buddy/adapters/reflection_adapter_oneiric.py`
```python
class ReflectionDatabaseAdapterOneiric:
    def __init__(self, ...):
        # Existing initialization
        ...
        # NEW: Query cache (L1 in-memory)
        self._query_cache: dict[str, QueryCacheEntry] = {}
        self._query_cache_max_size: int = 1000
        self._query_cache_hits: int = 0
        self._query_cache_misses: int = 0

    async def search_conversations(
        self,
        query: str,
        limit: int = 10,
        project: str | None = None,
        min_similarity: float = 0.7,
        use_cache: bool = True,  # NEW parameter
    ) -> list[dict[str, Any]]:
        """Search with optional cache bypass."""
        if use_cache:
            # Try the cache first
            cached = await self._check_query_cache(query, project)
            if cached is not None:
                self._query_cache_hits += 1
                return cached[:limit]
            # Only count a miss when the cache was actually consulted
            self._query_cache_misses += 1

        # Cache miss (or bypass) - full vector search
        results = await self._vector_search(query, limit, project, min_similarity)
        if use_cache and results:
            await self._populate_query_cache(query, project, results)
        return results
```
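`_check_query_cache()` and `_populate_query_cache()` are referenced above but built in Phase 2. A sketch of both as adapter methods, assuming DuckDB's `to_seconds()` interval helper, a hypothetical `_fetch_by_ids()` hydration method, and that each result dict carries an `"id"`:

```python
async def _check_query_cache(
    self, query: str, project: str | None
) -> list[dict[str, Any]] | None:
    """Return cached results (L1, then L2) or None on a miss."""
    key = compute_cache_key(normalize_query(query), project)

    # L1: in-memory dict
    entry = self._query_cache.get(key)
    if entry is not None:
        return await self._fetch_by_ids(entry.result_ids)

    # L2: DuckDB table, honoring the per-row TTL
    if self.conn is None:
        return None
    row = self.conn.execute(
        "SELECT result_ids FROM query_cache "
        "WHERE cache_key = ? "
        "AND created_at + to_seconds(ttl_seconds) > CURRENT_TIMESTAMP",
        [key],
    ).fetchone()
    if row is None:
        return None
    self.conn.execute(
        "UPDATE query_cache SET hit_count = hit_count + 1, "
        "last_accessed = CURRENT_TIMESTAMP WHERE cache_key = ?",
        [key],
    )
    # (A fuller version would also promote the row into L1 here.)
    return await self._fetch_by_ids(row[0])


async def _populate_query_cache(
    self, query: str, project: str | None, results: list[dict[str, Any]]
) -> None:
    """Write-through: fill L1 and L2 after a full vector search."""
    normalized = normalize_query(query)
    key = compute_cache_key(normalized, project)
    ids = [r["id"] for r in results]  # assumes each result dict carries an "id"
    self._query_cache[key] = QueryCacheEntry(
        cache_key=key,
        normalized_query=normalized,
        project=project,
        result_ids=ids,
    )
    if self.conn is not None:
        self.conn.execute(
            "INSERT OR REPLACE INTO query_cache "
            "(cache_key, normalized_query, project, result_ids) "
            "VALUES (?, ?, ?, ?)",
            [key, normalized, project, ids],
        )
```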
### Cache Entry Data Model
```python
from dataclasses import dataclass, field
from datetime import UTC, datetime


@dataclass
class QueryCacheEntry:
    """Cached query result entry."""

    cache_key: str
    normalized_query: str
    project: str | None
    result_ids: list[str]
    hit_count: int = 1
    created_at: datetime = field(default_factory=lambda: datetime.now(UTC))
    last_accessed: datetime = field(default_factory=lambda: datetime.now(UTC))
```
### Configuration
Add to `session_buddy/adapters/settings.py`:
```python
class ReflectionAdapterSettings(BaseSettings):
    # Existing settings...

    # Query cache settings
    query_cache_enabled: bool = True
    query_cache_l1_max_size: int = 1000
    query_cache_l2_ttl_days: int = 7
    query_cache_normalize_accents: bool = False      # Strip accents?
    query_cache_normalize_punctuation: bool = False  # Strip punctuation?
```
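For reference, the defaults can be overridden in code as below; with pydantic's `BaseSettings`, matching environment variables such as `QUERY_CACHE_ENABLED` also work, assuming no `env_prefix` is configured on the class:

```python
settings = ReflectionAdapterSettings(
    query_cache_l1_max_size=5000,  # larger L1 for long-running sessions
    query_cache_l2_ttl_days=1,     # shorter TTL for fast-moving projects
)
```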
### Cache Invalidation Strategy
1. **Time-based**: L2 entries expire after TTL
1. **Event-based**: Invalidate when new conversations are stored in the same project
1. **Manual**: Expose `clear_query_cache()` method
1. **Size-based**: LRU eviction for L1 when max size exceeded
```python
async def _invalidate_project_cache(self, project: str | None) -> None:
    """Invalidate cache entries for a project when new data is added."""
    # L1: Remove matching entries
    keys_to_remove = [
        k for k, v in self._query_cache.items() if v.project == project
    ]
    for key in keys_to_remove:
        del self._query_cache[key]

    # L2: Delete matching rows (they are repopulated on the next miss).
    # IS NOT DISTINCT FROM also matches rows where project is NULL.
    if self.conn:
        self.conn.execute(
            "DELETE FROM query_cache WHERE project IS NOT DISTINCT FROM ?",
            [project],
        )
```
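The store-time hook from item 2 might look like this; the `store_conversation()` signature and `_insert_conversation()` helper are illustrative, not the adapter's actual API:

```python
async def store_conversation(self, content: str, project: str | None = None) -> str:
    # Existing write path (name assumed for illustration)
    conversation_id = await self._insert_conversation(content, project)
    # NEW: cached results for this project may now be stale
    await self._invalidate_project_cache(project)
    return conversation_id
```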
### Metrics & Observability
```python
def get_cache_stats(self) -> dict[str, Any]:
    """Return cache performance metrics."""
    total = self._query_cache_hits + self._query_cache_misses
    hit_rate = self._query_cache_hits / total if total > 0 else 0.0
    return {
        "l1_size": len(self._query_cache),
        "l1_max_size": self._query_cache_max_size,
        "hits": self._query_cache_hits,
        "misses": self._query_cache_misses,
        "hit_rate": hit_rate,
        "embedding_cache_hits": self._cache_hits,      # Existing
        "embedding_cache_misses": self._cache_misses,  # Existing
    }
```
## Implementation Steps
### Phase 1: Core Cache Infrastructure (2-3 hours)
1. [ ] Add `QueryCacheEntry` dataclass to `session_buddy/adapters/models.py`
1. [ ] Add `normalize_query()` and `compute_cache_key()` to new `session_buddy/utils/query_cache.py`
1. [ ] Add cache settings to `ReflectionAdapterSettings`
1. [ ] Create L2 cache table in `_ensure_tables()`
### Phase 2: Integration (2-3 hours)
1. [ ] Add L1 cache dict to `ReflectionDatabaseAdapterOneiric.__init__()`
1. [ ] Implement `_check_query_cache()` method (L1 → L2 lookup)
1. [ ] Implement `_populate_query_cache()` method
1. [ ] Modify `search_conversations()` to use cache
1. [ ] Add cache invalidation to `store_conversation()` and `store_reflection()`
### Phase 3: Cleanup & Metrics (1-2 hours)
1. [ ] Implement `clear_query_cache()` method
1. [ ] Add `get_cache_stats()` method
1. [ ] Add periodic L2 cleanup (delete expired entries; see the sketch below)
1. [ ] Clear L1 cache in `aclose()`
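A sketch of the periodic cleanup, assuming DuckDB's `to_seconds()` helper for the per-row TTL arithmetic (method name is illustrative):

```python
async def _cleanup_expired_cache(self) -> None:
    """Delete L2 rows whose TTL has elapsed since last access."""
    if self.conn is not None:
        self.conn.execute(
            "DELETE FROM query_cache "
            "WHERE last_accessed + to_seconds(ttl_seconds) < CURRENT_TIMESTAMP"
        )
```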
### Phase 4: Testing (2-3 hours)
1. [ ] Unit tests for normalization functions
1. [ ] Unit tests for cache key generation
1. [ ] Integration tests for cache hit/miss scenarios
1. [ ] Performance benchmarks comparing cached vs uncached search
### Phase 5: MCP Tool Exposure (1 hour)
1. [ ] Add `query_cache_stats` tool to expose metrics
1. [ ] Add `clear_query_cache` tool for manual invalidation
1. [ ] Update `reflection_stats` to include cache stats
## Dependencies
**New dependency** (optional but recommended for speed):
```toml
[project.optional-dependencies]
performance = ["xxhash>=3.0"] # Fast non-cryptographic hashing
```
**Fallback**: Use `hashlib.blake2b` if xxhash is not available (still fast, and in the stdlib).
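A sketch of the fallback selection; `blake2b` with an 8-byte digest keeps keys the same width as `xxh64` (the `_hash_hex` name is illustrative):

```python
try:
    import xxhash

    def _hash_hex(data: bytes) -> str:
        return xxhash.xxh64(data).hexdigest()
except ImportError:
    import hashlib

    def _hash_hex(data: bytes) -> str:
        # 8-byte digest matches the width of xxh64
        return hashlib.blake2b(data, digest_size=8).hexdigest()
```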
## Expected Performance Impact
| Scenario | Current | With Cache |
|----------|---------|------------|
| Repeated exact query | ~50-100ms | \<1ms |
| Similar query (normalized match) | ~50-100ms | \<1ms |
| New unique query | ~50-100ms | ~50-100ms + cache write |
| Cache hit rate (estimated) | 0% | 30-50% for active sessions |
## Risks & Mitigations
| Risk | Mitigation |
|------|------------|
| Stale results after new data | Invalidate on store operations |
| Memory bloat from L1 cache | LRU eviction at max size |
| Cache key collisions | Use 64-bit hash (birthday-bound collision probability ≈ n²/2⁶⁵ ≈ 3e-8 for 1M entries) |
| Complexity increase | Well-isolated module, can disable via config |
## Success Criteria
- [ ] Cache hit rate >30% for typical session workflows
- [ ] No regression in search result quality
- [ ] \<1ms latency for cache hits
- [ ] Zero memory leaks (L1 properly cleared on close)
- [ ] All existing tests pass
- [ ] New cache tests achieve >90% coverage
## References
- DeepSeek Engram: https://github.com/deepseek-ai/Engram
- xxhash Python: https://github.com/ifduyue/python-xxhash
- LRU Cache patterns: https://docs.python.org/3/library/functools.html#functools.lru_cache