Markdown RAG Documentation

performance-safety-improvements.md•10.7 KiB

# Performance, Safety, and Resiliency Improvements Backlog

**Generated:** 2026-01-20
**Updated:** 2026-01-22
**Status:** Nearly Complete

## Overview

This document tracks medium- and lower-priority improvements identified during the comprehensive code quality scan. High-priority and critical issues have already been addressed.

---

## Priority 3: Medium Priority (1-2 months)

### Issue #1: Add Retry Logic for Index Persistence (DONE)

**Category:** Resiliency - Lack of retry logic
**Location:** `src/indexing/manager.py:238-249`
**Impact:** MEDIUM - Transient failures cause data loss
**Status:** ✅ Fixed - `_persist_indices_with_retry()` uses tenacity `@retry` decorator

**Problem:** Single-shot persist with no retry. Transient failures (disk full, NFS timeout, permission errors) cause complete failure.

**Solution:**

```python
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(OSError),  # retry I/O errors only, not programming bugs
    stop=stop_after_attempt(3),  # give up after 3 attempts in total
    wait=wait_exponential(multiplier=1, min=1, max=10),  # exponential backoff clamped to 1-10 s
    reraise=True,  # re-raise the last original exception instead of tenacity's RetryError
)
def persist(self):
    ...  # existing implementation
```

**Effort:** 2-3 hours (add dependency, update method, add tests)
**Risk:** Low - tenacity is stable, and retry is a well-understood pattern

---

### Issue #2: Implement Circuit Breaker for Embedding Model (DONE)

**Category:** Resiliency - Cascading failure risks
**Location:** `src/indices/vector.py:47-52`, `src/utils/circuit_breaker.py`
**Impact:** MEDIUM - Repeated failures cascade to all queries
**Status:** ✅ Fixed - Custom `CircuitBreaker` class with configurable thresholds (failure_threshold=5, recovery_timeout=60s)

**Problem:** No failure tracking. If the embedding model fails (OOM, corrupted model file), every query retries the same operation.
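The status line above mentions a custom `CircuitBreaker` with `failure_threshold=5` and `recovery_timeout=60s`. Below is a minimal sketch of such a breaker, written for a single-threaded caller. `CircuitOpenError`, `search_with_fallback`, and the injectable `clock` parameter are hypothetical names and are not the project's API. The "5 failures within 60s" wording describes a time window, but this sketch uses a common simplification instead: it counts consecutive failures, and any success resets the count.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the protected function while the circuit is open."""


class CircuitBreaker:
    """Minimal single-threaded breaker: closed -> open -> half_open -> closed."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0  # consecutive failures since the last success
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"  # recovery window elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if state == "half_open" or self._failures >= self.failure_threshold:
                # Trial call failed, or threshold reached: (re)open the circuit
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count
        self._failures = 0
        self._opened_at = None
        return result


def search_with_fallback(
    breaker: CircuitBreaker,
    vector_search: Callable[[str], list],
    keyword_search: Callable[[str], list],
    query: str,
) -> list:
    """Route vector search through the breaker; fall back to keyword-only search."""
    try:
        return breaker.call(lambda: vector_search(query))
    except Exception:
        # Covers both CircuitOpenError and a fresh embedding/vector-search error
        return keyword_search(query)
```

In this sketch, a half-open breaker lets every caller through until one of them succeeds or fails. A concurrent server would need a lock and a single trial slot. Loading the thresholds from `config.toml` is also left out.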
**Solution:** Implement the circuit breaker pattern with:

- Failure threshold (e.g., 5 failures within 60s → open circuit)
- Automatic fallback to keyword-only search
- Half-open state with gradual recovery
- Configurable via `config.toml`

**Libraries:**

- `pybreaker` (1.6k stars, actively maintained)
- Or a custom implementation (100 lines)

**Effort:** 1-2 days (design, implement, test failure scenarios)
**Risk:** Medium - requires careful state management and fallback coordination

---

### Issue #3: Optimize Graph Traversal Locking (DONE)

**Category:** Performance - Inefficient data structures
**Location:** `src/indices/graph.py:79-99`
**Impact:** MEDIUM - Lock contention during traversal
**Status:** ✅ Fixed - Snapshot pattern implemented (shallow copy under lock, BFS without lock)

**Problem:** The write lock is held during the entire BFS traversal. For deep graphs, this blocks other operations.

**Solution:** Two approaches:

1. **Use a read-write lock (RWLock):** `readerwriterlock` package
   - Allows concurrent reads, exclusive writes
   - Minimal code changes
2. **Snapshot pattern:**

   ```python
   def get_neighbors(self, doc_id, max_depth=2):
       # Take a shallow copy of the graph while holding the lock
       with self._graph_lock:
           graph_snapshot = self._graph.copy()

       # BFS on the snapshot with no lock held (_bfs_traversal is an assumed helper)
       neighbors = self._bfs_traversal(graph_snapshot, doc_id, max_depth)
       return neighbors
   ```

**Recommendation:** Start with the snapshot pattern (simpler, no new dependency)
**Effort:** 3-4 hours (implement snapshot, benchmark, verify correctness)
**Risk:** Low - the snapshot is safe, and the benchmark will show whether the copy overhead is acceptable

---

### Issue #4: Cache Query Embeddings (DONE)

**Category:** Performance - Redundant computations
**Location:** `src/search/orchestrator.py:45-48, 67-93`
**Impact:** MEDIUM - Repeated expensive operations
**Status:** ✅ Fixed - `_embedding_cache` dict with TTL (300s) and LRU eviction, `_get_cached_embedding()` method

**Problem:** The query embedding is computed multiple times (once for search, again for MMR).
**Solution:**

```python
import time
from collections import OrderedDict

class SearchOrchestrator:
    def __init__(self, *args, **kwargs):
        # ... existing init (assumed to set self._vector)
        # query -> (embedding, creation time); dict order tracks recency of use
        self._embedding_cache: OrderedDict[str, tuple[list[float], float]] = OrderedDict()
        self._cache_max_size = 100
        self._cache_ttl_seconds = 300.0  # expire after 5 minutes

    def _get_cached_embedding(self, query: str) -> list[float]:
        """Get query embedding from a TTL + LRU cache."""
        now = time.monotonic()

        # Check cache: return fresh hits, drop expired ones
        cached = self._embedding_cache.get(query)
        if cached is not None:
            embedding, timestamp = cached
            if now - timestamp < self._cache_ttl_seconds:
                self._embedding_cache.move_to_end(query)  # mark as most recently used
                return embedding
            del self._embedding_cache[query]

        # Compute new embedding
        embedding = self._vector._embedding_model.get_text_embedding(query)

        # Evict the least recently used entry if the cache is full
        if len(self._embedding_cache) >= self._cache_max_size:
            self._embedding_cache.popitem(last=False)

        self._embedding_cache[query] = (embedding, now)
        return embedding
```

**Alternative:** Use `@lru_cache` on the embedding method (simpler, but less control)
**Effort:** 2-3 hours (implement cache, add tests, benchmark)
**Risk:** Low - caching is well-understood, and memory is bounded

**Impact Analysis:**

- Embedding computation: ~50-100ms per query
- Cache hit rate (estimated): 20-30% (users refine similar queries)
- Speedup: 10-30% for cached queries

---

### Issue #5: Add Backpressure to Event Queue (DONE)

**Category:** Resiliency - Missing circuit breakers
**Location:** `src/indexing/watcher.py:34`
**Impact:** MEDIUM - Memory exhaustion under load
**Status:** ✅ Fixed in P2 sprint

**Problem:** An unbounded queue allows rapid file changes to exhaust memory.

**Solution:** Already implemented with `MAX_QUEUE_SIZE = 1000` and a drop-oldest policy.
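A drop-oldest policy like this one can be built on `collections.deque(maxlen=...)`. Once a bounded deque is full, each append discards an item from the opposite end. The sketch below is a minimal, thread-safe version that assumes the watcher pushes events from a thread. `DropOldestQueue` and its `dropped` counter are hypothetical, and the actual `watcher.py` code may use a different queue type.

```python
import threading
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")

MAX_QUEUE_SIZE = 1000


class DropOldestQueue(Generic[T]):
    """Bounded FIFO that discards the oldest event when full instead of blocking."""

    def __init__(self, maxsize: int = MAX_QUEUE_SIZE) -> None:
        self._items: Deque[T] = deque(maxlen=maxsize)
        self._lock = threading.Lock()
        self.dropped = 0  # number of events discarded so far, useful as a metric

    def put(self, item: T) -> None:
        with self._lock:
            if len(self._items) == self._items.maxlen:
                # deque(maxlen=...) evicts the oldest item on append; count it
                self.dropped += 1
            self._items.append(item)

    def get_nowait(self) -> Optional[T]:
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

This design never blocks the producer, so a burst of file events cannot stall the watcher thread. The cost is that the oldest pending events are lost, which is acceptable when newer events for the same files supersede them.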
---

## Priority 4: Low Priority (As time permits)

### Issue #6: Add Timeout on Background Tasks (DONE)

**Category:** Safety - Missing timeout handling
**Location:** `src/context.py:286-297`
**Impact:** LOW - Tasks may hang indefinitely
**Status:** ✅ Fixed - `_index_git_commits_initial_with_timeout()` wraps git indexing with `asyncio.wait_for(timeout=30.0)`

**Problem:** Background tasks (git indexing, file watching) have no timeout protection.

**Solution:**

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def _startup_background_tasks(self):
    tasks = [
        self._start_file_watcher(),
        self._start_git_indexing(),
    ]

    # Bound total startup time; on timeout, wait_for cancels the gather and its children
    try:
        await asyncio.wait_for(
            asyncio.gather(*tasks, return_exceptions=True),
            timeout=30.0,
        )
    except asyncio.TimeoutError:
        logger.warning("Background tasks startup timed out")
```

**Effort:** 1 hour (add timeouts, test)
**Risk:** Low

---

### Issue #7: Optimize SearchPipeline Deduplication

**Category:** Performance - Inefficient list scans
**Location:** `src/search/dedup.py`
**Impact:** LOW - Potential O(n²) deduplication
**Status:** ⏳ Not completed - Still uses O(n²) pattern with nested loops

**Problem:** Deduplication uses nested loops (repeated list scans), so its cost grows quadratically with the number of results.

**Solution:** Use set-based lookups for O(n) deduplication.
**Effort:** 2 hours (read code, benchmark, optimize if needed)
**Risk:** Low

---

### Issue #8: Add Resource Cleanup on Lifecycle Exception (DONE)

**Category:** Safety - Resource leaks
**Location:** `src/lifecycle.py:54-72`
**Impact:** LOW - Resources leak on startup exception
**Status:** ✅ Fixed - `start()` method has try/except that calls `await self._cleanup_resources()` on failure

**Problem:** If initialization raises an exception, resources (file handles, threads) may leak.
**Solution:**

```python
import logging

logger = logging.getLogger(__name__)

async def startup(self, timeout: float = 60.0):
    try:
        await self._startup_impl(timeout)
    except Exception:
        # Clean up partially initialized resources, then re-raise the original error
        logger.error("Startup failed, cleaning up resources", exc_info=True)
        await self._cleanup_resources()
        raise

async def _cleanup_resources(self):
    """Best-effort cleanup of all resources."""
    if self.ctx:
        try:
            await self.ctx.cleanup()
        except Exception as e:
            # Log and swallow so a cleanup error does not mask the startup error
            logger.error(f"Cleanup failed: {e}", exc_info=True)
```

**Effort:** 2 hours (add try/except cleanup, test failure scenarios)
**Risk:** Low

---

## Priority 5: Nice-to-Have (Future)

### Issue #9: Optimize Stale Warning Deduplication (DONE)

**Category:** Performance - Minor memory leak
**Location:** `src/indices/vector.py:56-58, 286-293`
**Impact:** LOW
**Status:** ✅ Fixed - `OrderedDict` with LRU eviction (`_max_warned_chunks = 1000`)

**Problem:** `_warned_stale_chunk_ids` grows unbounded.

**Solution:**

```python
import logging
from functools import lru_cache

logger = logging.getLogger(__name__)

@lru_cache(maxsize=1000)
def _log_stale_warning(chunk_id: str) -> None:
    # Module-level (not a method) so the cache never holds a reference to `self`;
    # cached calls are skipped, so each chunk ID is logged once until evicted
    logger.warning(f"Stale chunk reference: {chunk_id}")
```

**Effort:** 15 minutes
**Risk:** None

---

### Issue #10: Cancel Emergency Timer on Normal Shutdown (DONE)

**Category:** Safety - Resource cleanup
**Location:** `src/lifecycle.py:117-125`
**Impact:** LOW
**Status:** ✅ Fixed - `_cancel_emergency_timer()` called at start of `shutdown()` and in `_cleanup_resources()`

**Problem:** The emergency timer thread persists briefly after a normal shutdown.

**Solution:** Always call `_cancel_emergency_timer()` in the shutdown path.
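The rule above can be sketched as follows, assuming the emergency timer is a `threading.Timer` that forces an exit if shutdown stalls. `LifecycleSketch`, `_start_emergency_timer()`, `on_emergency`, and `grace_seconds` are hypothetical; only `_cancel_emergency_timer()` comes from this backlog, and the real code may arm the timer elsewhere (for example, from a signal handler). Cancelling in a `finally` block ensures the timer is cancelled whether shutdown succeeds or raises.

```python
import threading
from typing import Callable, Optional


class LifecycleSketch:
    """Hypothetical lifecycle with an emergency-exit timer that normal shutdown cancels."""

    def __init__(self, on_emergency: Callable[[], None], grace_seconds: float = 10.0) -> None:
        self._on_emergency = on_emergency
        self._grace_seconds = grace_seconds
        self._emergency_timer: Optional[threading.Timer] = None

    def _start_emergency_timer(self) -> None:
        # Fire on_emergency if shutdown has not finished within the grace period
        self._emergency_timer = threading.Timer(self._grace_seconds, self._on_emergency)
        self._emergency_timer.daemon = True  # never keep the process alive on its own
        self._emergency_timer.start()

    def _cancel_emergency_timer(self) -> None:
        # Idempotent: safe to call from shutdown() and from cleanup paths
        timer, self._emergency_timer = self._emergency_timer, None
        if timer is not None:
            timer.cancel()  # only effective while the timer is still waiting

    def shutdown(self) -> None:
        self._start_emergency_timer()
        try:
            pass  # ... graceful shutdown work (stop watcher, persist indices, etc.)
        finally:
            self._cancel_emergency_timer()
```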
**Effort:** 30 minutes
**Risk:** None

---

## Implementation Roadmap

### Sprint 1 (Next 2 weeks)

- [x] P1: Embedding timeout protection
- [x] P1: Bounded vocabularies
- [x] P2: FileWatcher shutdown race
- [x] P2: MCP handler validation
- [x] P2: Blocking I/O async wrapping

### Sprint 2 (Weeks 3-4)

- [x] P3-1: Retry logic for persistence
- [x] P3-3: Graph traversal locking optimization
- [x] P3-4: Query embedding cache

### Sprint 3 (Weeks 5-6)

- [x] P3-2: Circuit breaker for embedding model
- [x] P4-6: Background task timeouts
- [ ] P4-7: SearchPipeline optimization

### Sprint 4+ (Future)

- [x] P4-8: Resource cleanup on lifecycle exception
- [x] P5-9: Stale warning deduplication
- [x] P5-10: Cancel emergency timer

---

## Monitoring & Metrics

To validate improvements, track:

| Metric | Baseline | Target | Tool |
|--------|----------|--------|------|
| **Embedding cache hit rate** | 0% | 20-30% | Custom logging |
| **Graph traversal latency (p95)** | TBD | -20% | pytest-benchmark |
| **Index persist failure rate** | 1-2% | <0.1% | Logs analysis |
| **Memory growth rate** | +5MB/hr | +1MB/hr | Memory profiler |
| **Query embedding time** | 50-100ms | 50-100ms (cached: <1ms) | pytest-benchmark |

---

## Summary

**Completion Status:** 9 of 10 issues resolved (90%)

| Priority | Total | Completed | Remaining |
|----------|-------|-----------|-----------|
| P3 (Medium) | 5 | 5 | 0 |
| P4 (Low) | 3 | 2 | 1 (P4-7: SearchPipeline dedup) |
| P5 (Nice-to-have) | 2 | 2 | 0 |

**Last Updated:** 2026-01-22

---

## Notes

- All P1 and P2 issues resolved as of 2026-01-20
- P3, P4, P5 issues mostly resolved as of 2026-01-22
- Only P4-7 (SearchPipeline deduplication optimization) remains
- This backlog focuses on incremental improvements without breaking changes
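For the remaining item, P4-7, the set-based deduplication proposed in Issue #7 could look like the sketch below. It assumes duplicates can be identified by an exact, hashable key such as a chunk ID. `dedupe_preserving_order` is a hypothetical name, and if `dedup.py` compares results by content similarity rather than by key, a different approach would be needed.

```python
from typing import Callable, Hashable, Iterable, List, Set, TypeVar

T = TypeVar("T")


def dedupe_preserving_order(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first occurrence of each key in a single O(n) pass."""
    seen: Set[Hashable] = set()
    result: List[T] = []
    for item in items:
        k = key(item)
        if k in seen:  # average O(1) set lookup replaces the inner list scan
            continue
        seen.add(k)
        result.append(item)
    return result
```

Keeping the first occurrence preserves ranking when results arrive sorted by score, so the highest-scoring copy of each chunk survives.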
