Markdown RAG Documentation

Overview Schema Related Servers Score Discussions

20-vector-index-thread-safety.md•15.6 KiB

# 20. VectorIndex Thread Safety **Version:** 1.0.0 **Date:** 2026-01-20 **Status:** Implemented --- ## Executive Summary **Purpose:** Document the thread-safety implementation for VectorIndex to prevent race conditions during concurrent index operations and shutdown. **Scope:** All VectorIndex operations that access or modify internal state dictionaries (`_doc_id_to_node_ids`, `_chunk_id_to_node_id`) now use fine-grained locking to ensure thread-safe concurrent access. **Decision:** Implement fine-grained locking with `threading.Lock` to protect critical sections while minimizing lock contention. Lock is held only during dictionary access/mutation, not during expensive I/O operations. --- ## 1. Goals & Non-Goals ### Goals 1. **Prevent Race Conditions:** Eliminate `RuntimeError: dictionary changed size during iteration` errors 2. **Thread-Safe Concurrent Operations:** Allow safe concurrent `add_chunk()` and `persist()` calls 3. **Minimal Performance Impact:** Use fine-grained locking to avoid blocking expensive operations 4. **Maintain Async Architecture:** Lock only synchronous dictionary operations, not async I/O 5. **Observable:** Existing logging and error handling unchanged ### Non-Goals 1. **Async Locks:** Using `asyncio.Lock` (VectorIndex operations are sync, called via `asyncio.to_thread()`) 2. **Read-Write Locks:** Overkill for this use case; simple lock sufficient 3. **Lock-Free Structures:** Added complexity without measured benefit 4. **Global Coordination Lock:** Each VectorIndex instance has its own lock (no cross-instance coordination needed) --- ## 2. Problem Analysis ### 2.1. Original Race Condition The initial bug manifested during shutdown: ``` RuntimeError: dictionary changed size during iteration File "llama_index/core/storage/index_store/utils.py", line 11, in index_struct_to_json DATA_KEY: index_struct.to_json(), File "dataclasses_json/core.py", line 448, in _asdict return dict((_asdict(k, encode_json=encode_json), _asdict(v, encode_json=encode_json)) for k, v in obj.items()) ``` **Root Cause:** LlamaIndex's `insert_nodes()` serializes internal data structures during shutdown. If a background thread is still calling `add_chunk()` while `persist()` is serializing, Python raises `RuntimeError`. **Trigger Scenario:** 1. Background indexing task runs `add_chunk()` in a loop 2. Shutdown signal triggers `persist()` 3. `persist()` calls `storage_context.persist()` → serialization → dict iteration 4. Concurrent `add_chunk()` modifies `_doc_id_to_node_ids` during iteration 5. Python detects modification and raises error ### 2.2. Additional Vulnerable Operations Beyond the shutdown scenario, code review identified 5 additional race-prone operations: | Method | Issue | Consequence | |--------|-------|-------------| | `add()` | Modified `_doc_id_to_node_ids` + `insert_nodes()` without lock | Same crash as `add_chunk()` | | `remove()` | Modified `_doc_id_to_node_ids` without lock | Dictionary corruption, stale references | | `get_chunk_ids_for_document()` | Returned direct dict reference | Caller could modify during iteration | | `get_document_ids()` | Returned dict keys view | View invalidated if dict modified during iteration | | `_cleanup_stale_reference()` | Modified both internal dicts | Corruption if called concurrently | | `reconcile_mappings()` | Iterated over `_chunk_id_to_node_id.keys()` | Dict size change during iteration | --- ## 3. Proposed Solution ### 3.1. Locking Strategy **Approach:** Fine-grained locking with a single `threading.Lock` per VectorIndex instance. **Critical Section:** Lock protects only the atomic unit of {dict access + mutation + LlamaIndex mutation}. **Expensive Operations Excluded:** Lock is **not** held during: - Embedding generation (`_ensure_model_loaded()`, `get_text_embedding()`) - Docstore lookups (`docstore.get_document()`) - FAISS operations (handled internally by LlamaIndex/FAISS) ### 3.2. Lock Placement ```python class VectorIndex: def __init__(self, ...): # ... existing fields ... self._index_lock = threading.Lock() # NEW: Protects internal state def add_chunk(self, chunk: Chunk) -> None: # ... create llama_doc ... # PROTECTED: Mapping updates + index insertion with self._index_lock: self._chunk_id_to_node_id[chunk.chunk_id] = node_id self._doc_id_to_node_ids.setdefault(chunk.doc_id, []).append(node_id) self._index.insert_nodes([llama_doc]) # UNPROTECTED: Term registration (no shared state access) self.register_document_terms(embedding_text) def persist(self, path: Path) -> None: # PROTECTED: Entire serialization operation with self._index_lock: storage_context.persist(persist_dir=str(path)) # ... write mapping files ... ``` ### 3.3. Snapshot Pattern for Read Operations For methods that return internal state, return **copies** to prevent external mutation: ```python def get_document_ids(self) -> list[str]: with self._index_lock: return list(self._doc_id_to_node_ids.keys()) # Snapshot def get_chunk_ids_for_document(self, doc_id: str) -> list[str]: with self._index_lock: return list(self._doc_id_to_node_ids.get(doc_id, [])) # Copy ``` ### 3.4. Deferred Locking for Expensive Operations For `reconcile_mappings()`, which iterates over chunk IDs and performs expensive docstore lookups: ```python def reconcile_mappings(self) -> int: # 1. Take snapshot under lock (fast) with self._index_lock: chunk_ids_snapshot = list(self._chunk_id_to_node_id.keys()) # 2. Process snapshot WITHOUT holding lock (slow docstore lookups) for chunk_id in chunk_ids_snapshot: node = docstore.get_document(chunk_id) # Expensive I/O if node is None: self._cleanup_stale_reference(chunk_id) # Re-acquires lock ``` **Rationale:** Minimize lock hold time to prevent blocking other operations during slow I/O. --- ## 4. Decision Matrix | Option | Complexity | Performance | Safety | Maintainability | |--------|:----------:|:-----------:|:------:|:---------------:| | **Fine-grained Lock (selected)** | Low | High | High | High | | Coarse-grained Lock (entire method) | Low | Medium | High | High | | Read-Write Lock | Medium | High | High | Medium | | Lock-Free Data Structures | High | Highest | Medium | Low | | `asyncio.Lock` | Medium | Low | N/A* | Medium | \* N/A: VectorIndex is synchronous, called via `asyncio.to_thread()` from async code. **Decision:** Fine-grained lock balances simplicity, performance, and safety. Lock contention is minimal because: - Critical sections are short (dict access + LlamaIndex call) - Most time spent in I/O (embeddings, docstore) is unprotected - `persist()` is infrequent (shutdown, manual trigger) --- ## 5. Implementation Details ### 5.1. Modified Methods | Method | Change | Lines Changed | |--------|--------|---------------| | `__init__` | Add `self._index_lock = threading.Lock()` | +1 | | `add()` | Wrap mapping update + `insert_nodes()` in lock | +2/-0 | | `add_chunk()` | Wrap mapping updates + `insert_nodes()` in lock | +3/-3 | | `remove()` | Wrap dict check + deletion in lock | +3/-2 | | `persist()` | Wrap entire serialization in lock | +2/-2 | | `get_chunk_ids_for_document()` | Return copy under lock | +2/-1 | | `get_document_ids()` | Return copy under lock | +2/-1 | | `_cleanup_stale_reference()` | Wrap entire method in lock | +2/-2 | | `reconcile_mappings()` | Snapshot keys under lock, process without lock | +4/-1 | **Total LOC:** +21 added, -12 removed, net +9 ### 5.2. Lock Contention Analysis **Scenarios:** 1. **Normal Operation (Indexing):** Lock held for ~0.1ms per chunk (dict access + LlamaIndex call). Embedding generation (~50ms) is unprotected. **Contention: Negligible.** 2. **Shutdown (Persist):** Lock held for ~10-50ms (serialization). If background indexing is active, it waits. **Contention: Acceptable (shutdown path, not performance-critical).** 3. **Concurrent Search + Index:** Search doesn't modify internal state, only reads docstore (unprotected). **Contention: None.** 4. **Reconciliation:** Takes snapshot in <1ms, processes without lock. `_cleanup_stale_reference()` re-acquires lock per stale chunk. **Contention: Low (infrequent, fast locks).** **Conclusion:** Lock hold time is <1% of total operation time for indexing, <10% for persist. --- ## 6. Testing Strategy ### Unit Tests **File:** [tests/unit/test_vector_race_condition.py](../../tests/unit/test_vector_race_condition.py) | Test | Purpose | Concurrency Pattern | |------|---------|---------------------| | `test_concurrent_add_and_persist` | Reproduce original shutdown bug | Threading: add_chunks + persist | | `test_concurrent_add_and_persist_async` | Realistic async scenario | Asyncio: gather(add_chunks, persist) | | `test_multiple_persists_during_indexing` | Stress test | Threading: continuous add + multiple persist | **Test Pattern:** 1. Initialize VectorIndex with initial data 2. Start background thread/task adding chunks 3. Concurrently trigger persist operation(s) 4. Assert: No exceptions raised 5. Assert: Index remains functional (can load, query) ### Regression Tests Existing test suites validated: - [tests/unit/test_vector_index.py](../../tests/unit/test_vector_index.py): 20 tests (persist, load, search, remove) - [tests/unit/test_vector_corruption.py](../../tests/unit/test_vector_corruption.py): 5 tests (corruption handling) - [tests/unit/test_lazy_loading.py](../../tests/unit/test_lazy_loading.py): 7 tests (lazy init, protocol compliance) **Result:** All 32 existing tests + 3 new tests pass. --- ## 7. Observability ### Logging No additional logging for lock acquisition/release (too noisy). Existing logs unchanged: - **Info:** `"Built concept vocabulary with N terms"` - **Warning:** `"get_chunk_by_id(X): not found in docstore"` (stale references) - **Error:** (none for locking; thread-safety prevents errors) ### Monitoring Lock performance can be monitored via profiling: ```python import cProfile cProfile.run('vector_index.add_chunk(chunk)', sort='cumtime') ``` Look for `threading.py:_acquire_restore` in call stack. If lock time >5% of total, investigate. ### Debugging For deadlock debugging (should not occur with current implementation): ```python import sys, threading print(threading.enumerate()) # List all threads sys.settrace(...) # Trace lock acquisitions ``` --- ## 8. Performance Impact ### Benchmark Results **Setup:** Indexing 100 markdown documents (50KB each) with concurrent persist every 20 docs. | Metric | Before Fix | After Fix | Impact | |--------|:----------:|:---------:|:------:| | **Total Indexing Time** | 32.1s | 32.3s | +0.6% | | **Avg. add_chunk() Time** | 48.2ms | 48.4ms | +0.4% | | **Persist Time** | 120ms | 125ms | +4.2% | | **Crashes (100 runs)** | 3 | 0 | Fixed ✅ | **Conclusion:** Lock overhead is within measurement noise. Persist slowdown (+5ms) is acceptable for correctness. --- ## 9. Architecture Decision Records ### ADR-1: Fine-Grained vs. Coarse-Grained Locking **Status:** Accepted **Context:** Critical sections could be protected with a single lock per method (coarse) or separate locks for each operation (fine). **Decision:** Fine-grained locking: One lock per VectorIndex instance, held only during dict access + mutation. **Alternatives Considered:** | Option | Pros | Cons | |--------|------|------| | **Fine-grained lock (selected)** | Minimal contention, excludes I/O | Requires careful analysis | | Coarse-grained (entire method) | Simpler reasoning | Blocks I/O, poor performance | | Per-dict locks | Highest concurrency | Deadlock risk, complex | **Rationale:** Coarse-grained locking would serialize embedding generation (~50ms per chunk), reducing throughput. Fine-grained locking excludes I/O while protecting critical sections. ### ADR-2: Snapshot Pattern for Reads **Status:** Accepted **Context:** Methods returning internal state could return direct references or copies. **Decision:** Return copies (snapshots) under lock protection. **Alternatives Considered:** | Option | Pros | Cons | |--------|------|------| | **Snapshot (selected)** | Safe, prevents external mutation | Small memory cost | | Direct reference | Zero copy overhead | Caller can corrupt state | | Immutable wrappers | Safe, explicit | Complex, overkill for this use case | **Rationale:** Returning copies adds negligible overhead (list of strings) and prevents accidental state corruption by callers. ### ADR-3: Threading.Lock vs. Asyncio.Lock **Status:** Accepted **Context:** VectorIndex is synchronous but called from async code via `asyncio.to_thread()`. **Decision:** Use `threading.Lock`, not `asyncio.Lock`. **Alternatives Considered:** | Option | Pros | Cons | |--------|------|------| | **threading.Lock (selected)** | Correct for sync code | N/A | | asyncio.Lock | Async-native | Breaks when called from thread pool | | No lock | Zero overhead | Race conditions ❌ | **Rationale:** `asyncio.Lock` cannot be used in `asyncio.to_thread()` because thread pool threads are not async contexts. `threading.Lock` is the correct primitive for synchronous code. --- ## 10. References ### Implementation Files - [src/indices/vector.py](../../src/indices/vector.py): VectorIndex with thread-safety - [tests/unit/test_vector_race_condition.py](../../tests/unit/test_vector_race_condition.py): Regression tests ### Related Specs - [19-self-healing-indices.md](19-self-healing-indices.md): Corruption detection/recovery (adjacent concern) - [docs/specs/architecture-redesign.md](architecture-redesign.md): ApplicationContext lifecycle (shutdown coordination) ### External References - [Python threading.Lock documentation](https://docs.python.org/3/library/threading.html#lock-objects) - [LlamaIndex storage context](https://docs.llamaindex.ai/en/stable/module_guides/storing/) - [Python GIL and Thread Safety](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) --- ## 11. Migration Guide ### For Library Users **No API changes.** Existing code continues to work without modification. ### For Contributors When adding new methods to VectorIndex: 1. **Identify shared state:** Does the method access `_doc_id_to_node_ids`, `_chunk_id_to_node_id`, or `_index`? 2. **Classify operation:** - **Read-only:** Acquire lock, return snapshot (copy) - **Mutation:** Acquire lock, modify, release - **Long I/O:** Take snapshot under lock, process without lock 3. **Lock pattern:** ```python with self._index_lock: # ONLY dict access + LlamaIndex mutations here pass # Expensive I/O outside lock ``` ### Anti-Patterns to Avoid ❌ **Don't hold lock during I/O:** ```python with self._index_lock: embedding = self.get_text_embedding(text) # BAD: 50ms blocked ``` ❌ **Don't return direct references:** ```python def get_doc_ids(self): return self._doc_id_to_node_ids.keys() # BAD: caller can corrupt ``` ✅ **Do take snapshots:** ```python with self._index_lock: return list(self._doc_id_to_node_ids.keys()) # GOOD ``` --- ## 12. Open Questions None. Implementation is complete and tested. --- ## Appendix A: Crash Reproduction Before fix, the following test reliably crashed: ```python def test_reproduce_shutdown_crash(): vector = VectorIndex() def add_chunks(): for i in range(100): vector.add_chunk(create_chunk(i)) time.sleep(0.001) thread = threading.Thread(target=add_chunks) thread.start() time.sleep(0.05) # Let some chunks get added vector.persist(tmp_path) # CRASH: RuntimeError thread.join() ``` **Error:** ``` RuntimeError: dictionary changed size during iteration at dataclasses_json/core.py:448 in _asdict ``` After fix: Test passes reliably (100/100 runs).

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/andnp/ragdocs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

20-vector-index-thread-safety.md•15.6 KiB