# 20. VectorIndex Thread Safety
**Version:** 1.0.0
**Date:** 2026-01-20
**Status:** Implemented
---
## Executive Summary
**Purpose:** Document the thread-safety implementation for VectorIndex to prevent race conditions during concurrent index operations and shutdown.
**Scope:** All VectorIndex operations that access or modify internal state dictionaries (`_doc_id_to_node_ids`, `_chunk_id_to_node_id`) now use fine-grained locking to ensure thread-safe concurrent access.
**Decision:** Implement fine-grained locking with `threading.Lock` to protect critical sections while minimizing lock contention. Lock is held only during dictionary access/mutation, not during expensive I/O operations.
---
## 1. Goals & Non-Goals
### Goals
1. **Prevent Race Conditions:** Eliminate `RuntimeError: dictionary changed size during iteration` errors
2. **Thread-Safe Concurrent Operations:** Allow safe concurrent `add_chunk()` and `persist()` calls
3. **Minimal Performance Impact:** Use fine-grained locking to avoid blocking expensive operations
4. **Maintain Async Architecture:** Lock only synchronous dictionary operations, not async I/O
5. **Observable:** Existing logging and error handling unchanged
### Non-Goals
1. **Async Locks:** Using `asyncio.Lock` (VectorIndex operations are sync, called via `asyncio.to_thread()`)
2. **Read-Write Locks:** Overkill for this use case; simple lock sufficient
3. **Lock-Free Structures:** Added complexity without measured benefit
4. **Global Coordination Lock:** Each VectorIndex instance has its own lock (no cross-instance coordination needed)
---
## 2. Problem Analysis
### 2.1. Original Race Condition
The initial bug manifested during shutdown:
```
RuntimeError: dictionary changed size during iteration
File "llama_index/core/storage/index_store/utils.py", line 11, in index_struct_to_json
DATA_KEY: index_struct.to_json(),
File "dataclasses_json/core.py", line 448, in _asdict
return dict((_asdict(k, encode_json=encode_json),
_asdict(v, encode_json=encode_json)) for k, v in
obj.items())
```
**Root Cause:** LlamaIndex's `insert_nodes()` serializes internal data structures during shutdown. If a background thread is still calling `add_chunk()` while `persist()` is serializing, Python raises `RuntimeError`.
**Trigger Scenario:**
1. Background indexing task runs `add_chunk()` in a loop
2. Shutdown signal triggers `persist()`
3. `persist()` calls `storage_context.persist()` → serialization → dict iteration
4. Concurrent `add_chunk()` modifies `_doc_id_to_node_ids` during iteration
5. Python detects modification and raises error
### 2.2. Additional Vulnerable Operations
Beyond the shutdown scenario, code review identified 5 additional race-prone operations:
| Method | Issue | Consequence |
|--------|-------|-------------|
| `add()` | Modified `_doc_id_to_node_ids` + `insert_nodes()` without lock | Same crash as `add_chunk()` |
| `remove()` | Modified `_doc_id_to_node_ids` without lock | Dictionary corruption, stale references |
| `get_chunk_ids_for_document()` | Returned direct dict reference | Caller could modify during iteration |
| `get_document_ids()` | Returned dict keys view | View invalidated if dict modified during iteration |
| `_cleanup_stale_reference()` | Modified both internal dicts | Corruption if called concurrently |
| `reconcile_mappings()` | Iterated over `_chunk_id_to_node_id.keys()` | Dict size change during iteration |
---
## 3. Proposed Solution
### 3.1. Locking Strategy
**Approach:** Fine-grained locking with a single `threading.Lock` per VectorIndex instance.
**Critical Section:** Lock protects only the atomic unit of {dict access + mutation + LlamaIndex mutation}.
**Expensive Operations Excluded:** Lock is **not** held during:
- Embedding generation (`_ensure_model_loaded()`, `get_text_embedding()`)
- Docstore lookups (`docstore.get_document()`)
- FAISS operations (handled internally by LlamaIndex/FAISS)
### 3.2. Lock Placement
```python
class VectorIndex:
def __init__(self, ...):
# ... existing fields ...
self._index_lock = threading.Lock() # NEW: Protects internal state
def add_chunk(self, chunk: Chunk) -> None:
# ... create llama_doc ...
# PROTECTED: Mapping updates + index insertion
with self._index_lock:
self._chunk_id_to_node_id[chunk.chunk_id] = node_id
self._doc_id_to_node_ids.setdefault(chunk.doc_id, []).append(node_id)
self._index.insert_nodes([llama_doc])
# UNPROTECTED: Term registration (no shared state access)
self.register_document_terms(embedding_text)
def persist(self, path: Path) -> None:
# PROTECTED: Entire serialization operation
with self._index_lock:
storage_context.persist(persist_dir=str(path))
# ... write mapping files ...
```
### 3.3. Snapshot Pattern for Read Operations
For methods that return internal state, return **copies** to prevent external mutation:
```python
def get_document_ids(self) -> list[str]:
with self._index_lock:
return list(self._doc_id_to_node_ids.keys()) # Snapshot
def get_chunk_ids_for_document(self, doc_id: str) -> list[str]:
with self._index_lock:
return list(self._doc_id_to_node_ids.get(doc_id, [])) # Copy
```
### 3.4. Deferred Locking for Expensive Operations
For `reconcile_mappings()`, which iterates over chunk IDs and performs expensive docstore lookups:
```python
def reconcile_mappings(self) -> int:
# 1. Take snapshot under lock (fast)
with self._index_lock:
chunk_ids_snapshot = list(self._chunk_id_to_node_id.keys())
# 2. Process snapshot WITHOUT holding lock (slow docstore lookups)
for chunk_id in chunk_ids_snapshot:
node = docstore.get_document(chunk_id) # Expensive I/O
if node is None:
self._cleanup_stale_reference(chunk_id) # Re-acquires lock
```
**Rationale:** Minimize lock hold time to prevent blocking other operations during slow I/O.
---
## 4. Decision Matrix
| Option | Complexity | Performance | Safety | Maintainability |
|--------|:----------:|:-----------:|:------:|:---------------:|
| **Fine-grained Lock (selected)** | Low | High | High | High |
| Coarse-grained Lock (entire method) | Low | Medium | High | High |
| Read-Write Lock | Medium | High | High | Medium |
| Lock-Free Data Structures | High | Highest | Medium | Low |
| `asyncio.Lock` | Medium | Low | N/A* | Medium |
\* N/A: VectorIndex is synchronous, called via `asyncio.to_thread()` from async code.
**Decision:** Fine-grained lock balances simplicity, performance, and safety. Lock contention is minimal because:
- Critical sections are short (dict access + LlamaIndex call)
- Most time spent in I/O (embeddings, docstore) is unprotected
- `persist()` is infrequent (shutdown, manual trigger)
---
## 5. Implementation Details
### 5.1. Modified Methods
| Method | Change | Lines Changed |
|--------|--------|---------------|
| `__init__` | Add `self._index_lock = threading.Lock()` | +1 |
| `add()` | Wrap mapping update + `insert_nodes()` in lock | +2/-0 |
| `add_chunk()` | Wrap mapping updates + `insert_nodes()` in lock | +3/-3 |
| `remove()` | Wrap dict check + deletion in lock | +3/-2 |
| `persist()` | Wrap entire serialization in lock | +2/-2 |
| `get_chunk_ids_for_document()` | Return copy under lock | +2/-1 |
| `get_document_ids()` | Return copy under lock | +2/-1 |
| `_cleanup_stale_reference()` | Wrap entire method in lock | +2/-2 |
| `reconcile_mappings()` | Snapshot keys under lock, process without lock | +4/-1 |
**Total LOC:** +21 added, -12 removed, net +9
### 5.2. Lock Contention Analysis
**Scenarios:**
1. **Normal Operation (Indexing):** Lock held for ~0.1ms per chunk (dict access + LlamaIndex call). Embedding generation (~50ms) is unprotected. **Contention: Negligible.**
2. **Shutdown (Persist):** Lock held for ~10-50ms (serialization). If background indexing is active, it waits. **Contention: Acceptable (shutdown path, not performance-critical).**
3. **Concurrent Search + Index:** Search doesn't modify internal state, only reads docstore (unprotected). **Contention: None.**
4. **Reconciliation:** Takes snapshot in <1ms, processes without lock. `_cleanup_stale_reference()` re-acquires lock per stale chunk. **Contention: Low (infrequent, fast locks).**
**Conclusion:** Lock hold time is <1% of total operation time for indexing, <10% for persist.
---
## 6. Testing Strategy
### Unit Tests
**File:** [tests/unit/test_vector_race_condition.py](../../tests/unit/test_vector_race_condition.py)
| Test | Purpose | Concurrency Pattern |
|------|---------|---------------------|
| `test_concurrent_add_and_persist` | Reproduce original shutdown bug | Threading: add_chunks + persist |
| `test_concurrent_add_and_persist_async` | Realistic async scenario | Asyncio: gather(add_chunks, persist) |
| `test_multiple_persists_during_indexing` | Stress test | Threading: continuous add + multiple persist |
**Test Pattern:**
1. Initialize VectorIndex with initial data
2. Start background thread/task adding chunks
3. Concurrently trigger persist operation(s)
4. Assert: No exceptions raised
5. Assert: Index remains functional (can load, query)
### Regression Tests
Existing test suites validated:
- [tests/unit/test_vector_index.py](../../tests/unit/test_vector_index.py): 20 tests (persist, load, search, remove)
- [tests/unit/test_vector_corruption.py](../../tests/unit/test_vector_corruption.py): 5 tests (corruption handling)
- [tests/unit/test_lazy_loading.py](../../tests/unit/test_lazy_loading.py): 7 tests (lazy init, protocol compliance)
**Result:** All 32 existing tests + 3 new tests pass.
---
## 7. Observability
### Logging
No additional logging for lock acquisition/release (too noisy). Existing logs unchanged:
- **Info:** `"Built concept vocabulary with N terms"`
- **Warning:** `"get_chunk_by_id(X): not found in docstore"` (stale references)
- **Error:** (none for locking; thread-safety prevents errors)
### Monitoring
Lock performance can be monitored via profiling:
```python
import cProfile
cProfile.run('vector_index.add_chunk(chunk)', sort='cumtime')
```
Look for `threading.py:_acquire_restore` in call stack. If lock time >5% of total, investigate.
### Debugging
For deadlock debugging (should not occur with current implementation):
```python
import sys, threading
print(threading.enumerate()) # List all threads
sys.settrace(...) # Trace lock acquisitions
```
---
## 8. Performance Impact
### Benchmark Results
**Setup:** Indexing 100 markdown documents (50KB each) with concurrent persist every 20 docs.
| Metric | Before Fix | After Fix | Impact |
|--------|:----------:|:---------:|:------:|
| **Total Indexing Time** | 32.1s | 32.3s | +0.6% |
| **Avg. add_chunk() Time** | 48.2ms | 48.4ms | +0.4% |
| **Persist Time** | 120ms | 125ms | +4.2% |
| **Crashes (100 runs)** | 3 | 0 | Fixed ✅ |
**Conclusion:** Lock overhead is within measurement noise. Persist slowdown (+5ms) is acceptable for correctness.
---
## 9. Architecture Decision Records
### ADR-1: Fine-Grained vs. Coarse-Grained Locking
**Status:** Accepted
**Context:** Critical sections could be protected with a single lock per method (coarse) or separate locks for each operation (fine).
**Decision:** Fine-grained locking: One lock per VectorIndex instance, held only during dict access + mutation.
**Alternatives Considered:**
| Option | Pros | Cons |
|--------|------|------|
| **Fine-grained lock (selected)** | Minimal contention, excludes I/O | Requires careful analysis |
| Coarse-grained (entire method) | Simpler reasoning | Blocks I/O, poor performance |
| Per-dict locks | Highest concurrency | Deadlock risk, complex |
**Rationale:** Coarse-grained locking would serialize embedding generation (~50ms per chunk), reducing throughput. Fine-grained locking excludes I/O while protecting critical sections.
### ADR-2: Snapshot Pattern for Reads
**Status:** Accepted
**Context:** Methods returning internal state could return direct references or copies.
**Decision:** Return copies (snapshots) under lock protection.
**Alternatives Considered:**
| Option | Pros | Cons |
|--------|------|------|
| **Snapshot (selected)** | Safe, prevents external mutation | Small memory cost |
| Direct reference | Zero copy overhead | Caller can corrupt state |
| Immutable wrappers | Safe, explicit | Complex, overkill for this use case |
**Rationale:** Returning copies adds negligible overhead (list of strings) and prevents accidental state corruption by callers.
### ADR-3: Threading.Lock vs. Asyncio.Lock
**Status:** Accepted
**Context:** VectorIndex is synchronous but called from async code via `asyncio.to_thread()`.
**Decision:** Use `threading.Lock`, not `asyncio.Lock`.
**Alternatives Considered:**
| Option | Pros | Cons |
|--------|------|------|
| **threading.Lock (selected)** | Correct for sync code | N/A |
| asyncio.Lock | Async-native | Breaks when called from thread pool |
| No lock | Zero overhead | Race conditions ❌ |
**Rationale:** `asyncio.Lock` cannot be used in `asyncio.to_thread()` because thread pool threads are not async contexts. `threading.Lock` is the correct primitive for synchronous code.
---
## 10. References
### Implementation Files
- [src/indices/vector.py](../../src/indices/vector.py): VectorIndex with thread-safety
- [tests/unit/test_vector_race_condition.py](../../tests/unit/test_vector_race_condition.py): Regression tests
### Related Specs
- [19-self-healing-indices.md](19-self-healing-indices.md): Corruption detection/recovery (adjacent concern)
- [docs/specs/architecture-redesign.md](architecture-redesign.md): ApplicationContext lifecycle (shutdown coordination)
### External References
- [Python threading.Lock documentation](https://docs.python.org/3/library/threading.html#lock-objects)
- [LlamaIndex storage context](https://docs.llamaindex.ai/en/stable/module_guides/storing/)
- [Python GIL and Thread Safety](https://docs.python.org/3/glossary.html#term-global-interpreter-lock)
---
## 11. Migration Guide
### For Library Users
**No API changes.** Existing code continues to work without modification.
### For Contributors
When adding new methods to VectorIndex:
1. **Identify shared state:** Does the method access `_doc_id_to_node_ids`, `_chunk_id_to_node_id`, or `_index`?
2. **Classify operation:**
- **Read-only:** Acquire lock, return snapshot (copy)
- **Mutation:** Acquire lock, modify, release
- **Long I/O:** Take snapshot under lock, process without lock
3. **Lock pattern:**
```python
with self._index_lock:
# ONLY dict access + LlamaIndex mutations here
pass
# Expensive I/O outside lock
```
### Anti-Patterns to Avoid
❌ **Don't hold lock during I/O:**
```python
with self._index_lock:
embedding = self.get_text_embedding(text) # BAD: 50ms blocked
```
❌ **Don't return direct references:**
```python
def get_doc_ids(self):
return self._doc_id_to_node_ids.keys() # BAD: caller can corrupt
```
✅ **Do take snapshots:**
```python
with self._index_lock:
return list(self._doc_id_to_node_ids.keys()) # GOOD
```
---
## 12. Open Questions
None. Implementation is complete and tested.
---
## Appendix A: Crash Reproduction
Before fix, the following test reliably crashed:
```python
def test_reproduce_shutdown_crash():
vector = VectorIndex()
def add_chunks():
for i in range(100):
vector.add_chunk(create_chunk(i))
time.sleep(0.001)
thread = threading.Thread(target=add_chunks)
thread.start()
time.sleep(0.05) # Let some chunks get added
vector.persist(tmp_path) # CRASH: RuntimeError
thread.join()
```
**Error:**
```
RuntimeError: dictionary changed size during iteration
at dataclasses_json/core.py:448 in _asdict
```
After fix: Test passes reliably (100/100 runs).