Markdown RAG Documentation

Overview Schema Related Servers Score Discussions

simplification-opportunities.md•4.73 KiB

# Simplification Opportunities for Index System ## Observed Patterns ### 1. Index Creation Duplication (High Impact) **Problem**: The same index creation pattern appears in 4 places: - `ApplicationContext.create()` (lines 73-84) - `ReadOnlyContext.create()` (lines 43-50) - `src/memory/init.py` (lines 32-39) - `src/worker/process.py` (lines 197-205) ```python # Current pattern (repeated 4x) vector = VectorIndex( embedding_model_name=embedding_model_name, embedding_workers=config.indexing.embedding_workers, ) keyword = KeywordIndex() graph = GraphStore() ``` **Recommendation**: Create a unified `IndexTriple` factory: ```python # src/indices/factory.py from dataclasses import dataclass @dataclass class IndexTriple: """Bundled indices that are always used together.""" vector: VectorIndex keyword: KeywordIndex graph: GraphStore def create_indices( config: Config, warm_up: bool = False, ) -> IndexTriple: """Create the standard index triple from configuration.""" embedding_model = resolve_embedding_model(config) vector = VectorIndex( embedding_model_name=embedding_model, embedding_workers=config.indexing.embedding_workers, ) if warm_up: vector.warm_up() return IndexTriple( vector=vector, keyword=KeywordIndex(), graph=GraphStore(), ) ``` **Benefit**: Single place to modify index creation, reduce duplication ~60 LOC. --- ### 2. Embedding Model Resolution (Medium Impact) **Problem**: This pattern appears everywhere: ```python embedding_model_name = config.llm.embedding_model if embedding_model_name == "local": embedding_model_name = "BAAI/bge-small-en-v1.5" ``` **Recommendation**: Add a computed property to `Config` or `LLMConfig`: ```python # src/config.py @dataclass class LLMConfig: embedding_model: str = "local" @property def resolved_embedding_model(self) -> str: """Return actual model name, resolving 'local' to default.""" if self.embedding_model == "local": return "BAAI/bge-small-en-v1.5" return self.embedding_model ``` **Benefit**: Self-documenting, single source of truth for resolution. --- ### 3. Path Resolution at Different Times (Medium Impact) **Problem**: Paths are resolved at different stages: - `resolve_index_path()` called in `ApplicationContext.create()` - `resolve_memory_path()` called in multiple places - `resolve_documents_path()` called in context creation **Recommendation**: Create a `ResolvedPaths` dataclass populated once: ```python # src/paths.py @dataclass class ResolvedPaths: index: Path documents: Path memory: Path | None snapshots: Path @classmethod def resolve(cls, config: Config, detected_project: str | None) -> ResolvedPaths: """Resolve all paths once based on configuration.""" return cls( index=resolve_index_path(config, detected_project), documents=Path(resolve_documents_path(config, detected_project)), memory=resolve_memory_path(config, detected_project) if config.memory.enabled else None, snapshots=resolve_index_path(config, detected_project) / "snapshots", ) ``` **Benefit**: All paths computed once, passed explicitly, easier to test. --- ### 4. Persist/Load Pattern Duplication (Lower Impact) **Problem**: Both `IndexManager` and `MemoryIndexManager` have similar persist/load code: ```python # IndexManager._persist_indices() self.vector.persist(index_path / "vector") self.keyword.persist(index_path / "keyword") self.graph.persist(index_path / "graph") # MemoryIndexManager.persist() self._vector.persist(indices_path / "vector") self._keyword.persist(indices_path / "keyword") self._graph.persist(indices_path / "graph") ``` **Recommendation**: If `IndexTriple` is introduced, add persist/load methods: ```python class IndexTriple: def persist(self, path: Path) -> None: self.vector.persist(path / "vector") self.keyword.persist(path / "keyword") self.graph.persist(path / "graph") def load(self, path: Path) -> None: self.vector.load(path / "vector") self.keyword.load(path / "keyword") self.graph.load(path / "graph") ``` --- ## Implementation Priority 1. **Embedding Model Resolution** - Quick win, ~5 LOC, immediate benefit 2. **IndexTriple Factory** - Medium effort, removes duplication, improves testability 3. **ResolvedPaths** - Medium effort, explicit data flow, prevents path confusion 4. **Persist/Load Unification** - Lower priority, depends on IndexTriple ## Notes - These changes would be breaking for tests that mock individual components - Consider implementing gradually, starting with embedding model resolution - Each change should include migration of existing code + tests

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/andnp/ragdocs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

simplification-opportunities.md•4.73 KiB