Markdown RAG Documentation

Overview Schema Related Servers Score Discussions

17-search-overhaul-plan.md•12.4 KiB

# Implementation Plan: Search Infrastructure Overhaul (Spec 17)

## 1. Executive Summary

This plan details the implementation of GraphRAG, Score-Aware Fusion, and HyDE features for `mcp-markdown-ragdocs`. Phase 1 introduces typed edges and Leiden community detection to GraphStore. Phase 2 replaces static RRF weights with variance-aware dynamic fusion. Phase 3 adds a `search_with_hypothesis` MCP tool for hypothesis-driven search. The upgrade is backwards-compatible; existing indices migrate on re-index.

**Total estimated LOC:** ~650 (new) + ~200 (modifications)

---

## 2. Implementation Phases

### Phase 1: Graph Upgrade (~280 LOC)

**Goal:** Transform GraphStore from untyped edges to typed relationships with community detection.

| Task | File | LOC | Description |
|------|------|-----|-------------|
| 1.1 | `src/models.py` | 15 | Add `EdgeType` enum |
| 1.2 | `src/indices/graph.py` | 80 | Add community detection, storage, and query methods |
| 1.3 | `src/parsers/markdown.py` | 60 | Infer edge types from header context |
| 1.4 | `src/indexing/manager.py` | 25 | Pass header context to `add_edge()` |
| 1.5 | `src/search/orchestrator.py` | 40 | Add community boosting to graph neighbor scoring |
| 1.6 | `src/config.py` | 20 | Add `[search.graph]` config section |
| 1.7 | `pyproject.toml` | 5 | Add `cdlib` dependency |
| 1.8 | Tests | 35 | Unit tests for edge types and community detection |

**Dependencies:** None (standalone phase)

### Phase 2: Fusion Logic (~220 LOC)

**Goal:** Replace static RRF with variance-aware dynamic fusion.

| Task | File | LOC | Description |
|------|------|-----|-------------|
| 2.1 | `src/search/fusion.py` | 80 | Add `calculate_variance()`, `normalize_strategy_scores()`, `compute_dynamic_weights()` |
| 2.2 | `src/search/orchestrator.py` | 60 | Integrate dynamic weights into query flow |
| 2.3 | `src/search/classifier.py` | 30 | Extend query classification for variance signals |
| 2.4 | `src/config.py` | 15 | Add `[search.fusion]` config for variance thresholds |
| 2.5 | Tests | 35 | Unit tests for variance calculation and dynamic weights |

**Dependencies:** None (can run parallel to Phase 1)

### Phase 3: HyDE Support (~150 LOC)

**Goal:** Add hypothesis-driven search via new MCP tool.

| Task | File | LOC | Description |
|------|------|-----|-------------|
| 3.1 | `src/mcp_server.py` | 60 | Add `search_with_hypothesis` tool registration and handler |
| 3.2 | `src/search/orchestrator.py` | 40 | Add `query_with_hypothesis()` method |
| 3.3 | `src/config.py` | 10 | Add `[search.hyde]` config section |
| 3.4 | Tests | 40 | Integration tests for HyDE tool |

**Dependencies:** Phase 2 (uses normalized fusion scores)

---

## 3. File Manifest

### Files to Create

| File | Purpose |
|------|---------|
| `tests/unit/test_edge_types.py` | Unit tests for EdgeType inference |
| `tests/unit/test_community_detection.py` | Unit tests for Leiden algorithm integration |
| `tests/unit/test_dynamic_fusion.py` | Unit tests for variance-aware weights |
| `tests/integration/test_hyde_search.py` | Integration tests for HyDE tool |

### Files to Modify

| File | Changes |
|------|---------|
| [src/models.py](../../src/models.py) | Add `EdgeType` enum |
| [src/indices/graph.py](../../src/indices/graph.py) | Add community detection, typed edge queries |
| [src/parsers/markdown.py](../../src/parsers/markdown.py) | Infer edge types from header context |
| [src/indexing/manager.py](../../src/indexing/manager.py) | Pass header context to graph edges |
| [src/search/fusion.py](../../src/search/fusion.py) | Add variance calculation, dynamic weights |
| [src/search/orchestrator.py](../../src/search/orchestrator.py) | Integrate community boost and dynamic weights |
| [src/search/classifier.py](../../src/search/classifier.py) | Extend for variance signals |
| [src/mcp_server.py](../../src/mcp_server.py) | Add `search_with_hypothesis` tool |
| [src/config.py](../../src/config.py) | Add `GraphConfig`, `FusionConfig`, `HydeConfig` |
| [pyproject.toml](../../pyproject.toml) | Add `cdlib` dependency |

---

## 4. Dependency Graph

```
Phase 1 (Graph)          Phase 2 (Fusion)
     │                        │
     │                        │
     └────────────┬───────────┘
                  │
                  ▼
           Phase 3 (HyDE)
```

- Phases 1 and 2 are **independent** and can be developed in parallel.
- Phase 3 depends on Phase 2 (uses normalized fusion scores for HyDE results).

---

## 5. Function Signatures

### Phase 1: Graph Upgrade

```python
# src/models.py
class EdgeType(Enum):
    LINKS_TO = "links_to"
    IMPLEMENTS = "implements"
    TESTS = "tests"
    RELATED = "related"

# src/indices/graph.py
class GraphStore:
    def detect_communities(self) -> dict[str, int]:
        ...

    def get_community(self, doc_id: str) -> int | None:
        ...

    def get_community_members(self, community_id: int) -> list[str]:
        ...

    def boost_by_community(
        self,
        doc_ids: list[str],
        seed_doc_ids: set[str],
        boost_factor: float = 1.1,
    ) -> dict[str, float]:
        ...

# src/parsers/markdown.py
def infer_edge_type(header_context: str, target: str) -> EdgeType:
    ...
```

### Phase 2: Fusion Logic

```python
# src/search/fusion.py
def normalize_strategy_scores(
    scores: list[float],
) -> list[float]:
    ...

def calculate_variance(scores: list[float]) -> float:
    ...

def compute_dynamic_weights(
    vector_scores: list[float],
    keyword_scores: list[float],
    base_vector_weight: float,
    base_keyword_weight: float,
    variance_threshold: float = 0.1,
) -> tuple[float, float]:
    ...

def fuse_results_v2(
    results: dict[str, list[tuple[str, float]]],
    k: int,
    base_weights: dict[str, float],
    modified_times: dict[str, float],
    use_dynamic_weights: bool = True,
) -> list[tuple[str, float]]:
    ...
```

### Phase 3: HyDE Support

```python
# src/search/orchestrator.py
class SearchOrchestrator:
    async def query_with_hypothesis(
        self,
        hypothesis: str,
        top_k: int = 10,
        top_n: int = 5,
    ) -> tuple[list[ChunkResult], CompressionStats]:
        ...

# src/mcp_server.py (tool schema)
Tool(
    name="search_with_hypothesis",
    description="Search documentation using a hypothesis about what the answer might look like...",
    inputSchema={
        "type": "object",
        "properties": {
            "hypothesis": {"type": "string", "description": "Hypothesis about the expected answer"},
            "top_n": {"type": "integer", "default": 5},
        },
        "required": ["hypothesis"],
    },
)
```

---

## 6. Acceptance Criteria

### Phase 1: Graph Upgrade

| ID | Criterion | Verification |
|----|-----------|--------------|
| P1-AC1 | `EdgeType` enum exists with 4 values | Unit test |
| P1-AC2 | `GraphStore.add_edge()` accepts `edge_type` parameter | Unit test |
| P1-AC3 | Links in `# Testing` sections create `TESTS` edges | Unit test |
| P1-AC4 | Links in `# Implementation` sections create `IMPLEMENTS` edges | Unit test |
| P1-AC5 | `detect_communities()` returns community assignments | Unit test |
| P1-AC6 | Community data persists across restarts | Integration test |
| P1-AC7 | Search results boost same-community documents by configurable factor | Integration test |

### Phase 2: Fusion Logic

| ID | Criterion | Verification |
|----|-----------|--------------|
| P2-AC1 | `normalize_strategy_scores()` returns values in [0, 1] | Unit test |
| P2-AC2 | `calculate_variance()` returns 0 for identical scores | Unit test |
| P2-AC3 | Low variance reduces weight for that strategy | Unit test |
| P2-AC4 | `fuse_results_v2()` produces same order as `fuse_results()` when variance is high | Unit test |
| P2-AC5 | Dynamic weights enabled via config flag | Unit test |
| P2-AC6 | Search quality improves on benchmark queries (manual verification) | Manual test |

### Phase 3: HyDE Support

| ID | Criterion | Verification |
|----|-----------|--------------|
| P3-AC1 | `search_with_hypothesis` tool registered in MCP | E2E test |
| P3-AC2 | Tool embeds hypothesis text (not raw query) | Integration test |
| P3-AC3 | Vague query + good hypothesis finds relevant docs | Integration test |
| P3-AC4 | Tool respects `top_n` parameter | Unit test |
| P3-AC5 | Tool returns normalized scores in [0, 1] | Unit test |

---

## 7. Risk Register

| ID | Risk | Severity | Likelihood | Mitigation |
|----|------|----------|------------|------------|
| R1 | `cdlib` adds heavy dependencies (igraph, numpy) | Medium | High | Verify package size; consider vendoring Leiden only |
| R2 | Community detection slow on large graphs | Medium | Medium | Run async during `persist()`; cache results |
| R3 | Dynamic weights degrade search quality | High | Low | Feature flag; A/B testing; fallback to static |
| R4 | HyDE increases latency (extra embedding call) | Low | High | Document tradeoff; optional tool |
| R5 | Edge type inference misclassifies links | Medium | Medium | Conservative defaults (`LINKS_TO`); user override via frontmatter |
| R6 | Backwards compatibility with existing indices | Medium | Low | Graceful migration: missing edge types default to `LINKS_TO` |

---

## 8. Testing Strategy

### Unit Tests

**Phase 1:**
```python
# tests/unit/test_edge_types.py
def test_infer_edge_type_testing_section():
    """
    Links under '# Testing' header infer TESTS edge type.
    """

def test_infer_edge_type_implementation_section():
    """
    Links under '# Implementation' header infer IMPLEMENTS edge type.
    """

def test_infer_edge_type_default():
    """
    Links without context default to LINKS_TO.
    """

# tests/unit/test_community_detection.py
def test_detect_communities_empty_graph():
    """
    Empty graph returns empty community dict.
    """

def test_detect_communities_single_cluster():
    """
    Fully connected graph forms single community.
    """

def test_community_boost_same_community():
    """
    Documents in same community receive boost factor.
    """
```

**Phase 2:**
```python
# tests/unit/test_dynamic_fusion.py
def test_normalize_strategy_scores_min_max():
    """
    Scores normalize to [0, 1] range using min-max.
    """

def test_calculate_variance_flat_scores():
    """
    Identical scores yield variance of 0.
    """

def test_dynamic_weights_low_vector_variance():
    """
    Low vector variance reduces vector weight.
    """

def test_fuse_results_v2_backwards_compatible():
    """
    New fusion produces same ranking as old when variance high.
    """
```

**Phase 3:**
```python
# tests/integration/test_hyde_search.py
async def test_search_with_hypothesis_tool_registered():
    """
    Tool appears in list_tools response.
    """

async def test_search_with_hypothesis_embeds_hypothesis():
    """
    Hypothesis text is embedded, not literal query.
    """

async def test_search_with_hypothesis_finds_relevant_docs():
    """
    Vague query with good hypothesis returns relevant results.
    """
```

### Integration Tests

- **Graph persistence:** Index documents, restart server, verify communities loaded.
- **Fusion quality:** Benchmark queries against ground truth (manual curation).
- **HyDE latency:** Measure overhead of extra embedding call.

### E2E Tests

- **MCP protocol:** Verify `search_with_hypothesis` tool callable via MCP client.
- **Full flow:** Index → Search → HyDE → Verify results.

---

## 9. Configuration Schema

```toml
[search.graph]
community_detection_enabled = true
community_boost_factor = 1.1

[search.fusion]
dynamic_weights_enabled = true
variance_threshold = 0.1
min_weight_factor = 0.5

[search.hyde]
enabled = true
```

---

## 10. Migration Notes

1. **Existing graphs:** Load without community data; run `detect_communities()` on first persist.
2. **Edge types:** Missing `edge_type` attribute defaults to `LINKS_TO`.
3. **Config:** New config sections have sensible defaults; no breaking changes.
4. **Re-indexing:** Full re-index recommended to populate edge types from header context.

---

## 11. Open Questions

1. Should community detection run synchronously during persist or as background task?
2. What header keywords map to `IMPLEMENTS` vs `RELATED`?
3. Should HyDE tool allow custom embedding models?

---

## 12. References

- [Spec 17: Search Infrastructure Overhaul](../specs/17-search-overhaul.md)
- [Spec 16: Memory Management](../specs/16-memory-management.md) (edge attributes)
- [GraphRAG Paper](https://arxiv.org/abs/2404.16130)
- [HyDE Paper](https://arxiv.org/abs/2212.10496)
- [cdlib Documentation](https://cdlib.readthedocs.io/)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/andnp/ragdocs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

17-search-overhaul-plan.md•12.4 KiB

# Implementation Plan: Search Infrastructure Overhaul (Spec 17)

## 1. Executive Summary

This plan details the implementation of GraphRAG, Score-Aware Fusion, and HyDE features for `mcp-markdown-ragdocs`. Phase 1 introduces typed edges and Leiden community detection to GraphStore. Phase 2 replaces static RRF weights with variance-aware dynamic fusion. Phase 3 adds a `search_with_hypothesis` MCP tool for hypothesis-driven search. The upgrade is backwards-compatible; existing indices migrate on re-index.

**Total estimated LOC:** ~650 (new) + ~200 (modifications)

---

## 2. Implementation Phases

### Phase 1: Graph Upgrade (~280 LOC)

**Goal:** Transform GraphStore from untyped edges to typed relationships with community detection.

| Task | File | LOC | Description |
|------|------|-----|-------------|
| 1.1 | `src/models.py` | 15 | Add `EdgeType` enum |
| 1.2 | `src/indices/graph.py` | 80 | Add community detection, storage, and query methods |
| 1.3 | `src/parsers/markdown.py` | 60 | Infer edge types from header context |
| 1.4 | `src/indexing/manager.py` | 25 | Pass header context to `add_edge()` |
| 1.5 | `src/search/orchestrator.py` | 40 | Add community boosting to graph neighbor scoring |
| 1.6 | `src/config.py` | 20 | Add `[search.graph]` config section |
| 1.7 | `pyproject.toml` | 5 | Add `cdlib` dependency |
| 1.8 | Tests | 35 | Unit tests for edge types and community detection |

**Dependencies:** None (standalone phase)

### Phase 2: Fusion Logic (~220 LOC)

**Goal:** Replace static RRF with variance-aware dynamic fusion.

| Task | File | LOC | Description |
|------|------|-----|-------------|
| 2.1 | `src/search/fusion.py` | 80 | Add `calculate_variance()`, `normalize_strategy_scores()`, `compute_dynamic_weights()` |
| 2.2 | `src/search/orchestrator.py` | 60 | Integrate dynamic weights into query flow |
| 2.3 | `src/search/classifier.py` | 30 | Extend query classification for variance signals |
| 2.4 | `src/config.py` | 15 | Add `[search.fusion]` config for variance thresholds |
| 2.5 | Tests | 35 | Unit tests for variance calculation and dynamic weights |

**Dependencies:** None (can run parallel to Phase 1)

### Phase 3: HyDE Support (~150 LOC)

**Goal:** Add hypothesis-driven search via new MCP tool.

| Task | File | LOC | Description |
|------|------|-----|-------------|
| 3.1 | `src/mcp_server.py` | 60 | Add `search_with_hypothesis` tool registration and handler |
| 3.2 | `src/search/orchestrator.py` | 40 | Add `query_with_hypothesis()` method |
| 3.3 | `src/config.py` | 10 | Add `[search.hyde]` config section |
| 3.4 | Tests | 40 | Integration tests for HyDE tool |

**Dependencies:** Phase 2 (uses normalized fusion scores)

---

## 3. File Manifest

### Files to Create

| File | Purpose |
|------|---------|
| `tests/unit/test_edge_types.py` | Unit tests for EdgeType inference |
| `tests/unit/test_community_detection.py` | Unit tests for Leiden algorithm integration |
| `tests/unit/test_dynamic_fusion.py` | Unit tests for variance-aware weights |
| `tests/integration/test_hyde_search.py` | Integration tests for HyDE tool |

### Files to Modify

| File | Changes |
|------|---------|
| [src/models.py](../../src/models.py) | Add `EdgeType` enum |
| [src/indices/graph.py](../../src/indices/graph.py) | Add community detection, typed edge queries |
| [src/parsers/markdown.py](../../src/parsers/markdown.py) | Infer edge types from header context |
| [src/indexing/manager.py](../../src/indexing/manager.py) | Pass header context to graph edges |
| [src/search/fusion.py](../../src/search/fusion.py) | Add variance calculation, dynamic weights |
| [src/search/orchestrator.py](../../src/search/orchestrator.py) | Integrate community boost and dynamic weights |
| [src/search/classifier.py](../../src/search/classifier.py) | Extend for variance signals |
| [src/mcp_server.py](../../src/mcp_server.py) | Add `search_with_hypothesis` tool |
| [src/config.py](../../src/config.py) | Add `GraphConfig`, `FusionConfig`, `HydeConfig` |
| [pyproject.toml](../../pyproject.toml) | Add `cdlib` dependency |

---

## 4. Dependency Graph

```
Phase 1 (Graph)          Phase 2 (Fusion)
     │                        │
     │                        │
     └────────────┬───────────┘
                  │
                  ▼
           Phase 3 (HyDE)
```

- Phases 1 and 2 are **independent** and can be developed in parallel.
- Phase 3 depends on Phase 2 (uses normalized fusion scores for HyDE results).

---

## 5. Function Signatures

### Phase 1: Graph Upgrade

```python
# src/models.py
class EdgeType(Enum):
    LINKS_TO = "links_to"
    IMPLEMENTS = "implements"
    TESTS = "tests"
    RELATED = "related"

# src/indices/graph.py
class GraphStore:
    def detect_communities(self) -> dict[str, int]:
        ...

    def get_community(self, doc_id: str) -> int | None:
        ...

    def get_community_members(self, community_id: int) -> list[str]:
        ...

    def boost_by_community(
        self,
        doc_ids: list[str],
        seed_doc_ids: set[str],
        boost_factor: float = 1.1,
    ) -> dict[str, float]:
        ...

# src/parsers/markdown.py
def infer_edge_type(header_context: str, target: str) -> EdgeType:
    ...
```

### Phase 2: Fusion Logic

```python
# src/search/fusion.py
def normalize_strategy_scores(
    scores: list[float],
) -> list[float]:
    ...

def calculate_variance(scores: list[float]) -> float:
    ...

def compute_dynamic_weights(
    vector_scores: list[float],
    keyword_scores: list[float],
    base_vector_weight: float,
    base_keyword_weight: float,
    variance_threshold: float = 0.1,
) -> tuple[float, float]:
    ...

def fuse_results_v2(
    results: dict[str, list[tuple[str, float]]],
    k: int,
    base_weights: dict[str, float],
    modified_times: dict[str, float],
    use_dynamic_weights: bool = True,
) -> list[tuple[str, float]]:
    ...
```

### Phase 3: HyDE Support

```python
# src/search/orchestrator.py
class SearchOrchestrator:
    async def query_with_hypothesis(
        self,
        hypothesis: str,
        top_k: int = 10,
        top_n: int = 5,
    ) -> tuple[list[ChunkResult], CompressionStats]:
        ...

# src/mcp_server.py (tool schema)
Tool(
    name="search_with_hypothesis",
    description="Search documentation using a hypothesis about what the answer might look like...",
    inputSchema={
        "type": "object",
        "properties": {
            "hypothesis": {"type": "string", "description": "Hypothesis about the expected answer"},
            "top_n": {"type": "integer", "default": 5},
        },
        "required": ["hypothesis"],
    },
)
```

---

## 6. Acceptance Criteria

### Phase 1: Graph Upgrade

| ID | Criterion | Verification |
|----|-----------|--------------|
| P1-AC1 | `EdgeType` enum exists with 4 values | Unit test |
| P1-AC2 | `GraphStore.add_edge()` accepts `edge_type` parameter | Unit test |
| P1-AC3 | Links in `# Testing` sections create `TESTS` edges | Unit test |
| P1-AC4 | Links in `# Implementation` sections create `IMPLEMENTS` edges | Unit test |
| P1-AC5 | `detect_communities()` returns community assignments | Unit test |
| P1-AC6 | Community data persists across restarts | Integration test |
| P1-AC7 | Search results boost same-community documents by configurable factor | Integration test |

### Phase 2: Fusion Logic

| ID | Criterion | Verification |
|----|-----------|--------------|
| P2-AC1 | `normalize_strategy_scores()` returns values in [0, 1] | Unit test |
| P2-AC2 | `calculate_variance()` returns 0 for identical scores | Unit test |
| P2-AC3 | Low variance reduces weight for that strategy | Unit test |
| P2-AC4 | `fuse_results_v2()` produces same order as `fuse_results()` when variance is high | Unit test |
| P2-AC5 | Dynamic weights enabled via config flag | Unit test |
| P2-AC6 | Search quality improves on benchmark queries (manual verification) | Manual test |

### Phase 3: HyDE Support

| ID | Criterion | Verification |
|----|-----------|--------------|
| P3-AC1 | `search_with_hypothesis` tool registered in MCP | E2E test |
| P3-AC2 | Tool embeds hypothesis text (not raw query) | Integration test |
| P3-AC3 | Vague query + good hypothesis finds relevant docs | Integration test |
| P3-AC4 | Tool respects `top_n` parameter | Unit test |
| P3-AC5 | Tool returns normalized scores in [0, 1] | Unit test |

---

## 7. Risk Register

| ID | Risk | Severity | Likelihood | Mitigation |
|----|------|----------|------------|------------|
| R1 | `cdlib` adds heavy dependencies (igraph, numpy) | Medium | High | Verify package size; consider vendoring Leiden only |
| R2 | Community detection slow on large graphs | Medium | Medium | Run async during `persist()`; cache results |
| R3 | Dynamic weights degrade search quality | High | Low | Feature flag; A/B testing; fallback to static |
| R4 | HyDE increases latency (extra embedding call) | Low | High | Document tradeoff; optional tool |
| R5 | Edge type inference misclassifies links | Medium | Medium | Conservative defaults (`LINKS_TO`); user override via frontmatter |
| R6 | Backwards compatibility with existing indices | Medium | Low | Graceful migration: missing edge types default to `LINKS_TO` |

---

## 8. Testing Strategy

### Unit Tests

**Phase 1:**
```python
# tests/unit/test_edge_types.py
def test_infer_edge_type_testing_section():
    """
    Links under '# Testing' header infer TESTS edge type.
    """

def test_infer_edge_type_implementation_section():
    """
    Links under '# Implementation' header infer IMPLEMENTS edge type.
    """

def test_infer_edge_type_default():
    """
    Links without context default to LINKS_TO.
    """

# tests/unit/test_community_detection.py
def test_detect_communities_empty_graph():
    """
    Empty graph returns empty community dict.
    """

def test_detect_communities_single_cluster():
    """
    Fully connected graph forms single community.
    """

def test_community_boost_same_community():
    """
    Documents in same community receive boost factor.
    """
```

**Phase 2:**
```python
# tests/unit/test_dynamic_fusion.py
def test_normalize_strategy_scores_min_max():
    """
    Scores normalize to [0, 1] range using min-max.
    """

def test_calculate_variance_flat_scores():
    """
    Identical scores yield variance of 0.
    """

def test_dynamic_weights_low_vector_variance():
    """
    Low vector variance reduces vector weight.
    """

def test_fuse_results_v2_backwards_compatible():
    """
    New fusion produces same ranking as old when variance high.
    """
```

**Phase 3:**
```python
# tests/integration/test_hyde_search.py
async def test_search_with_hypothesis_tool_registered():
    """
    Tool appears in list_tools response.
    """

async def test_search_with_hypothesis_embeds_hypothesis():
    """
    Hypothesis text is embedded, not literal query.
    """

async def test_search_with_hypothesis_finds_relevant_docs():
    """
    Vague query with good hypothesis returns relevant results.
    """
```

### Integration Tests

- **Graph persistence:** Index documents, restart server, verify communities loaded.
- **Fusion quality:** Benchmark queries against ground truth (manual curation).
- **HyDE latency:** Measure overhead of extra embedding call.

### E2E Tests

- **MCP protocol:** Verify `search_with_hypothesis` tool callable via MCP client.
- **Full flow:** Index → Search → HyDE → Verify results.

---

## 9. Configuration Schema

```toml
[search.graph]
community_detection_enabled = true
community_boost_factor = 1.1

[search.fusion]
dynamic_weights_enabled = true
variance_threshold = 0.1
min_weight_factor = 0.5

[search.hyde]
enabled = true
```

---

## 10. Migration Notes

1. **Existing graphs:** Load without community data; run `detect_communities()` on first persist.
2. **Edge types:** Missing `edge_type` attribute defaults to `LINKS_TO`.
3. **Config:** New config sections have sensible defaults; no breaking changes.
4. **Re-indexing:** Full re-index recommended to populate edge types from header context.

---

## 11. Open Questions

1. Should community detection run synchronously during persist or as background task?
2. What header keywords map to `IMPLEMENTS` vs `RELATED`?
3. Should HyDE tool allow custom embedding models?

---

## 12. References

- [Spec 17: Search Infrastructure Overhaul](../specs/17-search-overhaul.md)
- [Spec 16: Memory Management](../specs/16-memory-management.md) (edge attributes)
- [GraphRAG Paper](https://arxiv.org/abs/2404.16130)
- [HyDE Paper](https://arxiv.org/abs/2212.10496)
- [cdlib Documentation](https://cdlib.readthedocs.io/)