Zotero Chunk RAG

phase-2-combination-engine-rewrite.md•11.2 KiB

# Phase 2: Combination Engine Rewrite ## Overview Replace column-count-based grouping with per-divider confidence voting. The core change: a boundary that many methods independently find (high method agreement) is more trustworthy than one only a single method reports. No grouping by column count — all boundary points from all methods are pooled and voted on independently. Confidence multipliers from `pipeline_weights.json` scale method influence. Ruled line boundaries get unconditional acceptance (physical features, not statistical). **Depends on**: Phase 0 (stub removed here) **Parallel with**: Phase 1 Wave 1.1 **Must complete before**: Phase 1 Wave 1.2 **Entry state**: `combine_hypotheses()` raises `NotImplementedError` for 2+ hypotheses. **Exit state**: Per-divider voting produces consensus boundaries. Pipeline fully functional. --- ## Wave 2.1: Core per-divider voting ### Task 2.1.1: Rewrite `combine_hypotheses()` with per-divider voting - **Description**: Replace the `NotImplementedError` stub in the multi-hypothesis path of `combine_hypotheses()` with the per-divider voting algorithm. **Algorithm** (multi-hypothesis path): 1. Compute spatial precision via existing `_compute_spatial_precision(ctx)` 2. Collect ALL col boundary points and ALL row boundary points from ALL hypotheses (no column-count grouping) 3. Scale each point's confidence by its method's multiplier: `scaled_confidence = point.confidence * multipliers.get(point.provenance, 1.0)` (create new BoundaryPoint with scaled confidence, preserving provenance) 4. Pass scaled col points and row points to `_combine_axis()` (independently) 5. Build consensus BoundaryHypothesis from accepted boundaries **Changes to `_combine_axis()`** — acceptance logic only: - After clustering (steps 1-4 unchanged): compute `distinct_methods` per cluster (already done via provenance) - Compute `median_method_count = statistics.median([cluster.distinct_methods for each cluster])` - Accept cluster if `distinct_methods >= median_method_count` - **Ruled line override**: accept cluster if ANY point in it has `provenance == "ruled_lines"`, regardless of method count - Remove the Q1-percentile confidence threshold (replaced by method count) - **Consensus boundary confidence**: each accepted boundary's confidence is the mean of its constituent points' scaled confidences (not the sum). This prevents methods with many boundary points from dominating through volume. Add helper `_scale_point(point: BoundaryPoint, multipliers: dict[str, float] | None) -> BoundaryPoint`: returns the point unchanged if multipliers is None or the method's multiplier is 1.0; otherwise returns a new BoundaryPoint with `confidence * multiplier`. **Signature change**: `combine_hypotheses(hypotheses, ctx, *, confidence_multipliers=None, trace=False)`. New `confidence_multipliers: dict[str, float] | None` parameter. When None, no scaling is applied (all multipliers default to 1.0). - **Files to modify**: - `src/zotero_chunk_rag/feature_extraction/combination.py`: - Replace the `NotImplementedError` stub in the multi-hypothesis path - Add `confidence_multipliers` parameter to `combine_hypotheses()` signature - Add `_scale_point()` helper - In `_combine_axis()` step 5: replace Q1-confidence acceptance with median-method-count acceptance + ruled line override - Remove the docstring references to "column count agreement grouping" - Update module docstring to describe per-divider voting - **Tests** (update existing + add new): - The 8 tests broken by Phase 0's `NotImplementedError` stub should now pass again with the new per-divider voting implementation. - Update `tests/test_combination.py::TestCombineHypotheses::test_two_methods_agree` — 2 hypotheses with overlapping boundaries at ~145. Both merge into 1 cluster with method_count=2, median=2, accepted. Assert 1 col boundary. **Unchanged assertion.** - Update `tests/test_combination.py::TestCombineHypotheses::test_three_methods_one_disagrees` — 3 hypotheses: 2 at ~145, 1 at ~200. Clusters: ~145 (method_count=2), ~200 (method_count=1). Median=1.5. Only count>=2 passes. Assert 1 col boundary. **Unchanged assertion.** - Update `tests/test_combination.py::TestCombineHypotheses::test_row_and_col_independent` — 2 hypotheses with different col/row positions. Each position has method_count=1, median=1, all pass. Assert 2 col + 2 row boundaries. **Unchanged assertion.** - Update `tests/test_combination.py::TestAcceptanceThreshold::test_low_confidence_rejected` — Under new algorithm: cluster at ~145 has method_count=3, cluster at ~500 has method_count=1. Median=2. Count 1 < 2, rejected. Assert 1 col boundary. **Unchanged assertion.** - **CHANGE** `tests/test_combination.py::TestAcceptanceThreshold::test_percentile_based_acceptance` — Under old algorithm, weakest-confidence cluster was rejected (4 boundaries). Under new algorithm, ALL 5 clusters have method_count=2 (both methods contribute to each), median=2, all pass. **Update assertion to `len(result.col_boundaries) == 5`**. Rename test to `test_equal_method_agreement_all_pass`. - Update `tests/test_combination.py::TestAcceptanceThreshold::test_single_cluster_always_accepted` — 2 hypotheses merge into 1 cluster, method_count=2, median=2, accepted. **Unchanged assertion.** - Update `tests/test_combination.py::TestCombinationTrace::test_multi_hypothesis_trace_structure` — Check that `ClusterRecord` now has `method_names` and `acceptance_reason` fields. Assert `cluster.method_names` is a non-empty list. Assert `cluster.acceptance_reason` is one of `"above_threshold"`, `"ruled_line_override"`, `"rejected"`. - Update `tests/test_combination.py::TestCombinationTrace::test_axis_trace_input_points_match` — All hypotheses' points are now in the pool (no column-count filtering). Assert input_provenances contains all methods. **Unchanged assertion** (both methods were in the winning group before too). - **NEW** `tests/test_combination.py::TestMethodCountRejection::test_lone_divider_rejected` — 3 methods: m1 and m2 agree on divider at ~150, m1 and m3 agree on divider at ~300, m2 alone adds divider at ~500. Clusters: ~150 (count=2), ~300 (count=2), ~500 (count=1). Median=2. Assert ~500 rejected, 2 boundaries accepted. - **NEW** `tests/test_combination.py::TestMethodCountRejection::test_single_method_all_pass` — 1 method with 3 col boundaries. Passthrough (single hypothesis), all accepted. - **NEW** `tests/test_combination.py::TestConfidenceMultipliers::test_multipliers_scale_confidence` — 2 hypotheses at same position. Method "m1" has multiplier 2.0, "m2" has 0.5. After combination, the consensus boundary's confidence is the mean of scaled confidences: (0.9\*2.0 + 0.9\*0.5) / 2 = 1.125. Assert `result.col_boundaries[0].confidence == pytest.approx(1.125)`. - **NEW** `tests/test_combination.py::TestConfidenceMultipliers::test_no_multipliers_default` — Call without `confidence_multipliers` parameter. Behavior unchanged from unscaled. - **NEW** `tests/test_combination.py::TestRuledLineOverride::test_ruled_line_always_accepted` — 3 methods: 2 agree at ~150 (method_count=2), 1 ruled_lines point at ~300 (method_count=1). Median=1.5. Without override, ~300 rejected. With override, ~300 accepted because it has ruled_lines provenance. Assert 2 boundaries. - **NEW** `tests/test_combination.py::TestRuledLineOverride::test_non_ruled_line_still_rejected` — Same setup but the lone point at ~300 has provenance "hotspot" instead of "ruled_lines". Assert 1 boundary (only ~150 accepted). - **Acceptance criteria**: - `combine_hypotheses()` no longer raises `NotImplementedError` - No column-count grouping in the code - All boundary points from all hypotheses are pooled (no filtering by group) - Acceptance based on median method_count, not Q1 confidence percentile - Ruled line override works: `provenance == "ruled_lines"` always accepted - Confidence multipliers scale point confidences before combination - All existing tests pass (with updated assertions where noted) - All new tests pass - All 8 tests broken by Phase 0 now pass ### Task 2.1.2: Pass confidence multipliers from Pipeline - **Description**: Update `Pipeline.extract()` to pass `self._config.confidence_multipliers` to `combine_hypotheses()` at the call site. - **Files to modify**: - `src/zotero_chunk_rag/feature_extraction/pipeline.py`: - In `Pipeline.extract()`, line ~270 where `combine_hypotheses()` is called: change from `combine_hypotheses(result.boundary_hypotheses, ctx)` to `combine_hypotheses(result.boundary_hypotheses, ctx, confidence_multipliers=self._config.confidence_multipliers)` - **Tests**: - `tests/test_feature_extraction/test_integration.py::TestExtractAllGrids::*` — all existing integration tests should still pass (Pipeline now passes multipliers, but behavior is compatible) - **Acceptance criteria**: - `Pipeline.extract()` passes `confidence_multipliers` to `combine_hypotheses()` - Existing integration tests pass - `pipeline_weights.json` multipliers are actually used during combination --- ## Wave 2.2: Trace infrastructure updates ### Task 2.2.1: Update trace models for per-divider voting - **Description**: Update `ClusterRecord` and `AxisTrace` in `models.py` to reflect the new algorithm's diagnostic data. - **Files to modify**: - `src/zotero_chunk_rag/feature_extraction/models.py`: - `ClusterRecord`: add `method_names: list[str]` (names of distinct methods in this cluster) and `acceptance_reason: str` (one of `"above_threshold"`, `"ruled_line_override"`, `"rejected"`) - `AxisTrace`: add `median_method_count: float` (the median used as threshold) - `src/zotero_chunk_rag/feature_extraction/combination.py`: - In `_combine_axis()`: populate `method_names` and `acceptance_reason` on each `ClusterRecord` when `collect_trace=True` - Populate `median_method_count` on `AxisTrace` - In `_make_passthrough_trace()`: set `method_names` and `acceptance_reason` appropriately for passthrough clusters - **Tests**: - `tests/test_combination.py::TestCombinationTrace::test_multi_hypothesis_trace_structure` — already updated in Task 2.1.1 to check new fields - `tests/test_combination.py::TestCombinationTrace::test_trace_cluster_has_method_names` — (new) Pass 2 hypotheses with trace=True. Assert each ClusterRecord in `trace.col_trace.clusters` has a non-empty `method_names` list. - `tests/test_combination.py::TestCombinationTrace::test_trace_has_median_method_count` — (new) Pass 2 hypotheses with trace=True. Assert `trace.col_trace.median_method_count > 0`. - `tests/test_combination.py::TestCombinationTrace::test_acceptance_reason_values` — (new) Pass 3 hypotheses where one cluster is rejected. Assert at least one cluster has `acceptance_reason == "rejected"` and at least one has `acceptance_reason == "above_threshold"`. - **Acceptance criteria**: - `ClusterRecord` has `method_names` and `acceptance_reason` fields - `AxisTrace` has `median_method_count` field - Passthrough traces have appropriate values for new fields - All trace tests pass

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ccam80/zotero-chunk-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

phase-2-combination-engine-rewrite.md•11.2 KiB