---
phase: 07-integration-validation
plan: 02
type: execute
wave: 2
depends_on: ["07-01"]
files_modified:
- tests/validation/test_alpha_tuning.py
- tests/validation/fixtures/tuning_results.json
autonomous: true
must_haves:
  truths:
    - "PPR alpha grid search produces MRR values for each alpha"
    - "Optimal alpha range is documented (not single value)"
    - "Default alpha values in codebase produce stable rankings"
  artifacts:
    - path: "tests/validation/test_alpha_tuning.py"
      provides: "Grid search and tuning validation"
      contains: "grid_search"
    - path: "tests/validation/fixtures/tuning_results.json"
      provides: "Documented tuning results"
      contains: "alpha_results"
  key_links:
    - from: "tests/validation/test_alpha_tuning.py"
      to: "src/skill_retriever/nodes/retrieval/ppr_engine.py"
      via: "alpha parameter"
      pattern: "alpha="
---
<objective>
Validate hyperparameter defaults and document grid search results for PPR alpha tuning.
Purpose: Prove that the adaptive alpha values (0.9 specific, 0.6 broad, 0.85 default) produce stable, near-optimal rankings and document the tuning methodology for future reference.
Output: Grid search tests that validate current defaults and produce documented results.
</objective>
<execution_context>
@C:\Users\33641\.claude/get-shit-done/workflows/execute-plan.md
@C:\Users\33641\.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/07-integration-validation/07-RESEARCH.md
@.planning/phases/07-integration-validation/07-01-PLAN.md
@src/skill_retriever/nodes/retrieval/ppr_engine.py
@src/skill_retriever/nodes/retrieval/score_fusion.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Create PPR alpha grid search tests</name>
<files>
tests/validation/test_alpha_tuning.py
</files>
<action>
Create test_alpha_tuning.py with tests that:
1. `test_alpha_grid_search`:
Run grid search over alpha values [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95] and record MRR for each.
```python
def test_alpha_grid_search(seeded_pipeline, validation_pairs, validation_qrels):
    """Grid search PPR alpha values and record results."""
    from ranx import Run, evaluate

    from skill_retriever.nodes.retrieval.ppr_engine import run_ppr_retrieval
    from skill_retriever.nodes.retrieval.score_fusion import fuse_retrieval_results
    from skill_retriever.nodes.retrieval.vector_search import search_with_type_filter

    alpha_values = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    results = {}
    for alpha in alpha_values:
        run_dict = {}
        for pair in validation_pairs:
            # Run vector search
            vector_results = search_with_type_filter(
                pair["query"],
                seeded_pipeline._vector_store,
                seeded_pipeline._graph_store,
                top_k=20,
            )
            # Run PPR with specific alpha
            ppr_results = run_ppr_retrieval(
                pair["query"],
                seeded_pipeline._graph_store,
                alpha=alpha,
                top_k=20,
            )
            # Fuse results
            fused = fuse_retrieval_results(
                vector_results,
                ppr_results,
                seeded_pipeline._graph_store,
                top_k=10,
            )
            run_dict[pair["query_id"]] = {
                c.component_id: c.score for c in fused
            }
        run = Run(run_dict)
        mrr = evaluate(validation_qrels, run, "mrr")
        results[alpha] = mrr

    # Print results table
    print("\nAlpha Grid Search Results:")
    print("Alpha | MRR")
    print("-" * 20)
    for alpha, mrr in sorted(results.items()):
        print(f"{alpha:.2f} | {mrr:.4f}")

    # Assert all alphas produce reasonable MRR (> 0.3)
    for alpha, mrr in results.items():
        assert mrr > 0.3, f"Alpha {alpha} produced MRR {mrr:.3f} < 0.3"
```
2. `test_default_alpha_optimal`:
Verify that 0.85 (default) is within 5% of the best MRR achieved.
This proves the default is near-optimal without requiring an exact match.
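A minimal sketch of this check, assuming the `run_with_alpha` helper described in Task 2 (in practice, the grid results from `test_alpha_grid_search` could be cached in a session-scoped fixture to avoid rerunning the search):
```python
def test_default_alpha_optimal(seeded_pipeline, validation_pairs, validation_qrels):
    """Default alpha (0.85) should score within 5% of the best alpha in the grid."""
    alpha_values = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    # MRR per alpha, using the same retrieval loop as test_alpha_grid_search
    results = {
        alpha: run_with_alpha(seeded_pipeline, validation_pairs, validation_qrels, alpha)
        for alpha in alpha_values
    }
    best_mrr = max(results.values())
    assert results[0.85] >= best_mrr * 0.95, (
        f"Default alpha 0.85 MRR {results[0.85]:.4f} is more than 5% below best {best_mrr:.4f}"
    )
```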
3. `test_adaptive_alpha_categories`:
Test that adaptive alpha selection (specific vs broad) works correctly:
- Specific queries (single named entity): alpha should be ~0.9
- Broad queries (5+ concepts): alpha should be ~0.6
- Normal queries: alpha should be ~0.85
Use plan_retrieval() to verify alpha selection:
```python
from skill_retriever.nodes.retrieval.query_planner import plan_retrieval, extract_query_entities


def test_adaptive_alpha_categories(seeded_pipeline):
    # Specific query
    specific = "skill-jwt authentication"
    entities = extract_query_entities(specific, seeded_pipeline._graph_store)
    plan = plan_retrieval(specific, len(entities))
    assert plan.ppr_alpha >= 0.85, "Specific query should have high alpha"

    # Broad query
    broad = "JWT OAuth login refresh session authentication security"
    entities_broad = extract_query_entities(broad, seeded_pipeline._graph_store)
    plan_broad = plan_retrieval(broad, len(entities_broad))
    # Alpha depends on entity count; verify the selection logic stays in range
    assert 0.5 <= plan_broad.ppr_alpha <= 0.95
```
4. `test_rrf_k_sensitivity`:
Quick check that RRF k=60 produces good results by comparing with k=30 and k=100.
Assert k=60 is within 10% of best.
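A minimal sketch, assuming the `run_with_rrf_k` helper referenced in Task 2 (its exact shape depends on how score_fusion exposes the RRF constant):
```python
def test_rrf_k_sensitivity(seeded_pipeline, validation_pairs, validation_qrels):
    """RRF k=60 should score within 10% of the best k tried."""
    # MRR per RRF k value, computed by the Task 2 helper
    results = {
        k: run_with_rrf_k(seeded_pipeline, validation_pairs, validation_qrels, k)
        for k in [30, 60, 100]
    }
    best_mrr = max(results.values())
    assert results[60] >= best_mrr * 0.9, (
        f"RRF k=60 MRR {results[60]:.4f} is more than 10% below best {best_mrr:.4f}"
    )
```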
</action>
<verify>
Run: `uv run pytest tests/validation/test_alpha_tuning.py -v -s`
Grid search results printed, all tests pass.
</verify>
<done>
- Alpha grid search runs and prints results table
- Default alpha (0.85) validated as near-optimal
- Adaptive alpha logic tested for specific vs broad queries
- RRF k=60 validated against alternatives
</done>
</task>
<task type="auto">
<name>Task 2: Document tuning results in JSON fixture</name>
<files>
tests/validation/fixtures/tuning_results.json
</files>
<action>
Create a test that generates and saves tuning_results.json with grid search outputs.
Add to test_alpha_tuning.py:
```python
import json
from datetime import date
from pathlib import Path


def test_save_tuning_results(seeded_pipeline, validation_pairs, validation_qrels):
    """Run full grid search and save results for documentation."""
    results = {
        "alpha_results": {},
        "rrf_k_results": {},
        "optimal_ranges": {},
        "metadata": {
            "validation_pairs_count": len(validation_pairs),
            # Record when the fixture was generated
            "test_date": date.today().isoformat(),
        },
    }

    # Alpha grid search
    alpha_values = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    for alpha in alpha_values:
        # ... (same logic as test_alpha_grid_search)
        mrr = run_with_alpha(seeded_pipeline, validation_pairs, validation_qrels, alpha)
        results["alpha_results"][str(alpha)] = round(mrr, 4)

    # Find optimal range
    best_alpha = max(results["alpha_results"].items(), key=lambda x: x[1])
    best_mrr = best_alpha[1]
    optimal_alphas = [
        a for a, m in results["alpha_results"].items()
        if m >= best_mrr * 0.95  # Within 5% of best
    ]
    results["optimal_ranges"]["alpha"] = {
        "best": float(best_alpha[0]),
        "best_mrr": best_mrr,
        "near_optimal": [float(a) for a in optimal_alphas],
    }

    # RRF k sensitivity
    for k in [30, 60, 100]:
        mrr = run_with_rrf_k(seeded_pipeline, validation_pairs, validation_qrels, k)
        results["rrf_k_results"][str(k)] = round(mrr, 4)

    # Save results
    output_path = Path(__file__).parent / "fixtures" / "tuning_results.json"
    output_path.write_text(json.dumps(results, indent=2))
    print(f"\nTuning results saved to {output_path}")
    print(json.dumps(results, indent=2))
```
Helper function to run retrieval with a specific alpha:
```python
def run_with_alpha(pipeline, pairs, qrels, alpha):
    from ranx import Run, evaluate
    # ... implementation
    return evaluate(qrels, run, "mrr")
```
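A fuller sketch of `run_with_alpha`, reusing the retrieval loop from `test_alpha_grid_search`; the `run_with_rrf_k` helper would follow the same shape but vary the fusion constant instead of alpha, and how that constant is passed depends on the score_fusion API:
```python
def run_with_alpha(pipeline, pairs, qrels, alpha):
    """Compute MRR over the validation pairs for a single PPR alpha value."""
    from ranx import Run, evaluate

    from skill_retriever.nodes.retrieval.ppr_engine import run_ppr_retrieval
    from skill_retriever.nodes.retrieval.score_fusion import fuse_retrieval_results
    from skill_retriever.nodes.retrieval.vector_search import search_with_type_filter

    run_dict = {}
    for pair in pairs:
        # Same vector search + PPR + fusion pipeline as test_alpha_grid_search
        vector_results = search_with_type_filter(
            pair["query"], pipeline._vector_store, pipeline._graph_store, top_k=20
        )
        ppr_results = run_ppr_retrieval(
            pair["query"], pipeline._graph_store, alpha=alpha, top_k=20
        )
        fused = fuse_retrieval_results(
            vector_results, ppr_results, pipeline._graph_store, top_k=10
        )
        run_dict[pair["query_id"]] = {c.component_id: c.score for c in fused}
    return evaluate(qrels, Run(run_dict), "mrr")
```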
</action>
<verify>
Run: `uv run pytest tests/validation/test_alpha_tuning.py::test_save_tuning_results -v -s`
File created at tests/validation/fixtures/tuning_results.json
Contents show alpha and RRF k results with optimal ranges.
</verify>
<done>
- tuning_results.json contains alpha_results, rrf_k_results, optimal_ranges
- Best alpha and near-optimal range documented
- Metadata includes test date and pair count
</done>
</task>
</tasks>
<verification>
All tuning tests pass:
```bash
uv run pytest tests/validation/test_alpha_tuning.py -v -s
```
Tuning results documented:
```bash
cat tests/validation/fixtures/tuning_results.json
```
</verification>
<success_criteria>
- [ ] Alpha grid search executes over 7 values
- [ ] Default alpha (0.85) within 5% of best MRR
- [ ] Adaptive alpha logic validated for query types
- [ ] RRF k=60 validated as near-optimal
- [ ] tuning_results.json documents all findings
</success_criteria>
<output>
After completion, create `.planning/phases/07-integration-validation/07-02-SUMMARY.md`
</output>