---
phase: 07-integration-validation
plan: 02
type: execute
wave: 2
depends_on: ["07-01"]
files_modified:
- tests/validation/test_alpha_tuning.py
- tests/validation/fixtures/tuning_results.json
autonomous: true
must_haves:
  truths:
    - "PPR alpha grid search produces MRR values for each alpha"
    - "Optimal alpha range is documented (not single value)"
    - "Default alpha values in codebase produce stable rankings"
  artifacts:
    - path: "tests/validation/test_alpha_tuning.py"
      provides: "Grid search and tuning validation"
      contains: "grid_search"
    - path: "tests/validation/fixtures/tuning_results.json"
      provides: "Documented tuning results"
      contains: "alpha_results"
  key_links:
    - from: "tests/validation/test_alpha_tuning.py"
      to: "src/skill_retriever/nodes/retrieval/ppr_engine.py"
      via: "alpha parameter"
      pattern: "alpha="
---
<objective>
Validate hyperparameter defaults and document grid search results for PPR alpha tuning.
Purpose: Prove that the adaptive alpha values (0.9 specific, 0.6 broad, 0.85 default) produce stable, near-optimal rankings and document the tuning methodology for future reference.
Output: Grid search tests that validate current defaults and produce documented results.
</objective>
<execution_context>
@C:\Users\33641\.claude/get-shit-done/workflows/execute-plan.md
@C:\Users\33641\.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/07-integration-validation/07-RESEARCH.md
@.planning/phases/07-integration-validation/07-01-PLAN.md
@src/skill_retriever/nodes/retrieval/ppr_engine.py
@src/skill_retriever/nodes/retrieval/score_fusion.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Create PPR alpha grid search tests</name>
<files>
tests/validation/test_alpha_tuning.py
</files>
<action>
Create test_alpha_tuning.py with tests that:
1. `test_alpha_grid_search`:
Run grid search over alpha values [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95] and record MRR for each.
```python
def test_alpha_grid_search(seeded_pipeline, validation_pairs, validation_qrels):
    """Grid search PPR alpha values and record results."""
    from ranx import Run, evaluate

    from skill_retriever.nodes.retrieval.ppr_engine import run_ppr_retrieval
    from skill_retriever.nodes.retrieval.score_fusion import fuse_retrieval_results
    from skill_retriever.nodes.retrieval.vector_search import search_with_type_filter

    alpha_values = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    results = {}
    for alpha in alpha_values:
        run_dict = {}
        for pair in validation_pairs:
            # Run vector search
            vector_results = search_with_type_filter(
                pair["query"],
                seeded_pipeline._vector_store,
                seeded_pipeline._graph_store,
                top_k=20,
            )
            # Run PPR with specific alpha
            ppr_results = run_ppr_retrieval(
                pair["query"],
                seeded_pipeline._graph_store,
                alpha=alpha,
                top_k=20,
            )
            # Fuse results
            fused = fuse_retrieval_results(
                vector_results,
                ppr_results,
                seeded_pipeline._graph_store,
                top_k=10,
            )
            run_dict[pair["query_id"]] = {
                c.component_id: c.score for c in fused
            }
        run = Run(run_dict)
        mrr = evaluate(validation_qrels, run, "mrr")
        results[alpha] = mrr

    # Print results table
    print("\nAlpha Grid Search Results:")
    print("Alpha | MRR")
    print("-" * 20)
    for alpha, mrr in sorted(results.items()):
        print(f"{alpha:.2f} | {mrr:.4f}")

    # Assert all alphas produce reasonable MRR (> 0.3)
    for alpha, mrr in results.items():
        assert mrr > 0.3, f"Alpha {alpha} produced MRR {mrr:.3f} < 0.3"
```
2. `test_default_alpha_optimal`:
Verify that 0.85 (default) is within 5% of the best MRR achieved.
This proves the default is near-optimal without requiring an exact match.
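A minimal sketch of this check, assuming the `run_with_alpha` helper described in Task 2 (in practice, the grid results from `test_alpha_grid_search` could be cached in a session-scoped fixture to avoid rerunning the search):
```python
def test_default_alpha_optimal(seeded_pipeline, validation_pairs, validation_qrels):
    """Default alpha (0.85) should score within 5% of the best alpha in the grid."""
    alpha_values = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    # MRR per alpha, using the same retrieval loop as test_alpha_grid_search
    results = {
        alpha: run_with_alpha(seeded_pipeline, validation_pairs, validation_qrels, alpha)
        for alpha in alpha_values
    }
    best_mrr = max(results.values())
    assert results[0.85] >= best_mrr * 0.95, (
        f"Default alpha 0.85 MRR {results[0.85]:.4f} is more than 5% below best {best_mrr:.4f}"
    )
```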
3. `test_adaptive_alpha_categories`:
Test that adaptive alpha selection (specific vs broad) works correctly:
- Specific queries (single named entity): alpha should be ~0.9
- Broad queries (5+ concepts): alpha should be ~0.6
- Normal queries: alpha should be ~0.85
Use plan_retrieval() to verify alpha selection:
```python
from skill_retriever.nodes.retrieval.query_planner import plan_retrieval, extract_query_entities


def test_adaptive_alpha_categories(seeded_pipeline):
    # Specific query
    specific = "skill-jwt authentication"
    entities = extract_query_entities(specific, seeded_pipeline._graph_store)
    plan = plan_retrieval(specific, len(entities))
    assert plan.ppr_alpha >= 0.85, "Specific query should have high alpha"

    # Broad query
    broad = "JWT OAuth login refresh session authentication security"
    entities_broad = extract_query_entities(broad, seeded_pipeline._graph_store)
    plan_broad = plan_retrieval(broad, len(entities_broad))
    # Alpha depends on entity count; verify the selection logic stays in range
    assert 0.5 <= plan_broad.ppr_alpha <= 0.95
```
4. `test_rrf_k_sensitivity`:
Quick check that RRF k=60 produces good results by comparing with k=30 and k=100.
Assert k=60 is within 10% of best.
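A minimal sketch, assuming the `run_with_rrf_k` helper referenced in Task 2 (its exact shape depends on how score_fusion exposes the RRF constant):
```python
def test_rrf_k_sensitivity(seeded_pipeline, validation_pairs, validation_qrels):
    """RRF k=60 should score within 10% of the best k tried."""
    # MRR per RRF k value, computed by the Task 2 helper
    results = {
        k: run_with_rrf_k(seeded_pipeline, validation_pairs, validation_qrels, k)
        for k in [30, 60, 100]
    }
    best_mrr = max(results.values())
    assert results[60] >= best_mrr * 0.9, (
        f"RRF k=60 MRR {results[60]:.4f} is more than 10% below best {best_mrr:.4f}"
    )
```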
</action>
<verify>
Run: `uv run pytest tests/validation/test_alpha_tuning.py -v -s`
Grid search results printed, all tests pass.
</verify>
<done>
- Alpha grid search runs and prints results table
- Default alpha (0.85) validated as near-optimal
- Adaptive alpha logic tested for specific vs broad queries
- RRF k=60 validated against alternatives
</done>
</task>
<task type="auto">
<name>Task 2: Document tuning results in JSON fixture</name>
<files>
tests/validation/fixtures/tuning_results.json
</files>
<action>
Create a test that generates and saves tuning_results.json with grid search outputs.
Add to test_alpha_tuning.py:
```python
import json
from datetime import date
from pathlib import Path


def test_save_tuning_results(seeded_pipeline, validation_pairs, validation_qrels):
    """Run full grid search and save results for documentation."""
    results = {
        "alpha_results": {},
        "rrf_k_results": {},
        "optimal_ranges": {},
        "metadata": {
            "validation_pairs_count": len(validation_pairs),
            # Record when the fixture was generated
            "test_date": date.today().isoformat(),
        },
    }

    # Alpha grid search
    alpha_values = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    for alpha in alpha_values:
        # ... (same logic as test_alpha_grid_search)
        mrr = run_with_alpha(seeded_pipeline, validation_pairs, validation_qrels, alpha)
        results["alpha_results"][str(alpha)] = round(mrr, 4)

    # Find optimal range
    best_alpha = max(results["alpha_results"].items(), key=lambda x: x[1])
    best_mrr = best_alpha[1]
    optimal_alphas = [
        a for a, m in results["alpha_results"].items()
        if m >= best_mrr * 0.95  # Within 5% of best
    ]
    results["optimal_ranges"]["alpha"] = {
        "best": float(best_alpha[0]),
        "best_mrr": best_mrr,
        "near_optimal": [float(a) for a in optimal_alphas],
    }

    # RRF k sensitivity
    for k in [30, 60, 100]:
        mrr = run_with_rrf_k(seeded_pipeline, validation_pairs, validation_qrels, k)
        results["rrf_k_results"][str(k)] = round(mrr, 4)

    # Save results
    output_path = Path(__file__).parent / "fixtures" / "tuning_results.json"
    output_path.write_text(json.dumps(results, indent=2))
    print(f"\nTuning results saved to {output_path}")
    print(json.dumps(results, indent=2))
```
Helper function to run retrieval with a specific alpha:
```python
def run_with_alpha(pipeline, pairs, qrels, alpha):
    from ranx import Run, evaluate
    # ... implementation
    return evaluate(qrels, run, "mrr")
```
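A fuller sketch of `run_with_alpha`, reusing the retrieval loop from `test_alpha_grid_search`; the `run_with_rrf_k` helper would follow the same shape but vary the fusion constant instead of alpha, and how that constant is passed depends on the score_fusion API:
```python
def run_with_alpha(pipeline, pairs, qrels, alpha):
    """Compute MRR over the validation pairs for a single PPR alpha value."""
    from ranx import Run, evaluate

    from skill_retriever.nodes.retrieval.ppr_engine import run_ppr_retrieval
    from skill_retriever.nodes.retrieval.score_fusion import fuse_retrieval_results
    from skill_retriever.nodes.retrieval.vector_search import search_with_type_filter

    run_dict = {}
    for pair in pairs:
        # Same vector search + PPR + fusion pipeline as test_alpha_grid_search
        vector_results = search_with_type_filter(
            pair["query"], pipeline._vector_store, pipeline._graph_store, top_k=20
        )
        ppr_results = run_ppr_retrieval(
            pair["query"], pipeline._graph_store, alpha=alpha, top_k=20
        )
        fused = fuse_retrieval_results(
            vector_results, ppr_results, pipeline._graph_store, top_k=10
        )
        run_dict[pair["query_id"]] = {c.component_id: c.score for c in fused}
    return evaluate(qrels, Run(run_dict), "mrr")
```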
</action>
<verify>
Run: `uv run pytest tests/validation/test_alpha_tuning.py::test_save_tuning_results -v -s`
File created at tests/validation/fixtures/tuning_results.json
Contents show alpha and RRF k results with optimal ranges.
</verify>
<done>
- tuning_results.json contains alpha_results, rrf_k_results, optimal_ranges
- Best alpha and near-optimal range documented
- Metadata includes test date and pair count
</done>
</task>
</tasks>
<verification>
All tuning tests pass:
```bash
uv run pytest tests/validation/test_alpha_tuning.py -v -s
```
Tuning results documented:
```bash
cat tests/validation/fixtures/tuning_results.json
```
</verification>
<success_criteria>
- [ ] Alpha grid search executes over 7 values
- [ ] Default alpha (0.85) within 5% of best MRR
- [ ] Adaptive alpha logic validated for query types
- [ ] RRF k=60 validated as near-optimal
- [ ] tuning_results.json documents all findings
</success_criteria>
<output>
After completion, create `.planning/phases/07-integration-validation/07-02-SUMMARY.md`
</output>