Thoughtbox

selection_algorithm.md•2.84 KiB

# Selection Algorithm: Candidate → Final 2–3

This is the critical “anti-slop” step. You get better results if your synthesizer produces more candidates than you publish, then you filter hard.

## Recommended Flow (works well in practice)

1. **Generate 5–8 candidate proposals**
   Have the synthesizer produce 5–8 candidates (still in schema shape), not just 2–3.
2. **Run the evaluator rubric**
   Score and gate each candidate.
3. **Select top 2–3 using constraints**
   1. Keep only `pass == true`.
   2. Sort by `score_total` descending.
   3. Apply a diversity constraint so you don’t get 3 variations of the same idea:
      - Max 1 proposal per category unless scores are very high (≥90).
      - Or require distinct primary subsystem touch points.
4. **If fewer than 2 pass**
   - Try one repair pass:
     - Feed failing candidates plus evaluator `fix_suggestions` to a repair prompt.
     - Re-evaluate.
   - If still <2 pass → publish fewer proposals with `blocking_reason`.
5. **Publish final 2–3**
   Only the final set appears in the GitHub issue.

---

## Implementation Notes That Prevent “Rubric Theater”

### Deterministic checks first (cheap + reliable)

Before calling any evaluator model, run a pure code validation:

- Evidence URLs exist and are a subset of the signals.
- Touch points are non-empty and match `repo_map` patterns.
- Test plan contains at least one unit-ish line and one integration-ish line (simple keyword heuristics: unit, integration, scenario, harness, e2e).
- Acceptance criteria count ≥ 2.
- Rollout and rollback are non-empty.

This catches most slop without spending tokens.

### Make the evaluator “fail closed”

If the evaluator returns invalid JSON or is missing fields:

- Re-run once.
- If still invalid → fall back to deterministic-only scoring and select conservatively, or publish `blocking_reason`.

### Add “reputation penalties”

If your system repeatedly emits proposals you reject manually, add a lightweight memory:

- Store rejected proposals and their failure reasons.
- Include a small “do-not-repeat patterns” section in the synthesizer prompt.

---

## Optional: Quick Scoring Math (for code)

If you implement weighted scoring:

- `specificity_mechanism` (0–5) → ×5
- `evidence_quality` (0–5) → ×3
- `testability_evaluation` (0–5) → ×4
- `impact_leverage` (0–5) → ×4
- `scope_risk` (0–5) → ×2
- `feasibility` (0–5) → ×2

Then:

```text
score_total = specificity*5 + evidence*3 + testability*4 + impact*4 + scope*2 + feasibility*2
```

Max = 25 + 15 + 20 + 20 + 10 + 10 = 100.

## What This Buys You

- The daily issue becomes consistently reviewable: each proposal is evidence-linked, repo-grounded, and testable.
- You reduce proposal noise.
- You create a natural bridge into LangSmith eval dashboards: synthesizer run → evaluator run → final selection.
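
As an illustration of the selection step and the scoring math described in this document, here is a minimal Python sketch. The weights and the ≥90 threshold come from the text; the `Candidate` class, its field names (`title`, `category`, `passed`), and the `select_final` helper are hypothetical names chosen for this example, not part of any existing schema. It implements only the "max 1 proposal per category" variant of the diversity constraint.

```python
from dataclasses import dataclass

# Weights from the scoring section; each rubric dimension is scored 0-5.
WEIGHTS = {
    "specificity_mechanism": 5,
    "evidence_quality": 3,
    "testability_evaluation": 4,
    "impact_leverage": 4,
    "scope_risk": 2,
    "feasibility": 2,
}


@dataclass
class Candidate:
    title: str
    category: str   # hypothetical field used for the diversity constraint
    scores: dict    # rubric dimension name -> integer 0..5
    passed: bool    # mirrors the evaluator's `pass` gate


def score_total(scores):
    """Weighted sum of the rubric scores; the maximum is 100."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


def select_final(candidates, max_final=3, high_score=90):
    """Keep passing candidates, rank by score, then enforce category diversity."""
    passing = [c for c in candidates if c.passed]
    ranked = sorted(passing, key=lambda c: score_total(c.scores), reverse=True)
    chosen, seen_categories = [], set()
    for cand in ranked:
        if len(chosen) == max_final:
            break
        # A second proposal in the same category is allowed only at >= high_score.
        if cand.category in seen_categories and score_total(cand.scores) < high_score:
            continue
        chosen.append(cand)
        seen_categories.add(cand.category)
    return chosen


def uniform(value):
    """Helper for the demo: the same score on every rubric dimension."""
    return {name: value for name in WEIGHTS}


if __name__ == "__main__":
    pool = [
        Candidate("A", "cache", uniform(5), True),
        Candidate("B", "cache", uniform(4), True),   # duplicate category, score 80
        Candidate("C", "auth", uniform(3), True),
        Candidate("D", "docs", uniform(5), False),   # fails the gate
    ]
    print([c.title for c in select_final(pool)])  # prints ['A', 'C']
```

If `select_final` returns fewer than two proposals, that is the point at which the repair pass or the `blocking_reason` path from step 4 would apply; that branch is left out of the sketch.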

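
The deterministic checks listed under "Deterministic checks first" could be sketched roughly as follows. This is an illustration under stated assumptions: the proposal is taken to be a plain dict, and the keys `evidence_urls`, `touch_points`, `test_plan`, `acceptance_criteria`, `rollout`, and `rollback` are hypothetical names, as is the choice to treat `repo_map` patterns as shell-style globs. Splitting the keywords into a "unit" group and an "integration" group is also one possible reading of the heuristic.

```python
from fnmatch import fnmatch

# Assumed split of the keyword heuristics into unit-ish and integration-ish groups.
UNIT_KEYWORDS = ("unit",)
INTEGRATION_KEYWORDS = ("integration", "scenario", "harness", "e2e")


def _any_line_has(lines, keywords):
    """True if any line contains any of the keywords (case-insensitive)."""
    return any(k in line.lower() for line in lines for k in keywords)


def deterministic_failures(proposal, signal_urls, repo_map_patterns):
    """Return the names of failed checks; an empty list means the proposal passes."""
    failures = []

    # Evidence URLs exist and are a subset of the collected signals.
    urls = proposal.get("evidence_urls", [])
    if not urls or not set(urls) <= set(signal_urls):
        failures.append("evidence")

    # Touch points are non-empty and each matches at least one repo_map pattern.
    touch = proposal.get("touch_points", [])
    if not touch or not all(
        any(fnmatch(path, pat) for pat in repo_map_patterns) for path in touch
    ):
        failures.append("touch_points")

    # Test plan has at least one unit-ish line and one integration-ish line.
    plan = proposal.get("test_plan", [])
    if not (_any_line_has(plan, UNIT_KEYWORDS)
            and _any_line_has(plan, INTEGRATION_KEYWORDS)):
        failures.append("test_plan")

    # At least two acceptance criteria.
    if len(proposal.get("acceptance_criteria", [])) < 2:
        failures.append("acceptance_criteria")

    # Rollout and rollback are both non-empty.
    if not proposal.get("rollout", "").strip() or not proposal.get("rollback", "").strip():
        failures.append("rollout_rollback")

    return failures
```

Returning the list of failed check names, rather than a single boolean, makes it straightforward to feed the reasons into a repair prompt or a `blocking_reason` field.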