Skip to main content
Glama
aidesignblueprint

AI Design Blueprint Doctrine

Official

architect.validate_consensus

Run multiple parallel reviews of agentic code to obtain a consensus score with uncertainty estimates, removing anchor bias from prior runs. Provides median score, standard deviation, and per-principle stability.

Instructions

Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. Runs N parallel architect.validate calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot architect.validate re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call architect.validate which keeps baseline injection on. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED UserValidationRun row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer , Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. n is the only extra arg (range 2..5). private_session is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries score_consensus_median (headline), score_stdev (honest uncertainty), score_range (min, max), mode_stability_min_pct (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), per_principle (mode + distribution + severity median per principle), and representative_response (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). RECOVERY: same heartbeat pattern as architect.validate — first notification at t=0 carries the run_id; recover via me.validation_history(run_id=) if your client times out. TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: consensus_quorum_failed when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
nNoNumber of parallel child runs. Default 3 (the variance signal is visible at N=3; cost = 3× LLM bill). Capped server-side by Settings.consensus_n_max (default 5).
taskNoWhat the agent or workflow is trying to accomplish.
filesNoList of file paths relevant to the implementation.
goalsNoSpecific safety or quality goals to evaluate against.
languageNoProgramming language of the code (e.g. 'python').
focus_areaNoOptional: narrow the review to a principle cluster or slug.
repositoryNoRepository name or path. Lets the consensus row group into the per-project history view alongside single-shot validate runs.
example_limitNoMax curated examples per child run.
implementation_contextYesThe artifact under review. SEND FULL FILE CONTENTS VERBATIM — same constraint as architect.validate. Truncation produces hallucinated findings on code that isn't there.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only give readOnlyHint=false, openWorldHint=true, idempotentHint=false. The description adds: parallel LLM calls, wallclock 80-120s for N=3, cost = N × LLM bill, private_session behavior, persistence of consolidated row only, auth (Bearer, Pro/Teams), recovery heartbeat pattern, and typed failures including consensus-specific 'consensus_quorum_failed'. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections (overview, when to call, behavior, auth, inputs, output, recovery, failures). Front-loaded with purpose. Slightly lengthy but every sentence adds value; could be trimmed slightly but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description fully explains output fields (score_consensus_median, score_stdev, etc.) and their meanings. Covers inputs (9 params), behavior, failure modes, and recovery. No gaps given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining parameter interplay: n is the only extra arg (range 2..5, server-capped), private_session is implicit, repository groups into project history, implementation_context must be verbatim. It also clarifies that inputs are same as architect.validate plus n.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool: 'N-shot CONSENSUS doctrine review of agentic code. Runs N parallel architect.validate calls... aggregates to a per-principle MODE verdict + median severity...'. It distinguishes from sibling architect.validate by emphasizing consensus and variance surfacing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit WHEN TO CALL (user wants honest first-pass score with variance) and WHEN NOT TO CALL (need iteration delta, use architect.validate). Directly names the alternative sibling tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aidesignblueprint/integrations'

If you have feedback or need assistance with the MCP directory API, please join our Discord server