architect.certify

Instructions

Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. RECOVERY FIRST: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call me.validation_history(run_id='<that-id>') to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with payload_incomplete when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted code argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM payload_incomplete refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: from sqlalchemy.exc import SQLAlchemyError → include a stub class; from app.db import models → include a class models: namespace stub with the columns/methods you reference; module-level imports of dataclass, Literal, json, datetime, timezone MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. The stubs cover IMPORTED dependencies only; the certified code's own enforcement branches (approval gates, policy checks, recovery paths) must be present in full. A # ... placeholder reads as an ABSENT control and is graded against you, not as shorthand for one that exists. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), certification_attempts=[] on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's rejection_reason + guidance are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer , Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.

Input Schema

TableJSON Schema

Name	Required	Description	Default
`code`	No	The same code that was sent to architect.validate to produce this run_id. Sent verbatim — the cert reviewer needs the actual code to surface production_blockers the first pass missed. May be omitted (empty string) when the prior validate stored the code under the 24h cert-retry hold; in that case the server reuses the stored code automatically. Sent under the same enterprise-safety envelope as architect.validate (transient processing, no training, JSON-escaped + delimited).
`run_id`	Yes	The run_id from a prior architect.validate call. Returned in the validate response when persistence_status='saved'. Must be owned by the caller (per-user authorisation, same gate as me.validation_history).

Output Schema

TableJSON Schema

Name	Required	Description	Default
No arguments

AI Design Blueprint Doctrine

Instructions

Input Schema

Output Schema

Tool Definition Quality

Other Tools

Latest Blog Posts

MCP directory API