architect.validate
Validate agentic workflows against the 10-principle AI Blueprint. Get a readiness score, per-principle findings, and production-ready badge eligibility.
Instructions
Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer , Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as implementation_context (NO truncation, NO ... placeholders, NO comment removal — the architect treats your ... as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. RECOVERY: if your MCP client closes the tool-call early, fetch the result via me.validation_history(run_id=) once the run completes server-side — same Bearer token (per-user auth). Unavailable when private_session=true. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| task | No | What the agent or workflow is trying to accomplish. Adds evaluation context. | |
| files | No | List of file paths relevant to the implementation context. | |
| goals | No | Specific safety or quality goals to evaluate against (e.g. 'prevent irreversible actions', 'explicit approvals'). | |
| language | No | Programming language of the code being evaluated (e.g. 'python', 'typescript'). | |
| focus_area | No | Narrow the evaluation to a specific principle cluster or slug (e.g. 'delegation', 'visibility', 'establish-trust-through-inspectability'). | |
| repository | No | Repository name or path for additional context. | |
| example_limit | No | Maximum number of curated examples to include in recommendations. | |
| private_session | No | Set to true to disable all logging for this validation call. | |
| implementation_context | Yes | The artifact under review. SEND FULL FILE CONTENTS VERBATIM — the architect cites per-line evidence (identifiers, branch ordering, structural choices); any compression destroys evidence and produces hallucinated findings on code that isn't there. CONCRETE DON'TS: do NOT replace docstrings/comments with `...`; do NOT condense multi-line statements; do NOT replace dict/set comprehensions with `{...}`; do NOT remove explanatory comments to save tokens. If the file is large, split into MULTIPLE architect.validate calls scoped by file/module — never truncate one call. Architecture summaries (high-level prose) accepted ONLY for greenfield (no code yet); never as a substitute for code that already exists. |