Server Details

Doctrine and examples for the Agentic AI Blueprint. 13 public tools, no credentials required.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.7/5 across 24 of 24 tools scored.

Server Coherence: A
Disambiguation: 5/5

Every tool has a distinctly clear purpose. Groups like architect.*, clusters.*, examples.*, guides.*, handoffs.*, me.*, principles.*, signals.*, and team.* each cover separate concerns without functional overlap. Even within groups (e.g., architect.validate vs. architect.validate_consensus), descriptions clearly differentiate single vs. consensus review.

Naming Consistency: 5/5

All tool names follow a consistent group.verb or group.noun pattern (e.g., clusters.list, clusters.get; examples.search, examples.get; me.add_evidence, me.learning_path). Names are consistently lowercase with underscores, with no mixing of conventions.

Tool Count: 5/5

24 tools is well-scoped for the server's comprehensive domain covering doctrine discovery, validation with consensus and certification, user learning path, handoffs, and signals. Each tool serves a necessary role without bloat.

Completeness: 5/5

The tool surface covers the full lifecycle for the domain: principle discovery (list, search, get), validation (validate, consensus, certify, history), user progress (add_evidence, coaching_context, learning_path), handoffs (agency, operator, partnership), signals (feedback, report), and team summarization. There are no obvious gaps for the intended purpose.

Available Tools

24 tools
architect.certify: Certify Production-Ready Architecture (A)

Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. RECOVERY FIRST: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call me.validation_history(run_id='<that-id>') to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). 
Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with payload_incomplete when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted code argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. 
Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM payload_incomplete refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: from sqlalchemy.exc import SQLAlchemyError → include a stub class; from app.db import models → include a class models: namespace stub with the columns/methods you reference; module-level imports of dataclass, Literal, json, datetime, timezone MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. The stubs cover IMPORTED dependencies only; the certified code's own enforcement branches (approval gates, policy checks, recovery paths) must be present in full. A # ... placeholder reads as an ABSENT control and is graded against you, not as shorthand for one that exists. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), certification_attempts=[] on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's rejection_reason + guidance are the actionable surface. 
(Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer , Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
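The timeout recovery path described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the server's client library: `call_tool` and its `on_progress` callback are hypothetical stand-ins for whatever your MCP client exposes, and the progress event is assumed to be a dict with a `run_id` key.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: call_tool(name, arguments, on_progress) -> result dict.
# Substitute your MCP client's real call; nothing here is part of the server's API.
CallTool = Callable[[str, dict, Optional[Callable[[dict], None]]], dict]


def certify_with_recovery(call_tool: CallTool, run_id: str, code: str) -> dict:
    """Call architect.certify once; on client timeout, fetch the persisted
    verdict via me.validation_history instead of retrying."""
    captured: dict[str, Any] = {}

    def on_progress(event: dict) -> None:
        # The first progress event (t=0s) is described as carrying the run_id.
        # The "run_id" key name is an assumption about the event shape.
        if "run_id" in event and "run_id" not in captured:
            captured["run_id"] = event["run_id"]

    try:
        return call_tool(
            "architect.certify", {"run_id": run_id, "code": code}, on_progress
        )
    except TimeoutError:
        # Do not retry certify: a retry re-runs the LLM call and consumes
        # one of the 3 budgeted attempts. Read the persisted result instead.
        recovery_id = captured.get("run_id", run_id)
        return call_tool("me.validation_history", {"run_id": recovery_id}, None)
```

The design choice worth noting is that the timeout branch never calls `architect.certify` again; it only reads the persisted run, which matches the description's "recovery first" rule.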

Parameters

| Name | Required | Description | Default |
| code | No | The same code that was sent to architect.validate to produce this run_id. Sent verbatim: the cert reviewer needs the actual code to surface production_blockers the first pass missed. May be omitted (empty string) when the prior validate stored the code under the 24h cert-retry hold; in that case the server reuses the stored code automatically. Sent under the same enterprise-safety envelope as architect.validate (transient processing, no training, JSON-escaped + delimited). | |
| run_id | Yes | The run_id from a prior architect.validate call. Returned in the validate response when persistence_status='saved'. Must be owned by the caller (per-user authorisation, same gate as me.validation_history). | |
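To see why even a one-character edit voids a run, here is a small sketch of a SHA-256 fingerprint comparison. It assumes the fingerprint is taken over the raw UTF-8 bytes of the submitted code with no normalization; the description names SHA-256 but does not document the server's exact canonicalization, so treat the hashing details as an assumption.

```python
import hashlib


def code_fingerprint(code: str) -> str:
    """SHA-256 hex digest of the code's UTF-8 bytes (assumed scheme)."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


validated = "def approve(action):\n    return action.allowed\n"
edited = validated + " "  # a single trailing space added after validation

# Any edit, however small, yields a different digest, which is why a run
# validated before the edit can no longer be certified.
print(code_fingerprint(validated) == code_fingerprint(validated))  # True
print(code_fingerprint(validated) == code_fingerprint(edited))     # False
```

In practice this suggests keeping the exact string sent to architect.validate and re-sending that same string to architect.certify, rather than re-reading the file from disk after further edits.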

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses long-running nature (60-180s typical, p99 250s), atomic one-shot behavior, first event carrying run_id, and the recovery path. Annotations already indicate non-read-only and non-idempotent, and the description adds rich details like typed rejection reasons and pre-LLM rejection audit trail. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very lengthy and dense, covering many edge cases and details. While well-structured with sections, it could be more concise by reducing repetition (e.g., timeout recovery appears twice). Every sentence adds value, but the overall length may overwhelm agents.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, the description is remarkably comprehensive: it covers eligibility gates, typed rejection reasons, payload completeness requirements, pre-LLM rejection audit, recovery paths, data residency, and privacy. It leaves almost no scenario undocumented.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters have schema descriptions, and the tool description adds significant context: for 'code', it explains why it must match the validated code, when it can be omitted, and the fingerprinting mechanism; for 'run_id', it clarifies it comes from validate response and must be owned. This goes well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it is a 'second-pass adversarial certification' of a prior architect.validate run that scored production_ready, distinguishing it from sibling tools like architect.validate (first pass) and architect.validate_consensus (consensus).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly specifies when to call: only after architect.validate returned production_ready and the code has not been modified. Provides when-not-to-call (other tiers) and a recovery path on timeout using me.validation_history. Also details the retry budget and eligibility gates.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

architect.validate: Validate Agent Architecture (A)

Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical); MCP clients commonly close the call before the server returns. Retrying re-runs the 60-180s LLM call from scratch and burns compute. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s (before the LLM call begins) — capture it. On timeout, call me.validation_history(run_id='<that-id>') to fetch the persisted result; the server-side run completes independently within a 20-minute budget. Edge case: if the transport dropped before the first progress notification (very rare; sub-second window), call me.validation_history(repository='<same value you passed here>') to find your most recent run. TASK-AUGMENTED INVOCATION (MCP 2025-11-25, SEP-1686): clients that advertise the tasks capability can task-augment this call by including task: {ttl: <ms>} inside the JSON-RPC request's params (NOT as a tool argument; alongside arguments, _meta, etc.). The server returns a CreateTaskResult immediately (taskId equals the run_id above) and runs the validation in the background. Spec-correct long-running pattern: poll via tasks/get for state, fetch the terminal payload via tasks/result, listen for notifications/tasks/status for push updates, and cancel via tasks/cancel. _meta.progressToken from the original request stays valid for the entire task lifetime. Sync (non-augmented) calls behave exactly as before, backwards-compatible by construction. The me.validation_history(run_id=...) recovery path remains the canonical recovery handle for clients that don't yet advertise the tasks capability. 
Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer , Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as implementation_context (NO truncation, NO ... placeholders, NO comment removal — the architect treats your ... as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). 
focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. PAYLOAD COMPLETENESS (load-bearing if you intend to architect.certify this run): the validate first-pass is permissive — it scores on doctrine alignment + structural patterns visible in the submitted code. Cert's adversarial second-pass is rigorous — it scores on cert-payload-completeness as well as code correctness. A run that scores 100/A at validate can cert-reject pre-LLM with payload_incomplete when imported modules' surfaces aren't visible. To validate with INTENT TO CERT, also bundle verbatim public-surface stubs for every imported module: from sqlalchemy.exc import SQLAlchemyError → include a stub class; from app.db import models → include a class models: namespace stub with the columns/methods the code references; module-level imports of dataclass, Literal, json, datetime, timezone MUST also be in the payload (cert correctly catches when they're omitted — the module would NameError on import as submitted). 'Submit Like Production': the payload should be the code as it would actually run. TWO COMPLETENESS AXES. (1) IMPORTS: stub the public surface of every dependency (above). (2) ENFORCEMENT BRANCHES: the code under cert itself (approval gates, policy checks, recovery paths) must be the REAL logic, fully written. A placeholder body (# ... execute approved action ..., pass # TODO, a bare ...) is graded as a MISSING control, not shorthand; cert scores what would actually run. Never sketch the agent you are certifying. Empirically reconfirmed PR #157 iter8 → iter9 cert downgrades. SCORE VARIANCE DISCLOSURE (anomaly #10 — empirically documented): validate scores are POINT ESTIMATES with an observed empirical variance band of ~20-67 pts on BYTE-IDENTICAL input. 
Runs against the same repository, same code, same deterministic seed (the seed is derived from input — same input → same seed) can produce materially different scores AND different top-blocker rankings, because OpenAI's reasoning models at reasoning_effort=high are not strictly deterministic even with the seed parameter pinned. The reproducibility_mode='best_effort' field on every response is the platform's honest disclosure of this property. For decisions where stability matters more than speed, call architect.validate_consensus (N=3-5 aggregated, median verdict + per-principle stability metrics) instead — collapses the variance, surfaces unstable principles explicitly. A single validate run is a single roll; consensus is the right tool when one score isn't enough. ITERATION LOOP — repository keying. Pass the SAME repository value across calls to chain iteration rounds; the validator auto-resolves the most recent prior run on (user, repository, scope) as prior_run_baseline and the LLM grades the new submission with iteration context (per-principle severity deltas surface in the response). Changing the repository string between calls — even subtly with an iter-2 suffix — silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in task or commit messages, never in repository. See the architect-validation-orchestration skill in the agent-asset pack for the full validate → consensus → certify sequence. VERIFICATION LAYERS (the two-layer doctrine this platform practices on itself): validate verifies DOCTRINE ALIGNMENT against the 10-principle Blueprint — design patterns, hand-off explicitness, operational-state inspectability, race/blocker handling at the architectural level. validate does NOT guarantee runtime correctness. cert verifies PAYLOAD COMPLETENESS and runs an adversarial second pass over the submitted code — catches production_blockers the first pass missed, name-errors on import, missing module surfaces, etc. 
cert does NOT verify runtime correctness either. Passing validate is a NECESSARY condition for production_ready, not a sufficient one. Runtime correctness (does this actually execute and behave?) is verified at the THIRD layer — your tests, types, walks. The platform's own recursive-integrity practice: every PR runs validate against its own primitives, then cert. Real bugs surfaced via this practice in PR #157 — NULL-UUID false-positive (iter3) and tie-breaker mismatch (iter5) — that 25 unit tests had missed. Two-layer verification is the discipline, not 'either/or'. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated). See Payload Completeness above for the common pre-cert pitfall.
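The task-augmented invocation described above places `task` inside the JSON-RPC `params`, next to `arguments`, not inside the tool arguments. A minimal sketch of building such a request is shown below. The repository value `my-agent`, the request id, and the TTL are illustrative assumptions; only the placement of `task: {ttl: <ms>}` comes from the description.

```python
import json


def build_task_augmented_call(request_id: int, arguments: dict, ttl_ms: int) -> dict:
    """Build a JSON-RPC tools/call request with `task` placed in params,
    alongside `arguments`, rather than inside the tool arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "architect.validate",
            "arguments": arguments,
            "task": {"ttl": ttl_ms},  # task augmentation lives here
        },
    }


request = build_task_augmented_call(
    request_id=1,
    arguments={
        "implementation_context": "<full file contents, verbatim>",
        "repository": "my-agent",  # hypothetical iteration key
    },
    ttl_ms=20 * 60 * 1000,  # illustrative TTL, chosen to match the 20-minute budget
)
print(json.dumps(request, indent=2))
```

A client that does not advertise the tasks capability would send the same request without the `task` key and fall back to the me.validation_history recovery path.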

Parameters

| Name | Required | Description | Default |
| task | No | What the agent or workflow is trying to accomplish. Adds evaluation context. | |
| files | No | List of file paths relevant to the implementation context. | |
| goals | No | Specific safety or quality goals to evaluate against (e.g. 'prevent irreversible actions', 'explicit approvals'). | |
| language | No | Programming language of the code being evaluated (e.g. 'python', 'typescript'). | |
| focus_area | No | Narrow the evaluation to a specific principle cluster or slug (e.g. 'delegation', 'visibility', 'establish-trust-through-inspectability'). | |
| repository | No | Iteration key. SAME value across calls auto-resolves the most recent prior run as `prior_run_baseline` for iteration-aware grading (per-principle severity deltas, regressions/improvements). CHANGING the value (even subtly with an `iter-2` suffix) silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in `task`, not here. Empirical evidence of why anchoring matters: PR #157 iter1 33/F vs iter2 100/A on byte-identical baseline-race primitives (+67 spread); invoice-payment-manager #158 38/F vs #159 74/C (+36 spread). In both cases the code was the same and the score variance came from the non-deterministic LLM at reasoning_effort=high; the baseline anchor collapses this onto a stable arc. | |
| example_limit | No | Maximum number of curated examples to include in recommendations. | |
| private_session | No | Set to true to disable logging AND prior-run anchoring AND run_id recovery for this call. Use for private one-shots that don't participate in the iteration arc. Default false. | |
| implementation_context | Yes | The artifact under review. SEND FULL FILE CONTENTS VERBATIM: the architect cites per-line evidence (identifiers, branch ordering, structural choices); any compression destroys evidence and produces hallucinated findings on code that isn't there. CONCRETE DON'TS: do NOT replace docstrings/comments with `...`; do NOT condense multi-line statements; do NOT replace dict/set comprehensions with `{...}`; do NOT remove explanatory comments to save tokens. If the file is large, split into MULTIPLE architect.validate calls scoped by file/module; never truncate one call. Architecture summaries (high-level prose) accepted ONLY for greenfield (no code yet); never as a substitute for code that already exists. | |
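Because placeholders in `implementation_context` are graded as missing or literal code, a client-side pre-flight check before submitting can help. The sketch below is a heuristic of my own, not the server's check: the patterns are assumptions modelled on the placeholder forms the description lists (`# ...`, `pass # TODO`, a bare `...`, `{...}`).

```python
import re

# Heuristic patterns for elided code; illustrative, not the server's own check.
PLACEHOLDER_PATTERNS = [
    re.compile(r"^\s*\.\.\.\s*$"),         # a bare ... line
    re.compile(r"#\s*\.\.\."),             # "# ... execute approved action ..."
    re.compile(r"^\s*pass\s*#\s*TODO\b"),  # pass  # TODO
    re.compile(r"\{\s*\.\.\.\s*\}"),       # {...} standing in for a comprehension
]


def find_placeholders(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line that looks elided."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in PLACEHOLDER_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


sample = (
    "def run(action):\n"
    "    if not approved(action):\n"
    "        return None\n"
    "    # ... execute approved action ...\n"
    "    ...\n"
)
for lineno, text in find_placeholders(sample):
    print(f"line {lineno}: {text}")
```

A legitimate `...` (for example in a Protocol method body) is flagged too, so the result is a prompt for a human to review the payload rather than an automatic rejection.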

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description fully discloses behavioral traits: long-running LLM call (60-180s typical, 20-min server budget), non-deterministic scores (variance band 20-67 pts), persistence and recovery mechanisms, effects of private_session, and that it does not verify runtime correctness. This adds significant context beyond the minimal annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very long and contains some repetition (e.g., timeout recovery mentioned twice). While it covers necessary details, it could be more concise and better organized. It is still functional but not as streamlined as it could be.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all relevant aspects: input requirements, behavioral details, recovery, edge cases, iteration loops, verification layers, typed failures, and next steps. Given the complexity of the tool and the presence of an output schema, the description is remarkably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description enriches parameter meanings: e.g., implementation_context must be verbatim, repository is an iteration key (with warning about changing it), private_session disables logging. It clarifies constraints and best practices beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint.' It distinguishes from siblings like architect.certify and architect.validate_consensus, and explicitly states when not to call (non-agentic plumbing).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'WHEN TO CALL' and 'WHEN NOT TO CALL' sections, recovery procedures for timeouts, guidance on task-augmented invocation, and alternatives like validate_consensus and certification path. It also warns about common pitfalls like truncating implementation_context and changing repository names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

architect.validate_consensus: Validate Agent Architecture (Consensus Mode) (A)

Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call me.validation_history(run_id='<that-id>') to fetch the persisted consensus envelope. Runs N parallel architect.validate calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot architect.validate re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call architect.validate which keeps baseline injection on. CHAIN RESUME: each child runs with private_session=True (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with lifecycle_status='completed' — the next single-shot architect.validate on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the architect-validation-orchestration skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. 
Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED UserValidationRun row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer , Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. n is the only extra arg (range 2..5). private_session is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries score_consensus_median (headline), score_stdev (honest uncertainty), score_range (min, max), mode_stability_min_pct (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), per_principle (mode + distribution + severity median per principle), and representative_response (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: consensus_quorum_failed when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
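The aggregation described above (per-principle mode verdict, median severity, stability, and the closest-to-median representative) can be sketched as follows. This is a toy reconstruction under assumptions, not the server's implementation: the child result shape, the use of population standard deviation, and tie-breaking by first occurrence are all choices made for illustration.

```python
from collections import Counter
from statistics import median, pstdev


def aggregate_consensus(children: list[dict]) -> dict:
    """Aggregate N child results. Each child is assumed to look like
    {"score": int, "principles": {slug: {"verdict": str, "severity": int}}}."""
    if len(children) < 2:
        # Mirrors the documented consensus_quorum_failed condition.
        raise ValueError("consensus_quorum_failed")

    scores = [c["score"] for c in children]
    score_median = median(scores)

    per_principle = {}
    for slug in children[0]["principles"]:
        verdicts = [c["principles"][slug]["verdict"] for c in children]
        mode_verdict, mode_count = Counter(verdicts).most_common(1)[0]
        per_principle[slug] = {
            "mode": mode_verdict,
            # Share of children that agree with the mode verdict.
            "stability_pct": 100.0 * mode_count / len(children),
            "severity_median": median(
                c["principles"][slug]["severity"] for c in children
            ),
        }

    return {
        "score_consensus_median": score_median,
        "score_stdev": pstdev(scores),  # population stdev is an assumption
        "score_range": (min(scores), max(scores)),
        "mode_stability_min_pct": min(
            p["stability_pct"] for p in per_principle.values()
        ),
        "per_principle": per_principle,
        # The child whose score is closest to the median.
        "representative_response": min(
            children, key=lambda c: abs(c["score"] - score_median)
        ),
    }
```

With three children scoring 60, 98, and 74, the headline median is 74 and the representative response is the child that scored 74; a principle on which two of three children agree has a stability of about 66.7%, below the 80% stability bar mentioned for cert eligibility.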

Parameters

| Name | Required | Description | Default |
| n | No | Number of parallel child runs. Default 3 (the variance signal is visible at N=3; cost = 3× LLM bill). Capped server-side by Settings.consensus_n_max (default 5). | |
| task | No | What the agent or workflow is trying to accomplish. | |
| files | No | List of file paths relevant to the implementation. | |
| goals | No | Specific safety or quality goals to evaluate against. | |
| language | No | Programming language of the code (e.g. 'python'). | |
| focus_area | No | Optional: narrow the review to a principle cluster or slug. | |
| repository | No | Iteration key. Consensus children all run unanchored (`private_session=True`), but the consolidated row IS persisted under this key, discoverable as prior baseline for the next single-shot `architect.validate`. Same value across calls keeps the iteration arc inspectable. | |
| example_limit | No | Max curated examples per child run. | |
| implementation_context | Yes | The artifact under review. SEND FULL FILE CONTENTS VERBATIM (same constraint as architect.validate). Truncation produces hallucinated findings on code that isn't there. | |

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses important behaviors beyond annotations: long-running (~80-120s), non-idempotent (retrying re-runs from scratch), private_session pattern, persistence of consolidated row but not children, and rate limiting. No contradiction with annotations (readOnlyHint=false, destructiveHint=false, idempotentHint=false, openWorldHint=true).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-organized with labeled sections (WHEN TO CALL, BEHAVIOR, etc.) and front-loaded with the most critical information. Some redundancy (e.g., repeating 'private_session=True' multiple times) keeps it from being a 5.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 params, 1 required, output schema present, numerous sibling tools), the description covers all aspects: purpose, usage, behavioral edge cases, parameter details, output structure, and failure modes. Nothing essential is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining `n`'s default and cap, cost implication, importance of `implementation_context` ('SEND FULL FILE CONTENTS VERBATIM'), and `repository` as iteration key. This is above the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs an 'N-shot CONSENSUS doctrine review of agentic code' and distinguishes itself from siblings like `architect.validate` by emphasizing it provides an 'unanchored honest read' with variance. The purpose is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call ('user wants an HONEST first-pass score with variance surfaced') and when not to call ('when you NEED the iteration delta against a prior run', recommending `architect.validate` instead). Also provides recovery procedure on timeout.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

assets.list: List Agent Assets (grade A)
Read-only, Idempotent

Public: list downloadable doctrine and agent asset artifacts (skill packs, rule packs, MCP setup snippets) the user can drop into their AI coding tool to import the Blueprint as native skill/rule files. Returns a list of assets with name, format (one of: zip / md / markdown / mdc / json / toml / text; this is the full vocabulary), pack_version, download_url, and platform target (Claude Code, Cursor, Codex, Gemini, Qwen). The response also carries count (length of assets) for symmetry with principles.list / clusters.list / guides.list. WHEN TO CALL: the user asks how to bring the Blueprint into their coding agent, or wants to install it as a local skill/rule file. WHEN NOT TO CALL: for the live MCP tools themselves, which are already available through this server. For doctrine content, prefer principles.list/get and guides.list/get. BEHAVIOR: read-only, idempotent, no auth required. Asset artifacts are regenerated on every deploy from the canonical doctrine.
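
As a rough sketch of how a client might consume this response, the helper below checks the documented `count` symmetry and format vocabulary, then filters by platform. The top-level keys `assets` and `count` and the fields `name`, `format`, `pack_version`, and `download_url` come from the description; the `platform` key name and the helper itself are assumptions.

```python
KNOWN_FORMATS = {"zip", "md", "markdown", "mdc", "json", "toml", "text"}


def assets_for_platform(response, platform):
    """Return the assets that target one platform (illustrative only)."""
    assets = response["assets"]
    # count is documented as the length of the assets list.
    if response["count"] != len(assets):
        raise ValueError("count does not match the number of assets")
    # The description calls this set of formats the full vocabulary.
    unknown = {asset["format"] for asset in assets} - KNOWN_FORMATS
    if unknown:
        raise ValueError(f"unexpected asset formats: {sorted(unknown)}")
    wanted = platform.casefold()
    # 'platform' is a hypothetical key name for the platform target.
    return [asset for asset in assets if asset["platform"].casefold() == wanted]
```

Because the artifacts are regenerated on every deploy, a client would probably want to re-read `pack_version` rather than cache a download indefinitely.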

Parameters

No parameters

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds valuable context: 'Public', 'no auth required', 'assets regenerated on every deploy'. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections for purpose, then usage guidelines, then behavior. Each sentence adds value, though slightly verbose in listing formats. Could be slightly more concise but still very clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool takes no parameters and an output schema exists, the description fully explains the output fields (name, format, pack_version, download_url, platform target) and the extra 'count' field. It covers all the context an agent needs to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters, so the baseline is 4. The empty schema already has 100% coverage, and the description has no parameter information to add.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists downloadable doctrine and agent asset artifacts, with specific verb 'list' and resource 'assets'. It distinguishes from sibling tools like principles.list/guides.list by clarifying that those are for doctrine content, while assets.list returns installable files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'WHEN TO CALL' (user asks how to bring Blueprint into coding agent) and 'WHEN NOT TO CALL' (for live MCP tools or doctrine content, suggesting principles.list/get and guides.list/get as alternatives). This leaves no ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clusters.get: Get Cluster (grade A)
Read-only, Idempotent

Get one principle cluster by stable slug. Returns the cluster definition, shared rationale, and the full set of member principles (slug + title) so the caller can pivot into principles.get without a second list call. WHEN TO CALL: the user has already named a specific cluster (e.g. 'delegation', 'visibility', 'trust', 'orchestration') OR you have a slug from a prior clusters.list / principles.list response and need its full definition + member principles. The response embeds member principle slugs + titles already, so DO NOT loop principles.get over each member to get a cluster overview — read the response. WHEN NOT TO CALL: the user is describing a topic, failure mode, or keyword in natural language (call principles.search instead); the user wants to discover which clusters exist (call clusters.list); the user wants the definition of one specific principle (call principles.get directly). Idempotent + cacheable per slug. Returns 404-shaped error_payload on unknown slug — the slug must match exactly the value emitted by clusters.list, with no normalization.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | Stable slug of the principle cluster (e.g. 'delegation', 'visibility', 'trust', 'orchestration'). | |
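
The WHEN TO CALL / WHEN NOT TO CALL rules for this tool amount to a small routing decision, sketched below. This is one possible reading, not a prescribed client design: the function names are invented, the argument names for `principles.get` and `principles.search` are guesses (only `slug` for `clusters.get` is documented here), and the `principles` key in the response is a hypothetical name for the embedded member list.

```python
def route_cluster_request(cluster_slug=None, principle_slug=None, query=None):
    """Pick a tool and arguments following the usage guidance (sketch)."""
    if principle_slug is not None:
        # One specific principle: go straight to principles.get.
        return "principles.get", {"slug": principle_slug}
    if cluster_slug is not None:
        # Pass the slug through untouched: the server does no normalization.
        return "clusters.get", {"slug": cluster_slug}
    if query:
        # Natural-language topic, failure mode, or keyword.
        return "principles.search", {"query": query}
    # Nothing named yet: discover which clusters exist.
    return "clusters.list", {}


def member_overview(cluster_response):
    """Summarize members from one clusters.get response, with no extra calls."""
    return [f"{m['slug']}: {m['title']}" for m in cluster_response["principles"]]
```

The second helper reflects the advice not to loop `principles.get` over each member: the slugs and titles are already in the response.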

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotent, read-only, non-destructive. Description adds specifics: returns 404 error_payload on unknown slug, exact slug matching required, no normalization, and cacheability per slug. This provides useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with front-loaded purpose and clear usage sections. Sentences are efficient though the description is somewhat lengthy. Every sentence contributes valuable information, but slight tightening possible.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having an output schema, the description fully explains return content: cluster definition, shared rationale, and member principles with slug+title. It also covers error behavior and caching, making it complete for a single-parameter get tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single 'slug' parameter described. Description adds examples of valid slugs and notes it's a 'stable slug' and must match exactly. While helpful, it doesn't dramatically enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly identifies the resource (principle cluster) and action (get by stable slug). It distinguishes from siblings like clusters.list, principles.get, and principles.search, ensuring the agent knows this tool is for retrieving a specific cluster's full definition and member principles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call (user named a specific cluster or slug from list response) and when not to call (natural language topics, discovering clusters, getting a single principle). Also advises against looping due to embedded member data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clusters.list: List Clusters (grade A)
Read-only, Idempotent

List all principle clusters with their stable slugs and linked principle titles. Use this to discover which clusters exist before drilling in with clusters.get or filtering principles.list by cluster. Prefer clusters.get when you already know the cluster slug and need full detail.

Parameters

No parameters

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds that it returns a list of clusters, consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: the first states purpose and output, and the next two give usage guidance. No wasted words, front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a zero-parameter list tool with good annotations, clear output, and guidance on sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters and schema coverage is 100%. The description explains the output content (slugs and linked titles), which adds value beyond the empty schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists all principle clusters with stable slugs and linked principle titles. Distinguishes from clusters.get (full detail) and principles.list (filtering by cluster).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using this to discover clusters before drilling in with clusters.get or filtering principles.list by cluster, and to prefer clusters.get when the slug is known.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

examples.get: Get Example (grade A)
Read-only, Idempotent

Get one curated example by stable slug. Returns title, summary, source-code links, principle coverage (the principle slugs the example demonstrates), difficulty, library/framework, and implementation notes. Use this when you already have the slug from examples.search, a principles.get response, or a guide cross-link; prefer examples.search when filtering by topic / principle / difficulty / library; prefer guides.get when the caller wants a full walkthrough rather than a single reference example. Returns error_payload on unknown slug.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | Stable slug of the curated example (e.g. 'agents-building-blocks-5-control'). | |

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint; description adds value by mentioning error_payload on unknown slug, but the safety profile is already clear from annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise structure with a front-loaded purpose and a brief list of return fields; every sentence is informative without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple one-parameter tool, rich annotations, and presence of output schema, the description fully covers usage, return values, and error behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description of the slug parameter; description does not add new semantic meaning beyond the example slug format implied in usage guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get one curated example by stable slug' and lists specific return fields, distinguishing it from siblings like examples.search and guides.get.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: use when slug is known from examples.search, principles.get, or guides cross-link; prefer examples.search for filtering; prefer guides.get for full walkthrough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

examples.search: Search Examples (grade A)
Read-only, Idempotent

Search curated examples by free-text query, ranked by relevance, with optional filters: principle_ids (only examples covering those principles), difficulty (beginner/intermediate/advanced), library (e.g. 'langgraph', 'openai'). Returns each match's slug, title, summary, principle coverage, difficulty, library, and source-code link — slug is the handle examples.get hydrates. Default limit 5, capped server-side. Use this when the user describes a use case, technique, or library and wants matching examples; prefer examples.get when you already have the slug; prefer guides.search when the user wants a full walkthrough; prefer principles.search when the user wants doctrine guidance, not an implementation.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of results to return. Capped at server maximum. | |
| query | Yes | Free-text search query matched against example title, summary, and metadata. | |
| library | No | Filter by library or framework name (e.g. 'langgraph', 'openai', 'anthropic'). | |
| difficulty | No | Filter by difficulty level. | |
| principle_ids | No | Filter to examples that cover these principle IDs. | |
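
The description notes that each match's slug is the handle `examples.get` hydrates. A minimal sketch of that search-then-get flow follows. The `call_tool(name, arguments)` callable stands in for whatever MCP client is in use, and the `results` key in the search response is a hypothetical name; neither is confirmed by the listing.

```python
def hydrate_top_match(call_tool, query, **filters):
    """Search for examples, then fetch the best match by slug (sketch).

    call_tool is any callable taking (tool_name, arguments) and returning
    the decoded tool response as a dict.
    """
    # Only the top-ranked match is needed, so ask for a single result.
    search_args = {"query": query, "limit": 1, **filters}
    matches = call_tool("examples.search", search_args).get("results", [])
    if not matches:
        return None
    # The slug from the search result is the handle examples.get expects.
    return call_tool("examples.get", {"slug": matches[0]["slug"]})
```

Filters such as `library="langgraph"` or `difficulty="beginner"` can be passed as keyword arguments and are forwarded unchanged.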

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds useful context: ranking by relevance, default limit 5, server-side capping, and return fields including slug as handle for examples.get. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is succinct: one sentence for the core action and filters, one for return fields, one for the default limit, and one for usage guidance. No redundant information; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema (described in text), rich annotations, and full parameter coverage, the description covers all necessary aspects: operation, filters, output details, usage guidance, and alternative tools. Comprehensive for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds explanatory context for filters (principle_ids, difficulty, library) and how query is matched, but does not provide significant new information beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Search curated examples by free-text query', specifying verb and resource. It distinguishes from siblings like examples.get, guides.search, and principles.search with explicit usage guidance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use this tool ('when the user describes a use case...') and when to prefer alternatives ('prefer examples.get when...', 'prefer guides.search...', 'prefer principles.search...'). This provides clear decision rules.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

guides.get: Get Application Guide (grade A)
Read-only, Idempotent

Get a full application guide by its stable slug (e.g. 'security-application', 'observable-evaluation'). Returns sections, action items, and linked principles. Use this when you already have the guide slug from guides.list or guides.search. Prefer guides.search when the user describes a topic in natural language; prefer guides.list when you need the full inventory.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | Yes | Stable slug of the application guide (e.g. 'security-application', 'observable-evaluation'). | |

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds return content (sections, action items, linked principles). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose and return content, then usage guidance. Every sentence is valuable and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, description covers return content. Single required parameter fully documented. Complete for this tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with description for slug. Description adds context about needing slug from list/search, adding meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it gets a full application guide by slug, and distinguishes from siblings guides.list and guides.search by specifying when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says use this when you have the slug from guides.list/search, and prefers guides.search for natural language topics and guides.list for full inventory.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

guides.list: List Application Guides (grade A)
Read-only, Idempotent

List application guides that show how Blueprint principles apply to engineering challenges (security, evaluation, observability, etc.). Use this to discover which guides exist before drilling in. Prefer guides.search when the user describes a topic or failure mode in natural language. Prefer guides.get when you already know the guide slug and need full detail.

Parameters

No parameters

Output Schema

No output parameters

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description adds contextual purpose but does not disclose additional behavioral traits beyond what annotations imply. With strong annotations, this is adequate but not above average.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no fluff: the first states purpose, and the next three give usage guidance and alternatives. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no parameters and an existing output schema, the description fully explains the tool's role and usage. No need for return value details since output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so schema description coverage is 100% and the baseline is 4. The description has no parameter information to add.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and the resource 'application guides', specifies the content (Blueprint principles applied to engineering challenges), and distinguishes from siblings by mentioning alternatives for drilling in.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use this tool ('discover which guides exist before drilling in') and when to prefer alternatives (guides.search for natural language queries, guides.get for known slugs). No ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

guides.search: Search Application Guides (grade A)
Read-only, Idempotent

Search application guides by free-text query, matched against section answers and action items. Use this when the user describes an engineering challenge (security review, evaluation harness, observability) and wants matching guides. Prefer guides.get when you already have the guide slug; prefer guides.list when you need the full inventory.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of results to return. Capped at server maximum. | |
| query | Yes | Free-text search query matched against all guide content including section answers and action items. | |

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so description's burden is lighter. The description adds the important detail that search matches against section answers and action items, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. Front-loaded with core purpose, then immediately provides usage guidance. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, output schema exists, high schema coverage, rich annotations), the description covers purpose, usage, alternatives, and search scope comprehensively. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions already present. Description does not add further parameter-specific details, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb ('Search'), resource ('application guides'), and matching scope ('section answers and action items'). Distinguishes from siblings (guides.get, guides.list) by specifying when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool (engineering challenge, wants matching guides) and when to use alternatives (guides.get for known slug, guides.list for full inventory).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

handoffs.agency: Request Agency Handoff (grade A)

Authenticated — submit an agency engagement enquiry on behalf of the caller for a founder-led discovery call. Persists an AgencyHandoff row routed to the agency inbox; the user is contacted by the team for a scoped proposal. Engagement scopes: workflow sprint (rapid agentic workflow implementation), proof-of-concept (validate a specific agent design in a bounded timeframe), pilot support (co-design and validate a production-ready pilot), advisory (ongoing architectural guidance across a product team). WHEN TO CALL: the user has identified a paid hands-on expert engagement need beyond self-service learning, and explicitly asks to talk to the team or book a discovery call. ALWAYS confirm with the user before firing — this creates a sales-visible record. WHEN NOT TO CALL: for free training / partnerships discussion (use handoffs.partnership); for support / billing / access (use handoffs.operator); proactively or as a sales push. BEHAVIOR: write-only, single insert, side-effecting. Auth: Bearer (Firebase ID token, any plan). UK/EU residency. Response confirms the ticket id + scope so the user can reference it.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| role | No | Role or title of the person submitting the agency inquiry. | |
| locale | No | Response locale for the acknowledgment. | en |
| reason | Yes | Description of the engagement need: workflow sprint, proof-of-concept, pilot support, or advisory. | |
| company | No | Company or team name submitting the agency inquiry. | |
| website | No | Website or relevant URL for the team or project. | |
| agent_name | No | Name of the agent or client triggering the handoff. | mcp-client |
| support_type | No | Type of support needed. | |
| trace_summary | No | Optional agent trace summary for operator context. | |
| agent_platform | No | Platform or runtime the agent is running on. | |
| workflow_stage | No | Current workflow stage. | |

Output Schema

No output parameters
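
As a rough illustration of the "ALWAYS confirm with the user before firing" rule in the handoffs.agency description, the sketch below only builds a request once an explicit confirmation flag is set. The envelope follows the general MCP `tools/call` JSON-RPC shape; the helper name `build_agency_handoff_call`, the `user_confirmed` flag, and the request id are assumptions made for illustration, not part of this server's documented API.

```python
import json

# Optional argument names taken from the handoffs.agency parameter list.
OPTIONAL_FIELDS = {
    "role", "locale", "company", "website", "agent_name",
    "support_type", "trace_summary", "agent_platform", "workflow_stage",
}


def build_agency_handoff_call(reason, user_confirmed, request_id=1, **optional):
    """Return a JSON-RPC `tools/call` payload, or raise if the user has not confirmed.

    Hypothetical helper: the confirmation gate lives in the client, not the server.
    """
    if not user_confirmed:
        raise PermissionError(
            "handoffs.agency creates a sales-visible record; confirm with the user first"
        )
    if not reason or not reason.strip():
        raise ValueError("'reason' is the only required argument and must be non-empty")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    arguments = {"reason": reason.strip(), **optional}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "handoffs.agency", "arguments": arguments},
    }


if __name__ == "__main__":
    payload = build_agency_handoff_call(
        "proof-of-concept: validate an approval-flow agent design",
        user_confirmed=True,
        company="Example Ltd",  # illustrative value only
    )
    print(json.dumps(payload, indent=2))
```

Keeping the gate in a single helper means a write-only, side-effecting call cannot be assembled by accident; how the client actually collects the confirmation is left open here.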

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only provide readOnlyHint=false, destructiveHint=false, etc. Description adds key traits: 'write-only, single insert, side-effecting', auth method (Bearer token, any plan), UK/EU residency, and response confirms ticket ID + scope. Full behavioral disclosure beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is dense but well-structured with clear sections (engagement scopes, WHEN TO CALL/WHEN NOT TO CALL, BEHAVIOR). At ~150 words, it's appropriately sized for the complexity, though could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 10 parameters (1 required) and existing output schema, description covers purpose, usage, safety, auth, residency, user confirmation, and behavior. No gaps remain for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description lists engagement scopes in body but schema already describes reason field. No additional semantic value for other parameters that isn't already in schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'submit an agency engagement enquiry on behalf of the caller for a founder-led discovery call' with specific verb and resource. Distinguishes from siblings by explicitly naming handoffs.partnership and handoffs.operator in the WHEN NOT TO CALL section.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit WHEN TO CALL (user identifies paid engagement need and asks to talk to team) and WHEN NOT TO CALL (free training/partnerships, support/billing/access, proactive sales push) with alternatives named. Also instructs to confirm with user before firing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

handoffs.operator: Request Operator Handoff (Grade A)

Authenticated — creates a support handoff record when an agent needs human review, account-specific escalation, or operator follow-up that cannot be resolved with the read-only doctrine tools. Persists a SupportHandoff row (reason, topic, page_url, agent_name, agent_platform, trace_summary, user_email) routed to the support inbox; user is contacted by the team. WHEN TO CALL: user explicitly asks for human help, hits a billing/access issue, or the agent has tried the doctrine tools and the user still needs a human. ALWAYS confirm with the user before firing — this creates a human-visible ticket. WHEN NOT TO CALL: proactively, silently, or to log debugging traces (use diagnostic logs instead); for partnerships/agency enquiries (use handoffs.partnership / handoffs.agency); for content questions answerable by principles.search / guides.search. BEHAVIOR: write-only, single insert, side-effecting (creates a ticket the team will see). Auth: Bearer (any plan). UK/EU residency. Response confirms ticket id + topic so the user can reference it.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| topic | No | Topic category for routing (e.g. 'agent', 'billing', 'access', 'general'). | agent |
| locale | No | Response locale for the handoff acknowledgment. | en |
| reason | Yes | Clear description of why a human operator review is needed. | |
| page_url | No | URL of the page or context where the handoff was triggered. | |
| agent_name | No | Name of the agent or client triggering the handoff. | mcp-client |
| trace_summary | No | Optional summary of the agent's recent actions or trace for operator context. | |
| agent_platform | No | Platform or runtime the agent is running on (e.g. 'claude-code', 'cursor', 'copilot'). | |

Output Schema

No output parameters
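
The handoffs.operator parameters come with documented defaults (`topic` = agent, `locale` = en, `agent_name` = mcp-client). One possible way to carry those defaults on the client side is a small dataclass, sketched below. The class name `OperatorHandoffArgs` and the choice to send defaults explicitly while dropping unset optional fields are assumptions, not behavior confirmed by the server.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class OperatorHandoffArgs:
    """Arguments for handoffs.operator; defaults mirror the documented ones."""
    reason: str                         # the only required field
    topic: str = "agent"                # documented default
    locale: str = "en"                  # documented default
    agent_name: str = "mcp-client"      # documented default
    page_url: Optional[str] = None
    trace_summary: Optional[str] = None
    agent_platform: Optional[str] = None

    def to_arguments(self) -> dict:
        """Return a dict of arguments, dropping optional fields left unset."""
        if not self.reason.strip():
            raise ValueError("'reason' is required")
        return {k: v for k, v in asdict(self).items() if v is not None}


if __name__ == "__main__":
    args = OperatorHandoffArgs(
        reason="Cannot access Pro features after upgrade",
        topic="access",
    )
    print(args.to_arguments())
```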

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=false, destructiveHint=false), description adds 'write-only, single insert, side-effecting (creates a ticket the team will see)'. Also mentions auth (Bearer token) and UK/EU residency. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: purpose, when to call, when not to call, behavior. Each sentence adds value. Could be slightly more concise, but appropriate for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all relevant aspects: authentication, residency, side effects, routing, the user confirmation requirement, and the response's confirmation of ticket id + topic. Given the open-world hint and the available output schema, the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so schema already documents all parameters. Description adds context by listing the fields stored in the SupportHandoff row (reason, topic, page_url, etc.) and explaining routing. This adds meaning beyond the schema, though not extensively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'creates a support handoff record' with specific verb and resource. Distinguishes from sibling tools handoffs.partnership and handoffs.agency by explicitly saying when to use them instead for partnerships/agency enquiries. Also contrasts with diagnostic logs and content tools. Purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit WHEN TO CALL (user asks for human help, billing/access issues, tried doctrine tools) and WHEN NOT TO CALL (proactively, silently, logging traces, partnerships/agency). Includes ALWAYS confirm with user. This gives comprehensive guidance for appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

handoffs.partnership: Request Partnership Handoff (Grade A)

Authenticated — creates a partnerships handoff record for design-partner, ecosystem, training, or advisory conversations needing human review. Persists a PartnershipHandoff row routed to the partnerships inbox; the user is contacted by the team. WHEN TO CALL: user explicitly wants to engage as a design partner, co-marketing/training partner, or evaluate the Blueprint for their org's training programme. ALWAYS confirm with the user before firing — this creates a human-visible partnerships ticket. WHEN NOT TO CALL: for general support / billing / access issues (use handoffs.operator); for paid-engagement enquiries (use handoffs.agency); proactively or as a sales prompt — only when the user has explicitly asked. BEHAVIOR: write-only, single insert, side-effecting (creates a ticket). Auth: Bearer (any plan). UK/EU residency. Response confirms the ticket id + audience so the user can reference it.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| role | No | Role or title of the person submitting the partnership inquiry. | |
| topic | No | Partnership topic category. | ecosystem |
| locale | No | Response locale for the handoff acknowledgment. | en |
| reason | Yes | Clear description of the partnership opportunity or inquiry. | |
| website | No | Website of the organization for additional context. | |
| agent_name | No | Name of the agent or client triggering the handoff. | mcp-client |
| organization | No | Name of the organization or company making the partnership inquiry. | |
| trace_summary | No | Optional agent trace summary for operator context. | |
| agent_platform | No | Platform or runtime the agent is running on. | |

Output Schema

No output parameters
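
The handoffs.partnership description separates three handoff tools by user intent and forbids proactive calls. A minimal routing sketch under those rules follows. The intent labels and the function name `pick_handoff_tool` are hypothetical; a real client would need its own way to classify what the user asked for.

```python
# Hypothetical intent labels mapped to the tool each description points to.
HANDOFF_ROUTES = {
    "paid_engagement": "handoffs.agency",
    "support": "handoffs.operator",
    "billing": "handoffs.operator",
    "access": "handoffs.operator",
    "design_partner": "handoffs.partnership",
    "training_partner": "handoffs.partnership",
}


def pick_handoff_tool(intent, user_explicitly_asked):
    """Return the handoff tool name for an intent, or None when no handoff should fire."""
    if not user_explicitly_asked:
        # The descriptions rule out proactive or sales-driven handoffs.
        return None
    # Unrecognised intents also yield None rather than a guessed tool.
    return HANDOFF_ROUTES.get(intent)


if __name__ == "__main__":
    print(pick_handoff_tool("billing", user_explicitly_asked=True))
    print(pick_handoff_tool("design_partner", user_explicitly_asked=False))
```

Returning `None` instead of a fallback tool keeps the agent from creating a human-visible ticket when the intent is unclear.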

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses write-only, single-insert, side-effecting behavior along with authentication, residency, and response content, adding context beyond the annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections, but slightly verbose. Every sentence is valuable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all aspects: purpose, usage, behavior, auth, residency, and response. Together with the existing structured fields, this is complete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are already documented. Description does not add new meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb (creates) and resource (partnership handoff record) and distinguishes from siblings by naming alternative tools for different scenarios.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit WHEN TO CALL and WHEN NOT TO CALL sections, including specific sibling tools (handoffs.operator, handoffs.agency) and instruction to confirm with user.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

me.add_evidence: Add Evidence Note (Grade A)

Authenticated — append a free-text evidence note to a specific stage in the caller's active course. Notes record concrete implementation observations, decisions, or artefacts that demonstrate progress through a Blueprint principle (e.g. how a delegation boundary was implemented, what approval flow was chosen and why). Persisted as UserStageEvidence rows scoped to (user_id, course_slug, stage_slug). WHEN TO CALL: AFTER the user has articulated something concrete they have built, observed, or decided — not to capture intent or speculation. Pair with me.coaching_context to close evidence gaps. WHEN NOT TO CALL: to log every conversation turn; to record planning, ideas, or todos; on behalf of another user; without the user's awareness (they should know their progress is being recorded). BEHAVIOR: write-only, single insert. Auth: Bearer (Firebase ID token, any plan). UK/EU residency. Notes are visible only to the owning user and are surfaced on me.learning_path / me.coaching_context. Confirms the stage_slug + course_slug pair in the response so the user can see which stage was credited.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| note | Yes | Evidence note to append to the delegation boundary notes for this stage. | |
| stage_id | Yes | ID of the stage to append the evidence note to. | |
| course_slug | Yes | Slug of the course the stage belongs to (e.g. 'agentic-fundamentals'). | |

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false. The description adds substantial behavioral context: 'write-only, single insert', authentication requirements (Bearer token, any plan), residency constraint (UK/EU), visibility (only to owning user), and response confirmation. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with section headers (Authenticated, WHEN TO CALL, etc.) and front-loads the core action. While slightly verbose, every sentence adds value, and the format aids comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers auth, residency, data visibility, pairing with other tools, and response behavior. An output schema exists (confirmed by context), and the description complements it by explaining what the response confirms (stage_slug+course_slug). For a single-insert tool, this is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers all 3 parameters with descriptions, achieving 100% coverage. The tool description does not add significant new meaning beyond the schema, as the description for the 'note' parameter is identical. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states specifically that the tool appends a free-text evidence note to a specific stage in the caller's active course. It uses a specific verb ('append') and resource ('evidence note to a specific stage'), and distinguishes itself from related tools like me.coaching_context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'WHEN TO CALL' and 'WHEN NOT TO CALL' sections, giving clear guidance on appropriate contexts (after concrete actions) and exclusions (not for intent, not for others, not every turn). It also suggests pairing with me.coaching_context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

me.coaching_context: Get My Coaching Context (Grade A)
Read-only, Idempotent

Authenticated — returns stages in the caller's active course where recorded evidence is thin relative to the stage's principle requirements. Each thin stage carries the missing principle slugs + a short diagnostic so the caller can suggest the user record concrete evidence. WHEN TO CALL: when the user asks 'what should I work on next' or 'what's weak in my Blueprint progress'; before suggesting which guide/example to consult. Pair with me.add_evidence to close gaps. WHEN NOT TO CALL: to lecture the user on principles they have already satisfied; on every conversation turn (state changes only when evidence is added). BEHAVIOR: read-only, idempotent. Auth: Bearer (any plan). Returns thin_stages list with stage slug, course slug, missing principles, evidence_count, and a coaching_note.

Parameters

No parameters

Output Schema

No output parameters
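
To make the "thin stages" idea in the me.coaching_context description more concrete, here is a small sketch that orders a response by evidence count and turns each entry into a suggestion. The description names `thin_stages`, `evidence_count`, and `coaching_note`; the other key names used below (`stage_slug`, `missing_principles`) are guesses based on the prose, not a confirmed schema, and the sample data is invented.

```python
def rank_thin_stages(response):
    """Order thin stages so the least-evidenced stage comes first.

    Assumes a response dict shaped like {"thin_stages": [...]}.
    """
    stages = response.get("thin_stages", [])
    return sorted(stages, key=lambda s: s.get("evidence_count", 0))


def suggest_next_evidence(response):
    """Build one human-readable prompt per thin stage."""
    suggestions = []
    for stage in rank_thin_stages(response):
        missing = ", ".join(stage.get("missing_principles", [])) or "unspecified principles"
        suggestions.append(
            f"{stage.get('stage_slug', '?')}: record evidence for {missing} "
            f"({stage.get('evidence_count', 0)} note(s) so far)"
        )
    return suggestions


if __name__ == "__main__":
    sample = {  # invented sample data
        "thin_stages": [
            {"stage_slug": "stage-a", "missing_principles": ["p1"], "evidence_count": 2},
            {"stage_slug": "stage-b", "missing_principles": ["p2", "p3"], "evidence_count": 0},
        ]
    }
    for line in suggest_next_evidence(sample):
        print(line)
```

Because the description says state changes only when evidence is added, a client could reuse the last response until the next me.add_evidence call rather than refetching every turn.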

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description states 'read-only, idempotent' which matches annotations, and adds auth details ('Bearer <token>') and return structure (thin_stages list with fields). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with labeled sections like 'WHEN TO CALL' and 'BEHAVIOR'. It is front-loaded with the core purpose and every sentence adds value without unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and existence of output schema, the description fully covers behavior, authentication, and return fields (thin_stages with slug, missing principles, etc.). No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so baseline is 4. The description implicitly confirms no arguments are needed. Schema coverage is 100% trivially, so no additional meaning required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns stages with thin evidence relative to principle requirements. It uses specific verbs ('returns') and resource ('stages in active course'), and distinguishes from siblings like me.add_evidence by noting pairing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides 'WHEN TO CALL' (e.g., user asks 'what should I work on next') and 'WHEN NOT TO CALL' (e.g., every turn, on satisfied principles), plus suggests pairing with me.add_evidence. This is exemplary guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

me.learning_path: Get My Learning Path (Grade A)
Read-only, Idempotent

Authenticated — returns the caller's Blueprint learning-path state: current course slug, stage progress, certification status (Foundation, Practitioner, Capstone), Capstone track eligibility flags, and the next recommended stage. WHEN TO CALL: the user asks 'where am I', 'what's next', or 'am I Capstone-eligible'; before suggesting next-step coaching content. WHEN NOT TO CALL: as a heartbeat (state changes only when the user completes a stage); to read another user's progress. BEHAVIOR: read-only, idempotent. Auth: Bearer (any plan, including basic). Returns user_email, course_slug, stages list with completion timestamps, certification block, and a next_stage hint.

Parameters

No parameters

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint; description reinforces with 'read-only, idempotent' and adds auth details (Bearer token, any plan). No contradiction; fully discloses behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (authenticated, returns, when to call, when not, behavior). Concise yet informative; every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a zero-parameter, read-only tool with full annotation coverage and an output schema. Covers auth, use cases, and behavioral notes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters (0 params, schema coverage 100%), so no parameter info is needed. Baseline 4 is appropriate; the description also explains what the tool returns, although the output schema already covers this.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb 'returns' and names the exact resource 'caller's Blueprint learning-path state', listing key fields (course slug, stage progress, certification status). It clearly distinguishes from sibling tools like me.coaching_context and me.validation_history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states WHEN TO CALL (user asks 'where am I', 'what's next', or 'am I Capstone-eligible') and WHEN NOT TO CALL (as heartbeat, to read another user's progress). Provides actionable guidance for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

me.validation_history: My Architect Agent Validation History (Grade A)
Read-only, Idempotent

Pro/Teams — return the authenticated user's architect.validate run history with the Blueprint Readiness Score (0-100), letter grade (A-F), and tier (draft, emerging, production_ready). Three lookup modes: (1) run_id=<id> returns a SINGLE run with the full persisted result_json — use this to RECOVER a result when your MCP client tool-call timed out before architect.validate returned. The run completes server-side and persists; the run_id is surfaced in the first progress notification of every architect.validate call so you have the recovery handle even when your client gives up early. (2) repository=<name> returns the full per-run trend for that repository plus a regression diff between the latest two runs. (3) No arguments returns one summary per repository the user has validated, sorted by most recent. Use modes (2) or (3) BEFORE calling architect.validate again on the same repository — they tell you which principles regressed since the last run, so you can focus the new review on what is actually changing. Auth: Bearer . Pro or Teams plan required.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of runs to return when scoped to a single repository. Capped at 50. Ignored when `run_id` is provided. | |
| run_id | No | Single-run lookup by run_id (UUID). Returns the persisted result_json verbatim, the same payload architect.validate would have returned if your client hadn't timed out. Use this to recover a result when your MCP tool-call closed before the server returned. Per-run authorisation: returns only runs owned by the calling user. | |
| repository | No | Repository name or path to scope the history to. Pass the same value you would pass to architect.validate. Omit to get one summary per repository. Mutually exclusive with `run_id`; if both are passed, `run_id` wins. | |

Output Schema

No output parameters
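
The three lookup modes of me.validation_history, and the rule that `run_id` wins over `repository`, can be sketched as a small argument selector. The function name `history_arguments` is hypothetical, and clamping `limit` on the client (including the lower bound of 1) is an assumption; the source only says the value is capped at 50.

```python
MAX_LIMIT = 50  # "Capped at 50" per the limit parameter's description


def history_arguments(run_id=None, repository=None, limit=None):
    """Pick arguments for one of the three documented lookup modes."""
    if run_id:
        # Mode 1: single-run recovery. run_id wins and limit is ignored.
        return {"run_id": run_id}
    if repository:
        # Mode 2: per-repository trend with an optional limit.
        args = {"repository": repository}
        if limit is not None:
            args["limit"] = max(1, min(int(limit), MAX_LIMIT))
        return args
    # Mode 3: no arguments, one summary per repository.
    return {}


if __name__ == "__main__":
    print(history_arguments(run_id="example-run-id", repository="my-repo"))
    print(history_arguments(repository="my-repo", limit=500))
    print(history_arguments())
```

Resolving the mode on the client makes the precedence explicit instead of relying on the server to discard conflicting arguments.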

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds valuable context about recovering from timeouts, progress notifications, server-side persistence, and per-run authorization. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly long (about 150 words) but well-structured with numbered modes and front-loaded with key info. Every sentence adds value, though slight conciseness improvements could be made.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (three modes, recovery mechanism, output schema exists), the description covers everything needed: three modes, how to recover timed-out calls, trend/regression analysis, and auth requirements.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description elaborates on the purpose of each mode and how they work, especially the run_id recovery mechanism, adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns the authenticated user's architect.validate run history with scores, grades, tiers, and three lookup modes. It distinguishes from sibling tools like architect.validate (which runs validation) and others, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use each mode, including guidance to use modes (2) or (3) before calling architect.validate again to see regressions. It also mentions auth and plan requirements. However, it does not explicitly state when NOT to use this tool, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

principles.get: Get Principle (Grade A)
Read-only, Idempotent

Get one Blueprint principle by stable slug. Returns id, title, cluster, definition, rationale, risk-if-violated, implementation heuristics, and linked example slugs (which examples.get can hydrate). Use this when you already have the exact slug from principles.list or principles.search; prefer principles.search when the user describes a topic or failure mode in natural language; prefer principles.list when you need every principle or every principle within a cluster. Returns error_payload on unknown slug.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| slug | Yes | Stable slug of the principle (e.g. 'establish-trust-through-inspectability'). | |

Output Schema

No output parameters
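
The principles.get description tells an agent when to prefer each of the three principles tools. One way to encode that guidance is sketched below. The function name `pick_principles_tool` is hypothetical, and the `query` argument name for principles.search is an assumption, since that tool's schema is not shown in this part of the page.

```python
def pick_principles_tool(slug=None, cluster=None, query=None):
    """Return (tool_name, arguments) following the guidance in the description."""
    if slug:
        # An exact slug is already known: fetch full detail.
        return "principles.get", {"slug": slug}
    if query:
        # Natural-language topic or failure mode. "query" is an assumed name.
        return "principles.search", {"query": query}
    if cluster:
        # Every principle within one cluster.
        return "principles.list", {"cluster": cluster}
    # Full inventory.
    return "principles.list", {}


if __name__ == "__main__":
    print(pick_principles_tool(slug="establish-trust-through-inspectability"))
    print(pick_principles_tool(query="agent acted without approval"))
    print(pick_principles_tool(cluster="delegation"))
```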

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds the behavioral note that it returns error_payload on unknown slug, which is valuable context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured: first sentence states purpose and output, second provides usage guidance, third handles error case. No unnecessary words, and key info is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get-by-slug tool with a single parameter, the description is complete. It lists fields returned, addresses error handling, and with an output schema present, return value details are covered. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with one required 'slug' parameter. The description adds meaning by providing an example ('establish-trust-through-inspectability') and clarifying it's a 'stable slug', which goes beyond the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get one Blueprint principle by stable slug.' It lists the specific fields returned and differentiates itself from sibling tools like principles.list and principles.search, making it unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('when you already have the exact slug') and when to prefer alternatives (principles.search for natural language, principles.list for all or cluster-filtered). It also notes the error case for unknown slugs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

principles.list: List Principles (A)
Read-only, Idempotent

List all 10 Blueprint principles with stable slugs, titles, and clusters. Use this when you need the full inventory or want every principle in one cluster (pass cluster slug to filter). Prefer principles.search when the user describes a topic, failure mode, or keyword in natural language. Prefer principles.get when you already know the exact slug and need full detail.

Parameters
Name | Required | Description | Default
cluster | No | Cluster slug to filter by (e.g. 'delegation', 'visibility', 'trust', 'orchestration'). Omit to return all principles. |
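
The selection guidance shared by the three principles tools can be sketched as a small router. This is an illustration of the stated rules only; the function name `choose_principles_tool` and its argument precedence (slug, then query, then list) are assumptions, not server behavior.

```python
from typing import Optional, Tuple


def choose_principles_tool(
    slug: Optional[str] = None,
    query: Optional[str] = None,
    cluster: Optional[str] = None,
) -> Tuple[str, dict]:
    """Pick a principles.* tool and its arguments from what the caller knows."""
    if slug:
        # Exact slug already known: fetch full detail.
        return "principles.get", {"slug": slug}
    if query:
        # Natural-language topic, failure mode, or keyword.
        return "principles.search", {"query": query}
    # Full inventory, optionally filtered to one cluster.
    return "principles.list", ({"cluster": cluster} if cluster else {})


print(choose_principles_tool(cluster="trust"))
```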

Output Schema

No output parameters

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds that results include stable slugs, titles, clusters and filtering behavior, which is useful beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences: the first states purpose, the rest give usage guidance. No extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Single optional parameter with full schema description, output schema exists, and description covers all needed context for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description provides concrete examples of cluster slugs (e.g., 'delegation', 'visibility') and clarifies behavior when the parameter is omitted.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'List all 10 Blueprint principles' with specific verb and resource. Distinguishes from sibling tools by naming principles.search and principles.get.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool (full inventory, filter by cluster) and when to prefer alternatives (principles.search for natural language, principles.get for known slug).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

principles.search: Search Principles (A)
Read-only, Idempotent

Search Blueprint principles by free-text query and return the closest matches ranked by relevance. Use this to find principles related to a specific design challenge, failure mode, or keyword (e.g. 'reversibility', 'approval flow', 'delegation boundary'). Returns principle title, cluster, definition, rationale, and implementation heuristics. Prefer this over principles.list when you have a specific topic in mind rather than wanting all principles.

Parameters
Name | Required | Description | Default
limit | No | Maximum number of results to return. Capped at server maximum. |
query | Yes | Free-text search query matched against principle title, definition, rationale, and cluster. |
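
A short sketch of assembling the arguments for this tool. The server maximum for `limit` is not stated on this page, so the helper below (a hypothetical name, `build_search_arguments`) only checks that `limit` is positive and otherwise leaves capping to the server.

```python
from typing import Optional


def build_search_arguments(query: str, limit: Optional[int] = None) -> dict:
    """Assemble arguments for principles.search."""
    query = query.strip()
    if not query:
        raise ValueError("query is required")
    arguments: dict = {"query": query}
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        # The server applies its own cap; no client-side maximum is assumed.
        arguments["limit"] = limit
    return arguments


print(build_search_arguments("delegation boundary", limit=3))
```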

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds that it returns principle title, cluster, definition, rationale, and implementation heuristics, and results are ranked by relevance, providing full behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no fluff. Front-loaded with action ('Search Blueprint principles'), then usage guidance, then returned fields. Every sentence is meaningful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return type is covered. Description explains what fields are returned and the ranking approach. For a search tool with good annotations and output schema, this is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description adds meaning by explaining that query is matched against title, definition, rationale, and cluster, and that limit sets the maximum number of results, capped at the server maximum. This adds value beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool searches Blueprint principles by free-text query, returns closest matches ranked by relevance. It distinguishes from the sibling 'principles.list' by specifying this is for specific topics rather than listing all.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool (when you have a specific topic, e.g., 'reversibility', 'approval flow') and when to prefer alternatives (use 'principles.list' for all principles). Provides concrete examples.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

signals.feedback: Submit Feedback (A)

Public — records explicit free-text user feedback about the Blueprint, this tool surface, or a specific principle/example. Captures category (bug, doctrine_critique, missing_example, ergonomics, other), free-text body, and optional contact_email when permission_to_follow_up is true. WHEN TO CALL: ONLY when the user explicitly says they want to give feedback (e.g. 'can you log this as feedback', 'file this critique', 'send a bug report'). Use signals.report instead for value-moment metrics (rating validate's output 1-5). WHEN NOT TO CALL: proactively, silently, or to substitute for signals.report. Never harvest contact info without explicit permission_to_follow_up=true. BEHAVIOR: write-only, no auth required (open to all callers), single insert into UserFeedback. UK/EU residency. contact_email is stored ONLY when permission_to_follow_up=true, and that fact is confirmed back in the response so the user can see the privacy boundary.

Parameters
Name | Required | Description | Default
surface | No | Which Blueprint surface the feedback is about. Use 'mcp' if the session was via Claude Code or another MCP client. Use 'principles', 'examples', 'guides', 'coaching', or 'validation' based on what the user interacted with. |
task_type | No | What the user was doing when they decided to give feedback. Use plain English, e.g. 'code-review', 'architecture-design', 'agent-setup', 'onboarding', 'validation'. Infer from context. |
what_helped | No | Ask the user: 'What was most helpful?' Record their answer verbatim or paraphrased in plain English. Max 1000 chars. No code snippets, no proprietary content. |
what_missing | No | Ask the user: 'What was missing or could be improved?' Record their answer verbatim or paraphrased. Max 1000 chars. |
contact_email | No | Only ask for this if the user explicitly says they want a follow-up response. Never prompt for email unprompted. Only stored when permission_to_follow_up=true. |
rating_clarity | No | Ask the user: 'How clear was the Blueprint guidance? Rate 1–5.' 1 = very unclear, 5 = very clear. Only set if the user gives an explicit number. |
would_use_again | No | Ask the user: 'Would you use the Blueprint again for a similar task?' Set true/false based on their answer. Only set if they answer explicitly. |
rating_usefulness | No | Ask the user: 'How useful was the Blueprint for this task? Rate 1–5.' 1 = not useful, 5 = very useful. Only set if the user gives an explicit number. |
permission_to_follow_up | No | Set to true only if the user explicitly said they want a follow-up. Must be confirmed before storing contact_email. |
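
The privacy rule around contact_email can be illustrated with a small client-side guard. This is a sketch under assumptions: the helper name `build_feedback_arguments` is hypothetical, only a subset of the parameters is shown, and truncating free text to the stated 1000-character limit is one possible choice (rejecting over-long text would be another).

```python
from typing import Optional

MAX_TEXT_CHARS = 1000  # limit stated for what_helped / what_missing


def build_feedback_arguments(
    what_helped: Optional[str] = None,
    what_missing: Optional[str] = None,
    rating_usefulness: Optional[int] = None,
    contact_email: Optional[str] = None,
    permission_to_follow_up: bool = False,
) -> dict:
    """Assemble signals.feedback arguments, honoring the follow-up permission."""
    arguments: dict = {}
    if what_helped:
        arguments["what_helped"] = what_helped[:MAX_TEXT_CHARS]
    if what_missing:
        arguments["what_missing"] = what_missing[:MAX_TEXT_CHARS]
    if rating_usefulness is not None:
        if not 1 <= rating_usefulness <= 5:
            raise ValueError("rating_usefulness must be between 1 and 5")
        arguments["rating_usefulness"] = rating_usefulness
    if permission_to_follow_up:
        arguments["permission_to_follow_up"] = True
        if contact_email:
            arguments["contact_email"] = contact_email
    # Without explicit permission, contact_email is never sent at all.
    return arguments


no_permission = build_feedback_arguments(
    what_helped="x" * 1200, contact_email="user@example.com"
)
print(sorted(no_permission))
```

Dropping the email on the client mirrors the server rule that contact_email is stored only when permission_to_follow_up=true, so the address never leaves the session without consent.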

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description reveals write-only nature, no auth required, single insert into UserFeedback, UK/EU residency, and privacy boundary for contact_email. Annotations provide readOnlyHint=false and destructiveHint=false, but description adds critical behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph that front-loads purpose and usage guidelines. Every sentence adds value, though slightly verbose. Could be more concise but well-structured overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 optional parameters and no required fields, the description comprehensively covers usage context, when to call, parameter semantics, privacy considerations, and behavioral constraints. Output schema existence is mentioned in behavioral transparency, providing complete guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good parameter descriptions. The description adds value by providing behavioral guidance on when to ask for each parameter (e.g., contact_email only when permission_to_follow_up is true) and ethical constraints (never harvest contact info without explicit permission).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool records explicit free-text user feedback about the Blueprint, specifying the categories and data captured. It distinguishes itself from the sibling tool 'signals.report' by noting that the sibling is for value-moment metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call: 'ONLY when the user explicitly says they want to give feedback' with examples. Also states when not to call: proactively, silently, or as substitute for signals.report, providing clear alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

signals.report: Report Value Event (A)

Pro/Teams: records a value moment (review_confidence, runtime_risk_found, regression_caught, recommendation_taken) after a successful architect.validate or design session. Each event captures event_type, surface_used (mcp/web/cli), perceived_value (1-5), and an optional brief_context. Structured fields only, NO prompts or code stored.

WHEN TO CALL: after architect.validate returns a clearly useful result AND the user has acknowledged the value (or you ask them "would you rate this 1-5?"). Validate's response carries an explicit next_step instruction telling the agent to OFFER this call; surface that offer to the user.

WHEN NOT TO CALL: silently or without the user's awareness; on every validate (only after a clear value moment); to capture intent or speculative value. If the user declines, do not retry within the same session.

BEHAVIOR: write-only, single insert into ValueEvent. Auth: Bearer <token>, Pro or Teams plan required. UK/EU residency. Do NOT include proprietary code, prompt content, or PII in brief_context, because it surfaces in admin AI-visibility dashboards. Expect a 1-line acknowledgment in the response; the structured feedback is then aggregated server-side.

Parameters
Name | Required | Description | Default
team_size | No | If the user mentions their team size during the session, record it here. Do not ask for it explicitly; only capture if volunteered. |
event_type | Yes | Pick the type that best matches what just happened: 'review_confidence' (architect.validate returned aligned); 'runtime_risk_found' (architect.validate found violations); 'workflow_clarity' (principles/examples clarified a design decision); 'agent_setup_success' (user successfully wired up an agent or MCP tool); 'onboarding_helped' (user understood how to start using the Blueprint); 'research_time_saved' (user found relevant doctrine faster than expected); 'team_alignment' (Blueprint helped align a team on agentic design); 'other' (use only if none of the above fit). |
surface_used | No | Where the value was experienced. Use 'mcp' when called from Claude Code, Cursor, Windsurf, or any MCP client. Use 'principles' if the user was browsing or searching principles. Use 'examples' if the user was reading implementation examples. Use 'for-agents' if the user came via the /for-agents page. Use 'learn' or 'certification' for course-related sessions. |
brief_context | No | 1–2 plain-English sentences summarising what was helpful. Example: 'Validation identified a missing approval gate before email send.' No code snippets, no proprietary content, no user PII. Max 500 chars. |
workflow_stage | No | Infer from what the user was doing: 'exploring' (reading doctrine, browsing principles); 'designing' (planning architecture or agent flows); 'implementing' (writing or refactoring code); 'reviewing' (running architect.validate on existing code); 'shipping' (preparing for production or deployment). |
perceived_value | No | Ask the user: 'On a scale of 1–5, how valuable was this session?' Map their answer directly: 1=low, 5=high. Do not guess; only set this if the user gave an explicit score. |
would_recommend | No | Ask the user: 'Would you recommend the Blueprint to a colleague?' Set true/false based on their answer. Only set if asked; do not assume. |
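
A minimal sketch of validating a value event before sending it. The allowed event types are taken from the event_type parameter description (the tool description names a slightly different set, so which list the server enforces is an assumption here). The helper name `build_report_arguments` is hypothetical, and rejecting over-long context rather than truncating it is a design choice, not documented behavior.

```python
from typing import Optional

# Values listed in the event_type parameter description.
EVENT_TYPES = {
    "review_confidence",
    "runtime_risk_found",
    "workflow_clarity",
    "agent_setup_success",
    "onboarding_helped",
    "research_time_saved",
    "team_alignment",
    "other",
}
MAX_CONTEXT_CHARS = 500  # limit stated for brief_context


def build_report_arguments(
    event_type: str,
    perceived_value: Optional[int] = None,
    brief_context: Optional[str] = None,
) -> dict:
    """Assemble signals.report arguments from explicit user input only."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type!r}")
    arguments: dict = {"event_type": event_type}
    if perceived_value is not None:
        # Only set when the user gave an explicit 1-5 score.
        if not 1 <= perceived_value <= 5:
            raise ValueError("perceived_value must be between 1 and 5")
        arguments["perceived_value"] = perceived_value
    if brief_context:
        if len(brief_context) > MAX_CONTEXT_CHARS:
            raise ValueError("brief_context exceeds 500 characters")
        arguments["brief_context"] = brief_context
    return arguments


event = build_report_arguments(
    "runtime_risk_found",
    perceived_value=4,
    brief_context="Validation identified a missing approval gate before email send.",
)
print(event)
```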

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavioral traits beyond annotations: 'write-only, single insert into ValueEvent', authentication requirements ('Bearer <token>, Pro or Teams plan required'), and data residency constraints ('UK/EU residency'). It also specifies what not to include in brief_context, which is crucial for compliance. Annotations only indicate non-read-only, non-idempotent, non-destructive; the description adds substantive context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (WHEN TO CALL, WHEN NOT TO CALL, BEHAVIOR) and uses bold headers for emphasis. It is relatively long but every sentence adds necessary context. Minor improvement could be to condense some repetitive points, but overall it is efficiently organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters with 100% schema coverage and the presence of an output schema, the description fully explains the tool's context. It covers prerequisites (Pro/Teams), authentication, data residency, ethical use (not to call silently), and a clear workflow linkage to architect.validate. The description leaves no obvious gaps for an agent to misinterpret.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, meaning every parameter already has a description in the schema. However, the description adds significant value by providing usage guidance for parameters like perceived_value ('Ask the user...') and would_recommend ('Ask the user...'). It also clarifies the meaning of fields like brief_context with examples and restrictions, going beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'records a value moment ... after a successful architect.validate or design session.' It lists specific event types and explicitly distinguishes from siblings by indicating the tool is for post-session value reporting, unlike other tools like signals.feedback or architect.validate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-call ('after architect.validate returns a clearly useful result AND the user has acknowledged the value') and when-not-to-call ('silently or without the user's awareness; on every validate'). Also includes a note about not retrying if declined. The description references architect.validate's next_step instruction, giving clear context for invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

team.summarize: Summarize Team Usage (A)
Read-only, Idempotent

Pro/Teams: summarises the caller's tool-usage patterns and value signals over a configurable window (default 30 days). Returns tool_call_counts, top principles cited in validate runs, value_event_counts by event_type, and an aggregate readiness trend.

WHEN TO CALL: the user asks 'how is the Blueprint helping me/my team', 'what should I explore next', or 'show me my Blueprint usage'.

WHEN NOT TO CALL: proactively or on every conversation turn (the summary is an explicit retrospective, not telemetry); to compare users (returns only the caller's own data).

BEHAVIOR: read-only, idempotent over the same window. Aggregates from AIToolCallLog + ValueEvent + AIValidationRunLog. Pass private_session=true to bypass server-side logging for this summary call (the underlying historical data still exists; only this read is untracked). Auth: Bearer <token>, Pro or Teams plan. UK/EU residency.

Parameters
Name | Required | Description | Default
days_back | No | Number of days of usage history to include in the summary. |
private_session | No | Set to true to skip logging this summary call. |
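
A sketch of an authenticated call to this tool, assuming the standard MCP JSON-RPC `tools/call` method over HTTP with the token sent in an `Authorization: Bearer` header. The helper name `build_summarize_call` and the token value are placeholders; no request is actually sent.

```python
from typing import Optional, Tuple


def build_summarize_call(
    token: str,
    days_back: Optional[int] = None,
    private_session: bool = False,
) -> Tuple[dict, dict]:
    """Return (headers, body) for a team.summarize call."""
    if not token:
        raise ValueError("a bearer token is required (Pro or Teams plan)")
    arguments: dict = {}
    if days_back is not None:
        if days_back < 1:
            raise ValueError("days_back must be a positive integer")
        arguments["days_back"] = days_back
    # Omitting days_back leaves the window at the server default (30 days).
    if private_session:
        # Skips logging of this read only; historical data is unaffected.
        arguments["private_session"] = True
    headers = {"Authorization": f"Bearer {token}"}
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "team.summarize", "arguments": arguments},
    }
    return headers, body


headers, body = build_summarize_call("example-token", days_back=7, private_session=True)
print(headers, body)
```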

Output Schema

No output parameters

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds value by detailing read-only and idempotent behavior, data sources, and private_session behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: summary, when to call/not call, behavior, auth, residency. Concise but thorough. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists, description covers return fields adequately. Mentions data sources and constraints (plan, residency). No gaps for this complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds context: explains default window (30 days) aligns with days_back, and describes private_session flag's purpose (bypass logging, not delete data). Provides meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it 'summarises the caller's tool-usage patterns and value signals over a configurable window'. It lists returned fields and distinguishes from sibling tools (no other sibling provides usage summary).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit WHEN TO CALL and WHEN NOT TO CALL sections with example user queries and clear exclusions (proactive calls, comparing users). Provides definitive context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Resources