Skip to main content
Glama
205,114 tools. Last updated 2026-06-16 04:11

"namespace:io.github.elestirelbilinc-sketch" matching MCP tools:

  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. The stubs cover IMPORTED dependencies only; the certified code's own enforcement branches (approval gates, policy checks, recovery paths) must be present in full. A `# ...` placeholder reads as an ABSENT control and is graded against you, not as shorthand for one that exists. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. The stubs cover IMPORTED dependencies only; the certified code's own enforcement branches (approval gates, policy checks, recovery paths) must be present in full. A `# ...` placeholder reads as an ABSENT control and is graded against you, not as shorthand for one that exists. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. The stubs cover IMPORTED dependencies only; the certified code's own enforcement branches (approval gates, policy checks, recovery paths) must be present in full. A `# ...` placeholder reads as an ABSENT control and is graded against you, not as shorthand for one that exists. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical); MCP clients commonly close the call before the server returns. Retrying re-runs the 60-180s LLM call from scratch and burns compute. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s (before the LLM call begins) — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted result; the server-side run completes independently within a 20-minute budget. Edge case: if the transport dropped before the first progress notification (very rare; sub-second window), call `me.validation_history(repository='<same value you passed here>')` to find your most recent run. TASK-AUGMENTED INVOCATION (MCP 2025-11-25, SEP-1686): clients that advertise the `tasks` capability can task-augment this call by including `task: {ttl: <ms>}` inside the JSON-RPC request's `params` (NOT as a tool argument; alongside `arguments`, `_meta`, etc.). The server returns a `CreateTaskResult` immediately (taskId equals the run_id above) and runs the validation in the background. Spec-correct long-running pattern: poll via `tasks/get` for state, fetch the terminal payload via `tasks/result`, listen for `notifications/tasks/status` for push updates, and cancel via `tasks/cancel`. `_meta.progressToken` from the original request stays valid for the entire task lifetime. Sync (non-augmented) calls behave exactly as before, backwards-compatible by construction. The me.validation_history(run_id=...) recovery path remains the canonical recovery handle for clients that don't yet advertise the tasks capability. Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer <token>, Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as `implementation_context` (NO truncation, NO `...` placeholders, NO comment removal — the architect treats your `...` as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="<name>" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. PAYLOAD COMPLETENESS (load-bearing if you intend to architect.certify this run): the validate first-pass is permissive — it scores on doctrine alignment + structural patterns visible in the submitted code. Cert's adversarial second-pass is rigorous — it scores on cert-payload-completeness as well as code correctness. A run that scores 100/A at validate can cert-reject pre-LLM with `payload_incomplete` when imported modules' surfaces aren't visible. To validate with INTENT TO CERT, also bundle verbatim public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods the code references; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — the module would NameError on import as submitted). 'Submit Like Production': the payload should be the code as it would actually run. TWO COMPLETENESS AXES. (1) IMPORTS: stub the public surface of every dependency (above). (2) ENFORCEMENT BRANCHES: the code under cert itself (approval gates, policy checks, recovery paths) must be the REAL logic, fully written. A placeholder body (`# ... execute approved action ...`, `pass # TODO`, a bare `...`) is graded as a MISSING control, not shorthand; cert scores what would actually run. Never sketch the agent you are certifying. Empirically reconfirmed PR #157 iter8 → iter9 cert downgrades. SCORE VARIANCE DISCLOSURE (anomaly #10 — empirically documented): validate scores are POINT ESTIMATES with an observed empirical variance band of ~20-67 pts on BYTE-IDENTICAL input. Runs against the same repository, same code, same deterministic seed (the seed is derived from input — same input → same seed) can produce materially different scores AND different top-blocker rankings, because OpenAI's reasoning models at reasoning_effort=high are not strictly deterministic even with the seed parameter pinned. The `reproducibility_mode='best_effort'` field on every response is the platform's honest disclosure of this property. For decisions where stability matters more than speed, call `architect.validate_consensus` (N=3-5 aggregated, median verdict + per-principle stability metrics) instead — collapses the variance, surfaces unstable principles explicitly. A single validate run is a single roll; consensus is the right tool when one score isn't enough. ITERATION LOOP — repository keying. Pass the SAME `repository` value across calls to chain iteration rounds; the validator auto-resolves the most recent prior run on (user, repository, scope) as `prior_run_baseline` and the LLM grades the new submission with iteration context (per-principle severity deltas surface in the response). Changing the `repository` string between calls — even subtly with an `iter-2` suffix — silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in `task` or commit messages, never in `repository`. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. VERIFICATION LAYERS (the two-layer doctrine this platform practices on itself): validate verifies DOCTRINE ALIGNMENT against the 10-principle Blueprint — design patterns, hand-off explicitness, operational-state inspectability, race/blocker handling at the architectural level. validate does NOT guarantee runtime correctness. cert verifies PAYLOAD COMPLETENESS and runs an adversarial second pass over the submitted code — catches production_blockers the first pass missed, name-errors on import, missing module surfaces, etc. cert does NOT verify runtime correctness either. Passing validate is a NECESSARY condition for production_ready, not a sufficient one. Runtime correctness (does this actually execute and behave?) is verified at the THIRD layer — your tests, types, walks. The platform's own recursive-integrity practice: every PR runs validate against its own primitives, then cert. Real bugs surfaced via this practice in PR #157 — NULL-UUID false-positive (iter3) and tie-breaker mismatch (iter5) — that 25 unit tests had missed. Two-layer verification is the discipline, not 'either/or'. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated). See Payload Completeness above for the common pre-cert pitfall.
    Connector
  • Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical); MCP clients commonly close the call before the server returns. Retrying re-runs the 60-180s LLM call from scratch and burns compute. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s (before the LLM call begins) — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted result; the server-side run completes independently within a 20-minute budget. Edge case: if the transport dropped before the first progress notification (very rare; sub-second window), call `me.validation_history(repository='<same value you passed here>')` to find your most recent run. TASK-AUGMENTED INVOCATION (MCP 2025-11-25, SEP-1686): clients that advertise the `tasks` capability can task-augment this call by including `task: {ttl: <ms>}` inside the JSON-RPC request's `params` (NOT as a tool argument; alongside `arguments`, `_meta`, etc.). The server returns a `CreateTaskResult` immediately (taskId equals the run_id above) and runs the validation in the background. Spec-correct long-running pattern: poll via `tasks/get` for state, fetch the terminal payload via `tasks/result`, listen for `notifications/tasks/status` for push updates, and cancel via `tasks/cancel`. `_meta.progressToken` from the original request stays valid for the entire task lifetime. Sync (non-augmented) calls behave exactly as before, backwards-compatible by construction. The me.validation_history(run_id=...) recovery path remains the canonical recovery handle for clients that don't yet advertise the tasks capability. Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer <token>, Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as `implementation_context` (NO truncation, NO `...` placeholders, NO comment removal — the architect treats your `...` as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="<name>" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. PAYLOAD COMPLETENESS (load-bearing if you intend to architect.certify this run): the validate first-pass is permissive — it scores on doctrine alignment + structural patterns visible in the submitted code. Cert's adversarial second-pass is rigorous — it scores on cert-payload-completeness as well as code correctness. A run that scores 100/A at validate can cert-reject pre-LLM with `payload_incomplete` when imported modules' surfaces aren't visible. To validate with INTENT TO CERT, also bundle verbatim public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods the code references; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — the module would NameError on import as submitted). 'Submit Like Production': the payload should be the code as it would actually run. TWO COMPLETENESS AXES. (1) IMPORTS: stub the public surface of every dependency (above). (2) ENFORCEMENT BRANCHES: the code under cert itself (approval gates, policy checks, recovery paths) must be the REAL logic, fully written. A placeholder body (`# ... execute approved action ...`, `pass # TODO`, a bare `...`) is graded as a MISSING control, not shorthand; cert scores what would actually run. Never sketch the agent you are certifying. Empirically reconfirmed PR #157 iter8 → iter9 cert downgrades. SCORE VARIANCE DISCLOSURE (anomaly #10 — empirically documented): validate scores are POINT ESTIMATES with an observed empirical variance band of ~20-67 pts on BYTE-IDENTICAL input. Runs against the same repository, same code, same deterministic seed (the seed is derived from input — same input → same seed) can produce materially different scores AND different top-blocker rankings, because OpenAI's reasoning models at reasoning_effort=high are not strictly deterministic even with the seed parameter pinned. The `reproducibility_mode='best_effort'` field on every response is the platform's honest disclosure of this property. For decisions where stability matters more than speed, call `architect.validate_consensus` (N=3-5 aggregated, median verdict + per-principle stability metrics) instead — collapses the variance, surfaces unstable principles explicitly. A single validate run is a single roll; consensus is the right tool when one score isn't enough. ITERATION LOOP — repository keying. Pass the SAME `repository` value across calls to chain iteration rounds; the validator auto-resolves the most recent prior run on (user, repository, scope) as `prior_run_baseline` and the LLM grades the new submission with iteration context (per-principle severity deltas surface in the response). Changing the `repository` string between calls — even subtly with an `iter-2` suffix — silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in `task` or commit messages, never in `repository`. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. VERIFICATION LAYERS (the two-layer doctrine this platform practices on itself): validate verifies DOCTRINE ALIGNMENT against the 10-principle Blueprint — design patterns, hand-off explicitness, operational-state inspectability, race/blocker handling at the architectural level. validate does NOT guarantee runtime correctness. cert verifies PAYLOAD COMPLETENESS and runs an adversarial second pass over the submitted code — catches production_blockers the first pass missed, name-errors on import, missing module surfaces, etc. cert does NOT verify runtime correctness either. Passing validate is a NECESSARY condition for production_ready, not a sufficient one. Runtime correctness (does this actually execute and behave?) is verified at the THIRD layer — your tests, types, walks. The platform's own recursive-integrity practice: every PR runs validate against its own primitives, then cert. Real bugs surfaced via this practice in PR #157 — NULL-UUID false-positive (iter3) and tie-breaker mismatch (iter5) — that 25 unit tests had missed. Two-layer verification is the discipline, not 'either/or'. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated). See Payload Completeness above for the common pre-cert pitfall.
    Connector
  • Long-poll: blocks until the next edit lands on this board, then returns. WHEN TO CALL THIS: if your MCP client does NOT surface `notifications/resources/updated` events from `resources/subscribe` back to the model (most chat clients do not — they receive the SSE event but don't inject it into your context), this tool is how you 'wait for the human' inside a single turn. Typical flow: you draw / write what you were asked to, then instead of ending your turn you call `wait_for_update(board_id)`. When the human adds, moves, or erases something, the call returns and you refresh with `get_preview` / `get_board` and continue the collaboration. Great for turn-based interactions (games like tic-tac-toe, brainstorming where you respond to each sticky the user drops, sketch-and-feedback loops, etc.). If your client DOES deliver resource notifications natively, prefer `resources/subscribe` — it's cheaper and has no timeout ceiling. BEHAVIOUR: resolves ~3 s after the edit burst settles (same debounce as the push notifications — this is intentional so drags and long strokes collapse into one wake-up). Returns `{ updated: true, timedOut: false }` on a real edit, or `{ updated: false, timedOut: true }` if nothing happened within `timeout_ms`. On timeout, just call it again to keep waiting; chaining calls is cheap. `timeout_ms` is clamped to [1000, 55000]; default 25000 (leaves headroom under typical 60 s proxy timeouts).
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Screens public GitHub repos and PRs to generate risk maps, findings, and merge-readiness signals.

  • GitHub MCP — wraps the GitHub public REST API (no auth required for public endpoints)

  • Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical); MCP clients commonly close the call before the server returns. Retrying re-runs the 60-180s LLM call from scratch and burns compute. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s (before the LLM call begins) — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted result; the server-side run completes independently within a 20-minute budget. Edge case: if the transport dropped before the first progress notification (very rare; sub-second window), call `me.validation_history(repository='<same value you passed here>')` to find your most recent run. TASK-AUGMENTED INVOCATION (MCP 2025-11-25, SEP-1686): clients that advertise the `tasks` capability can task-augment this call by including `task: {ttl: <ms>}` inside the JSON-RPC request's `params` (NOT as a tool argument; alongside `arguments`, `_meta`, etc.). The server returns a `CreateTaskResult` immediately (taskId equals the run_id above) and runs the validation in the background. Spec-correct long-running pattern: poll via `tasks/get` for state, fetch the terminal payload via `tasks/result`, listen for `notifications/tasks/status` for push updates, and cancel via `tasks/cancel`. `_meta.progressToken` from the original request stays valid for the entire task lifetime. Sync (non-augmented) calls behave exactly as before, backwards-compatible by construction. The me.validation_history(run_id=...) recovery path remains the canonical recovery handle for clients that don't yet advertise the tasks capability. Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer <token>, Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as `implementation_context` (NO truncation, NO `...` placeholders, NO comment removal — the architect treats your `...` as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="<name>" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. PAYLOAD COMPLETENESS (load-bearing if you intend to architect.certify this run): the validate first-pass is permissive — it scores on doctrine alignment + structural patterns visible in the submitted code. Cert's adversarial second-pass is rigorous — it scores on cert-payload-completeness as well as code correctness. A run that scores 100/A at validate can cert-reject pre-LLM with `payload_incomplete` when imported modules' surfaces aren't visible. To validate with INTENT TO CERT, also bundle verbatim public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods the code references; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — the module would NameError on import as submitted). 'Submit Like Production': the payload should be the code as it would actually run. TWO COMPLETENESS AXES. (1) IMPORTS: stub the public surface of every dependency (above). (2) ENFORCEMENT BRANCHES: the code under cert itself (approval gates, policy checks, recovery paths) must be the REAL logic, fully written. A placeholder body (`# ... execute approved action ...`, `pass # TODO`, a bare `...`) is graded as a MISSING control, not shorthand; cert scores what would actually run. Never sketch the agent you are certifying. Empirically reconfirmed PR #157 iter8 → iter9 cert downgrades. SCORE VARIANCE DISCLOSURE (anomaly #10 — empirically documented): validate scores are POINT ESTIMATES with an observed empirical variance band of ~20-67 pts on BYTE-IDENTICAL input. Runs against the same repository, same code, same deterministic seed (the seed is derived from input — same input → same seed) can produce materially different scores AND different top-blocker rankings, because OpenAI's reasoning models at reasoning_effort=high are not strictly deterministic even with the seed parameter pinned. The `reproducibility_mode='best_effort'` field on every response is the platform's honest disclosure of this property. For decisions where stability matters more than speed, call `architect.validate_consensus` (N=3-5 aggregated, median verdict + per-principle stability metrics) instead — collapses the variance, surfaces unstable principles explicitly. A single validate run is a single roll; consensus is the right tool when one score isn't enough. ITERATION LOOP — repository keying. Pass the SAME `repository` value across calls to chain iteration rounds; the validator auto-resolves the most recent prior run on (user, repository, scope) as `prior_run_baseline` and the LLM grades the new submission with iteration context (per-principle severity deltas surface in the response). Changing the `repository` string between calls — even subtly with an `iter-2` suffix — silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in `task` or commit messages, never in `repository`. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. VERIFICATION LAYERS (the two-layer doctrine this platform practices on itself): validate verifies DOCTRINE ALIGNMENT against the 10-principle Blueprint — design patterns, hand-off explicitness, operational-state inspectability, race/blocker handling at the architectural level. validate does NOT guarantee runtime correctness. cert verifies PAYLOAD COMPLETENESS and runs an adversarial second pass over the submitted code — catches production_blockers the first pass missed, name-errors on import, missing module surfaces, etc. cert does NOT verify runtime correctness either. Passing validate is a NECESSARY condition for production_ready, not a sufficient one. Runtime correctness (does this actually execute and behave?) is verified at the THIRD layer — your tests, types, walks. The platform's own recursive-integrity practice: every PR runs validate against its own primitives, then cert. Real bugs surfaced via this practice in PR #157 — NULL-UUID false-positive (iter3) and tie-breaker mismatch (iter5) — that 25 unit tests had missed. Two-layer verification is the discipline, not 'either/or'. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated). See Payload Completeness above for the common pre-cert pitfall.
    Connector
  • Use this when you need to author a variable-section sweep along a spine. Insert a `variableSweep(spine, sections, opts?)` declaration into the user's .kcad.ts immediately before the last top-level return. The result is a Shape — chain `.translate(...)`, `.union(...)`, etc. via `add_feature`. `spine_binding` references an existing variable (Curve3D / Sketch / Vec3[]) in the source; each `sections[i].profile_binding` references an existing Sketch. Sections must be strictly increasing in `t` and span [0, 1]; first t=0, last t=1. Orientation is not exposed by this MCP tool until runtime orientation support is wired. Validates every binding exists in the source via regex before inserting (fast structured error vs capture-time stack). Returns the modified code + diagnostics. Side-effect-free.
    Connector
  • Use this when you need a canonical pattern snippet for a CAD task. Search the kernelCAD cookbook for canonical pattern snippets. Returns top-k snippets matching the natural-language query, ranked by BM25 over title/tags/keywords/trigger. Use when you need a canonical pattern for fillet-after-subtract, non-overlapping booleans, sketch-to-extrude flows, etc. Returns empty if no snippet scores above the relevance floor — proceed without cookbook help in that case.
    Connector
  • Use this when you need to read facts about a model. One reader, selected by `of`: - 'assembly' — physical assembly inventory (parts, bboxes, connectors, mates, disconnected solids). - 'robot' — URDF/SDFormat export preview (links, joints, planning groups, end-effectors, issues). - 'step' — inspect an imported STEP file. - 'shape' — volume / surfaceArea / bbox for one feature ({ feature_id? }). - 'features' — features captured by the script (kind, id, params, transforms, suppression). - 'assemblies' — assembly intent (assemblies, parts, connectors, joints). - 'topology' — canonical face names + edge count for a feature ({ feature_id? }). - 'edges' — edges of a shape with optional EdgeQuery ({ feature_id?, query? }); returns @kc[...] refs. - 'face-edges' — boundary edges of a named canonical face ({ feature_id?, face_name }). - 'faces' — faces of a shape with optional FaceQuery ({ feature_id?, query? }); returns @kc[...] refs. - 'face-labels' — user-applied labels visible in the script. - 'mates' — mates captured by the script. - 'constraints' — sketch constraints captured by the script. - 'part-stats' — bundled parts-catalog statistics. - 'bend-table' — sheet-metal bend table for a flattened pattern. - 'params' — declared model parameters. - 'part-categories' — top-level part-catalog categories available in the bundled (and configured remote) catalog. - 'part-families' — part families within a category ({ category? }); count + exemplar ids per family. All params except `of` are subject-specific and forwarded verbatim. Most subjects accept { file | code }.
    Connector
  • Use this when you need to list the kernelCAD script-runtime surface: global functions (box, path, selectEdges, helix, etc), Shape methods (fillet, sweep, lower, etc), Sketch methods (extrude, revolve, sweep), PathBuilder methods, EdgeQuery/FaceQuery key sets, and featureKindFaceLabels (which globals accept opts.faceLabels and valid value shapes). Use this to discover what is callable from a .kcad.ts script.
    Connector
  • Convert an image (whiteboard photo, screenshot, hand-drawn sketch) into a clean diagram. Use this tool when the user provides an image URL or base64-encoded image and wants it converted to a proper software engineering diagram. Accepts public image URLs or base64 data URIs (data:image/...;base64,...). Returns a link to view and edit the generated diagram in the browser.
    Connector
  • Use this when you need the unfolded flat pattern of a bent sheet-metal part. Return the unfolded 2D flat-pattern of a bent sheet-metal Shape as a Region (outer polyline + holes + bend lines + sketch plane). Slice 1: at most 2 bends. Pass { file } or { code }; optional { featureId } to pick a specific Shape.
    Connector
  • Use this when you need to solve a 2D sketch constraint set. Solve a 2D sketch constraint set. Side-effect-free: pass { entities, constraints } and receive solved entities plus the original constraints. Entities are POINT, LINE, and CIRCLE records; constraints use the kernelCAD constraint vocabulary.
    Connector
  • Use this when you need to author text into a kernelCAD script before the last top-level return. One authoring path, selected by `mode`: - 'sketch' — insert a sketch.text(...) call. The emitted sketch is chainable: pair with subsequent .extrude(...) / cut(...) edits to land an engraved or raised text feature. - 'emboss' — insert a `<shape>.embossText({...})` chained call onto an existing Shape `target`. Use for engraved brand text on faces (Ray-Ban temple, CE mark, model number). `depth > 0` raises text out of the face; `depth < 0` engraves text into the face. Lowers via replicad drawText → sketchOnFace → extrude → fuse|cut. Default font is the runtime-bundled Liberation Sans. Side-effect-free; returns the modified code plus diagnostics from re-evaluating. Each mode fails closed on its own missing required params.
    Connector
  • Use this when you need to add a sketch constraint to a list. Append one validated sketch constraint to a constraint list. Side-effect-free: pass { constraints, constraint } and receive the updated list.
    Connector