223,767 tools. Last updated 2026-06-22 03:48

"Orchestration" matching MCP tools:

create_campaign
xmagnet
Create a DRAFT email campaign via a programmatic wizard. Call this tool and it will guide you through the steps; no manual orchestration is needed.
WIZARD STEPS (handled automatically by the tool):
1. Call with contacts + total_contacts → tool returns engine picker (NextGen vs MyConvo).
2. Add campaign_type from user's click → tool returns campaign category chips (promotional, newsletter, event…).
3. Add campaign_category from user's click → tool returns engine-specific template gallery. MyConvo: shows plain_email_templates (personal plain-text). NextGen: shows campaign_templates (HTML).
4. Add template_id from user's pick → tool creates the draft campaign.
RULES: Reuse contacts from the prior search; never re-search. Pass total_contacts from the search result's total_in_crm so the user always sees the full count. Saves as DRAFT only; no emails are sent.
Connector
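The wizard steps described for create_campaign can be read as a small state machine keyed on which arguments have been supplied so far. The sketch below is illustrative only: `next_wizard_step` and the returned labels are hypothetical names, and it assumes `campaign_type` carries the engine name picked in step 1, which the listing does not state explicitly.

```python
from typing import Any

# Gallery shown per engine, as described in the listing.
GALLERY_BY_ENGINE = {
    "MyConvo": "plain_email_templates",
    "NextGen": "campaign_templates",
}


def next_wizard_step(args: dict[str, Any]) -> str:
    """Return a label for what the wizard would respond with next (hypothetical helper)."""
    if "contacts" not in args or "total_contacts" not in args:
        raise ValueError("first call needs contacts and total_contacts")
    if "campaign_type" not in args:
        return "engine_picker"
    if "campaign_category" not in args:
        return "category_chips"
    if "template_id" not in args:
        # Assumption: campaign_type holds the engine name chosen in step 1.
        return GALLERY_BY_ENGINE[args["campaign_type"]]
    return "draft_created"
```

Each call adds one argument from the user's latest choice, so the same accumulated argument set always maps to the same next step.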
architect.validate
AI Design Blueprint Doctrine
Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical); MCP clients commonly close the call before the server returns. Retrying re-runs the 60-180s LLM call from scratch and burns compute. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s (before the LLM call begins) — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted result; the server-side run completes independently within a 20-minute budget. Edge case: if the transport dropped before the first progress notification (very rare; sub-second window), call `me.validation_history(repository='<same value you passed here>')` to find your most recent run. TASK-AUGMENTED INVOCATION (MCP 2025-11-25, SEP-1686): clients that advertise the `tasks` capability can task-augment this call by including `task: {ttl: <ms>}` inside the JSON-RPC request's `params` (NOT as a tool argument; alongside `arguments`, `_meta`, etc.). The server returns a `CreateTaskResult` immediately (taskId equals the run_id above) and runs the validation in the background. Spec-correct long-running pattern: poll via `tasks/get` for state, fetch the terminal payload via `tasks/result`, listen for `notifications/tasks/status` for push updates, and cancel via `tasks/cancel`. `_meta.progressToken` from the original request stays valid for the entire task lifetime. Sync (non-augmented) calls behave exactly as before, backwards-compatible by construction. The me.validation_history(run_id=...) recovery path remains the canonical recovery handle for clients that don't yet advertise the tasks capability. 
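As a rough illustration of the task-augmented invocation described above, the request might be assembled as follows. This is a sketch, not the vendor's client code: `build_task_augmented_call` is a hypothetical helper, and the argument values are placeholders. The point it shows is the placement of `task` inside `params`, beside `arguments` and `_meta`, rather than inside the tool arguments.

```python
import json


def build_task_augmented_call(request_id, implementation_context, repository,
                              ttl_ms, progress_token):
    """Build a JSON-RPC tools/call request with a task hint (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "architect.validate",
            # Tool arguments: the task hint does NOT go in here.
            "arguments": {
                "implementation_context": implementation_context,
                "repository": repository,
            },
            # Task augmentation sits alongside `arguments` and `_meta`.
            "task": {"ttl": ttl_ms},
            "_meta": {"progressToken": progress_token},
        },
    }


request = build_task_augmented_call(
    request_id=1,
    implementation_context="def agent(): return 'full file contents go here'",
    repository="demo-repo",
    ttl_ms=20 * 60 * 1000,  # placeholder TTL chosen to match the 20-minute budget
    progress_token="progress-1",
)
print(json.dumps(request, indent=2))
```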
Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer <token>, Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as `implementation_context` (NO truncation, NO `...` placeholders, NO comment removal — the architect treats your `...` as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="<name>" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). 
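The recovery handle described above (run_id in the first progress event, then `me.validation_history` on timeout) could be wired up roughly like this. It is a minimal sketch under stated assumptions: `call_validate` and `fetch_history` are hypothetical callables standing in for whatever MCP client is in use, and the progress event is assumed to expose the id under a `run_id` key.

```python
from typing import Any, Callable, Optional


def validate_with_recovery(
    call_validate: Callable[[Callable[[dict], None]], Any],
    fetch_history: Callable[..., Any],
    repository: str,
) -> Any:
    """Run validate once; on client timeout, fetch the persisted run instead of retrying."""
    run_id: Optional[str] = None

    def on_progress(event: dict) -> None:
        nonlocal run_id
        # Assumption: the first progress event carries the id as event["run_id"].
        if run_id is None and "run_id" in event:
            run_id = event["run_id"]

    try:
        return call_validate(on_progress)
    except TimeoutError:
        # Never re-issue the validation; the server-side run keeps going.
        if run_id is not None:
            return fetch_history(run_id=run_id)
        # Edge case: transport dropped before the first progress notification.
        return fetch_history(repository=repository)
```

Because the server-side run may still be in progress when the client times out, a caller would likely need to repeat the history lookup until the persisted result appears; that polling is left out here. With `private_session=true` this path does not apply, since persistence and recovery are disabled.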
focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. PAYLOAD COMPLETENESS (load-bearing if you intend to architect.certify this run): the validate first-pass is permissive — it scores on doctrine alignment + structural patterns visible in the submitted code. Cert's adversarial second-pass is rigorous — it scores on cert-payload-completeness as well as code correctness. A run that scores 100/A at validate can cert-reject pre-LLM with `payload_incomplete` when imported modules' surfaces aren't visible. To validate with INTENT TO CERT, also bundle verbatim public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods the code references; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — the module would NameError on import as submitted). 'Submit Like Production': the payload should be the code as it would actually run. TWO COMPLETENESS AXES. (1) IMPORTS: stub the public surface of every dependency (above). (2) ENFORCEMENT BRANCHES: the code under cert itself (approval gates, policy checks, recovery paths) must be the REAL logic, fully written. A placeholder body (`# ... execute approved action ...`, `pass # TODO`, a bare `...`) is graded as a MISSING control, not shorthand; cert scores what would actually run. Never sketch the agent you are certifying. Empirically reconfirmed PR #157 iter8 → iter9 cert downgrades. SCORE VARIANCE DISCLOSURE (anomaly #10 — empirically documented): validate scores are POINT ESTIMATES with an observed empirical variance band of ~20-67 pts on BYTE-IDENTICAL input. 
Runs against the same repository, same code, same deterministic seed (the seed is derived from input — same input → same seed) can produce materially different scores AND different top-blocker rankings, because OpenAI's reasoning models at reasoning_effort=high are not strictly deterministic even with the seed parameter pinned. The `reproducibility_mode='best_effort'` field on every response is the platform's honest disclosure of this property. For decisions where stability matters more than speed, call `architect.validate_consensus` (N=3-5 aggregated, median verdict + per-principle stability metrics) instead — collapses the variance, surfaces unstable principles explicitly. A single validate run is a single roll; consensus is the right tool when one score isn't enough. ITERATION LOOP — repository keying. Pass the SAME `repository` value across calls to chain iteration rounds; the validator auto-resolves the most recent prior run on (user, repository, scope) as `prior_run_baseline` and the LLM grades the new submission with iteration context (per-principle severity deltas surface in the response). Changing the `repository` string between calls — even subtly with an `iter-2` suffix — silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in `task` or commit messages, never in `repository`. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. VERIFICATION LAYERS (the two-layer doctrine this platform practices on itself): validate verifies DOCTRINE ALIGNMENT against the 10-principle Blueprint — design patterns, hand-off explicitness, operational-state inspectability, race/blocker handling at the architectural level. validate does NOT guarantee runtime correctness. cert verifies PAYLOAD COMPLETENESS and runs an adversarial second pass over the submitted code — catches production_blockers the first pass missed, name-errors on import, missing module surfaces, etc. 
cert does NOT verify runtime correctness either. Passing validate is a NECESSARY condition for production_ready, not a sufficient one. Runtime correctness (does this actually execute and behave?) is verified at the THIRD layer — your tests, types, walks. The platform's own recursive-integrity practice: every PR runs validate against its own primitives, then cert. Real bugs surfaced via this practice in PR #157 — NULL-UUID false-positive (iter3) and tie-breaker mismatch (iter5) — that 25 unit tests had missed. Two-layer verification is the discipline, not 'either/or'. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated). See Payload Completeness above for the common pre-cert pitfall.
Connector
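The follow-up rules in the architect.validate description (not_applicable is not a failure, production_ready leads to certify, anything else goes back through the iteration loop) can be sketched as a small decision helper. The function name is hypothetical, and the assumption that `tier` and `certification_status` sit at the top level of the response is mine, not the listing's.

```python
def next_action(response: dict) -> str:
    """Suggest a follow-up for a validate response (hypothetical helper)."""
    tier = response.get("tier")
    if tier == "not_applicable":
        # Non-agentic code is not graded; this is not a failure.
        return "submit the owning agentic workflow instead"
    if tier == "production_ready" and response.get("certification_status") == "not_evaluated":
        return "call architect.certify(run_id, code)"
    # emerging / draft: iterate, keeping the repository string unchanged.
    return "address findings and re-run with the same repository value"
```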
architect.validate_consensus
AI Design Blueprint Doctrine
Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). 
Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
Connector
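To make the consensus aggregation concrete, here is a minimal sketch of how N child results might be reduced to the fields named in the description. It is not the platform's implementation: the child shape (`score` plus a `verdicts` mapping) is hypothetical, and the use of population standard deviation is an assumption, since the listing does not say which variant is used.

```python
from collections import Counter
from statistics import median, pstdev


def aggregate(children: list[dict]) -> dict:
    """Reduce successful child runs to a consensus envelope (illustrative only)."""
    if len(children) < 2:
        # At least 2 successful children are needed for a meaningful median.
        raise ValueError("consensus_quorum_failed")

    scores = [c["score"] for c in children]
    med = median(scores)

    per_principle = {}
    for name in children[0]["verdicts"]:
        votes = Counter(c["verdicts"][name] for c in children)
        mode, count = votes.most_common(1)[0]
        per_principle[name] = {
            "mode": mode,
            "stability_pct": 100.0 * count / len(children),
        }

    # Representative child: the one whose score is closest to the median.
    representative = min(children, key=lambda c: abs(c["score"] - med))

    return {
        "score_consensus_median": med,
        "score_stdev": pstdev(scores),  # assumption: population stdev
        "score_range": (min(scores), max(scores)),
        "mode_stability_min_pct": min(p["stability_pct"] for p in per_principle.values()),
        "per_principle": per_principle,
        "representative_response": representative,
    }
```

With three children where one principle splits 2 to 1, `mode_stability_min_pct` comes out near 66.7, which would fall below the 80% stability threshold mentioned in the description.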
clusters.get
AI Design Blueprint
Get one principle cluster by stable slug. Returns the cluster definition, shared rationale, and the full set of member principles (slug + title) so the caller can pivot into principles.get without a second list call. WHEN TO CALL: the user has already named a specific cluster (e.g. 'delegation', 'visibility', 'trust', 'orchestration') OR you have a slug from a prior clusters.list / principles.list response and need its full definition + member principles. The response embeds member principle slugs + titles already, so DO NOT loop principles.get over each member to get a cluster overview — read the response. WHEN NOT TO CALL: the user is describing a topic, failure mode, or keyword in natural language (call principles.search instead); the user wants to discover which clusters exist (call clusters.list); the user wants the definition of one specific principle (call principles.get directly). Idempotent + cacheable per slug. Returns 404-shaped error_payload on unknown slug — the slug must match exactly the value emitted by clusters.list, with no normalization.
Connector

Matching MCP Servers

Task Orchestration
App Automation, Project Management
coderexpert123
License: A
Quality: B
Maintenance: C
Task Orchestration
Last updated 2026-04-24
5
9
6
MIT
Orchestration MCP
Agent Orchestration, Autonomous Agents, Coding Agents
dufangshi
License: F
Quality: A
Maintenance: D
A TypeScript MCP server for launching, tracking, and managing external coding-agent runs across local and remote backends like Codex and Claude Code. It allows top-level agents to orchestrate subagents through tools for spawning tasks, polling events, and handling interactive sessions.
Last updated 2026-03-20
7
2

Matching MCP Connectors

llm-observability-orchestration
Run a prompt through a LangChain (system + human) chain over Gemini on Vertex AI; optional LangSmith
llm-orchestration-agent
Run a prompt through a LangChain (system + human) chain over Gemini on Vertex AI; optional LangSmith

architect.validate
AI Design Blueprint
Pro/Teams — first-pass doctrine review of agentic code/workflow against the 10-principle Agentic AI Blueprint. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical); MCP clients commonly close the call before the server returns. Retrying re-runs the 60-180s LLM call from scratch and burns compute. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s (before the LLM call begins) — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted result; the server-side run completes independently within a 20-minute budget. Edge case: if the transport dropped before the first progress notification (very rare; sub-second window), call `me.validation_history(repository='<same value you passed here>')` to find your most recent run. TASK-AUGMENTED INVOCATION (MCP 2025-11-25, SEP-1686): clients that advertise the `tasks` capability can task-augment this call by including `task: {ttl: <ms>}` inside the JSON-RPC request's `params` (NOT as a tool argument; alongside `arguments`, `_meta`, etc.). The server returns a `CreateTaskResult` immediately (taskId equals the run_id above) and runs the validation in the background. Spec-correct long-running pattern: poll via `tasks/get` for state, fetch the terminal payload via `tasks/result`, listen for `notifications/tasks/status` for push updates, and cancel via `tasks/cancel`. `_meta.progressToken` from the original request stays valid for the entire task lifetime. Sync (non-augmented) calls behave exactly as before, backwards-compatible by construction. The me.validation_history(run_id=...) recovery path remains the canonical recovery handle for clients that don't yet advertise the tasks capability. 
Returns code_classification (autonomous_agentic_workflow vs non_agentic_component), per-principle findings (verdict, severity_score 0-100, severity_class, code-cited evidence, recommendation), severity-weighted readiness (score|null, grade|null, tier ∈ {production_ready, emerging, draft, not_applicable}), recommended examples, reproducibility envelope (model, seed, doctrine_fingerprint, prompt_template_fingerprint), persistence_status with shareable run_id/badge_url/review_url. WHEN TO CALL: the user wants a governance audit, readiness score, or production_ready badge on an agent/workflow they just built or changed. WHEN NOT TO CALL: non-agentic plumbing (math utilities, type aliases, event-loop helpers, single-shot request/response handlers) returns tier=not_applicable with score=null/grade=null — that's not a failure, the doctrine simply doesn't grade non-agentic code, and architect.certify will refuse with not_agentic_component. Submit the OWNING agentic workflow instead. BEHAVIOR: long-running LLM call (~60-180s typical at high reasoning effort, single-pass; server-side budget 20 min). Mints run_id at t=0; first notifications/progress event carries run_id as recovery handle; keepalive every 30s. Persists ValidationRun + UserValidationRun + AIValidationRunLog + LLMUsageLog atomically; on rollback, badge/review URLs are stripped. Auth: Bearer <token>, Pro/Teams plan. UK/EU residency; transient OpenAI processing (no-training); prompt-injection in code is inert. INPUTS: send FULL file contents verbatim as `implementation_context` (NO truncation, NO `...` placeholders, NO comment removal — the architect treats your `...` as literal code and hallucinates bugs that don't exist). If too large, split into MULTIPLE calls scoped by file/module; never truncate one call. Pass repository="<name>" to group runs into a project trend. Pass private_session=true to bypass server-side logging (persistence + recovery disabled). 
focus_area narrows scope; unmatched focus_area fails explicitly rather than silently widening. PAYLOAD COMPLETENESS (load-bearing if you intend to architect.certify this run): the validate first-pass is permissive — it scores on doctrine alignment + structural patterns visible in the submitted code. Cert's adversarial second-pass is rigorous — it scores on cert-payload-completeness as well as code correctness. A run that scores 100/A at validate can cert-reject pre-LLM with `payload_incomplete` when imported modules' surfaces aren't visible. To validate with INTENT TO CERT, also bundle verbatim public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods the code references; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — the module would NameError on import as submitted). 'Submit Like Production': the payload should be the code as it would actually run. TWO COMPLETENESS AXES. (1) IMPORTS: stub the public surface of every dependency (above). (2) ENFORCEMENT BRANCHES: the code under cert itself (approval gates, policy checks, recovery paths) must be the REAL logic, fully written. A placeholder body (`# ... execute approved action ...`, `pass # TODO`, a bare `...`) is graded as a MISSING control, not shorthand; cert scores what would actually run. Never sketch the agent you are certifying. Empirically reconfirmed PR #157 iter8 → iter9 cert downgrades. SCORE VARIANCE DISCLOSURE (anomaly #10 — empirically documented): validate scores are POINT ESTIMATES with an observed empirical variance band of ~20-67 pts on BYTE-IDENTICAL input. 
Runs against the same repository, same code, same deterministic seed (the seed is derived from input — same input → same seed) can produce materially different scores AND different top-blocker rankings, because OpenAI's reasoning models at reasoning_effort=high are not strictly deterministic even with the seed parameter pinned. The `reproducibility_mode='best_effort'` field on every response is the platform's honest disclosure of this property. For decisions where stability matters more than speed, call `architect.validate_consensus` (N=3-5 aggregated, median verdict + per-principle stability metrics) instead — collapses the variance, surfaces unstable principles explicitly. A single validate run is a single roll; consensus is the right tool when one score isn't enough. ITERATION LOOP — repository keying. Pass the SAME `repository` value across calls to chain iteration rounds; the validator auto-resolves the most recent prior run on (user, repository, scope) as `prior_run_baseline` and the LLM grades the new submission with iteration context (per-principle severity deltas surface in the response). Changing the `repository` string between calls — even subtly with an `iter-2` suffix — silently severs the chain and yields a fresh blind first-shot. Round numbering belongs in `task` or commit messages, never in `repository`. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. VERIFICATION LAYERS (the two-layer doctrine this platform practices on itself): validate verifies DOCTRINE ALIGNMENT against the 10-principle Blueprint — design patterns, hand-off explicitness, operational-state inspectability, race/blocker handling at the architectural level. validate does NOT guarantee runtime correctness. cert verifies PAYLOAD COMPLETENESS and runs an adversarial second pass over the submitted code — catches production_blockers the first pass missed, name-errors on import, missing module surfaces, etc. 
cert does NOT verify runtime correctness either. Passing validate is a NECESSARY condition for production_ready, not a sufficient one. Runtime correctness (does this actually execute and behave?) is verified at the THIRD layer — your tests, types, walks. The platform's own recursive-integrity practice: every PR runs validate against its own primitives, then cert. Real bugs surfaced via this practice in PR #157 — NULL-UUID false-positive (iter3) and tie-breaker mismatch (iter5) — that 25 unit tests had missed. Two-layer verification is the discipline, not 'either/or'. TYPED FAILURES: timed_out, rate_limited, dependency_unavailable, schema_mismatch (each carries retryable + next_action). NEXT STEP: if tier=production_ready (A or B grade), the response carries certification_status='not_evaluated' — call architect.certify(run_id, code) to mint the certified production_ready badge (separate ~60-150s adversarial review, eligibility-gated). See Payload Completeness above for the common pre-cert pitfall.
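The timeout guidance for this tool (run_id minted at t=0, carried in the first progress event, no retry) could be handled client-side roughly as follows. This is a minimal sketch: the helper name `plan_recovery` and the returned dictionary shape are hypothetical, and only the tool name `me.validation_history` with its `run_id` and `repository` arguments comes from the listing.

```python
from typing import Optional


def plan_recovery(run_id: Optional[str], repository: str) -> dict:
    """Describe the lookup to make after a client-side timeout.

    Hypothetical helper: it only builds a call description and never
    re-invokes architect.validate, because the listing forbids retrying.
    """
    if run_id:
        # Normal case: the first progress notification delivered the run_id.
        arguments = {"run_id": run_id}
    else:
        # Rare case: the transport dropped before the first progress event,
        # so fall back to the repository value passed to the original call.
        arguments = {"repository": repository}
    return {"tool": "me.validation_history", "arguments": arguments}


print(plan_recovery("run-123", "my-agent"))
print(plan_recovery(None, "my-agent"))
```

Keeping the decision in one place makes it harder for a retry loop to re-run the 60-180s LLM call by accident.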
Connector
architect.validate_consensus
AI Design Blueprint
Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). 
Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
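The aggregation described above (mode verdict per principle, median score, variance band, closest-to-median representative, quorum of 2) might look like the sketch below. The output field names are taken from the listing; the input shape of each child run, the use of the sample standard deviation, and the definition of stability as the share of children agreeing with the mode are all assumptions, and per-principle severity medians are left out for brevity.

```python
from collections import Counter
from statistics import median, stdev
from typing import Dict, List


def aggregate(children: List[Dict]) -> Dict:
    """Aggregate child runs shaped like {"score": float, "verdicts": {principle: verdict}}.

    Assumes every child reports the same, non-empty set of principles.
    """
    if len(children) < 2:
        # The listing requires at least 2 successful children for a median.
        raise ValueError("consensus_quorum_failed")

    scores = [child["score"] for child in children]
    headline = median(scores)

    per_principle = {}
    for principle in children[0]["verdicts"]:
        counts = Counter(child["verdicts"][principle] for child in children)
        mode_verdict, mode_count = counts.most_common(1)[0]
        per_principle[principle] = {
            "mode": mode_verdict,
            "distribution": dict(counts),
            # Assumed definition: percentage of children that agree with the mode.
            "stability_pct": 100.0 * mode_count / len(children),
        }

    # Representative child: the one whose score is closest to the median.
    representative = min(children, key=lambda child: abs(child["score"] - headline))

    return {
        "score_consensus_median": headline,
        "score_stdev": stdev(scores),
        "score_range": (min(scores), max(scores)),
        "mode_stability_min_pct": min(p["stability_pct"] for p in per_principle.values()),
        "per_principle": per_principle,
        "representative_response": representative,
    }
```

Under these assumptions, three children that split 2-to-1 on any principle would give a `mode_stability_min_pct` of about 66.7, below the 80% threshold the listing names for a stable consensus.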
Connector
clusters.get
AI Design Blueprint Doctrine
Get one principle cluster by stable slug. Returns the cluster definition, shared rationale, and the full set of member principles (slug + title) so the caller can pivot into principles.get without a second list call. WHEN TO CALL: the user has already named a specific cluster (e.g. 'delegation', 'visibility', 'trust', 'orchestration') OR you have a slug from a prior clusters.list / principles.list response and need its full definition + member principles. The response embeds member principle slugs + titles already, so DO NOT loop principles.get over each member to get a cluster overview — read the response. WHEN NOT TO CALL: the user is describing a topic, failure mode, or keyword in natural language (call principles.search instead); the user wants to discover which clusters exist (call clusters.list); the user wants the definition of one specific principle (call principles.get directly). Idempotent + cacheable per slug. Returns 404-shaped error_payload on unknown slug — the slug must match exactly the value emitted by clusters.list, with no normalization.
Connector
get_xrc729_authority
XDaLa Workflow Builder
Read-only authority lookup for one XRC-729 orchestration contract. Reads owner()/getOwner() and getExecutorList(), detects zero-address executor wildcard, and never signs, submits, executes, or creates a handoff.
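The zero-address wildcard check mentioned above can be illustrated with a short sketch. It assumes the executor list arrives as hex address strings and that a zero address in that list is what the tool treats as a wildcard; the function name is hypothetical and no contract call is made here.

```python
from typing import List

# The 20-byte all-zero address, written as a 0x-prefixed hex string.
ZERO_ADDRESS = "0x" + "0" * 40


def has_executor_wildcard(executors: List[str]) -> bool:
    """Return True if any executor entry is the zero address (case-insensitive)."""
    return any(address.lower() == ZERO_ADDRESS for address in executors)


print(has_executor_wildcard(["0x" + "ab" * 20, ZERO_ADDRESS]))
print(has_executor_wildcard(["0x" + "ab" * 20]))
```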
Connector
housing_market_screen
Housing Intel
Rank US metros for rental cash flow in ONE call — the "which markets are best for a landlord" view. Returns metros sorted by gross rent yield = (Zillow median monthly rent × 12) ÷ Zillow typical home value. No per-metro orchestration and no API key (Zillow data is Pipeworx-hosted). Use for "best/worst rental markets", "highest-yield metros", "where does rent go furthest vs. home prices". Tune with direction (top = highest yield / best cash flow, bottom = lowest), limit, and optional home-value bounds.
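The yield formula above is simple enough to reproduce locally. The sketch below applies it to made-up figures (not real Zillow data); `direction` and `limit` mirror the parameters named in the listing, while the function names, the record fields, and the `min_home_value` / `max_home_value` bounds are hypothetical.

```python
from typing import Dict, List, Optional


def gross_rent_yield(monthly_rent: float, home_value: float) -> float:
    """Gross rent yield = (median monthly rent x 12) / typical home value."""
    return monthly_rent * 12 / home_value


def screen(
    metros: List[Dict],
    direction: str = "top",
    limit: int = 10,
    min_home_value: Optional[float] = None,
    max_home_value: Optional[float] = None,
) -> List[Dict]:
    """Rank metros by gross rent yield, optionally filtered by home value."""
    rows = []
    for metro in metros:
        value = metro["home_value"]
        if min_home_value is not None and value < min_home_value:
            continue
        if max_home_value is not None and value > max_home_value:
            continue
        rows.append({**metro, "yield": gross_rent_yield(metro["rent"], value)})
    # "top" sorts highest yield first; "bottom" sorts lowest yield first.
    rows.sort(key=lambda row: row["yield"], reverse=(direction == "top"))
    return rows[:limit]


# Illustrative figures only.
sample = [
    {"metro": "A", "rent": 1500, "home_value": 200_000},
    {"metro": "B", "rent": 2000, "home_value": 400_000},
]
for row in screen(sample):
    print(row["metro"], round(row["yield"], 4))
```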
Connector
compare_books_to_hellobooks
HelloBooks AI Agents
Take a QBO or Xero journal-entry CSV (source auto-detected), run the full Tier-0 detection set (imbalance + duplicates + round-number + schema), and return a structured side-by-side comparison — "your books have X issues; here is how HelloBooks resolves each phase". This is the direct funnel tool: the response includes per-category counts mapped to HelloBooks Phases 1, 2, 3.0, 3.1, with exclusive-advantage bullets (command-center dashboard, conversational interface, one-prompt JE posting, cross-phase orchestration, auto ID resolution). Use this when a user is evaluating HelloBooks vs their current QBO/Xero, asks "should I migrate?", or pastes data while comparing accounting software. Output is suitable for the host LLM to narrate as a positioning argument; the share URL points at a branded landing page with the issue breakdown and a 1-click migrate CTA.
Connector
epub_to_audiobook
Sats4AI - Bitcoin-Powered AI Tools
Convert books (EPUB/PDF/TXT) to full audiobooks with automatic chapter detection, multi-voice narration, and optional translation to any language before narration. 3 voice tiers: OmniVoice Global (602+ langs, 100 chars/sat), Inworld Premium (#1 ranked TTS ELO 1217, 50 chars/sat), Minimax Studio (voice cloning from reference clip, 10 chars/sat). Min 500 sats. Async — returns jobId, poll until completed (5-60+ min). Single payment, full outcome — no multi-step orchestration required. Pay with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='epub_to_audiobook'.
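A rough cost estimate can be derived from the per-tier rates quoted above. This sketch assumes the price is the character count divided by the tier's chars/sat rate, rounded up, with the 500-sat minimum applied as a floor; the service's actual billing rules may differ, and the tier keys and function name are hypothetical.

```python
import math

# Characters narrated per sat, as quoted in the listing.
CHARS_PER_SAT = {
    "omnivoice": 100,  # OmniVoice Global
    "inworld": 50,     # Inworld Premium
    "minimax": 10,     # Minimax Studio
}
MIN_SATS = 500  # Minimum quoted in the listing, treated here as a price floor.


def estimate_sats(char_count: int, tier: str) -> int:
    """Estimate the price in sats for narrating char_count characters."""
    raw = math.ceil(char_count / CHARS_PER_SAT[tier])
    return max(MIN_SATS, raw)


for tier in CHARS_PER_SAT:
    print(tier, estimate_sats(100_000, tier))
```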
Connector
list_agents
SendIt
List all available AI agents and their capabilities. SendIt includes 12 specialized agents:
• Strategy Planner - Content strategy from audience/trend analysis
• Content Ideation - Topic ideas from trends and calendar gaps
• Multi-Format Composer - Platform-optimized content from a brief
• Creative Asset - AI image/video generation orchestration
• Variant Repurposer - Repurpose content for different platforms
• Calendar Optimizer - Optimal posting time suggestions
• Listening Analyst - Social mention and sentiment analysis
• Inbox Reply - Contextual reply drafts with brand voice
• Campaign Builder - Ad campaign structure recommendations
• Budget Optimizer - Spend pacing and budget reallocation
• Experimentation - A/B test design and analysis
• Executive Insights - Executive summary reports
Connector
ApolloDocsSearch
GraphOS MCP Tools
Searches official Apollo GraphQL documentation (Apollo GraphQL, GraphOS, Apollo Router, Apollo Client, API orchestration, MCP Server, schema design, deployment best practices, connectors, and platform usage). Returns url, slug, and markdown content excerpts. For complete page content, you MUST use the returned slug with the ApolloDocsRead tool. Use this tool when you need technical information, configuration examples, best practices, or troubleshooting guides for any Apollo GraphQL technology. To get the full content of a search result, pass its slug to the ApolloDocsRead tool; do not use a WebSearch.
Connector
compare_books_to_hellobooks
HelloBooks AI Agents MCP Server
Take a QBO or Xero journal-entry CSV (source auto-detected), run the full Tier-0 detection set (imbalance + duplicates + round-number + schema), and return a structured side-by-side comparison — "your books have X issues; here is how HelloBooks resolves each phase". This is the direct funnel tool: the response includes per-category counts mapped to HelloBooks Phases 1, 2, 3.0, 3.1, with exclusive-advantage bullets (command-center dashboard, conversational interface, one-prompt JE posting, cross-phase orchestration, auto ID resolution). Use this when a user is evaluating HelloBooks vs their current QBO/Xero, asks "should I migrate?", or pastes data while comparing accounting software. Output is suitable for the host LLM to narrate as a positioning argument; the share URL points at a branded landing page with the issue breakdown and a 1-click migrate CTA.
Connector
create_xdala_session_start_handoff
XDaLa Workflow Builder
Prepare and store a read-only xgr-session-start@1 handoff for xDaLa Workbench. Use this tool whenever the user wants to start, run, launch, execute, queue, or prepare an XDaLa session. Use this tool for starting an existing deployed XRC-729/XRC-137 workflow, starting from a runtime XRC-729 orchestration, starting from a bundle deploy result, or importing a canonical xgr-session-start@1 request into xDaLa Manage Sessions. When explaining required input to users, use canonical xgr-session-start@1 terminology: sessions[].orchestration, sessions[].ostcId, sessions[].stepId, sessions[].payload, sessions[].maxTotalGas. Do not ask users for entryStepId; entryStepId is not the Workbench Session Start field. For deployed XRC-729 workflows, first inspect the runtime, identify ostcId and the likely entry step, resolve that step's XRC-137 rule, derive required payload fields from the XRC-137 payload schema, treat fields with defaults as optional, and present required and optional/default fields before creating a handoff. Do not call this tool with guessed payload values. If required start payload fields are missing, first present the required fields to the user and ask for values or explicit permission to use demo values. Only use demo/dummy/example/default values when the user explicitly asks or accepts them. This tool returns a Workbench xdalaUrl such as https://xdala.devnet.xgr.network/session-start/ss_... . The agent must show the returned xdalaUrl to the user. Do not replace the xdalaUrl with a generic /operations/op_... link. The MCP does not sign, submit, or execute. xDaLa Workbench performs local signing and calls xgr_validateDataTransfer. Do not describe the XRC-729 contract owner as the owner of a not-yet-started session; owner()/getOwner() and getExecutorList() identify start-authority roles only. 
Use sessions[].starterAddress only as an intended starter when explicitly set, and use terminal result data such as result.results[].owner/sessionId/pid for the actual session owner/starter after Workbench start.
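The pre-handoff checks described above (canonical session fields present, no `entryStepId`, required payload fields supplied, defaults treated as optional) could be sketched as below. The five field names come from the listing; the helper names and the schema shape (field name mapped to a spec dictionary that may carry a `default` key) are hypothetical, and the listing does not state which session fields are mandatory.

```python
from typing import Dict, List

# Canonical xgr-session-start@1 session field names, as given in the listing.
SESSION_FIELDS = ("orchestration", "ostcId", "stepId", "payload", "maxTotalGas")


def absent_session_fields(session: Dict) -> List[str]:
    """List canonical fields that the session entry does not carry."""
    return [name for name in SESSION_FIELDS if name not in session]


def uses_entry_step_id(session: Dict) -> bool:
    """entryStepId is not a Workbench Session Start field; stepId is."""
    return "entryStepId" in session


def missing_payload_fields(schema: Dict[str, Dict], payload: Dict) -> List[str]:
    """Required payload fields (those without a default) that have no value yet.

    Assumed schema shape: {"fieldName": {"default": ...}} marks an optional field.
    """
    return [
        name
        for name, spec in schema.items()
        if "default" not in spec and name not in payload
    ]


# Placeholder values only; nothing here is a real address or step.
session = {"orchestration": "<XRC-729 address>", "ostcId": "<ostcId>",
           "stepId": "<stepId>", "payload": {}, "maxTotalGas": "<gas limit>"}
schema = {"amount": {}, "currency": {"default": "<default>"}}
print(absent_session_fields(session))
print(missing_payload_fields(schema, session["payload"]))
```

A non-empty result from `missing_payload_fields` is the point at which the listing says to ask the user for values rather than guess them.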
Connector
hires_move_application
100Hires - AI ATS & Recruitment Software
Moves an application to a specific pipeline stage for explicit stage transitions in workflow orchestration. Requires the target stage_id (available via the job's pipeline_stages).
Connector
batch_wellness_check
Delx MCP Server
Check wellness scores for multiple sessions in one call. Useful for multi-agent orchestration. Free.
Connector