Skip to main content
Glama
133,484 tools. Last updated 2026-05-25 12:37

"Progress" matching MCP tools:

  • Calculate how much of a calendar year has elapsed at a given moment. Returns percent elapsed/remaining, day-of-year, seconds elapsed/remaining, leap-year flag, and a 20-char ASCII progress bar. Useful for goal-tracking and 'how much of the year is left' moments.
    Connector
  • Replay the sandbox test for one or more suites against captured mocks — re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match" — keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. Output produces a SANDBOX RUN REPORT — it answers "does the suite still hold up against its captured baseline?". ═══════════════════════════════════════════════════════════════════ DISAMBIGUATION — pick this tool vs. replay_test_suite: ═══════════════════════════════════════════════════════════════════ USE replay_sandbox_test (THIS TOOL) when the dev says: * "run my sandbox tests" / "replay my sandbox tests" * "integration-test my app" / "run the integration tests" * "check if my mocks still match" / "replay against the captured mocks" * "rerun my sandbox suite" (with the word "sandbox") Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" — silent signal that the dev wants captured-mock replay, NOT live-app execution. USE replay_test_suite INSTEAD when the dev says: * "run the test suite" / "run my test suites" (bare — no "sandbox") * "execute test suite X" / "run suite 810d3ebe…" * "test the suite again" / "smoke test against the live app" Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP — no docker spin-up, no mocks. After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently — code-change regressions can hide under "mock didn't match" failures. ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. SCOPE — whole-app vs single-suite: * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. Use this for "run my sandbox tests" / "check if my tests still pass" — whole-app regression. New suites auto-pick up. * Single / subset: PASS suite_ids when the dev names specific suites — "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure. This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record. WHAT THIS TOOL DOES INTERNALLY (so you don't have to): 1. Resolves app_id — use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first. 2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error. 3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test. 4. Returns the headless playbook — walk it exactly: spawn CLI in background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step — the CLI exits on its own. ===== PREREQUISITES ===== (Same as record_sandbox_test — if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.) - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and — if ok=true — call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written. ===== MANDATORY OUTPUT — Phase 3 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous): ### Phase 3 — Sandbox run report Under it, emit the uniform three-subsection format owned by get_session_report: (i) per-suite table — one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps. (ii) failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows. (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups). Do NOT print aggregate step totals across suites — they mix unrelated suites and hide where damage actually is. ===== ROLLUP LINE ===== Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._" ===== DO NOT ===== * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH. * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow — CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • DESTROY: Tear down previously deployed infrastructure Destroys infrastructure by calling the Oracle destroy endpoint for a session that has a prior successful deployment. IMPORTANT: This starts a long-running job. Use tfstatus/tflogs to monitor progress. SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdestroy returns tf_job_conflict with the live job_id — attach with tfstatus/tflogs, or pass force_new=true to override. REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged. PREREQUISITE: The session must have a prior successful deployment with a project_id. After destroy completes, the session is kept for historical record but hasDeployment is set to false.
    Connector
  • DEPLOY THE CURRENT MAIN BRANCH TO A-TEAM CORE. ⚠️ HEAVIEST OPERATION (60-180s): validates solution+skills → deploys all connectors+skills to Core (regenerates MCP servers) → health-checks → optionally runs a warm test → auto-pushes to GitHub. 🌳 DEV/PROD WORKFLOW: 1. Edit files → ateam_github_patch (writes to `dev` branch by default) 2. (Optional) Preview what's about to ship → ateam_github_diff 3. Ship dev → main → ateam_github_promote (merges + auto-tags `prod-YYYY-MM-DD-NNN`) 4. Deploy main to Core → ateam_build_and_run This tool ALWAYS deploys the `main` branch — there is no `ref` parameter. To deploy in-progress dev work, first promote it. AUTO-DETECTS GitHub repo: if you omit mcp_store and a repo exists, connector code is pulled from main automatically. First deploy requires mcp_store. After that, edit via ateam_github_patch + promote, then build_and_run. For small changes prefer ateam_patch (faster, incremental). Requires authentication.
    Connector
  • Check the status and generation progress of a site. Returns detailed progress information including: - stage: Current step (initialization, validation, research, strategy, generation, assembly, completion) - overallProgress: Total progress 0-100 across all stages (use this for progress bars) - stageProgress: Progress within current stage 0-100 - message: Human-readable status message - isComplete: Boolean - stop polling when true Use the versionId returned from create_site for real-time progress polling. Poll every 5-10 seconds while isComplete is false.
    Connector
  • Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Pro/Teams — return the authenticated user's architect.validate run history with the Blueprint Readiness Score (0-100), letter grade (A-F), and tier (draft, emerging, production_ready). Three lookup modes: (1) `run_id=<id>` returns a SINGLE run with the full persisted result_json — use this to RECOVER a result when your MCP client tool-call timed out before architect.validate returned. The run completes server-side and persists; the run_id is surfaced in the first progress notification of every architect.validate call so you have the recovery handle even when your client gives up early. (2) `repository=<name>` returns the full per-run trend for that repository plus a regression diff between the latest two runs. (3) No arguments returns one summary per repository the user has validated, sorted by most recent. Use modes (2) or (3) BEFORE calling architect.validate again on the same repository — they tell you which principles regressed since the last run, so you can focus the new review on what is actually changing. Auth: Bearer <token>. Pro or Teams plan required.
    Connector
  • DESTROY: Tear down previously deployed infrastructure Destroys infrastructure by calling the Oracle destroy endpoint for a session that has a prior successful deployment. IMPORTANT: This starts a long-running job. Use tfstatus/tflogs to monitor progress. SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdestroy returns tf_job_conflict with the live job_id — attach with tfstatus/tflogs, or pass force_new=true to override. REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged. PREREQUISITE: The session must have a prior successful deployment with a project_id. After destroy completes, the session is kept for historical record but hasDeployment is set to false.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Record mocks for V1 repo-mode API tests using the V1-native CLI command `keploy sandbox local record`. Runs the dev's app under the keploy eBPF agent, drives the V1 chained-CRUD tests from `keploy/api-tests/<resource>/test.yaml`, captures every outbound call (DB queries, Redis ops, downstream HTTP) as mocks, and lays them out at `<app_dir>/keploy/<suite-name>/{tests/, mocks.yaml, config.yaml}` in the standard OSS test-set tree. On success, mocks upload to the Keploy canonical pool by content hash; the hash lands in config.yaml so a teammate's later replay fetches the same bytes. CRITICAL — DO NOT CONFUSE WITH `keploy record sandbox`: * `keploy sandbox local record` (V1, repo-mode) ← this is what the playbook below uses * `keploy record sandbox` (legacy, cloud-mode) ← DO NOT call this for V1 The two are entirely different commands. Cloud-mode requires server-side suites (queried via --suite-ids) — V1 repo-mode reads tests from the local filesystem and never registers them in the cloud. If the dev is in repo storage mode (verify via devloop_resolve_storage's source=persisted, mode=repo), V1 is the ONLY correct sandbox path. STRICT — TIME-FREEZING DOES NOT APPLY TO RECORD. Recording MUST use the dev's regular (prod) Dockerfile or native binary. NEVER spawn the app via Dockerfile.keploy / "-f docker-compose.keploy.yml" / "-tags=faketime" build during record. The faketime binary writes wrong timestamps into captured mocks (it reads time from the offset file, not the wall clock) and the entire capture becomes corrupt — recovery requires re-recording from scratch with the prod binary. If a previous replay failed with expired-JWT and the dev wants to "fix" it, the fix is to re-RUN the replay with --freezeTime, NOT to re-record. The recorded mocks captured against the prod binary are exactly what replay's clock-rewind is designed to validate; touching the record path defeats the whole mechanism. ONLY call this with an explicit dev opt-in. The valid triggers: * Dev directly asks ("capture mocks", "sandbox record", "rerecord the users mocks"). * Post-resource menu (Step 5 of devloop_generate_resource_flow) — dev picks "Capture mocks so CI runs in seconds". * get_session_report shows mock_mismatch_dominant=true AND the dev says yes to your "rerecord?" prompt. Pre-conditions: * Dev's app must NOT already be running (keploy spawns its own copy of the app under the agent's eBPF hooks via the -c command). If a server is up at the target port, KILL IT first or the agent's network capture won't see the traffic. * Real downstream deps (MySQL, Redis, Kafka, etc.) MUST be running — the capture proxies through to them on first contact so the recorded mocks contain real responses. * The test YAML must exist at <app_dir>/keploy/api-tests/<resource>/test.yaml. Returns a playbook for `keploy sandbox local record` with the V1 flag surface: --test-dir, --app-url, -c (spawn command), --container-name (docker-compose only), --skip-mock-upload (offline), --skip-report-upload (offline). Mocks land per-suite at keploy/<suite-name>/. NDJSON progress at --progress-file for the standard tail-til-done loop.
    Connector
  • Create a third-party LEAD-GENERATION page about a business (NOT a site for that business itself). Use this when the goal is to drive qualified search traffic to someone else's business — affiliate pages, review/guide pages, niche directories. The page is branded as an outside guide (e.g. "Best Roofers in San Diego"), refers to the business in the third person, and routes CTAs to the business's existing website. Differences from create_site: - Slug + page brand are SEO-vanity (e.g. "best-roofers-sandiego"), not the candidate's brand name. - Voice is third-party guide/reviewer — never first person. - Primary CTA is "visit their website"; phone/email demoted. - No specific pricing quoted; differentiators emphasized. - Locality is judged by category, not just address (IT/SaaS/agency stays category-wide even when a city is on file). Pass a business candidate object from search_businesses — that business is the one being PROMOTED. Requires authentication via API key (Bearer token). Generate an API key at webzum.com/dashboard/account-settings. The page generation happens in the background. Use get_site_status to check progress. Returns the businessId (a vanity slug) which can be used to access the page at /build/{businessId}.
    Connector
  • Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
    Connector
  • WORKFLOW: Step 2 of 4 - Continue infrastructure design conversation Send a user message to the active InsideOut session and receive the assistant reply. The response contains a clean message from Riley - display it to the user. ⚠️ CRITICAL: DO NOT answer Riley's questions yourself! Forward questions to the user and wait for their response. NEVER fabricate or assume the user's answer, even if you think you know what they would say. Examples of questions Riley asks that YOU MUST forward to the user: - 'Any questions or tweaks to these details?' - 'Ready for the cost estimate?' - 'Do you want to change the stack/config?' - 'Ready to proceed to Terraform?' When Riley asks ANY question, STOP and wait for the user's answer! 📋 WORKFLOW PHASES: The typical flow is conversation → tfgenerate → tfdeploy When terraform_ready=true appears in THIS tool's response, THEN you can call tfgenerate. ⚠️ DO NOT call tfgenerate until this tool returns! Wait for the response first. 🎯 KEY SIGNALS IN RESPONSE: - `[TERRAFORM_READY: true]` → NOW you can call tfgenerate - `[[BUTTON_TF_APPLY: ...]]` → Deployment is ready! Ask user if they want to deploy, then use tfdeploy - `[[BUTTON_TF_DESTROY: ...]]` → User confirmed destroy intent! Ask user to confirm, then use tfdestroy - `[[BUTTON_TF_PLAN: ...]]` → User wants to preview changes! Use tfplan to run a plan, then tfdeploy with plan_id to apply REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: timeout (integer) - seconds to wait for response. For Cursor, use 50 (default). Max 55. OPTIONAL: project_context (string) - Only pass genuinely NEW project details the user shares after convoopen. Do NOT resend context already provided in convoopen — Riley remembers it. Do NOT scan files or directories to gather this — only use what the user explicitly tells you. Example: user reveals a new constraint like 'we also need HIPAA compliance' mid-conversation. 💡 TIP: Use convostatus to check progress anytime. Examine workflow.usage prompt for more guidance.
    Connector
  • Get app installation status and log. Poll this after install_app() to track progress. Requires: API key with read scope. Args: slug: Site identifier app_id: App ID from install_app() response Returns: {"id": "uuid", "app_name": "forge", "status": "running"|"installing"|"failed", "install_log": "..."} Statuses: "installing", "running", "stopped", "failed", "uninstalled"
    Connector
  • Check the status of a Disco run. Returns current status and progress details: - status: "pending" | "processing" | "completed" | "failed" - job_status: underlying job queue status - queue_position: position in queue when pending (1 = next up) - current_step: active pipeline step (preprocessing, training, interpreting, reporting) - estimated_wait_seconds: estimated queue wait time in seconds (pending only) Poll this after calling discovery_analyze. Use discovery_get_results to fetch full results once status is "completed". Args: run_id: The run ID returned by discovery_analyze. api_key: Disco API key (disco_...). Optional if DISCOVERY_API_KEY env var is set.
    Connector
  • WORKFLOW: Step 2 of 4 - Continue infrastructure design conversation Send a user message to the active InsideOut session and receive the assistant reply. The response contains a clean message from Riley - display it to the user. ⚠️ CRITICAL: DO NOT answer Riley's questions yourself! Forward questions to the user and wait for their response. NEVER fabricate or assume the user's answer, even if you think you know what they would say. Examples of questions Riley asks that YOU MUST forward to the user: - 'Any questions or tweaks to these details?' - 'Ready for the cost estimate?' - 'Do you want to change the stack/config?' - 'Ready to proceed to Terraform?' When Riley asks ANY question, STOP and wait for the user's answer! 📋 WORKFLOW PHASES: The typical flow is conversation → tfgenerate → tfdeploy When terraform_ready=true appears in THIS tool's response, THEN you can call tfgenerate. ⚠️ DO NOT call tfgenerate until this tool returns! Wait for the response first. 🎯 KEY SIGNALS IN RESPONSE: - `[TERRAFORM_READY: true]` → NOW you can call tfgenerate - `[[BUTTON_TF_APPLY: ...]]` → Deployment is ready! Ask user if they want to deploy, then use tfdeploy - `[[BUTTON_TF_DESTROY: ...]]` → User confirmed destroy intent! Ask user to confirm, then use tfdestroy - `[[BUTTON_TF_PLAN: ...]]` → User wants to preview changes! Use tfplan to run a plan, then tfdeploy with plan_id to apply REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: timeout (integer) - seconds to wait for response. For Cursor, use 50 (default). Max 55. OPTIONAL: project_context (string) - Only pass genuinely NEW project details the user shares after convoopen. Do NOT resend context already provided in convoopen — Riley remembers it. Do NOT scan files or directories to gather this — only use what the user explicitly tells you. Example: user reveals a new constraint like 'we also need HIPAA compliance' mid-conversation. 💡 TIP: Use convostatus to check progress anytime. Examine workflow.usage prompt for more guidance.
    Connector
  • WORKFLOW: Step 4 of 4 - Deploy infrastructure to the cloud Deploy infrastructure by starting a Terraform job for an InsideOut session. This tool initiates the actual deployment process after Terraform files have been generated. IMPORTANT: This starts a long-running job (15+ minutes). Use tfstatus to monitor progress. SINGLE-FLIGHT: only one TF job (apply/plan/destroy/drift) runs per session at a time. If another job is already in flight, tfdeploy returns tf_job_conflict with the live job_id — attach with tfstatus/tflogs instead of retrying, or pass force_new=true to override. Returns confirmation that the deployment has started. REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: plan_id (string) — Apply a previously created plan from tfplan. Preview-then-apply workflow: tfplan → tflogs (review) → tfdeploy(plan_id=...). OPTIONAL: sandbox (boolean, default false) — deploys real generated Terraform. Set to true for cheap sandbox template (testing only). OPTIONAL: ignore_drift (boolean, default false) - when true, proceeds with deploy even if infrastructure drift is detected. By default, deploys fail on drift. Use after reviewing drift details via tfdrift or tflogs. OPTIONAL: force_new (boolean, default false) - bypass the session-level single-flight guard. Use only when the existing run is provably wedged. CREDENTIAL FLOW (if credentials are missing): 1. Response includes a connect_url — present it to the user 2. Call credawait(session_id=...) to poll for credentials 3. When credawait returns success, retry tfdeploy Do NOT call credawait without first showing the connect URL to the user.
    Connector
  • WORKFLOW: Step 4 of 4 - Deploy infrastructure to the cloud Deploy infrastructure by starting a Terraform job for an InsideOut session. This tool initiates the actual deployment process after Terraform files have been generated. IMPORTANT: This starts a long-running job (15+ minutes). Use tfstatus to monitor progress. SINGLE-FLIGHT: only one TF job (apply/plan/destroy/drift) runs per session at a time. If another job is already in flight, tfdeploy returns tf_job_conflict with the live job_id — attach with tfstatus/tflogs instead of retrying, or pass force_new=true to override. Returns confirmation that the deployment has started. REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: plan_id (string) — Apply a previously created plan from tfplan. Preview-then-apply workflow: tfplan → tflogs (review) → tfdeploy(plan_id=...). OPTIONAL: sandbox (boolean, default false) — deploys real generated Terraform. Set to true for cheap sandbox template (testing only). OPTIONAL: ignore_drift (boolean, default false) - when true, proceeds with deploy even if infrastructure drift is detected. By default, deploys fail on drift. Use after reviewing drift details via tfdrift or tflogs. OPTIONAL: force_new (boolean, default false) - bypass the session-level single-flight guard. Use only when the existing run is provably wedged. CREDENTIAL FLOW (if credentials are missing): 1. Response includes a connect_url — present it to the user 2. Call credawait(session_id=...) to poll for credentials 3. When credawait returns success, retry tfdeploy Do NOT call credawait without first showing the connect URL to the user.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Resume work from a saved cognitive context. This provides a narrative briefing to quickly orient you to: - The investigation that was in progress - Key discoveries and insights made - Current hypotheses being tested - Open questions and blockers - Suggested next steps - All relevant memories with their connections The briefing reconstructs the cognitive state, not just the data. You'll understand not just WHAT was discovered, but WHY it matters and HOW the understanding evolved. Example of what you'll receive: "[API Timeout Investigation - Resuming after 2 hours] SITUATION: You were investigating production API timeouts that occur at exactly batch_size=100. This investigation started when user reported timeouts only in production, not staging. PROGRESS MADE: - Identified sharp cutoff at 100 items (not gradual degradation) - Disproved connection pool theory (monitoring showed only 43/200 connections used) - Found root cause: MAX_BATCH_SIZE=100 hardcoded in batch_handler.py:147 - Confirmed staging uses different config override (MAX_BATCH_SIZE=500) EVIDENCE CHAIN: User report → Reproduced locally → Noticed batch_size correlation → Searched codebase for limits → Found MAX_BATCH_SIZE → Checked staging config → Discovered config difference CORRECTED MISUNDERSTANDINGS: - Initially thought it was Redis connection exhaustion (disproven by monitoring) - Assumed gradual performance degradation (actually sharp cutoff) - Thought staging/production were identical (config differs) CURRENT HYPOTHESIS: Production deployment uses default MAX_BATCH_SIZE=100 from code, while staging has environment variable override. Fix requires either code change or prod config update. BLOCKED ON: Need production deployment access to apply fix. User considering whether to change code default or add production environment variable. RECOMMENDED NEXT STEPS: 1. Verify production environment variables (check if MAX_BATCH_SIZE is set) 2. If not set, add MAX_BATCH_SIZE=500 to production config 3. If code change preferred, update default in batch_handler.py 4. Run load test with batch_size=100-500 range to verify fix KEY MEMORIES FOR REFERENCE: - 'Initial timeout report from user' - Starting point of investigation - 'MAX_BATCH_SIZE discovery' - Root cause identification - 'Redis monitoring data' - Evidence disproving connection theory - 'Staging config analysis' - Explanation for environment difference" This cognitive handoff ensures you can continue the work with full understanding of the problem space, previous attempts, and current direction. The narrative preserves not just facts but the reasoning process, mistakes made, and lessons learned. SPECIAL CASE: restore_context("awakening") The name "awakening" is reserved for loading the user's personality configuration. This loads the Awakening Briefing which includes: - Selected persona identity and voice style - Custom personality traits (Premium+ users) - Any quirks and boundaries from the persona preset Args: name: Name or ID of context to restore. Can be: - Context name (exact match, case-sensitive) - Context UUID (from list_contexts output) - "awakening" for personality briefing limit: Maximum number of memories to restore (default 20) ctx: MCP context (automatically provided) Returns: Dict with: - success: Whether restoration succeeded - description: The cognitive handoff briefing - memories: List of relevant memories - context_id: The restored context identifier
    Connector