Skip to main content
Glama
167,705 tools. Last updated 2026-06-03 01:25

"Reactive Resume" matching MCP tools:

  • [cost: free (pure CPU, no network) | read-only] Return a hand-curated SIP scenario as a Mermaid `sequenceDiagram` plus a bullet list of step-by-step explanations with RFC references. Use this when the user asks 'show me what X looks like' and you don't have a real trace handy. Available scenarios: basic-call, auth-challenge, cancel-before-answer, early-media, hold-resume, refer-blind, proxy-with-record-route, shaken-attested-invite, bye-glare, redirect-302. Pair with: `search_sip_docs` for vendor-specific quirks of the scenario; `render_sip_ladder` if the user does have a real trace.
    Connector
  • Chat with the Roboflow AI agent. Use this tool for: - **Roboflow Q&A** — the agent has the full Roboflow documentation indexed (SDKs, REST API, deployment options, training, batch processing, Universe, blocks, pricing, etc.). Ask it anything about how Roboflow works. - **Advanced workflow building** — workflows complex enough that direct block composition via ``workflow_blocks_*`` is impractical. The agent knows every block and connection pattern. - **Solution planning** — pass ``mode="plan"`` and the user's problem; the agent uses a stronger planning model to scope a CV solution end-to-end before any building happens. For straightforward workflows you can construct yourself, the direct ``workflow_*`` tools are fine — you don't have to route every workflow through the agent. ## Conversation flow The agent runs a multi-step conversation. It may ask clarifying questions, recommend a model, or (in plan mode) produce a plan for confirmation. Pass the returned ``conversation_id`` back on follow-up calls to keep context. Use ``agent_conversations_list`` and ``agent_conversation_get`` to find and resume past conversations. ## CRITICAL: the agent NEVER publishes workflows Every workflow the agent creates or edits is saved as a **draft**. The published version that callers using the workflow by id will hit is unchanged until you explicitly publish. To make agent edits live, call ``agent_workflow_publish`` with the workflow ``url`` returned in the chat response. ## Running an agent-built workflow Two options: 1. **Run the draft directly without publishing** — pass the ``specification`` returned in the chat response to ``workflow_specs_run``. Best for testing the draft, or for one-off runs where you don't want to disturb the currently-published version. 2. **Publish, then run by id** — call ``agent_workflow_publish(workflow_url=...)`` then ``workflows_run(workflow_id=..., images=...)``. Use this when you want the change to go live for everyone using the workflow by id. ## Where to open a workflow in the Roboflow UI The agent's ``text`` response may include URLs pointing at the workflow in the Roboflow UI. **Ignore those URLs** — the agent sometimes picks the wrong host or path. Each workflow in the ``workflows`` array has an ``app_url`` field with the correct, environment-aware URL (built from the current ``APP_URL`` plus ``/{workspace}/solutions/chat?workflowUrl=...``) — show that one to the user instead. ## Response shape - ``text`` — the agent's reply. - ``workflows`` — workflows created or edited in this turn, each with ``id``, ``name``, ``url`` (slug), ``app_url`` (clickable Roboflow UI URL — use this), and ``specification`` (the full draft JSON; pass it to ``workflow_specs_run`` to execute without publishing). - ``conversation_id`` — pass back to continue the conversation.
    Connector
  • Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
    Connector
  • Resume a failed or stopped plan without discarding completed intermediary files. Plan generation restarts from the first incomplete step, skipping all steps that already produced output files. Use plan_resume when plan_status shows 'failed' or 'stopped' and plan generation was interrupted before completing all steps (network drop, timeout, plan_stop, worker crash). For a full restart or to change model_profile, use plan_retry instead. Only failed or stopped plans can be resumed. Returns PLAN_NOT_FOUND when plan_id is unknown and PLAN_NOT_RESUMABLE when the plan is not in failed or stopped state. Returns PIPELINE_VERSION_MISMATCH when the snapshot was created by a different pipeline version; use plan_retry instead.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
    Connector

Matching MCP Servers

Matching MCP Connectors

  • AI job search across 90+ countries (employer career pages + ATS feeds — Lever, Greenhouse, Workday), plus resume tailoring, cover letters, job detail & dashboard. Hosted; OAuth sign-in.

  • Docs for hot-module-reload and reactive programming for Python (`hmr` on PyPI)

  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. **RECOVERY FIRST**: the run_id is emitted in the FIRST notifications/progress event at t=0s (BEFORE the LLM call begins). Capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently within a 20-minute budget. This is the canonical recovery path. Use it before considering any retry. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Write a cover letter for a SPECIFIC job — TWO steps. STEP 1 (default; action omitted or 'prepare'): the server returns the job's JD and the candidate's background, plus writing instructions. YOU (the model) then WRITE the cover letter (250–350 words, specific to the role, mapping the candidate's real achievements to the JD — never fabricate). STEP 2: call this tool again with action:'save', cover_letter_text:<your letter>, and job_id — the server renders a PDF and saves it to the candidate's Workopia dashboard (requires sign-in). Use whenever the user asks for a cover letter for a specific job. Resolving job_id (same rules as tailor_resume_tool / job_detail_tool): pass the **Job Id** value from the most recent prior search/refine result VERBATIM; no placeholders like 'JOB_1' or '#1'. For STEP 1 supply ONE of job_id (preferred — server fetches the JD from Mongo) OR job_description, plus the candidate's resume via resume_text / resume_content / json_resume / user_profile.
    Connector
  • Record (or refresh) the sandbox test for one or more existing test suites — captures the request/response per step plus the outbound mocks (DB, downstream HTTP, etc.) against the dev's locally-running app, then links the captures onto the suite. Use this when the dev says "record", "rerecord", "re-record", "refresh the recordings", "capture mocks", or as the RECORD step in FROM-SCRATCH (after create_test_suite). This tool resolves the app (if only a hint is given), resolves ONE OR MORE suites to record (by exact ids OR case-insensitive name substring match), and delegates to a headless playbook. Output produces a RERECORD REPORT — it answers "did the sandbox test get created and linked successfully?". ╔═══ PRE-CHECK — DID YOU ARRIVE HERE FROM A FAILED REPLAY? ═══╗ This tool refreshes the CAPTURED BASELINE (mocks + recorded request/response per step). It does NOT modify the suite's authored assert array or response.body — those are the contract as defined when the suite was created/updated. If the contract changed and you re-record without updating the suite first, the new rerecord fires the suite's stale assertions against the live app, gate-1-fails on the same diff, and the suite comes back unlinked. Before calling THIS tool in response to a failed replay_sandbox_test or replay_test_suite, walk these checks: 1. Read failed_steps[].authored_assertions and authored_response_body in the most recent get_session_report (kind=sandbox_run / test_suite_run). The fields are inlined — no second tool call needed unless the report predates the inlined fields. 2. For each failing step: does any authored assertion pin the diverging value? (e.g. assert {path: "$.order.status", expected: "created order"} where the diff says "expected 'created order', got 'created'".) * YES → call update_test_suite FIRST to update that assertion + the response.body field, THEN call this tool. * NO → safe to call this tool directly; the captured baseline drifts but no authored assertion blocks the rerecord. 3. If you can't find authored_assertions in the report (older format) AND don't already know the suite's shape, call getTestSuite({app_id, suite_id, branch_id}) to inspect the assert array before deciding. Don't guess. REFUSE-RULE: if the dev confirms a contract change is intentional and the failing step has a pinned authored assertion on the diverging value, you MUST run update_test_suite before this tool. Calling record_sandbox_test FIRST in that case is the bug this pre-check exists to prevent — don't justify it as "let's just refresh the baseline first". The order is update → record → replay; never record → update. ╚═══════════════════════════════════════════════════════════════╝ ===== BEFORE CALLING — one-time setup ===== (a) APP_ID RESOLUTION (skip if app_id is already known): * Derive a likely app name from the cwd's basename (e.g. cwd=/home/dev/orderflow → "orderflow"). Lowercase it. * Call listApps({q: "<cwd-basename>"}) — the server does a case-insensitive server-side substring match, so you don't paginate the full tenant list (can be hundreds of apps on shared accounts). * Exactly one match → use its id. Multiple → list them and ASK the dev which one (a wrong app_id silently routes traffic + suite creates into the wrong app). Exception: if the compose file / repo layout unambiguously pins one candidate (e.g. compose has service "producer" and one candidate is "<folder>.producer" while others are unrelated siblings), you may pick it AND tell the dev up-front so they can correct. * Zero matches → ASK permission to create a new Keploy app with the derived name; on yes, call createApp({name, endpoint}) and use the returned id. * Alternatively pass app_name_hint to THIS tool and the server resolves it (same rules; multiple/zero → typed error). (b) KEPLOY BINARY VERIFICATION: * Bash: "keploy --version" (or "~/.keploy/bin/keploy --version"). If it exits non-zero the binary is missing. * If missing OR older than this MCP server was built against, install/upgrade: curl --silent -O -L https://keploy.io/ent/install.sh && source install.sh * Re-verify with "keploy --version"; fail loudly if still absent (tell the dev where keploy put the binary so they can add it to PATH). ===== DOCKER-COMPOSE NETWORK RULE (absolute) ===== Use the SAME compose file + service that was used in the validate-curl phase. Do NOT point keploy at a second "keploy-only" compose file — docker-compose isolates each file into its own project + network, so the app container spawned by keploy cannot reach the DB/Kafka containers that validate brought up (and the network-name collision blocks keploy from starting). Correct flow: (i) Validate phase: "docker compose up -d" (brings up app + deps on network <project>_default). (ii) Before calling record_sandbox_test, Bash: "docker compose stop <app_service> && docker compose rm -f <app_service>" — stop ONLY the app service; leave deps running so keploy's new app container can reach them on the existing network. (iii) Pass app_command = "docker compose up <app_service>" (same compose file, same project → same network). container_name = the actual name set by compose (e.g. "orderflow-producer", not "producer"). ===== RESOLUTION RULES (server-side, no guessing) ===== 1. App: caller provides app_id OR app_name_hint. With a hint, the server does listApps({q: hint}). Zero matches → typed error; multiple → typed error listing them so Claude asks the dev. 2. Suites: DEFAULT IS "ALL LINKED". When the dev says "record my sandbox tests" / "rerecord everything" / "refresh my recordings" with no specific suite named, LEAVE BOTH suite_ids AND suite_name_hint UNSET. Do NOT list suites first and pass a comma-joined UUID list back — the CLI resolves "every linked suite for the app" itself, cleaner and less brittle. Only pass a narrower selector when the dev explicitly names suites: - suite_ids (comma-separated, exact) — when you already have the IDs. - suite_name_hint (case-insensitive substring match) — when the dev names suites by human phrasing like "the auth suite" or "deterministic". Every suite whose name contains the substring is recorded. If the dev asks to record suites that don't exist yet (zero match) → typed error. Any ≥1 match is fine. DO NOT prompt the dev for which suites to record — default to all linked if they didn't name any. ===== DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id ===== Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust without a hit, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. The standard pattern when "search the suite by id" returns nothing is NOT "give up and ask the dev which app" — it's "the suite exists on a BRANCH, walk discovery". Suites created via create_test_suite + rerecord on a Keploy branch are INVISIBLE to a main-view listTestSuites; you have to scope each call to a branch. After resolving once in a session, REUSE the {app_id, branch_id} for any subsequent suite-targeted call (replay_sandbox_test, update_test_suite, replay_test_suite); don't re-walk discovery for every action. ===== PREREQUISITES ===== - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== The response includes a "playbook" array; execute its steps in order. The flow is HEADLESS — one background process, NDJSON progress events on a local file, no separate HTTP surface to bind. THERE IS NO SEPARATE CLEANUP STEP — the CLI exits on its own once phase=done is written. 1. Spawn the `keploy record sandbox --cloud-app-id …` process via Bash (run_in_background). Capture its PID into $KEPLOY_PID. 2. Poll progress by repeatedly calling Bash with `tail -n 1 $PROGRESS_FILE`. Each call returns instantly; the MCP round-trip between calls paces the loop. DO NOT wrap in a sleep loop — Claude Code's Bash rejects standalone `sleep N` and chained-sleep patterns. Read .phase off each line; stop when phase=done. The wait_for_done step's built-in `kill -0 $KEPLOY_PID` check is the safety-net for silent early-exit (CLI died before writing the terminal event) — it lets the loop exit instead of spinning forever on a dead process. 3. Read the terminal event (last line of $PROGRESS_FILE). It carries data.ok, data.error (on failure), data.test_run_id (on success). 4. On data.ok=true: call get_session_report(app_id, test_run_id) with verbose=true to surface the rerecord report. On data.ok=false: show data.error to the dev directly (optionally tail the log_file for stderr context) and SKIP get_session_report (there's no run to fetch). Auto-replay + linkTestSetToSuite run INSIDE the CLI process before it writes phase=done — if the terminal event says ok=true, linkage already happened. You do NOT need to wait for a separate post-success window; the CLI doesn't exit until it's fully done. INTERRUPTED FLOWS: if your conversation dies between step 1 and step 2 (Claude crashes, connection drops, dev cancels), the CLI keeps running in the background. It's not orphaned — it'll finish its run and write phase=done. To abort early, the dev can `pkill -f "keploy.*sandbox"` manually; otherwise just let it complete and resume by re-reading the progress file on the next turn. ===== NDJSON SCHEMA — the contract ===== Every line in the progress_file is one JSON object with this envelope: { "ts": "<RFC3339-nano>", "command": "record" | "test", "phase": "<phase-name>", "message": "<optional human-readable>", "data": { ... phase-specific ... } // optional } The phase vocabulary is intentionally extensible — new lifecycle phases get added over time as the CLI grows (started, agent_up, app_starting, suites_running_start, record_done, auto_replay_skipped, upload_done, linking_done, etc.). There are only TWO phases the AI must handle programmatically; everything else is informational and you should NOT switch on phase names you don't recognize: * phase != "done" → keep polling. Optional: surface message/data to the dev as ambient progress ("agent is starting...", "suites uploading..."), but never branch on a specific intermediate phase name. * phase == "done" → terminal event. Stop polling. The data envelope carries: - data.ok bool true on success, false on failure - data.error string (only on ok=false) one-line failure summary - data.test_run_id string (only on ok=true) pass to get_session_report - data.app_id string echo of the app_id passed to the tool - data.artifact_dir string local path to captured/replayed artifacts - data.dashboard_url string UI link to drill into the run If you observe a phase you don't recognize, IGNORE it and keep polling. If "done" itself is renamed by a future CLI version, the wait_for_done step's PID-alive guard is your safety net (the poll loop exits when the CLI dies); surface log_file contents to the dev. ===== "ALL SUITES FAILED CAPTURE" — special signal ===== If you see a `phase: "auto_replay_skipped"` event with `message: "all suites failed during rerecord; skipping replay + linking"` ahead of the terminal `done` event, every suite failed at the CAPTURE phase (before auto-replay even ran). The CLI fails closed in this case — auto-replay and suite linking are SKIPPED, so every per_suite entry comes back linked=false. Watch for this trap: the terminal `data.ok=true` because the CLI itself completed cleanly (it didn't crash; it just had nothing to record successfully). DO NOT read data.ok=true as "rerecord succeeded" — read `<linked>/<total>`. If linked == 0, this is a HARD failure that needs diagnosis, not a partial-linkage case. ALWAYS surface the dashboard URL on this case. The terminal `done` event still carries `data.dashboard_url` and `data.test_run_id` (atg's TestSuiteRun was created during the capture phase); emit them verbatim so the dev can drill into per-step failures in the UI: "0/N suites have a sandbox test — every suite failed during the capture phase, so auto-replay and linking were skipped. Dashboard: <data.dashboard_url> (test_run_id=<id>)" EDGE CASE: if `data.test_run_id` is empty, atg never inserted a TestSuiteRun (typically a pre-flight validation failure — branch-id rejection, app unreachable, etc.). The dashboard URL won't resolve. Skip the URL, surface the log_file contents instead so the dev can read the early-stage failure. Recovery is the same as WHEN linked=false below — read failed_steps for each suite and pick route B (fix code) / C (update suite + record again) / SKIP. Don't infra-retry; capture-phase failures across every suite usually mean the app is broken, the suite shapes are stale, or the dev's local app isn't reachable. ===== LINKAGE VERIFICATION ===== After get_session_report returns, for EVERY suite that went into this record, call getTestSuite({suite_id}) and check whether the suite has a sandbox test (linked=true / non-empty test_set_id). A suite without a sandbox test cannot be replayed — replay_sandbox_test will 400 on it with "no sandboxed tests" until a successful record produces one. ===== WHEN linked=false — recovery rules ===== A suite with linked=false after record_sandbox_test means the record process couldn't produce a sandbox test for that suite. The SUITE ITSELF still exists; it just has no sandbox test. Diagnose WHY by reading the rerecord report's failed_steps for that suite: * No failed_steps OR pure infra error (link-commit / upload failed, no step diverged) → call record_sandbox_test AGAIN scoped to just the unlinked suite_ids. The tool is idempotent on the suite; safe to re-run. * failed_steps with assertion diffs (response shape, body fields, status code shifted from what the suite expected) → the suite is stale relative to current app behavior. The CONTRACT changed: - Change is INTENTIONAL (new field, renamed key, different status code is the new normal) → call update_test_suite to update the affected step's response / assertions to match the new contract, THEN call record_sandbox_test on the updated suite. - Change is UNINTENTIONAL (app regressed) → fix the app code first, then call record_sandbox_test. No suite update needed; the original test was correct. * failed_steps with 500s / handler crashes / connection refused → the app is broken at the wire level. Fix the app, then call record_sandbox_test. Don't update_test_suite to absorb a real failure. NEVER: * Don't call create_test_suite to "redo" the suite — it already exists; re-creating authors a duplicate (see BEFORE CREATING in create_test_suite). * Don't blindly loop record_sandbox_test without diagnosing failed_steps first; if the cause is suite-vs-app mismatch, retries won't help. ===== MANDATORY OUTPUT — Phase 2 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT collapse into a single pass/fail table with the rerecord report; do NOT merge with Phase 1 or Phase 3): ### Phase 2 — Sandbox-test linkage **<linked>/<total> suites have a sandbox test** _Suites with a sandbox test_ | Suite name | suite_id | test_set_id | Capture pass/total | | --- | --- | --- | --- | | <name> | <suite_id> | <test_set_id> | <p>/<t> | (emit even if zero — one row per linked suite, or "_(none)_" in place of rows) _Suites without a sandbox test_ (omit ONLY if every suite linked) | Suite name | suite_id | Likely cause | | --- | --- | --- | | <name> | <suite_id> | gate1 / gate2 / infra | Likely-cause decoding: assertion diffs → gate 1 upstream-replay failure; upstream-passing + mock-replay-diff → gate 2 mock-determinism mismatch; zero failures + still unlinked → infra link-commit issue. Then proceed to replay_sandbox_test ONLY for the suites that DID link; the unlinked ones will 400 on replay. ===== DO NOT ===== * DO NOT fall back to raw keploy CLI (`keploy rerecord -t …`) if the MCP tool drops mid-flow — the CLI subcommand runs test-sets directly and does NOT update the suite's test_set_id. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Download an attachment (resume, candidate file, application file, mail attachment, call recording). Pass the absolute URL returned by another endpoint (e.g. `message.attachments[].url`, `cv.url`, `resume.url`) — it MUST belong to the configured 100Hires API host; other hosts are rejected to avoid leaking the Bearer token. Returns `{file_name, mime_type, size, data}` where `data` is base64-encoded bytes. Files larger than 25 MB are rejected up-front (Content-Length check / streaming abort) without being loaded into memory.
    Connector
  • Start (or resume) Stripe Connect onboarding so this account can RECEIVE author royalties. Returns a one-time onboarding_url the human author must open in a browser to complete KYC. Required before a book can be published: an author with no payouts-enabled Connect account can save drafts but their books stay in draft until onboarding finishes. Payouts stay disabled until Stripe verifies the details — poll connect_status afterward.
    Connector
  • Resume work from a saved cognitive context. This provides a narrative briefing to quickly orient you to: - The investigation that was in progress - Key discoveries and insights made - Current hypotheses being tested - Open questions and blockers - Suggested next steps - All relevant memories with their connections The briefing reconstructs the cognitive state, not just the data. You'll understand not just WHAT was discovered, but WHY it matters and HOW the understanding evolved. Example of what you'll receive: "[API Timeout Investigation - Resuming after 2 hours] SITUATION: You were investigating production API timeouts that occur at exactly batch_size=100. This investigation started when user reported timeouts only in production, not staging. PROGRESS MADE: - Identified sharp cutoff at 100 items (not gradual degradation) - Disproved connection pool theory (monitoring showed only 43/200 connections used) - Found root cause: MAX_BATCH_SIZE=100 hardcoded in batch_handler.py:147 - Confirmed staging uses different config override (MAX_BATCH_SIZE=500) EVIDENCE CHAIN: User report → Reproduced locally → Noticed batch_size correlation → Searched codebase for limits → Found MAX_BATCH_SIZE → Checked staging config → Discovered config difference CORRECTED MISUNDERSTANDINGS: - Initially thought it was Redis connection exhaustion (disproven by monitoring) - Assumed gradual performance degradation (actually sharp cutoff) - Thought staging/production were identical (config differs) CURRENT HYPOTHESIS: Production deployment uses default MAX_BATCH_SIZE=100 from code, while staging has environment variable override. Fix requires either code change or prod config update. BLOCKED ON: Need production deployment access to apply fix. User considering whether to change code default or add production environment variable. RECOMMENDED NEXT STEPS: 1. Verify production environment variables (check if MAX_BATCH_SIZE is set) 2. If not set, add MAX_BATCH_SIZE=500 to production config 3. If code change preferred, update default in batch_handler.py 4. Run load test with batch_size=100-500 range to verify fix KEY MEMORIES FOR REFERENCE: - 'Initial timeout report from user' - Starting point of investigation - 'MAX_BATCH_SIZE discovery' - Root cause identification - 'Redis monitoring data' - Evidence disproving connection theory - 'Staging config analysis' - Explanation for environment difference" This cognitive handoff ensures you can continue the work with full understanding of the problem space, previous attempts, and current direction. The narrative preserves not just facts but the reasoning process, mistakes made, and lessons learned. SPECIAL CASE: restore_context("awakening") The name "awakening" is reserved for loading the user's personality configuration. This loads the Awakening Briefing which includes: - Selected persona identity and voice style - Custom personality traits (Premium+ users) - Any quirks and boundaries from the persona preset Args: name: Name or ID of context to restore. Can be: - Context name (exact match, case-sensitive) - Context UUID (from list_contexts output) - "awakening" for personality briefing limit: Maximum number of memories to restore (default 20) ctx: MCP context (automatically provided) Returns: Dict with: - success: Whether restoration succeeded - description: The cognitive handoff briefing - memories: List of relevant memories - context_id: The restored context identifier
    Connector
  • Pro/Teams — N-shot CONSENSUS doctrine review of agentic code. ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running (~80-120s for N=3 parallel LLM calls); MCP clients often close the call before the server returns. Retrying re-runs N × 60-180s LLM calls from scratch and burns N× compute. RECOVERY: same heartbeat pattern as architect.validate — the run_id is emitted in the FIRST progress event at t=0s (before LLM children fire); on timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted consensus envelope. Runs N parallel `architect.validate` calls with private_session=True, then aggregates them to a per-principle MODE verdict + median severity + per-principle stability + score range/stdev. Returns one ConsensusValidationResponse with the headline median score, the honest variance band, and a representative full ValidationResponse (the child whose score is closest to the median). WHEN TO CALL: the user wants an HONEST first-pass score on agentic code, with the architect's variance surfaced. The single-shot `architect.validate` re-asserts the prior persisted run's verdict via baseline-anchor injection — same code can score 60/C anchored vs 98/A unanchored. Consensus mode is the unanchored honest read. WHEN NOT TO CALL: when you NEED the iteration delta against a prior run (regressions/improvements panel) — for that, call `architect.validate` which keeps baseline injection on. CHAIN RESUME: each child runs with `private_session=True` (no anchor) on purpose, but the CONSOLIDATED outer row IS persisted with `lifecycle_status='completed'` — the next single-shot `architect.validate` on the same repository auto-resolves it as prior_run_baseline. Consensus checkpoint becomes the new anchor. See the `architect-validation-orchestration` skill in the agent-asset pack for the full validate → consensus → certify sequence. BEHAVIOR: N (default 3, max 5) parallel LLM calls run concurrently; wallclock ~80-120s for N=3 (max child latency, not sum). Cost = N × LLM bill. Each child runs with private_session=True so the doctrine prompt's prior-run baseline injection is suppressed (no anchor bias). One CONSOLIDATED `UserValidationRun` row is written carrying the consensus envelope; the N children themselves do NOT persist (private_session contract). AUTH: Bearer <token>, Pro/Teams plan. Same paid-plan gate as architect.validate. INPUTS: same shape as architect.validate. `n` is the only extra arg (range 2..5). `private_session` is implicit (always true for children); the OUTER consolidated row IS persisted unless the tool itself is called inside another private context — but no such wrapper exists today. OUTPUT: response carries `score_consensus_median` (headline), `score_stdev` (honest uncertainty), `score_range` (min, max), `mode_stability_min_pct` (the cert-eligibility gate's input — ≥ 80% means the consensus is stable), `per_principle` (mode + distribution + severity median per principle), and `representative_response` (the closest-to-median child's full ValidationResponse so existing UI components render unchanged). TYPED FAILURES: same as architect.validate (timed_out, rate_limited, dependency_unavailable). Plus consensus-specific: `consensus_quorum_failed` when fewer than 2 child runs succeeded (≥ 2 required to compute a meaningful median).
    Connector
  • The unit tests (code examples) for HMR. Always call `learn-hmr-basics` and `view-hmr-core-sources` to learn the core functionality before calling this tool. These files are the unit tests for the HMR library, which demonstrate the best practices and common coding patterns of using the library. You should use this tool when you need to write some code using the HMR library (maybe for reactive programming or implementing some integration). The response is identical to the MCP resource with the same name. Only use it once and prefer this tool to that resource if you can choose.
    Connector
  • Update your PingShield: change reputation threshold, replace whitelist/blacklist, update shield message, or pause/resume protection. Send only the fields you want to change. Requires API key. Returns: { updated, config }. Cost: $0.02.
    Connector
  • [chieflab_* alias of chiefmo_continue_launch_loop] Resume a ChiefLab launch loop from runId. USE WHEN an agent has already called chieflab_get_users_after_build / chiefmo_launch_product and needs the exact next action: surface reviewUrl, execute an approved action, wait for measurement, measure results, or prepare the next move. Default response is summary-sized: reviewUrl + action ids, not full draft bodies. Pass responseShape:"full" only for debug/export.
    Connector
  • AI-powered candidate screening and ranking for recruiters, hiring managers, ATS providers and recruitment AI agents. Ingests a job description and 1-50 candidate resumes, returning a ranked shortlist with score breakdowns across five weighted criteria: skills_match (tech stack and soft skills extracted from JD vs resume), experience_match (years vs seniority level inferred from JD), education_match (degree level + top-school detection), role_progression (Junior to Senior to Lead patterns), culture_fit_estimate (remote/hybrid, startup vs enterprise). Per candidate: overall_score 0-100, matched/missing skills, red_flags (job hopping, employment gaps, seniority mismatch), green_flags (long tenure, promotions), 3-5 interview questions, fit_summary. Diversity signals are first-name proxies ONLY with mandatory ethical WARNING. All processing is local -- no external API calls, instant response, privacy-preserving.
    Connector
  • Tailor a resume to a SPECIFIC job — TWO steps. STEP 1 (default; action omitted or 'prepare'): the server returns the job's full JD, its must-have skills/requirements, and the candidate's current resume, plus tailoring instructions. YOU (the model) then WRITE the tailored resume as JSON Resume, following the instructions — weave JD keywords into existing bullets only where the candidate genuinely has the experience, never fabricate experience/titles/dates/employers, keep all dates and company names, and flag any keyword you couldn't honestly add. STEP 2: call this tool again with action:'save', tailored_resume:<your JSON Resume>, and job_id — the server renders a PDF and saves it to the candidate's Workopia dashboard (requires sign-in). Use whenever the user references a specific job to tailor for: 'tailor for #1', 'for Morgan Stanley', 'tailor my resume for this role: <JD>'. Resolving job_id (same rules as job_detail_tool): from the most recent prior search/refine result — (a) numeric/ordinal → the Nth job; (b) company name → Company-field match; (c) role/title phrase → Job-Title match — then pass that job's **Job Id** value VERBATIM. Do NOT use placeholders like 'JOB_1' or '#1'. For STEP 1 supply ONE of job_id (preferred — server fetches the JD from Mongo) OR job_description, plus the candidate's resume via resume_text / resume_content / resume_data. For general 'improve my resume' (no specific job), do NOT call this tool — call resume_tool action=improve instead. Note: the tailored resume is written by your AI client's own model — the assistant you are already using — so it works out of the box with nothing to configure; Workopia runs no LLM of its own and never charges for the AI.
    Connector
  • Remove one or more domains from this account's trusted allowlist. Once removed, those domains resume normal scoring on the next check. Use list_allowlist to see what is currently on the list before removing.
    Connector