Skip to main content
Glama
138,647 tools. Last updated 2026-05-20 18:02

"A sandbox environment to run Python code and install dependencies" matching MCP tools:

  • Replay the sandbox test for one or more suites against captured mocks — re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match" — keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. Output produces a SANDBOX RUN REPORT — it answers "does the suite still hold up against its captured baseline?". ═══════════════════════════════════════════════════════════════════ DISAMBIGUATION — pick this tool vs. replay_test_suite: ═══════════════════════════════════════════════════════════════════ USE replay_sandbox_test (THIS TOOL) when the dev says: * "run my sandbox tests" / "replay my sandbox tests" * "integration-test my app" / "run the integration tests" * "check if my mocks still match" / "replay against the captured mocks" * "rerun my sandbox suite" (with the word "sandbox") Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" — silent signal that the dev wants captured-mock replay, NOT live-app execution. USE replay_test_suite INSTEAD when the dev says: * "run the test suite" / "run my test suites" (bare — no "sandbox") * "execute test suite X" / "run suite 810d3ebe…" * "test the suite again" / "smoke test against the live app" Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP — no docker spin-up, no mocks. After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently — code-change regressions can hide under "mock didn't match" failures. ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. SCOPE — whole-app vs single-suite: * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. Use this for "run my sandbox tests" / "check if my tests still pass" — whole-app regression. New suites auto-pick up. * Single / subset: PASS suite_ids when the dev names specific suites — "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure. This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record. WHAT THIS TOOL DOES INTERNALLY (so you don't have to): 1. Resolves app_id — use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first. 2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error. 3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test. 4. Returns the headless playbook — walk it exactly: spawn CLI in background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step — the CLI exits on its own. ===== PREREQUISITES ===== (Same as record_sandbox_test — if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.) - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and — if ok=true — call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written. ===== MANDATORY OUTPUT — Phase 3 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous): ### Phase 3 — Sandbox run report Under it, emit the uniform three-subsection format owned by get_session_report: (i) per-suite table — one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps. (ii) failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows. (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups). Do NOT print aggregate step totals across suites — they mix unrelated suites and hide where damage actually is. ===== ROLLUP LINE ===== Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._" ===== DO NOT ===== * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH. * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow — CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • WORKFLOW: Step 3 of 4 - Generate Terraform files from completed design Generate Terraform files from an InsideOut session that has completed infrastructure design. ⚠️ PREREQUISITE: Only call this AFTER convoreply returns with `terraform_ready=true` in the response metadata. DO NOT call this while convoreply is still running or before terraform_ready is confirmed! If you get 'session has not reached terraform-ready state', wait for convoreply to complete first. 🎯 USE THIS TOOL WHEN: convoreply has returned with terraform_ready=true, OR the user asks to 'see the terraforms', 'generate terraform', 'show me the code', etc. **DEFAULT RESPONSE**: Returns summary table + download URL (keeps code out of LLM context). **FALLBACK**: Set `include_code: true` to get full code inline if curl/unzip fails. **CRITICAL WORKFLOW** (default mode): 1. Call this tool to get file summary and download URL 2. ASK the user: 'Where would you like me to save the Terraform files? Default: ./insideout-infra/' 3. WAIT for user confirmation before running the download command 4. Run the curl/unzip command with the user's chosen directory 5. If curl/unzip FAILS (sandbox, security, platform issues), retry with `include_code: true` **AFTER GENERATION**: Ask user if they want to review the files and then deploy with tfdeploy REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: include_code (boolean) - set true to return full code inline as fallback. 💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
    Connector
  • Returns available payment and authentication options for accessing live market data. Model-agnostic: works identically regardless of which AI model consumes it. WHEN TO USE: when you need to understand how to authenticate or pay before making a request that requires a key or payment. Returns upgrade ladder: sandbox (200 calls free), x402 per-request ($0.001 USDC), x402 sandbox (10 credits for $0.001), credit packs ($5 = 1000 calls), builder subscription ($99/mo = 50K/day). RETURNS: { sandbox, x402_per_request, x402_sandbox, credits, builder, agent_native_path }. No authentication required. Always returns 200.
    Connector
  • Returns available payment and authentication options for accessing live market data. Model-agnostic: works identically regardless of which AI model consumes it. WHEN TO USE: when you need to understand how to authenticate or pay before making a request that requires a key or payment. Returns upgrade ladder: sandbox (200 calls free), x402 per-request ($0.001 USDC), x402 sandbox (10 credits for $0.001), credit packs ($5 = 1000 calls), builder subscription ($99/mo = 50K/day). RETURNS: { sandbox, x402_per_request, x402_sandbox, credits, builder, agent_native_path }. No authentication required. Always returns 200.
    Connector
  • Generate the exact CI workflow YAML to add keploy sandbox tests to a pull-request pipeline, and tell you where to write it. Use this when the dev asks to "add keploy sandbox tests to my pipeline" / "wire keploy into CI" / "run keploy on PR" / "add a CI job for keploy" — the server emits the file contents verbatim so you don't have to compose the flag list yourself. ===== GOAL ===== Write a CI workflow file that runs `keploy test sandbox --cloud-app-id <uuid> --app-url <url>` on pull requests and gates the PR on the result. NEVER kick off an actual test run in this flow — it is pure file authoring, ends with the file on disk. DO NOT fire replay_sandbox_test, record_sandbox_test, replay_test_suite, or any other run-starting MCP tool here. ===== HOW (absolute) ===== Call this tool. It returns { file_path, content, summary }. Write the "content" to "file_path" VERBATIM via your Write tool — NO flag renames, NO flag removals, NO step reordering, NO synthesis. The server owns the YAML template; your job is only to (1) resolve the inputs from the repo and api-server and (2) Write the returned content. Do NOT compose the YAML yourself from general knowledge — flag drift (missing --cloud-app-id, inventing --app) is the most common bug when Claude improvises. DO NOT ASK the dev for confirmation before writing. Resolve everything from the repo + api-server, pick the GitHub Actions default, call this tool, Write the file. The dev's prompt is already the go-ahead. ===== STEPS ===== 1. DETECT THE CI SYSTEM: * Default = GitHub Actions (biggest share). File = .github/workflows/keploy-sandbox.yml. * If .gitlab-ci.yml exists → GitLab (not yet supported by this tool; tell the dev and stop). * If .circleci/config.yml exists → Circle (not yet supported; tell the dev and stop). * Otherwise → GitHub Actions. 2. RESOLVE VALUES by calling MCP tools + reading the repo: * app_id: call listApps({q: "<cwd basename>"}). Exactly one → use its id. Multiple → pick the one whose name most specifically matches the repo's primary service (e.g. "orderflow.producer" wins over "orderflow" when there's a ./producer directory); mention which you picked in the final message. Zero → stop and tell the dev to create the app + rerecord first. * suite_ids: DO NOT pass this arg by default. An empty suite_ids means the CLI resolves "every linked sandbox suite for the app" at CI run time — which is what you want (new suites auto-pick up without workflow edits). The tool still verifies there's ≥1 linked suite at scaffold time so the first PR run doesn't fail empty-handed. Only pass suite_ids when the dev explicitly narrows ("run only the auth suite in CI"); don't pin "all current suites" — that's staleness waiting to happen. * compose_file: READ THE REPO. Default is docker-compose.yml. AVOID passing a docker-compose-keploy.yaml variant that has `networks: default: external: true` — those variants only work locally, where another compose run has already created the external network. In CI the runner starts clean and `external: true` fails with "network not found". If the primary docker-compose.yml brings up the full app (deps + app service), use it end-to-end. * app_service, container_name, app_port: read from the SAME compose_file you picked above. app_service = the service key (e.g. "producer"); container_name = that service's container_name: field in that same compose file (e.g. "orderflow-producer" if compose_file=docker-compose.yml, but "producer" if compose_file=docker-compose-keploy.yaml — THESE DIFFER, pick consistently); app_port = the host-side of its ports: mapping. * app_url = http://localhost:<app_port>. The tool derives this; you don't pass it separately. 3. CALL THIS TOOL with app_id, app_service, container_name, app_port, compose_file (and suite_ids only if the dev explicitly narrowed scope). It returns { file_path, content, summary }. Write the "content" to the "file_path" VERBATIM. ===== FLAG NAME RULES (absolute, do not drift when reviewing the output) ===== * `--cloud-app-id` ← NOT `--app-id`. The OSS config has an `appId` uint64 field that viper maps `--app-id` into; passing a UUID there fails with "invalid syntax" before RunE runs. * `keploy test sandbox --cloud-app-id <uuid> --app-url <url>` ← the CI form. NOT `keploy test --cloud-app-id` (must be `test sandbox` — the headless flags live on the sandbox subcommand only), NOT `keploy test-suite run` (that command doesn't exist). There is NO `--pipeline` flag. * Install URL = `https://keploy.io/ent/install.sh` ← NOT `https://keploy.io/install.sh` (OSS; no sandbox subcommand at all), NOT a github.com/keploy/keploy release tarball. If the server-emitted content ever disagrees with these rules, trust the server output and file a bug — don't edit the YAML. ===== RESOLUTION ARGS ===== * Pass either app_id (explicit UUID) or app_name_hint (substring; server does listApps and requires exactly one match). * Pass app_service (docker-compose service name), container_name (from compose container_name: field read from the SAME compose_file arg), and app_port (HTTP port the service exposes). * compose_file is optional, defaults to "docker-compose.yml". If the repo has a -keploy.yaml variant with `external: true` networks, do NOT point compose_file at it — it won't work in CI. * suite_ids is optional and should be LEFT BLANK by default — the CLI resolves every linked suite at run time. Only pin an explicit list when the dev narrows scope. ===== FINAL RESPONSE — three short sections, no questions ===== ### Created | File | Lines | | --- | --- | | .github/workflows/keploy-sandbox.yml | N | ### Summary - App: <name> (<app_id>), <N> linked suites replayed on every PR - Trigger: pull_request → main, + manual workflow_dispatch - Failure on any suite gates the PR (non-zero exit from the CLI) ### Before the first run, add this GitHub secret - `KEPLOY_API_KEY` — at https://github.com/<owner>/<repo>/settings/secrets/actions/new (self-hosted users — point at your own api-server by building the enterprise binary with -X main.api_server_uri=<url>; there is no runtime env override on the released binary.) This tool does NOT run anything. It only generates file contents.
    Connector
  • Use only after explicit user confirmation and a prior prepare_publish result to publish or schedule a Dreamlit workflow. Side effect: installs live database/repeating/auth triggers, schedules or sends broadcasts, and may enable notification delivery; sandbox mode can hold notifications for inspection. Returns the published workflow status and app URLs. Do not call speculatively or without carrying forward the prepare_publish safety fields.
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Cloudflare Workers MCP server: code-explainer

  • Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

  • WORKFLOW: Step 3 of 4 - Generate Terraform files from completed design Generate Terraform files from an InsideOut session that has completed infrastructure design. ⚠️ PREREQUISITE: Only call this AFTER convoreply returns with `terraform_ready=true` in the response metadata. DO NOT call this while convoreply is still running or before terraform_ready is confirmed! If you get 'session has not reached terraform-ready state', wait for convoreply to complete first. 🎯 USE THIS TOOL WHEN: convoreply has returned with terraform_ready=true, OR the user asks to 'see the terraforms', 'generate terraform', 'show me the code', etc. **DEFAULT RESPONSE**: Returns summary table + download URL (keeps code out of LLM context). **FALLBACK**: Set `include_code: true` to get full code inline if curl/unzip fails. **CRITICAL WORKFLOW** (default mode): 1. Call this tool to get file summary and download URL 2. ASK the user: 'Where would you like me to save the Terraform files? Default: ./insideout-infra/' 3. WAIT for user confirmation before running the download command 4. Run the curl/unzip command with the user's chosen directory 5. If curl/unzip FAILS (sandbox, security, platform issues), retry with `include_code: true` **AFTER GENERATION**: Ask user if they want to review the files and then deploy with tfdeploy REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: include_code (boolean) - set true to return full code inline as fallback. 💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
    Connector
  • Re-deploy skills WITHOUT changing any definitions. ⚠️ HEAVY OPERATION: regenerates MCP servers (Python code) for every skill, pushes each to A-Team Core, restarts connectors, and verifies tool discovery. Takes 30-120s depending on skill count. Use after connector restarts, Core hiccups, or stale state. For incremental changes, prefer ateam_patch (which updates + redeploys in one step).
    Connector
  • PREVIEW: Run terraform plan to preview infrastructure changes Runs a terraform plan for an InsideOut session without applying any changes. This lets the user review what will be created/changed/destroyed before committing. Returns job_id, plan_id, and project_id. Use tflogs to stream the plan output. After the plan completes, use tfdeploy with plan_id to apply the exact plan. SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfplan returns tf_job_conflict with the live job_id — attach with tfstatus/tflogs, or pass force_new=true to override. REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: sandbox (boolean, default false) — plans real generated Terraform. Set to true for cheap sandbox template (testing only). OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged. CREDENTIAL HANDLING: Same as tfdeploy - credentials must be configured first.
    Connector
  • Start the transfer verification flow by sending a code to the registrant's email. Always call this before get_transfer_code or unlock_domain. Then ask the user to check their email and provide the 6-digit code, then call verify_transfer_code to get a transfer_token. Args: order_id: The order ID of a completed domain purchase.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Record mocks for V1 repo-mode API tests using the V1-native CLI command `keploy sandbox local record`. Runs the dev's app under the keploy eBPF agent, drives the V1 chained-CRUD tests from `keploy/api-tests/<resource>/test.yaml`, captures every outbound call (DB queries, Redis ops, downstream HTTP) as mocks, and lays them out at `<app_dir>/keploy/<suite-name>/{tests/, mocks.yaml, config.yaml}` in the standard OSS test-set tree. On success, mocks upload to the Keploy canonical pool by content hash; the hash lands in config.yaml so a teammate's later replay fetches the same bytes. CRITICAL — DO NOT CONFUSE WITH `keploy record sandbox`: * `keploy sandbox local record` (V1, repo-mode) ← this is what the playbook below uses * `keploy record sandbox` (legacy, cloud-mode) ← DO NOT call this for V1 The two are entirely different commands. Cloud-mode requires server-side suites (queried via --suite-ids) — V1 repo-mode reads tests from the local filesystem and never registers them in the cloud. If the dev is in repo storage mode (verify via devloop_resolve_storage's source=persisted, mode=repo), V1 is the ONLY correct sandbox path. STRICT — TIME-FREEZING DOES NOT APPLY TO RECORD. Recording MUST use the dev's regular (prod) Dockerfile or native binary. NEVER spawn the app via Dockerfile.keploy / "-f docker-compose.keploy.yml" / "-tags=faketime" build during record. The faketime binary writes wrong timestamps into captured mocks (it reads time from the offset file, not the wall clock) and the entire capture becomes corrupt — recovery requires re-recording from scratch with the prod binary. If a previous replay failed with expired-JWT and the dev wants to "fix" it, the fix is to re-RUN the replay with --freezeTime, NOT to re-record. The recorded mocks captured against the prod binary are exactly what replay's clock-rewind is designed to validate; touching the record path defeats the whole mechanism. ONLY call this with an explicit dev opt-in. The valid triggers: * Dev directly asks ("capture mocks", "sandbox record", "rerecord the users mocks"). * Post-resource menu (Step 5 of devloop_generate_resource_flow) — dev picks "Capture mocks so CI runs in seconds". * get_session_report shows mock_mismatch_dominant=true AND the dev says yes to your "rerecord?" prompt. Pre-conditions: * Dev's app must NOT already be running (keploy spawns its own copy of the app under the agent's eBPF hooks via the -c command). If a server is up at the target port, KILL IT first or the agent's network capture won't see the traffic. * Real downstream deps (MySQL, Redis, Kafka, etc.) MUST be running — the capture proxies through to them on first contact so the recorded mocks contain real responses. * The test YAML must exist at <app_dir>/keploy/api-tests/<resource>/test.yaml. Returns a playbook for `keploy sandbox local record` with the V1 flag surface: --test-dir, --app-url, -c (spawn command), --container-name (docker-compose only), --skip-mock-upload (offline), --skip-report-upload (offline). Mocks land per-suite at keploy/<suite-name>/. NDJSON progress at --progress-file for the standard tail-til-done loop.
    Connector
  • Audit a project's dependencies in one shot. Returns a single-sentence `verdict` (e.g. "DO NOT INSTALL — 1 hallucinated: fastapi-turbo") that an agent can paste into its reply, plus per-package health/vulns/recommendation. Detects hallucinated packages, deprecated, typosquats, critical vulnerabilities. Accepts EITHER {ecosystem, packages:[name@ver, …]} (up to 100, returns JSON) OR {packages:[{ecosystem, package}, …]} (up to 50, mixed ecosystems, returns text brief). USE WHEN: user pastes package.json/requirements.txt/Cargo.toml; agent generated install command; 'is my stack OK'. RETURNS: JSON with `verdict`, `project_risk`, `summary.hallucinated_packages`, `summary.deprecated_packages`, per-package health.
    Connector
  • Pro/Teams — second-pass adversarial certification of an architect.validate run that scored production_ready (A or B first-pass tier). ON CLIENT TIMEOUT — DO NOT RETRY THIS TOOL. Long-running LLM call (60-180s typical; exceeds Claude Code's ~60s idle budget); MCP clients commonly close the call before the server returns. Retrying re-runs the LLM call AND burns one of your 3 cert retry-budget attempts. RECOVERY: the run_id is emitted in the FIRST notifications/progress event at t=0s — capture it. On timeout, call `me.validation_history(run_id='<that-id>')` to fetch the persisted cert verdict; the server-side run completes independently. Mints the certified production_ready badge when both reviewers sign off; caps the run to C/emerging when the second pass surfaces a missed production_blocker. MANDATORY DOCTRINE RULE (load-bearing): the badge certifies the EXACT code that produced the validate run_id, NOT 'this codebase' in general. If you modify, fix, or iterate the code between architect.validate and architect.certify — even a single character — cert rejects with code_fingerprint_mismatch. Fixing the code voids the run. The recovery path is always: edit code → architect.validate → fresh run_id → architect.certify on the fresh run. Do NOT cert from a stale run_id after iteration; ask the user to re-validate first. WHEN TO CALL: only after architect.validate returned tier=production_ready AND the user wants the certified badge AND the code has not been touched since the validate run. NOT for tier=draft/emerging/not_applicable runs (typed rejections fire — see below). NOT idempotent across attempts: each call is one of the 3 attempts in the retry budget. BEHAVIOR: atomic one-shot single LLM call, ~60-180s server-side at high reasoning effort (small payloads finish faster; observed p99 ~250s; server-side budget is 20 min, ~5× observed max). Exceeds typical MCP-client tool-call idle budget (~60s in Claude Code), so the FIRST notifications/progress event fires at t=0 carrying the run_id. The run is atomic by contract — no in_progress lifecycle, no cancellation, no resume. Updates the persisted run's result_json (public review URL + me.validation_history(run_id=...) reflect the cert outcome). ELIGIBILITY GATE (typed rejection enum on failure): caller must own the run, tier=production_ready, less than 24h old, not already certified, within cert retry budget (max 3 attempts), no other cert call in flight for the same run_id, code fingerprint must match the validated code, AND the submitted payload must be cert-payload-complete (see Payload Completeness below — cert rejects pre-LLM with `payload_incomplete` when an imported module's surface isn't visible in the validate payload that produced this run_id). Rejection reasons (typed Literal): auth_required, paid_plan_required, run_not_found, not_run_owner, not_eligible_tier, not_agentic_component (tier=not_applicable runs), already_certified, certification_age_exceeded, retry_budget_exhausted, code_fingerprint_mismatch, code_fingerprint_missing, code_not_on_file (caller omitted `code` argument AND the 24h cert-retry hold for this run has expired or was never written. Recovery: re-run architect.certify from the same MCP session that ran architect.validate, passing the code explicitly — the server never persists code by design), payload_incomplete (submitted/validated payload imports modules whose contents aren't visible — cert refuses pre-LLM to prevent a false-precision downgrade. Recovery: re-validate with verbatim public-surface stubs for every imported module, then re-cert on the fresh run_id. Empirically validated: PR #157 iter8/iter9 cert rejections were exactly this class — code on disk was correct, the submitted payload merely omitted module visibility), cert_consensus_score_below_threshold (consensus_median<75 — consensus runs only), cert_consensus_unstable_blocker (any principle mode_stability<80% — consensus runs only), run_state_corrupt, cert_persistence_failed, cert_in_flight (a prior architect.certify call on this run_id is still running. Poll me.validation_history for the verdict; do not retry until it resolves). PAYLOAD COMPLETENESS (load-bearing for cert eligibility): the cert reviewer reads the EXACT payload that produced the validate run_id. Imported modules whose surface isn't present in the payload cause pre-LLM `payload_incomplete` refusal. Avoidance — when validating with intent to cert, bundle public-surface stubs for every imported module: `from sqlalchemy.exc import SQLAlchemyError` → include a stub class; `from app.db import models` → include a `class models:` namespace stub with the columns/methods you reference; module-level imports of `dataclass`, `Literal`, `json`, `datetime`, `timezone` MUST also be in the payload (cert correctly catches when they're omitted — code would NameError on import). 'Submit Like Production': the payload should be the code as it would actually run, not a compressed sketch. PRE-LLM REJECTION AUDIT TRAIL: when cert rejects before the LLM call (payload_incomplete, code_fingerprint_mismatch, etc.), `certification_attempts=[]` on the response — no attempt landed in the retry budget, no LLM hop occurred. The rejection envelope's `rejection_reason` + `guidance` are the actionable surface. (Audit-trail UI surfacing of pre-LLM rejections is tracked in the platform self-audit set as anomaly #5; out of scope for the cert tool itself.) INPUTS: re-send the SAME code that produced the run_id (the architect persists findings + recommendations, never code, by design — privacy-preserving). Server compares the submitted code's SHA-256 fingerprint to the stored fingerprint and rejects mismatches. Auth: Bearer <token>, Pro or Teams plan required. UK/EU data residency (Cloud Run europe-west2). Code processed transiently by OpenAI (no-training-on-API-data) and dropped; payloads JSON-escaped + delimited as inert untrusted data — prompt-injection inside code is ignored. If the cert call fails outright (provider error, persistence error), a fresh architect.certify is the recovery path; the eligibility gate enforces the 3-attempt retry budget. For long-running cert workflows the answer is to re-validate, not to make this tool stateful. OUTCOMES: certification_status ∈ {confirmed_production_ready (badge mints), downgraded_to_emerging (cert review surfaced a missed production_blocker, tier capped at C/emerging), unavailable_provider_error (LLM call failed, retry within budget)}. Cert findings + summary + attempt history surfaced on the persisted run for full inspectability.
    Connector
  • Read-only. Use to find workflows in a project by name, description, or trigger type before inspection or editing. Trigger filters include database, auth email, repeating, broadcast, and no-trigger workflows. Returns paginated workflow summaries, published/sandbox state, trigger type, workflow URLs, totalCount, hasMore, and nextOffset. Do not use as the final source of truth before editing; call get_workflow_and_preview_url for full structure.
    Connector
  • PREVIEW: Run terraform plan to preview infrastructure changes Runs a terraform plan for an InsideOut session without applying any changes. This lets the user review what will be created/changed/destroyed before committing. Returns job_id, plan_id, and project_id. Use tflogs to stream the plan output. After the plan completes, use tfdeploy with plan_id to apply the exact plan. SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfplan returns tf_job_conflict with the live job_id — attach with tfstatus/tflogs, or pass force_new=true to override. REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: sandbox (boolean, default false) — plans real generated Terraform. Set to true for cheap sandbox template (testing only). OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged. CREDENTIAL HANDLING: Same as tfdeploy - credentials must be configured first.
    Connector
  • POST /tools/tool_compute_sandbox/run — Executes Python 3.12 code in an isolated subprocess with a 5-second hard timeout. Input: {python_code: string, input_data: any (optional, bound as variable 'input_data')}. Output: {success, result, stdout (capped 50KB), execution_time_ms, error_type}. Return value: assign to 'result' variable. Pre-loaded: math, json, re, statistics, itertools, functools, collections, decimal, datetime, random, hashlib, base64. Blocked: import, open(), eval(), exec(), os, sys, network, class definitions, dunder attributes. error_type values: syntax_error | security_error | runtime_error | timeout_error. Cost: $0.1500 USDC per call.
    Connector
  • Connect to the user's catalogue using a pairing code. IMPORTANT: Most users connect via OAuth (sign-in popup) — if get_profile already works, the user is connected and you do NOT need this tool. Only use this tool when: (1) get_profile returns an authentication error, AND (2) the user shares a code matching the pattern WORD-1234 (e.g., TULIP-3657). Never proactively ask for a pairing code — try get_profile first. If the user does share a code, call this tool immediately without asking for confirmation. Never say "pairing code" to the user — just say "your code" or refer to it naturally.
    Connector
  • Find working SOURCE CODE examples from 37 indexed Senzing GitHub repositories. REQUIRED: either `query` (string, for search) or `repo` with `file_path` or `list_files=true` — the call WILL FAIL without one. Three modes: (1) Search: pass `query` to find examples across all repos, (2) File listing: pass `repo` + `list_files=true`, (3) File retrieval: pass `repo` + `file_path`. Indexes source code (.py, .java, .cs, .rs) and READMEs — NOT build/data files. For sample data, use get_sample_data. Covers Python, Java, C#, Rust SDK patterns: initialization, ingestion, search, redo, configuration, message queues, REST APIs. Use max_lines to limit large files. Returns GitHub raw URLs for file retrieval.
    Connector