Skip to main content
Glama
136,556 tools. Last updated 2026-05-26 03:32

"Alacritty Terminal Emulator" matching MCP tools:

  • Replay the sandbox test for one or more suites against captured mocks — re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match" — keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. Output produces a SANDBOX RUN REPORT — it answers "does the suite still hold up against its captured baseline?". ═══════════════════════════════════════════════════════════════════ DISAMBIGUATION — pick this tool vs. replay_test_suite: ═══════════════════════════════════════════════════════════════════ USE replay_sandbox_test (THIS TOOL) when the dev says: * "run my sandbox tests" / "replay my sandbox tests" * "integration-test my app" / "run the integration tests" * "check if my mocks still match" / "replay against the captured mocks" * "rerun my sandbox suite" (with the word "sandbox") Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" — silent signal that the dev wants captured-mock replay, NOT live-app execution. USE replay_test_suite INSTEAD when the dev says: * "run the test suite" / "run my test suites" (bare — no "sandbox") * "execute test suite X" / "run suite 810d3ebe…" * "test the suite again" / "smoke test against the live app" Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP — no docker spin-up, no mocks. After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently — code-change regressions can hide under "mock didn't match" failures. ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. SCOPE — whole-app vs single-suite: * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. Use this for "run my sandbox tests" / "check if my tests still pass" — whole-app regression. New suites auto-pick up. * Single / subset: PASS suite_ids when the dev names specific suites — "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure. This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record. WHAT THIS TOOL DOES INTERNALLY (so you don't have to): 1. Resolves app_id — use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first. 2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error. 3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test. 4. Returns the headless playbook — walk it exactly: spawn CLI in background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step — the CLI exits on its own. ===== PREREQUISITES ===== (Same as record_sandbox_test — if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.) - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and — if ok=true — call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written. ===== MANDATORY OUTPUT — Phase 3 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous): ### Phase 3 — Sandbox run report Under it, emit the uniform three-subsection format owned by get_session_report: (i) per-suite table — one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps. (ii) failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows. (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups). Do NOT print aggregate step totals across suites — they mix unrelated suites and hide where damage actually is. ===== ROLLUP LINE ===== Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._" ===== DO NOT ===== * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH. * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow — CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Poll a video generation task Retrieve the status and result of a video generation task. Call with the `taskId` returned by the submit endpoint and poll periodically (e.g. every 5-10 seconds) until `status` is terminal. On `succeeded`, `output.video_url` holds the generated video; on `failed`, `error` holds the reason. Available to CONTRACT-tier API keys only. The first poll that observes `succeeded` charges the task cost in USD from the account wallet and returns the provider token usage in `meta.tokensUsage`. Polls of an already-terminal task return the cached result and are not charged. The per-call response carries no USD amount — derive cost from `meta.tokensUsage` and the published video pricing, or call `GET /openapi/v2/account/balance`. Path parameter: - `task_id` (string, required): the `taskId` from the submit call. Returns 404 if it does not exist or belongs to another account. Response `data`: - `taskId` (string): the polled task id. - `status` (string): one of `pending`, `running`, `succeeded`, `failed`, `cancelled`. The last three are terminal. - `output` (object | null): on `succeeded`, an object with `video_url` plus any additional provider fields; `null` otherwise. - `error` (object | null): on `failed`, a `{code, message}` object with the provider failure reason; `null` otherwise. Response `meta`: - `tokensUsage` (object | null): provider token usage (`completionTokens`, `totalTokens`, ...) on any terminal poll; `null` while `pending` / `running`. ### Responses: **200**: Successful Response (Success Response) Content-Type: application/json **Example Response:** ```json { "success": true, "meta": { "requestId": "Requestid", "timestamp": "Timestamp" } } ``` **Output Schema:** ```json { "properties": { "success": { "type": "boolean", "title": "Success", "description": "Whether the request was successful", "default": true }, "data": { "description": "Response data payload" }, "error": { "description": "Error details if request failed" }, "meta": { "description": "Metadata for API responses.\n\nCredit fields follow the ADR-0003 parallel-fields strategy (Option 3):\n- `credits_remaining` / `credits_consumed` (int): legacy fields, rounded\n to whole credits, kept for zero-breaking-change to existing SDK clients.\n- `credits_remaining_exact` / `credits_consumed_exact` (float): new\n precision-aware fields for clients that opt in to decimal credits.\n\nSee ADR-0003 decision 5 and the \u00a78 deprecation timeline.\n\nTODO(2026-11, ADR-0003 \u00a78 +6mo): mark `credits_remaining` /\n`credits_consumed` as `deprecated=True` in their Field() definitions\nand announce in customer changelog.\nTODO(2027-05, ADR-0003 \u00a78 +12mo): remove the legacy int fields via a\nmajor-version bump of the OpenAPI surface.", "properties": { "requestId": { "type": "string", "title": "Requestid", "description": "Unique request identifier" }, "timestamp": { "type": "string", "title": "Timestamp", "description": "Response timestamp in ISO 8601 format" }, "total": { "title": "Total", "description": "Total number of records" }, "page": { "title": "Page", "description": "Current page number" }, "pageSize": { "title": "Pagesize", "description": "Number of records per page" }, "totalPages": { "title": "Totalpages", "description": "Total number of pages" }, "creditsRemaining": { "title": "Creditsremaining", "description": "Remaining API credits (rounded to whole credits; see creditsRemainingExact for precise value)" }, "creditsConsumed": { "title": "Creditsconsumed", "description": "Credits consumed by this request (rounded; see creditsConsumedExact for precise value)" }, "creditsRemainingExact": { "title": "Creditsremainingexact", "description": "Remaining API credits, precise to 1 decimal place" }, "creditsConsumedExact": { "title": "Creditsconsumedexact", "description": "Credits consumed by this request, precise to 1 decimal place" }, "tokensUsage": { "description": "Provider token-usage block \u2014 populated on terminal video polls only, null on every non-video endpoint. See TokensUsage for its fields." } }, "type": "object", "required": [ "requestId", "timestamp" ], "title": "ResponseMeta" } }, "type": "object", "required": [ "meta" ], "title": "OpenApiResponse[VideoTaskData]", "examples": [] } ``` **422**: Validation Error Content-Type: application/json **Example Response:** ```json { "detail": [ { "loc": [], "msg": "Message", "type": "Error Type", "ctx": {} } ] } ``` **Output Schema:** ```json { "properties": { "detail": { "items": { "properties": { "loc": { "items": {}, "type": "array", "title": "Location" }, "msg": { "type": "string", "title": "Message" }, "type": { "type": "string", "title": "Error Type" }, "input": { "title": "Input" }, "ctx": { "type": "object", "title": "Context" } }, "type": "object", "required": [ "loc", "msg", "type" ], "title": "ValidationError" }, "type": "array", "title": "Detail" } }, "type": "object", "title": "HTTPValidationError" } ```
    Connector
  • Opens a persistent SSE connection that emits events as the task progresses. The stream closes automatically when the task reaches a terminal state or after ~90 seconds (timeout). Heartbeat comments are sent every ~15 seconds to keep the connection alive through proxies. Event types: - `status` — emitted when status changes (pending → running → complete/failed) - `result` — emitted on `complete` with the full result payload - `error` — emitted on `failed`, `cancelled`, or `expired` with error info - SSE comment (`: heartbeat`) — keepalive, no data Use this tool when: - You want real-time progress without polling. - You are in an environment that supports SSE (EventSource API). Do NOT use this tool when: - You want a simple one-shot status check — use `get_task` instead. - Your HTTP client doesn't support streaming responses. Inputs: - `task_id` (path, required): 26-char ULID. Returns: - SSE stream (`text/event-stream`). Each event is `event: <type>\\ndata: <json>\\n\\n`. Cost: - Free. Counts as one request against rate limits when the stream opens. Latency: - First event: <200ms. Stream duration: up to 90s.
    Connector
  • Creates a new perspective in DRAFT status from a natural-language description and starts the design agent. Returns immediately with a job_id and status "pending"; long-poll perspective_await_job with that job_id to receive the generated outline or follow-up question. Behavior: - Creates a new perspective on every call — not safe to retry blindly. Identical input produces a new perspective each time. - If workspace_id is omitted, the user's default workspace is used; errors with "No default workspace found..." if none exists. - Tip: use workspace_list to see all workspaces with their descriptions, then pick the best-matching workspace_id based on context. - Title is auto-generated from the description. - The design agent runs in the background and may take seconds to a minute. Resolve via perspective_await_job; terminal states are "ready" (outline generated, share/direct/preview URLs returned) or "needs_input" (follow-up question requires the user's answer). - description can reference research goals, source URLs, or audience details. Examples: "understand why trial users aren't converting", "convert the form at https://example.com/contact", "talk to churned customers from Q3". - agent_context selects the agent role: 'research' = Interviewer (default; deep qualitative interviews), 'form' = Concierge (replaces static forms with conversational flow), 'survey' = Evaluator (turns surveys into engaging conversations), 'advocate' = Advocate (listens, then responds from a brand/cause playbook). When to use this tool: - The user wants to create a new perspective from a brief. - You're starting the design conversation that may iterate via perspective_respond. When NOT to use this tool: - The perspective already exists and the user wants to change it — use perspective_update. - The agent already asked a follow-up question — use perspective_respond with the user's answer. - Listing or finding existing perspectives — use perspective_list. Typical flow: 1. perspective_create → start design (returns job_id) 2. perspective_await_job → long-poll until "ready" or "needs_input" 3. perspective_respond → if "needs_input", answer and re-poll 4. perspective_get_preview_link → test 5. perspective_update → refine 6. perspective_get_embed_options → deploy
    Connector
  • POST /tools/sa-airport-oracle/run — Returns live flight status from ACSA (airports.co.za). Input: {airport_code: 'JNB'|'CPT'|'DUR', flight_number: string, request_type: 'arrival'|'departure'}. Output: {success, live_status, scheduled_time, estimated_time, actual_time, gate, carousel, terminal, flight_number, airport_code, request_type, error}. Coverage: JNB (O.R. Tambo), CPT (Cape Town Int'l), DUR (King Shaka). Data window: flights within 48 hours. Call GET /tools/sa-airport-oracle/health (free) first — if structure_valid=false, do not proceed. error_type values: 'stale_data' (do not retry), 'not found' (retry after 10-15 min), network error (retry once). flight_number is case-insensitive and normalised to uppercase internally. Read-only — no booking/ticketing. Cost: $0.1200 USDC per call.
    Connector
  • Poll a checkout session for status updates. Call this after complete_checkout to track payment and provisioning. Polling strategy: - First 60 seconds: every 5 seconds - After 60 seconds: every 15 seconds - Stop after 10 minutes if not completed Checkout statuses (in order): - "not_ready": Missing required fields (slug) - "ready": All fields set, awaiting payment - "awaiting_payment": Stripe checkout page opened, waiting for human - "in_progress": Payment received, site being provisioned - "completed": Site ready — API key included (shown once, then cleared) - "canceled": Checkout was abandoned - "failed": Payment or provisioning failed Terminal statuses: "completed", "canceled", "failed". Args: checkout_id: Checkout session ID Returns (when completed): {"id": "uuid", "status": "completed", "api_key": "bh_...", "api_key_message": "Store this API key securely...", "subscription_id": "uuid", "completed_at": "iso8601"} Note: The api_key field appears ONCE in the first poll after completion, then is permanently cleared. Store it immediately.
    Connector

Matching MCP Servers

  • F
    license
    A
    quality
    C
    maintenance
    Enables management of visible, interactive terminal sessions across platforms (macOS, Windows, Linux, WSL). Supports creating, executing commands, capturing output, and managing multiple terminal windows simultaneously.
    Last updated
    5
    1
  • A
    license
    A
    quality
    C
    maintenance
    Enables AI agents to interact with terminal-based TUI applications by capturing visual terminal output as PNG screenshots and simulating keyboard input through a virtual X11 display.
    Last updated
    4
    2
    Apache 2.0
  • Mark a gathering as cancelled. Works from any non-terminal state (draft, awaiting_responses, live, rescheduled). Records the cancellation reason in the audit log if provided. Already-issued invites stay in the database (audit trail) but the RSVP page will show the gathering as cancelled. Requires API key authentication.
    Connector
  • Open a formal dispute on a task. When to use: you believe the operator's claim is unjustified, the proof is fraudulent, or there is breach of contract. Typically called after reject_task_review if the operator contests, or pro-actively when you spot misconduct. Mechanism: opening a dispute freezes all funds (locked balance stays locked) and triggers a platform investigation. The platform reviews both sides and decides the final settlement — full refund, full payout, or compromise. Funds remain frozen until the dispute is resolved. Typical resolution time: 1-3 days. Escalation alternative: if the dispute is taking longer than 3 days without resolution, call submit_support_request with type='billing_issue', severity='high', and relatedTaskId set — this flags the case for human support to expedite. Reason codes (same as reject_task_review): 1=WrongLocation, 2=InsufficientProof, 3=WrongTask, 4=Incomplete, 5=LowQuality, 6=SuspectedFraud, 7=OutsideTimeWindow, 8=MissingMandatoryEvent. Requires authentication. Next: monitor task.disputed → terminal state via get_task_events.
    Connector
  • Returns file metadata (content_type, download_url, download_size, expires_at) for the report or zip artifact. Use artifact='report' (default) for the interactive HTML report (~700KB, self-contained with embedded JS for collapsible sections and interactive Gantt charts — open in a browser). Use artifact='zip' for the full pipeline output bundle (md, json, csv intermediary files that fed the report). While the task is still pending or processing, returns {ready:false,reason:"processing"}. Check readiness by testing whether download_url is present in the response. Once ready, present download_url to the user or fetch and save the file locally. Download URLs expire after 15 minutes (see expires_at); call plan_file_info again to get a fresh URL if needed. Terminal error codes: generation_failed (plan failed), content_unavailable (artifact missing). Unknown plan_id returns error code PLAN_NOT_FOUND.
    Connector
  • FIRST STEP in any troubleshooting workflow. Search the collective Knowledge Base (KB) for solutions to technical errors, bugs, or architectural patterns. Uses full-text search across titles, content, tags, and categories. Results are ranked by relevance and success rate. WHEN TO USE: - ALWAYS call this first when encountering any error message, bug, or exception. - Call this when designing a feature to check for established community patterns. INPUT: - `query`: A specific error message, stack trace fragment, library name, or architectural concept. - `category`: (Optional) Filter by category (e.g., 'devops', 'terminal', 'supabase'). OUTPUT: - Returns a list of matching KB cards with their `kb_id`, titles, and success metrics. - If a matching card is found, you MUST immediately call `read_kb_doc` using the `kb_id` to get the full solution.
    Connector
  • Get the status of a domain purchase order. Polls the backend every 3 seconds (up to 120 seconds) until the order reaches a terminal state (complete or failed). Args: order_id: The order ID returned from buy_domain (e.g. "ord_abc123").
    Connector
  • Block until an order reaches a terminal state (FILLED, PARTIAL, FAILED, or CANCELLED) by polling get_order at fixed intervals. Use this right after create_order when you need to confirm the energy/bandwidth has actually been delegated on-chain before sending the next transaction. Returns the final order details including the on-chain delegation tx hash. Auth required.
    Connector
  • Record (or refresh) the sandbox test for one or more existing test suites — captures the request/response per step plus the outbound mocks (DB, downstream HTTP, etc.) against the dev's locally-running app, then links the captures onto the suite. Use this when the dev says "record", "rerecord", "re-record", "refresh the recordings", "capture mocks", or as the RECORD step in FROM-SCRATCH (after create_test_suite). This tool resolves the app (if only a hint is given), resolves ONE OR MORE suites to record (by exact ids OR case-insensitive name substring match), and delegates to a headless playbook. Output produces a RERECORD REPORT — it answers "did the sandbox test get created and linked successfully?". ╔═══ PRE-CHECK — DID YOU ARRIVE HERE FROM A FAILED REPLAY? ═══╗ This tool refreshes the CAPTURED BASELINE (mocks + recorded request/response per step). It does NOT modify the suite's authored assert array or response.body — those are the contract as defined when the suite was created/updated. If the contract changed and you re-record without updating the suite first, the new rerecord fires the suite's stale assertions against the live app, gate-1-fails on the same diff, and the suite comes back unlinked. Before calling THIS tool in response to a failed replay_sandbox_test or replay_test_suite, walk these checks: 1. Read failed_steps[].authored_assertions and authored_response_body in the most recent get_session_report (kind=sandbox_run / test_suite_run). The fields are inlined — no second tool call needed unless the report predates the inlined fields. 2. For each failing step: does any authored assertion pin the diverging value? (e.g. assert {path: "$.order.status", expected: "created order"} where the diff says "expected 'created order', got 'created'".) * YES → call update_test_suite FIRST to update that assertion + the response.body field, THEN call this tool. * NO → safe to call this tool directly; the captured baseline drifts but no authored assertion blocks the rerecord. 3. If you can't find authored_assertions in the report (older format) AND don't already know the suite's shape, call getTestSuite({app_id, suite_id, branch_id}) to inspect the assert array before deciding. Don't guess. REFUSE-RULE: if the dev confirms a contract change is intentional and the failing step has a pinned authored assertion on the diverging value, you MUST run update_test_suite before this tool. Calling record_sandbox_test FIRST in that case is the bug this pre-check exists to prevent — don't justify it as "let's just refresh the baseline first". The order is update → record → replay; never record → update. ╚═══════════════════════════════════════════════════════════════╝ ===== BEFORE CALLING — one-time setup ===== (a) APP_ID RESOLUTION (skip if app_id is already known): * Derive a likely app name from the cwd's basename (e.g. cwd=/home/dev/orderflow → "orderflow"). Lowercase it. * Call listApps({q: "<cwd-basename>"}) — the server does a case-insensitive server-side substring match, so you don't paginate the full tenant list (can be hundreds of apps on shared accounts). * Exactly one match → use its id. Multiple → list them and ASK the dev which one (a wrong app_id silently routes traffic + suite creates into the wrong app). Exception: if the compose file / repo layout unambiguously pins one candidate (e.g. compose has service "producer" and one candidate is "<folder>.producer" while others are unrelated siblings), you may pick it AND tell the dev up-front so they can correct. * Zero matches → ASK permission to create a new Keploy app with the derived name; on yes, call createApp({name, endpoint}) and use the returned id. * Alternatively pass app_name_hint to THIS tool and the server resolves it (same rules; multiple/zero → typed error). (b) KEPLOY BINARY VERIFICATION: * Bash: "keploy --version" (or "~/.keploy/bin/keploy --version"). If it exits non-zero the binary is missing. * If missing OR older than this MCP server was built against, install/upgrade: curl --silent -O -L https://keploy.io/ent/install.sh && source install.sh * Re-verify with "keploy --version"; fail loudly if still absent (tell the dev where keploy put the binary so they can add it to PATH). ===== DOCKER-COMPOSE NETWORK RULE (absolute) ===== Use the SAME compose file + service that was used in the validate-curl phase. Do NOT point keploy at a second "keploy-only" compose file — docker-compose isolates each file into its own project + network, so the app container spawned by keploy cannot reach the DB/Kafka containers that validate brought up (and the network-name collision blocks keploy from starting). Correct flow: (i) Validate phase: "docker compose up -d" (brings up app + deps on network <project>_default). (ii) Before calling record_sandbox_test, Bash: "docker compose stop <app_service> && docker compose rm -f <app_service>" — stop ONLY the app service; leave deps running so keploy's new app container can reach them on the existing network. (iii) Pass app_command = "docker compose up <app_service>" (same compose file, same project → same network). container_name = the actual name set by compose (e.g. "orderflow-producer", not "producer"). ===== RESOLUTION RULES (server-side, no guessing) ===== 1. App: caller provides app_id OR app_name_hint. With a hint, the server does listApps({q: hint}). Zero matches → typed error; multiple → typed error listing them so Claude asks the dev. 2. Suites: DEFAULT IS "ALL LINKED". When the dev says "record my sandbox tests" / "rerecord everything" / "refresh my recordings" with no specific suite named, LEAVE BOTH suite_ids AND suite_name_hint UNSET. Do NOT list suites first and pass a comma-joined UUID list back — the CLI resolves "every linked suite for the app" itself, cleaner and less brittle. Only pass a narrower selector when the dev explicitly names suites: - suite_ids (comma-separated, exact) — when you already have the IDs. - suite_name_hint (case-insensitive substring match) — when the dev names suites by human phrasing like "the auth suite" or "deterministic". Every suite whose name contains the substring is recorded. If the dev asks to record suites that don't exist yet (zero match) → typed error. Any ≥1 match is fine. DO NOT prompt the dev for which suites to record — default to all linked if they didn't name any. ===== DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id ===== Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust without a hit, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. The standard pattern when "search the suite by id" returns nothing is NOT "give up and ask the dev which app" — it's "the suite exists on a BRANCH, walk discovery". Suites created via create_test_suite + rerecord on a Keploy branch are INVISIBLE to a main-view listTestSuites; you have to scope each call to a branch. After resolving once in a session, REUSE the {app_id, branch_id} for any subsequent suite-targeted call (replay_sandbox_test, update_test_suite, replay_test_suite); don't re-walk discovery for every action. ===== PREREQUISITES ===== - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== The response includes a "playbook" array; execute its steps in order. The flow is HEADLESS — one background process, NDJSON progress events on a local file, no separate HTTP surface to bind. THERE IS NO SEPARATE CLEANUP STEP — the CLI exits on its own once phase=done is written. 1. Spawn the `keploy record sandbox --cloud-app-id …` process via Bash (run_in_background). Capture its PID into $KEPLOY_PID. 2. Poll progress by repeatedly calling Bash with `tail -n 1 $PROGRESS_FILE`. Each call returns instantly; the MCP round-trip between calls paces the loop. DO NOT wrap in a sleep loop — Claude Code's Bash rejects standalone `sleep N` and chained-sleep patterns. Read .phase off each line; stop when phase=done. The wait_for_done step's built-in `kill -0 $KEPLOY_PID` check is the safety-net for silent early-exit (CLI died before writing the terminal event) — it lets the loop exit instead of spinning forever on a dead process. 3. Read the terminal event (last line of $PROGRESS_FILE). It carries data.ok, data.error (on failure), data.test_run_id (on success). 4. On data.ok=true: call get_session_report(app_id, test_run_id) with verbose=true to surface the rerecord report. On data.ok=false: show data.error to the dev directly (optionally tail the log_file for stderr context) and SKIP get_session_report (there's no run to fetch). Auto-replay + linkTestSetToSuite run INSIDE the CLI process before it writes phase=done — if the terminal event says ok=true, linkage already happened. You do NOT need to wait for a separate post-success window; the CLI doesn't exit until it's fully done. INTERRUPTED FLOWS: if your conversation dies between step 1 and step 2 (Claude crashes, connection drops, dev cancels), the CLI keeps running in the background. It's not orphaned — it'll finish its run and write phase=done. To abort early, the dev can `pkill -f "keploy.*sandbox"` manually; otherwise just let it complete and resume by re-reading the progress file on the next turn. ===== NDJSON SCHEMA — the contract ===== Every line in the progress_file is one JSON object with this envelope: { "ts": "<RFC3339-nano>", "command": "record" | "test", "phase": "<phase-name>", "message": "<optional human-readable>", "data": { ... phase-specific ... } // optional } The phase vocabulary is intentionally extensible — new lifecycle phases get added over time as the CLI grows (started, agent_up, app_starting, suites_running_start, record_done, auto_replay_skipped, upload_done, linking_done, etc.). There are only TWO phases the AI must handle programmatically; everything else is informational and you should NOT switch on phase names you don't recognize: * phase != "done" → keep polling. Optional: surface message/data to the dev as ambient progress ("agent is starting...", "suites uploading..."), but never branch on a specific intermediate phase name. * phase == "done" → terminal event. Stop polling. The data envelope carries: - data.ok bool true on success, false on failure - data.error string (only on ok=false) one-line failure summary - data.test_run_id string (only on ok=true) pass to get_session_report - data.app_id string echo of the app_id passed to the tool - data.artifact_dir string local path to captured/replayed artifacts - data.dashboard_url string UI link to drill into the run If you observe a phase you don't recognize, IGNORE it and keep polling. If "done" itself is renamed by a future CLI version, the wait_for_done step's PID-alive guard is your safety net (the poll loop exits when the CLI dies); surface log_file contents to the dev. ===== "ALL SUITES FAILED CAPTURE" — special signal ===== If you see a `phase: "auto_replay_skipped"` event with `message: "all suites failed during rerecord; skipping replay + linking"` ahead of the terminal `done` event, every suite failed at the CAPTURE phase (before auto-replay even ran). The CLI fails closed in this case — auto-replay and suite linking are SKIPPED, so every per_suite entry comes back linked=false. Watch for this trap: the terminal `data.ok=true` because the CLI itself completed cleanly (it didn't crash; it just had nothing to record successfully). DO NOT read data.ok=true as "rerecord succeeded" — read `<linked>/<total>`. If linked == 0, this is a HARD failure that needs diagnosis, not a partial-linkage case. ALWAYS surface the dashboard URL on this case. The terminal `done` event still carries `data.dashboard_url` and `data.test_run_id` (atg's TestSuiteRun was created during the capture phase); emit them verbatim so the dev can drill into per-step failures in the UI: "0/N suites have a sandbox test — every suite failed during the capture phase, so auto-replay and linking were skipped. Dashboard: <data.dashboard_url> (test_run_id=<id>)" EDGE CASE: if `data.test_run_id` is empty, atg never inserted a TestSuiteRun (typically a pre-flight validation failure — branch-id rejection, app unreachable, etc.). The dashboard URL won't resolve. Skip the URL, surface the log_file contents instead so the dev can read the early-stage failure. Recovery is the same as WHEN linked=false below — read failed_steps for each suite and pick route B (fix code) / C (update suite + record again) / SKIP. Don't infra-retry; capture-phase failures across every suite usually mean the app is broken, the suite shapes are stale, or the dev's local app isn't reachable. ===== LINKAGE VERIFICATION ===== After get_session_report returns, for EVERY suite that went into this record, call getTestSuite({suite_id}) and check whether the suite has a sandbox test (linked=true / non-empty test_set_id). A suite without a sandbox test cannot be replayed — replay_sandbox_test will 400 on it with "no sandboxed tests" until a successful record produces one. ===== WHEN linked=false — recovery rules ===== A suite with linked=false after record_sandbox_test means the record process couldn't produce a sandbox test for that suite. The SUITE ITSELF still exists; it just has no sandbox test. Diagnose WHY by reading the rerecord report's failed_steps for that suite: * No failed_steps OR pure infra error (link-commit / upload failed, no step diverged) → call record_sandbox_test AGAIN scoped to just the unlinked suite_ids. The tool is idempotent on the suite; safe to re-run. * failed_steps with assertion diffs (response shape, body fields, status code shifted from what the suite expected) → the suite is stale relative to current app behavior. The CONTRACT changed: - Change is INTENTIONAL (new field, renamed key, different status code is the new normal) → call update_test_suite to update the affected step's response / assertions to match the new contract, THEN call record_sandbox_test on the updated suite. - Change is UNINTENTIONAL (app regressed) → fix the app code first, then call record_sandbox_test. No suite update needed; the original test was correct. * failed_steps with 500s / handler crashes / connection refused → the app is broken at the wire level. Fix the app, then call record_sandbox_test. Don't update_test_suite to absorb a real failure. NEVER: * Don't call create_test_suite to "redo" the suite — it already exists; re-creating authors a duplicate (see BEFORE CREATING in create_test_suite). * Don't blindly loop record_sandbox_test without diagnosing failed_steps first; if the cause is suite-vs-app mismatch, retries won't help. ===== MANDATORY OUTPUT — Phase 2 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT collapse into a single pass/fail table with the rerecord report; do NOT merge with Phase 1 or Phase 3): ### Phase 2 — Sandbox-test linkage **<linked>/<total> suites have a sandbox test** _Suites with a sandbox test_ | Suite name | suite_id | test_set_id | Capture pass/total | | --- | --- | --- | --- | | <name> | <suite_id> | <test_set_id> | <p>/<t> | (emit even if zero — one row per linked suite, or "_(none)_" in place of rows) _Suites without a sandbox test_ (omit ONLY if every suite linked) | Suite name | suite_id | Likely cause | | --- | --- | --- | | <name> | <suite_id> | gate1 / gate2 / infra | Likely-cause decoding: assertion diffs → gate 1 upstream-replay failure; upstream-passing + mock-replay-diff → gate 2 mock-determinism mismatch; zero failures + still unlinked → infra link-commit issue. Then proceed to replay_sandbox_test ONLY for the suites that DID link; the unlinked ones will 400 on replay. ===== DO NOT ===== * DO NOT fall back to raw keploy CLI (`keploy rerecord -t …`) if the MCP tool drops mid-flow — the CLI subcommand runs test-sets directly and does NOT update the suite's test_set_id. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Scaffold the GitHub Actions workflow that runs the V1 API tests on every PR. Returns the exact YAML content to write to .github/workflows/keploy.yml + the Bash command to set the KEPLOY_API_KEY secret. The AI walks the playbook with its Write tool + the `gh` CLI. PRECONDITIONS — CHECK BEFORE CALLING. Calling this tool out of order is a DEVLOOP violation; the doc-stated user-flow ordering is generate → run → mutation-prove (opt-in) → expand (opt-in) → CI (opt-in). Specifically you must have: 1. Generated at least one test via devloop_generate_resource_flow AND watched it pass via "keploy test-gen run --ci". 2. SURFACED the mutation-prove opt-in to the dev verbatim: "Want me to prove the test catches bugs by applying 3 small mutations to your handler and reverting?" — and the dev answered (yes-walked through devloop_mutation_demo, or explicit no/skip/later). Doing the test runs is NOT the same as offering mutation-prove; the offer is a separate dev-facing question. 3. ASKED the dev "want me to wire this into CI?" — explicit yes from the dev. If ANY of those three are missing, STOP and back up. The mutation-prove gate is what builds the dev's trust before they commit Keploy to CI; skipping it ships shallow tests into a workflow the dev hasn't validated. What this tool does NOT do (intentionally — the dev keeps custody): * Mint the CI API key server-side. The dev provisions it themselves in the Keploy dashboard (Step 2 of the returned playbook walks them through it). The AI never sees the kep_* value — it transits dashboard clipboard → terminal stdin → gh CLI's encrypted POST. This is a security property, not a limitation. * Post structured PR comments from api-server. V1 relies on GitHub Actions' native status-check rendering; the structured comment renderer is a V1.5 lift. The emitted workflow runs on pull_request (default base branch) and reads app_id / test-dir / context-dir from keploy/api-tests/keploy-test-gen.yaml — the dev never has to thread flags through the workflow. TIME-FREEZING — DEFAULT ON, ALMOST ALWAYS NEEDED FOR BACKEND APPS. Almost every backend app has authentication (login → JWT/session/OAuth). The dev's recorded tests carry those tokens in headers. Between record time and the first PR's CI run, the tokens' exp claims pass real wall-clock — CI then 401s on every authenticated step, and the dev blames Keploy. Keploy's time-freezing rewinds the app's clock to the record moment so the recorded tokens validate. Default policy: time_freezing=true. The AI MUST inspect the dev's test suites BEFORE calling this tool: - <app_dir>/keploy/api-tests/<resource>/test.yaml (V1 sources) - <app_dir>/keploy/<SuiteName>/tests/*.yaml (captured sandbox tests) Look for: Authorization Bearer headers; steps hitting /login /auth /signin /token /oauth; response bodies containing jwt / token / access_token / refresh_token / expires_in / iat / exp. If any of those signals appear (or you're unsure), keep time_freezing=true. Only pass time_freezing=false when you've audited every suite and confirmed zero time-sensitive tokens (rare for a real backend). When time_freezing=true, this tool also requires app_language (go / node / python / java / ruby / other) and app_service (docker-compose service name). Output then includes: - Modified workflow YAML (pre-populates keploy-sockets-vol; uses -f docker-compose.yml -f docker-compose.keploy.yml; passes --freezeTime) - docker-compose.keploy.yml override (volume mount + LD_PRELOAD for non-Go, or Dockerfile.keploy build for Go) - Dockerfile.keploy (Go ONLY — vDSO bypasses LD_PRELOAD, requires -tags=faketime rebuild) The dev's plain "docker compose up" is unaffected. Time-freezing only activates when CI (or the dev locally) explicitly passes both compose files. TIME-FREEZING IS REPLAY-ONLY — STRICT INVARIANT. The Dockerfile.keploy / docker-compose.keploy.yml / --freezeTime flag this tool emits exist purely to make recorded JWTs validate at REPLAY time. They MUST NEVER apply when recording. Concretely: - Record uses the dev's PROD Dockerfile + plain "docker compose up" (no override file). - Replay uses Dockerfile.keploy + "docker compose -f docker-compose.yml -f docker-compose.keploy.yml up" + the --freezeTime flag on the CLI. If a recording is captured against a faketime-built binary, every timestamp in the captured mocks is wrong and the whole capture is corrupt — there is no recovery short of re-recording from scratch with the prod binary. The CI YAML this tool emits in ci_mode=sandbox-replay is a REPLAY workflow; it boots via the compose override on purpose. The dev's separate record flow (devloop_record_sandbox) must NOT touch the override. TIME-FREEZING IS FORCED ON FOR ci_mode=sandbox-replay — NON-NEGOTIABLE. Any explicit time_freezing=false passed alongside ci_mode=sandbox-replay is silently overridden back to true. Rationale: sandbox replay processes the recorded request stream verbatim — any time-sensitive token in any captured request (JWT exp, OAuth iat, session cookie) goes stale the moment wall-clock passes the recorded moment, and silently fails replay. Whether the dev's suite happens to carry such a token is not auditable at scaffold time, and the failure is silent (401 on the first auth-gated step in CI). The cost of force-ON for a hypothetical zero-token app is one dormant volume mount + a no-op CLI flag; the cost of force-OFF for a token-bearing app is every PR failing. Asymmetric — force-ON wins. For ci_mode=api-tests, the workflow runs against live deps with current wall-clock so recorded tokens never enter the picture; time_freezing defaults to false and is overridable by the AI if they want the artifacts pre-staged for a later sandbox switch.
    Connector
  • Forward discounted-cash-flow valuation: caller provides growth + WACC + terminal assumptions, returns per-share intrinsic value + 5×5 sensitivity grid. Pulls FCF base + net debt + shares from R2; caller can override any field. Does NOT persist a report — use `create_report` for that. Tier: sp500+.
    Connector
  • Enqueue an async fork of a public Universe project into your workspace. Provide either a Universe `url` or `source_project`. Returns the platform's 202 payload verbatim — at minimum `{ taskId, url }`, where `url` points at the async-task polling endpoint. Poll completion with `async_tasks_get(task_id=taskId)` every 5 seconds until `status` is terminal (`completed` or `failed`). Processing may take up to 30 seconds to start.
    Connector
  • WRITE to the Knowledge Base. This tool has TWO modes: **MODE 1 — SAVE a new card**: Provide `content` with full Markdown following the ACTIONABLE schema below. **MODE 2 — REPORT OUTCOME**: Provide `kb_id` + `outcome` ('success' or 'failure'). WHEN TO USE: - Mode 1: After successfully fixing a bug IF no existing KB card covered it. - Mode 2: ALWAYS after applying a solution from `read_kb_doc` and running verification. INPUT: - `content`: (Mode 1) Full Markdown KB card content — follow the EXACT template below. - `overwrite`: (Mode 1) Set to True to update an existing card. - `kb_id`: (Mode 2) ID of the card to report outcome for. - `outcome`: (Mode 2) 'success' or 'failure'. - `enrichment`: (Mode 2, optional) Additional context to merge into the card when outcome is 'failure'. ━━━ CARD TEMPLATE (Mode 1) — copy this structure EXACTLY ━━━ ``` --- kb_id: "[PLATFORM]_[CATEGORY]_[NUMBER]" # e.g. WIN_TERM_001, CROSS_DOCKER_002 title: "[Short Title — max 5 words]" category: "[terminal|devops|supabase|fastmcp|network|database|...]" platform: "[windows|linux|macos|cross-platform]" technologies: [tech1, tech2] complexity: [1-10] criticality: "[low|medium|high|critical]" created: "[YYYY-MM-DD]" tags: [tag1, tag2, tag3] related_kb: [] --- # [Short Title — max 5 words] > **TL;DR**: [One sentence — what's the problem + solution] > **Fix Time**: ~[X min] | **Platform**: [Windows/Linux/macOS/All] --- ## 🔍 This Is Your Problem If: - [ ] [Symptom 1 — specific symptom or error message] - [ ] [Symptom 2 — specific error code or log line] - [ ] [Symptom 3 — environment/version condition] **Where to Check**: [console / logs / env / task manager / etc.] --- ## ✅ SOLUTION (copy-paste) ### 🎯 Integration Pattern: [Global Scope] / [Inside Init] / [Event Handler] ```[language] # [One-line comment — what this code does] [depersonalized code WITHOUT specific paths, use __VAR__ for things to replace] ``` ### ⚡ Critical (won't work without this): - ✓ **[Critical Point 1]** — [why it's essential] - ✓ **[Critical Point 2]** — [common mistake to avoid] ### 📌 Versions: - **Works**: [OS/library versions where confirmed working] - **Doesn't Work**: [OS/library versions where known broken] --- ## ✔️ Verification (<30 sec) ```bash [single command to verify the fix worked] ``` **Expected**: ✓ [Specific output or behavior that confirms success] **If it didn't work** → see Fallback below ⤵ --- ## 🔄 Fallback (if main solution failed) ### Option 1: [approach name] ```bash [command] ``` **When**: [condition to use this option] | **Risks**: [what might break] ### Option 2: [alternative approach] ```bash [command] ``` **When**: [condition] | **Risks**: [what might break] --- ## 💡 Context (optional) **Root Cause**: [1 sentence — why this problem occurs] **Side Effects**: [what might change after applying the fix] **Best Practice**: [how to avoid this in future — 1 point] **Anti-Pattern**: ✗ [what NOT to do — common mistake] --- **Applicable**: [OS, library versions, conditions] **Frequency**: [rare / common / very common] ``` ━━━ END OF TEMPLATE ━━━ RULES for ACTIONABLE cards: 1. Solution FIRST — after diagnosis, code immediately 2. Depersonalize — no names, project names, or absolute paths 3. Use `__VAR__` markers for anything the user must replace 4. One Verification command, result visible in <30 sec 5. Fallback — 1-2 options max, always include When/Risks 6. Context at End — WHY is optional reading for curious agents
    Connector
  • Marks the task as `cancelled`. If the task is already in a terminal state (`complete`, `failed`, `expired`), returns 409 Conflict. Only the identity that created the task may cancel it. Use this tool when: - You submitted a probe with `?async=true` and no longer need the result. - You want to free up a pending task before it expires. Do NOT use this tool when: - The task is already complete — cancellation is not possible. Inputs: - `task_id` (path, required): 26-char ULID. Returns: - `task_id` and `status: cancelled`. Cost: - Free. Latency: - Typical: <150ms.
    Connector
  • Sends the user's answer to a follow-up question raised by the design agent during perspective creation, then re-runs the design step. Returns a new pending job_id; long-poll perspective_await_job for the next terminal state. Behavior: - Appends the user's reply to the design conversation and kicks off another design pass. Each call starts another pass. - ONLY valid while the perspective is in DRAFT status. Errors with "This perspective already has an outline. Use the update tool to make changes." otherwise. - Errors when the perspective is not found or you do not have access. - Returns "pending" immediately. perspective_await_job resolves to "ready" (outline generated) or "needs_input" (another follow-up — call this tool again). When to use this tool: - perspective_await_job returned status "needs_input" with a follow_up_question and you have the user's reply. - Continuing the design dialogue before any outline is generated. When NOT to use this tool: - The perspective already has an outline — use perspective_update for revisions. - Starting a new perspective — use perspective_create. - Polling a previously-enqueued job — use perspective_await_job.
    Connector
  • Resolve a flight number to its airports, terminals, scheduled times, and status. Use BEFORE book_ride whenever the customer provides a flight number so that pickup time and terminal are correct. Input is tolerant: accepts 'AF007', 'AF 007', 'AF-007', 'af7', 'AFR007' — the tool normalizes internally. Returns an array of matching operations (usually 1 when direction is set).
    Connector