Skip to main content
Glama
164,693 tools. Last updated 2026-05-31 09:32

"Jetpack Compose" matching MCP tools:

  • Replay the sandbox test for one or more suites against captured mocks — re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match" — keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. Output produces a SANDBOX RUN REPORT — it answers "does the suite still hold up against its captured baseline?". ═══════════════════════════════════════════════════════════════════ DISAMBIGUATION — pick this tool vs. replay_test_suite: ═══════════════════════════════════════════════════════════════════ USE replay_sandbox_test (THIS TOOL) when the dev says: * "run my sandbox tests" / "replay my sandbox tests" * "integration-test my app" / "run the integration tests" * "check if my mocks still match" / "replay against the captured mocks" * "rerun my sandbox suite" (with the word "sandbox") Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" — silent signal that the dev wants captured-mock replay, NOT live-app execution. USE replay_test_suite INSTEAD when the dev says: * "run the test suite" / "run my test suites" (bare — no "sandbox") * "execute test suite X" / "run suite 810d3ebe…" * "test the suite again" / "smoke test against the live app" Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP — no docker spin-up, no mocks. After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently — code-change regressions can hide under "mock didn't match" failures. ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. SCOPE — whole-app vs single-suite: * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. Use this for "run my sandbox tests" / "check if my tests still pass" — whole-app regression. New suites auto-pick up. * Single / subset: PASS suite_ids when the dev names specific suites — "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure. This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record. WHAT THIS TOOL DOES INTERNALLY (so you don't have to): 1. Resolves app_id — use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first. 2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error. 3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test. 4. Returns the headless playbook — walk it exactly: spawn CLI in background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step — the CLI exits on its own. ===== PREREQUISITES ===== (Same as record_sandbox_test — if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.) - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and — if ok=true — call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written. ===== MANDATORY OUTPUT — Phase 3 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous): ### Phase 3 — Sandbox run report Under it, emit the uniform three-subsection format owned by get_session_report: (i) per-suite table — one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps. (ii) failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows. (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups). Do NOT print aggregate step totals across suites — they mix unrelated suites and hide where damage actually is. ===== ROLLUP LINE ===== Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._" ===== DO NOT ===== * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH. * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow — CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Search the corpus for Eurorack modules matching a combination of filters. Filters compose with AND. Omit any filter to leave that dimension unrestricted. The result is sorted by module name; pagination metadata in the response envelope lets you page through long result sets. Args: - capability (string): capability id, e.g. 'envelope-generator', 'clock-source'. Use list_capabilities to see the full taxonomy. Retired/variant slugs resolve via the capability_aliases layer (e.g. 'low-pass-gate' → 'lowpass-gate', 'quantiser' → 'quantizer'), so either form is accepted. - manufacturer (string): manufacturer id, e.g. 'make-noise', 'mutable-instruments'. - hp_min, hp_max (number): module width in HP. hp_max=10 finds modules ≤ 10 HP. - signal_type_in (string): the module accepts a jack of this signal type as input. One of audio, cv, gate, trigger, clock, mixed. signal_type_in='audio' and ='cv' both also match jacks tagged 'mixed' (the schema's value for jacks the source describes as accepting both audio and CV — e.g. Joranalogue Compare 2's signal inputs); the other values match literally. - signal_type_out (string): the module produces a jack of this signal type as output. Same 'mixed'-superset semantics as signal_type_in. - text (string): free-text match against module id, name, slug, description, and the ids/labels/descriptions of capabilities the module has (case-insensitive substring). Matches hyphenated forms like "filter-8" against the slug/id even when the display name uses a space ("Filter 8"), and is whitespace-insensitive on id/slug/name so "3x MIA" finds the module named "3xMIA". Capability-label coverage means text="multiband" finds modules tagged multiband-filter without knowing the kebab-case id, and a curated alias layer extends that to common word-form variants ("multi-output" / "multi-band" / "band-split" → multiband-filter, "low-pass" → lowpass-filter, retired ids like "voltage-controlled-filter" → vcf). Truly novel wording still requires list_capabilities; if you expected a hit and got 0, call report_gap so the alias can be added. - voct_tracking_range_min (number): the module has a V/Oct input whose source-stated tracking range is at least this many octaves. Use for "filters that track 5+ octaves" / "oscillators with wide V/Oct range". - voct_tracking_quality (string): the module has a V/Oct input with this tracking quality, one of 'calibrated', 'temperature-compensated', 'approximate', 'uncalibrated'. 'temperature-compensated' is the strongest claim. - voct_temperature_compensated (boolean): the module has a V/Oct input whose source explicitly states temperature compensation. Implies calibrated but separately flagged because some manuals call out only one. - audio_outputs_min (number): the module has at least this many output jacks with signal_type='audio'. Use for "multi-output filters" (≥3 audio outs surfaces LP/BP/HP-tap VCFs like Three Sisters, QPAS, A-108, Polaris) or any multi-tap audio module. Combine with capability='vcf' for the canonical multi-output-filter query. - limit (number): default 50, max 200. - offset (number): pagination offset. Returns: { "results": [{ id, name, manufacturer, hp, capabilities: [string], description, production_status }], "total": number, // total matches (across all pages) "_meta": { "query": <args>, // Present when the server's token-AND fallback rescued an otherwise-empty // phrase query (e.g. "pamela workout" → "Pamela's NEW Workout" via per-word // identifier match). Not an error; just signals that results came from the // relaxed pass rather than the literal phrase. "relaxed_to_tokens": true, // On total=0 (after the token-AND fallback has already been attempted), the // server adds these diagnostic hints so you can retry productively in one // turn instead of guessing variants. Each is independently optional: "would_match_without": ["capability", "text"], // filters that, if individually dropped, would yield ≥1 result — the named filter(s) cost you the match "closest_text_hits": [{ id, name, manufacturer }], // top 3 modules matching 'text' alone (other filters dropped); inspect for a close hit you filtered out by accident "capability_suggestions": [{ id, label }], // top 3 valid capabilities matching the 'capability' arg you passed (only set when the arg wasn't a known slug or alias) — use list_capabilities for the full taxonomy "feedback_hint": "..." // fallback prompt to call report_gap when no other diagnostic applies } } Examples: - "What envelope generators under 8 HP exist?" → {capability: 'envelope-generator', hp_max: 8} - "What ALM modules are in the corpus?" → {manufacturer: 'alm-busy-circuits'} - "What clock sources are there?" → {signal_type_out: 'clock'} - "Modules with 'workout' in the name" → {text: 'workout'} - "Filters that track V/Oct over 5 octaves" → {capability: 'vcf', voct_tracking_range_min: 5} - "Temperature-compensated filter cores" → {voct_tracking_quality: 'temperature-compensated'} - "Multi-output filters with LP/BP/HP taps" → {capability: 'vcf', audio_outputs_min: 3} Errors: - Returns an empty results array (and total=0) if nothing matches. Not an error — inspect _meta.would_match_without / closest_text_hits / capability_suggestions to decide whether to broaden the query or call report_gap. - Invalid filter values pass through to the WHERE clause; if no module satisfies them you get total=0. After picking a hit, call get_module with the id for full details.
    Connector
  • Cut and assemble a clip from any prior video job (find_clips, summarize, or video transcribe). Operates on a parent job — possessing the parent `source_job_id` is the capability, no upload step. Pass one segment for a simple cut, or multiple non-contiguous segments to compose a single mp4 highlight reel — same flat $0.50 either way. Two-call flow: (1) call with `source_job_id` + `segments` (ordered array of `{start, end, label?}` in source seconds, total duration capped at 30 minutes) to receive {job_id, payment_challenge}; (2) pay via MPP and call with `job_id` + `payment_credential` to start processing. No upload step. Poll get_job_status(job_id) for completion; outputs are role `clip-video` (the assembled .mp4, frame-accurate boundaries with 15ms audio fades at segment joins) and — when `include_transcript: true` (default) — roles `clip-srt` + `clip-words` (transcripts stitched and time-shifted to match the assembled video). Set `include_transcript: false` to skip transcript outputs. Payment: MPP — accepts Tempo USDC and Stripe SPT. The challenge's WWW-Authenticate header and /.well-known/mpp.json are authoritative for which methods are offered. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check `expires_at` from get_job_status on the parent). Multiple extract_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
    Connector
  • Generate the exact CI workflow YAML to add keploy sandbox tests to a pull-request pipeline, and tell you where to write it. Use this when the dev asks to "add keploy sandbox tests to my pipeline" / "wire keploy into CI" / "run keploy on PR" / "add a CI job for keploy" — the server emits the file contents verbatim so you don't have to compose the flag list yourself. ===== GOAL ===== Write a CI workflow file that runs `keploy test sandbox --cloud-app-id <uuid> --app-url <url>` on pull requests and gates the PR on the result. NEVER kick off an actual test run in this flow — it is pure file authoring, ends with the file on disk. DO NOT fire replay_sandbox_test, record_sandbox_test, replay_test_suite, or any other run-starting MCP tool here. ===== HOW (absolute) ===== Call this tool. It returns { file_path, content, summary }. Write the "content" to "file_path" VERBATIM via your Write tool — NO flag renames, NO flag removals, NO step reordering, NO synthesis. The server owns the YAML template; your job is only to (1) resolve the inputs from the repo and api-server and (2) Write the returned content. Do NOT compose the YAML yourself from general knowledge — flag drift (missing --cloud-app-id, inventing --app) is the most common bug when Claude improvises. DO NOT ASK the dev for confirmation before writing. Resolve everything from the repo + api-server, pick the GitHub Actions default, call this tool, Write the file. The dev's prompt is already the go-ahead. ===== STEPS ===== 1. DETECT THE CI SYSTEM: * Default = GitHub Actions (biggest share). File = .github/workflows/keploy-sandbox.yml. * If .gitlab-ci.yml exists → GitLab (not yet supported by this tool; tell the dev and stop). * If .circleci/config.yml exists → Circle (not yet supported; tell the dev and stop). * Otherwise → GitHub Actions. 2. RESOLVE VALUES by calling MCP tools + reading the repo: * app_id: call listApps({q: "<cwd basename>"}). Exactly one → use its id. Multiple → pick the one whose name most specifically matches the repo's primary service (e.g. "orderflow.producer" wins over "orderflow" when there's a ./producer directory); mention which you picked in the final message. Zero → stop and tell the dev to create the app + rerecord first. * suite_ids: DO NOT pass this arg by default. An empty suite_ids means the CLI resolves "every linked sandbox suite for the app" at CI run time — which is what you want (new suites auto-pick up without workflow edits). The tool still verifies there's ≥1 linked suite at scaffold time so the first PR run doesn't fail empty-handed. Only pass suite_ids when the dev explicitly narrows ("run only the auth suite in CI"); don't pin "all current suites" — that's staleness waiting to happen. * compose_file: READ THE REPO. Default is docker-compose.yml. AVOID passing a docker-compose-keploy.yaml variant that has `networks: default: external: true` — those variants only work locally, where another compose run has already created the external network. In CI the runner starts clean and `external: true` fails with "network not found". If the primary docker-compose.yml brings up the full app (deps + app service), use it end-to-end. * app_service, container_name, app_port: read from the SAME compose_file you picked above. app_service = the service key (e.g. "producer"); container_name = that service's container_name: field in that same compose file (e.g. "orderflow-producer" if compose_file=docker-compose.yml, but "producer" if compose_file=docker-compose-keploy.yaml — THESE DIFFER, pick consistently); app_port = the host-side of its ports: mapping. * app_url = http://localhost:<app_port>. The tool derives this; you don't pass it separately. 3. CALL THIS TOOL with app_id, app_service, container_name, app_port, compose_file (and suite_ids only if the dev explicitly narrowed scope). It returns { file_path, content, summary }. Write the "content" to the "file_path" VERBATIM. ===== FLAG NAME RULES (absolute, do not drift when reviewing the output) ===== * `--cloud-app-id` ← NOT `--app-id`. The OSS config has an `appId` uint64 field that viper maps `--app-id` into; passing a UUID there fails with "invalid syntax" before RunE runs. * `keploy test sandbox --cloud-app-id <uuid> --app-url <url>` ← the CI form. NOT `keploy test --cloud-app-id` (must be `test sandbox` — the headless flags live on the sandbox subcommand only), NOT `keploy test-suite run` (that command doesn't exist). There is NO `--pipeline` flag. * Install URL = `https://keploy.io/ent/install.sh` ← NOT `https://keploy.io/install.sh` (OSS; no sandbox subcommand at all), NOT a github.com/keploy/keploy release tarball. If the server-emitted content ever disagrees with these rules, trust the server output and file a bug — don't edit the YAML. ===== RESOLUTION ARGS ===== * Pass either app_id (explicit UUID) or app_name_hint (substring; server does listApps and requires exactly one match). * Pass app_service (docker-compose service name), container_name (from compose container_name: field read from the SAME compose_file arg), and app_port (HTTP port the service exposes). * compose_file is optional, defaults to "docker-compose.yml". If the repo has a -keploy.yaml variant with `external: true` networks, do NOT point compose_file at it — it won't work in CI. * suite_ids is optional and should be LEFT BLANK by default — the CLI resolves every linked suite at run time. Only pin an explicit list when the dev narrows scope. ===== FINAL RESPONSE — three short sections, no questions ===== ### Created | File | Lines | | --- | --- | | .github/workflows/keploy-sandbox.yml | N | ### Summary - App: <name> (<app_id>), <N> linked suites replayed on every PR - Trigger: pull_request → main, + manual workflow_dispatch - Failure on any suite gates the PR (non-zero exit from the CLI) ### Before the first run, add this GitHub secret - `KEPLOY_API_KEY` — at https://github.com/<owner>/<repo>/settings/secrets/actions/new (self-hosted users — point at your own api-server by building the enterprise binary with -X main.api_server_uri=<url>; there is no runtime env override on the released binary.) This tool does NOT run anything. It only generates file contents.
    Connector
  • Record mocks for V1 repo-mode API tests using the V1-native CLI command `keploy sandbox local record`. Runs the dev's app under the keploy eBPF agent, drives the V1 chained-CRUD tests from `keploy/api-tests/<resource>/test.yaml`, captures every outbound call (DB queries, Redis ops, downstream HTTP) as mocks, and lays them out at `<app_dir>/keploy/<suite-name>/{tests/, mocks.yaml, config.yaml}` in the standard OSS test-set tree. On success, mocks upload to the Keploy canonical pool by content hash; the hash lands in config.yaml so a teammate's later replay fetches the same bytes. CRITICAL — DO NOT CONFUSE WITH `keploy record sandbox`: * `keploy sandbox local record` (V1, repo-mode) ← this is what the playbook below uses * `keploy record sandbox` (legacy, cloud-mode) ← DO NOT call this for V1 The two are entirely different commands. Cloud-mode requires server-side suites (queried via --suite-ids) — V1 repo-mode reads tests from the local filesystem and never registers them in the cloud. If the dev is in repo storage mode (verify via devloop_resolve_storage's source=persisted, mode=repo), V1 is the ONLY correct sandbox path. STRICT — TIME-FREEZING DOES NOT APPLY TO RECORD. Recording MUST use the dev's regular (prod) Dockerfile or native binary. NEVER spawn the app via Dockerfile.keploy / "-f docker-compose.keploy.yml" / "-tags=faketime" build during record. The faketime binary writes wrong timestamps into captured mocks (it reads time from the offset file, not the wall clock) and the entire capture becomes corrupt — recovery requires re-recording from scratch with the prod binary. If a previous replay failed with expired-JWT and the dev wants to "fix" it, the fix is to re-RUN the replay with --freezeTime, NOT to re-record. The recorded mocks captured against the prod binary are exactly what replay's clock-rewind is designed to validate; touching the record path defeats the whole mechanism. ONLY call this with an explicit dev opt-in. The valid triggers: * Dev directly asks ("capture mocks", "sandbox record", "rerecord the users mocks"). * Post-resource menu (Step 5 of devloop_generate_resource_flow) — dev picks "Capture mocks so CI runs in seconds". * get_session_report shows mock_mismatch_dominant=true AND the dev says yes to your "rerecord?" prompt. Pre-conditions: * Dev's app must NOT already be running (keploy spawns its own copy of the app under the agent's eBPF hooks via the -c command). If a server is up at the target port, KILL IT first or the agent's network capture won't see the traffic. * Real downstream deps (MySQL, Redis, Kafka, etc.) MUST be running — the capture proxies through to them on first contact so the recorded mocks contain real responses. * The test YAML must exist at <app_dir>/keploy/api-tests/<resource>/test.yaml. Returns a playbook for `keploy sandbox local record` with the V1 flag surface: --test-dir, --app-url, -c (spawn command), --container-name (docker-compose only), --skip-mock-upload (offline), --skip-report-upload (offline). Mocks land per-suite at keploy/<suite-name>/. NDJSON progress at --progress-file for the standard tail-til-done loop.
    Connector
  • Persist a single falsifiable, evidence-backed CLAIM — the atomic unit of the research graph. Use this for each discrete assertion an analysis produces (e.g. 'NVDA gross margin stays above 70% through FY2026'), then compose claims into a thesis with `link_claim_to_thesis`. Claims are scored independently of theses, so claim accuracy is tracked as its own track record. Pick `claim_type` by HOW it's judged, not what it's about: `assertion` = true now, checked against data; `prediction` = resolves at `horizon_days` via `verifiable_condition`; `judgment` = qualitative, not auto-scored. Use `tags` for the topic (financial, valuation, macro, …). Set `eval_mode: 'auto'` + a `verifiable_condition` for deterministic grading, else `'agent'`/`'manual'`. Tier: all paid + free tiers (sample rejected — guest has no customerId). Verifiable claims must cite evidence.
    Connector

Matching MCP Servers

  • Edit an existing test suite — change one or more step bodies, assertions, headers, or remove/add steps. Returns a playbook that delegates to `keploy update-test-suite`, which validates the new state (static structural checks + 2 live runs for idempotency + GET-coupling check) and snapshot-replaces the suite via api-server. POST-EDIT BEHAVIOUR: any structural change here (step method/url/body/headers/extract/assert, or add/delete steps) AUTOMATICALLY clears the suite's sandbox test server-side — the suite comes back as linked=false. Call record_sandbox_test on the updated suite before any sandbox replay; otherwise replay_sandbox_test will 400 with "no sandboxed tests". Cosmetic-only edits (name, description, labels) preserve the sandbox test. ═══════════════════════════════════════════════════════════════════ FETCH-FIRST RULE — required for the edit to be accepted: ═══════════════════════════════════════════════════════════════════ The api-server's replace handler rejects updates that preserve ZERO step IDs from the existing suite ("full rewrite, not an edit"). To make a real edit: 1. Call getTestSuite first (or use download_recording / get_app_testing_context if you already have the suite). Capture each existing step's "id" field. 2. Compose your new steps_json INCLUDING the existing "id" on every step you want to KEEP or EDIT. Omit "id" only on steps you're ADDING. Drop a step entirely from steps_json to DELETE it. 3. Call this tool with that merged steps_json. If you author a fresh JSON without the existing step IDs, the server rejects it with "preserves no steps from the existing suite". When that happens, your two options are: (a) re-author with IDs preserved (preferred — keeps history), or (b) call delete_test_suite then create_test_suite (loses history, fresh suite_id). ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on a (app_id, branch_id) tuple. A bare suite_id has no on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app, then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. The getTestSuite call in step 4 is the one whose response you also use to capture every step's existing "id" for the FETCH-FIRST RULE above — so step 4 is actually a 2-for-1: discovery AND fetch-first happen on the same call. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. ═══════════════════════════════════════════════════════════════════ INPUTS ═══════════════════════════════════════════════════════════════════ * app_id (required) — Keploy app id * suite_id (required) — UUID of the suite to update * branch_id (required) — Keploy branch UUID (resolve via the two-step flow before calling) * steps_json (required) — JSON array of the FULL desired step list. Each kept step MUST carry the existing "id". Same step shape as create_test_suite (response, extract, assert, etc — all static structural checks apply). * name / description / labels (optional) — overrides for top-level suite metadata * app_url (required) — base URL of the dev's running local app, e.g. http://localhost:8080. The CLI fires the new state TWICE against this for the idempotency check + GET-coupling check. * app_dir (optional) — repo root the CLI cd's into; defaults to "." ═══════════════════════════════════════════════════════════════════ HOW THIS TOOL WORKS ═══════════════════════════════════════════════════════════════════ This tool DOES NOT call api-server itself. It returns a 3-step playbook for you (Claude) to walk via Bash — same shape as create_test_suite: 1. Write merged JSON to a temp file. 2. Run `keploy update-test-suite --suite-id <id> --file <path> --branch-id <uuid> --base-url <url>` — runs every static structural check, fires the new state twice locally, applies the GET-coupling check, then POSTs the snapshot-replace. 3. Cleanup the temp file. Walk the playbook in order. If step 2 exits non-zero, surface stdout to the dev — it has the rule violation / failure detail. OUTCOMES the AI should recognize: * Exit 0 + stdout has "✓ suite updated:" + "View:" line → success. Surface the View URL to the dev. * Exit 1 + "preserves no steps from the existing suite" → fetch-first rule was missed. Re-author with step IDs preserved (or call delete_test_suite + create_test_suite as the documented escape hatch). * Exit 1 + structural-check violations → fix the suite per the violation messages, then REWRITE the suite file via Bash and RE-RUN this CLI command directly. DO NOT call update_test_suite again to retry — the playbook + file path are already valid; only the JSON content needs revision. The validator output includes a canonical step skeleton on structural failures. * Exit 2 + "couldn't reach the dev's app" → ensure the app is up at app_url and retry. PREREQUISITES the playbook assumes: * The dev's app is up and reachable at app_url. * `keploy` binary is on PATH. If missing, install before calling this tool: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. * Either ~/.keploy/cred.yaml exists or KEPLOY_API_KEY is exported.
    Connector
  • Returns a chronological fill feed, filterable by `coin`, `dex`, and/or `user`. Each row is a single fill carrying price, size, side (`BID` or `ASK`), directional intent (`OPEN_LONG`, `CLOSE_SHORT`, `LIQUIDATED_CROSS_LONG`, `AUTO_DELEVERAGING`, and others), closed PnL, fees (negative values represent maker rebates), and order-level metadata (`order_id`, `client_order_id`, `twap_id`, `crossed`). For balance-changing events on a user (deposits, withdrawals, funding payments, vault flows), use `/v1/hyperliquid/users/activity`. At least one of `coin`, `dex`, or `user` is required. Filters compose additively — pass any combination to narrow further; a mismatched pair (e.g. `coin=cash:TSLA&dex=xyz`) returns empty. Defaults to the last 24 hours when no time range is specified — provide `start_time` and `end_time` to query older data. **Query Parameters:** - **coin**: Hyperliquid coin identifier. Core perps have no prefix (`BTC`, `HYPE`); spot pairs use `@N` (`@107`); builder DEXs prefix the symbol with the DEX name (`xyz:SILVER`).<br>Single value or array of values* (separate multiple values with `,`)<br>*Plan restricted. - **dex**: DEX identifier. `perps` for core perps, `spot` for `@N` spot pairs, or a builder DEX name (`xyz`, `cash`, …). Call `/v1/hyperliquid/dexes` for the live set.<br>Single value or array of values* (separate multiple values with `,`)<br>*Plan restricted. - **user**: Filter by address<br>Single value or array of values* (separate multiple values with `,`)<br>*Plan restricted. - **start_time**: UNIX timestamp in seconds or date string (e.g. "2025-01-01T00:00:00Z", "2025-01-01", ...). - **end_time**: UNIX timestamp in seconds or date string (e.g. "2025-01-01T00:00:00Z", "2025-01-01", ...). - **limit**: Number of items* returned in a single request.<br>*Plan restricted. - **page**: Page number to fetch.<br>Empty `data` array signifies end of results. **Responses:** - **200** (Success): Successful Response - Content-Type: `application/json` - **Response Properties:** - **request_time**: ISO 8601 datetime string - **Example:** ```json { "data": [ { "block_num": 1, "timestamp": "string", "transaction_hash": "string", "direction": "string", "transaction_id": 1, "closed_pnl": 1.5, "fee": 1.5, "notional": 1.5, "dex": "string", "crossed": true, "price": 1.5, "size": 1.5, "market_name": "string", "start_position": "string", "fee_token": "string", "twap_id": 1, "user": "string", "client_order_id": "string", "order_id": 1, "coin": "string", "side": "string" } ], "statistics": { "elapsed": 1.5, "rows_read": 1.5, "bytes_read": 1.5 }, "pagination": { "previous_page": 1, "current_page": 1 }, "request_time": "string", "duration_ms": 1.5, "results": 1.5 } ``` - **400**: Client side error - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **401**: Authentication failed - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **403**: Forbidden - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **404**: Not found - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **500**: Server side error - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "bad_database_response", "message": "string" } ```
    Connector
  • Record (or refresh) the sandbox test for one or more existing test suites — captures the request/response per step plus the outbound mocks (DB, downstream HTTP, etc.) against the dev's locally-running app, then links the captures onto the suite. Use this when the dev says "record", "rerecord", "re-record", "refresh the recordings", "capture mocks", or as the RECORD step in FROM-SCRATCH (after create_test_suite). This tool resolves the app (if only a hint is given), resolves ONE OR MORE suites to record (by exact ids OR case-insensitive name substring match), and delegates to a headless playbook. Output produces a RERECORD REPORT — it answers "did the sandbox test get created and linked successfully?". ╔═══ PRE-CHECK — DID YOU ARRIVE HERE FROM A FAILED REPLAY? ═══╗ This tool refreshes the CAPTURED BASELINE (mocks + recorded request/response per step). It does NOT modify the suite's authored assert array or response.body — those are the contract as defined when the suite was created/updated. If the contract changed and you re-record without updating the suite first, the new rerecord fires the suite's stale assertions against the live app, gate-1-fails on the same diff, and the suite comes back unlinked. Before calling THIS tool in response to a failed replay_sandbox_test or replay_test_suite, walk these checks: 1. Read failed_steps[].authored_assertions and authored_response_body in the most recent get_session_report (kind=sandbox_run / test_suite_run). The fields are inlined — no second tool call needed unless the report predates the inlined fields. 2. For each failing step: does any authored assertion pin the diverging value? (e.g. assert {path: "$.order.status", expected: "created order"} where the diff says "expected 'created order', got 'created'".) * YES → call update_test_suite FIRST to update that assertion + the response.body field, THEN call this tool. * NO → safe to call this tool directly; the captured baseline drifts but no authored assertion blocks the rerecord. 3. If you can't find authored_assertions in the report (older format) AND don't already know the suite's shape, call getTestSuite({app_id, suite_id, branch_id}) to inspect the assert array before deciding. Don't guess. REFUSE-RULE: if the dev confirms a contract change is intentional and the failing step has a pinned authored assertion on the diverging value, you MUST run update_test_suite before this tool. Calling record_sandbox_test FIRST in that case is the bug this pre-check exists to prevent — don't justify it as "let's just refresh the baseline first". The order is update → record → replay; never record → update. ╚═══════════════════════════════════════════════════════════════╝ ===== BEFORE CALLING — one-time setup ===== (a) APP_ID RESOLUTION (skip if app_id is already known): * Derive a likely app name from the cwd's basename (e.g. cwd=/home/dev/orderflow → "orderflow"). Lowercase it. * Call listApps({q: "<cwd-basename>"}) — the server does a case-insensitive server-side substring match, so you don't paginate the full tenant list (can be hundreds of apps on shared accounts). * Exactly one match → use its id. Multiple → list them and ASK the dev which one (a wrong app_id silently routes traffic + suite creates into the wrong app). Exception: if the compose file / repo layout unambiguously pins one candidate (e.g. compose has service "producer" and one candidate is "<folder>.producer" while others are unrelated siblings), you may pick it AND tell the dev up-front so they can correct. * Zero matches → ASK permission to create a new Keploy app with the derived name; on yes, call createApp({name, endpoint}) and use the returned id. * Alternatively pass app_name_hint to THIS tool and the server resolves it (same rules; multiple/zero → typed error). (b) KEPLOY BINARY VERIFICATION: * Bash: "keploy --version" (or "~/.keploy/bin/keploy --version"). If it exits non-zero the binary is missing. * If missing OR older than this MCP server was built against, install/upgrade: curl --silent -O -L https://keploy.io/ent/install.sh && source install.sh * Re-verify with "keploy --version"; fail loudly if still absent (tell the dev where keploy put the binary so they can add it to PATH). ===== DOCKER-COMPOSE NETWORK RULE (absolute) ===== Use the SAME compose file + service that was used in the validate-curl phase. Do NOT point keploy at a second "keploy-only" compose file — docker-compose isolates each file into its own project + network, so the app container spawned by keploy cannot reach the DB/Kafka containers that validate brought up (and the network-name collision blocks keploy from starting). Correct flow: (i) Validate phase: "docker compose up -d" (brings up app + deps on network <project>_default). (ii) Before calling record_sandbox_test, Bash: "docker compose stop <app_service> && docker compose rm -f <app_service>" — stop ONLY the app service; leave deps running so keploy's new app container can reach them on the existing network. (iii) Pass app_command = "docker compose up <app_service>" (same compose file, same project → same network). container_name = the actual name set by compose (e.g. "orderflow-producer", not "producer"). ===== RESOLUTION RULES (server-side, no guessing) ===== 1. App: caller provides app_id OR app_name_hint. With a hint, the server does listApps({q: hint}). Zero matches → typed error; multiple → typed error listing them so Claude asks the dev. 2. Suites: DEFAULT IS "ALL LINKED". When the dev says "record my sandbox tests" / "rerecord everything" / "refresh my recordings" with no specific suite named, LEAVE BOTH suite_ids AND suite_name_hint UNSET. Do NOT list suites first and pass a comma-joined UUID list back — the CLI resolves "every linked suite for the app" itself, cleaner and less brittle. Only pass a narrower selector when the dev explicitly names suites: - suite_ids (comma-separated, exact) — when you already have the IDs. - suite_name_hint (case-insensitive substring match) — when the dev names suites by human phrasing like "the auth suite" or "deterministic". Every suite whose name contains the substring is recorded. If the dev asks to record suites that don't exist yet (zero match) → typed error. Any ≥1 match is fine. DO NOT prompt the dev for which suites to record — default to all linked if they didn't name any. ===== DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id ===== Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust without a hit, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. The standard pattern when "search the suite by id" returns nothing is NOT "give up and ask the dev which app" — it's "the suite exists on a BRANCH, walk discovery". Suites created via create_test_suite + rerecord on a Keploy branch are INVISIBLE to a main-view listTestSuites; you have to scope each call to a branch. After resolving once in a session, REUSE the {app_id, branch_id} for any subsequent suite-targeted call (replay_sandbox_test, update_test_suite, replay_test_suite); don't re-walk discovery for every action. ===== PREREQUISITES ===== - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== The response includes a "playbook" array; execute its steps in order. The flow is HEADLESS — one background process, NDJSON progress events on a local file, no separate HTTP surface to bind. THERE IS NO SEPARATE CLEANUP STEP — the CLI exits on its own once phase=done is written. 1. Spawn the `keploy record sandbox --cloud-app-id …` process via Bash (run_in_background). Capture its PID into $KEPLOY_PID. 2. Poll progress by repeatedly calling Bash with `tail -n 1 $PROGRESS_FILE`. Each call returns instantly; the MCP round-trip between calls paces the loop. DO NOT wrap in a sleep loop — Claude Code's Bash rejects standalone `sleep N` and chained-sleep patterns. Read .phase off each line; stop when phase=done. The wait_for_done step's built-in `kill -0 $KEPLOY_PID` check is the safety-net for silent early-exit (CLI died before writing the terminal event) — it lets the loop exit instead of spinning forever on a dead process. 3. Read the terminal event (last line of $PROGRESS_FILE). It carries data.ok, data.error (on failure), data.test_run_id (on success). 4. On data.ok=true: call get_session_report(app_id, test_run_id) with verbose=true to surface the rerecord report. On data.ok=false: show data.error to the dev directly (optionally tail the log_file for stderr context) and SKIP get_session_report (there's no run to fetch). Auto-replay + linkTestSetToSuite run INSIDE the CLI process before it writes phase=done — if the terminal event says ok=true, linkage already happened. You do NOT need to wait for a separate post-success window; the CLI doesn't exit until it's fully done. INTERRUPTED FLOWS: if your conversation dies between step 1 and step 2 (Claude crashes, connection drops, dev cancels), the CLI keeps running in the background. It's not orphaned — it'll finish its run and write phase=done. To abort early, the dev can `pkill -f "keploy.*sandbox"` manually; otherwise just let it complete and resume by re-reading the progress file on the next turn. ===== NDJSON SCHEMA — the contract ===== Every line in the progress_file is one JSON object with this envelope: { "ts": "<RFC3339-nano>", "command": "record" | "test", "phase": "<phase-name>", "message": "<optional human-readable>", "data": { ... phase-specific ... } // optional } The phase vocabulary is intentionally extensible — new lifecycle phases get added over time as the CLI grows (started, agent_up, app_starting, suites_running_start, record_done, auto_replay_skipped, upload_done, linking_done, etc.). There are only TWO phases the AI must handle programmatically; everything else is informational and you should NOT switch on phase names you don't recognize: * phase != "done" → keep polling. Optional: surface message/data to the dev as ambient progress ("agent is starting...", "suites uploading..."), but never branch on a specific intermediate phase name. * phase == "done" → terminal event. Stop polling. The data envelope carries: - data.ok bool true on success, false on failure - data.error string (only on ok=false) one-line failure summary - data.test_run_id string (only on ok=true) pass to get_session_report - data.app_id string echo of the app_id passed to the tool - data.artifact_dir string local path to captured/replayed artifacts - data.dashboard_url string UI link to drill into the run If you observe a phase you don't recognize, IGNORE it and keep polling. If "done" itself is renamed by a future CLI version, the wait_for_done step's PID-alive guard is your safety net (the poll loop exits when the CLI dies); surface log_file contents to the dev. ===== "ALL SUITES FAILED CAPTURE" — special signal ===== If you see a `phase: "auto_replay_skipped"` event with `message: "all suites failed during rerecord; skipping replay + linking"` ahead of the terminal `done` event, every suite failed at the CAPTURE phase (before auto-replay even ran). The CLI fails closed in this case — auto-replay and suite linking are SKIPPED, so every per_suite entry comes back linked=false. Watch for this trap: the terminal `data.ok=true` because the CLI itself completed cleanly (it didn't crash; it just had nothing to record successfully). DO NOT read data.ok=true as "rerecord succeeded" — read `<linked>/<total>`. If linked == 0, this is a HARD failure that needs diagnosis, not a partial-linkage case. ALWAYS surface the dashboard URL on this case. The terminal `done` event still carries `data.dashboard_url` and `data.test_run_id` (atg's TestSuiteRun was created during the capture phase); emit them verbatim so the dev can drill into per-step failures in the UI: "0/N suites have a sandbox test — every suite failed during the capture phase, so auto-replay and linking were skipped. Dashboard: <data.dashboard_url> (test_run_id=<id>)" EDGE CASE: if `data.test_run_id` is empty, atg never inserted a TestSuiteRun (typically a pre-flight validation failure — branch-id rejection, app unreachable, etc.). The dashboard URL won't resolve. Skip the URL, surface the log_file contents instead so the dev can read the early-stage failure. Recovery is the same as WHEN linked=false below — read failed_steps for each suite and pick route B (fix code) / C (update suite + record again) / SKIP. Don't infra-retry; capture-phase failures across every suite usually mean the app is broken, the suite shapes are stale, or the dev's local app isn't reachable. ===== LINKAGE VERIFICATION ===== After get_session_report returns, for EVERY suite that went into this record, call getTestSuite({suite_id}) and check whether the suite has a sandbox test (linked=true / non-empty test_set_id). A suite without a sandbox test cannot be replayed — replay_sandbox_test will 400 on it with "no sandboxed tests" until a successful record produces one. ===== WHEN linked=false — recovery rules ===== A suite with linked=false after record_sandbox_test means the record process couldn't produce a sandbox test for that suite. The SUITE ITSELF still exists; it just has no sandbox test. Diagnose WHY by reading the rerecord report's failed_steps for that suite: * No failed_steps OR pure infra error (link-commit / upload failed, no step diverged) → call record_sandbox_test AGAIN scoped to just the unlinked suite_ids. The tool is idempotent on the suite; safe to re-run. * failed_steps with assertion diffs (response shape, body fields, status code shifted from what the suite expected) → the suite is stale relative to current app behavior. The CONTRACT changed: - Change is INTENTIONAL (new field, renamed key, different status code is the new normal) → call update_test_suite to update the affected step's response / assertions to match the new contract, THEN call record_sandbox_test on the updated suite. - Change is UNINTENTIONAL (app regressed) → fix the app code first, then call record_sandbox_test. No suite update needed; the original test was correct. * failed_steps with 500s / handler crashes / connection refused → the app is broken at the wire level. Fix the app, then call record_sandbox_test. Don't update_test_suite to absorb a real failure. NEVER: * Don't call create_test_suite to "redo" the suite — it already exists; re-creating authors a duplicate (see BEFORE CREATING in create_test_suite). * Don't blindly loop record_sandbox_test without diagnosing failed_steps first; if the cause is suite-vs-app mismatch, retries won't help. ===== MANDATORY OUTPUT — Phase 2 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT collapse into a single pass/fail table with the rerecord report; do NOT merge with Phase 1 or Phase 3): ### Phase 2 — Sandbox-test linkage **<linked>/<total> suites have a sandbox test** _Suites with a sandbox test_ | Suite name | suite_id | test_set_id | Capture pass/total | | --- | --- | --- | --- | | <name> | <suite_id> | <test_set_id> | <p>/<t> | (emit even if zero — one row per linked suite, or "_(none)_" in place of rows) _Suites without a sandbox test_ (omit ONLY if every suite linked) | Suite name | suite_id | Likely cause | | --- | --- | --- | | <name> | <suite_id> | gate1 / gate2 / infra | Likely-cause decoding: assertion diffs → gate 1 upstream-replay failure; upstream-passing + mock-replay-diff → gate 2 mock-determinism mismatch; zero failures + still unlinked → infra link-commit issue. Then proceed to replay_sandbox_test ONLY for the suites that DID link; the unlinked ones will 400 on replay. ===== DO NOT ===== * DO NOT fall back to raw keploy CLI (`keploy rerecord -t …`) if the MCP tool drops mid-flow — the CLI subcommand runs test-sets directly and does NOT update the suite's test_set_id. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Scaffold the GitHub Actions workflow that runs the V1 API tests on every PR. Returns the exact YAML content to write to .github/workflows/keploy.yml + the Bash command to set the KEPLOY_API_KEY secret. The AI walks the playbook with its Write tool + the `gh` CLI. PRECONDITIONS — CHECK BEFORE CALLING. Calling this tool out of order is a DEVLOOP violation; the doc-stated user-flow ordering is generate → run → mutation-prove (opt-in) → expand (opt-in) → CI (opt-in). Specifically you must have: 1. Generated at least one test via devloop_generate_resource_flow AND watched it pass via "keploy test-gen run --ci". 2. SURFACED the mutation-prove opt-in to the dev verbatim: "Want me to prove the test catches bugs by applying 3 small mutations to your handler and reverting?" — and the dev answered (yes-walked through devloop_mutation_demo, or explicit no/skip/later). Doing the test runs is NOT the same as offering mutation-prove; the offer is a separate dev-facing question. 3. ASKED the dev "want me to wire this into CI?" — explicit yes from the dev. If ANY of those three are missing, STOP and back up. The mutation-prove gate is what builds the dev's trust before they commit Keploy to CI; skipping it ships shallow tests into a workflow the dev hasn't validated. What this tool does NOT do (intentionally — the dev keeps custody): * Mint the CI API key server-side. The dev provisions it themselves in the Keploy dashboard (Step 2 of the returned playbook walks them through it). The AI never sees the kep_* value — it transits dashboard clipboard → terminal stdin → gh CLI's encrypted POST. This is a security property, not a limitation. * Post structured PR comments from api-server. V1 relies on GitHub Actions' native status-check rendering; the structured comment renderer is a V1.5 lift. The emitted workflow runs on pull_request (default base branch) and reads app_id / test-dir / context-dir from keploy/api-tests/keploy-test-gen.yaml — the dev never has to thread flags through the workflow. TIME-FREEZING — DEFAULT ON, ALMOST ALWAYS NEEDED FOR BACKEND APPS. Almost every backend app has authentication (login → JWT/session/OAuth). The dev's recorded tests carry those tokens in headers. Between record time and the first PR's CI run, the tokens' exp claims pass real wall-clock — CI then 401s on every authenticated step, and the dev blames Keploy. Keploy's time-freezing rewinds the app's clock to the record moment so the recorded tokens validate. Default policy: time_freezing=true. The AI MUST inspect the dev's test suites BEFORE calling this tool: - <app_dir>/keploy/api-tests/<resource>/test.yaml (V1 sources) - <app_dir>/keploy/<SuiteName>/tests/*.yaml (captured sandbox tests) Look for: Authorization Bearer headers; steps hitting /login /auth /signin /token /oauth; response bodies containing jwt / token / access_token / refresh_token / expires_in / iat / exp. If any of those signals appear (or you're unsure), keep time_freezing=true. Only pass time_freezing=false when you've audited every suite and confirmed zero time-sensitive tokens (rare for a real backend). When time_freezing=true, this tool also requires app_language (go / node / python / java / ruby / other) and app_service (docker-compose service name). Output then includes: - Modified workflow YAML (pre-populates keploy-sockets-vol; uses -f docker-compose.yml -f docker-compose.keploy.yml; passes --freezeTime) - docker-compose.keploy.yml override (volume mount + LD_PRELOAD for non-Go, or Dockerfile.keploy build for Go) - Dockerfile.keploy (Go ONLY — vDSO bypasses LD_PRELOAD, requires -tags=faketime rebuild) The dev's plain "docker compose up" is unaffected. Time-freezing only activates when CI (or the dev locally) explicitly passes both compose files. TIME-FREEZING IS REPLAY-ONLY — STRICT INVARIANT. The Dockerfile.keploy / docker-compose.keploy.yml / --freezeTime flag this tool emits exist purely to make recorded JWTs validate at REPLAY time. They MUST NEVER apply when recording. Concretely: - Record uses the dev's PROD Dockerfile + plain "docker compose up" (no override file). - Replay uses Dockerfile.keploy + "docker compose -f docker-compose.yml -f docker-compose.keploy.yml up" + the --freezeTime flag on the CLI. If a recording is captured against a faketime-built binary, every timestamp in the captured mocks is wrong and the whole capture is corrupt — there is no recovery short of re-recording from scratch with the prod binary. The CI YAML this tool emits in ci_mode=sandbox-replay is a REPLAY workflow; it boots via the compose override on purpose. The dev's separate record flow (devloop_record_sandbox) must NOT touch the override. TIME-FREEZING IS FORCED ON FOR ci_mode=sandbox-replay — NON-NEGOTIABLE. Any explicit time_freezing=false passed alongside ci_mode=sandbox-replay is silently overridden back to true. Rationale: sandbox replay processes the recorded request stream verbatim — any time-sensitive token in any captured request (JWT exp, OAuth iat, session cookie) goes stale the moment wall-clock passes the recorded moment, and silently fails replay. Whether the dev's suite happens to carry such a token is not auditable at scaffold time, and the failure is silent (401 on the first auth-gated step in CI). The cost of force-ON for a hypothetical zero-token app is one dormant volume mount + a no-op CLI flag; the cost of force-OFF for a token-bearing app is every PR failing. Asymmetric — force-ON wins. For ci_mode=api-tests, the workflow runs against live deps with current wall-clock so recorded tokens never enter the picture; time_freezing defaults to false and is overridable by the AI if they want the artifacts pre-staged for a later sandbox switch.
    Connector
  • WORKFLOW: Step 1 of 4 - Start infrastructure design conversation Open an InsideOut V2 session and receive the assistant's intro message. The response contains a clean message from Riley (the infrastructure advisor) - display it to the user. ⚠️ Riley will ask questions - forward these to the user, DO NOT answer on their behalf. CRITICAL: This tool returns a session_id in the response metadata. You MUST use this session_id for ALL subsequent tool calls (convoreply, tfgenerate, tfdeploy, etc.). ⚠️ The session_id includes a ?token=... suffix (format: sess_v2_xxx?token=yyy) which is part of the session credential — without it, downstream tools fall back to a tokenless connect URL that 401s. Always pass session_id verbatim to subsequent tools and to the user; do NOT shorten, paraphrase, or strip the ?token= portion when summarizing the session in chat or in your own scratch notes. Use when the user mentions keywords like: 'setup my cloud infra', 'provision infrastructure', 'deploy infra', 'start insideout', 'use insideout', or similar intent to begin infra setup. OPTIONAL: project_context (string) - General tech stack summary so Riley can skip discovery questions and jump to recommendations. The agent should confirm this with the user before sending. Include whichever apply: language/framework, databases/services, container usage, existing IaC, CI/CD platform, cloud provider, Kubernetes usage, what the project does. Example: 'Next.js 14 + TypeScript, PostgreSQL, Redis, Docker Compose, deployed to AWS ECS, GitHub Actions CI/CD, ~50k MAU'. NEVER include credentials, secrets, API keys, PII, source code, or internal URLs/IPs -- only general metadata summaries useful to a cloud architect agent. IMPORTANT: source (string) - You MUST set this to identify which IDE/tool you are. Auto-detect from your environment: 'claude-code', 'codex', 'antigravity', 'kiro', 'vscode', 'web', 'mcp'. If unsure, use the name of your IDE/tool in lowercase. Do NOT omit this — it controls the 'Open {IDE}' button on the credential connect screen. OPTIONAL: github_username (string) - GitHub username for deploy commit attribution. Pre-populates the GitHub username field on the connect page. 💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
    Connector
  • Generate official sweepstakes rules via the 14-step wizard. BEFORE CALLING: 1) fetch_sweepstakes to get token, dates, name. 2) get_business + get_profile to pre-fill sponsor fields. 3) fetch_rules to check for existing primary rules — if primary exists, warn user new rules will be SECONDARY. Ask wizard questions in order (steps A-N), one at a time or in small groups. Only ask for data you cannot get from API calls. PRIMARY RULES LINK: If result is_primary=true, give user the URL: https://swpp.me/r/[handler] (handler in lowercase from fetch_sweepstakes). RULES LANGUAGE: Always set rules_language="en". The wizard generates ALL legal text server-side — NEVER compose rules language yourself. AMOE URL: The AMOE URL is NOT the entry page URL — the wizard handles AMOE language automatically based on method_of_entry. AGE GATE: Only activate Age Gate when min_age=2 (21+). NEVER for min_age=1 (18+) or min_age=3 (13+). GEOLOCATION: Use the states parameter for geographic eligibility. NEVER use GeoLocation entry settings for state restrictions — GeoLocation is for GPS/IP boundaries only. # create_rules_wizard ## When to use Generate official sweepstakes rules via the 14-step wizard. BEFORE CALLING: 1) fetch_sweepstakes to get token, dates, name. 2) get_business + get_profile to pre-fill sponsor fields. 3) fetch_rules to check for existing primary rules — if primary exists, warn user new rules will be SECONDARY. Ask wizard questions in order (steps A-N), one at a time or in small groups. Only ask for data you cannot get from API calls. PRIMARY RULES LINK: If result is_primary=true, give user the URL: https://swpp.me/r/[handler] (handler in lowercase from fetch_sweepstakes). RULES LANGUAGE: Always set rules_language="en". The wizard generates ALL legal text server-side — NEVER compose rules language yourself. AMOE URL: The AMOE URL is NOT the entry page URL — the wizard handles AMOE language automatically based on method_of_entry. AGE GATE: Only activate Age Gate when min_age=2 (21+). NEVER for min_age=1 (18+) or min_age=3 (13+). GEOLOCATION: Use the states parameter for geographic eligibility. NEVER use GeoLocation entry settings for state restrictions — GeoLocation is for GPS/IP boundaries only. ## Pre-calls required 1. fetch_sweepstakes if the user gave you a sweepstakes name instead of a token 2. fetch_rules(sweepstakes_token) — if primary rules already exist, WARN that new rules will be SECONDARY (not published) 3. get_business — auto-populate sponsor info (legal name, address) 4. get_entry_settings — confirm AMOE state matches the entry method ## Parameters to validate before calling - sweepstakes_token (string, required) — Sweepstakes token (UUID). Get via fetch_sweepstakes. - arv (number, required) — one of: 1, 2 — Approximate Retail Value threshold. 1 = ARV >= $5,000. 2 = ARV < $5,000. - alcohol_sweeps (number, required) — one of: 1, 2 — Is this an alcohol-related sweepstakes? 1 = Yes, 2 = No. - sweepstakes_name (string, required) — Official promotional name (6-60 characters). - start_date (string, required) — Start date (YYYY-MM-DD). - start_time (string, required) — Start time (e.g. "09:00 AM" or "14:00"). - start_timezone (string, required) — Start timezone (e.g. "US/Eastern", "EST", "CST"). - end_date (string, required) — End date (YYYY-MM-DD). - end_time (string, required) — End time (e.g. "11:59 PM" or "23:59"). - end_timezone (string, required) — End timezone. - prize_description (string, required) — Detailed prize description (max 5000 chars). - prize_include_travel (boolean, required) — Does the prize include travel? - prize_is_vehicle (boolean, required) — Is the prize a vehicle? - prize_value (number, required) — Total prize value in USD. Must be > 0. - entry_period_selector (number, required) — one of: 1, 2 — 1 = single drawing period, 2 = multiple entry periods. - sponsor_name (string, required) — Legal sponsor name. Pre-fill from get_business. - sponsor_address (string, required) — Sponsor street address. Pre-fill from get_business. - sponsor_city (string, required) — Sponsor city. Pre-fill from get_business. - sponsor_state (string, required) — Sponsor state or abbreviation. Pre-fill from get_business. - sponsor_zip_code (string, required) — Sponsor zip code (5 digits). Pre-fill from get_business. - method_of_entry (number, required) — one of: 1, 2, 3, 4, 5, 6, 7, 8 — Entry method: 1=Website, 2=SMS, 3=Social Media, 4=Other, 5=Purchase ($1=1 entry), 6=Purchase (1 order=1 entry), 7=Donation, 8=Subscription. - min_age (number, required) — one of: 1, 2, 3 — Minimum age: 1=18+, 2=21+, 3=13+ with parental consent. - states (number, required) — one of: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 — Geographic eligibility. 1=All 50+DC, 2=+PR, 3=+All Territories, 4=Select specific, 5=US&Canada, 6=US&Canada+PR, 7=US&UK, 8=US&Mexico, 9=Worldwide, 10=US,Canada,Mexico. - privacy_policy_url (string, required) — Privacy policy URL (min 11 chars, must include http/https). - sweeppea_entry_page (number, required) — one of: 1, 2, 3 — 1 = Sweeppea hosted page, 2 = custom URL, 3 = none. - winners_to_draw (number, optional) — Number of winners to draw (>= 1). Required when entry_period_selector = 1. - winner_drawing_date (string, optional) — Drawing date (YYYY-MM-DD). Required when entry_period_selector = 1. - winner_drawing_time (string, optional) — Drawing time. Required when entry_period_selector = 1. - winner_drawing_timezone (string, optional) — Drawing timezone. Required when entry_period_selector = 1. - winner_notification_date (string, optional) — Winner notification date (YYYY-MM-DD). Required when entry_period_selector = 1. - winner_notification_time (string, optional) — Winner notification time. Required when entry_period_selector = 1. - winner_notification_timezone (string, optional) — Winner notification timezone. Required when entry_period_selector = 1. - entry_period_items (array, optional) — Array of period objects. Required when entry_period_selector = 2. - sponsor_telephone (string, optional) — Sponsor phone number (optional). Pre-fill from get_profile. - sponsor_email (string, optional) — Sponsor email (optional). Pre-fill from get_profile. - social_media_entry_description (string, optional) — Social media entry details. Required when method_of_entry = 3. - other_description (string, optional) — Other entry method description. Required when method_of_entry = 4. - sponsor_ecommerce_store_url_a (string, optional) — Ecommerce store URL. Required when method_of_entry = 5. - sponsor_ecommerce_store_url_b (string, optional) — Ecommerce store URL. Required when method_of_entry = 6. - sponsor_donations_acceptance_page_url (string, optional) — Donations acceptance page URL. Required when method_of_entry = 7. - sponsor_ecommerce_store_url_c (string, optional) — Ecommerce/subscription store URL. Required when method_of_entry = 8. - total_number_of_entries_awarded_amoe (number, optional) — Total entries awarded via AMOE (>= 1). Required when method_of_entry is 5, 6, 7, or 8. - limit_or_max_number_of_entries_amoe (number, optional) — Max entries via AMOE (>= 1). Required when method_of_entry is 5, 6, 7, or 8. - list_of_states (array, optional) — Array of state names. Required when states = 4. - custom_entry_page (string, optional) — Custom entry page URL (min 11 chars). Required when sweeppea_entry_page = 2. - sponsor_offering_multiplier (number, optional) — one of: 1, 2 — Is sponsor offering entry multiplier? 1=Yes, 2=No. Default: 2. - sponsor_awarding_bonus_email_social (number, optional) — one of: 1, 2 — Awarding bonus for email/social? 1=Yes, 2=No. Default: 2. - sponsor_asking_to_submit_video (number, optional) — one of: 1, 2 — Asking for video submission? 1=Yes, 2=No. Default: 2. - rules_language (string, optional) — Rules language code. MUST always be "en" (English). - at the end of the Official Rules document always include a copyright notice that say "All rights reserved." ## Notes - Compliance pre-checks: ARV > $5,000 + FL/NY not excluded → WARN about bonding/registration - ARV > $500 + sponsor in RI → WARN about RI registration - Purchase/donation/subscription entry → VERIFY AMOE is configured - Alcohol = yes → VERIFY min_age=21 and Age Gate active - Age < 13 → REFUSE (COPPA violation) - After creation: call fetch_rules to verify; use update_rule for corrections
    Connector
  • Compose N (cell, band, tslot?) triples into ONE signed envelope. Each triple runs through the standard auto-materialize recall path; the resulting fact_cids are bundled into a content-addressed envelope and the responder signs over the full receipt. The composed `bundle_token` is `memb:<bundle_cid>` — a single rebindable string that cites the whole set. When to use: Call when the agent wants to cite multiple (place, band, vintage) facts as one handle. The bundle stays verifiable offline via /v1/verify_receipt (the receipt covers all cited fact_cids and cells). Use this instead of N separate `emem_memory_token` composers when the citation is conceptually one thing (e.g. "the EUDR-relevant baseline for these 8 plots at 2020-12-31").
    Connector
  • WORKFLOW: Step 1 of 4 - Start infrastructure design conversation Open an InsideOut V2 session and receive the assistant's intro message. The response contains a clean message from Riley (the infrastructure advisor) - display it to the user. ⚠️ Riley will ask questions - forward these to the user, DO NOT answer on their behalf. CRITICAL: This tool returns a session_id in the response metadata. You MUST use this session_id for ALL subsequent tool calls (convoreply, tfgenerate, tfdeploy, etc.). ⚠️ The session_id includes a ?token=... suffix (format: sess_v2_xxx?token=yyy) which is part of the session credential — without it, downstream tools fall back to a tokenless connect URL that 401s. Always pass session_id verbatim to subsequent tools and to the user; do NOT shorten, paraphrase, or strip the ?token= portion when summarizing the session in chat or in your own scratch notes. Use when the user mentions keywords like: 'setup my cloud infra', 'provision infrastructure', 'deploy infra', 'start insideout', 'use insideout', or similar intent to begin infra setup. OPTIONAL: project_context (string) - General tech stack summary so Riley can skip discovery questions and jump to recommendations. The agent should confirm this with the user before sending. Include whichever apply: language/framework, databases/services, container usage, existing IaC, CI/CD platform, cloud provider, Kubernetes usage, what the project does. Example: 'Next.js 14 + TypeScript, PostgreSQL, Redis, Docker Compose, deployed to AWS ECS, GitHub Actions CI/CD, ~50k MAU'. NEVER include credentials, secrets, API keys, PII, source code, or internal URLs/IPs -- only general metadata summaries useful to a cloud architect agent. IMPORTANT: source (string) - You MUST set this to identify which IDE/tool you are. Auto-detect from your environment: 'claude-code', 'codex', 'antigravity', 'kiro', 'vscode', 'web', 'mcp'. If unsure, use the name of your IDE/tool in lowercase. Do NOT omit this — it controls the 'Open {IDE}' button on the credential connect screen. OPTIONAL: github_username (string) - GitHub username for deploy commit attribution. Pre-populates the GitHub username field on the connect page. 💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
    Connector
  • Compose and send an email — with subject, CC/BCC, and attachments. Use for email; for chat messages (Telegram/WhatsApp/livechat) use messages.send instead.
    Connector
  • Returns trading aggregates per user: fill count, volume broken down by side, total fees (negative values represent net maker rebates), realized PnL, net funding paid or received, liquidation-fill count, distinct coins traded, and first/last trade timestamps. Omit `user` for leaderboard mode — returns a paginated list sorted by `sort_by`. Provide `user` for profile mode — returns a single row. Filters `coin` and `dex` compose additively — pass either or both to narrow the scope (`coin=BTC` for one market, `dex=xyz` for one venue, both together for redundancy). A mismatched combination (e.g. `coin=cash:TSLA&dex=xyz`) returns an empty result. Aggregation windows are fixed via the `interval` parameter — `1h`, `1d`, `1w`, `30d`, or omit for all-time. Data is refreshed hourly, so `1h` lags up to 1h. Vaults trade as normal accounts, so passing a vault address as `user` returns its trading performance — pair with `/v1/hyperliquid/vaults` for depositor-side stats. **Query Parameters:** - **user**: Filter by address<br>Single value or array of values* (separate multiple values with `,`)<br>*Plan restricted. - **interval**: Lookback window for user statistics (1 hour, 1 day, 1 week, 30 days). Omit for all-time. - **sort_by**: No description. - **coin**: Hyperliquid coin identifier. Core perps have no prefix (`BTC`, `HYPE`); spot pairs use `@N` (`@107`); builder DEXs prefix the symbol with the DEX name (`xyz:SILVER`).<br>Single value or array of values* (separate multiple values with `,`)<br>*Plan restricted. - **dex**: DEX identifier. `perps` for core perps, `spot` for `@N` spot pairs, or a builder DEX name (`xyz`, `cash`, …). Call `/v1/hyperliquid/dexes` for the live set.<br>Single value or array of values* (separate multiple values with `,`)<br>*Plan restricted. - **limit**: Number of items* returned in a single request.<br>*Plan restricted. - **page**: Page number to fetch.<br>Empty `data` array signifies end of results. **Responses:** - **200** (Success): Successful Response - Content-Type: `application/json` - **Response Properties:** - **request_time**: ISO 8601 datetime string - **Example:** ```json { "data": [ { "user": "string", "coin": "unknown_type", "dex": "unknown_type", "total_funding": 1.5, "volume_sold": 1.5, "transactions": 1, "realized_pnl": 1.5, "volume_bought": 1.5, "liquidation_fills": 1, "last_trade": "string", "interval": "unknown_type", "buys": 1, "total_volume": 1.5, "sells": 1, "first_trade": "string", "total_fees": 1.5, "coins_traded": 1 } ], "statistics": { "elapsed": 1.5, "rows_read": 1.5, "bytes_read": 1.5 }, "pagination": { "previous_page": 1, "current_page": 1 }, "request_time": "string", "duration_ms": 1.5, "results": 1.5 } ``` - **400**: Client side error - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **401**: Authentication failed - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **403**: Forbidden - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **404**: Not found - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "authentication_failed", "message": "string" } ``` - **500**: Server side error - Content-Type: `application/json` - **Response Properties:** - **Example:** ```json { "status": "unknown_type", "code": "bad_database_response", "message": "string" } ```
    Connector
  • Switch the app's V1 CI from "boot the real app + deps" mode to sandbox mode (mocks fetched by content-hash from the cloud canonical pool). The doc-stated trigger: ~1 week after CI is wired, when the dev has felt the slow runs / flakes and you can pitch "your CI takes 90s and flaked twice this week — rerecord mocks and CI drops to ~8s." What flips: * The CI workflow YAML gets a --sandbox flag on `keploy test-gen run` and the docker-compose-up step removed. This tool returns the updated YAML; you re-PR it. Pre-condition: every resource you want in CI must have recorded mocks (config.yaml.mockRegistry.mock populated). Resources without mocks will fail in sandbox mode because there's nothing to serve. Run devloop_record_sandbox per resource first; verify via devloop_schema_drift_report-style checks before proposing the switch.
    Connector
  • Single-shot free-text answer about a real-world location, backed by signed satellite/elevation/water/built-up receipts. Forwards a place mention plus a question; runs the locate → recall → algorithm chain server-side; returns one packaged envelope. When to use: Use when the question concerns a specific real-world place and a packaged, citation-bearing answer is preferable to manual primitive composition. Forward the user's question verbatim as `q` plus the location as `place` (free text), `cell` (cell64), or `lat`+`lng`. The server resolves the location, classifies the question to a topic, recalls every relevant band (auto-materializing Sentinel-2 / Sentinel-1 / Cop-DEM / JRC GSW / Overture / weather on miss), surfaces the algorithm recipes that compose those bands into named scores, and returns a single envelope with `topic_routing`, `facts`, `algorithms_for_question`, an optional Sentinel-2 RGB scene URL, and a `caveats` block (grid resolution, revisit cadence). All facts are signed by the responder; the signed `receipt` (and its content-addressed `fact_cids`) is surfaced at the envelope ROOT — `response.receipt` / `response.fact_cids` — exactly like every other primitive, and is also mirrored under `facts_summary.receipt` for back-compat. Set `include_image: true` to bundle the latest cloud-free Sentinel-2 thumbnail. Out-of-scope questions return `topic_routing.matched_topic: null` plus the full inventory so the caller can route elsewhere.
    Connector
  • [IN DEVELOPMENT] [READ] Composite trust score (0..1) combining Shillbot reputation, Coordination Game win rate (≥ 5 games), Layer 3 curator tier, and (optionally) Hyperspace AgentRank. Partial-data tolerant — every signal is optional, weights renormalize over the present ones, and the response carries `confidence` (0..=4, how many signals contributed). Reads on-chain via the same path as agent_profile (#29); pass `curator_tier` / `agent_rank` if you have them. Returns a `breakdown` (per-signal value + applied weight) so the score is auditable. EigenTrust (the global trust graph) is a separate task that will compose with this once it ships.
    Connector
  • Create a new API test suite with test steps. Each step defines an HTTP request and assertions to validate the response. Steps can extract values from responses into variables for chaining requests. ═══════════════════════════════════════════════════════════════════ STEP 0 — read the canonical schema BEFORE drafting: ═══════════════════════════════════════════════════════════════════ If you've already called get_app_testing_context, the canonical step schema is in its response under the `step_schema` field — read it from there. Otherwise run `keploy test-suite-format` once before writing any suite JSON. The schema describes the MANDATORY rules below in detail plus the two-step prelude+POST skeleton you must follow. Authors who skip this and draft from training-data priors burn ~50s per validator rejection on iter 1. ═══════════════════════════════════════════════════════════════════ MANDATORY FOR EVERY STEP — the validator rejects on iter 1 if any of these are violated: ═══════════════════════════════════════════════════════════════════ R10 — every step MUST carry a captured "response": {status, body, headers} block. Hit the endpoint locally before authoring (curl) and paste the real response. Steps with no response block are rejected outright; four downstream rules (R4 / R11 / R15-R16 / R27) silently no-op until R10 is satisfied, so missing a response also hides every assertion / extract problem in that step. R9 — every POST / PUT / PATCH body MUST reference at least one {{var}} whose generator is declared on an EARLIER step's "extract" (typically a /health prelude as step 0). Without this, the second run collides on the first run's database state. App-level appLevelCustomVariables DO NOT qualify for R9 — the validator only credits step-level extracts. R2 — pre-request fields ("body", "url", "headers") CANNOT reference the CURRENT step's own "extract" outputs. Extract runs AFTER the response comes back; pre-request substitution sees nothing yet. Together R9 + R2 force the prelude pattern: declare generators on step 0, use them from step 1+. The STEP SHAPE example below shows the canonical two-step layout. R15 — every assertion's path / status / header MUST resolve against the AUTHORED response block. JSONPath uses gjson dot-array syntax: $.orders.0.id — NOT $.orders[0].id (the bracket form does not resolve in gjson; the assertion is rejected as "key not present in recorded body"). For status_code / header_* assertions, the values must match what's in response.status / response.headers verbatim — capture the real response via curl before authoring. R32 — every step-level extract key MUST NOT collide with the app's appLevelCustomVariables (enumerate them via get_app_testing_context or getApp before authoring). The runtime's variable lookup resolves app-level first, so a colliding key means the suite's function silently never runs. Suite-suffix when in doubt: userNonceForSuite, not genUserId. Don't invent a parallel generator with the same name as an existing app-level one. ═══════════════════════════════════════════════════════════════════ GRAPHQL APPS — read this section FIRST when the app's primary endpoint is /graphql (or any other path that accepts a GraphQL request envelope). REST apps can skip this section entirely. ═══════════════════════════════════════════════════════════════════ WHEN GRAPHQL VALIDATION RUNS — automatic dispatch, no flag to set: The validator inspects each step's request body. If ANY step's body parses to a JSON envelope containing a non-empty "query" string OR an "extensions.persistedQuery.sha256Hash" (APQ flow), the ENTIRE suite routes through the GraphQL validator (G-rules). Otherwise it routes through the REST validator (Charan's R-rules described above). Decision is per-suite, NOT per-step. Consequences: - Pure GraphQL suite (every step is POST /graphql with envelope body) → G-rules - Pure REST suite (no step's body looks like a GraphQL envelope) → R-rules - MIXED suite (some REST steps, some GraphQL steps in ONE suite) → tilts GraphQL and REST steps fail G1 envelope check with a clear error Rule of thumb: DO NOT mix in one suite. Author one suite per surface — separate REST suites for REST endpoints, separate GraphQL suites for /graphql. App-level both is fine; suite-level mixing is rejected by design. GRAPHQL SUITE SHAPE — every step is POST to /graphql with body = { "query": "<operation document>", "variables": { <map of $variable values, optional> }, "operationName": "<name>" // required when query has multiple operations } Body is a JSON-encoded STRING (same convention as the REST shape). URL is always /graphql. Method is always POST. headers must include Content-Type: application/json. PLACEHOLDER POSITION — substitute {{var}} INSIDE envelope.variables values, NEVER inside the query string: ✓ "body":"{\"query\":\"query Q($id: ID!){user(id:$id){id email}}\", \"variables\":{\"id\":\"{{createdUserId}}\"},\"operationName\":\"Q\"}" ✗ "body":"{\"query\":\"query{user(id:\\\"{{createdUserId}}\\\"){id email}}\"}" The second form fires G23 — placeholder inside query string. Substitution at the document level injects raw text into syntactic positions where it doesn't belong; the resulting query may parse weirdly server-side. Always use the variables map. OPERATION-TYPE SEMANTICS — the validator parses the query and detects whether the active operation is query / mutation / subscription: - mutation operations must include ≥1 dynamic value in envelope.variables (G24). Idempotent mutations (e.g. mutation { logout }) waive the check via step.idempotent:true. - subscription operations are REJECTED outright (G25). Live-run is request/response only; streaming subscriptions can't be exercised here. Author a separate harness for those. - query operations have no dynamism requirement (reads are inherently idempotent). EXTRACT PATHS for GraphQL responses — paths always start with $.data.* or $.errors.* (the GraphQL response envelope wraps everything under one of those keys): ✓ extract: { "createdId": "$.data.createUser.id" } ✓ extract: { "firstErrCode": "$.errors.0.extensions.code" } ✗ extract: { "createdId": "$.id" } ← fires G30, fails to resolve Aliased fields: if a query aliases a field ('me: user(id: ...)'), the extract path must reference the ALIAS, not the underlying field: ✓ query { me: user(...){ id } } → extract: { "userId": "$.data.me.id" } ✗ query { me: user(...){ id } } → extract: { "userId": "$.data.user.id" } ← fires G32 GENERATOR PLACEMENT — fn-generator {{var}} values (backed by an "extract" entry whose value starts with "function name(){...}") re-evaluate on EVERY substitution call. They're correct ONLY in mutation envelope.variables for the FIRST time a value is created: ✓ Step 1 (mutation): variables.email = "{{regenEmail}}" ← fresh email per run, OK ✗ Step 2 (query): variables.id = "{{regenEmail}}" ← G28 rejects (regenerator returns a fresh value the server never persisted) ✗ assert[N].expected = "{{regenEmail}}" ← G27 rejects (re-evaluates at assert time, won't match the value the request body sent) For query variables / assert.expected, ALWAYS reference a JSONPath extract from a prior creating step: Step 1 (mutation): extract: { "createdUserId": "$.data.createUser.id" } Step 2 (query): variables: { "id": "{{createdUserId}}" } ← stable, OK Step 2 (query): assert: [{ ..., "expected": "{{createdUserId}}" }] ← stable, OK ASSERTION PATTERNS for GraphQL — every step needs MORE than status_code (GraphQL returns 200 even when errors[] is populated; bare status_code:"200" hides real resolver failures): - Happy-path steps: include {"type":"graphql_no_errors"} — verifies errors[] is empty (G40). Bare status_code without graphql_no_errors fires G40. - Error-path steps: set step.expect_error:true AND include at least one of: {"type":"graphql_has_error_code", "expected":"<code>"} {"type":"graphql_error_path_eq", "expected":"<dotted.path>"} {"type":"graphql_error_message_contains", "expected":"<substring>"} An expect_error:true step without any error-shape assertion fires G42. - DON'T use ONLY status_code in any GraphQL step — pair with body-level assertions (G41). CANONICAL HAPPY-PATH SHAPE — three steps, prelude + mutation + read-back: Step 0 (prelude, idempotent:true): body: {"query":"{ __typename }"} extract: { "regenEmail": "function regenEmail(){return 'u_'+Date.now()+'_'+Math.random().toString(36).slice(2,8)+'@x.com';}" } assert: [{"type":"status_code","expected":"200"}, {"type":"graphql_no_errors"}] Step 1 (mutation): body: {"query":"mutation M($input: CreateUserInput!){ createUser(input: $input){ id email name role } }", "variables":{"input":{"email":"{{regenEmail}}","name":"Alice","role":"USER"}},"operationName":"M"} extract: { "createdId": "$.data.createUser.id", "createdEmail": "$.data.createUser.email" } assert: [{"type":"status_code","expected":"200"}, {"type":"graphql_no_errors"}, {"type":"json_equal","key":"$.data.createUser.role","expected":"USER"}] Step 2 (query read-back): body: {"query":"query Q($id: ID!){ user(id: $id){ id email name role } }", "variables":{"id":"{{createdId}}"},"operationName":"Q"} assert: [{"type":"status_code","expected":"200"}, {"type":"graphql_no_errors"}, {"type":"json_equal","key":"$.data.user.id","expected":"{{createdId}}"}, {"type":"json_equal","key":"$.data.user.email","expected":"{{createdEmail}}"}] CANONICAL ERROR-PATH SHAPE — single step that intentionally triggers a server-side error: { "name":"createUser with duplicate email", "method":"POST", "url":"/graphql", "headers":{"Content-Type":"application/json"}, "body":"{\"query\":\"mutation M($input: CreateUserInput!){ createUser(input:$input){id} }\", \"variables\":{\"input\":{\"email\":\"seed@example.com\",\"name\":\"Dup\",\"role\":\"USER\"}},\"operationName\":\"M\"}", "expect_error": true, // tells G42 this is an error-path step "idempotent": true, // step is idempotent — always errors with same code "response": { "status": 200, "headers": {"Content-Type":"application/json"}, "body": "{\"data\":null,\"errors\":[{\"message\":\"user with email \\\"seed@example.com\\\" already exists\",\"path\":[\"createUser\"],\"extensions\":{\"code\":\"EMAIL_EXISTS\"}}]}" }, "assert": [ {"type":"status_code","expected":"200"}, {"type":"graphql_has_error_code","expected":"EMAIL_EXISTS"}, {"type":"graphql_error_path_eq","expected":"createUser"} ] } ═══════════════════════════════════════════════════════════════════ APP CONFIG FIRST — read the app before authoring: ═══════════════════════════════════════════════════════════════════ Before any other step, call getApp({app_id}) and read these fields: * appLevelCustomVariables — dynamic generators and static fixtures pre-configured by the dev, shared across every suite for this app. Common shapes: - genUserId, genProductName (JS functions returning fresh entropy per run, e.g. `alice_<rand1-10000>`) - staticUser (a fixed user the dev wants tests to use) - zeroQuantity, negativePrice, invalidUser (static fixtures for validation tests) PREFER these over inventing your own JS-function in `extract`. They're the dev's authoritative dynamic-input set — using them in POST/PUT/PATCH bodies via `{{varName}}` means each replay hits a fresh row, sidestepping duplicate-key errors. Inventing a parallel generator with the same intent risks name-collision rejection (see Name-collision check below). * auth — the auth shape suites must satisfy (header / cookie / oauth / none). * ignoreEndpoints, rateLimit, timeout — runtime knobs that shape what assertions can hold. If a relevant `gen*` already exists in appLevelCustomVariables, ALWAYS reference it via `{{name}}` rather than authoring a parallel one. The dev configured it for a reason. ═══════════════════════════════════════════════════════════════════ BEFORE CREATING — check for duplicates AND for existing recordings: ═══════════════════════════════════════════════════════════════════ A (app_id, branch_id) tuple holds at most one suite per scenario, AND if the dev has already captured the relevant traffic via `keploy record`, you should seed from that recording instead of curling the app fresh. Two bounded checks before create_test_suite: (1) Duplicate-suite check — call listTestSuites({app_id, branch_id, q: "<scenario-keyword>"}) where <scenario-keyword> is a substring of the name you're about to author (e.g. "checkout", "auth"). The server filters by name regex, so the response is bounded to relevant matches regardless of how many suites the app has. If you can't pick a keyword (the dev's intent is vague), call with page_size=20 and NO q, then scan the first page only — DON'T paginate further. Match by name (case-insensitive) AND by intent. If any existing suite covers the same scenario: - Same scenario, refresh wanted → call update_test_suite (preserves history) or delete_test_suite + create_test_suite (loses history). - Adjacent but distinct scenario (e.g. "checkout with discount" vs "checkout without discount") → create with a name that distinguishes them clearly. (2) Recording-reuse check — call listRecordings({app_id, limit: 10}) to fetch the 10 most recent `keploy record` sessions. Recordings cluster by scenario; the top 10 cover what's likely relevant — DON'T paginate the full history. For any recording whose name/timestamp suggests it covers the scenario you're authoring, call download_recording({app_id, test_set_id}) to pull its captured test cases (real request/response pairs from the live app). Seed your steps_json from those test cases — convert each one into a step (method/url/headers/body → request fields; recorded response → step's `response` field). This is more faithful than re-curling and saves the dev's time. If no recent recording covers the scenario, fall through to the normal validate-locally-before-inserting flow (curl each endpoint yourself). (3) Only after both checks → proceed with create_test_suite. Skipping (1) leaves the dev with two suites covering the same flow — confusing reports, double rerecord cost, and orphaned sandbox tests on whichever suite they stop using. Skipping (2) re-curls endpoints whose traffic the dev already captured. ═══════════════════════════════════════════════════════════════════ ONE SCENARIO PER SUITE — load-bearing constraint: ═══════════════════════════════════════════════════════════════════ A suite represents EXACTLY ONE user-facing scenario / use-case (e.g. "user registers and creates their first order", "admin promotes a user role", "checkout with discount applied"). Do NOT pack multiple unrelated scenarios into a single suite — every step in a suite shares state and ordering with every other step. Mixing scenarios breaks idempotency (cleanup for one scenario can wipe state another scenario assumed), makes failures harder to diagnose, and inflates rerecord cost. Tests for "auth + payments + cleanup" → THREE suites, not one. Related steps that share extracted vars and a state assumption belong in the same suite; unrelated flows don't. When in doubt: if you can't write a single sentence describing what the suite tests in user-facing terms, split it. ═══════════════════════════════════════════════════════════════════ IDEMPOTENCY CONTRACT — the load-bearing rule for every suite: ═══════════════════════════════════════════════════════════════════ Every suite MUST be replayable indefinitely without state drift. The same suite run twice in a row, or 100 times back-to-back, must produce the same per-step outcomes. Failing this makes the suite useless for sandbox replay (the captured mocks freeze a single point-in-time response, so any state-dependent step diverges on rerun). How to design for it: * Duplicate-key 500 on POST / PUT / PATCH replay ("duplicate key" / "already exists" / "unique constraint violated") is ALWAYS a SUITE design problem, NEVER an app problem. Fix order: (1) Reference an app-level `gen*` var via `{{name}}` in the body — works if one exists (you read appLevelCustomVariables in APP CONFIG FIRST). (2) If no fitting app-level generator, declare your own JS-function in a PRIOR step's `extract` (see PRELUDE PATTERN); reference it via `{{name}}` in the failing step's body. (3) Add a DELETE cleanup step earlier in the suite to clear the conflicting row. NEVER propose modifying app code (e.g. adding `ON CONFLICT` to the INSERT, retry loops, transactional wrappers). The app's dedup is correct; the suite is what's missing entropy. See DO NOT MODIFY APP SOURCE CODE below. * If a step CREATES a resource, a later step in the same suite MUST clean it up (DELETE the row, revert the state) — OR the create must be idempotent on the server side (PUT-by-key, upsert). A naked POST that always allocates a new ID will diverge on every replay. * If a step depends on a resource, EXTRACT its identity from a prior step's response into a `{{var}}` — never hard-code an ID that "happens to exist right now". Hard-coded IDs rot. * Reject "natural-language idempotency" reasoning ("the dev will reset the DB before each run"). The suite must work without external setup. If you can't guarantee it, you've packed two scenarios into one suite — split them. * Do not assume time-of-day, ordering relative to other suites, or random-but-stable values. Each suite is its own universe. * Pagination / list endpoints: extract the count or a known item, don't assert on absolute indices ("the third item is X") — index drifts as the dataset grows. * Auth tokens: pull from app-level custom variables or extract from a login step IN THE SUITE. Never inline a token that expires. If the dev's request implies non-idempotent behaviour (e.g. "create user, then test that creating the same user fails"), capture both states explicitly inside the suite — first step creates, second step asserts the conflict response, third step deletes — so the suite as a whole is still replayable. Don't push the cleanup outside the suite. A suite that fails idempotency is rejected at create_test_suite time by the dynamic validator (2 live runs check). When that fails, do NOT retry by tweaking syntax — restructure the scenario. ═══════════════════════════════════════════════════════════════════ DO NOT MODIFY APP SOURCE CODE during suite authoring: ═══════════════════════════════════════════════════════════════════ At create_test_suite time, your job is to author a suite that fits the app AS IT IS. You may patch the app's CONFIG (auth, appLevelCustomVariables, ignoreEndpoints, rateLimit) via updateApp({app_id, ...}) — those are runtime knobs the dev expects to tune. You may NOT modify the app's SOURCE CODE. If a step is failing because of how the app behaves (500s, contract mismatches, missing endpoints, validation errors), the response is ONE of: * Adjust the suite to match observed behavior (steps_json edits before insert). * Use an app-level dynamic var (see APP CONFIG FIRST) or a JS-function generator to avoid the failure (see IDEMPOTENCY CONTRACT's duplicate-key fix order). * Patch the app's CONFIG via updateApp if the cause is auth / vars / rate limit. * If the dev confirms the app is broken AND the suite is correct, ASK the dev to fix the app — do NOT propose code changes yourself during authoring. NEVER propose ON CONFLICT clauses, retry loops, transactional wrappers, or any code-level change to the dev's application as a way to make the suite work. The suite must accommodate the app, not the other way around. ═══════════════════════════════════════════════════════════════════ NEVER-MISS-THESE — the validator HARD-REJECTS suites missing any of these: ═══════════════════════════════════════════════════════════════════ 1. `response` on EVERY step — { status: <int>, headers: {…}, body: "<string>" }. Captured from a real curl against the dev's app. body MUST be a JSON-encoded STRING (the raw body bytes), NOT a parsed object. Wrap with json.dumps / JSON.stringify if your tool gave you a dict. 2. `extract` is the ONLY authoring slot — never `extract_variables`. `extract_variables` is a post-run runtime SNAPSHOT field; the extract_variables-input rejection rule hard-rejects it on input. If you read an existing suite via getTestSuite / get_app_testing_context / download_recording and see `extract_variables` populated there — IGNORE IT, that's the runtime's display state, not the suite's input. Always author with `extract`. 3. POST/PUT/PATCH bodies need a per-run dynamic `{{var}}` (mutating-step dynamism check). Declare a JS-function generator on a PRIOR step's `extract` (typically a /health prelude as step 0). The declare-and-use-same-step check forbids declaring-and-using on the same step. See PRELUDE PATTERN below. 4. JSONPath uses gjson dot-array syntax: `$.orders.0.user_id` — NOT `$.orders[0].user_id`. ═══════════════════════════════════════════════════════════════════ STEP SHAPE (steps_json is an ARRAY — copy this two-step skeleton verbatim, preserve the prelude pattern): [ { // Step 0: cheap read prelude. Its sole job is to declare JS-function generators that // later POST/PUT/PATCH bodies reference. Required by R9 (mutating bodies need a per-run // dynamic var) + R2 (same-step extract isn't usable in pre-request fields). If your // suite already has a natural read step (/health, /me, version), reuse it as the prelude. "name": "health prelude (declares generators)", "method": "GET", "url": "/health", "headers": { "Accept": "application/json" }, "extract": { "genUserId": "function genUserId(){return 'u_'+Date.now()+'_'+Math.random().toString(36).slice(2,8);}" }, "assert": [ { "type": "status_code", "expected": "200" }, { "type": "json_equal", "key": "$.status", "expected": "healthy" } ], "response": { "status": 200, "headers": { "Content-Type": "application/json" }, "body": "{\"status\":\"healthy\"}" } }, { // Step 1: the actual mutation. Body references {{genUserId}} from the PRIOR step's // extract — satisfies R9 (per-run dynamic var) and R2 (not same-step). This step's // own "extract" captures the SERVER's response value (JSONPath) so a later step can // chain to {{user_id}} — JSONPath captures on the same step ARE legal because they // resolve post-response and only matter for subsequent steps. "name": "create user", "method": "POST", "url": "/api/users", "headers": { "Content-Type": "application/json" }, "body": "{\"name\":\"{{genUserId}}\"}", "extract": { "user_id": "$.data.id" }, "assert": [ { "type": "status_code", "expected": "201" }, // assert a STATIC field of the response, not a dynamic one. R30 // forbids {{genUserId}} in assert.expected (the runtime would // re-evaluate the function at assertion time and the value // wouldn't match the body's earlier call). Pick something the // server always returns the same — here the literal "status" // field. To assert against the dynamic id the server minted, // capture it via extract (above) and reference {{user_id}} in // a LATER step's assertion or url, not this step's. { "type": "json_equal", "key": "$.status", "expected": "created" } ], "response": { "status": 201, "headers": { "Content-Type": "application/json" }, "body": "{\"data\":{\"id\":\"abc-123\",\"name\":\"u_1700000000_xyz\"},\"status\":\"created\"}" } } ] VALID assertion types (ONLY use these — anything else fails the step at runtime with "invalid assertion type"): * status_code — exact HTTP status match. {type, expected:"201"} * status_code_class — match by class 2xx/3xx/… {type, expected:"2xx"} * status_code_in — any of a set. DELETE STEPS ONLY (status_code_in-scope check). For POST/GET/PUT/PATCH this is rejected. If you reach for it to absorb a duplicate-key 500 on re-runs, the right fix is a JS-function {{var}} in the body (see `extract` rules below) so each run hits a fresh row and only 201 is ever returned. {type, expected:"200,201,204"} * header_equal — response header exact match. {type, key:"Content-Type", expected:"application/json"} * header_contains — header value substring. {type, key:"Location", expected:"/orders/"} * header_exists — header is present. {type, key:"X-Request-Id"} * header_matches — header regex. {type, key:"Etag", expected:"^W/\\\".+\\\"$"} * json_equal — response body JSON path text match (string-compares the value at the path; type-blind, so number 2 matches string "2" and ["pending"] matches "pending"). {type, key:"$.order.status", expected:"created"} * json_strict_equal — response body JSON path TYPE-STRICT deep-equal. Catches shape mutations json_equal misses: number↔string-of-number, bool↔string-of-bool, null↔"", scalar↔single-element array. `expected` MUST be a JSON-typed literal (NOT a quoted string) for non-string types: `expected: 2` (number), `expected: true` (bool), `expected: null`, `expected: ["a","b"]`, `expected: "hello"` (string). Use this when the test is meant to catch wire-shape regressions. {type, key:"$.order.amount", expected: 99.5} * json_contains — response body JSON path substring/partial. {type, key:"$.message", expected:"success"} * custom_functions — inline JS function: (request, response, variables, steps) => boolean. {type, expected:"function f(request,response){return response.status===201;}"} ADDITIONAL GRAPHQL-AWARE assertion types (use on suites whose body is a GraphQL request envelope — required by validator rules G40 / G42 to verify resolver-level success or error contracts that the bare status_code 200 would silently miss): * graphql_no_errors — pass when response.body.errors[] is absent or empty. REQUIRED on every happy-path GraphQL step (G40). {type} * graphql_has_error_code — pass when at least one errors[*].extensions.code equals expected. Use on expect_error:true steps. {type, expected:"EMAIL_EXISTS"} * graphql_error_path_eq — pass when at least one errors[*].path (joined with ".") equals expected. {type, expected:"createUser.email"} * graphql_error_message_contains — pass when at least one errors[*].message contains expected as substring. {type, expected:"already exists"} * graphql_data_not_null — pass when response.body.data is non-null. Inverse companion to graphql_no_errors. {type} DO NOT use any assertion type not in the bulleted lists above (or any earlier bulleted list of generic types for REST/shared use). The combined whitelist also includes schema and selected_fields (REST-parity types valid on GraphQL responses too). Anything outside the whitelist is rejected — there are no wildcards. These are types AIs commonly invent that DO NOT EXIST in keploy: ✗ json_type — there is no type-of check; assert against the literal value via json_equal, or use custom_functions with a typeof predicate. ✗ json_path — paths are passed via the "key" field of json_equal / json_contains; there is no separate path-only type. ✗ json_schema — no schema validation; closest is custom_functions with an inline schema check. ✗ json_array_length — no length-only assertion; capture .length via extract, or use custom_functions. ✗ header_starts / header_ends — only header_equal, header_contains, header_exists, header_matches (regex) exist. ✗ status_in / status_range — the real names are status_code_in / status_code_class. ✗ body / body_equal — no body-level type; assert against parsed paths via json_equal / json_contains, or use custom_functions. Anything not literally in the bulleted list above will get rejected by the validator — don't extrapolate from prefixes. `expected` values must be STRINGS (put numbers like 201 in quotes). `expected_string` is auto-populated; you can omit it. VARIABLES — purpose-first: a suite is a SCENARIO CHAIN; variables carry continuity between steps. Step N creates or fetches a resource → extracts its identity into a named var → step N+M uses `{{var}}` to reference that identity. If you find yourself extracting a value that NO LATER STEP references, DROP the extract — it's noise that hides which fields actually drive the scenario. Mechanically: extract values from one step's response with `extract: {varname: "$.path"}` (JSONPath). Reference later with `{{varname}}` in headers, body, url, or assertion `expected` values. STEP IDS & TRACKING HEADERS are auto-injected — don't provide them. The server assigns a UUID per step and adds X-Keploy-Test-Step-ID / X-Keploy-Test-Suite-ID / Keploy-Test-Name so the sandbox runner can correlate responses to steps. VARIABLE RULES (the runner follows these exactly — see pkg/service/atg/customFeatures.go ResolveCustomVariables): * Syntax: {{name}} — regex matched: {{(\w+)}} (letters, digits, and underscore only; NO whitespace, hyphens, or dots inside the braces). Names like {{gen-user}} or {{gen.user}} will NOT be substituted — use {{gen_user}} instead. * Substitution happens in: url, body, headers values, AND assertion "expected" values (so an assertion expecting {{genUserId}} gets the SAME resolved value the body used). * Resolution sources (looked up in this order): 1. The step's own `extract` map (seeded into the vars pool at step entry — pkg/service/atg/core.go:3698). 2. Variables produced by EARLIER steps' `extract` maps (post-response JSONPath captures). 3. App-level custom variables (stored on the app record, shared across all suites). THE `extract` FIELD IS THE ONLY AUTHORING SLOT — use it for BOTH static values and JS-function generators. `extract_variables` IS NOT AN AUTHORING SLOT. It's a post-run runtime SNAPSHOT — the runner writes resolved {{var}} values there after each step executes so the UI can show what landed at runtime. **The validator now HARD-REJECTS any step with `extract_variables` populated (extract_variables-input rejection).** If you see `extract_variables` while reading an existing suite via getTestSuite / download_recording / get_app_testing_context, IGNORE IT — that's the runtime's display state, not the suite's input. To author the equivalent, put every entry into `extract` instead (same keys, same values: JSONPath strings stay JSONPath, JS-function strings stay JS). TWO SHAPES THE `extract` FIELD ACCEPTS — pick the right one: (a) JSONPath capture — `"order_id": "$.order.id"` Evaluated against the step's recorded response.body after the request returns. The captured value is staged into vars for LATER steps to reference via {{order_id}}. Use this when the value you need is in the server's response. (b) Inline JS-function generator — `"genUserId": "function genUserId(){ return 'alice_' + Date.now() + '_' + Math.random().toString(36).slice(2,8); }"` The value string must contain the keyword `function` — that is how the runner (core.go:4107 isInlineJs branch) distinguishes JS from a JSONPath. Signature: function <name>(steps) { ... return '<string>'; } returning a string. The `steps` arg is a map of prior-step {request,response} snapshots; ignore it if unused. Use this for inputs that must be unique per run (user_ids, timestamps, uuids) so the suite stays idempotent on re-runs against the same DB. Examples: {"genUserId": "function genUserId() { return 'alice_' + Date.now() + '_' + Math.random().toString(36).slice(2,8); }"} {"genTs": "function genTs() { return String(Date.now() * 1e6 + Math.floor(Math.random()*1e6)); }"} {"genOrderId": "function genOrderId(steps) { return 'ord-' + Math.random().toString(36).slice(2,10); }"} Put the JS-function entry on the FIRST step that needs it (often a health-check step whose own body doesn't reference the var — that's fine, the seed fires on step entry regardless). Later steps reference `{{genUserId}}` in body/url/headers/assertion-expected and see the same resolved value within one run, a fresh value on the next run. WHY THIS MATTERS (the mistake to avoid): if you pin a static user_id like "alice_1776638347063146000" into `extract` AND your validation curl ALREADY inserted that row, the very next record/sandbox replay run will fire the same POST body, the producer (with deterministic ids or unique-constraint indexes) will reject the duplicate with a 500, and every downstream $.order.* / $.shard.* assertion will hit <missing>. Fix: use a JS-function entry so every run gets a fresh user_id. VARIABLE CHAINING (JS-function generator on step 1, JSONPath capture for step-2 chaining): step 1: body: {"user_id":"alice_{{genUserId}}","product_name":"Keyboard_{{genUserId}}"} extract: { "genUserId": "function genUserId(){ return Date.now() + '_' + Math.random().toString(36).slice(2,8); }", "order_id": "$.order.id" } assertions: [ {type:"status_code", expected:"201"}, {type:"json_equal", key:"$.order.user_id", expected:"alice_{{genUserId}}"} -- SAME resolved value as the body used ] step 2: url: /api/orders/{{order_id}} -- resolves from step 1's JSONPath extract at run time assertions: [ {type:"json_equal", key:"$.id", expected:"{{order_id}}"} ] PRELUDE PATTERN — when MULTIPLE POST steps each need their OWN per-run dynamic var: A common mistake is to put the JS-function generator on the SAME step that uses it in the request body. The declare-and-use-same-step check rejects this — same-step `extract` is post-response, so its values aren't in scope when the request fires. Pattern that works: declare the generator on an EARLIER step's `extract` (typically a cheap /health GET as a "prelude"). The runner seeds extract values at STEP ENTRY, so a generator on step 0 is in scope from step 0 onwards — every later POST can reference it. step 0 — prelude (the extract entry is what matters; the step itself can be anything cheap): method: GET, url: /health extract: { "uniq": "function uniq(){ return 'p_'+Date.now()+'_'+Math.random().toString(36).slice(2,8); }" } assertions: [ {type:"status_code", expected:"200"} ] step 1, 2, 3 — POSTs that all reference {{uniq}}: method: POST, url: /api/orders body: '{"user_id":"alice","product_name":"{{uniq}}",...}' // (no `extract` needed — the generator is in scope from step 0) The prelude itself doesn't need to USE the var; declaring it is enough. This is the right shape for "create N orders with different unique keys" — without the prelude, you'd hit the declare-and-use-same-step check on every POST that tries to declare-and-use a generator on the same step. VALIDATE-LOCALLY-BEFORE-INSERTING (CRITICAL for a usable suite): DO NOT call this tool with raw un-tested steps. EVERY step you send MUST have its "response" and "extract" fields populated from a live run. These are NOT optional. Without them: - The UI cannot render the step (shows an empty panel). - The rerecord runs blind and fails. - The step's {{variables}} won't resolve. Required per-step fields when calling this tool (in steps_json): • name, method, url, headers, assert — obviously • body — for POST/PUT/PATCH; MUST reference random inputs as {{varname}} placeholders, NOT inline timestamps. Inline timestamps get baked into the suite and collide on re-run. • extract — MUST contain a resolvable entry for every {{varname}} you reference in body/url/headers. JS-function entries are fine (they're the canonical "dynamic input" shape); JSONPath entries chain values from one step's response into later steps. Example: body has {"user_id":"alice_{{genUserId}}"} → step's extract must have {"genUserId":"function genUserId(){ ... }"}. • response — the raw captured response from your local curl. Shape: {"body":"<raw string>","status":201,"headers":{"Content-Type":"application/json",...}}. MANDATORY validate-locally flow (do this BEFORE calling create_test_suite): 1. Bring the dev's app up locally (Bash: docker compose up -d, or instruct the dev). Wait for /health readiness. 2. For EACH step in order (simulating what the runner will do): a. For dynamic inputs (user_id, timestamps, uuids): DON'T inline a value — write a JS function into the step's `extract` map, e.g. {"genUserId":"function genUserId(){return 'alice_'+Date.now()+'_'+Math.random().toString(36).slice(2,8);}"}, and reference it in body/url/headers/assertion-expected as {{genUserId}}. For the local curl, you still need a CONCRETE value for that run — so as you're building the step, JS-eval the function yourself (or just pick a value consistent with the function's output shape) to do ONE concrete local curl for capturing the response. That concrete value goes ONLY into the captured "response" body — NOT into `extract` (which keeps the JS function verbatim). b. Substitute {{name}} everywhere in the step's url/body/headers using accumulated variables (this step's extract + earlier steps' extract results). c. curl the SUBSTITUTED request against the live app. Capture the response. d. Check each "assert" against the captured response. If any fails → regenerate (different inputs / loosen assertion / change body shape) and retry this step. DO NOT move on with a failing step. e. Save the captured response into the step's "response" field as {"body":"<raw string>","status":<int>,"headers":{...}}. f. If the step has JSONPath entries in `extract`, evaluate each path against the response and note those values so later steps can use them in their {{var}} substitutions. 3. AFTER the first pass of all steps, run the WHOLE SEQUENCE a SECOND TIME against the same live app — no DB wipe in between. Because you're using JS-function generators, each run should pick fresh random inputs → no unique-constraint collision. If ANY step that passed the 1st run fails the 2nd run (common symptom: 500 "failed to save order" / "duplicate key" / "already exists"), the suite is NOT idempotent. Go back to step 2a and either (i) increase the entropy of the JS function, (ii) restructure the step to be READ-AFTER-WRITE instead of POST-then-POST, (iii) drop the step if it genuinely can't be made idempotent. 4. Only once every step passes BOTH validation runs → call create_test_suite. Pass each step's response + extract (with JS functions verbatim) through steps_json. If you call create_test_suite without response + extract on every step, you are creating a suite that is broken by construction. The UI + rerecord WILL fail. SELF-CONTAINED TESTS (required for repeat runs): * The suite will be re-run many times (local validation + record + sandbox replay + ad-hoc UI runs). It must not depend on prior state. * Put random inputs in `extract` JS functions with high entropy (timestamps, uuid). Plain "alice" will collide on re-run against producers with dedupe. * Prefer READ-AFTER-WRITE chaining: POST creates resource → extract id → GET uses id. Validates without depending on PRE-EXISTING ambient seed data (rows that "happen to exist" in the dev's DB). SEED → TESTED → CLEANUP roles within a suite: When the scenario is "user can read X" or "user can list X", you can't assert against ambient state — there's no guarantee the X exists when the suite runs. Pattern: step seed: POST /X (with dynamic {{var}} body) → extract id step tested: GET /X (or list) → assert against the seeded id step cleanup: DELETE /X/{{id}} → restore baseline Only the "tested" step is what the suite is FOR; the seed and cleanup steps are scaffolding so the test can run any number of times against any starting state. Without an explicit seed step, you're either testing nothing (empty list) or relying on pre-existing data the next replay won't have. DO NOT assert "the user has 3 orders" — that's ambient state. Seed N orders inside the suite first, then assert the count. Same applies to any "list / search / count" scenario: seed the data the test depends on, never assume it's there. ═══════════════════════════════════════════════════════════════════ SUBSTITUTION RULES — where {{var}} is and isn't allowed ═══════════════════════════════════════════════════════════════════ `extract` values come in two flavours and they substitute very differently: TYPE A — JS-function generator (`extract: { genUserId: "function genUserId(){return 'apple_'+Date.now()}" }`): The runtime STORES the source string and RE-RUNS the function on EVERY `{{genUserId}}` substitution site. Each call returns a fresh value. ONLY safe in POST / PUT / PATCH request BODIES — that's the one place you actually want a fresh value (uniqueness for inserts). TYPE B — JSONPath extract (`extract: { createdId: "$.order.user_id" }`): Evaluated ONCE against the step's recorded response, the resulting STRING is stored. Subsequent `{{createdId}}` substitutions resolve to that same fixed string. Safe everywhere — URLs, assertions, downstream bodies. ALLOWED placements for `{{generatorFn}}` (TYPE A): ✓ POST / PUT / PATCH request body — the canonical "give me a fresh value to insert" use case. FORBIDDEN placements for `{{generatorFn}}` (TYPE A) — the validator REJECTS these (generator-placement checks): ✗ `assert[*].expected` — the assertion's expected value will be a fresh function call, NOT the value the body sent. Static literals or `{{TYPE_B_extract}}` only. ✗ GET / DELETE / HEAD / PUT / PATCH URL (path or query) — those target an existing resource; the URL must encode the SAME id the creating POST used. Use a TYPE B extract from the creating step. ✗ Path/query of a downstream step's URL when the value should match what the upstream step inserted — same reason. CANONICAL PATTERN — read-after-write with a stable id: Step 0 (POST creates resource): body: {"user_id":"{{genUserId}}","name":"widget"} // TYPE A in body — fresh per run, OK extract: {"genUserId":"function genUserId(){...}", // TYPE A — body source "createdId":"$.order.user_id"} // TYPE B — capture server's stored id assert: [{type:"status_code", expected:"201"}, {type:"json_equal", key:"$.order.id", expected:"{{createdId}}"}] // TYPE B in assertion — stable Step 1 (GET reads it back): url: "/api/orders?user_id={{createdId}}" // TYPE B in GET URL — stable assert: [{type:"json_equal", key:"$.orders.0.user_id", expected:"{{createdId}}"}] // TYPE B — stable DO NOT do this (the validator will reject it): Step 0: body: {"user_id":"{{genUserId}}"} assert: [{type:"json_equal", key:"$.order.user_id", expected:"{{genUserId}}"}] // generator-placement check — TYPE A in assertion Step 1: url: "/api/orders?user_id={{genUserId}}" // generator-placement check — TYPE A in GET URL Name-collision check — do NOT pick an `extract` key that already exists on the app's appLevelCustomVariables. Use get_app_testing_context (or check the app payload) to enumerate them first; if a collision is unavoidable, scope-suffix your key (e.g. `genUserId_smokeTest`). Otherwise the runtime resolves the app-level variable first and silently shadows the suite's extract. You can also construct steps from data fetched via download_recording or get_app_testing_context, but the validate-locally-before-inserting rule still applies. CRITICAL — READING EXISTING SUITES: when the data you fetch via getTestSuite / get_app_testing_context / download_recording shows steps with `extract_variables` populated, that's the runtime's POST-EXECUTION SNAPSHOT (resolved values the runner wrote back for UI display). It is NOT what was authored. Treating that field as a copy-paste template makes the validator's extract_variables-input rejection reject every suite you produce. To replicate the authored behavior: copy every entry into `extract` instead, preserving keys and values verbatim (JS-function strings stay JS, JSONPath strings stay JSONPath). When in doubt: `extract_variables` is read-only output state; `extract` is input. ===== FROM-SCRATCH SCOPE RULE ===== When the dev asks to "generate / create / add / build keploy tests" without narrowing the scope, DEFAULT = ALL ENDPOINTS. Enumerate every non-trivial endpoint the app exposes (OpenAPI spec, router code, handler files) and author ONE suite per logical grouping — e.g. "user-crud", "auth-flow", "order-happy-path", "order-validation-errors". A single-endpoint app might produce one suite; a typical microservice produces 3-8. Groupings should be READ-AFTER-WRITE coherent (each suite's steps chain via extract variables rather than depending on outside state). TELL THE DEV up-front how many suites you're about to create and what each covers, then proceed — do NOT ask for confirmation mid-flow. If the dev explicitly narrows it ("just the happy path for orders", "only the auth flow"), honor that. ===== HARD RULE — NO DB-STATE-DEPENDENT STEPS ===== Do NOT include any step whose response body depends on total DB / queue / file-system state. Concretely: GET /items with no filter, GET /orders with no user_id filter, any "list-all" / "count" / "search" that returns more rows the longer the app has been running, any endpoint returning the CURRENT time / UUID / request-id. The auto-replay after record byte-compares the recorded response with what the live app returns under mocks, and non-deterministic bodies make gate 2 skip → the suite is never linked → sandbox replay fails with "no sandboxed tests". If you find yourself reasoning "this list might vary slightly but the mock should handle it" — STOP and drop the step. The suite should contain ONLY steps whose response body is fully determined by that step's own request: health checks, create-with-fresh-ids, read-back-by-id for the ids you just minted, validation-error 400s on bad payloads. Filtered reads using a freshly-extracted id are fine; unfiltered reads are not. ===== SERVER-SIDE IDEMPOTENCY ENFORCEMENT ===== This tool REJECTS (with a typed error, before creating anything) any suite whose mutating steps aren't idempotent on re-run. The rule: For each step with method ∈ {POST, PUT, PATCH} and a non-empty body, the body MUST contain at least one `{{name}}` placeholder that resolves to a per-run dynamic value. "Dynamic" means one of: (a) a JS-function entry in any step's `extract` map (e.g. {"genUserId": "function genUserId(){ return 'u_'+Date.now()+'_'+Math.random().toString(36).slice(2,8); }"}), OR (b) a JSONPath extract output from an EARLIER step (transitively dynamic if that step's own body was idempotent). If the endpoint is GENUINELY idempotent (e.g. POST /auth/refresh, PUT /tags/apply-same-input — repeat calls don't hit unique constraints) set `"idempotent": true` on the step to waive the check. Use this sparingly — the default is "assume it'll collide" because that's the common case. On rejection the error names the offending step and lists the dynamic variable names already in scope so you can see what's wireable. Fix the suite and retry — do NOT just flip `idempotent: true` to bypass. ===== MANDATORY OUTPUT — Phase 1 section ===== After all create_test_suite calls in a FROM-SCRATCH flow succeed, your final message to the dev MUST contain a section with this exact heading (do NOT collapse into prose; emit even for a single suite): ### Phase 1 — Inserted suites | Suite name | suite_id | Step count | | --- | --- | --- | | <name> | <suite_id> | <N> | One row per suite created in this flow. Next step after Phase 1 is record_sandbox_test (see its description for Phase 2). ===== HOW THIS TOOL ACTUALLY INSERTS THE SUITE ===== This tool DOES NOT POST the suite to api-server itself. It returns a "playbook" — a small array of shell steps for you (Claude) to walk via Bash. The playbook spawns the enterprise CLI `keploy create-test-suite` which: 1. Reads the suite JSON the playbook wrote to disk. 2. Runs every static structural check — exits 1 with violations on stdout if anything fails. 3. Fires the suite against the dev's local app TWICE (idempotency check) — exits 1 if the second run diverges from the first. 4. Runs dynamic checks (generator-dynamism + GET-coupling) — exits 1 with violation messages on failure. 5. POSTs the validated suite to api-server (HTTP 201 → success; HTTP 426 → CLI is older than api-server's rule set, dev needs to upgrade `keploy`). Walk the playbook in order. If step 2 (the CLI run) exits non-zero, surface its stdout to the dev — it lists the offending step / check / fix-it hint and includes a canonical step skeleton on structural failures. ITERATE LOCALLY: revise the JSON in your draft, REWRITE the same suite file via Bash, and RE-RUN step 2 directly. DO NOT call create_test_suite again per iteration — that mints a fresh playbook and a new nonce-path for no reason; the existing one is reusable. The CLI ALSO requires every step to have `response` and `extract` populated (step completeness check plus the validate-locally rules above), so the validate-locally curl flow described earlier is still required BEFORE calling this tool. PREREQUISITES the playbook assumes: * The dev's app is up and reachable at app_url. * `keploy` binary is on PATH. If missing, install before calling this tool: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. * Either ~/.keploy/cred.yaml exists (API key) or KEPLOY_API_KEY is exported. The CLI uses the API key for the api-server POST (different from the OAuth-JWT path the sandbox tools use).
    Connector