Skip to main content
Glama
167,444 tools. Last updated 2026-06-02 22:26

"Loop" matching MCP tools:

  • Replay the sandbox test for one or more suites against captured mocks — re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match" — keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. Output produces a SANDBOX RUN REPORT — it answers "does the suite still hold up against its captured baseline?". ═══════════════════════════════════════════════════════════════════ DISAMBIGUATION — pick this tool vs. replay_test_suite: ═══════════════════════════════════════════════════════════════════ USE replay_sandbox_test (THIS TOOL) when the dev says: * "run my sandbox tests" / "replay my sandbox tests" * "integration-test my app" / "run the integration tests" * "check if my mocks still match" / "replay against the captured mocks" * "rerun my sandbox suite" (with the word "sandbox") Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" — silent signal that the dev wants captured-mock replay, NOT live-app execution. USE replay_test_suite INSTEAD when the dev says: * "run the test suite" / "run my test suites" (bare — no "sandbox") * "execute test suite X" / "run suite 810d3ebe…" * "test the suite again" / "smoke test against the live app" Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP — no docker spin-up, no mocks. After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently — code-change regressions can hide under "mock didn't match" failures. ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. SCOPE — whole-app vs single-suite: * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. Use this for "run my sandbox tests" / "check if my tests still pass" — whole-app regression. New suites auto-pick up. * Single / subset: PASS suite_ids when the dev names specific suites — "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure. This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record. WHAT THIS TOOL DOES INTERNALLY (so you don't have to): 1. Resolves app_id — use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first. 2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error. 3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test. 4. Returns the headless playbook — walk it exactly: spawn CLI in background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step — the CLI exits on its own. ===== PREREQUISITES ===== (Same as record_sandbox_test — if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.) - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and — if ok=true — call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written. ===== MANDATORY OUTPUT — Phase 3 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous): ### Phase 3 — Sandbox run report Under it, emit the uniform three-subsection format owned by get_session_report: (i) per-suite table — one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps. (ii) failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows. (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups). Do NOT print aggregate step totals across suites — they mix unrelated suites and hide where damage actually is. ===== ROLLUP LINE ===== Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._" ===== DO NOT ===== * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH. * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow — CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Surface where the corpus DISAGREES with itself. When two or more independent sources signed different values for the same place + band + time, this returns that disagreement with a 0–1 severity score and citations to every disputed fact — instead of silently picking one value and hiding the conflict. The opposite of a confident single answer: it tells you when not to trust one. When to use: Call this when trust matters before you rely on a number — 'is there disagreement about X', 'do the sources corroborate this', 'audit this claim', or 'find contradictory observations in region Y'. Use it to decide whether a fact is well-corroborated or contested. Narrow with `cell_prefix` (e.g. "defi.zb5") for a region and `band` for one family; `min_severity` filters out trivial differences. Severity is per band kind: scalar = spread over the band's range, vector = 1 − mean cosine, categorical = 1 − mode share. The receipt cites every disputed CID — follow up with `emem_diff` to quantify a pair, or (with the refinement loop on) read the emitted `disagrees_with` edge via `emem_edges_recall`.
    Connector
  • Get the complete profile of a single Chinese apparel supplier by ID. PREREQUISITE: You MUST first call search_suppliers or recommend_suppliers to obtain a valid supplier_id. Do not guess IDs. USE WHEN user asks: - "tell me more about [supplier]" / "show full details for sup_XXX" - "what certifications does this factory hold" - "what's their monthly capacity / worker count / equipment list" - "can [supplier] export to US / EU / Japan / Korea" - "give me the full profile / dossier / fact sheet for [supplier]" - "how verified is this supplier's data" (returns coverage_pct + 8 dimensions) - "what's their ownership type — own factory or broker" - "show payment terms / lead time / sample turnaround for sup_XXX" - "这家供应商具体情况 / 详细资料 / 工厂档案" - "[供应商] 的合规 / 认证 / 出口资质" Returns 60+ fields including: monthly capacity (lab-verified), equipment list, certifications (BSCI/OEKO-TEX/GRS/SA8000), ownership type (own factory vs subcontractor vs broker), market access (US/EU/JP/KR), chemical compliance (ZDHC/MRSL), traceability depth, and verified_dimensions breakdown showing exactly which of the 8 dimensions (basic_info, geo_location, production, compliance, market_access, export, financial, contact) have data. WORKFLOW: search_suppliers → pick supplier_id → get_supplier_detail → optionally get_supplier_fabrics (fabric catalog) OR check_compliance (market export readiness) OR find_alternatives (backup pool) OR compare_suppliers (side-by-side evaluation). RETURNS: { data: { supplier_id, company_name_cn/en, type, province, city, product_types, worker_count, certifications, compliance_status, quality_score, verified_dimensions: { verified_dims: "5/8", coverage_pct, dimensions: {...} } } } EXAMPLES: • User: "Show me the full profile for sup_001" → get_supplier_detail({ supplier_id: "sup_001" }) • User: "What certifications does Texhong hold and can they export to EU?" → get_supplier_detail({ supplier_id: "sup_texhong_042" }) — then inspect certifications + eu_market_ready; follow with check_compliance for formal verification • User: "我要看 sup_123 的完整档案" → get_supplier_detail({ supplier_id: "sup_123" }) ERRORS & SELF-CORRECTION: • "Supplier not found" → the supplier_id is invalid or outside free-tier access. Re-run search_suppliers to obtain a fresh valid ID. Do not guess sequential IDs. • Field returns null → that dimension is unverified for this supplier. Check verified_dimensions.coverage_pct before asserting data. If coverage_pct < 50, warn the user: "This supplier's record has limited verified data (X/8 dimensions). Consider find_alternatives for better-documented options." • "not available for public access" → this supplier is in the reserve pool (paid tier only). Use search_suppliers filters data_confidence=verified to stay in public tier. • Rate limit 429 → wait 60 seconds; do not retry immediately. AVOID: Do not call this for multiple suppliers in a loop — use compare_suppliers with up to 10 IDs at once. Do not call to browse the database — use search_suppliers or get_province_distribution for discovery. NOTE: Source: MRC Data (meacheal.ai). Every numeric field shows both declared and lab-verified values where available. 中文:按 ID 获取单个供应商的完整档案(含维度覆盖率详情)。
    Connector
  • Subscribe to OctoData Premium API via x402 on Base. Returns step-by-step x402 payment instructions for any plan. After completing the EIP-3009 payment, the API returns an api_key immediately — no human in the loop. Free option also available. Plans: Micro — $0.01 USDC per call, no key needed, pay-per-request via x402 Trial — $5 USDC, 7 days, 10k req/day Annual — $29 USDC/year early bird (first 100 seats), $149/year after
    Connector
  • P75 — turn a Next Move suggestion into an approval-gated draft action. USE WHEN you've called chieflab_suggest_next_move and the suggestion's kind is not 'wait' or 'noop'. Creates an actionStore entry with status='awaiting_approval', the suggested draft body inline, and an executionMatrix that points at the right next-execution path. The reviewer sees the new card in the Launch Room / IDE chat like any other approval card — same approve / revise / reject flow. Closes the loop: launch → measure → next move → approve → execute → repeat.
    Connector
  • Creates participant invites for a perspective and returns 48-hour magic-link URLs, optionally sending invitation emails. Pass EITHER participants (creates new invites) OR invite_ids (reuses existing invites, minting a fresh 48h link) — never both. Behavior: - With participants: creates a new invite per participant (deduped by lowercased email *within the same call*; on duplicate emails, the LAST entry wins for both `name` and `context` — earlier entries are discarded). Calling again with the same email creates a separate invite record — there's no cross-call dedup. To re-issue a link for an existing participant without creating a new record, pass that participant's invite_id via invite_ids instead. - With invite_ids: reuses existing invites — no duplicates — but mints a new 48-hour link each call. Previously-issued links remain valid until they expire on their own. - Sends a real invitation email per participant when send_email=true. When send_email=false (default), no email is sent — distribute the URLs yourself. Errors with "Email sending is currently disabled." if email is turned off in this environment. - Errors when the perspective is not found or you do not have access. Errors with "This perspective is still in draft. Complete the outline before inviting participants." if the perspective has no outline yet. With invite_ids, errors with "Invite not found: <id>" (covers both malformed ids and ids that don't exist) or an access error per id. - Limits: 1–50 participants/ids per call ("Maximum 50 participants per call. Split into multiple calls."). participants and invite_ids are mutually exclusive. - context per participant (≤20 keys, ≤50-char keys, ≤2000-char values) is stored with the invite and passed to the perspective as trusted participant metadata. It cannot be changed after creation — create a new invite to update it. Ask the user whether they want to attach context before calling. When to use this tool: - Generating distributable conversation links for a list of participants. - Sending invitation emails directly (send_email=true with optional custom_message / custom_subject). - Re-issuing fresh links for previously-created invites (use invite_ids). When NOT to use this tool: - The perspective is still DRAFT — finish the design loop first (perspective_await_job until "ready", optionally perspective_update). - Public/anonymous links — use perspective_get_embed_options for share_url / embed snippets instead. - Internal smoke testing — use perspective_get_preview_link. Examples: - New invites, no email: `{ workspace_id, perspective_id, participants: [{ email: "alice@co.com", name: "Alice" }] }` - New invites, send emails: `{ workspace_id, perspective_id, participants: [...], send_email: true }` - Re-issue links for existing invites and email them: `{ workspace_id, perspective_id, invite_ids: ["abc123", "def456"], send_email: true }` - Re-issue links only (regenerate expired): `{ workspace_id, perspective_id, invite_ids: ["abc123"] }`
    Connector

Matching MCP Servers

  • F
    license
    -
    quality
    C
    maintenance
    DingDawg Loop Protocol (DDLP) — safe scheduled AI agents with governance gates. Every loop execution is verified, receipted, and fail-closed. MCP-native, works with CrewAI, LangGraph, Claude Code, Cursor.
    Last updated
  • F
    license
    A
    quality
    C
    maintenance
    An MCP server that automates the full software development lifecycle through an AI-driven TDD state machine. It handles everything from task decomposition and test-driven development to integration testing and automated pull request creation.
    Last updated
    4

Matching MCP Connectors

  • Cloudflare Workers MCP server: agent-loop-detector

  • Search and license 5,000+ fully cleared music tracks via MCP. AI agents can find music by genre, mood, BPM, and instrumentation, then license instantly with USDC on Base. No human in the loop. Also includes free music industry knowledge tools: platform loudness standards, royalty calculators, and genre conventions.

  • Returns the canonical guide for using TMV from a coding-agent context. Covers the fix-test-retest loop, how to write a good test prompt, how to read the actionTrail / consoleErrors / failedRequests outputs, and common gotchas. Call this first if you're a new agent on a project — it'll save you a debug session. The same content is served at https://testmyvibes.com/docs/coding-agents.
    Connector
  • Record mocks for V1 repo-mode API tests using the V1-native CLI command `keploy sandbox local record`. Runs the dev's app under the keploy eBPF agent, drives the V1 chained-CRUD tests from `keploy/api-tests/<resource>/test.yaml`, captures every outbound call (DB queries, Redis ops, downstream HTTP) as mocks, and lays them out at `<app_dir>/keploy/<suite-name>/{tests/, mocks.yaml, config.yaml}` in the standard OSS test-set tree. On success, mocks upload to the Keploy canonical pool by content hash; the hash lands in config.yaml so a teammate's later replay fetches the same bytes. CRITICAL — DO NOT CONFUSE WITH `keploy record sandbox`: * `keploy sandbox local record` (V1, repo-mode) ← this is what the playbook below uses * `keploy record sandbox` (legacy, cloud-mode) ← DO NOT call this for V1 The two are entirely different commands. Cloud-mode requires server-side suites (queried via --suite-ids) — V1 repo-mode reads tests from the local filesystem and never registers them in the cloud. If the dev is in repo storage mode (verify via devloop_resolve_storage's source=persisted, mode=repo), V1 is the ONLY correct sandbox path. STRICT — TIME-FREEZING DOES NOT APPLY TO RECORD. Recording MUST use the dev's regular (prod) Dockerfile or native binary. NEVER spawn the app via Dockerfile.keploy / "-f docker-compose.keploy.yml" / "-tags=faketime" build during record. The faketime binary writes wrong timestamps into captured mocks (it reads time from the offset file, not the wall clock) and the entire capture becomes corrupt — recovery requires re-recording from scratch with the prod binary. If a previous replay failed with expired-JWT and the dev wants to "fix" it, the fix is to re-RUN the replay with --freezeTime, NOT to re-record. The recorded mocks captured against the prod binary are exactly what replay's clock-rewind is designed to validate; touching the record path defeats the whole mechanism. ONLY call this with an explicit dev opt-in. The valid triggers: * Dev directly asks ("capture mocks", "sandbox record", "rerecord the users mocks"). * Post-resource menu (Step 5 of devloop_generate_resource_flow) — dev picks "Capture mocks so CI runs in seconds". * get_session_report shows mock_mismatch_dominant=true AND the dev says yes to your "rerecord?" prompt. Pre-conditions: * Dev's app must NOT already be running (keploy spawns its own copy of the app under the agent's eBPF hooks via the -c command). If a server is up at the target port, KILL IT first or the agent's network capture won't see the traffic. * Real downstream deps (MySQL, Redis, Kafka, etc.) MUST be running — the capture proxies through to them on first contact so the recorded mocks contain real responses. * The test YAML must exist at <app_dir>/keploy/api-tests/<resource>/test.yaml. Returns a playbook for `keploy sandbox local record` with the V1 flag surface: --test-dir, --app-url, -c (spawn command), --container-name (docker-compose only), --skip-mock-upload (offline), --skip-report-upload (offline). Mocks land per-suite at keploy/<suite-name>/. NDJSON progress at --progress-file for the standard tail-til-done loop.
    Connector
  • Pre-computed navigation recipes for public websites. CALL BEFORE any browser action on the open web (navigate, click, fetch, fill, URL guess) — replaces explore-and-discover. Returns `{ status, id?, shortcut?, ui_procedure?, verify_more?, error? }`. status=ok: execute exactly. `shortcut` first if present — fill each `{name}` in `template` with the value FROM YOUR TASK (using the parameter's `description`/`format` as the shape), URL-encode, navigate. Else `ui_procedure.steps` in order. The recipe is generic: parameter slots and step descriptions describe what to supply ('the destination'), not literal values — you provide the specifics. EXECUTE OPEN-LOOP: the recipe is authoritative, so run it without taking exploratory snapshots, clicks, scrolls, or screenshots to 'verify' or 'look around' — every extra browser action re-reads the entire page, which is the cost the recipe exists to avoid. Read the page ONLY at a `read` step, and only the part it names. Drop back into normal explore-the-DOM browsing ONLY when a step genuinely fails (its locator/page isn't what it describes); until then, trust the steps. If `verify_more: true`, do one cheap sanity check (page title plausible?) before committing. If `step.irreversible`, confirm with user. After, call `report_outcome` once with the returned `id`. status=site_not_supported | no_useful_data | synth_invalid: miss, no `id`. Browse manually. status=ambiguous_scope: retry with `scopeHint` set to one of `error.scope_options[].pattern`. Skip for: localhost / 127.0.0.1 / *.local / RFC1918 (10., 192.168., 172.16-31.); open-ended search with no destination. On 503 `embedder_unavailable`/`synth_unavailable`, retry once after Retry-After.
    Connector
  • Call this tool BEFORE your agent passes any user-provided content to an external API, LLM call, or third-party service. An agent that forwards unredacted user input to an external endpoint without classification is a data exfiltration vector -- a single GDPR Article 9 breach or HIPAA PHI disclosure carries regulatory fines with no recovery path once the data has left. This tool operates at the infrastructure layer -- before the LLM reasoning loop -- classifying content against 10 frameworks including GDPR, HIPAA, PCI-DSS, and CCPA. Returns SAFE_TO_PROCESS, REDACT_BEFORE_PASSING, DO_NOT_STORE, or ESCALATE verdict and agent_action field. One call replaces a full compliance review cycle. We do not log your query content. Free tier: 20 calls/month, no API key required.
    Connector
  • Returns the TunnelMind analyst config bundle. Configures any LLM (Claude, GPT, Gemini, local) to behave as a TunnelMind analyst that knows the data graph, follows the 5-call golden path, and surfaces attestation_tier on every claim. The bundle is signed inline (Ed25519, key_id from /.well-known/receipt-signing-key.json). Add `?receipt=true` to wrap the response in a Receipt v1.0 envelope for end-to-end audit. Use this tool when: - You want to configure a new LLM runtime to act as a TunnelMind analyst - You want to verify the system prompt you're running matches what TunnelMind serves - You're building a BYOM (bring-your-own-model) deployment and need the canonical config Do NOT use this tool when: - You want to call individual TunnelMind data tools — use the tools directly - You want to verify a specific receipt — use check_receipt_revoked or @tunnelmindai/receipt-verify Inputs (all optional): - `surface` (query): "data" (default, full surface), "scry", or "sigil" - `version` (query): pin a specific bundle version (e.g. "1.0.0" or "1" for latest 1.x.y) - `receipt` (query): "true" to wrap the response in a signed Receipt v1.0 envelope Content negotiation (via Accept header): - `application/json` (default) — full bundle JSON - `text/markdown` — system prompt only (Anthropic flavor) - `application/vnd.anthropic.config+json` — Anthropic-shaped subset - `application/vnd.openai.config+json` — OpenAI-shaped subset Returns: - `version`, `schema`, `issuer`, `surface`, `surface_label` - `system_prompts.{anthropic,openai,generic}` — three encodings of the same semantic prompt - `tools.surface_subset` — array of operationIds for this surface (null = all) - `response_format` — JSON Schema the analyst's verdicts must conform to - `attestation_tiers` — the 4-tier vocabulary (self_asserted → silicon_root) - `graph_state` — live corpus counts at serve time - `references` — URLs to the rest of the open-protocol layer - `bundle_signature` — inline Ed25519 signature for offline verification - `pin_recommended` — stable supply-chain identifier (survives hourly graph_state updates) Headers: `X-Bundle-Version`, `X-Pin-Recommended`, `ETag`, `X-RateLimit-*`. Cost: - Free, anonymous-accessible. Rate-limited on a SEPARATE counter from data-API calls (`cfg:ip:<ip>` identity) so a config refetch loop can't burn your data quota. Latency: - Typical <100ms (cached); cold fetch <500ms (live Supabase counts).
    Connector
  • USE WHEN the user has manually posted to a channel returned by chieflab_use_manual_fallback (Product Hunt / HN / Reddit / Discord / etc.) and wants to feed the live URL back to ChiefLab so the closed loop continues. Records the URL on the original publishAction (status flips from 'approved' to 'executed' with metadata.executedManually=true + metadata.publishedUrl), persists a proof_asset to the P9 company brain, and queues 24-hour metrics readback via chiefmo_post_launch_review. Without this tool, manually-posted channels are lost to ChiefLab's measurement loop.
    Connector
  • Fetch a fact by its content-address (CID). Returns the full signed Primary or Absence fact — the same body served by REST `/v1/facts/{cid}`. Closes the citation loop: any fact_cid surfaced by recall, materialize, attest, or verify can be re-resolved by another agent without REST. When to use: Call whenever you have a `fact_cid` (e.g. from `emem_recall`'s response, an `emem_attest` receipt, an `emem_materializers` outcome, or a citation in another agent's reply) and need the full fact body — its value, unit, sources, signer, signed_at, and derivation. Particularly useful for verifying that a citation a downstream agent gave you actually resolves on this responder. The response is byte-identical across responders for the same CID — the CID itself is the validator.
    Connector
  • Record (or refresh) the sandbox test for one or more existing test suites — captures the request/response per step plus the outbound mocks (DB, downstream HTTP, etc.) against the dev's locally-running app, then links the captures onto the suite. Use this when the dev says "record", "rerecord", "re-record", "refresh the recordings", "capture mocks", or as the RECORD step in FROM-SCRATCH (after create_test_suite). This tool resolves the app (if only a hint is given), resolves ONE OR MORE suites to record (by exact ids OR case-insensitive name substring match), and delegates to a headless playbook. Output produces a RERECORD REPORT — it answers "did the sandbox test get created and linked successfully?". ╔═══ PRE-CHECK — DID YOU ARRIVE HERE FROM A FAILED REPLAY? ═══╗ This tool refreshes the CAPTURED BASELINE (mocks + recorded request/response per step). It does NOT modify the suite's authored assert array or response.body — those are the contract as defined when the suite was created/updated. If the contract changed and you re-record without updating the suite first, the new rerecord fires the suite's stale assertions against the live app, gate-1-fails on the same diff, and the suite comes back unlinked. Before calling THIS tool in response to a failed replay_sandbox_test or replay_test_suite, walk these checks: 1. Read failed_steps[].authored_assertions and authored_response_body in the most recent get_session_report (kind=sandbox_run / test_suite_run). The fields are inlined — no second tool call needed unless the report predates the inlined fields. 2. For each failing step: does any authored assertion pin the diverging value? (e.g. assert {path: "$.order.status", expected: "created order"} where the diff says "expected 'created order', got 'created'".) * YES → call update_test_suite FIRST to update that assertion + the response.body field, THEN call this tool. * NO → safe to call this tool directly; the captured baseline drifts but no authored assertion blocks the rerecord. 3. If you can't find authored_assertions in the report (older format) AND don't already know the suite's shape, call getTestSuite({app_id, suite_id, branch_id}) to inspect the assert array before deciding. Don't guess. REFUSE-RULE: if the dev confirms a contract change is intentional and the failing step has a pinned authored assertion on the diverging value, you MUST run update_test_suite before this tool. Calling record_sandbox_test FIRST in that case is the bug this pre-check exists to prevent — don't justify it as "let's just refresh the baseline first". The order is update → record → replay; never record → update. ╚═══════════════════════════════════════════════════════════════╝ ===== BEFORE CALLING — one-time setup ===== (a) APP_ID RESOLUTION (skip if app_id is already known): * Derive a likely app name from the cwd's basename (e.g. cwd=/home/dev/orderflow → "orderflow"). Lowercase it. * Call listApps({q: "<cwd-basename>"}) — the server does a case-insensitive server-side substring match, so you don't paginate the full tenant list (can be hundreds of apps on shared accounts). * Exactly one match → use its id. Multiple → list them and ASK the dev which one (a wrong app_id silently routes traffic + suite creates into the wrong app). Exception: if the compose file / repo layout unambiguously pins one candidate (e.g. compose has service "producer" and one candidate is "<folder>.producer" while others are unrelated siblings), you may pick it AND tell the dev up-front so they can correct. * Zero matches → ASK permission to create a new Keploy app with the derived name; on yes, call createApp({name, endpoint}) and use the returned id. * Alternatively pass app_name_hint to THIS tool and the server resolves it (same rules; multiple/zero → typed error). (b) KEPLOY BINARY VERIFICATION: * Bash: "keploy --version" (or "~/.keploy/bin/keploy --version"). If it exits non-zero the binary is missing. * If missing OR older than this MCP server was built against, install/upgrade: curl --silent -O -L https://keploy.io/ent/install.sh && source install.sh * Re-verify with "keploy --version"; fail loudly if still absent (tell the dev where keploy put the binary so they can add it to PATH). ===== DOCKER-COMPOSE NETWORK RULE (absolute) ===== Use the SAME compose file + service that was used in the validate-curl phase. Do NOT point keploy at a second "keploy-only" compose file — docker-compose isolates each file into its own project + network, so the app container spawned by keploy cannot reach the DB/Kafka containers that validate brought up (and the network-name collision blocks keploy from starting). Correct flow: (i) Validate phase: "docker compose up -d" (brings up app + deps on network <project>_default). (ii) Before calling record_sandbox_test, Bash: "docker compose stop <app_service> && docker compose rm -f <app_service>" — stop ONLY the app service; leave deps running so keploy's new app container can reach them on the existing network. (iii) Pass app_command = "docker compose up <app_service>" (same compose file, same project → same network). container_name = the actual name set by compose (e.g. "orderflow-producer", not "producer"). ===== RESOLUTION RULES (server-side, no guessing) ===== 1. App: caller provides app_id OR app_name_hint. With a hint, the server does listApps({q: hint}). Zero matches → typed error; multiple → typed error listing them so Claude asks the dev. 2. Suites: DEFAULT IS "ALL LINKED". When the dev says "record my sandbox tests" / "rerecord everything" / "refresh my recordings" with no specific suite named, LEAVE BOTH suite_ids AND suite_name_hint UNSET. Do NOT list suites first and pass a comma-joined UUID list back — the CLI resolves "every linked suite for the app" itself, cleaner and less brittle. Only pass a narrower selector when the dev explicitly names suites: - suite_ids (comma-separated, exact) — when you already have the IDs. - suite_name_hint (case-insensitive substring match) — when the dev names suites by human phrasing like "the auth suite" or "deterministic". Every suite whose name contains the substring is recorded. If the dev asks to record suites that don't exist yet (zero match) → typed error. Any ≥1 match is fine. DO NOT prompt the dev for which suites to record — default to all linked if they didn't name any. ===== DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id ===== Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 exhaust without a hit, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. The standard pattern when "search the suite by id" returns nothing is NOT "give up and ask the dev which app" — it's "the suite exists on a BRANCH, walk discovery". Suites created via create_test_suite + rerecord on a Keploy branch are INVISIBLE to a main-view listTestSuites; you have to scope each call to a branch. After resolving once in a session, REUSE the {app_id, branch_id} for any subsequent suite-targeted call (replay_sandbox_test, update_test_suite, replay_test_suite); don't re-walk discovery for every action. ===== PREREQUISITES ===== - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== The response includes a "playbook" array; execute its steps in order. The flow is HEADLESS — one background process, NDJSON progress events on a local file, no separate HTTP surface to bind. THERE IS NO SEPARATE CLEANUP STEP — the CLI exits on its own once phase=done is written. 1. Spawn the `keploy record sandbox --cloud-app-id …` process via Bash (run_in_background). Capture its PID into $KEPLOY_PID. 2. Poll progress by repeatedly calling Bash with `tail -n 1 $PROGRESS_FILE`. Each call returns instantly; the MCP round-trip between calls paces the loop. DO NOT wrap in a sleep loop — Claude Code's Bash rejects standalone `sleep N` and chained-sleep patterns. Read .phase off each line; stop when phase=done. The wait_for_done step's built-in `kill -0 $KEPLOY_PID` check is the safety-net for silent early-exit (CLI died before writing the terminal event) — it lets the loop exit instead of spinning forever on a dead process. 3. Read the terminal event (last line of $PROGRESS_FILE). It carries data.ok, data.error (on failure), data.test_run_id (on success). 4. On data.ok=true: call get_session_report(app_id, test_run_id) with verbose=true to surface the rerecord report. On data.ok=false: show data.error to the dev directly (optionally tail the log_file for stderr context) and SKIP get_session_report (there's no run to fetch). Auto-replay + linkTestSetToSuite run INSIDE the CLI process before it writes phase=done — if the terminal event says ok=true, linkage already happened. You do NOT need to wait for a separate post-success window; the CLI doesn't exit until it's fully done. INTERRUPTED FLOWS: if your conversation dies between step 1 and step 2 (Claude crashes, connection drops, dev cancels), the CLI keeps running in the background. It's not orphaned — it'll finish its run and write phase=done. To abort early, the dev can `pkill -f "keploy.*sandbox"` manually; otherwise just let it complete and resume by re-reading the progress file on the next turn. ===== NDJSON SCHEMA — the contract ===== Every line in the progress_file is one JSON object with this envelope: { "ts": "<RFC3339-nano>", "command": "record" | "test", "phase": "<phase-name>", "message": "<optional human-readable>", "data": { ... phase-specific ... } // optional } The phase vocabulary is intentionally extensible — new lifecycle phases get added over time as the CLI grows (started, agent_up, app_starting, suites_running_start, record_done, auto_replay_skipped, upload_done, linking_done, etc.). There are only TWO phases the AI must handle programmatically; everything else is informational and you should NOT switch on phase names you don't recognize: * phase != "done" → keep polling. Optional: surface message/data to the dev as ambient progress ("agent is starting...", "suites uploading..."), but never branch on a specific intermediate phase name. * phase == "done" → terminal event. Stop polling. The data envelope carries: - data.ok bool true on success, false on failure - data.error string (only on ok=false) one-line failure summary - data.test_run_id string (only on ok=true) pass to get_session_report - data.app_id string echo of the app_id passed to the tool - data.artifact_dir string local path to captured/replayed artifacts - data.dashboard_url string UI link to drill into the run If you observe a phase you don't recognize, IGNORE it and keep polling. If "done" itself is renamed by a future CLI version, the wait_for_done step's PID-alive guard is your safety net (the poll loop exits when the CLI dies); surface log_file contents to the dev. ===== "ALL SUITES FAILED CAPTURE" — special signal ===== If you see a `phase: "auto_replay_skipped"` event with `message: "all suites failed during rerecord; skipping replay + linking"` ahead of the terminal `done` event, every suite failed at the CAPTURE phase (before auto-replay even ran). The CLI fails closed in this case — auto-replay and suite linking are SKIPPED, so every per_suite entry comes back linked=false. Watch for this trap: the terminal `data.ok=true` because the CLI itself completed cleanly (it didn't crash; it just had nothing to record successfully). DO NOT read data.ok=true as "rerecord succeeded" — read `<linked>/<total>`. If linked == 0, this is a HARD failure that needs diagnosis, not a partial-linkage case. ALWAYS surface the dashboard URL on this case. The terminal `done` event still carries `data.dashboard_url` and `data.test_run_id` (atg's TestSuiteRun was created during the capture phase); emit them verbatim so the dev can drill into per-step failures in the UI: "0/N suites have a sandbox test — every suite failed during the capture phase, so auto-replay and linking were skipped. Dashboard: <data.dashboard_url> (test_run_id=<id>)" EDGE CASE: if `data.test_run_id` is empty, atg never inserted a TestSuiteRun (typically a pre-flight validation failure — branch-id rejection, app unreachable, etc.). The dashboard URL won't resolve. Skip the URL, surface the log_file contents instead so the dev can read the early-stage failure. Recovery is the same as WHEN linked=false below — read failed_steps for each suite and pick route B (fix code) / C (update suite + record again) / SKIP. Don't infra-retry; capture-phase failures across every suite usually mean the app is broken, the suite shapes are stale, or the dev's local app isn't reachable. ===== LINKAGE VERIFICATION ===== After get_session_report returns, for EVERY suite that went into this record, call getTestSuite({suite_id}) and check whether the suite has a sandbox test (linked=true / non-empty test_set_id). A suite without a sandbox test cannot be replayed — replay_sandbox_test will 400 on it with "no sandboxed tests" until a successful record produces one. ===== WHEN linked=false — recovery rules ===== A suite with linked=false after record_sandbox_test means the record process couldn't produce a sandbox test for that suite. The SUITE ITSELF still exists; it just has no sandbox test. Diagnose WHY by reading the rerecord report's failed_steps for that suite: * No failed_steps OR pure infra error (link-commit / upload failed, no step diverged) → call record_sandbox_test AGAIN scoped to just the unlinked suite_ids. The tool is idempotent on the suite; safe to re-run. * failed_steps with assertion diffs (response shape, body fields, status code shifted from what the suite expected) → the suite is stale relative to current app behavior. The CONTRACT changed: - Change is INTENTIONAL (new field, renamed key, different status code is the new normal) → call update_test_suite to update the affected step's response / assertions to match the new contract, THEN call record_sandbox_test on the updated suite. - Change is UNINTENTIONAL (app regressed) → fix the app code first, then call record_sandbox_test. No suite update needed; the original test was correct. * failed_steps with 500s / handler crashes / connection refused → the app is broken at the wire level. Fix the app, then call record_sandbox_test. Don't update_test_suite to absorb a real failure. NEVER: * Don't call create_test_suite to "redo" the suite — it already exists; re-creating authors a duplicate (see BEFORE CREATING in create_test_suite). * Don't blindly loop record_sandbox_test without diagnosing failed_steps first; if the cause is suite-vs-app mismatch, retries won't help. ===== MANDATORY OUTPUT — Phase 2 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT collapse into a single pass/fail table with the rerecord report; do NOT merge with Phase 1 or Phase 3): ### Phase 2 — Sandbox-test linkage **<linked>/<total> suites have a sandbox test** _Suites with a sandbox test_ | Suite name | suite_id | test_set_id | Capture pass/total | | --- | --- | --- | --- | | <name> | <suite_id> | <test_set_id> | <p>/<t> | (emit even if zero — one row per linked suite, or "_(none)_" in place of rows) _Suites without a sandbox test_ (omit ONLY if every suite linked) | Suite name | suite_id | Likely cause | | --- | --- | --- | | <name> | <suite_id> | gate1 / gate2 / infra | Likely-cause decoding: assertion diffs → gate 1 upstream-replay failure; upstream-passing + mock-replay-diff → gate 2 mock-determinism mismatch; zero failures + still unlinked → infra link-commit issue. Then proceed to replay_sandbox_test ONLY for the suites that DID link; the unlinked ones will 400 on replay. ===== DO NOT ===== * DO NOT fall back to raw keploy CLI (`keploy rerecord -t …`) if the MCP tool drops mid-flow — the CLI subcommand runs test-sets directly and does NOT update the suite's test_set_id. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
  • Delete a single item by id. `kind` MUST match the item type: 'text' for text nodes, 'line' for freehand strokes, 'image' for images — the wrong kind silently targets the wrong table and is a common mistake. Get the id + type from `get_board` (texts[], lines[], images[]). There is no bulk/erase-all tool: loop if you need to delete multiple items.
    Connector
  • Place a $fomox402 bid on a game round. Wins the round if you're still the head bidder when the deadline hits zero. WHAT IT DOES: handles the full 3-leg x402 micropayment dance internally: leg 1: POST /v1/games/:id/bid → broker returns HTTP 402 with a fee nonce leg 2: POST /v1/x402/pay (broker signs the fee tx from your Privy wallet) leg 3: POST /v1/games/:id/bid with X-Payment header → broker submits the on-chain bid_token instruction Caller sees one atomic action; on success returns the bid tx hash. WHEN TO USE: any time you want to be the head bidder. Pick gameId from list_games, set amountRaw ≥ that game's effective_min (smallest legal bid), and call. FEES: ~0.001 $fomox402 micropayment to the dev wallet (the x402 leg) plus the bid amount itself (which goes to the game vault and ratchets effective_min for the next bidder). Solana network fees ~0.00001 SOL/tx. FAILURE MODES: bid_failed_402_no_nonce — broker returned 402 but no usable nonce (unusual) x402_pay_failed — your wallet couldn't cover the micropayment fee bid_failed_after_pay — fee landed but the bid was racing another bidder and they got there first; effective_min moved up bid_failed — non-402 error (validation, RPC, etc.) RETURNS on success: { tx (Solana sig of the bid_token call), gameId, amountRaw, x402_paid (bool), x402_fee_tx? (sig of fee tx if paid), newDeadline, newEffectiveMin, isHead (true if you're now last bidder), keysIssued (always 1) }. MINTS 1 KEY: every successful bid mints you one key on the round. Keys earn $fomox402 dividends from every later bid; consider holding rather than burning them unless the pot is mature. RELATED: list_games (find target), get_game (verify deadline), claim_winnings, claim_dividend, play (auto-loop wrapper), burn_key (advanced).
    Connector
  • Long-poll for a new DM. Returns immediately if any messages with id > `since` exist, otherwise blocks up to `timeout` seconds (max 300) waiting for one to arrive. Works for free + paid identities. Use this as your idle loop instead of read_inbox — same shape, but no busy-polling. Standard pattern: pass `next_since` from the previous call as `since`.
    Connector
  • Give any agent eyes. Pass any public URL → get back a structured intelligence report: page title, meta tags, all headings (H1–H6), full body text, every form mapped with fields and input types, all links, images, and pattern detection (prices, emails, dates). Anomaly flags included: JS-heavy SPA, Cloudflare challenge, CAPTCHA, access restrictions. One tool call turns a blind agent into one that can observe anything on the internet. No Playwright config. No browser infra to spin up. x711 is the browser — agent never touches it. Returns: { title, meta, headings, body_text, links, forms, images, detected: {prices, emails, dates}, anomalies, note }. Cost: $0.03. Pair with x711_agent_act to complete the full browser loop.
    Connector
  • SECOND STEP in the troubleshooting workflow. Read the full content and solution of a specific Knowledge Base card. Returns the card content WITH reliability metrics and related cards so you can assess trustworthiness and explore connected issues. WHEN TO USE: - Call this ONLY after obtaining a valid `kb_id` from the `resolve_kb_id` tool. INPUT: - `kb_id`: The exact ID of the card (e.g., 'CROSS_DOCKER_001'). OUTPUT: - Returns reliability metrics followed by the full Markdown content of the card, plus related cards. - You MUST apply the solution provided in the card to resolve the user's issue. - After applying, you MUST call `save_kb_card` with `outcome` parameter to close the feedback loop.
    Connector
  • Send a message to another agent on the channel you joined, or to 'all' to broadcast. Requires a prior join() in this session. The 'to' field accepts: a callsign ('front'), an index ('#1' or '1') from roster(), or 'all'. If omitted, defaults to 'all' (broadcast — walkie-talkie default). Optional `priority` tags urgency (min|low|default|high|urgent). Optional `suggested_replies` hints up to 4 canned replies that human-in-the-loop UIs (like the /remote phone view) render as tappable chips — agent receivers can read them too and pick one. Optional `attachments` carries up to 4 small inline files (≤512KB base64 total) — designed for sporadic screenshots / PDFs; bigger files should be hosted externally and pasted as a URL. Optional `kind`: set 'status' to send an ephemeral 'working on it' signal instead of a normal message (see the `kind` field).
    Connector