149,938 tools. Last updated 2026-05-28 03:02
"UPS" matching MCP tools:
- Read-only. Returns your current APIHub credit balance (in microdollars and USD), total lifetime spending (microdollars and USD), and total completed request count. Requires a valid API key. Use before apihub_call or apihub_call_external to confirm sufficient funds for a paid request, or periodically to audit usage. Does not modify state, send payments, or call upstream APIs; for top-ups use apihub_topup.
- Replay the sandbox test for one or more suites against captured mocks — re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match" — keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. The output is a SANDBOX RUN REPORT — it answers "does the suite still hold up against its captured baseline?". ═══════════════════════════════════════════════════════════════════ DISAMBIGUATION — pick this tool vs. replay_test_suite: ═══════════════════════════════════════════════════════════════════ USE replay_sandbox_test (THIS TOOL) when the dev says: * "run my sandbox tests" / "replay my sandbox tests" * "integration-test my app" / "run the integration tests" * "check if my mocks still match" / "replay against the captured mocks" * "rerun my sandbox suite" (with the word "sandbox") Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" is the signal that the dev wants captured-mock replay, NOT live-app execution. USE replay_test_suite INSTEAD when the dev says: * "run the test suite" / "run my test suites" (bare — no "sandbox") * "execute test suite X" / "run suite 810d3ebe…" * "test the suite again" / "smoke test against the live app" Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP — no docker spin-up, no mocks. After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). 
After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently — code-change regressions can hide under "mock didn't match" failures. ═══════════════════════════════════════════════════════════════════ DISCOVERY — when the dev hands you a bare suite_id with no app_id / branch_id: ═══════════════════════════════════════════════════════════════════ Suites live on an (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order — STOP as soon as getTestSuite returns 200: 1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name. 2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate through steps 3–4. 3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next. 4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next. 5. If steps 2–4 find nothing, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair. After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action. SCOPE — whole-app vs single-suite: * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. 
Use this for "run my sandbox tests" / "check if my tests still pass" — whole-app regression. New suites are picked up automatically. * Single / subset: PASS suite_ids when the dev names specific suites — "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates that each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure. This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record. WHAT THIS TOOL DOES INTERNALLY (so you don't have to): 1. Resolves app_id — use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first. 2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error. 3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test. 4. Returns the headless playbook — walk it exactly: spawn the CLI in the background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step — the CLI exits on its own. ===== PREREQUISITES ===== (Same as record_sandbox_test — if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.) - app_command: shell command that starts the dev's app (e.g. "docker compose up producer"). - app_url: base URL the app listens on, e.g. http://localhost:8080. - app_dir: absolute path to repo root. - container_name if app_command is docker-compose. - keploy binary on PATH. 
If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`. ===== AFTER CALLING — walk the playbook ===== Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and — if ok=true — call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written. ===== MANDATORY OUTPUT — Phase 3 section ===== Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous): ### Phase 3 — Sandbox run report Under it, emit the uniform three-subsection format owned by get_session_report: (i) per-suite table — one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps. (ii) failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows. (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups). Do NOT print aggregate step totals across suites — they mix unrelated suites and hide where damage actually is. ===== ROLLUP LINE ===== Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. 
Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._" ===== DO NOT ===== * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH. * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow — CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
- Return the sites owned by the currently authenticated user, with their display name and domain so the assistant can match user references like "the production site" or "revenuescope.jp" without making the user copy a UUID. Requires an OAuth-authenticated request (Claude.ai Custom Integration provides this). For unauthenticated callers, use list_demo_sites instead. The site flagged is_primary=true is what get_site_summary / get_channel_breakdown / suggest_budget_allocation default to when site_id is omitted.
- Probe one or more LLMs for what they know about a business / brand / product / topic and score visibility (0-100) per model. Default model is Workers AI Llama-3.3-70b (free); pass `_apiKey` to also probe Anthropic (BYO key — you pay Anthropic directly for those calls). Returns per-model {score, confidence, signals, raw_response} + a combined view. Useful for AI-marketing audits, pre-launch brand checks, competitive monitoring.
- Set which of your sites is primary — the one analytics tools default to when site_id is omitted. Use this when list_my_sites returns multiple sites and you want a different default. Requires OAuth authentication; for unauthenticated demo callers, this is a no-op (use list_demo_sites instead).
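The APIHub balance tool at the top of this list returns amounts in microdollars as well as USD. Below is a minimal sketch of the arithmetic. It assumes the usual meaning of the prefix, 1 USD = 1,000,000 microdollars. The helper names and the per-request price used in the comparison are hypothetical, because this listing does not show the tool's response schema.

```python
from decimal import Decimal

# Assumption: "micro" denotes one millionth, so 1 USD = 1_000_000 microdollars.
MICRODOLLARS_PER_USD = 1_000_000


def microdollars_to_usd(amount: int) -> Decimal:
    """Convert an integer microdollar amount to an exact USD value."""
    return Decimal(amount) / MICRODOLLARS_PER_USD


def has_sufficient_funds(balance_microdollars: int, price_microdollars: int) -> bool:
    """Hypothetical pre-flight check before a paid apihub_call."""
    return balance_microdollars >= price_microdollars


if __name__ == "__main__":
    balance = 2_500_000  # hypothetical balance reported by the tool
    print(microdollars_to_usd(balance))  # prints 2.5
    print(has_sufficient_funds(balance, 3_000_000))  # prints False
```

Storing balances as integer microdollars avoids floating-point rounding. Converting to `Decimal` only for display keeps the USD figure exact.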
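The replay_sandbox_test disambiguation rules come down to a keyword check plus a look at the previous tool call. Below is a rough sketch of that routing. It assumes the request text and the previous tool's name are available as plain strings. `pick_replay_tool` is a hypothetical helper and is not part of Keploy or the MCP server. When neither a keyword nor the previous turn settles the choice, it returns "ask", which follows the rule against picking sandbox replay silently.

```python
from typing import Optional

# Keywords the description maps to captured-mock replay (matched case-insensitively).
SANDBOX_KEYWORDS = ("sandbox", "replay", "mock", "integration")


def pick_replay_tool(request: str, previous_tool: Optional[str] = None) -> str:
    """Return "replay_sandbox_test", "replay_test_suite", or "ask"."""
    text = request.lower()
    if any(keyword in text for keyword in SANDBOX_KEYWORDS):
        return "replay_sandbox_test"
    # Bare verb ("run / test / execute the suite"): use the prior turn as context.
    if previous_tool == "record_sandbox_test":
        return "replay_sandbox_test"
    if previous_tool in ("create_test_suite", "update_test_suite"):
        return "replay_test_suite"
    # Intent still unclear: ask instead of silently choosing sandbox replay.
    return "ask"
```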
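The DISCOVERY walk for a bare suite_id can be sketched as a search over candidate apps and branches. The sketch assumes steps 1 and 2 have already produced the local git branch name and the candidate app ids. `list_branches` and `suite_status` are hypothetical callables that stand in for the list_branches and getTestSuite tools. The branch dict shape (`id`, `name`, `open`) is also an assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical stand-ins for the MCP tools:
#   list_branches(app_id) -> [{"id": ..., "name": ..., "open": bool}, ...]
#   suite_status(app_id, suite_id, branch_id) -> HTTP status of getTestSuite
#     (branch_id=None means "main", i.e. branch_id omitted)
ListBranches = Callable[[str], List[dict]]
SuiteStatus = Callable[[str, str, Optional[str]], int]


def resolve_suite(
    suite_id: str,
    git_branch: str,
    app_ids: Iterable[str],
    list_branches: ListBranches,
    suite_status: SuiteStatus,
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (app_id, branch_id) holding suite_id, or None to ASK the dev."""
    apps = list(app_ids)
    # Steps 3-4: the branch whose name matches the local git branch.
    for app_id in apps:
        for branch in list_branches(app_id):
            if branch["name"] == git_branch and suite_status(app_id, suite_id, branch["id"]) == 200:
                return app_id, branch["id"]
    # Step 5a: every open branch on each candidate app.
    for app_id in apps:
        for branch in list_branches(app_id):
            if branch.get("open") and suite_status(app_id, suite_id, branch["id"]) == 200:
                return app_id, branch["id"]
    # Step 5b: main, with branch_id omitted.
    for app_id in apps:
        if suite_status(app_id, suite_id, None) == 200:
            return app_id, None
    return None
```

Passing the tool calls in as functions keeps the search order testable with stubs. A `None` result is the cue to ASK the dev for the {app_id, branch_id} pair.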
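Internal steps 2 and 3 of replay_sandbox_test amount to a filter plus a membership check. Step 2 keeps only suites with a non-empty test_set_id. Step 3 validates any requested suite_ids against that set. Below is a minimal sketch using a hypothetical `Suite` record. Unlike the server, it reports "not a suite" and "suite without a sandbox test" with the same error.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Suite:
    suite_id: str
    name: str
    test_set_id: Optional[str]  # empty or None: no sandbox test recorded yet


class NotLinkedError(ValueError):
    """A requested suite has no sandbox test; record_sandbox_test first."""


def select_suites(suites: Sequence[Suite], suite_ids: Optional[Sequence[str]] = None) -> List[Suite]:
    # Keep only suites that have a sandbox test (non-empty test_set_id).
    linked = [s for s in suites if s.test_set_id]
    if not linked:
        raise NotLinkedError("no linked sandbox tests for this app")
    # Default scope: suite_ids unset means every linked suite (whole-app regression).
    if not suite_ids:
        return linked
    by_id = {s.suite_id: s for s in linked}
    unlinked = [sid for sid in suite_ids if sid not in by_id]
    if unlinked:
        raise NotLinkedError("record_sandbox_test first for: " + ", ".join(unlinked))
    return [by_id[sid] for sid in suite_ids]
```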
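The replay playbook's polling step reads the last NDJSON line of the progress file, guards with `kill -0`, and stops at phase=done. In Python it might look like the sketch below, which rests on a few assumptions:

- The event shape `{"phase": ..., "data": {"ok": ..., "test_run_id": ...}}` follows the description.
- The function names are hypothetical.
- The `os.kill(pid, 0)` check is POSIX-only.

Each call checks once rather than sleeping, which matches the no-sleep-loop rule.

```python
import json
import os
from typing import Optional, Tuple


def read_last_event(progress_file: str) -> Optional[dict]:
    """Rough equivalent of `tail -n 1 $PROGRESS_FILE`, parsed as NDJSON."""
    try:
        with open(progress_file, encoding="utf-8") as f:
            lines = [line for line in f.read().splitlines() if line.strip()]
    except FileNotFoundError:
        return None
    if not lines:
        return None
    try:
        return json.loads(lines[-1])
    except json.JSONDecodeError:
        return None  # the last line may still be mid-write


def pid_alive(pid: int) -> bool:
    """Like `kill -0 $KEPLOY_PID` (POSIX only): signal 0 just tests existence."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def terminal_result(event: Optional[dict]) -> Optional[Tuple[bool, Optional[str]]]:
    """(data.ok, data.test_run_id) once phase == "done", otherwise None."""
    if not event or event.get("phase") != "done":
        return None
    data = event.get("data") or {}
    return bool(data.get("ok")), data.get("test_run_id")


def check_once(progress_file: str, pid: int) -> Optional[Tuple[bool, Optional[str]]]:
    """One poll: a result when done, None while running, an error if the CLI died."""
    result = terminal_result(read_last_event(progress_file))
    if result is None and not pid_alive(pid):
        # Re-read once: the CLI may have written phase=done just before exiting.
        result = terminal_result(read_last_event(progress_file))
        if result is None:
            raise RuntimeError("keploy exited without writing phase=done")
    return result
```

When `check_once` returns `(True, test_run_id)`, the next call is get_session_report with that id and verbose=true.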
Matching MCP Servers
- Grades: license A, quality A, maintenance C. Enables AI agents to create shipments, track packages, get rates, validate addresses, schedule pickups, and find UPS locations via the UPS API. License: MIT.
- Grades: license A, quality not graded, maintenance D. Enables AI agents to integrate with UPS shipping and logistics capabilities, including package tracking with delivery status and transit information, and address validation for U.S. and Puerto Rico locations. License: MIT.
Matching MCP Connectors
Ask AI for verified Japan EC RPS benchmarks (5 industries, growing). For non-analytics users.
Secureship MCP gives AI assistants access to a multi-carrier shipping API covering rate comparison, label generation, package tracking, pickup scheduling, address book management, shipment history, customs documents, and more — across carriers like UPS, FedEx, Purolator, Canpar, and others. Browse 150+ live endpoint schemas, parameters, and auth details — always current, never stale.
- Return sessions/revenue/RPS for the given period plus the Path A/B recommendation. site_id is OPTIONAL when the request is OAuth-authenticated — the server falls back to the user's primary site. Default period is the last 30 days; pass period='today' / '7d' / '90d' or a raw day count (1-365) to override. Call this first when a user asks 'how is my site doing?' — it tells the LLM whether ad spend data is available (Path B) or not (Path A).
- Return per-channel sessions/revenue/RPS for the given period, plus spend/ROAS/saturation when ad spend is connected (Path B). site_id is OPTIONAL when the request is OAuth-authenticated. Default period is the last 30 days; pass period='today' / '7d' / '90d' or a raw day count (1-365) to override. LLMs should call this after get_site_summary to populate the channel table. Channels without ad spend (organic_search, direct, …) keep the spend-side fields null.
- Return the operator-curated public demo site_id(s) for this MCP server. Call this FIRST when a user asks an analytics question without supplying a site_id — use the returned site_id as input to the other tools and mention in your reply which demo site you analyzed.
- Look up the canonical/official identifier for a company or drug. Use when a user mentions a name and you need the CIK (for SEC), ticker (for stock data), RxCUI (for FDA), or LEI — the ID systems that other tools require as input. Examples: "Apple" → AAPL / CIK 0000320193, "Ozempic" → RxCUI 1991306 + ingredient + brand. Returns IDs plus pipeworx:// citation URIs. Use this BEFORE calling other tools that need official identifiers. Replaces 2–3 lookup calls.
- Get the medical intake questionnaire for the chosen medication(s). The questionnaire is product-aware: GLP-1 / weight-loss medications return weight-loss goals, GLP-1 history, and MTC/MEN2 screening; NAD+ and other longevity peptides return energy/sleep/stress/cognitive/delivery-method questions instead. If the patient wants more than one medication, pass the additional slugs in `additional_medications` — the server returns the UNION of section sets deduped by section key, so you ask each shared question exactly once. ## How to present this to the patient 1. PROGRESSIVE DISCLOSURE: walk through ONE section at a time. Wait for the patient's reply before moving to the next section. Do not paste the whole questionnaire in a single message. 2. HONOR CONDITIONALS: each section and each question may carry a `conditional_on` predicate (e.g. `{sex_assigned_at_birth: Female}` on the Pregnancy section). SKIP any section/question whose predicate isn't satisfied. Don't ask males about pregnancy or perimenopause. 3. QUIZ FORMAT: present every `select` / `multi_select` question as a short pick-list using the `options` array verbatim. The patient should be able to reply with a single choice, not a sentence. Reserve free text for `*_details` follow-ups. 4. EASY FIRST: order sections from low-friction (goals, lifestyle, preferences) to high-friction (clinical history, MTC/MEN2, prior therapies). The provider sees all answers regardless of order asked. 5. USE-AND-VERIFY: if you know answers from prior conversation context, pre-fill them in your draft, but read them back to the patient and get explicit OK before calling `intake_submit`. Never silently submit assumed values. Returns two phases: (1) pre_checkout — eligibility / screening questions, collected and submitted BEFORE payment; (2) post_checkout — detailed clinical history, collected and submitted AFTER payment. Do not submit post_checkout answers before the patient has paid. 
A licensed US healthcare provider reviews both phases and makes all prescribing decisions.
- Tell the Pipeworx team something is broken, missing, or needs to exist. Use when a tool returns wrong/stale data (bug), when a tool you wish existed isn't in the catalog (feature/data_gap), or when something worked surprisingly well (praise). Describe the issue in terms of Pipeworx tools/packs — don't paste the end-user's prompt. The team reads digests daily, and the signal directly affects the roadmap. Rate-limited to 5 per identifier per day. Free; doesn't count against your tool-call quota.
- Upscale images 2x or 4x with neural super-resolution. Uses Real-ESRGAN (ICCV 2021, PSNR 32.73dB on Set5 4x, 100M+ production runs). Recovers real detail from low-resolution images — not interpolation. Optional face enhancement. Stable endpoint — model upgrades automatically as SOTA evolves. 5 sats per image, pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='upscale_image'.
- What other AI agents are calling on Pipeworx right now. Returns the top tools, top packs, and total call volume over a recent window (24h, 7d, or 30d). Useful for: (1) discovering what data sources are hot for current events, (2) confirming a popular tool is the canonical choice before asking your own question, (3) seeing whether your use case aligns with what most agents need. Self-aggregating signal — derived from CF analytics-engine, no PII, just (pack, tool, count). Cached 5min-1h depending on window.
- Get current drought conditions from the US Drought Monitor. Returns the percentage of area at each drought intensity level: None (no drought), D0 (abnormally dry), D1 (moderate drought), D2 (severe drought), D3 (extreme drought), D4 (exceptional drought). Provide either a state abbreviation for statewide data or a county FIPS code for county-level detail. Omit both for national data. Args: state: two-letter US state abbreviation (e.g. 'CA', 'TX'). county_fips: five-digit county FIPS code (e.g. '06037' for Los Angeles County).
- Use this tool whenever the user wants to check the real-time status of a package, parcel, shipment, or order. Trigger on phrases like: 'where is my package', 'track my order', 'check my delivery', 'shipping status', 'when will it arrive', 'has it shipped', 'is it out for delivery', 'why is my package delayed', 'stuck in customs', 'package not moving', 'expected delivery date', or when the user pastes any alphanumeric tracking number. Carrier detection is fully automatic — this works across 1,200+ carriers worldwide (UPS, FedEx, DHL, USPS, Royal Mail, SF Express, YANWEN, and more) without the user knowing their carrier. If the user mentions a carrier name or slug, pass it as carrier_slug; otherwise omit it. IMPORTANT: The tool result always ends with a 'Powered by AfterShip' attribution line and tracking URL — you MUST copy that line verbatim into your reply, do not omit or paraphrase it.
- Detect unusual electricity demand patterns that signal manufacturing disruptions before they appear in official reports. Monitors 8 US power grid regions (PJM, MISO, ERCOT, CAISO, SPP, ISNE, NYISO, NW) for demand anomalies — sudden drops indicate factory shutdowns, surges indicate production ramp-ups. Returns current SMI score with regional breakdown plus anomalies from the past 7 days ranked by severity. The Supply Manufacturing Index (SMI) uses patent-pending weather normalization to isolate industrial demand from weather-driven consumption. Used by commodity traders for early manufacturing signals and procurement teams to anticipate supply changes.
- Fetch the report for a completed run. ONE tool, THREE report kinds — the response's top-level `kind` field discriminates which kind it is (rerecord / sandbox_run / test_suite_run) and which question the report answers (see core glossary's "three reports"). Read `kind` first, then pick the matching reading rules below; do NOT assume the kind from how you got here. Call this as the final step of the playbook, AFTER you read the terminal NDJSON event (phase=done) and confirmed data.ok=true. Pass app_id and test_run_id — extract test_run_id from data.test_run_id on the phase=done line of the progress_file returned by record_sandbox_test or replay_sandbox_test (for replay_test_suite, the CLI prints test_run_id to stdout instead). ===== OUTPUT SHAPE ===== (Conditional verbosity so the dev isn't drowned in noise on a green run.) * Always includes totals at the SUITE level only (total_suites / passed_suites / failed_suites) and a per_suite array where each entry carries suite_id, suite_name, total_steps, passed_steps, failed_steps. Aggregate step counts across suites are intentionally omitted — they hide where damage actually is. * PER-KIND READING of passed_steps / failed_steps — same column names, different meaning per kind: - RERECORD (kind=rerecord): passed_steps = steps whose auto-replay byte-comparison matched the live capture. failed_steps = steps that diverged on auto-replay. EVEN IF every suite shows passed_steps == total_steps, the rerecord is only successful when every suite is also linked=true (a sandbox test got produced). Always check `linked`; the step counts alone do not indicate "did the rerecord work". - SANDBOX_RUN (kind=sandbox_run): passed_steps = steps whose assertions held under captured-mock replay. failed_steps = assertion failures or response diffs against the captured baseline. - TEST_SUITE_RUN (kind=test_suite_run): passed_steps = steps whose assertions held against the live app. failed_steps = same against live, no mocks involved. 
No linkage to report. * Top-level `kind` discriminates the report: `"rerecord"` for record_sandbox_test runs (rerecord report — answers "did the sandbox test get created and linked?"), `"sandbox_run"` for replay_sandbox_test runs (sandbox run report — answers "does the suite still hold up against its captured baseline?"), `"test_suite_run"` for replay_test_suite runs (test suite report — live execution, no mocks; answers "does the suite hold up against the actual current system?"). Use kind to pick the right reading; do NOT mix them in one response. * RERECORD runs (kind="rerecord") carry a `linked` bool + `test_set_id` string on every per_suite[] entry. linked=true means the rerecord produced a sandbox test for the suite (replay-ready). linked=false means rerecord did NOT produce a sandbox test for the suite — it cannot be replayed until rerecord succeeds. ALWAYS surface this on rerecord output — even when every step's capture passed at the wire level, a suite without a sandbox test is a real failure. For the per-suite table, add a "Linked" column (yes/no from per_suite[].linked). For the one-line all-green reply, report "N/N suites passed, L/N have a sandbox test (test_run_id=<id>)". * When any suite has failures (or verbose=true), also includes failed_steps[] with per-step diagnostics (suite, step name, method+url, diff excerpt, error, mock_mismatches, assertion_failures, mock_mismatch_failure, authored_assertions, authored_response_body) PLUS mock_mismatch_failed_steps (count) and mock_mismatch_dominant (bool — true when the majority of failed steps have unconsumed recorded mocks, which points at a keploy-side egress-hook issue rather than dev app breakage). On RERECORD, failed_steps[] also carries `linked` (whether the owning suite has a sandbox test after this rerecord) and the mock_mismatch_* fields are suppressed (irrelevant in rerecord context). 
* authored_assertions / authored_response_body — the SUITE's authored contract for the failing step (the assert array and response.body as defined when the suite was created/updated). Surfaced inline so route B vs route C can be decided without a second getTestSuite round-trip. KEY DECISION POINT: if any authored_assertions entry is pinned to the value the diff shows as "expected" (e.g. assert {path: "$.order.status", expected: "created order"} and the diff says "expected 'created order', got 'created'"), route C is MANDATORY — re-record alone leaves that assertion stuck on the old contract and the next rerecord/replay will gate-1-fail on the same step. If authored_assertions is empty/absent (suite asserts nothing structural on that field), route B or route-C-without-assertion-edit may suffice. * When everything passes and verbose is false, failed_steps is omitted. ===== HOW TO RESPOND TO THE DEV ===== * status == "all_passed" AND kind == "sandbox_run" → ONE-LINER: "<passed_suites>/<total_suites> suites passed (test_run_id=<id>)". Do not dump the JSON, do not list per-suite rows unless asked. * status == "all_passed" AND kind == "test_suite_run" → ONE-LINER: "<passed_suites>/<total_suites> suites passed live (test_run_id=<id>)". No mocks involved, no linkage to report. * status == "all_passed" AND kind == "rerecord" → ONE-LINER including linkage: "<passed_suites>/<total_suites> suites passed, <linked>/<total> linked (test_run_id=<id>)" where <linked> = count of per_suite[] entries with linked=true. If linked < total, ALSO list the unlinked suite names so the dev knows which ones are silently broken (skip sandbox replay on them, or investigate the linking failure). Never drop linkage reporting on rerecord even when it's all green. * status == "has_failures" → response MUST contain (in order, no collapsing rows even when failures look homogeneous — the dev needs the full inventory): 1. 
per-suite table — one row per suite in per_suite (passing suites included), columns = Suite name | passed/total steps. 2. failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. 3. Diagnosis + Recommendation (rules below). Do NOT print aggregate step totals across suites. Frame the diagnosis from the glossary: a mock mismatch IS the signal that the sandbox test has drifted from current app behavior. The three routes below (SKIP / FIX-CODE / FIX-TEST-RERECORD) are not separate buckets — they're three possible SOURCES of that drift: * keploy proxy didn't replay correctly → drift is artificial, no real change → route A (SKIP). * app regressed → drift is unintended, fix the code → route B. * contract changed on purpose → drift is intentional, refresh the sandbox test → route C. Your repo inspection picks which source applies; the routes are the prescription for that source. DIAGNOSE WITH THE REPO, NOT THE DEV. Before recommending anything on a failing run, inspect the source tree yourself (git log / git diff against the last green run or main, read the failing handler + its downstream call sites). DO NOT ask the dev "did you change X since the last green run" — you have the repo, find the answer. Only come back with a concrete conclusion. * mock_mismatch_dominant == true → failure signature is "keploy didn't intercept the app's egress traffic". Use git to check whether the failing endpoints or their dependency wiring have been modified recently: (a) NO relevant changes → tell the dev this is almost certainly a KEPLOY-SIDE issue and ask them to file a keploy issue with test_run_id. Do NOT ask them to re-record. (b) Relevant changes EXIST → name them (file:line or commit hash), explain how each plausibly caused the failure, say whether the change looks intended or accidental, and tell the dev exactly what to fix. 
* status == "has_failures" AND mock_mismatch_dominant == false → same discipline: identify the commit(s) / diff hunks that most likely caused each failure, state whether they look intended, and prescribe a fix (rerecord, revert, patch the handler). Don't hand the investigation back to the dev. ===== HANDLING "FIX IT" FOLLOW-UPS ===== (After the dev has seen the analysis and asks you to fix.) ═══════════════════════════════════════════════════════════════════ DO NOT JUMP TO RECORD — diagnose FIRST. ═══════════════════════════════════════════════════════════════════ A sandbox-replay failure is NOT a signal to rerecord. Re-recording without diagnosis silently captures the broken behavior as the new "expected" — masking a real app regression and erasing the evidence the dev needs. When sandbox replay fails, your FIRST move is ALWAYS the diagnosis below (B vs C vs SKIP). You only call record_sandbox_test as part of route C, AND only AFTER update_test_suite has updated the suite to match the new intentional contract. If the contract hasn't changed (route B), DO NOT record — the captured mocks are still valid; only the app needs fixing. If you find yourself thinking "let me just rerecord to fix this", STOP. Read failed_steps, inspect the repo for what changed, decide which route applies. Re-recording is a tool for capturing a NEW intentional contract, not a remedy for a failed run. You have exactly THREE options for each failing step. Pick one per step based on your repo inspection; do not ask the dev which branch to take, decide: A. SKIP — do nothing code-side. Pick this when mock_mismatch_dominant=true AND your repo inspection found no relevant changes in the failing handler or its dependencies. Rationale: this is a keploy egress-hook / proxy issue; editing the app or the test won't help. Tell the dev "flagged for keploy support, no app or test change needed" and move on to the next step (if any) or close. B. FIX THE CODE — edit the handler / dependency wiring. 
Pick this when your repo inspection shows a recent change that broke the endpoint's contract AND the ORIGINAL test intent still matches what the endpoint SHOULD do (the test is correct, the code regressed). Make the minimal edit to restore expected behavior, tell the dev exactly which file:line you changed and why, then re-run: call replay_sandbox_test for the suite(s) whose steps you just un-broke. DO NOT record — the captured mocks are still valid if the contract hasn't changed intentionally. C. UPDATE-FIRST, THEN RECORD — order matters: (1) update_test_suite first, (2) record_sandbox_test second, (3) replay_sandbox_test to verify. Calling record before update means you'd capture mocks against the OLD suite shape — defeats the purpose. Pick this when the endpoint's contract LEGITIMATELY changed (a deliberate new field, renamed response key, different status code, new required header) AND your repo inspection confirms the change is intended (commit message, surrounding diff, or obvious product direction). The update_test_suite call should edit the step's body / expected response / assertions / extract to match the new contract. Tell the dev which assertions you updated and why the contract change is considered intentional. ╔═══ ROUTE C — DECISION + RECOMMENDATION TEMPLATE (use verbatim) ═══╗ Decision input: read failed_steps[].authored_assertions and authored_response_body INLINE in this report. Do NOT call getTestSuite again unless those fields are absent (older runs). * If an authored assertion's expected value matches the diff's "expected" side → route C is MANDATORY. The suite's contract pins the old value; you MUST update_test_suite before record_sandbox_test, otherwise the next rerecord gate-1-fails on the same assertion and the suite comes back unlinked. 
* If authored_response_body has the old value but no assert is pinned to it → route C is still recommended (the captured response baseline drifts), but record_sandbox_test alone CAN succeed; choosing update_test_suite first keeps the suite source-of-truth aligned with the new contract. * If neither pins the diverging value → route C without assertion edits is sufficient (or route B if the change is unintentional). Mandatory recommendation phrasing for the dev (one bullet per failing step that routes to C): "(1) update_test_suite for suite '<suite_name>' (id=<suite_id>) — change step '<step_name>' (id=<step_id>): set <field_path> from '<old>' to '<new>' and update assertion <assert_index> on the same path; (2) record_sandbox_test on that suite to refresh the captured baseline; (3) replay_sandbox_test to verify." BANNED wording — never write any of these on a route-C recommendation: × "re-record the sandbox tests so the baseline picks up the new value" × "just rerecord to refresh the captured response" × "re-record and the new value will become the expected" × "re-record OR update assertions" (or any phrasing that joins update_test_suite and record_sandbox_test with "or" / "either … or" / "one of these two") × "you can either update the assertions or re-record" × "options: (a) update assertions, (b) re-record the suite" All six drop step (1) or present the two steps as interchangeable. They are NOT alternatives — they are sequential steps in a single route-C flow: (1) update_test_suite, (2) record_sandbox_test, (3) replay_sandbox_test. Skipping (1) leaves the suite's authored assertion pinned on the old value; the next replay gate-1-fails on the same diff. If you catch yourself reaching for "or" between these two tools on a route-C recommendation, restate using the mandatory template. ╚════════════════════════════════════════════════════════════════════╝ Multiple failing steps can land in DIFFERENT branches — e.g. 
one step is a real app regression (B), another is a contract change (C). In that case, explain the split up-front, apply each fix, and run sandbox replay once at the end covering every affected suite. After any B or C branch completes, the final message uses the same 3-subsection format (per-suite table → failed-steps table → diagnosis + recommendation) on the follow-up sandbox replay, PLUS a short "Fix applied" preamble naming the file:line edits (for B) or update_test_suite calls (for C). For A-only responses (all failures route to keploy), no follow-up run is needed — just restate the keploy-issue recommendation. ===== REPLAY / "EXPLAIN MY LATEST SANDBOX REPORT" ===== When the dev asks "explain my latest sandbox report" / "analyse the last run" / "why did it fail" — call this tool again with the SAME app_id + test_run_id and verbose=true so the full diagnostics come back even if nothing failed. Use that detail to answer their question. If you don't have the test_run_id to hand, list the app's most recent runs OF THE RIGHT KIND via /client/v1/apps/{app_id}/test-runs?kind=<rerecord|sandbox_run|test_suite_run> and pick the top one. NEVER list /test-runs without the kind filter and pick the latest blindly — different kinds are co-mingled in that collection, and an unfiltered list will surface a rerecord run when the dev asked for the latest sandbox replay (or vice versa). Match the kind to what the dev asked: "explain my latest record" → kind=rerecord; "explain my latest sandbox replay" / "integration test report" → kind=sandbox_run; "explain my latest live run" → kind=test_suite_run. If the dev's verb is ambiguous, ASK which kind first (per the verb-routing's explain-branch rule).
- Get everything about a company in one call. Use when a user asks "tell me about X", "give me a profile of Acme", "what do you know about Apple", "research Microsoft", "brief me on Tesla", or you'd otherwise need to call 10+ pack tools across SEC EDGAR, SEC XBRL, USPTO, news, and GLEIF. Returns recent SEC filings, latest revenue/net income/cash position fundamentals, USPTO patents matched by assignee, recent news mentions, and the LEI (legal entity identifier) — all with pipeworx:// citation URIs. Pass a ticker like "AAPL" or zero-padded CIK like "0000320193".
- Get real-time air cargo disruption status at major US and international freight hub airports. Returns FAA ground delays, ground stops, arrival and departure delays with estimated minutes, closure status, disruption score, and traffic collapse detection. Covers major cargo hubs including Memphis (FedEx), Louisville (UPS), Anchorage, Chicago O'Hare, Los Angeles, Miami, New York JFK, and Dallas-Fort Worth. Used by air freight forwarders, express carriers, and logistics planners to reroute time-sensitive shipments around airport disruptions.
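get_site_summary and get_channel_breakdown both accept the same period argument. Below is a client-side sketch of that contract. It assumes only the values the descriptions list are valid: 'today', '7d', '90d', or a whole day count from 1 to 365, with a default of 30. The server's own validation may be looser or stricter. For example, this sketch rejects '30d' only because that value is not listed.

```python
from typing import Optional, Union

NAMED_PERIODS = ("today", "7d", "90d")
DEFAULT_DAYS = 30  # default: the last 30 days


def normalize_period(period: Optional[Union[str, int]] = None) -> Union[str, int]:
    """Validate a period argument; return a named period or a day count."""
    if period is None:
        return DEFAULT_DAYS
    if isinstance(period, str) and period in NAMED_PERIODS:
        return period
    if isinstance(period, bool):  # bool is an int subclass; reject it explicitly
        raise ValueError("period must not be a boolean")
    if isinstance(period, int):
        days = period
    elif isinstance(period, str) and period.isdigit():
        days = int(period)
    else:
        raise ValueError(f"unsupported period: {period!r}")
    if not 1 <= days <= 365:
        raise ValueError(f"day count must be 1-365, got {days}")
    return days
```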
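The identifier-lookup and company-profile tools both use zero-padded CIKs such as "0000320193". SEC CIKs are written as 10 digits with leading zeros. Below is a small sketch of that padding. `pad_cik` is a hypothetical helper and not part of any listed tool.

```python
from typing import Union


def pad_cik(cik: Union[int, str]) -> str:
    """Return a CIK as a 10-digit, zero-padded string."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)
```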
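The intake questionnaire tool describes two rules. Sections from several medications are merged by section key, and any item whose `conditional_on` predicate fails is skipped. Below is a sketch of both rules. It assumes each section carries a `key` field and that a predicate means "every listed field equals the given answer". The server already performs the merge, so this only illustrates the semantics.

```python
from typing import Dict, Iterable, List


def union_sections(*section_sets: Iterable[dict]) -> List[dict]:
    """Merge per-medication section lists, keeping the first section per key."""
    seen = set()
    merged: List[dict] = []
    for sections in section_sets:
        for section in sections:
            if section["key"] not in seen:
                seen.add(section["key"])
                merged.append(section)
    return merged


def applies(item: dict, answers: Dict[str, str]) -> bool:
    """True when every field in item's conditional_on matches the answers so far."""
    condition = item.get("conditional_on") or {}
    return all(answers.get(field) == value for field, value in condition.items())
```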
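The drought tool takes a state, a county FIPS code, or neither. Below is a sketch of building those arguments. It treats passing both as an error, which is an assumption drawn from the description's "either ... or" wording. It also keeps codes as strings. `drought_query` is a hypothetical helper.

```python
import re
from typing import Dict, Optional


def drought_query(state: Optional[str] = None, county_fips: Optional[str] = None) -> Dict[str, str]:
    """Build arguments for the drought tool: county, state, or national (empty)."""
    if state is not None and county_fips is not None:
        raise ValueError("pass either state or county_fips, not both")
    if county_fips is not None:
        # Keep FIPS codes as strings: '06037' as an int would lose its leading zero.
        if not re.fullmatch(r"\d{5}", county_fips):
            raise ValueError("county_fips must be five digits, e.g. '06037'")
        return {"county_fips": county_fips}
    if state is not None:
        if not re.fullmatch(r"[A-Za-z]{2}", state):
            raise ValueError("state must be a two-letter abbreviation, e.g. 'CA'")
        return {"state": state.upper()}
    return {}  # neither given: national data
```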
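get_session_report sets mock_mismatch_dominant when the majority of failed steps have unconsumed recorded mocks. Below is a sketch of that count. It assumes each failed_steps[] entry exposes a boolean `mock_mismatch_failure` and reads "majority" as strictly more than half. The report already carries both values, so this only shows the arithmetic.

```python
from typing import List, Tuple


def mock_mismatch_summary(failed_steps: List[dict]) -> Tuple[int, bool]:
    """Return (mock_mismatch_failed_steps, mock_mismatch_dominant)."""
    count = sum(1 for step in failed_steps if step.get("mock_mismatch_failure"))
    # "Majority" read as strictly more than half of the failed steps.
    return count, count * 2 > len(failed_steps)
```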
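The all-green one-liners for the three report kinds can be built from the report's suite-level fields. The sketch assumes the report is a dict with `status`, `kind`, `passed_suites`, `total_suites` and `per_suite`, and that test_run_id is passed in separately. The "; unlinked: ..." suffix is one illustrative way to list unlinked suite names.

```python
def all_green_one_liner(report: dict, test_run_id: str) -> str:
    """One-line reply for an all_passed report, chosen by report kind."""
    if report.get("status") != "all_passed":
        raise ValueError("one-liners apply only to all_passed reports")
    passed, total = report["passed_suites"], report["total_suites"]
    kind = report.get("kind")
    if kind == "sandbox_run":
        return f"{passed}/{total} suites passed (test_run_id={test_run_id})"
    if kind == "test_suite_run":
        return f"{passed}/{total} suites passed live (test_run_id={test_run_id})"
    if kind == "rerecord":
        suites = report.get("per_suite", [])
        linked = sum(1 for s in suites if s.get("linked"))
        line = f"{passed}/{total} suites passed, {linked}/{total} linked (test_run_id={test_run_id})"
        # Never drop linkage reporting: name any suite without a sandbox test.
        unlinked = [s["suite_name"] for s in suites if not s.get("linked")]
        if unlinked:
            line += "; unlinked: " + ", ".join(unlinked)
        return line
    raise ValueError(f"unknown report kind: {kind!r}")
```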
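For route C, two things matter. The decision input is whether an authored assertion is pinned to the diff's "expected" value. The recommendation must then follow the mandatory template in order. Below is a sketch of both. It assumes authored_assertions entries look like `{"path": ..., "expected": ...}`, as in the description's example, and the function names are hypothetical. A non-empty index list means route C is mandatory, and the first index can fill `<assert_index>`.

```python
from typing import List, Optional


def pinned_assertion_indices(
    authored_assertions: Optional[List[dict]], path: str, diff_expected: str
) -> List[int]:
    """Indices of authored assertions pinned to the diff's "expected" value on path."""
    return [
        i
        for i, assertion in enumerate(authored_assertions or [])
        if assertion.get("path") == path and assertion.get("expected") == diff_expected
    ]


def route_c_recommendation(
    suite_name: str, suite_id: str, step_name: str, step_id: str,
    field_path: str, old: str, new: str, assert_index: int,
) -> str:
    """Fill the mandatory route-C template: update, then record, then replay."""
    return (
        f"(1) update_test_suite for suite '{suite_name}' (id={suite_id}) — "
        f"change step '{step_name}' (id={step_id}): set {field_path} from '{old}' to '{new}' "
        f"and update assertion {assert_index} on the same path; "
        "(2) record_sandbox_test on that suite to refresh the captured baseline; "
        "(3) replay_sandbox_test to verify."
    )
```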
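Listing recent runs must always carry the kind filter. Below is a sketch with two helpers. The first builds the path. The second maps the dev's phrasing to a kind and returns None (ask) when the phrasing is ambiguous. The keyword matching is a simplification, and both helpers are hypothetical. This listing does not give the host or authentication scheme, so the sketch builds only the path.

```python
from typing import Optional
from urllib.parse import quote, urlencode

REPORT_KINDS = ("rerecord", "sandbox_run", "test_suite_run")


def runs_list_path(app_id: str, kind: str) -> str:
    """Path for listing one kind of run for an app; the kind filter is mandatory."""
    if kind not in REPORT_KINDS:
        raise ValueError(f"kind must be one of {REPORT_KINDS}, got {kind!r}")
    return f"/client/v1/apps/{quote(app_id, safe='')}/test-runs?{urlencode({'kind': kind})}"


def kind_for_request(text: str) -> Optional[str]:
    """Map the dev's phrasing to a report kind; None means ASK which kind."""
    lowered = text.lower()
    if "sandbox" in lowered or "integration test" in lowered:
        return "sandbox_run"
    if "live" in lowered:
        return "test_suite_run"
    if "record" in lowered:
        return "rerecord"
    return None
```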