Glama
206,792 tools. Last updated 2026-06-17 15:41

"Creating Playwright Tests by Exploring a Website and Generating Test Cases" matching MCP tools:

  • Replay the sandbox test for one or more suites against captured mocks: re-runs the suite's steps against the dev's locally-running app while keploy serves outbound calls (DB, downstream HTTP, etc.) from the captured mocks. Use this when the dev says "replay", "run my sandbox tests", "integration-test", "check if mocks still match"; the keywords "sandbox" / "replay" / "mocks" / "integration-test" all map here. Also the REPLAY STEP of FROM-SCRATCH: call this LAST (after create_test_suite + record_sandbox_test) to give the dev the whole-app regression picture against the freshly captured mocks. Output produces a SANDBOX RUN REPORT, which answers "does the suite still hold up against its captured baseline?".

    ═══ DISAMBIGUATION: pick this tool vs. replay_test_suite ═══

    USE replay_sandbox_test (THIS TOOL) when the dev says:
      * "run my sandbox tests" / "replay my sandbox tests"
      * "integration-test my app" / "run the integration tests"
      * "check if my mocks still match" / "replay against the captured mocks"
      * "rerun my sandbox suite" (with the word "sandbox")
    Trigger keyword: an explicit "sandbox" / "replay" / "mocks" / "integration-test" is a silent signal that the dev wants captured-mock replay, NOT live-app execution.

    USE replay_test_suite INSTEAD when the dev says:
      * "run the test suite" / "run my test suites" (bare, no "sandbox")
      * "execute test suite X" / "run suite 810d3ebe…"
      * "test the suite again" / "smoke test against the live app"
    Bare verbs ("run / test / execute") applied to "the suite" without the word "sandbox" mean LIVE-APP execution, NOT captured-mock replay. replay_test_suite hits the dev's running localhost app directly via HTTP: no docker spin-up, no mocks.

    After a record_sandbox_test run, the natural next step is THIS tool (replay against the just-captured mocks). After create_test_suite / update_test_suite, the natural next step is replay_test_suite (validate against the live app). When the dev's verb is bare and the prior turn doesn't make the intent obvious, ASK rather than picking sandbox-replay silently, because code-change regressions can hide under "mock didn't match" failures.

    ═══ DISCOVERY: when the dev hands you a bare suite_id with no app_id / branch_id ═══

    Suites live on a (app_id, branch_id) tuple. A bare suite_id has NO on-disk hint about which app or branch holds it; you have to RESOLVE both before calling this tool. Walk these steps in order and STOP as soon as getTestSuite returns 200:
      1. Detect the dev's git branch: Bash `git rev-parse --abbrev-ref HEAD` in app_dir. If exit non-zero / output is "HEAD" → not a git repo / detached HEAD; ASK the dev for the Keploy branch name.
      2. Resolve candidate apps via the cwd basename: Bash `basename $(pwd)` → call listApps with q=<basename>. Usually 1–2 candidates. If 0 → ASK; if >1 → walk every candidate in step 4.
      3. For each candidate app, call list_branches({app_id}) and find the branch whose `name` matches the git branch from step 1. That gives you {branch_id}. If no match → not this app, try next.
      4. Verify with getTestSuite({app_id, suite_id, branch_id=<from step 3>}). 200 → resolved; 404 → wrong app/branch, try next.
      5. If steps 2–4 exhaust, walk every OPEN branch on each candidate app via list_branches → getTestSuite. Then try main (branch_id omitted). If still nothing → ASK the dev for the {app_id, branch_id} pair.
    After resolving once in a session, REUSE the {app_id, branch_id} for subsequent suite-targeted calls; don't re-walk discovery for every action.

    SCOPE (whole-app vs single-suite):
      * Default: LEAVE suite_ids UNSET → the tool resolves "every suite for the app that has a sandbox test (test_set_id populated)" and replays them all. Use this for "run my sandbox tests" / "check if my tests still pass" (whole-app regression). New suites auto-pick up.
      * Single / subset: PASS suite_ids when the dev names specific suites: "replay sandbox test for suite 810d3ebe-…", "replay only the auth suite", "run suite X and Y". The tool validates each requested id is actually a suite with a sandbox test (has test_set_id); an unlinked id gets a precise "record first" error instead of an opaque downstream CLI failure.
    This tool resolves the app, picks the suite set per the rule above, and returns a single playbook that drives the replay for them. It does NOT record.

    WHAT THIS TOOL DOES INTERNALLY (so you don't have to):
      1. Resolves app_id: use the explicit app_id if the caller has one; otherwise pass app_name_hint (usually the cwd basename) and the server does listApps with a substring match. Multiple matches → error listing them; zero matches → error suggesting the dev generate a suite first.
      2. Lists test suites for the app, keeps only those with a non-empty test_set_id. Zero linked → typed "no linked sandbox tests" error.
      3. If suite_ids was passed, validates every requested id is in the linked-suites set; unlinked ids → typed error pointing to record_sandbox_test.
      4. Returns the headless playbook. Walk it exactly: spawn CLI in background, tail the progress file (PID-alive guard built in), read the terminal event, fetch the report. No separate cleanup step: the CLI exits on its own.

    ===== PREREQUISITES =====
    (Same as record_sandbox_test; if you just recorded, you already have them. Same docker-compose network rule applies: use the same compose file + service, stop the app service before calling, leave deps running.)
      - app_command: shell command that starts the dev's app (e.g. "docker compose up producer").
      - app_url: base URL the app listens on, e.g. http://localhost:8080.
      - app_dir: absolute path to repo root.
      - container_name if app_command is docker-compose.
      - keploy binary on PATH. If `which keploy` returns nothing, install it before calling this tool with: `curl --silent -O -L https://keploy.io/install.sh && source install.sh`.

    ===== AFTER CALLING: walk the playbook =====
    Same headless playbook shape as record_sandbox_test: spawn `keploy test sandbox --cloud-app-id …` in the background via Bash, poll `tail -n 1 $PROGRESS_FILE` repeatedly (no sleep loops; the wait_for_done step has a built-in `kill -0 $KEPLOY_PID` guard so the loop exits if the CLI dies silently), read the terminal NDJSON event (phase=done, data.ok, data.test_run_id), and, if ok=true, call get_session_report(app_id, test_run_id) with verbose=true at the end. No separate cleanup step needed; the CLI exits cleanly once phase=done is written.

    ===== MANDATORY OUTPUT: Phase 3 section =====
    Your final message to the dev MUST contain a section with this exact heading (do NOT merge with Phase 2; do NOT compress the failed-steps table even when failures are homogeneous):
      ### Phase 3 — Sandbox run report
    Under it, emit the uniform three-subsection format owned by get_session_report:
      (i) per-suite table: one row per suite in per_suite, passing suites included, columns = Suite name | passed/total steps.
      (ii) failed-steps table: ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n. Never collapse rows.
      (iii) Diagnosis + Recommendation (see get_session_report description for case-specific rules around mock_mismatch_dominant, repo-diff inspection, and the SKIP / FIX-CODE / FIX-TEST branching for fix-it follow-ups).
    Do NOT print aggregate step totals across suites: they mix unrelated suites and hide where damage actually is.

    ===== ROLLUP LINE =====
    Close the message with a final one-line rollup paragraph (no heading), in addition to the three phase sections. Mention the TOTAL number of suites replayed (which may exceed the count created in this session, because replay_sandbox_test covers every linked suite the app has). Example: "_Rollup: inserted 4 suites, 4/4 with sandbox tests after record, 3/4 suites passed sandbox replay across the app's 6 linked suites — 1 failure is likely keploy egress-hook, file an issue with the IDs above._"

    ===== DO NOT =====
      * DO NOT call update_test_suite or record_sandbox_test after this. The dev said RUN, not REFRESH.
      * DO NOT fall back to raw keploy CLI (`keploy test …`) if the MCP tool drops mid-flow: the CLI runs test-sets directly and does NOT write results back to the MCP-visible TestSuiteRun. See MCP DISCONNECT RECOVERY in the top-level instructions.
    Connector
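
    As a rough illustration of the "read the terminal NDJSON event" step in the entry above, the sketch below parses the last line of a progress file and extracts the run id only when `phase` is `done` and `data.ok` is true. The field names `phase`, `data.ok`, and `data.test_run_id` come from the description; the function names and the non-terminal phase value in the sample are hypothetical, and the real progress file may carry other fields.

    ```python
    import json
    from typing import Optional


    def read_terminal_event(progress_text: str) -> Optional[dict]:
        """Return the last NDJSON event if it is terminal (phase == "done"), else None."""
        lines = [ln for ln in progress_text.splitlines() if ln.strip()]
        if not lines:
            return None
        try:
            event = json.loads(lines[-1])
        except json.JSONDecodeError:
            return None  # last line may be partially written; poll again
        if not isinstance(event, dict) or event.get("phase") != "done":
            return None
        return event


    def run_id_to_report(event: dict) -> Optional[str]:
        """Return data.test_run_id only when data.ok is true, else None."""
        data = event.get("data") or {}
        if data.get("ok") is True:
            return data.get("test_run_id")
        return None


    # Hypothetical progress-file contents; "running" is an assumed phase name.
    sample = (
        '{"phase": "running", "data": {}}\n'
        '{"phase": "done", "data": {"ok": true, "test_run_id": "run-123"}}\n'
    )
    terminal = read_terminal_event(sample)
    report_id = run_id_to_report(terminal) if terminal else None
    ```

    Returning None for a half-written last line keeps the caller in its polling loop instead of failing on a transient read.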
  • Run test suites and return results with failures and coverage. !! DO NOT USE for local-app "tests for my changes" flows !! This tool sends the run to the SaaS backend which REJECTS private/localhost URLs ("IPv6 address is private / reserved"). It only works when base_url points at a PUBLIC, non-loopback address (a staging/prod deployment). For local-app testing, use record_sandbox_test / replay_sandbox_test instead — they drive the keploy local agent which happily records against http://localhost.
    Connector
  • Talk to VARRD AI (~$0.25/turn). Describe any trading idea in plain language and the system handles everything: loading decades of market data, charting your pattern, running statistical tests, backtesting with stops, and generating exact trade setups.

    MULTI-TURN: First call creates a session. Keep calling with the same session_id, following context.next_actions each time.
      1. Your idea -> VARRD charts pattern
      2. 'test it' -> statistical test (event study or backtest)
      3. 'show me the trade setup' -> exact entry/stop/target prices

    HYPOTHESIS INTEGRITY (critical): VARRD tests ONE hypothesis at a time (one formula, one setup). Never combine multiple setups into one formula or ask to 'test all'; each idea must be tested as a separate hypothesis for the statistics to be valid. Say 'start a new hypothesis' between ideas to reset cleanly.
      - ALLOWED: Test the SAME setup across multiple markets ('test this on ES, NQ, and CL'): same formula, different data.
      - NOT ALLOWED: Test multiple DIFFERENT formulas/setups at once; each is a separate hypothesis requiring its own chart-test-result cycle.
    If ELROND council returns 4 setups, test each one separately: chart setup 1 -> test -> results -> 'start new hypothesis' -> chart setup 2 -> etc.

    KEY CAPABILITIES you can ask for:
      - 'Use the ELROND council on [market]' -> 8 expert investigators
      - 'Optimize the stop loss and take profit' -> SL/TP grid search
      - 'Test this on ES, NQ, and CL' -> multi-market testing
      - 'Simulate trading this with 1.5 ATR stop' -> backtest with stops

    EDGE VERDICTS in context.edge_verdict after testing:
      - STRONG EDGE: Significant vs zero AND vs market baseline
      - MARGINAL: Significant vs zero only (beats nothing, but real signal)
      - PINNED: Significant vs market only (flat returns but different from market)
      - NO EDGE: Neither significant test passed

    TERMINAL STATES: Stop when context.has_edge is true (edge found) or false (no edge, which is still a valid result). Always read context.next_actions.
    Connector
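
    The EDGE VERDICTS list in the entry above amounts to a two-flag decision table. The sketch below only restates that table; the function and parameter names are hypothetical, and the actual verdict is computed by VARRD and returned in context.edge_verdict.

    ```python
    def edge_verdict(significant_vs_zero: bool, significant_vs_market: bool) -> str:
        """Map the two significance tests onto the four verdict labels."""
        if significant_vs_zero and significant_vs_market:
            return "STRONG EDGE"
        if significant_vs_zero:
            return "MARGINAL"  # significant vs zero only
        if significant_vs_market:
            return "PINNED"  # significant vs market only
        return "NO EDGE"  # neither test passed
    ```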
  • Prove the just-generated API test actually catches bugs by applying 3 real source-level mutations to the handler, running the test against each, and reverting. This is the doc-stated "manufactured proof in the first session" moment.

    OPT-IN, NOT OPT-OUT: this tool TOUCHES THE DEV'S SOURCE FILES (temporarily). Always ASK the dev for explicit consent before walking the playbook: "I'll apply 3 small temporary changes to <handler file> to prove the test catches them, then revert every change. Proceed?" Only run the playbook on "yes".

    What the playbook does:
      1. Identify the handler file(s) the test exercises by reading <app_dir>/keploy/api-tests/<resource>/test.yaml and grepping for the route paths in the dev's code.
      2. Pick 3 concrete mutations the test assertion set should catch, e.g. change a response field's type (Name string → Name int), rename a field (email → mail), remove a field. Choose mutations that map to fields the test ACTUALLY asserts on (read the suite's assertions to inform the pick).
      3. For each mutation: apply via Edit, restart the dev's app if needed (hot-reload usually handles this), run keploy test-gen run, capture pass/fail, REVERT via Edit before moving to the next mutation.
      4. Run a final "git diff -- <handler file>" to verify all reverts succeeded. If non-empty, HALT and ask the dev to run "git checkout <file>" before continuing.
      5. Report: "I made 3 small changes, your test caught M/3. Caught: [concrete list]. Missed: [concrete list, with recommendation]."

    ABSOLUTE RULES:
      * Revert is non-negotiable. The dev's working tree must be clean at the end.
      * Never modify test.yaml, config files, or anything outside the handler source(s) for this resource.
      * Never run more than 3 mutations in one playbook (more is noise, fewer is unconvincing).
      * If you can't identify a clear handler file, ASK the dev rather than guessing.

    When the dev says "expand coverage to the other resources" → call devloop_expand_coverage next.
    Connector
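
    The apply, run, revert loop in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: mutations are plain text replacements, `test_passes` is a hypothetical callable standing in for running `keploy test-gen run`, and the final content comparison stands in for the `git diff` check.

    ```python
    from pathlib import Path
    from typing import Callable, Dict, List, Tuple

    # (label, text to find, replacement text): a hypothetical mutation shape.
    Mutation = Tuple[str, str, str]


    def run_mutations(
        handler: Path,
        mutations: List[Mutation],
        test_passes: Callable[[], bool],
    ) -> Dict[str, List[str]]:
        """Apply each mutation, run the test, and always revert before the next one."""
        if len(mutations) > 3:
            raise ValueError("at most 3 mutations per playbook")
        original = handler.read_text()
        caught: List[str] = []
        missed: List[str] = []
        for label, old, new in mutations:
            if old not in original:
                raise ValueError(f"mutation target not found: {label}")
            try:
                handler.write_text(original.replace(old, new, 1))
                # A mutation is "caught" when the test FAILS on the mutated source.
                (missed if test_passes() else caught).append(label)
            finally:
                handler.write_text(original)  # revert, even if the test run raised
        if handler.read_text() != original:
            raise RuntimeError("revert failed; working tree is not clean")
        return {"caught": caught, "missed": missed}
    ```

    Putting the revert in a `finally` block means a crashing test run still leaves the handler file in its original state.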
  • Atomic test set + cases + mocks + mappings ingest. Creates the test set row, every test case, every mock, and the mapping doc in one call.

    PREFER THE CLI FOR ON-DISK RECORDINGS. When the dev has a recorded test-set on disk (e.g. `./keploy/test-set-0/` produced by `keploy record`), invoke this via Bash instead; it streams bytes from disk to server in one HTTP round-trip:

    ```
    keploy upload test-set \
      --app <namespace.deployment>   # or --cloud-app-id <uuid>
      --branch <uuid|name>           # optional, find-or-create on name
      --test-set <path|name>         # e.g. keploy/test-set-0
      [--name <override>]            # rename on the server
    ```

    The CLI path runs in ~3 seconds for a typical recording; calling this MCP tool directly with the same bundle inlined as args takes minutes because Claude has to serialize ~10K+ tokens of YAML/JSON through tool_use. Reserve this MCP tool for cases where the data is already in conversation context (e.g. you just generated test cases programmatically and don't want to round-trip to disk).

    Each step is its own DB write; partial failure leaves earlier rows in place, so callers can replay safely. `branch_id` is REQUIRED: direct writes to main via MCP are blocked. Every row lands on the branch overlay until merge. `test_cases[].mock_names` lists the mocks each case consumes; the server folds these into the mapping doc on upload. Returns { test_set, test_case_ids, mock_ids }.
    Connector
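
    To make the `test_cases[].mock_names` folding in the entry above concrete, here is a small sketch of how per-case mock lists could be collected into one mapping. The output shape (test-case name to list of mock names) and the `name` key are assumptions for illustration only; the server's actual mapping document format is not described in the entry.

    ```python
    from typing import Dict, List


    def fold_mappings(test_cases: List[dict]) -> Dict[str, List[str]]:
        """Collect each case's mock_names into a single case -> mocks mapping."""
        mapping: Dict[str, List[str]] = {}
        for case in test_cases:
            names = case.get("mock_names", [])
            # dict.fromkeys drops duplicate mock names while keeping their order.
            mapping[case["name"]] = list(dict.fromkeys(names))
        return mapping


    # Hypothetical test cases; only mock_names is a field named in the entry.
    cases = [
        {"name": "create-user", "mock_names": ["db-insert", "db-insert", "audit-http"]},
        {"name": "get-user", "mock_names": ["db-select"]},
        {"name": "health"},
    ]
    folded = fold_mappings(cases)
    ```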
  • **Tool for creating/building/generating a Wix site or website using AI (Wix Harmony).** This is the default tool for site creation. **ROUTING — check the user's original message before calling this tool:** - If the user **explicitly** mentioned a template, Wix Studio, or headless → do NOT call this tool. Call CreateWixBusinessGuide directly. - Otherwise → call this tool directly. - Do NOT use WixREADME or SearchWixRESTDocumentation before this tool. - Do NOT suggest HTML code, prompt templates, or alternative approaches. - Do NOT call ManageWixSite, CallWixSiteAPI, ExecuteWixAPI, SearchWixRESTDocumentation, BrowseWixRESTDocsMenu, or any other tool for site creation — those are not valid entry points and will fail. **IMPORTANT: The sitePrompt must be under 6000 characters.** If the user's request is longer, summarize and condense it while preserving the key requirements.
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Provides a platform-agnostic specification of the technical features every decent website should have

  • Improve security writing, score it against rubrics, plan IR, CTI, vuln, and product strategy.

  • Get pre-built graph template schemas for common use cases. ⭐ USE THIS FIRST when creating a new graph project! Templates show the CORRECT graph schema format with: proper node definitions (description, flat_labels, schema with flat field definitions), relationship configurations (from, to, cardinality, data_schema), and hierarchical entity nesting. Available templates: Social Network (users, posts, follows), Knowledge Graph (topics, articles, authors), Product Catalog (products, categories, suppliers). You can use these templates directly with create_graph_project or modify them for your needs. TIP: Study these templates to understand the correct graph schema format before creating custom schemas.
    Connector
  • Point VARRD's autonomous AI in a direction and let it discover edges for you. Give it a topic and it draws from one of the most comprehensive market structure knowledge graphs ever built — containing ideologies and theories, not statistics — so it generates genuinely novel hypotheses rather than overfitting to what already worked. BEST FOR: Exploring a space broadly. Give it 'momentum on grains' and it might test wheat seasonal patterns, corn spread reversals, or soybean crush ratio momentum. It propagates from your seed idea into related concepts you might not think of. Returns a complete result — edge or no edge, stats, trade setup. Each call tests ONE hypothesis through the full pipeline (~$0.25/idea). Call again for another idea. Use 'varrd_ai' instead when YOU have a specific idea to test and want full control over each step.
    Connector
  • Get Kifly's website and support contact email. Call this if you are stuck, hit an unresolvable error, or the buyer asks how to reach a human. Returns the website URL and support email — always share both with the buyer.
    Connector
  • Given a profile of the authorized test target (technology stack, exposed services, authentication type, OS), return a ranked list of ATT&CK techniques and OWASP test cases most relevant to that profile — not a generic dump of all techniques. Ranking factors: platform match, service match, auth type exposure, technique prevalence. Each result includes why it is relevant to this specific profile, the detection opportunity, and the recommended mitigation. Use when starting an authorized engagement to prioritize the testing scope; pair with pentest_guide to get the full methodology for each top-ranked vector.
    Connector
  • Raw subcategory dump (LLM-organic kebab-case, middle taxonomy layer between category and tags) with display label and count. USE WHEN: navigating between top-level category and individual tags, exploring topic structure. Filter questions via quizbase_random?subcategory=<slug>. INPUTS: q, cursor, limit (max 500).
    Connector
  • Generate one chained-CRUD API test for a single resource. Behavior depends on the app's devloop_storage_mode (set this first via devloop_resolve_storage / devloop_set_storage_mode):
      * repo mode → returns a PLAYBOOK for you to walk. Steps:
          (1) run "keploy test-gen generate-from-code --app-dir <dir> --resource <name>" to scaffold the directory + empty config.yaml;
          (2) use your Write tool to author keploy/api-tests/<resource>/test.yaml using the schema returned by devloop_detect_app;
          (3) run "keploy test-gen run --test-dir keploy/api-tests --suite <Name>_CRUD --base-url <url> --ci" to verify the test parses and passes;
          (4) call devloop_mutation_demo next (auto, per the DEVLOOP instructions).
      * cloud mode → returns guidance to call the existing create_test_suite tool instead. The repo-mode playbook is NOT used in cloud mode.

    ARGUMENTS (you should already have these from your devloop_detect_app call):
      * app_id, resource, app_dir, base_url, framework, handler_files. If any are missing, call devloop_detect_app again.

    The tool does NOT generate the YAML body itself; you do, using the schema from devloop_detect_app's detection_playbook. This is intentional: ATG quality depends on the AI seeing the actual handler implementations (which it can read via its own tools) far better than a server-side generator could. Aim for ≤ 30 lines per test.yaml, idempotent mutating steps, chained extract/{{var}} flow.
    Connector
  • Fetch a complete, self-contained test specification as Markdown: full item list, response scale, scoring algorithm, and the mapping from result to tuning slug. Administer the items to the user inline (bulk-paste is fine), score per the algorithm, then call get_tuning. Tests: mbti (OEJTS, 32 items, ~5 min), enneagram (OEPS, 36, ~5 min), disc (ODAT, 16, ~3 min), attachment (ECR-R, 36, ~5 min), big-five (IPIP-50, 50, ~7 min → maps to ocean files).
    Connector
  • Wholesale-delete a recording (test set + its cases + mocks + mapping). `branch_id` is REQUIRED — the delete lays a tombstone overlay on the branch (mergeable). Direct deletes from main via MCP are blocked. Returns { deleted: true } on success, 404 when the (app_id, test_set_id) tuple doesn't resolve to a recording.
    Connector
  • Search for businesses by name, phone number, or location. Returns a list of business candidates with confidence scores. Use this to find existing businesses before creating a website. Requires authentication via API key (Bearer token). Generate an API key at webzum.com/dashboard/account-settings. Examples: - "Joe's Pizza Brooklyn" - search by name and location - "555-123-4567" - search by phone number - "plumber in San Diego" - search by service and location Returns up to 10 candidates ranked by confidence.
    Connector
  • Run a small verification plan made of concrete live checks and summarize whether a hypothesis is supported. Use this when one conclusion depends on multiple simple checks such as endpoint reachability, npm search counts, or whether a page contains an exact substring. This is a coordination tool, not an open-ended research agent: every test must be explicitly defined in advance, and tests run in order with no branching or early exit. The final verdict is mechanical: all tests passing => SUPPORTED, zero passing => REFUTED, otherwise PARTIALLY SUPPORTED. Use verify_claim when you already have evidence URLs, estimate_market for category sizing, and compare_competitors when you already know exact package names.
    Connector
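
    The mechanical verdict rule in the entry above is simple enough to restate in a few lines. The sketch below is illustrative only: the function name is hypothetical, and rejecting an empty plan is an assumption, since the entry does not say how zero tests are handled.

    ```python
    from typing import Sequence


    def plan_verdict(results: Sequence[bool]) -> str:
        """All passing => SUPPORTED, none passing => REFUTED, otherwise PARTIALLY SUPPORTED."""
        if not results:
            # Assumption: an empty plan has no defined verdict.
            raise ValueError("a verification plan needs at least one test")
        passed = sum(1 for ok in results if ok)
        if passed == len(results):
            return "SUPPORTED"
        if passed == 0:
            return "REFUTED"
        return "PARTIALLY SUPPORTED"
    ```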
  • Get pre-built template schemas for common use cases. ⭐ USE THIS FIRST when creating a new project! Templates show the CORRECT schema format with: proper FLAT structure (no 'fields' nesting), every field has a 'type' property, foreign key relationships configured correctly, best practices for field naming and types. Available templates: E-commerce (products, orders, customers), Team collaboration (projects, tasks, users), General purpose templates. You can use these templates directly with create_project or modify them for your needs. TIP: Study these templates to understand the correct schema format before creating custom schemas.
    Connector
  • Create a new experiment to test different image variants. WORKFLOW: 1) Create the experiment (starts in 'draft' status), 2) Use pictify_start_experiment to begin routing traffic, 3) Use pictify_get_experiment to monitor variant impressions/clicks, 4) Use pictify_complete_experiment to declare a winner. Requires at least 2 variants. Variant weights must sum to exactly 10000 (basis points, i.e., 5000 = 50%). The slug becomes part of the experiment URL and must be unique, 3-60 chars, lowercase alphanumeric and hyphens. For A/B tests, enable banditConfig to use Thompson Sampling auto-optimization.
    Connector
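
    The constraints listed in the entry above (at least 2 variants, weights summing to exactly 10000 basis points, slug of 3-60 lowercase alphanumeric characters and hyphens) can be checked client-side before calling the tool. The following is a sketch with hypothetical function and parameter names; slug uniqueness can only be verified by the service itself.

    ```python
    import re
    from typing import List

    # 3-60 characters, lowercase letters, digits, and hyphens only.
    SLUG_RE = re.compile(r"[a-z0-9-]{3,60}")


    def validate_experiment(slug: str, weights: List[int]) -> List[str]:
        """Return a list of human-readable problems; an empty list means the input looks valid."""
        errors: List[str] = []
        if not SLUG_RE.fullmatch(slug):
            errors.append("slug must be 3-60 chars, lowercase alphanumeric and hyphens")
        if len(weights) < 2:
            errors.append("at least 2 variants are required")
        if sum(weights) != 10000:
            errors.append("variant weights must sum to exactly 10000 basis points")
        return errors
    ```

    For example, a 50/50 split is expressed as weights of 5000 and 5000.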
  • Search poems by title or keyword. Returns matching poems with full text and author information. Use when looking for a specific poem or exploring a theme.
    Connector