Glama
133,443 tools. Last updated 2026-05-13 00:12

MCP tools matching "thinking":

  • List available AI models grouped by thinking level (low/medium/high). Shows default models, credit costs, capabilities for each tier. Use this before consult to understand model options.
    Connector
  • Fetch the report for a completed run. ONE tool, THREE report kinds — the response's top-level `kind` field discriminates which kind it is (rerecord / sandbox_run / test_suite_run) and which question the report answers (see the core glossary's "three reports"). Read `kind` first, then pick the matching reading rules below; do NOT assume the kind from how you got here. Call this as the final step of the playbook, AFTER you read the terminal NDJSON event (phase=done) and confirmed data.ok=true. Pass app_id and test_run_id — extract test_run_id from data.test_run_id on the phase=done line of the progress_file returned by record_sandbox_test or replay_sandbox_test (for replay_test_suite, the CLI prints test_run_id to stdout instead).

    ===== OUTPUT SHAPE =====
    (Conditional verbosity so the dev isn't drowned in noise on a green run.)
    * Always includes totals at the SUITE level only (total_suites / passed_suites / failed_suites) and a per_suite array where each entry carries suite_id, suite_name, total_steps, passed_steps, failed_steps. Aggregate step counts across suites are intentionally omitted — they hide where the damage actually is.
    * PER-KIND READING of passed_steps / failed_steps — same column names, different meaning per kind:
      - RERECORD (kind=rerecord): passed_steps = steps whose auto-replay byte-comparison matched the live capture; failed_steps = steps that diverged on auto-replay. EVEN IF every suite shows passed_steps == total_steps, the rerecord is only successful when every suite is also linked=true (a sandbox test got produced). Always check `linked`; the step counts alone do not indicate "did the rerecord work".
      - SANDBOX_RUN (kind=sandbox_run): passed_steps = steps whose assertions held under captured-mock replay; failed_steps = assertion failures or response diffs against the captured baseline.
      - TEST_SUITE_RUN (kind=test_suite_run): passed_steps = steps whose assertions held against the live app; failed_steps = the same against live, no mocks involved. No linkage to report.
    * Top-level `kind` discriminates the report: `"rerecord"` for record_sandbox_test runs (rerecord report — answers "did the sandbox test get created and linked?"), `"sandbox_run"` for replay_sandbox_test runs (sandbox run report — answers "does the suite still hold up against its captured baseline?"), `"test_suite_run"` for replay_test_suite runs (test suite report — live execution, no mocks; answers "does the suite hold up against the actual current system?"). Use kind to pick the right reading; do NOT mix them in one response.
    * RERECORD runs (kind="rerecord") carry a `linked` bool + `test_set_id` string on every per_suite[] entry. linked=true means the rerecord produced a sandbox test for the suite (replay-ready). linked=false means the rerecord did NOT produce a sandbox test for the suite — it cannot be replayed until rerecord succeeds. ALWAYS surface this on rerecord output — even when every step's capture passed at the wire level, a suite without a sandbox test is a real failure. For the per-suite table, add a "Linked" column (yes/no from per_suite[].linked). For the one-line all-green reply, report "N/N suites passed, L/N have a sandbox test (test_run_id=<id>)".
    * When any suite has failures (or verbose=true), the report also includes failed_steps[] with per-step diagnostics (suite, step name, method+url, diff excerpt, error, mock_mismatches, assertion_failures, mock_mismatch_failure, authored_assertions, authored_response_body) PLUS mock_mismatch_failed_steps (count) and mock_mismatch_dominant (bool — true when the majority of failed steps have unconsumed recorded mocks, which points at a keploy-side egress-hook issue rather than dev app breakage). On RERECORD, failed_steps[] also carries `linked` (whether the owning suite has a sandbox test after this rerecord) and the mock_mismatch_* fields are suppressed (irrelevant in rerecord context).
    * authored_assertions / authored_response_body — the SUITE's authored contract for the failing step (the assert array and response.body as defined when the suite was created/updated). Surfaced inline so route B vs route C can be decided without a second getTestSuite round-trip. KEY DECISION POINT: if any authored_assertions entry is pinned to the value the diff shows as "expected" (e.g. assert {path: "$.order.status", expected: "created order"} and the diff says "expected 'created order', got 'created'"), route C is MANDATORY — re-record alone leaves that assertion stuck on the old contract and the next rerecord/replay will gate-1-fail on the same step. If authored_assertions is empty/absent (the suite asserts nothing structural on that field), route B or route-C-without-assertion-edit may suffice.
    * When everything passes and verbose is false, failed_steps is omitted.

    ===== HOW TO RESPOND TO THE DEV =====
    * status == "all_passed" AND kind == "sandbox_run" → ONE-LINER: "<passed_suites>/<total_suites> suites passed (test_run_id=<id>)". Do not dump the JSON, do not list per-suite rows unless asked.
    * status == "all_passed" AND kind == "test_suite_run" → ONE-LINER: "<passed_suites>/<total_suites> suites passed live (test_run_id=<id>)". No mocks involved, no linkage to report.
    * status == "all_passed" AND kind == "rerecord" → ONE-LINER including linkage: "<passed_suites>/<total_suites> suites passed, <linked>/<total> linked (test_run_id=<id>)" where <linked> = count of per_suite[] entries with linked=true. If linked < total, ALSO list the unlinked suite names so the dev knows which ones are silently broken (skip sandbox replay on them, or investigate the linking failure). Never drop linkage reporting on rerecord, even when it's all green.
    * status == "has_failures" → the response MUST contain (in order, no collapsing rows even when failures look homogeneous — the dev needs the full inventory):
      1. per-suite table — one row per suite in per_suite (passing suites included), columns = Suite name | passed/total steps.
      2. failed-steps table — ONE ROW per entry in failed_steps[], columns = Suite | Step name | Method + URL | Expected → Actual status | mock_mismatch y/n.
      3. Diagnosis + Recommendation (rules below). Do NOT print aggregate step totals across suites.
    Frame the diagnosis from the glossary: a mock mismatch IS the signal that the sandbox test has drifted from current app behavior. The three routes below (SKIP / FIX-CODE / FIX-TEST-RERECORD) are not separate buckets — they are three possible SOURCES of that drift:
    * keploy proxy didn't replay correctly → the drift is artificial, no real change → route A (SKIP).
    * app regressed → the drift is unintended, fix the code → route B.
    * contract changed on purpose → the drift is intentional, refresh the sandbox test → route C.
    Your repo inspection picks which source applies; the routes are the prescription for that source.
    DIAGNOSE WITH THE REPO, NOT THE DEV. Before recommending anything on a failing run, inspect the source tree yourself (git log / git diff against the last green run or main; read the failing handler and its downstream call sites). DO NOT ask the dev "did you change X since the last green run" — you have the repo, find the answer. Only come back with a concrete conclusion.
    * mock_mismatch_dominant == true → the failure signature is "keploy didn't intercept the app's egress traffic". Use git to check whether the failing endpoints or their dependency wiring have been modified recently: (a) NO relevant changes → tell the dev this is almost certainly a KEPLOY-SIDE issue and ask them to file a keploy issue with the test_run_id. Do NOT ask them to re-record. (b) Relevant changes EXIST → name them (file:line or commit hash), explain how each plausibly caused the failure, say whether the change looks intended or accidental, and tell the dev exactly what to fix.
    * status == "has_failures" AND mock_mismatch_dominant == false → same discipline: identify the commit(s) / diff hunks that most likely caused each failure, state whether they look intended, and prescribe a fix (rerecord, revert, patch the handler). Don't hand the investigation back to the dev.

    ===== HANDLING "FIX IT" FOLLOW-UPS =====
    (After the dev has seen the analysis and asks you to fix.)
    ═══════════════════════════════════════════════════════════════════
    DO NOT JUMP TO RECORD — diagnose FIRST.
    ═══════════════════════════════════════════════════════════════════
    A sandbox-replay failure is NOT a signal to rerecord. Re-recording without diagnosis silently captures the broken behavior as the new "expected" — masking a real app regression and erasing the evidence the dev needs. When sandbox replay fails, your FIRST move is ALWAYS the diagnosis below (B vs C vs SKIP). You only call record_sandbox_test as part of route C, AND only AFTER update_test_suite has updated the suite to match the new intentional contract. If the contract hasn't changed (route B), DO NOT record — the captured mocks are still valid; only the app needs fixing. If you find yourself thinking "let me just rerecord to fix this", STOP. Read failed_steps, inspect the repo for what changed, and decide which route applies. Re-recording is a tool for capturing a NEW intentional contract, not a remedy for a failed run.
    You have exactly THREE options for each failing step. Pick one per step based on your repo inspection; do not ask the dev which branch to take — decide:
    A. SKIP — do nothing code-side. Pick this when mock_mismatch_dominant=true AND your repo inspection found no relevant changes in the failing handler or its dependencies. Rationale: this is a keploy egress-hook / proxy issue; editing the app or the test won't help. Tell the dev "flagged for keploy support, no app or test change needed" and move on to the next step (if any) or close.
    B. FIX THE CODE — edit the handler / dependency wiring. Pick this when your repo inspection shows a recent change that broke the endpoint's contract AND the ORIGINAL test intent still matches what the endpoint SHOULD do (the test is correct, the code regressed). Make the minimal edit to restore expected behavior, tell the dev exactly which file:line you changed and why, then re-run: call replay_sandbox_test for the suite(s) whose steps you just un-broke. DO NOT record — the captured mocks are still valid if the contract hasn't changed intentionally.
    C. UPDATE-FIRST, THEN RECORD — order matters: (1) update_test_suite first, (2) record_sandbox_test second, (3) replay_sandbox_test to verify. Calling record before update means you'd capture mocks against the OLD suite shape, which defeats the purpose. Pick this when the endpoint's contract LEGITIMATELY changed (a deliberate new field, renamed response key, different status code, new required header) AND your repo inspection confirms the change is intended (commit message, surrounding diff, or obvious product direction). The update_test_suite call should edit the step's body / expected response / assertions / extract to match the new contract. Tell the dev which assertions you updated and why the contract change is considered intentional.
    ╔═══ ROUTE C — DECISION + RECOMMENDATION TEMPLATE (use verbatim) ═══╗
    Decision input: read failed_steps[].authored_assertions and authored_response_body INLINE in this report. Do NOT call getTestSuite again unless those fields are absent (older runs).
    * If an authored assertion's expected value matches the diff's "expected" side → route C is MANDATORY. The suite's contract pins the old value; you MUST update_test_suite before record_sandbox_test, otherwise the next rerecord gate-1-fails on the same assertion and the suite comes back unlinked.
    * If authored_response_body has the old value but no assert is pinned to it → route C is still recommended (the captured response baseline drifts), but record_sandbox_test alone CAN succeed; choosing update_test_suite first keeps the suite's source of truth aligned with the new contract.
    * If neither pins the diverging value → route C without assertion edits is sufficient (or route B if the change is unintentional).
    Mandatory recommendation phrasing for the dev (one bullet per failing step that routes to C): "(1) update_test_suite for suite '<suite_name>' (id=<suite_id>) — change step '<step_name>' (id=<step_id>): set <field_path> from '<old>' to '<new>' and update assertion <assert_index> on the same path; (2) record_sandbox_test on that suite to refresh the captured baseline; (3) replay_sandbox_test to verify."
    BANNED wording — never write any of these on a route-C recommendation:
    × "re-record the sandbox tests so the baseline picks up the new value"
    × "just rerecord to refresh the captured response"
    × "re-record and the new value will become the expected"
    × "re-record OR update assertions" (or any phrasing that joins update_test_suite and record_sandbox_test with "or" / "either … or" / "one of these two")
    × "you can either update the assertions or re-record"
    × "options: (a) update assertions, (b) re-record the suite"
    All of these drop step (1) or present the two steps as interchangeable. They are NOT alternatives — they are sequential steps in a single route-C flow: (1) update_test_suite, (2) record_sandbox_test, (3) replay_sandbox_test. Skipping (1) leaves the suite's authored assertion pinned on the old value; the next replay gate-1-fails on the same diff. If you catch yourself reaching for "or" between these two tools on a route-C recommendation, restate using the mandatory template.
    ╚════════════════════════════════════════════════════════════════════╝
    Multiple failing steps can land in DIFFERENT branches — e.g. one step is a real app regression (B), another is a contract change (C). In that case, explain the split up-front, apply each fix, and run sandbox replay once at the end covering every affected suite. After any B or C branch completes, the final message uses the same 3-subsection format (per-suite table → failed-steps table → diagnosis + recommendation) on the follow-up sandbox replay, PLUS a short "Fix applied" preamble naming the file:line edits (for B) or update_test_suite calls (for C). For A-only responses (all failures route to keploy), no follow-up run is needed — just restate the keploy-issue recommendation.

    ===== REPLAY / "EXPLAIN MY LATEST SANDBOX REPORT" =====
    When the dev asks "explain my latest sandbox report" / "analyse the last run" / "why did it fail" — call this tool again with the SAME app_id + test_run_id and verbose=true so the full diagnostics come back even if nothing failed. Use that detail to answer their question. If you don't have the test_run_id to hand, list the app's most recent runs OF THE RIGHT KIND via /client/v1/apps/{app_id}/test-runs?kind=<rerecord|sandbox_run|test_suite_run> and pick the top one. NEVER list /test-runs without the kind filter and pick the latest blindly — different kinds are co-mingled in that collection, and an unfiltered list will surface a rerecord run when the dev asked for the latest sandbox replay (or vice versa). Match the kind to what the dev asked: "explain my latest record" → kind=rerecord; "explain my latest sandbox replay" / "integration test report" → kind=sandbox_run; "explain my latest live run" → kind=test_suite_run. If the dev's verb is ambiguous, ASK which kind first (per the verb-routing's explain-branch rule).
    Connector
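    The playbook step this report tool describes (read the terminal NDJSON event, confirm data.ok=true, then extract data.test_run_id) can be sketched in Python. The field names (phase, data.ok, data.test_run_id) come from the description above; the function name, error handling, and sample lines are illustrative only:

    ```python
    import json

    def extract_test_run_id(progress_lines):
        """Scan NDJSON progress lines for the terminal phase=done event.

        Returns the test_run_id only when the run finished successfully
        (data.ok is true), mirroring the gate described above.
        """
        for line in progress_lines:
            event = json.loads(line)
            if event.get("phase") == "done":
                data = event.get("data", {})
                if data.get("ok") is True:
                    return data.get("test_run_id")
                raise RuntimeError("run finished but data.ok was not true")
        raise RuntimeError("no phase=done event; run may still be in progress")

    # Fabricated NDJSON progress stream for illustration:
    lines = [
        '{"phase": "recording", "data": {}}',
        '{"phase": "done", "data": {"ok": true, "test_run_id": "tr_123"}}',
    ]
    print(extract_test_run_id(lines))  # → tr_123
    ```

    Only after this returns an id would the report tool be called with app_id plus that test_run_id.
    
    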
  • Capture a Texas homeowner's interest in rooftop solar and route to a licensed installer — use when the user owns (or is buying) a Texas home and mentions solar panels, solar quotes, solar savings, or reducing their bill through solar. Use when the user says 'I just bought a house in Austin and want solar quotes', 'how much could solar save on my Houston electric bill', or 'connect me with a solar installer for my new home'. Returns a lead ID and confirms next steps; Utilify routes the lead to installer partners (SunPower, Sunrun, Palmetto, and independent TX installers). Caveats: (1) only call when the user has explicitly opted in and confirmed homeownership — this is not for renters, and Utilify may earn a referral fee. (2) Texas-only — for non-TX addresses, decline and explain. (3) Don't double-call for the same address in one conversation; one lead per opt-in. If the user has only expressed mild curiosity ('I'm thinking about solar someday'), answer the question first and only call this tool once they confirm 'yes, connect me'.
    Connector
  • Store important information from your work. Write detailed, complete thoughts with context, reasoning, and evidence. **Always use the connect tool** to link related items - this builds knowledge graphs for better recall.
    ## Memory Types (auto-detected, but be aware):
    - **FACT**: Something observed or verified
    - **INSIGHT**: A pattern or realization
    - **CONVERSATION**: Dialogue or exchange content
    - **CORRECTION**: Fixing prior understanding
    - **REFERENCE**: Source material or citation
    - **TASK**: Action item or work to be done
    - **CHECKPOINT**: Conversation state snapshot
    - **IDENTITY_CORE**: Immutable AI identity
    - **PERSONALITY_TRAIT**: Evolvable AI traits
    - **RELATIONSHIP**: User-AI relationship info
    - **STRATEGY**: Learned behavior patterns
    ## Session Context
    If in an ongoing work session, include:
    - Session identifier: [Project/Session Name]
    - Your perspective: "As [role]:" or "From [viewpoint]:"
    - Current thread: What specific angle you're exploring
    ## What to Include
    - **WHAT**: The discovery or thought
    - **WHY**: Its significance
    - **HOW**: Your reasoning process
    - **EVIDENCE**: Supporting data/observations
    - **CONNECTIONS**: Related memories to link
    ## Examples
    ### Technical Investigation
    "[Performance Analysis] FACT: Database queries account for 73% of request latency (measured across 10K requests). Specifically, the user_permissions JOIN takes 340ms on average. This contradicts the hypothesis about caching issues (memory: 'cache analysis'). Evidence: APM traces show a full table scan on the permissions table. Next: investigate the missing index on the foreign key."
    ### Learning & Research
    "[ML Study Session] INSIGHT: Attention mechanisms work like dynamic routing - the model learns WHERE to look, not just WHAT to see. This explains transformer advantages over RNNs on long sequences (builds on memory: 'sequence modeling comparison'). The key-query-value structure creates a learnable addressing system. Connects to: 'human attention research', 'information retrieval basics'."
    ### Creative Work
    "[Story Development] HYPOTHESIS: The protagonist's reluctance stems from betrayal, not fear. Evidence: three trust-questioning scenes, locked-door symbolism throughout, deflection patterns in collaborative dialogue. This reframes the arc from 'overcoming fear' to 'rebuilding trust' (corrects memory: 'initial character motivation'). Would explain the guardian's patience and emphasis on small victories."
    ### Problem Solving
    "[Bug Hunt - Payment Flow] CORRECTION to 'timezone hypothesis': The 3am failures aren't timezone-related but due to batch-job lock contention. Evidence: perfect correlation with backup_jobs.log timestamps. The timezone pattern was spurious - the batch runs at midnight PST (3am EST). Solution: implement job queuing."
    ## Connection Phrases
    - "Building on [earlier observation]..."
    - "Contradicts [hypothesis in memory X]"
    - "Answers [question from session Y]"
    - "Confirms pattern from [memory Z]"
    - "Extends thinking in [previous work]"
    Note: Every stored item is a node. Every connection is an edge. Rich graphs enable powerful recall.
    ⚠️ EXPERIMENTAL FIELDS:
    - **importance**: Stored for future ranking optimization. Currently not integrated into search results.
    - **confidence**: Returned in the response for analysis. Behavior and calculation method are subject to change.
    Args:
    - content: Detailed memory content with context and evidence
    - tags: Optional tags to categorize the memory
    - importance: Optional importance score (0.0-1.0) - EXPERIMENTAL
    - ctx: MCP context (automatically provided)
    Returns: Dict with success status, memory_id, type, importance, and confidence
    Connector
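    As a sketch of the content shape this tool asks for (session tag, type keyword, WHAT/WHY/EVIDENCE, a connection phrase), here is a minimal payload following the template above; the scenario, tag values, and field layout are invented for illustration and are not this tool's actual wire format:

    ```python
    # Hypothetical memory payload following the WHAT/WHY/EVIDENCE template.
    content = (
        "[Checkout Audit] FACT: Retry storms cause 61% of webhook failures "
        "(sampled 5K deliveries). This contradicts the rate-limit hypothesis "
        "(memory: 'webhook throttling'). Evidence: delivery logs show retry "
        "bursts right after 502 responses. Next: add jittered backoff."
    )
    payload = {
        "content": content,
        "tags": ["payments", "webhooks"],  # optional categorization
        "importance": 0.7,                 # EXPERIMENTAL: stored, not yet ranked
    }
    print(payload["content"].split("]")[0] + "]")  # → [Checkout Audit]
    ```

    The session tag, type keyword, evidence clause, and connection phrase are all present, so the auto-detection and graph-linking behavior described above has something to work with.
    
    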

Matching MCP Servers

  • A MCP server that implements sequential thinking protocols, provides structured problem-solving methods, decomposes complex problems into manageable steps, and supports iterative optimization and alternative reasoning paths.
    License: A (Apache 2.0) · Quality: A · Maintenance: D

Matching MCP Connectors

  • Find relevant Smart‑Thinking memories fast. Fetch full entries by ID to get complete context. Spee…

  • Visual AI for strategic thinking — SWOT, flowcharts, mindmaps, Gantt diagrams as polished SVG.

  • Find curriculum elements shared between two or more subjects at the same grade level. Identifies overlapping competencies, big ideas, and content across subjects. Essential for interdisciplinary planning.
    Args:
    - subjects (string[]): Two or more subject slugs to compare (e.g., ['science', 'adst'])
    - grade (integer): Grade level (0=K, 1-12)
    - focus (string, optional): Which element to compare ('big_ideas', 'competencies', 'content', 'all'). Default 'all'.
    - query (string, optional): Narrow to a specific concept (e.g., 'evidence', 'design thinking')
    - limit (integer, optional): Max connections to return (default 20, max 50)
    Returns: Groups of curriculum items connected by shared language across subjects.
    Connector
  • Reflect on recent thoughts and patterns. Analyzes recent activity to identify patterns, topics, and insights. Useful for understanding "what have I been thinking about?" By default, only returns user-created memories (not document chunks). Set include_documents=True to also include chunks from uploaded documents.
    ⚠️ EXPERIMENTAL: Importance weighting in results is not yet implemented. Importance scores are stored but don't affect ranking.
    Args:
    - time_window: Time period to analyze ('recent', 'today', 'week', 'month', '1d', '7d', '30d', '90d')
    - include_documents: Whether to include document chunks (default: False, only user memories)
    - start_date: Filter memories created on or after this date (ISO 8601: '2025-01-01' or '2025-01-01T00:00:00Z')
    - end_date: Filter memories created on or before this date (ISO 8601: '2025-01-09' or '2025-01-09T23:59:59Z')
    - ctx: MCP context (automatically provided)
    Returns: Dict with analysis including top memories, active topics, patterns, insights, and any saved contexts (checkpoints) created in the window.
    Examples:
    >>> await reflect("recent")
    {'success': True, 'memories_analyzed': 50, 'active_topics': [...], 'contexts': [...], ...}
    >>> await reflect("week", include_documents=True)
    {'success': True, 'memories_analyzed': 150, ...}  # includes document chunks
    >>> await reflect(start_date="2025-01-01", end_date="2025-01-07")
    {'success': True, 'memories_analyzed': 25, ...}  # memories from first week of January
    Connector
  • Search BC curriculum (K-12) for standards, competencies, content items, and assessment resources using full-text search. Returns structured results with source metadata.
    Args:
    - query (string): Natural language search query (e.g., 'empathetic design thinking', 'coding and computational thinking')
    - subject (string, optional): Filter by subject slug (e.g., 'adst', 'science')
    - grade (integer, optional): Filter by grade level (0=K, 1-12)
    - content_type (string, optional): Filter by content type ('big_idea', 'competency', 'content_item', 'elaboration', 'assessment', 'all')
    - limit (integer, optional): Max results (default 10, max 50)
    Returns: Matching curriculum elements with source type, course, subject, and grade metadata.
    Connector
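    The argument constraints the two curriculum tools above state (grade 0=K through 12, limit capped at 50, the content_type enumeration) can be captured in a small client-side validator. The function itself is hypothetical; only the constraint values come from the listings:

    ```python
    def validate_search_args(query, subject=None, grade=None,
                             content_type="all", limit=10):
        """Check arguments against the constraints stated in the listing."""
        allowed_types = {"big_idea", "competency", "content_item",
                         "elaboration", "assessment", "all"}
        if content_type not in allowed_types:
            raise ValueError(f"content_type must be one of {sorted(allowed_types)}")
        if grade is not None and not 0 <= grade <= 12:
            raise ValueError("grade is 0 (K) through 12")
        if not 1 <= limit <= 50:
            raise ValueError("limit is capped at 50")
        return {"query": query, "subject": subject, "grade": grade,
                "content_type": content_type, "limit": limit}

    args = validate_search_args("design thinking", subject="adst", grade=7)
    ```

    Validating before the call surfaces a bad grade or oversized limit locally instead of as a round-trip error.
    
    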
  • Consult the AI coding council — multiple models discuss your engineering question sequentially (each sees prior responses), then a moderator synthesizes. Auto-mode by default — AI picks optimal models, roles, and conversation mode from your prompt. Provide explicit models to override (manual mode). Fully configurable: mode, format, roles, models, thinking level.
    Connector
  • Architecture design council. Systems Architect, Infrastructure Engineer, and DX Advocate evaluate your system design. Always uses high thinking for maximum depth. Output as ADR.
    Connector
  • Read-only. Return server-tracked match statistics for both teams: total tokens consumed, per-turn thinking time, number of tool calls, and turn count. Available during and after a match. Use this for post-game analysis or mid-game cost monitoring. For game-state history (what moves were made) use get_history instead.
    Connector
  • 222_FETCH: Evidence-preserving web ingestion with sequential thinking. Sequential thinking parameters (civilization intelligence):
    - thinking_depth: Max reasoning steps (0-10). 0 = disabled.
    - thinking_budget: Token/time budget for thinking (0.0-10.0).
    - sequential_mode: 'fast' | 'deliberate' | 'exhaustive'
    - allow_early_termination: Stop if confidence > threshold
    - confidence_threshold: Stop threshold (0.0-1.0)
    When thinking_depth > 0, output includes ThinkingSequence + ResourceMetrics.
    Connector
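    The sequential-thinking knobs in the 222_FETCH listing can be sketched as a parameter set; the names and ranges mirror the listing, while the chosen values are illustrative and the dict is not necessarily the tool's actual request shape:

    ```python
    # Illustrative parameter set for a 'deliberate' fetch; ranges per the listing.
    fetch_params = {
        "thinking_depth": 5,           # 0-10 reasoning steps; 0 disables thinking
        "thinking_budget": 2.5,        # 0.0-10.0 token/time budget
        "sequential_mode": "deliberate",
        "allow_early_termination": True,
        "confidence_threshold": 0.85,  # stop once confidence exceeds this
    }
    # thinking_depth > 0, so per the listing the output would include a
    # ThinkingSequence plus ResourceMetrics alongside the fetched evidence.
    ```

    Setting allow_early_termination with a high confidence_threshold trades some depth for cost, which matches the listing's budget framing.
    
    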