Multi-step Hypothesis Test
`test_hypothesis`: Run a predefined sequence of live checks, such as endpoint reachability, npm search counts, or substring presence, to test a hypothesis and produce a clear pass/fail summary.
Instructions
Run a small verification plan made of concrete live checks and summarize whether a hypothesis is supported. Use this when one conclusion depends on multiple simple checks such as endpoint reachability, npm search counts, or whether a page contains an exact substring. This is a coordination tool, not an open-ended research agent: every test must be explicitly defined in advance. Use verify_claim when you already have evidence URLs, estimate_market for category sizing, and compare_competitors when you already know exact package names.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| hypothesis | Yes | Claim to test, for example 'there are fewer than 50 MCP email servers on npm'. | |
| tests | Yes | Ordered list of one to ten checks to run. Each test object uses only the fields required by its type. | |
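As an illustration, a request payload might look like the following. The field names (`type`, `description`, `url`, `query`, `threshold`) match those read by the handler shown under Implementation Reference; the specific hypothesis, query string, and URL are made-up example values.

```typescript
// Hypothetical test_hypothesis input: one npm count check and one
// reachability check supporting a single hypothesis.
const input = {
  hypothesis: "there are fewer than 50 MCP email servers on npm",
  tests: [
    {
      type: "npm_count_below",
      description: "npm search for 'mcp email server' returns under 50 results",
      query: "mcp email server",
      threshold: 50,
    },
    {
      type: "endpoint_exists",
      description: "npm registry search endpoint is reachable",
      url: "https://registry.npmjs.org/-/v1/search?text=mcp",
    },
  ],
};

// Each test carries only the fields its type requires.
console.log(input.tests.length);
```

Because every check is declared up front, the tool can report per-test results in the same order they were submitted.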
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| hypothesis | Yes | Hypothesis that was evaluated. | |
| tests | Yes | Per-test execution results in input order. | |
| verdict | Yes | High-level verdict for the hypothesis. | |
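The verdict summary follows directly from the pass counts, as the handler code below shows: SUPPORTED only when every test passes, REFUTED only when none do, and PARTIALLY SUPPORTED otherwise. A minimal standalone sketch of that aggregation:

```typescript
// Sketch of the verdict aggregation used by the handler.
function summarize(passedCount: number, total: number): string {
  return passedCount === total
    ? "SUPPORTED"
    : passedCount === 0
      ? "REFUTED"
      : "PARTIALLY SUPPORTED";
}

console.log(summarize(3, 3)); // SUPPORTED
console.log(summarize(0, 3)); // REFUTED
console.log(summarize(2, 3)); // PARTIALLY SUPPORTED
```

Note that a test that throws (for example, a network error) counts as failed, so errors pull the verdict toward REFUTED rather than being skipped.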
Implementation Reference
- src/index.ts:1214-1277 (handler): The handler function for the test_hypothesis tool. Runs a multi-step verification plan: endpoint_exists (fetch URL, check 2xx), npm_count_above/npm_count_below (search npm, compare count to threshold), response_contains (fetch URL, check substring). Produces per-test results and an aggregate verdict (SUPPORTED/REFUTED/PARTIALLY SUPPORTED).
```typescript
async ({ hypothesis, tests }) => {
  const results = [];
  for (const test of tests) {
    let passed: boolean | null = null;
    let actual: string | number | null = null;
    try {
      switch (test.type) {
        case "endpoint_exists": {
          // Reachable means any 2xx response.
          const resp = await fetch(test.url, {
            headers: { "User-Agent": "GroundTruth/0.3" },
          });
          passed = resp.ok;
          actual = `status ${resp.status}`;
          break;
        }
        case "npm_count_above":
        case "npm_count_below": {
          // Only the total result count matters, so fetch a single result.
          const data = await searchNpm(sql, test.query, 1);
          const total = data.total ?? 0;
          actual = total;
          passed =
            test.type === "npm_count_above"
              ? total > test.threshold
              : total < test.threshold;
          break;
        }
        case "response_contains": {
          const { body } = await cachedFetch(sql, test.url);
          passed = body.includes(test.substring);
          actual = `${body.length} chars, contains=${passed}`;
          break;
        }
      }
    } catch (e: unknown) {
      // A thrown error (network failure, bad URL) counts as a failed test.
      passed = false;
      actual = e instanceof Error ? e.message : String(e);
    }
    results.push({
      description: test.description,
      type: test.type,
      passed,
      actual,
    });
  }
  const passedCount = results.filter(r => r.passed).length;
  logUsage("test_hypothesis", true);
  return structuredToolResult({
    hypothesis,
    tests: results as {
      description: string;
      type:
        | "endpoint_exists"
        | "npm_count_above"
        | "npm_count_below"
        | "response_contains";
      passed: boolean;
      actual: string | number | null;
    }[],
    verdict: {
      passed: passedCount,
      failed: results.length - passedCount,
      summary:
        passedCount === results.length
          ? "SUPPORTED"
          : passedCount === 0
            ? "REFUTED"
            : "PARTIALLY SUPPORTED",
    },
  });
});
```