evaluate
Send prediction scores (float 0-1 or Q0.16 int) to compute per-episode stability metrics including CI, drift, flip, collapse, and ghost detection.
Instructions
Evaluate prediction stability. The tool sends scores to the CI-1T engine and returns per-episode stability metrics. It accepts floats (0.0–1.0) or Q0.16 integers (0–65535) and converts floats automatically.

Response: { episodes: [{ ci_out, ci_ema_out, al_out, warn, fault, ghost_confirmed, ghost_suspect_streak, ... }], credits_used, credits_remaining }.

CI values are Q0.16 integers (0–65535); divide by 65535 to get a fraction between 0 and 1 (multiply by 100 for a percentage). Classification on that fraction: ≤0.15 = Stable, ≤0.45 = Drift, ≤0.70 = Flip, >0.70 = Collapse.

Chain the results into visualize (chart), alert_check (threshold alerts), compare_windows (drift detection), or interpret_scores (stats).
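The conversion and thresholds described above can be sketched as a small client-side helper. This is a minimal illustration, not part of the tool: the names Q16_MAX, ciToFraction, and classifyCi are hypothetical, and which response field to classify (for example ci_out or ci_ema_out) is an assumption the description does not settle.

```typescript
// Hypothetical helper names; only the thresholds and the 65535 divisor
// come from the tool description.
const Q16_MAX = 65535;

type StabilityClass = "Stable" | "Drift" | "Flip" | "Collapse";

// Convert a Q0.16 integer (0-65535) to a fraction in [0, 1].
function ciToFraction(ciQ16: number): number {
  const clamped = Math.max(0, Math.min(Q16_MAX, ciQ16));
  return clamped / Q16_MAX;
}

// Apply the documented thresholds to the normalized fraction.
function classifyCi(ciQ16: number): StabilityClass {
  const ci = ciToFraction(ciQ16);
  if (ci <= 0.15) return "Stable";
  if (ci <= 0.45) return "Drift";
  if (ci <= 0.7) return "Flip";
  return "Collapse";
}

// Example: label a few raw Q0.16 values.
for (const raw of [0, 9831, 45874, 65535]) {
  console.log(raw, ciToFraction(raw).toFixed(4), classifyCi(raw));
}
```

Because the thresholds are inclusive on the upper bound, a value exactly at 0.15, 0.45, or 0.70 falls into the lower class.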
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| scores | Yes | Array of prediction scores — floats (0.0–1.0) or Q0.16 integers (0–65535), auto-detected. Max 10,000. | |
| n | No | Episode length; integer from 2 to 8 | 3 |
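As a rough illustration of these constraints, the sketch below checks a set of arguments before they are sent. The limits (1 to 10,000 scores, each between 0 and 65535, and n as an integer from 2 to 8) mirror the Zod schema shown in the Implementation Reference; the EvaluateArgs interface and validateEvaluateArgs function are hypothetical names, and the real tool performs this validation through Zod rather than through code like this.

```typescript
// Hypothetical pre-flight check mirroring the documented input limits.
interface EvaluateArgs {
  scores: number[];
  n?: number;
}

// Returns a list of human-readable problems; an empty list means the
// arguments satisfy the documented constraints.
function validateEvaluateArgs(args: EvaluateArgs): string[] {
  const errors: string[] = [];
  const { scores, n } = args;

  if (scores.length < 1 || scores.length > 10000) {
    errors.push("scores must contain between 1 and 10,000 values");
  }
  if (scores.some((s) => !Number.isFinite(s) || s < 0 || s > 65535)) {
    errors.push("every score must be a number between 0 and 65535");
  }
  if (n !== undefined && (!Number.isInteger(n) || n < 2 || n > 8)) {
    errors.push("n must be an integer between 2 and 8");
  }
  return errors;
}

// Example: n is out of range, so one problem is reported.
console.log(validateEvaluateArgs({ scores: [0.1, 0.4, 0.9], n: 9 }));
```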
Implementation Reference
- src/index.ts:462-482 (registration): The 'evaluate' tool is registered using server.tool() with name 'evaluate', a description, Zod schema for parameters (scores array and optional n), and an async handler function.

  ```typescript
  server.tool(
    "evaluate",
    "Evaluate prediction stability. Sends scores to the CI-1T engine and returns per-episode stability metrics. Accepts floats (0.0–1.0) or Q0.16 integers (0–65535) — auto-converts. Response: { episodes: [{ ci_out, ci_ema_out, al_out, warn, fault, ghost_confirmed, ghost_suspect_streak, ... }], credits_used, credits_remaining }. CI values are Q0.16 (0–65535; divide by 65535 for %). Classification: ≤0.15=Stable, ≤0.45=Drift, ≤0.70=Flip, >0.70=Collapse. Chain results → visualize (chart), alert_check (threshold alerts), compare_windows (drift detection), or interpret_scores (stats).",
    {
      scores: z
        .array(z.number().min(0).max(65535))
        .min(1)
        .max(10000)
        .describe("Array of prediction scores — floats (0.0–1.0) or Q0.16 integers (0–65535), auto-detected. Max 10,000."),
      n: z.number().int().min(2).max(8).optional().describe("Episode length (default: 3)"),
    },
    async ({ scores, n }) => {
      const guard = requireApiKey();
      if (guard) return guard;

      const q16Scores = toQ16(scores);
      const body: Record<string, unknown> = { scores: q16Scores };
      if (n !== undefined) body.config = { n };

      const result = await apiFetch("/api/evaluate", {
        method: "POST",
        headers: apiKeyHeaders(),
        body,
      });
      return formatResult(result);
    }
  );
  ```

- src/index.ts:469-482 (handler): The handler for evaluate: checks API key via requireApiKey(), converts scores to Q0.16 via toQ16(), then calls POST /api/evaluate with the scores and optional config.n, returning the formatted result.

  ```typescript
    async ({ scores, n }) => {
      const guard = requireApiKey();
      if (guard) return guard;

      const q16Scores = toQ16(scores);
      const body: Record<string, unknown> = { scores: q16Scores };
      if (n !== undefined) body.config = { n };

      const result = await apiFetch("/api/evaluate", {
        method: "POST",
        headers: apiKeyHeaders(),
        body,
      });
      return formatResult(result);
    }
  );
  ```

- src/index.ts:465-468 (schema): Input schema for evaluate: accepts 'scores' (array of numbers min 0 max 65535, length 1-10000) and optional 'n' (integer 2-8 for episode length).

  ```typescript
  {
    scores: z
      .array(z.number().min(0).max(65535))
      .min(1)
      .max(10000)
      .describe("Array of prediction scores — floats (0.0–1.0) or Q0.16 integers (0–65535), auto-detected. Max 10,000."),
    n: z.number().int().min(2).max(8).optional().describe("Episode length (default: 3)"),
  },
  ```

- src/index.ts:297-304 (helper): The toQ16() helper function auto-detects whether input scores are floats (0.0-1.0 with decimals) and scales them to Q0.16 integers (0-65535), or clamps if already in Q0.16 range. Used by evaluate to normalize scores.

  ```typescript
  function toQ16(scores: number[]): number[] {
    const hasDecimals = scores.some((s) => s % 1 !== 0);
    const allInUnit = scores.every((s) => s >= 0 && s <= 1);
    const isFloat = hasDecimals && allInUnit;
    return isFloat
      ? scores.map((s) => Math.round(Math.max(0, Math.min(1, s)) * Q16))
      : scores.map((s) => Math.round(Math.max(0, Math.min(Q16, s))));
  }
  ```

- src/index.ts:34-38 (helper): apiKeyHeaders() helper that creates HTTP headers with X-API-Key authentication. Used by the evaluate handler when calling the backend API.

  ```typescript
  function apiKeyHeaders(extra?: Record<string, string>): Record<string, string> {
    const h: Record<string, string> = { "Content-Type": "application/json" };
    if (API_KEY) h["X-API-Key"] = API_KEY;
    return { ...h, ...extra };
  }
  ```
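The auto-detection in toQ16() has a few edge cases worth seeing concretely. The sketch below copies the helper's logic as listed above and runs it on three inputs. The Q16 constant is not shown in the listing, so its value of 65535 here is an assumption based on the documented Q0.16 range.

```typescript
// Assumption: Q16 is defined elsewhere in src/index.ts; 65535 matches the
// documented Q0.16 range but is not confirmed by the listing.
const Q16 = 65535;

// Same logic as the toQ16() helper in the reference above.
function toQ16(scores: number[]): number[] {
  const hasDecimals = scores.some((s) => s % 1 !== 0);
  const allInUnit = scores.every((s) => s >= 0 && s <= 1);
  const isFloat = hasDecimals && allInUnit;
  return isFloat
    ? scores.map((s) => Math.round(Math.max(0, Math.min(1, s)) * Q16))
    : scores.map((s) => Math.round(Math.max(0, Math.min(Q16, s))));
}

// Fractional values within [0, 1]: treated as floats and scaled.
const scaled = toQ16([0.25, 0.5, 1.0]);

// Only whole numbers (0 and 1): no decimals are present, so the array is
// treated as Q0.16 integers and passed through unscaled.
const passthrough = toQ16([0, 1, 1]);

// A value above 1 disables float detection for the whole array, so 0.5 is
// rounded as an integer rather than scaled.
const mixed = toQ16([0.5, 2]);

console.log(scaled, passthrough, mixed);
```

Under that assumption, an array made up solely of 0 and 1 is interpreted as near-zero Q0.16 integers rather than as the float extremes. A caller who means 0.0 and 1.0 could convert to Q0.16 integers (0 and 65535) before calling the tool to avoid the ambiguity.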