collapseindex

CI-1T Prediction Stability Engine

evaluate

Send prediction scores (float 0-1 or Q0.16 int) to compute per-episode stability metrics including CI, drift, flip, collapse, and ghost detection.

Instructions

Evaluate prediction stability. Sends scores to the CI-1T engine and returns per-episode stability metrics. Accepts floats (0.0–1.0) or Q0.16 integers (0–65535) — auto-converts. Response: { episodes: [{ ci_out, ci_ema_out, al_out, warn, fault, ghost_confirmed, ghost_suspect_streak, ... }], credits_used, credits_remaining }. CI values are Q0.16 (0–65535; divide by 65535 for %). Classification: ≤0.15=Stable, ≤0.45=Drift, ≤0.70=Flip, >0.70=Collapse. Chain results → visualize (chart), alert_check (threshold alerts), compare_windows (drift detection), or interpret_scores (stats).
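The thresholds in the description can be applied to a returned CI value roughly as follows. This is a minimal sketch, assuming the thresholds are compared against the Q0.16 value after dividing by 65535 (which yields a 0–1 fraction; multiply by 100 for a percentage). The names Q16_MAX, ciToFraction, and classifyCi are hypothetical and do not come from the server source.

```typescript
// Q0.16 full-scale value, taken from the 0–65535 range in the tool description.
const Q16_MAX = 65535;

type StabilityClass = "Stable" | "Drift" | "Flip" | "Collapse";

// Convert a Q0.16 CI value (0–65535) to a 0–1 fraction.
function ciToFraction(ciQ16: number): number {
  return ciQ16 / Q16_MAX;
}

// Apply the thresholds quoted in the description:
// ≤0.15 Stable, ≤0.45 Drift, ≤0.70 Flip, >0.70 Collapse.
// Assumption: the thresholds refer to the 0–1 fraction, not the raw integer.
function classifyCi(ciQ16: number): StabilityClass {
  const ci = ciToFraction(ciQ16);
  if (ci <= 0.15) return "Stable";
  if (ci <= 0.45) return "Drift";
  if (ci <= 0.7) return "Flip";
  return "Collapse";
}
```

For example, a ci_out of 6553 is about 0.10 and falls in the Stable band, while 32768 is about 0.50 and falls in the Flip band.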

Input Schema

Name | Required | Description | Default
scores | Yes | Array of prediction scores: floats (0.0–1.0) or Q0.16 integers (0–65535), auto-detected. Max 10,000. | -
n | No | Episode length (default: 3) | -
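The constraints on these two parameters can be checked client-side before calling the tool. The sketch below is a hand-written check that mirrors the limits stated in the Zod schema under Implementation Reference (1 to 10,000 scores, each between 0 and 65535, and n an integer from 2 to 8). It is not the server's own validation, and EvaluateInput and validateEvaluateInput are hypothetical names.

```typescript
// Hypothetical input shape matching the two documented parameters.
interface EvaluateInput {
  scores: number[];
  n?: number;
}

// Returns a list of problems; an empty list means the input satisfies the documented limits.
function validateEvaluateInput(input: EvaluateInput): string[] {
  const problems: string[] = [];
  const { scores, n } = input;

  // The schema requires between 1 and 10,000 scores.
  if (scores.length < 1 || scores.length > 10000) {
    problems.push("scores must contain between 1 and 10,000 values");
  }
  // Each score must be a finite number within 0..65535.
  if (scores.some((s) => !Number.isFinite(s) || s < 0 || s > 65535)) {
    problems.push("each score must be a number between 0 and 65535");
  }
  // n is optional; when present it must be an integer from 2 to 8.
  if (n !== undefined && (!Number.isInteger(n) || n < 2 || n > 8)) {
    problems.push("n must be an integer between 2 and 8");
  }
  return problems;
}
```

Returning a list of problems rather than throwing lets a caller report every issue at once before spending credits on a request.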

Implementation Reference

  • src/index.ts:462-482 (registration)
    The 'evaluate' tool is registered using server.tool() with name 'evaluate', a description, Zod schema for parameters (scores array and optional n), and an async handler function.
    server.tool(
      "evaluate",
      "Evaluate prediction stability. Sends scores to the CI-1T engine and returns per-episode stability metrics. Accepts floats (0.0–1.0) or Q0.16 integers (0–65535) — auto-converts. Response: { episodes: [{ ci_out, ci_ema_out, al_out, warn, fault, ghost_confirmed, ghost_suspect_streak, ... }], credits_used, credits_remaining }. CI values are Q0.16 (0–65535; divide by 65535 for %). Classification: ≤0.15=Stable, ≤0.45=Drift, ≤0.70=Flip, >0.70=Collapse. Chain results → visualize (chart), alert_check (threshold alerts), compare_windows (drift detection), or interpret_scores (stats).",
      {
        scores: z.array(z.number().min(0).max(65535)).min(1).max(10000).describe("Array of prediction scores — floats (0.0–1.0) or Q0.16 integers (0–65535), auto-detected. Max 10,000."),
        n: z.number().int().min(2).max(8).optional().describe("Episode length (default: 3)"),
      },
      async ({ scores, n }) => {
        const guard = requireApiKey();
        if (guard) return guard;
        const q16Scores = toQ16(scores);
        const body: Record<string, unknown> = { scores: q16Scores };
        if (n !== undefined) body.config = { n };
        const result = await apiFetch("/api/evaluate", {
          method: "POST",
          headers: apiKeyHeaders(),
          body,
        });
        return formatResult(result);
      }
    );
  • The handler for evaluate: checks API key via requireApiKey(), converts scores to Q0.16 via toQ16(), then calls POST /api/evaluate with the scores and optional config.n, returning the formatted result.
      async ({ scores, n }) => {
        const guard = requireApiKey();
        if (guard) return guard;
        const q16Scores = toQ16(scores);
        const body: Record<string, unknown> = { scores: q16Scores };
        if (n !== undefined) body.config = { n };
        const result = await apiFetch("/api/evaluate", {
          method: "POST",
          headers: apiKeyHeaders(),
          body,
        });
        return formatResult(result);
      }
  • Input schema for evaluate: accepts 'scores' (array of numbers min 0 max 65535, length 1-10000) and optional 'n' (integer 2-8 for episode length).
    {
      scores: z.array(z.number().min(0).max(65535)).min(1).max(10000).describe("Array of prediction scores — floats (0.0–1.0) or Q0.16 integers (0–65535), auto-detected. Max 10,000."),
      n: z.number().int().min(2).max(8).optional().describe("Episode length (default: 3)"),
    },
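  • The toQ16() helper normalizes scores to Q0.16 integers (0-65535). It treats the input as floats only when every score lies in 0.0-1.0 and at least one score has a fractional part, in which case it scales each value by Q16 and rounds. Otherwise it rounds each value and clamps it to the Q0.16 range. Used by evaluate to normalize scores.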
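    function toQ16(scores: number[]): number[] {
      // Q16 is a constant defined elsewhere in the source (not shown in this excerpt).
      // Treat the input as floats only if all values are in [0, 1] and at least one is fractional.
      const hasDecimals = scores.some((s) => s % 1 !== 0);
      const allInUnit = scores.every((s) => s >= 0 && s <= 1);
      const isFloat = hasDecimals && allInUnit;
      return isFloat
        ? scores.map((s) => Math.round(Math.max(0, Math.min(1, s)) * Q16)) // scale floats to Q0.16
        : scores.map((s) => Math.round(Math.max(0, Math.min(Q16, s)))); // round and clamp to 0..Q16
    }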
  • apiKeyHeaders() helper that creates HTTP headers with X-API-Key authentication. Used by the evaluate handler when calling the backend API.
    function apiKeyHeaders(extra?: Record<string, string>): Record<string, string> {
      const h: Record<string, string> = { "Content-Type": "application/json" };
      if (API_KEY) h["X-API-Key"] = API_KEY;
      return { ...h, ...extra };
    }
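The normalization and request-building steps listed above can be exercised in isolation. The sketch below copies the toQ16() logic from the excerpt and assumes Q16 is 65535, since the excerpt does not show the constant's definition. buildEvaluateBody is a hypothetical helper that reproduces how the handler assembles the body for POST /api/evaluate; it does not exist in the source.

```typescript
// Assumption: Q16 is the Q0.16 full-scale value; its definition is not shown in the excerpt.
const Q16 = 65535;

// Same detection rule as the toQ16() excerpt: float mode requires all values
// in [0, 1] and at least one value with a fractional part.
function toQ16(scores: number[]): number[] {
  const hasDecimals = scores.some((s) => s % 1 !== 0);
  const allInUnit = scores.every((s) => s >= 0 && s <= 1);
  const isFloat = hasDecimals && allInUnit;
  return isFloat
    ? scores.map((s) => Math.round(Math.max(0, Math.min(1, s)) * Q16))
    : scores.map((s) => Math.round(Math.max(0, Math.min(Q16, s))));
}

// Hypothetical helper mirroring the handler: scores are always sent as Q0.16,
// and config.n is only included when n was supplied.
function buildEvaluateBody(scores: number[], n?: number): Record<string, unknown> {
  const body: Record<string, unknown> = { scores: toQ16(scores) };
  if (n !== undefined) body.config = { n };
  return body;
}

console.log(toQ16([0.25, 0.5, 1])); // float mode: [16384, 32768, 65535]
console.log(toQ16([0, 1, 1])); // no fractional value, so integer mode: [0, 1, 1]
console.log(toQ16([0.5, 2])); // a value above 1 forces integer mode: [1, 2]
console.log(JSON.stringify(buildEvaluateBody([0.5, 1], 4)));
```

Two consequences of the detection rule are worth noting. An array containing only 0s and 1s has no fractional value, so it is passed through as raw Q0.16 integers rather than scaled to 0 and 65535. Mixing unit-range floats with larger integers in one call causes the floats to be rounded to 0 or 1.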
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Despite having no annotations, the description details the engine call, auto-conversion, response structure, and classification thresholds, providing solid behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Dense but well-organized single paragraph; front-loads purpose and provides essential details without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, but the description fully explains the response structure and classification and adds chaining guidance, which makes it complete for an evaluation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds value beyond the schema by explaining auto-conversion for scores and the default value for n, even though the schema already documents both parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb+resource: 'Evaluate prediction stability.' Specifies input, output, and chaining to sibling tools, distinguishing its role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explains input types, output structure, classification thresholds, and suggests chaining to specific sibling tools for further analysis.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/collapseindex/ci-1t-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.