Metrx MCP Server

Get Failure Predictions

metrx_get_failure_predictions

Read-only · Idempotent

Predict potential agent failures like error rate breaches, latency degradation, and budget exhaustion. Provides confidence levels and recommended actions to prevent issues before they occur.

Instructions

Get predictive failure analysis for your agents. Shows upcoming risk of error rate breaches, latency degradation, cost overruns, rate limit risks, and budget exhaustion. Each prediction includes confidence level and recommended actions. Do NOT use for current/past failures — use get_alerts for active issues.

Input Schema

Name	Required	Description	Default
`agent_id`	No	Filter predictions for a specific agent
`severity`	No	Filter by prediction severity
`status`	No	Filter by prediction status (default: active)	active
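The three arguments are all optional, and an empty call falls back to `status: active`. As a rough sketch of those constraints without any schema library: the names `PredictionArgs`, `normalizeArgs`, and `UUID_RE` are illustrative and not part of the server, and the UUID pattern is a loose approximation of what the real schema enforces.

```typescript
// Hypothetical types mirroring the documented input schema.
type Severity = 'info' | 'warning' | 'critical';
type Status = 'active' | 'acknowledged' | 'resolved';

interface PredictionArgs {
  agent_id?: string;
  severity?: Severity;
  status?: Status;
}

// Loose UUID shape check (assumption: the real validator may be stricter).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Applies the documented default for `status` and rejects a malformed agent_id.
function normalizeArgs(args: PredictionArgs): PredictionArgs & { status: Status } {
  if (args.agent_id !== undefined && !UUID_RE.test(args.agent_id)) {
    throw new Error(`agent_id must be a UUID, got '${args.agent_id}'`);
  }
  return { ...args, status: args.status ?? 'active' };
}
```

For example, `normalizeArgs({})` yields `{ status: 'active' }`, while `normalizeArgs({ severity: 'critical' })` keeps the severity filter and adds the default status.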

Implementation Reference

src/tools/alerts.ts:120-172 (handler)

Tool registration and handler for get_failure_predictions. It defines the input schema (agent_id, severity, status) and an async handler that calls the API client to fetch predictions from the '/predictions' endpoint, then formats them with formatPredictions().

// ── get_failure_predictions ──
server.registerTool(
  'get_failure_predictions',
  {
    title: 'Get Failure Predictions',
    description:
      'Get predictive failure analysis for your agents. ' +
      'Shows upcoming risk of error rate breaches, latency degradation, ' +
      'cost overruns, rate limit risks, and budget exhaustion. ' +
      'Each prediction includes confidence level and recommended actions. ' +
      'Do NOT use for current/past failures — use get_alerts for active issues.',
    inputSchema: {
      agent_id: z.string().uuid().optional().describe('Filter predictions for a specific agent'),
      severity: z
        .enum(['info', 'warning', 'critical'])
        .optional()
        .describe('Filter by prediction severity'),
      status: z
        .enum(['active', 'acknowledged', 'resolved'])
        .default('active')
        .describe('Filter by prediction status (default: active)'),
    },
    annotations: {
      readOnlyHint: true,
      destructiveHint: false,
      idempotentHint: true,
      openWorldHint: false,
    },
  },
  async ({ agent_id, severity, status }) => {
    const params: Record<string, string> = {
      status: status ?? 'active',
    };
    if (agent_id) params.agent_id = agent_id;
    if (severity) params.severity = severity;

    const result = await client.get<{ predictions: FailurePrediction[] }>('/predictions', params);

    if (result.error) {
      return {
        content: [{ type: 'text', text: `Error fetching predictions: ${result.error}` }],
        isError: true,
      };
    }

    const predictions = result.data?.predictions || [];
    const text = formatPredictions(predictions);

    return {
      content: [{ type: 'text', text }],
    };
  }
);
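The handler has two pure steps that can be checked without a network: building the query parameters and mapping the API result to a tool result. The sketch below isolates them under assumed names (`buildParams`, `toToolResult`, `ApiResult`, `ToolResult` are stand-ins, and the formatter is passed in rather than imported); it is not the server's own code.

```typescript
// Hypothetical stand-ins for the API client's result and the tool result shape.
interface ApiResult<T> {
  data?: T;
  error?: string;
}

interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Step 1: only set filters that were actually provided; status defaults to 'active'.
function buildParams(args: {
  agent_id?: string;
  severity?: string;
  status?: string;
}): Record<string, string> {
  const params: Record<string, string> = { status: args.status ?? 'active' };
  if (args.agent_id) params['agent_id'] = args.agent_id;
  if (args.severity) params['severity'] = args.severity;
  return params;
}

// Step 2: an API error becomes an isError result; otherwise format the list,
// treating a missing payload as an empty list.
function toToolResult(
  result: ApiResult<{ predictions: unknown[] }>,
  format: (predictions: unknown[]) => string
): ToolResult {
  if (result.error) {
    return {
      content: [{ type: 'text', text: `Error fetching predictions: ${result.error}` }],
      isError: true,
    };
  }
  return { content: [{ type: 'text', text: format(result.data?.predictions ?? []) }] };
}
```

Keeping these steps separate from the `client.get` call makes the error path easy to exercise: pass `{ error: 'timeout' }` and check that the result carries `isError: true`.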

src/server-factory.ts:42-80 (registration)

Server factory that wraps all tool registrations with rate limiting and adds the 'metrx_' prefix. The get_failure_predictions tool is registered via the registerAlertTools() call at line 74, so its final name is 'metrx_get_failure_predictions'.

// Add rate limiting middleware + metrx_ namespace prefix
const METRX_PREFIX = 'metrx_';
const originalRegisterTool = server.registerTool.bind(server);
(server as any).registerTool = function (
  name: string,
  config: any,
  handler: (...handlerArgs: any[]) => Promise<any>
) {
  const wrappedHandler = async (...handlerArgs: any[]) => {
    if (!rateLimiter.isAllowed(name)) {
      return {
        content: [
          {
            type: 'text' as const,
            text: `Rate limit exceeded for tool '${name}'. Maximum 60 requests per minute allowed.`,
          },
        ],
        isError: true,
      };
    }
    return handler(...handlerArgs);
  };

  // Register with metrx_ prefix (primary name only — no deprecated aliases)
  const prefixedName = name.startsWith(METRX_PREFIX) ? name : `${METRX_PREFIX}${name}`;
  originalRegisterTool(prefixedName, config, wrappedHandler);
};

// Register all tool domains
registerDashboardTools(server, apiClient);
registerOptimizationTools(server, apiClient);
registerBudgetTools(server, apiClient);
registerAlertTools(server, apiClient);
registerExperimentTools(server, apiClient);
registerCostLeakDetectorTools(server, apiClient);
registerAttributionTools(server, apiClient);
registerUpgradeJustificationTools(server, apiClient);
registerAlertConfigTools(server, apiClient);
registerROIAuditTools(server, apiClient);
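The excerpt calls `rateLimiter.isAllowed(name)` but does not show the limiter itself. One way such a limiter could work is a per-tool sliding window, sketched below. `SlidingWindowLimiter`, its constructor arguments, and the per-tool keying are assumptions; only the 60-requests-per-minute figure and the prefixing rule come from the code above.

```typescript
// Hypothetical sliding-window limiter; the real rateLimiter is not shown in the excerpt.
class SlidingWindowLimiter {
  private readonly calls = new Map<string, number[]>();
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxCalls = 60, windowMs = 60_000, now: () => number = () => Date.now()) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if the tool is under its limit.
  isAllowed(name: string): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.calls.get(name) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.maxCalls;
    if (allowed) recent.push(current);
    this.calls.set(name, recent);
    return allowed;
  }
}

// Same prefixing rule as the factory: add 'metrx_' only when it is missing.
const PREFIX = 'metrx_';
function prefixed(name: string): string {
  return name.startsWith(PREFIX) ? name : `${PREFIX}${name}`;
}
```

Injecting the clock (`now`) is a testing convenience: a test can advance time by hand instead of sleeping for a minute.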

src/services/formatters.ts:248-275 (helper)

formatPredictions() helper that formats a FailurePrediction array into human-readable text with severity icons, confidence levels, current and threshold values, predicted breach times, and recommended actions.

/** Format failure predictions */
export function formatPredictions(predictions: FailurePrediction[]): string {
  if (predictions.length === 0) {
    return 'No active failure predictions. All agents are healthy.';
  }

  const lines: string[] = [`## Failure Predictions (${predictions.length})`, ''];

  for (const p of predictions) {
    const icon = p.severity === 'critical' ? '🔴' : p.severity === 'warning' ? '🟡' : 'ℹ️';
    lines.push(
      `${icon} **${p.prediction_type}** — ${p.severity} (${formatPct(p.confidence)} confidence)`
    );
    lines.push(`  Agent: ${p.agent_id}`);
    lines.push(
      `  Current: ${p.current_value.toFixed(2)} → Threshold: ${p.threshold_value.toFixed(2)} (${
        p.trend_direction
      })`
    );
    if (p.predicted_breach_at) {
      lines.push(`  Predicted breach: ${p.predicted_breach_at}`);
    }
    if (p.recommended_actions.length > 0) {
      lines.push('  Recommended actions:');
      for (const action of p.recommended_actions) {
        lines.push(`    - ${action.action}: ${action.description}`);
      }
    }
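To make the output layout concrete, here is a small sketch of two pieces of the formatter. `formatPct` is called above but not shown, so the version here is a guess that assumes `confidence` is a fraction between 0 and 1; `severityIcon` and `valueLine` are illustrative helper names, and `'increasing'` is a made-up trend value.

```typescript
// Assumption: confidence is a fraction in [0, 1]; the real formatPct is not shown.
function formatPct(fraction: number): string {
  return `${(fraction * 100).toFixed(0)}%`;
}

// Same icon choice as the formatter: anything that is not 'critical' or
// 'warning' (including unexpected values) falls through to the info icon.
function severityIcon(severity: string): string {
  return severity === 'critical' ? '🔴' : severity === 'warning' ? '🟡' : 'ℹ️';
}

// The "Current → Threshold" line, with both values fixed to two decimals.
function valueLine(p: {
  current_value: number;
  threshold_value: number;
  trend_direction: string;
}): string {
  return `  Current: ${p.current_value.toFixed(2)} → Threshold: ${p.threshold_value.toFixed(2)} (${p.trend_direction})`;
}

const line = valueLine({ current_value: 4.5, threshold_value: 5, trend_direction: 'increasing' });
// line === '  Current: 4.50 → Threshold: 5.00 (increasing)'
```

Because `severity` is typed as a plain string, the fallback branch matters: a value outside the three documented levels still renders, just with the info icon.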

src/types.ts:184-205 (schema)

FailurePrediction interface that defines the structure of prediction data: id, agent_id, prediction_type, severity, confidence, predicted_breach_at, current_value, threshold_value, trend_direction, status, and a recommended_actions array.

// ── Failure Prediction Types ──

export interface FailurePrediction {
  id: string;
  agent_id: string;
  prediction_type: string;
  severity: string;
  confidence: number;
  predicted_breach_at?: string;
  current_value: number;
  threshold_value: number;
  trend_direction: string;
  status: string;
  recommended_actions: Array<{
    action: string;
    impact: string;
    description: string;
  }>;
}
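The generic on `client.get<...>` is a compile-time annotation only, so a response that does not match the interface would go unnoticed until formatting. A runtime guard is one possible safeguard. In the sketch below, `isFailurePrediction` is a hypothetical helper, and every value in `sample` is invented for illustration.

```typescript
// Copy of the interface above so the sketch is self-contained.
interface FailurePrediction {
  id: string;
  agent_id: string;
  prediction_type: string;
  severity: string;
  confidence: number;
  predicted_breach_at?: string;
  current_value: number;
  threshold_value: number;
  trend_direction: string;
  status: string;
  recommended_actions: Array<{ action: string; impact: string; description: string }>;
}

// Hypothetical guard: checks top-level field types only, not each action's shape.
function isFailurePrediction(x: unknown): x is FailurePrediction {
  if (typeof x !== 'object' || x === null) return false;
  const r = x as Record<string, unknown>;
  const strings = ['id', 'agent_id', 'prediction_type', 'severity', 'trend_direction', 'status'];
  const numbers = ['confidence', 'current_value', 'threshold_value'];
  return (
    strings.every((k) => typeof r[k] === 'string') &&
    numbers.every((k) => typeof r[k] === 'number') &&
    (r['predicted_breach_at'] === undefined || typeof r['predicted_breach_at'] === 'string') &&
    Array.isArray(r['recommended_actions'])
  );
}

// Invented sample data, for illustration only.
const sample: FailurePrediction = {
  id: 'pred-1',
  agent_id: '123e4567-e89b-12d3-a456-426614174000',
  prediction_type: 'error_rate_breach',
  severity: 'warning',
  confidence: 0.8,
  current_value: 4.5,
  threshold_value: 5,
  trend_direction: 'increasing',
  status: 'active',
  recommended_actions: [],
};
```

A caller could filter the API payload with `predictions.filter(isFailurePrediction)` before handing it to the formatter.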

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is a read-only, non-destructive, idempotent operation with a closed world. The description adds valuable behavioral context beyond annotations by specifying the scope ('upcoming risk'), content of predictions (confidence level, recommended actions), and the exclusion of current/past failures. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in four short sentences: the first three state the purpose and the contents of each prediction, and the last provides critical usage guidance. Every sentence adds value with no wasted words, and key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (predictive analysis with filtering), rich annotations (covering safety and behavior), and 100% schema coverage, the description is largely complete. It explains what the tool returns (predictions with confidence and actions) and usage boundaries. The main gap is the lack of an output schema, but the description compensates by detailing return content.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters (agent_id, severity, status). The description does not add any parameter-specific information beyond what's in the schema, such as explaining how filtering works or parameter interactions. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get predictive failure analysis') and resources ('for your agents'), listing the types of predictions (error rate breaches, latency degradation, etc.). It distinguishes from sibling tools by explicitly contrasting with 'get_alerts' for current/past failures.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('for upcoming risk') and when not to use it ('Do NOT use for current/past failures'), with a clear alternative named ('use get_alerts for active issues'). This directly addresses sibling tool differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/metrxbots/metrx-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.