Metrx MCP Server

by metrxbots

metrx_create_model_experiment

Launch an A/B test to compare LLM models for an agent, routing traffic between models while tracking performance metrics until statistical significance is reached.

Instructions

Start an A/B test comparing two LLM models for a specific agent. Routes a percentage of traffic to the treatment model and tracks cost, latency, error rate, and quality metrics. The experiment runs until statistical significance is reached or the max duration expires. Do NOT use for one-off model comparisons — use compare_models for static pricing data.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent_id | Yes | Agent to run the experiment on | |
| name | Yes | Human-readable experiment name | |
| treatment_model | Yes | The candidate model to test (e.g., "gpt-4o-mini", "claude-haiku-4-20250414") | |
| traffic_pct | No | Percentage of traffic to route to the treatment model | 10 |
| primary_metric | No | The primary metric to optimize for | cost_per_call |
| max_duration_days | No | Maximum experiment duration in days | 14 |
| auto_promote | No | Automatically apply the winning model when the experiment concludes | false |
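
A minimal sketch of calling this tool from a connected MCP client. The mcpClient variable, agent UUID, and experiment name below are illustrative, not from the source:

    // Assumes an already-connected @modelcontextprotocol/sdk Client instance.
    const result = await mcpClient.callTool({
      name: 'create_model_experiment',
      arguments: {
        agent_id: '3f2b8c1e-0a7d-4e5f-9b6a-1c2d3e4f5a6b', // hypothetical agent UUID
        name: 'Haiku cost trial',
        treatment_model: 'claude-haiku-4-20250414',
        // Omitted optional fields fall back to the schema defaults:
        // traffic_pct 10, primary_metric 'cost_per_call',
        // max_duration_days 14, auto_promote false.
      },
    });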

Implementation Reference

  • The async handler function that executes the create_model_experiment tool. It extracts parameters, builds the request body, calls the API client to create an experiment, and formats the response.
    async ({
      agent_id,
      name,
      treatment_model,
      traffic_pct,
      primary_metric,
      max_duration_days,
      auto_promote,
    }) => {
      const body: Record<string, unknown> = {
        agent_id,
        name,
        treatment_model,
        traffic_pct: traffic_pct ?? 10,
        primary_metric: primary_metric ?? 'cost_per_call',
        max_duration_days: max_duration_days ?? 14,
        auto_promote: auto_promote ?? false,
      };
    
      const result = await client.post<ModelRoutingExperiment>('/experiments', body);
    
      if (result.error) {
        return {
          content: [{ type: 'text', text: `Error creating experiment: ${result.error}` }],
          isError: true,
        };
      }
    
      const exp = result.data!;
      const text = [
        `✅ Experiment "${exp.name}" created.`,
        '',
        formatExperiment(exp),
        '',
        'The experiment will start routing traffic immediately. Use get_experiment_results to check progress.',
      ].join('\n');
    
      return {
        content: [{ type: 'text', text }],
      };
    }
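  • A sketch of the API client contract the handler relies on. The client itself is not shown in this reference; the ApiResult shape and the fetch-based implementation below are assumptions inferred from how client.post, result.error, and result.data are used above.
    // Assumed result envelope: exactly one of data or error is populated.
    interface ApiResult<T> {
      data?: T;
      error?: string;
    }

    // METRX_API_URL is a hypothetical environment variable; the fallback is a placeholder.
    const BASE_URL = process.env.METRX_API_URL ?? 'https://api.example.com';

    // Hypothetical fetch-based client matching the calls the handler makes.
    const client = {
      async post<T>(path: string, body: unknown): Promise<ApiResult<T>> {
        const res = await fetch(`${BASE_URL}${path}`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body),
        });
        if (!res.ok) return { error: `${res.status} ${res.statusText}` };
        return { data: (await res.json()) as T };
      },
    };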
  • Zod input schema defining all parameters for the create_model_experiment tool including agent_id, name, treatment_model, traffic_pct, primary_metric, max_duration_days, and auto_promote with validation rules and defaults.
    inputSchema: {
      agent_id: z.string().uuid().describe('Agent to run the experiment on'),
      name: z.string().min(1).max(100).describe('Human-readable experiment name'),
      treatment_model: z
        .string()
        .describe('The candidate model to test (e.g., "gpt-4o-mini", "claude-haiku-4-20250414")'),
      traffic_pct: z
        .number()
        .int()
        .min(1)
        .max(50)
        .default(10)
        .describe('Percentage of traffic to route to the treatment model (default: 10%)'),
      primary_metric: z
        .enum(['cost_per_call', 'latency_p50', 'latency_p95', 'error_rate', 'quality_score'])
        .default('cost_per_call')
        .describe('The primary metric to optimize for (default: cost_per_call)'),
      max_duration_days: z
        .number()
        .int()
        .min(1)
        .max(30)
        .default(14)
        .describe('Maximum experiment duration in days (default: 14)'),
      auto_promote: z
        .boolean()
        .default(false)
        .describe('Automatically apply the winning model when the experiment concludes'),
    },
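  • How the defaults behave: wrapping the raw shape in z.object and parsing a payload with only the three required fields fills in the rest. A minimal demonstration, assuming the shape above is bound to a variable named inputSchema (the UUID is hypothetical):
    import { z } from 'zod';

    const ExperimentInput = z.object(inputSchema);

    // Only the required fields are supplied; parse() applies the defaults.
    const parsed = ExperimentInput.parse({
      agent_id: '3f2b8c1e-0a7d-4e5f-9b6a-1c2d3e4f5a6b',
      name: 'Haiku cost trial',
      treatment_model: 'claude-haiku-4-20250414',
    });
    // parsed.traffic_pct === 10, parsed.primary_metric === 'cost_per_call',
    // parsed.max_duration_days === 14, parsed.auto_promote === false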
  • Complete tool registration via server.registerTool including metadata (title, description), input schema, annotations, and the handler function for create_model_experiment.
    server.registerTool(
      'create_model_experiment',
      {
        title: 'Create Model Experiment',
        description:
          'Start an A/B test comparing two LLM models for a specific agent. ' +
          'Routes a percentage of traffic to the treatment model and tracks ' +
          'cost, latency, error rate, and quality metrics. The experiment runs ' +
          'until statistical significance is reached or the max duration expires. ' +
          'Do NOT use for one-off model comparisons — use compare_models for static pricing data.',
        inputSchema: {
          agent_id: z.string().uuid().describe('Agent to run the experiment on'),
          name: z.string().min(1).max(100).describe('Human-readable experiment name'),
          treatment_model: z
            .string()
            .describe('The candidate model to test (e.g., "gpt-4o-mini", "claude-haiku-4-20250414")'),
          traffic_pct: z
            .number()
            .int()
            .min(1)
            .max(50)
            .default(10)
            .describe('Percentage of traffic to route to the treatment model (default: 10%)'),
          primary_metric: z
            .enum(['cost_per_call', 'latency_p50', 'latency_p95', 'error_rate', 'quality_score'])
            .default('cost_per_call')
            .describe('The primary metric to optimize for (default: cost_per_call)'),
          max_duration_days: z
            .number()
            .int()
            .min(1)
            .max(30)
            .default(14)
            .describe('Maximum experiment duration in days (default: 14)'),
          auto_promote: z
            .boolean()
            .default(false)
            .describe('Automatically apply the winning model when the experiment concludes'),
        },
        annotations: {
          readOnlyHint: false,
          destructiveHint: false,
          idempotentHint: false,
          openWorldHint: false,
        },
      },
      async ({
        agent_id,
        name,
        treatment_model,
        traffic_pct,
        primary_metric,
        max_duration_days,
        auto_promote,
      }) => {
        const body: Record<string, unknown> = {
          agent_id,
          name,
          treatment_model,
          traffic_pct: traffic_pct ?? 10,
          primary_metric: primary_metric ?? 'cost_per_call',
          max_duration_days: max_duration_days ?? 14,
          auto_promote: auto_promote ?? false,
        };
    
        const result = await client.post<ModelRoutingExperiment>('/experiments', body);
    
        if (result.error) {
          return {
            content: [{ type: 'text', text: `Error creating experiment: ${result.error}` }],
            isError: true,
          };
        }
    
        const exp = result.data!;
        const text = [
          `✅ Experiment "${exp.name}" created.`,
          '',
          formatExperiment(exp),
          '',
          'The experiment will start routing traffic immediately. Use get_experiment_results to check progress.',
        ].join('\n');
    
        return {
          content: [{ type: 'text', text }],
        };
      }
    );
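  • For context, server.registerTool implies an McpServer instance connected to a transport. A minimal stdio bootstrap sketch; this is not part of the source, and the server name and version are assumptions:
    import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
    import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';

    const server = new McpServer({ name: 'metrx-mcp-server', version: '1.0.0' });

    // ...registerTool calls such as the one above go here...

    const transport = new StdioServerTransport();
    await server.connect(transport);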
  • TypeScript interface defining the ModelRoutingExperiment data structure returned by the API, including experiment metadata, model configurations, traffic settings, and results tracking fields.
    export interface ModelRoutingExperiment {
      id: string;
      name: string;
      agent_id: string;
      control_model: string;
      treatment_model: string;
      traffic_pct: number;
      status: string;
      primary_metric: string;
      control_samples: number;
      treatment_samples: number;
      is_significant: boolean;
      winner?: string;
    }
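  • The handler calls a formatExperiment helper that is not included in this reference. A plausible sketch over the ModelRoutingExperiment fields above (purely an assumption, not the actual implementation):
    // Hypothetical formatter; renders one line per field of interest.
    function formatExperiment(exp: ModelRoutingExperiment): string {
      return [
        `ID: ${exp.id} | Status: ${exp.status}`,
        `Agent: ${exp.agent_id}`,
        `Models: ${exp.control_model} (control) vs ${exp.treatment_model} (treatment)`,
        `Traffic to treatment: ${exp.traffic_pct}% | Primary metric: ${exp.primary_metric}`,
        `Samples: ${exp.control_samples} control / ${exp.treatment_samples} treatment`,
        `Significant: ${exp.is_significant}${exp.winner ? ` | Winner: ${exp.winner}` : ''}`,
      ].join('\n');
    }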
