
Evaluate Output

evaluate_output

Assess agent output quality by applying configurable evaluation rules for completeness, relevance, safety, cost, or custom criteria.

Instructions

Evaluate agent output quality using configurable rules

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| output | Yes | The output text to evaluate | |
| eval_type | No | Type of evaluation | completeness |
| expected | No | Expected output for comparison | |
| input | No | Original input for context | |
| trace_id | No | Link evaluation to a trace | |
| custom_rules | No | Custom evaluation rules | |
| cost_usd | No | Cost for cost evaluation | |
| token_usage | No | Token usage for cost evaluation | |
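A call to `evaluate_output` might carry arguments like the following. The values here are purely illustrative; only `output` is required, and `eval_type` falls back to `completeness` when omitted.

```typescript
// Hypothetical arguments for an evaluate_output call; values are examples only.
const args = {
  output: 'Paris is the capital of France.',   // required
  eval_type: 'relevance',                      // optional, defaults to 'completeness'
  input: 'What is the capital of France?',     // optional context
  expected: 'Paris',                           // optional comparison target
  trace_id: 'trace-123',                       // optional trace link
};

// Minimal client-side check mirroring the schema's single required field.
const isValid = typeof args.output === 'string' && args.output.length > 0;
```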

Implementation Reference

  • The handler function for the "evaluate_output" tool, which processes inputs, invokes the evaluation engine, stores the result, and returns the formatted response.

    ```typescript
    async (args) => {
      const evalType = args.eval_type as EvalType;

      const result = evalEngine.evaluate(
        evalType,
        {
          output: args.output,
          expected: args.expected,
          input: args.input,
          costUsd: args.cost_usd,
          tokenUsage: args.token_usage,
        },
        args.custom_rules as CustomRuleDefinition[] | undefined,
      );

      if (args.trace_id) {
        result.trace_id = args.trace_id;
      }

      await storage.insertEvalResult(result);

      return {
        content: [
          {
            type: 'text' as const,
            text: JSON.stringify({
              id: result.id,
              score: result.score,
              passed: result.passed,
              rule_results: result.rule_results,
              suggestions: result.suggestions,
            }),
          },
        ],
      };
    },
    ```
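The handler serialises a subset of the result into a single text content block, which a client then parses back into structured data. A minimal sketch of that round trip, using an illustrative result object whose field names mirror the handler above:

```typescript
// Illustrative EvalResult; field names follow the handler's output shape.
const result = {
  id: 'eval-1',
  score: 0.9,
  passed: true,
  rule_results: [{ rule: 'completeness', passed: true }],
  suggestions: [] as string[],
};

// The handler wraps a JSON string in a text content block.
const content = {
  type: 'text' as const,
  text: JSON.stringify({
    id: result.id,
    score: result.score,
    passed: result.passed,
    rule_results: result.rule_results,
    suggestions: result.suggestions,
  }),
};

// A client recovers the structured result by parsing the text payload.
const parsed = JSON.parse(content.text);
```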
  • The input schema definition for the "evaluate_output" tool, detailing the required and optional parameters.

    ```typescript
    const inputSchema = {
      output: z.string().describe('The output text to evaluate'),
      eval_type: z
        .enum(['completeness', 'relevance', 'safety', 'cost', 'custom'])
        .default('completeness')
        .describe('Type of evaluation'),
      expected: z.string().optional().describe('Expected output for comparison'),
      input: z.string().optional().describe('Original input for context'),
      trace_id: z.string().optional().describe('Link evaluation to a trace'),
      custom_rules: z.array(CustomRuleSchema).optional().describe('Custom evaluation rules'),
      cost_usd: z.number().optional().describe('Cost for cost evaluation'),
      token_usage: z
        .object({
          prompt_tokens: z.number().optional(),
          completion_tokens: z.number().optional(),
          total_tokens: z.number().optional(),
        })
        .optional()
        .describe('Token usage for cost evaluation'),
    };
    ```
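For cost evaluations, `token_usage` accepts optional prompt, completion, and total counts. A small sketch with hypothetical numbers, including a sanity check that the total matches the sum of the parts (the schema itself does not enforce this, so it is only a client-side convention):

```typescript
// Hypothetical token_usage payload for a cost evaluation; numbers are examples.
const tokenUsage = {
  prompt_tokens: 1200,
  completion_tokens: 300,
  total_tokens: 1500,
};

// Optional sanity check: total should equal prompt + completion when all are present.
const consistent =
  tokenUsage.total_tokens ===
  tokenUsage.prompt_tokens + tokenUsage.completion_tokens;
```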
  • The registration function for the "evaluate_output" tool, linking the name, schema, and handler to the MCP server. The excerpt below is completed with a placeholder comment where the handler (shown above) is passed in.

    ```typescript
    export function registerEvaluateOutputTool(
      server: McpServer,
      storage: IStorageAdapter,
      evalEngine: EvalEngine,
    ): void {
      server.registerTool(
        'evaluate_output',
        {
          title: 'Evaluate Output',
          description: 'Evaluate agent output quality using configurable rules',
          inputSchema,
        },
        // ... handler (shown above)
      );
    }
    ```