# Evaluate Output
`evaluate_output` — Assess agent output quality by applying configurable evaluation rules for completeness, relevance, safety, cost, or custom criteria.
## Instructions
Evaluate agent output quality using configurable rules
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| output | Yes | The output text to evaluate | |
| eval_type | No | Type of evaluation | completeness |
| expected | No | Expected output for comparison | |
| input | No | Original input for context | |
| trace_id | No | Link evaluation to a trace | |
| custom_rules | No | Custom evaluation rules | |
| cost_usd | No | Cost for cost evaluation | |
| token_usage | No | Token usage for cost evaluation | |
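As an illustration, a call to `evaluate_output` might pass arguments like the following. The field names come from the schema above; every value here is hypothetical.

```typescript
// Hypothetical example arguments for an evaluate_output call.
// Only the field names come from the input schema; the values are illustrative.
const args = {
  output: 'The capital of France is Paris.',      // required
  eval_type: 'completeness',                      // defaults to 'completeness' when omitted
  expected: 'Paris',                              // optional reference answer
  input: 'What is the capital of France?',        // optional original prompt for context
  token_usage: { prompt_tokens: 12, completion_tokens: 8, total_tokens: 20 },
};

// Only `output` is required; every other field may be omitted.
console.log(Object.keys(args).length); // 5
```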
## Implementation Reference
- src/tools/evaluate-output.ts:44-80 (handler): The handler function for the "evaluate_output" tool, which processes inputs, invokes the evaluation engine, stores the result, and returns the formatted response.
```typescript
async (args) => {
  const evalType = args.eval_type as EvalType;
  const result = evalEngine.evaluate(
    evalType,
    {
      output: args.output,
      expected: args.expected,
      input: args.input,
      costUsd: args.cost_usd,
      tokenUsage: args.token_usage,
    },
    args.custom_rules as CustomRuleDefinition[] | undefined,
  );
  if (args.trace_id) {
    result.trace_id = args.trace_id;
  }
  await storage.insertEvalResult(result);
  return {
    content: [
      {
        type: 'text' as const,
        text: JSON.stringify({
          id: result.id,
          score: result.score,
          passed: result.passed,
          rule_results: result.rule_results,
          suggestions: result.suggestions,
        }),
      },
    ],
  };
},
);
```

- src/tools/evaluate-output.ts:17-30 (schema): The input schema definition for the "evaluate_output" tool, detailing the required and optional parameters.
```typescript
const inputSchema = {
  output: z.string().describe('The output text to evaluate'),
  eval_type: z
    .enum(['completeness', 'relevance', 'safety', 'cost', 'custom'])
    .default('completeness')
    .describe('Type of evaluation'),
  expected: z.string().optional().describe('Expected output for comparison'),
  input: z.string().optional().describe('Original input for context'),
  trace_id: z.string().optional().describe('Link evaluation to a trace'),
  custom_rules: z.array(CustomRuleSchema).optional().describe('Custom evaluation rules'),
  cost_usd: z.number().optional().describe('Cost for cost evaluation'),
  token_usage: z
    .object({
      prompt_tokens: z.number().optional(),
      completion_tokens: z.number().optional(),
      total_tokens: z.number().optional(),
    })
    .optional()
    .describe('Token usage for cost evaluation'),
};
```

- src/tools/evaluate-output.ts:32-43 (registration): The registration function for the "evaluate_output" tool, linking the name, schema, and handler to the MCP server.
```typescript
export function registerEvaluateOutputTool(
  server: McpServer,
  storage: IStorageAdapter,
  evalEngine: EvalEngine,
): void {
  server.registerTool(
    'evaluate_output',
    {
      title: 'Evaluate Output',
      description: 'Evaluate agent output quality using configurable rules',
      inputSchema,
    },
```
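The handler serializes the evaluation result into a single JSON text block, so a caller recovers the fields by parsing `content[0].text`. A minimal sketch, with an assumed result payload (the field names match the `JSON.stringify` call in the handler, but every value here is hypothetical):

```typescript
// Sketch of unpacking the tool response. The payload values are assumed;
// only the field names mirror the handler's JSON.stringify call.
const response = {
  content: [
    {
      type: 'text',
      text: JSON.stringify({
        id: 'eval-123',                                  // hypothetical evaluation id
        score: 0.8,
        passed: true,
        rule_results: [{ rule: 'non_empty', passed: true }],
        suggestions: [],
      }),
    },
  ],
};

// Parse the text block back into a structured result.
const result = JSON.parse(response.content[0].text);
console.log(result.passed, result.score); // true 0.8
```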