review_output
Review any AI-generated output for errors using an independent adversarial checker. Get a PASS/FAIL/CONDITIONAL_PASS verdict, score, and categorized issues with severity. Works for code, content, summaries, translations, and more.
Instructions
Adversarial quality review of any AI-generated output. An independent reviewer assumes the author made mistakes and actively looks for problems. Returns structured verdict (PASS/FAIL/CONDITIONAL_PASS), score (0-100), categorized issues with severity, and evidence-based checklist. Works for any output type: code, content, summaries, translations, data extraction, etc.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| output | Yes | The AI-generated output to review (max 100K chars) | |
| criteria | No | Custom review criteria — what specifically to check for | |
| review_type | No | Review category label (e.g., "code", "content", "factual", "translation") | |
| model | No | Reviewer model ID | claude-sonnet-4-6 |
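As a concrete illustration of the schema above, here is a hypothetical arguments object a client might send for a code review; the snippet being reviewed and the criteria text are invented for the example:

```typescript
// Illustrative arguments for a review_output call.
// Only `output` is required; `criteria`, `review_type`, and `model` are optional.
const reviewArgs = {
  output: 'function add(a, b) { return a - b }', // the AI-generated artifact to check
  criteria: 'Verify the arithmetic matches the function name',
  review_type: 'code',
}

// The input schema caps `output` at 100K characters.
const withinLimit = reviewArgs.output.length <= 100000
console.log(withinLimit)
```

Omitted optional fields fall back to their defaults on the server side (for example, `model` falls back to claude-sonnet-4-6).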
Implementation Reference
- src/review-engine.ts:46-71 (handler): The actual handler that executes the review logic: calls the Anthropic API with a review prompt, parses the result, validates the checklist, and returns a ReviewResult.
```typescript
export async function reviewOutput(options: ReviewOptions): Promise<ReviewResult> {
  const { output, criteria, reviewType, model } = options
  const client = getClient()
  const reviewPrompt = buildReviewPrompt(output, criteria, reviewType)
  const startTime = Date.now()

  const response = await client.messages.create({
    model: model || DEFAULT_MODEL,
    max_tokens: MAX_REVIEW_TOKENS,
    messages: [{ role: 'user', content: reviewPrompt }],
  })

  const rawText = response.content
    .filter((block): block is Anthropic.TextBlock => block.type === 'text')
    .map(block => block.text)
    .join('')

  const result = parseReviewResult(rawText)
  result.reviewer_model = model || DEFAULT_MODEL
  validateChecklist(result)

  console.error(`[REVIEW] Completed in ${Date.now() - startTime}ms — verdict: ${result.verdict}, score: ${result.score}`)
  return result
}
```

- src/review-engine.ts:8-13 (schema): Internal ReviewOptions interface defining the input parameters: output, criteria, reviewType, and model.
```typescript
interface ReviewOptions {
  output: string
  criteria?: string
  reviewType?: string
  model?: string
}
```

- src/types.ts:1-21 (schema): Type definitions for ReviewResult, ReviewIssue, and ChecklistItem, the return types of the review tool.
```typescript
export interface ReviewResult {
  verdict: 'PASS' | 'FAIL' | 'CONDITIONAL_PASS'
  score: number
  issues: ReviewIssue[]
  checklist: ChecklistItem[]
  summary: string
  reviewer_model: string
}

export interface ReviewIssue {
  severity: 'critical' | 'high' | 'medium' | 'low'
  category: string
  description: string
  suggestion: string
}

export interface ChecklistItem {
  item: string
  status: 'pass' | 'fail'
  evidence: string
}
```

- src/index.ts:44-64 (registration): Registers the 'review_output' MCP tool with its name, description, Zod input schema, and a safeAsyncTool handler that delegates to reviewOutput().
```typescript
server.tool(
  'review_output',
  'Adversarial quality review of any AI-generated output. An independent reviewer assumes the author made mistakes and actively looks for problems. Returns structured verdict (PASS/FAIL/CONDITIONAL_PASS), score (0-100), categorized issues with severity, and evidence-based checklist. Works for any output type: code, content, summaries, translations, data extraction, etc.',
  {
    output: z.string().max(100000).describe('The AI-generated output to review (max 100K chars)'),
    criteria: z.string().optional().describe('Custom review criteria — what specifically to check for'),
    review_type: z.string().optional().describe('Review category label (e.g., "code", "content", "factual", "translation")'),
    model: z.string().optional().describe('Reviewer model ID (default: claude-sonnet-4-6)'),
  },
  safeAsyncTool(async ({ output, criteria, review_type, model }) => {
    if (!process.env.ANTHROPIC_API_KEY) {
      throw new Error('ANTHROPIC_API_KEY environment variable is required. Set it in your MCP server config.')
    }
    return await reviewOutput({
      output,
      criteria: criteria || undefined,
      reviewType: review_type || undefined,
      model: model || undefined,
    })
  })
)
```

- src/review-engine.ts:124-155 (helper): buildReviewPrompt constructs the adversarial review prompt sent to the LLM. This range also contains extractJson, parseReviewResult, validateChecklist, sanitizeIssue, and sanitizeChecklistItem, the helpers supporting reviewOutput.
```typescript
export function buildReviewPrompt(output: string, criteria?: string, reviewType?: string): string {
  let prompt = `You are an independent, adversarial quality reviewer. Your job is to find problems.
Assume the author may have made mistakes, taken shortcuts, or missed edge cases.
Do NOT give the benefit of the doubt. Be thorough and critical.

IMPORTANT RULES:
1. Every checklist item MUST have specific evidence (a quote or concrete observation).
2. If you cannot find evidence for a PASS item, mark it as FAIL.
3. A single critical issue means the overall verdict MUST be FAIL.
4. Score must reflect the issues found: critical = max 30, high = max 60.
5. Do not be impressed by length or formatting — judge substance.

`

  if (criteria) {
    prompt += `REVIEW CRITERIA:\n${criteria}\n\n`
  }
  if (reviewType) {
    prompt += `REVIEW TYPE: ${reviewType}\n\n`
  }

  prompt += `OUTPUT TO REVIEW:
---
${output}
---

Respond in this exact JSON format (no other text):
${REVIEW_JSON_TEMPLATE}`

  return prompt
}
```
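The prompt's consistency rules (a single critical issue forces FAIL; critical and high severities cap the score at 30 and 60) can also be re-checked by a caller on the parsed result. A minimal sketch with simplified local stand-ins for the ReviewResult types; the isConsistent helper is not part of the source and is shown for illustration only:

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low'

interface Issue { severity: Severity }
interface Result { verdict: 'PASS' | 'FAIL' | 'CONDITIONAL_PASS'; score: number; issues: Issue[] }

// Hypothetical client-side check mirroring rules 3 and 4 of the review prompt.
function isConsistent(result: Result): boolean {
  const hasCritical = result.issues.some(i => i.severity === 'critical')
  if (hasCritical && result.verdict !== 'FAIL') return false // rule 3: critical implies FAIL
  if (hasCritical && result.score > 30) return false         // rule 4: critical caps score at 30
  const hasHigh = result.issues.some(i => i.severity === 'high')
  if (hasHigh && result.score > 60) return false             // rule 4: high caps score at 60
  return true
}

const ok = isConsistent({ verdict: 'FAIL', score: 25, issues: [{ severity: 'critical' }] })
const bad = isConsistent({ verdict: 'PASS', score: 80, issues: [{ severity: 'critical' }] })
console.log(ok, bad) // true false
```

Such a check is a cheap guard against the reviewer model returning a verdict that contradicts its own issue list.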