# review_dual
Perform dual adversarial review to identify critical issues in AI outputs. Two independent reviewers assess from different angles; if either finds a critical issue, the merged verdict is FAIL. Use for high-stakes quality assurance.
## Instructions
Dual adversarial review: two independent reviewers assess the output from different angles, then a merge agent combines their findings. Stricter than single review — if either reviewer finds a critical issue, the merged verdict is FAIL. Use for high-stakes outputs where quality is critical.
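The deterministic core of the merge rule can be sketched as a pure function. This is illustrative only: in the actual tool an LLM merge agent produces the combined review, and the handling of `CONDITIONAL_PASS` below is an assumption — the description above only specifies FAIL-if-either-FAIL and taking the lower score.

```typescript
type Verdict = 'PASS' | 'FAIL' | 'CONDITIONAL_PASS'

interface MiniReview {
  verdict: Verdict
  score: number
  issues: string[]
}

// Sketch of the merge semantics: FAIL dominates, take the lower score,
// and combine issues with duplicates removed. CONDITIONAL_PASS winning
// over PASS is an assumption, not stated in the tool description.
function mergeReviews(a: MiniReview, b: MiniReview): MiniReview {
  const verdict: Verdict =
    a.verdict === 'FAIL' || b.verdict === 'FAIL'
      ? 'FAIL'
      : a.verdict === 'CONDITIONAL_PASS' || b.verdict === 'CONDITIONAL_PASS'
        ? 'CONDITIONAL_PASS'
        : 'PASS'
  return {
    verdict,
    score: Math.min(a.score, b.score),
    issues: [...new Set([...a.issues, ...b.issues])],
  }
}
```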
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| output | Yes | The AI-generated output to review (max 100K chars) | |
| criteria | No | Custom review criteria | |
| review_type | No | Review category label | |
| model | No | Reviewer model ID | claude-sonnet-4-6 |
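A typical call supplies `output` plus whichever optional fields apply. The sketch below shows the argument shape; `validateArgs` is an illustrative client-side guard, not part of the server, and the 100K limit mirrors the schema above.

```typescript
// Argument shape for review_dual (output is required; the rest optional).
interface ReviewDualArgs {
  output: string       // the AI-generated output to review, max 100K chars
  criteria?: string    // custom review criteria
  review_type?: string // review category label
  model?: string       // reviewer model ID
}

// Minimal client-side guard matching the server-side max-length check.
function validateArgs(args: ReviewDualArgs): string[] {
  const errors: string[] = []
  if (args.output.length > 100_000) errors.push('output exceeds 100K chars')
  return errors
}

const args: ReviewDualArgs = {
  output: 'The capital of France is Paris, with a population of about 2.1 million.',
  criteria: 'Check factual accuracy and completeness.',
  review_type: 'factual',
}
```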
## Implementation Reference
- `src/review-engine.ts:76-120` (handler): Core handler for the `review_dual` tool. Runs two independent adversarial reviews (`reviewOutput`) in parallel, then a merge agent combines their findings into a single final verdict. If either reviewer found a critical issue, the merged verdict is FAIL.

  ```typescript
  export async function reviewOutputDual(options: ReviewOptions): Promise<ReviewResult> {
    const { output, criteria, model } = options
    const client = getClient()

    const [reviewA, reviewB] = await Promise.all([
      reviewOutput(options),
      reviewOutput({
        ...options,
        criteria:
          (criteria || '') +
          '\n\nAdditional focus: Look for edge cases, security issues, and unstated assumptions.',
      }),
    ])

    const mergePrompt = `You are a senior reviewer merging two independent reviews.

  Review A:
  - Verdict: ${reviewA.verdict} (Score: ${reviewA.score})
  - Issues: ${JSON.stringify(reviewA.issues)}
  - Summary: ${reviewA.summary}

  Review B:
  - Verdict: ${reviewB.verdict} (Score: ${reviewB.score})
  - Issues: ${JSON.stringify(reviewB.issues)}
  - Summary: ${reviewB.summary}

  Produce a MERGED review. If either reviewer found a critical issue, the merged verdict must be FAIL.
  Take the LOWER score. Combine all unique issues. Deduplicate.

  Respond in this exact JSON format:
  ${REVIEW_JSON_TEMPLATE}`

    const mergeResponse = await client.messages.create({
      model: model || DEFAULT_MODEL,
      max_tokens: MAX_REVIEW_TOKENS,
      messages: [{ role: 'user', content: mergePrompt }],
    })

    const mergeText = mergeResponse.content
      .filter((block): block is Anthropic.TextBlock => block.type === 'text')
      .map(block => block.text)
      .join('')

    const merged = parseReviewResult(mergeText)
    merged.reviewer_model = `dual:${model || DEFAULT_MODEL}`
    return merged
  }
  ```

- `src/index.ts:66-86` (registration): Registration of the `review_dual` MCP tool with Zod schema (`output`, `criteria`, `review_type`, `model`) and handler that calls `reviewOutputDual`.

  ```typescript
  server.tool(
    'review_dual',
    'Dual adversarial review: two independent reviewers assess the output from different angles, then a merge agent combines their findings. Stricter than single review — if either reviewer finds a critical issue, the merged verdict is FAIL. Use for high-stakes outputs where quality is critical.',
    {
      output: z.string().max(100000).describe('The AI-generated output to review (max 100K chars)'),
      criteria: z.string().optional().describe('Custom review criteria'),
      review_type: z.string().optional().describe('Review category label'),
      model: z.string().optional().describe('Reviewer model ID (default: claude-sonnet-4-6)'),
    },
    safeAsyncTool(async ({ output, criteria, review_type, model }) => {
      if (!process.env.ANTHROPIC_API_KEY) {
        throw new Error('ANTHROPIC_API_KEY environment variable is required. Set it in your MCP server config.')
      }
      return await reviewOutputDual({
        output,
        criteria: criteria || undefined,
        reviewType: review_type || undefined,
        model: model || undefined,
      })
    })
  )
  ```

- `src/types.ts:1-8` (schema): Type definitions for `ReviewResult`, `ReviewIssue`, and `ChecklistItem` used by the dual review flow.

  ```typescript
  export interface ReviewResult {
    verdict: 'PASS' | 'FAIL' | 'CONDITIONAL_PASS'
    score: number
    issues: ReviewIssue[]
    checklist: ChecklistItem[]
    summary: string
    reviewer_model: string
  }
  ```

- `src/review-engine.ts:124-155` (helper): `buildReviewPrompt` constructs the adversarial review prompt used by both single and dual review reviewers.

  ```typescript
  export function buildReviewPrompt(output: string, criteria?: string, reviewType?: string): string {
    let prompt = `You are an independent, adversarial quality reviewer. Your job is to find problems.

  Assume the author may have made mistakes, taken shortcuts, or missed edge cases.
  Do NOT give the benefit of the doubt. Be thorough and critical.

  IMPORTANT RULES:
  1. Every checklist item MUST have specific evidence (a quote or concrete observation).
  2. If you cannot find evidence for a PASS item, mark it as FAIL.
  3. A single critical issue means the overall verdict MUST be FAIL.
  4. Score must reflect the issues found: critical = max 30, high = max 60.
  5. Do not be impressed by length or formatting — judge substance.

  `
    if (criteria) {
      prompt += `REVIEW CRITERIA:\n${criteria}\n\n`
    }
    if (reviewType) {
      prompt += `REVIEW TYPE: ${reviewType}\n\n`
    }
    prompt += `OUTPUT TO REVIEW:
  ---
  ${output}
  ---

  Respond in this exact JSON format (no other text):
  ${REVIEW_JSON_TEMPLATE}`
    return prompt
  }
  ```

- `src/review-engine.ts:288-311` (helper): `validateChecklist` validates checklist evidence and can downgrade the verdict if too many PASS items lack evidence.

  ```typescript
  export function validateChecklist(result: ReviewResult): void {
    let downgraded = false
    for (const item of result.checklist) {
      if (item.status === 'pass' && (!item.evidence || item.evidence.trim().length < 5)) {
        item.status = 'fail'
        downgraded = true
      }
    }
    if (downgraded) {
      const failCount = result.checklist.filter(i => i.status === 'fail').length
      const total = result.checklist.length
      if (total > 0 && failCount / total > 0.3) {
        result.verdict = 'FAIL'
        result.score = Math.min(result.score, 50)
        result.issues.push({
          severity: 'high',
          category: 'anti_gaming',
          description: 'Multiple checklist items marked PASS without evidence were downgraded to FAIL',
          suggestion: 'Provide specific evidence for each checklist assertion',
        })
      }
    }
  }
  ```
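To see the anti-gaming downgrade in action, the sketch below re-states the rule in simplified, self-contained form (local types, issues reduced to strings) and applies it to a concrete result where two of three PASS items lack usable evidence.

```typescript
interface Item { status: 'pass' | 'fail'; evidence?: string }
interface Result { verdict: string; score: number; checklist: Item[]; issues: string[] }

// Simplified restatement of validateChecklist: downgrade PASS items with
// missing or too-short evidence; if more than 30% of the checklist then
// fails, force a FAIL verdict and cap the score at 50.
function applyAntiGaming(result: Result): void {
  let downgraded = false
  for (const item of result.checklist) {
    if (item.status === 'pass' && (!item.evidence || item.evidence.trim().length < 5)) {
      item.status = 'fail'
      downgraded = true
    }
  }
  const failCount = result.checklist.filter(i => i.status === 'fail').length
  if (downgraded && result.checklist.length > 0 && failCount / result.checklist.length > 0.3) {
    result.verdict = 'FAIL'
    result.score = Math.min(result.score, 50)
    result.issues.push('anti_gaming: unevidenced PASS items downgraded')
  }
}

const result: Result = {
  verdict: 'PASS',
  score: 90,
  checklist: [
    { status: 'pass', evidence: 'quotes line 3 of the reviewed output' },
    { status: 'pass' },                 // no evidence: downgraded
    { status: 'pass', evidence: 'ok' }, // evidence under 5 chars: downgraded
  ],
  issues: [],
}
applyAntiGaming(result)
// result.verdict === 'FAIL', result.score === 50 (2 of 3 items failed, > 30%)
```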