review_dual

Perform dual adversarial review to identify critical issues in AI outputs. Two independent reviewers assess from different angles; if either finds a critical issue, the merged verdict is FAIL. Use for high-stakes quality assurance.

Instructions

Dual adversarial review: two independent reviewers assess the output from different angles, then a merge agent combines their findings. Stricter than single review — if either reviewer finds a critical issue, the merged verdict is FAIL. Use for high-stakes outputs where quality is critical.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| output | Yes | The AI-generated output to review (max 100K chars) | |
| criteria | No | Custom review criteria | |
| review_type | No | Review category label | |
| model | No | Reviewer model ID | claude-sonnet-4-6 |
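For illustration, a minimal arguments object matching this schema might look like the following. The values are hypothetical; only `output` is required.

```typescript
// Hypothetical arguments for a review_dual call; only `output` is required.
const args = {
  output: 'function add(a, b) { return a - b }', // the AI-generated text to review
  criteria: 'Verify arithmetic correctness',     // optional custom criteria
  review_type: 'code_review',                    // optional category label
  // model omitted: falls back to the documented default, claude-sonnet-4-6
}
```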

Implementation Reference

  • Core handler for the review_dual tool. Runs two independent adversarial reviews (reviewOutput) in parallel, then a merge agent combines their findings into a single final verdict. If either reviewer found a critical issue, merged verdict is FAIL.
    export async function reviewOutputDual(options: ReviewOptions): Promise<ReviewResult> {
      const { criteria, model } = options
      const client = getClient()
    
      const [reviewA, reviewB] = await Promise.all([
        reviewOutput(options),
        reviewOutput({
          ...options,
          criteria: (criteria || '') + '\n\nAdditional focus: Look for edge cases, security issues, and unstated assumptions.',
        }),
      ])
    
      const mergePrompt = `You are a senior reviewer merging two independent reviews.
    
    Review A:
    - Verdict: ${reviewA.verdict} (Score: ${reviewA.score})
    - Issues: ${JSON.stringify(reviewA.issues)}
    - Summary: ${reviewA.summary}
    
    Review B:
    - Verdict: ${reviewB.verdict} (Score: ${reviewB.score})
    - Issues: ${JSON.stringify(reviewB.issues)}
    - Summary: ${reviewB.summary}
    
    Produce a MERGED review. If either reviewer found a critical issue, the merged verdict must be FAIL.
    Take the LOWER score. Combine all unique issues. Deduplicate.
    
    Respond in this exact JSON format:
    ${REVIEW_JSON_TEMPLATE}`
    
      const mergeResponse = await client.messages.create({
        model: model || DEFAULT_MODEL,
        max_tokens: MAX_REVIEW_TOKENS,
        messages: [{ role: 'user', content: mergePrompt }],
      })
    
      const mergeText = mergeResponse.content
        .filter((block): block is Anthropic.TextBlock => block.type === 'text')
        .map(block => block.text)
        .join('')
    
      const merged = parseReviewResult(mergeText)
      merged.reviewer_model = `dual:${model || DEFAULT_MODEL}`
      return merged
    }
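The merge itself is delegated to a model, but the stated rules (FAIL if either reviewer fails, take the lower score) can be sketched deterministically. The types, helper name, and CONDITIONAL_PASS ordering below are illustrative assumptions, not part of the actual source:

```typescript
type Verdict = 'PASS' | 'FAIL' | 'CONDITIONAL_PASS'

interface MiniReview {
  verdict: Verdict
  score: number
}

// Illustrative reduction of the merge rules from the prompt above:
// a single FAIL dominates, and the merged score is the pessimistic one.
// Treating CONDITIONAL_PASS as weaker than PASS is an assumption.
function mergeVerdicts(a: MiniReview, b: MiniReview): MiniReview {
  return {
    verdict:
      a.verdict === 'FAIL' || b.verdict === 'FAIL'
        ? 'FAIL'
        : a.verdict === 'CONDITIONAL_PASS' || b.verdict === 'CONDITIONAL_PASS'
          ? 'CONDITIONAL_PASS'
          : 'PASS',
    score: Math.min(a.score, b.score),
  }
}

const merged = mergeVerdicts(
  { verdict: 'PASS', score: 85 },
  { verdict: 'FAIL', score: 40 },
)
```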
  • src/index.ts:66-86 (registration)
    Registration of the 'review_dual' MCP tool with Zod schema (output, criteria, review_type, model) and handler that calls reviewOutputDual.
    server.tool(
      'review_dual',
      'Dual adversarial review: two independent reviewers assess the output from different angles, then a merge agent combines their findings. Stricter than single review — if either reviewer finds a critical issue, the merged verdict is FAIL. Use for high-stakes outputs where quality is critical.',
      {
        output: z.string().max(100000).describe('The AI-generated output to review (max 100K chars)'),
        criteria: z.string().optional().describe('Custom review criteria'),
        review_type: z.string().optional().describe('Review category label'),
        model: z.string().optional().describe('Reviewer model ID (default: claude-sonnet-4-6)'),
      },
      safeAsyncTool(async ({ output, criteria, review_type, model }) => {
        if (!process.env.ANTHROPIC_API_KEY) {
          throw new Error('ANTHROPIC_API_KEY environment variable is required. Set it in your MCP server config.')
        }
        return await reviewOutputDual({
          output,
          criteria: criteria || undefined,
          reviewType: review_type || undefined,
          model: model || undefined,
        })
      })
    )
  • Type definitions for ReviewResult, ReviewIssue, and ChecklistItem used by the dual review flow.
    export interface ReviewResult {
      verdict: 'PASS' | 'FAIL' | 'CONDITIONAL_PASS'
      score: number
      issues: ReviewIssue[]
      checklist: ChecklistItem[]
      summary: string
      reviewer_model: string
    }
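The bullet above also mentions ReviewIssue and ChecklistItem, which are not shown. From their usage in validateChecklist further down, plausible shapes would be the following; the field names and severity levels are inferred, not confirmed by the source:

```typescript
// Inferred from validateChecklist: issues carry severity/category/description/
// suggestion, and checklist items carry a pass/fail status plus free-text evidence.
interface ReviewIssue {
  severity: 'critical' | 'high' | 'medium' | 'low' // severity set assumed
  category: string
  description: string
  suggestion: string
}

interface ChecklistItem {
  status: 'pass' | 'fail'
  criterion: string // hypothetical field naming the check being performed
  evidence: string
}

const item: ChecklistItem = {
  status: 'pass',
  criterion: 'Handles empty input',
  evidence: 'Returns early when the list is empty',
}
```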
  • buildReviewPrompt constructs the adversarial review prompt used by both single and dual review reviewers.
    export function buildReviewPrompt(output: string, criteria?: string, reviewType?: string): string {
      let prompt = `You are an independent, adversarial quality reviewer. Your job is to find problems.
    Assume the author may have made mistakes, taken shortcuts, or missed edge cases.
    Do NOT give the benefit of the doubt. Be thorough and critical.
    
    IMPORTANT RULES:
    1. Every checklist item MUST have specific evidence (a quote or concrete observation).
    2. If you cannot find evidence for a PASS item, mark it as FAIL.
    3. A single critical issue means the overall verdict MUST be FAIL.
    4. Score must reflect the issues found: critical = max 30, high = max 60.
    5. Do not be impressed by length or formatting — judge substance.
    
    `
    
      if (criteria) {
        prompt += `REVIEW CRITERIA:\n${criteria}\n\n`
      }
    
      if (reviewType) {
        prompt += `REVIEW TYPE: ${reviewType}\n\n`
      }
    
      prompt += `OUTPUT TO REVIEW:
    ---
    ${output}
    ---
    
    Respond in this exact JSON format (no other text):
    ${REVIEW_JSON_TEMPLATE}`
    
      return prompt
    }
  • validateChecklist validates checklist evidence and can downgrade verdict if too many PASS items lack evidence.
    export function validateChecklist(result: ReviewResult): void {
      let downgraded = false
      for (const item of result.checklist) {
        if (item.status === 'pass' && (!item.evidence || item.evidence.trim().length < 5)) {
          item.status = 'fail'
          downgraded = true
        }
      }
    
      if (downgraded) {
        const failCount = result.checklist.filter(i => i.status === 'fail').length
        const total = result.checklist.length
        if (total > 0 && failCount / total > 0.3) {
          result.verdict = 'FAIL'
          result.score = Math.min(result.score, 50)
          result.issues.push({
            severity: 'high',
            category: 'anti_gaming',
            description: 'Multiple checklist items marked PASS without evidence were downgraded to FAIL',
            suggestion: 'Provide specific evidence for each checklist assertion',
          })
        }
      }
    }
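To see the downgrade rule in action, the following self-contained sketch restates the check with minimal local types (names simplified from the snippet above) and runs it on a checklist where half the PASS items lack evidence:

```typescript
interface Item { status: 'pass' | 'fail'; evidence?: string }
interface Result { verdict: string; score: number; checklist: Item[] }

// Trimmed restatement of validateChecklist's rule, for demonstration:
// PASS items with missing or trivially short evidence are flipped to FAIL,
// and if more than 30% of items end up failing, the verdict is downgraded
// and the score is capped at 50.
function check(result: Result): void {
  let downgraded = false
  for (const item of result.checklist) {
    if (item.status === 'pass' && (!item.evidence || item.evidence.trim().length < 5)) {
      item.status = 'fail'
      downgraded = true
    }
  }
  if (downgraded) {
    const failCount = result.checklist.filter(i => i.status === 'fail').length
    const total = result.checklist.length
    if (total > 0 && failCount / total > 0.3) {
      result.verdict = 'FAIL'
      result.score = Math.min(result.score, 50)
    }
  }
}

const result: Result = {
  verdict: 'PASS',
  score: 90,
  checklist: [
    { status: 'pass', evidence: '' }, // no evidence: downgraded to fail
    { status: 'pass', evidence: 'Quote from line 3 of the output' },
  ],
}
check(result) // 1 of 2 items fail (50% > 30%): verdict FAIL, score capped
```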
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must cover behavior. It explains the dual review process and the merge verdict logic, but omits details like side effects, authentication needs, or output structure, leaving gaps for an agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first explains the process, the second provides usage guidance and the verdict rule. There are no extraneous words; it is highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, and no annotations, the description adequately explains the core functionality but does not cover output format, error handling, or prerequisites, leaving room for improvement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all parameters. The description adds no additional meaning beyond what is in the schema, so it does not improve understanding of parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool performs a dual adversarial review with two independent reviewers and a merge agent. It clearly distinguishes itself from the sibling tool 'review_output' by being stricter and specifying the verdict rule.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using this tool for high-stakes outputs and contrasts it with single review, but does not explicitly state when not to use it or list alternatives beyond the implied single review.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
