
review_output

Identify errors in AI-generated content through independent adversarial review. Returns structured PASS/FAIL verdicts, quality scores, and severity-categorized issues with evidence checklists.

Instructions

Adversarial quality review of any AI-generated output. An independent reviewer assumes the author made mistakes and actively looks for problems. Returns structured verdict (PASS/FAIL/CONDITIONAL_PASS), score (0-100), categorized issues with severity, and evidence-based checklist. Works for any output type: code, content, summaries, translations, data extraction, etc.
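The tool publishes no output schema, so the exact field names of the structured result are not documented anywhere in this listing. The sketch below is one plausible shape, inferred purely from the description (verdict, 0-100 score, categorized issues with severity, evidence checklist); every key name is an assumption, not the tool's confirmed API.

```python
# Hypothetical result shape inferred from the description alone.
# review_output publishes no output schema, so these key names are guesses.
hypothetical_result = {
    "verdict": "CONDITIONAL_PASS",  # one of PASS / FAIL / CONDITIONAL_PASS
    "score": 72,                    # 0-100 quality score
    "issues": [                     # categorized issues with severity
        {
            "severity": "major",    # assumed severity labels
            "category": "factual",
            "description": "Claim in paragraph 2 is unsupported.",
        },
    ],
    "checklist": [                  # evidence-based checklist
        {"item": "All claims traceable to source", "pass": False},
    ],
}
```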

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| output | Yes | The AI-generated output to review (max 100K chars) | |
| criteria | No | Custom review criteria — what specifically to check for | |
| review_type | No | Review category label (e.g., "code", "content", "factual", "translation") | |
| model | No | Reviewer model ID | claude-sonnet-4-6 |
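To make the schema concrete, here is a minimal sketch of an MCP `tools/call` request for this tool. The JSON-RPC envelope follows the standard MCP protocol shape; the argument values themselves are invented for illustration, and only `output` is required.

```python
import json

# Minimal sketch of an MCP tools/call request for review_output.
# The envelope is the standard MCP JSON-RPC shape; argument values
# are invented for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "review_output",
        "arguments": {
            "output": "Translated text to be reviewed...",  # required, max 100K chars
            "criteria": "Check terminology consistency",    # optional
            "review_type": "translation",                   # optional category label
            # "model" is optional and defaults to claude-sonnet-4-6
        },
    },
}
print(json.dumps(request, indent=2))
```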
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full disclosure burden. It successfully explains the adversarial methodology ('actively looks for problems') and the detailed return structure (verdict types, scoring range, categorized issues). It could improve by explicitly stating that this is a read-only analysis operation; see the annotations sketch below.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
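MCP's tool annotations give servers a structured way to make exactly that disclosure. A minimal sketch, assuming the standard ToolAnnotations hint fields from the MCP specification; attaching them to this tool is a suggestion, not something the server is shown to do.

```python
# Sketch of how review_output could declare read-only behavior via
# MCP tool annotations. readOnlyHint is a standard ToolAnnotations
# field from the MCP spec; wiring it to this server is hypothetical.
tool_declaration = {
    "name": "review_output",
    "description": "Adversarial quality review of any AI-generated output. ...",
    "annotations": {
        # True signals the tool only analyzes its input and
        # modifies no environment state.
        "readOnlyHint": True,
    },
}
```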

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four tightly constructed sentences with zero redundancy. Front-loaded with core purpose ('Adversarial quality review'), followed by methodology, return format specification, and applicability examples. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description comprehensively details return values (verdict, score, issues, checklist). With 100% parameter coverage and a clear behavioral description, it provides sufficient context for a four-parameter tool, though explicit safety declarations would make it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the structured documentation already defines all four parameters (output, criteria, review_type, model). The description adds the context that 'Works for any output type' relates to the review_type parameter, but otherwise relies on the schema for parameter semantics, which meets the baseline for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as performing 'Adversarial quality review of any AI-generated output' with a specific methodology (an independent reviewer who assumes mistakes). It effectively distinguishes itself from execute_service and list_services by scope, though it could explicitly contrast with its sibling review_dual on the single- versus dual-reviewer approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides implied usage guidance through examples ('Works for any output type: code, content, summaries...'), helping agents understand applicability. However, it lacks explicit guidance on when to use this tool versus review_dual, and omits prerequisites such as the 100K-character output limit that appears only in the schema.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
