xcomet_evaluate
Analyze translation quality by scoring a translation from 0 to 1, detecting error spans with severity levels, and generating a human-readable summary, for evaluating machine or human translations.
Instructions
Evaluate the quality of a translation using the xCOMET model.
This tool analyzes a source text and its translation, providing:
- A quality score between 0 and 1 (higher is better)
- Detected error spans with severity levels (minor/major/critical)
- A human-readable quality summary
Args:
- source (string): Original source text to translate from
- translation (string): Translated text to evaluate
- reference (string, optional): Reference translation for comparison
- source_lang (string, optional): Source language code (ISO 639-1)
- target_lang (string, optional): Target language code (ISO 639-1)
- response_format ('json' | 'markdown'): Output format (default: 'json')
Returns: For JSON format, an object with:
- score (number): Quality score from 0 to 1
- errors (array): Detected error spans, each with text (string), start (number), end (number), and severity ("minor" | "major" | "critical")
- summary (string): Human-readable quality summary
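For illustration, a JSON response might look like the following; all field values are invented for the example and do not come from an actual evaluation run.

```json
{
  "score": 0.83,
  "errors": [
    {
      "text": "会議室",
      "start": 12,
      "end": 15,
      "severity": "minor"
    }
  ],
  "summary": "Good translation overall; one minor terminology issue detected."
}
```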
Examples:
- Evaluate EN→JA translation quality (a sample request follows this list)
- Check if MT output needs post-editing
- Compare translation against reference
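As a sketch of the first use case, a minimal request evaluating an EN→JA translation could look like this (the source and translation texts are invented for illustration):

```json
{
  "source": "The meeting has been moved to Friday.",
  "translation": "会議は金曜日に変更されました。",
  "source_lang": "en",
  "target_lang": "ja",
  "response_format": "json"
}
```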
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| source | Yes | Original source text | |
| translation | Yes | Translated text to evaluate | |
| reference | No | Optional reference translation for comparison | |
| source_lang | No | Source language code (ISO 639-1, e.g., 'en', 'ja') | |
| target_lang | No | Target language code (ISO 639-1, e.g., 'en', 'ja') | |
| response_format | No | Output format: 'json' for structured data or 'markdown' for human-readable text | json |
| use_gpu | No | Use GPU for inference (faster if available); defaults to CPU only | false |
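To show the optional parameters in the schema above, a reference-based evaluation requesting markdown output and GPU inference might be submitted like this (the texts are illustrative only):

```json
{
  "source": "Please restart the application.",
  "translation": "アプリケーションを再起動してください。",
  "reference": "アプリを再起動してください。",
  "source_lang": "en",
  "target_lang": "ja",
  "response_format": "markdown",
  "use_gpu": true
}
```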