xcomet_batch_evaluate
Batch evaluate translation quality by processing multiple source-translation pairs to generate aggregate statistics and individual error analysis.
Instructions
Evaluates multiple source-translation pairs in a single call and returns aggregate statistics along with an individual result for each pair.
Args:
  - pairs (array): Array of translation pairs, each with:
    - source (string): Original source text
    - translation (string): Translated text
    - reference (string, optional): Reference translation
  - source_lang (string, optional): Source language code
  - target_lang (string, optional): Target language code
  - response_format ('json' | 'markdown'): Output format (default: 'json')
Returns:
  {
    "average_score": number,
    "total_pairs": number,
    "results": [
      {
        "index": number,
        "score": number,
        "error_count": number,
        "has_critical_errors": boolean
      }
    ],
    "summary": string
  }
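The return shape above can be consumed as in the following minimal sketch, which assumes the response arrives as JSON text. The function name `segments_needing_attention` and the 0.85 score threshold are hypothetical and chosen only for illustration; the sample response is made up to match the documented fields.

```python
import json


def segments_needing_attention(response_text, score_threshold=0.85):
    """Return the indices of results with critical errors or a low score.

    score_threshold is a hypothetical cutoff, not a value defined by the tool.
    """
    response = json.loads(response_text)
    flagged = []
    for result in response["results"]:
        if result["has_critical_errors"] or result["score"] < score_threshold:
            flagged.append(result["index"])
    return flagged


# Made-up response that follows the documented return shape.
sample = json.dumps({
    "average_score": 0.8,
    "total_pairs": 2,
    "results": [
        {"index": 0, "score": 0.95, "error_count": 0, "has_critical_errors": False},
        {"index": 1, "score": 0.65, "error_count": 2, "has_critical_errors": True},
    ],
    "summary": "illustrative only",
})

print(segments_needing_attention(sample))
```

Filtering on both `has_critical_errors` and `score` keeps segments that score well overall but still contain a critical error from slipping through.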
Examples:
  - Evaluate an entire translated document
  - Compare MT system quality across a test set
  - Identify segments needing attention
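For the document-level use case, the arguments can be assembled from two aligned lists of segments. This is a sketch under stated assumptions: `build_batch_arguments` is a hypothetical helper, the segments are invented, and how the resulting dictionary is sent to the tool depends on the MCP client in use, which is not shown here.

```python
import json


def build_batch_arguments(sources, translations, source_lang=None, target_lang=None):
    """Build an arguments dict for xcomet_batch_evaluate from aligned segments."""
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    arguments = {
        "pairs": [
            {"source": src, "translation": tgt}
            for src, tgt in zip(sources, translations)
        ],
        "response_format": "json",
    }
    # Language codes are optional, so include them only when provided.
    if source_lang is not None:
        arguments["source_lang"] = source_lang
    if target_lang is not None:
        arguments["target_lang"] = target_lang
    return arguments


# Invented example segments.
arguments = build_batch_arguments(
    ["Hello, world.", "How are you?"],
    ["Bonjour, le monde.", "Comment allez-vous ?"],
    source_lang="en",
    target_lang="fr",
)
print(json.dumps(arguments, indent=2, ensure_ascii=False))
```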
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pairs | Yes | Array of translation pairs to evaluate | |
| source_lang | No | Source language code | |
| target_lang | No | Target language code | |
| response_format | No | Output format | json |
| use_gpu | No | Use GPU for inference (faster if available); otherwise inference runs on CPU only | false |
| batch_size | No | Batch size for GPU processing (1-64). Larger is faster but uses more memory | 8 |
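The constraints in the table can be checked on the client side before a call is made. The sketch below is not the server's own validation: `validate_arguments` is a hypothetical helper, and treating `batch_size` as an integer and `source`/`translation` as required strings in each pair are assumptions drawn from the descriptions above.

```python
def validate_arguments(arguments):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    pairs = arguments.get("pairs")
    if not isinstance(pairs, list):
        problems.append("pairs is required and must be an array")
    else:
        for i, pair in enumerate(pairs):
            if not isinstance(pair, dict):
                problems.append(f"pairs[{i}] must be an object")
                continue
            # Assumption: source and translation are required in every pair.
            for key in ("source", "translation"):
                if not isinstance(pair.get(key), str):
                    problems.append(f"pairs[{i}].{key} must be a string")

    if arguments.get("response_format", "json") not in ("json", "markdown"):
        problems.append("response_format must be 'json' or 'markdown'")

    # Assumption: batch_size is an integer; bool is excluded explicitly
    # because Python treats True and False as integers.
    batch_size = arguments.get("batch_size", 8)
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or not 1 <= batch_size <= 64:
        problems.append("batch_size must be an integer from 1 to 64")

    if not isinstance(arguments.get("use_gpu", False), bool):
        problems.append("use_gpu must be a boolean")

    return problems


print(validate_arguments({"pairs": [{"source": "a", "translation": "b"}], "batch_size": 100}))
```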