xCOMET MCP Server
⚠️ This is an unofficial community project, not affiliated with Unbabel.
Translation quality evaluation MCP Server powered by xCOMET (eXplainable COMET).
🎯 Overview
xCOMET MCP Server provides AI agents with the ability to evaluate machine translation quality. It integrates with the xCOMET model from Unbabel to provide:
Quality Scoring: Scores between 0-1 indicating translation quality
Error Detection: Identifies error spans with severity levels (minor/major/critical)
Batch Processing: Evaluate multiple translation pairs efficiently (optimized single model load)
GPU Support: Optional GPU acceleration for faster inference
```mermaid
graph LR
    A[AI Agent] --> B[Node.js MCP Server]
    B --> C[Python FastAPI Server]
    C --> D[xCOMET Model<br/>Persistent in Memory]
    D --> C
    C --> B
    B --> A
    style D fill:#9f9
```
🔧 Prerequisites
Python Environment
Python 3.9 - 3.12 recommended (3.13+ is not yet supported by xCOMET dependencies)
xCOMET requires Python with several packages. We recommend using a virtual environment:
```bash
# If using uv (recommended - auto-downloads the correct Python version)
uv venv ~/.xcomet-venv --python 3.12
source ~/.xcomet-venv/bin/activate
uv pip install "unbabel-comet>=2.2.0" "fastapi>=0.100.0" "uvicorn>=0.23.0" "pydantic>=2.0.0"

# Or using standard venv (requires Python 3.9-3.12 already installed)
python3 -m venv ~/.xcomet-venv
source ~/.xcomet-venv/bin/activate  # Windows: ~/.xcomet-venv\Scripts\activate
pip install "unbabel-comet>=2.2.0" "fastapi>=0.100.0" "uvicorn>=0.23.0" "pydantic>=2.0.0"
```
Note: When using with Claude Desktop or other MCP hosts, set `XCOMET_PYTHON_PATH` to point to the venv Python (see Configuration).
Model Download
Important: XCOMET-XL and XCOMET-XXL are gated models on HuggingFace. You must:
1. Create a HuggingFace account
2. Visit Unbabel/XCOMET-XL and request access
3. Login via CLI:
```bash
source ~/.xcomet-venv/bin/activate
huggingface-cli login
```
`Unbabel/wmt22-comet-da` does not require authentication (but requires reference translations).
After authentication, download the model (~14GB for XL, ~42GB for XXL):
```bash
source ~/.xcomet-venv/bin/activate
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```
Node.js
Node.js >= 18.0.0
npm or yarn
📦 Installation
Note: If you just want to use xCOMET MCP Server, you do not need to clone this repository. Install the Python environment and model (see Prerequisites), then use `npx` (see Usage). The section below is for contributors and local development only.
Local Development
For contributors and local development:
```bash
# Clone the repository
git clone https://github.com/shuji-bonji/xcomet-mcp-server.git
cd xcomet-mcp-server

# Set up Python virtual environment and install dependencies
uv venv .venv --python 3.12  # or: python3 -m venv .venv
source .venv/bin/activate
pip install -r python/requirements.txt

# Install Node.js dependencies and build
npm install
npm run build
```
🚀 Usage
With Claude Desktop (npx)
Add to your Claude Desktop configuration (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PYTHON_PATH": "~/.xcomet-venv/bin/python3"
      }
    }
  }
}
```
Tip: If you installed Python packages system-wide or use pyenv, `XCOMET_PYTHON_PATH` may be omitted (auto-detection will find it). See Python Path Auto-Detection for details.
With Claude Code
```bash
claude mcp add xcomet --env XCOMET_PYTHON_PATH=~/.xcomet-venv/bin/python3 -- npx -y xcomet-mcp-server
```
Global Installation
If you prefer installing globally:
```bash
npm install -g xcomet-mcp-server
```
Then configure:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "xcomet-mcp-server",
      "env": {
        "XCOMET_PYTHON_PATH": "~/.xcomet-venv/bin/python3"
      }
    }
  }
}
```
Local Development Build
If you cloned and built the repository locally (see Installation):
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "node",
      "args": ["/path/to/xcomet-mcp-server/dist/index.js"],
      "env": {
        "XCOMET_PYTHON_PATH": "~/.xcomet-venv/bin/python3"
      }
    }
  }
}
```
HTTP Mode (Remote Access)
```bash
TRANSPORT=http PORT=3000 npm start
```
Then connect to http://localhost:3000/mcp
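MCP's HTTP transport carries JSON-RPC 2.0 messages, so a `tools/call` request body for `xcomet_evaluate` can be sketched as below. This is illustrative only; the `build_tool_call` helper is not part of this project, and real MCP clients handle the handshake and framing for you.

```python
import json

def build_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 tools/call request body for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Hypothetical payload you could POST to http://localhost:3000/mcp
body = build_tool_call(1, "xcomet_evaluate", {
    "source": "Hello",
    "translation": "こんにちは",
})
print(json.dumps(body, ensure_ascii=False))
```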
🛠️ Available Tools
xcomet_evaluate
Evaluate translation quality for a single source-translation pair.
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `source` | string | ✅ | Original source text |
| `translation` | string | ✅ | Translated text to evaluate |
| `reference` | string | ❌ | Reference translation |
| `source_lang` | string | ❌ | Source language code (ISO 639-1) |
| `target_lang` | string | ❌ | Target language code (ISO 639-1) |
| `format` | `"json"` \| `"markdown"` | ❌ | Output format (default: `"json"`) |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |
Example:
```json
{
  "source": "The quick brown fox jumps over the lazy dog.",
  "translation": "素早い茶色のキツネが怠惰な犬を飛び越える。",
  "source_lang": "en",
  "target_lang": "ja",
  "use_gpu": true
}
```
Response:
```json
{
  "score": 0.847,
  "errors": [],
  "summary": "Good quality (score: 0.847) with 0 error(s) detected."
}
```
xcomet_detect_errors
Focus on detecting and categorizing translation errors.
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `source` | string | ✅ | Original source text |
| `translation` | string | ✅ | Translated text to analyze |
| `reference` | string | ❌ | Reference translation |
| `min_severity` | `"minor"` \| `"major"` \| `"critical"` | ❌ | Minimum severity (default: `"minor"`) |
| `format` | `"json"` \| `"markdown"` | ❌ | Output format |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |
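The minimum-severity filter can also be reproduced client-side if you need to post-process results. A minimal sketch, assuming an ascending ordering minor < major < critical and an error-span shape like the one this tool returns (the exact dict keys here are assumptions):

```python
# Assumed ascending severity ordering: minor < major < critical
SEVERITY_RANK = {"minor": 0, "major": 1, "critical": 2}

def filter_errors(errors, min_severity="minor"):
    """Keep only error spans at or above the given severity."""
    threshold = SEVERITY_RANK[min_severity]
    return [e for e in errors if SEVERITY_RANK[e["severity"]] >= threshold]

errors = [
    {"text": "茶色の", "severity": "minor"},
    {"text": "怠惰な犬", "severity": "critical"},
]
print(filter_errors(errors, "major"))  # only the critical span remains
```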
xcomet_batch_evaluate
Evaluate multiple translation pairs in a single request.
Performance Note: With the persistent server architecture (v0.3.0+), the model stays loaded in memory. Batch evaluation processes all pairs efficiently without reloading the model.
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `pairs` | array | ✅ | Array of {source, translation, reference?} (max 500) |
| `source_lang` | string | ❌ | Source language code |
| `target_lang` | string | ❌ | Target language code |
| `format` | `"json"` \| `"markdown"` | ❌ | Output format |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |
| `batch_size` | number | ❌ | Batch size 1-64 (default: 8). Larger = faster but uses more memory |
Example:
```json
{
  "pairs": [
    {"source": "Hello", "translation": "こんにちは"},
    {"source": "Goodbye", "translation": "さようなら"}
  ],
  "use_gpu": true,
  "batch_size": 16
}
```
🔗 Integration with Other MCP Servers
xCOMET MCP Server is designed to work alongside other MCP servers for complete translation workflows:
```mermaid
sequenceDiagram
    participant Agent as AI Agent
    participant DeepL as DeepL MCP Server
    participant xCOMET as xCOMET MCP Server
    Agent->>DeepL: Translate text
    DeepL-->>Agent: Translation result
    Agent->>xCOMET: Evaluate quality
    xCOMET-->>Agent: Score + Errors
    Agent->>Agent: Decide: Accept or retry?
```
Recommended Workflow
1. Translate using DeepL MCP Server (official)
2. Evaluate using xCOMET MCP Server
3. Iterate if quality is below threshold
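The translate-evaluate-iterate loop can be sketched as follows. `translate` and `evaluate` are stand-in callables for the DeepL and xCOMET tool calls (in practice the agent performs these steps), and the 0.8 threshold is illustrative:

```python
def translate_until_good(text, translate, evaluate, threshold=0.8, max_tries=3):
    """Translate, score, and retry until the score clears the threshold.

    Returns the best (translation, score) pair seen across attempts.
    """
    best = None
    for attempt in range(max_tries):
        translation = translate(text, attempt)   # e.g. DeepL MCP tool call
        score = evaluate(text, translation)      # e.g. xcomet_evaluate
        if best is None or score > best[1]:
            best = (translation, score)
        if score >= threshold:
            break
    return best

# Stub tools for illustration: the second attempt clears the threshold.
result = translate_until_good(
    "Hello",
    translate=lambda text, attempt: f"draft-{attempt}",
    evaluate=lambda src, tr: 0.6 if tr == "draft-0" else 0.9,
)
print(result)  # ('draft-1', 0.9)
```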
Example: DeepL + xCOMET Integration
Configure both servers in Claude Desktop:
```json
{
  "mcpServers": {
    "deepl": {
      "command": "npx",
      "args": ["-y", "@anthropic/deepl-mcp-server"],
      "env": {
        "DEEPL_API_KEY": "your-api-key"
      }
    },
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PYTHON_PATH": "~/.xcomet-venv/bin/python3"
      }
    }
  }
}
```
Then ask Claude:
"Translate this text to Japanese using DeepL, then evaluate the translation quality with xCOMET. If the score is below 0.8, suggest improvements."
⚙️ Configuration
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `PORT` | `3000` | HTTP server port (when `TRANSPORT=http`) |
| `XCOMET_MODEL` | `Unbabel/XCOMET-XL` | xCOMET model to use |
| `XCOMET_PYTHON_PATH` | (auto-detect) | Python executable path (see below) |
| `XCOMET_PRELOAD` | `false` | Pre-load model at startup (v0.3.1+) |
| `XCOMET_DEBUG` | `false` | Enable verbose debug logging (v0.3.1+) |
Model Selection
Choose the model based on your quality/performance needs:
| Model | Parameters | Size | Memory | Reference | HF Auth | Quality | Use Case |
|-------|------------|------|--------|-----------|---------|---------|----------|
| `Unbabel/XCOMET-XL` | 3.5B | ~14GB | ~8-10GB | Optional | ✅ Required | ⭐⭐⭐⭐ | Recommended for most use cases |
| `Unbabel/XCOMET-XXL` | 10.7B | ~42GB | ~20GB | Optional | ✅ Required | ⭐⭐⭐⭐⭐ | Highest quality, requires more resources |
| `Unbabel/wmt22-comet-da` | 580M | ~2GB | ~3GB | Required | Not required | ⭐⭐⭐ | Lightweight, faster loading |
Important: XCOMET-XL and XCOMET-XXL are gated models on HuggingFace. Each model requires separate access approval. See Model Download for authentication setup.
Important: `wmt22-comet-da` requires a `reference` translation for evaluation. XCOMET models support referenceless evaluation.
Tip: If you experience memory issues or slow model loading, try `Unbabel/wmt22-comet-da` for faster performance with slightly lower accuracy (but remember to provide reference translations).
To use a different model, set the XCOMET_MODEL environment variable:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_MODEL": "Unbabel/XCOMET-XXL"
      }
    }
  }
}
```
Python Path Auto-Detection
The server automatically detects a Python environment with unbabel-comet installed, in this order:
1. `XCOMET_PYTHON_PATH` environment variable (if set)
2. pyenv versions (`~/.pyenv/versions/*/bin/python3`) - checks for the `comet` module
3. Homebrew Python (`/opt/homebrew/bin/python3`, `/usr/local/bin/python3`)
4. Fallback: the `python3` command
This ensures the server works correctly even when the MCP host (e.g., Claude Desktop) uses a different Python than your terminal.
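The detection order can be sketched as below. `resolve_python` and the `has_comet` predicate are hypothetical names for illustration, not the server's actual internals:

```python
import os

def resolve_python(candidates, has_comet, env=None):
    """Pick the first candidate interpreter that has unbabel-comet installed.

    Mirrors the order described above: explicit XCOMET_PYTHON_PATH first,
    then candidate paths (pyenv, Homebrew, ...), then a plain `python3`.
    """
    env = env if env is not None else os.environ
    explicit = env.get("XCOMET_PYTHON_PATH")
    if explicit:
        return explicit  # an explicit path always wins
    for path in candidates:
        if has_comet(path):  # predicate: can this Python import `comet`?
            return path
    return "python3"  # fallback: whatever is on PATH

chosen = resolve_python(
    ["/opt/homebrew/bin/python3", "/usr/local/bin/python3"],
    has_comet=lambda p: p == "/usr/local/bin/python3",
    env={},
)
print(chosen)  # /usr/local/bin/python3
```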
Example: Explicit Python path configuration
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PYTHON_PATH": "/Users/you/.pyenv/versions/3.11.0/bin/python3"
      }
    }
  }
}
```
⚡ Performance
Persistent Server Architecture (v0.3.0+)
The server uses a persistent Python FastAPI server that keeps the xCOMET model loaded in memory:
| Request | Time | Notes |
|---------|------|-------|
| First request | ~25-90s | Model loading (varies by model size) |
| Subsequent requests | ~500ms | Model already loaded |
This provides a 177x speedup for consecutive evaluations compared to reloading the model each time.
Eager Loading (v0.3.1+)
Enable XCOMET_PRELOAD=true to pre-load the model at server startup:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PRELOAD": "true"
      }
    }
  }
}
```
With preload enabled, all requests are fast (~500ms), including the first one.
Batch Processing Optimization
The xcomet_batch_evaluate tool processes all pairs with a single model load:
| Pairs | Estimated Time |
|-------|----------------|
| 10 | ~30-40 sec |
| 50 | ~1-1.5 min |
| 100 | ~2 min |
GPU vs CPU Performance
| Mode | 100 Pairs (Estimated) |
|------|-----------------------|
| CPU (batch_size=8) | ~2 min |
| GPU (batch_size=16) | ~20-30 sec |
Note: GPU requires CUDA-compatible hardware and PyTorch with CUDA support. If GPU is not available, set `use_gpu: false` (default).
Best Practices
1. Let the persistent server do its job
With v0.3.0+, the model stays in memory. Multiple xcomet_evaluate calls are now efficient:
```
# ✅ Fast: First call loads model, subsequent calls reuse it
xcomet_evaluate(pair1)  # ~90s (model loads)
xcomet_evaluate(pair2)  # ~500ms (model cached)
xcomet_evaluate(pair3)  # ~500ms (model cached)
```
2. For many pairs, use batch evaluation
```
# ✅ Even faster: Batch all pairs in one call
xcomet_batch_evaluate(allPairs)  # Optimal throughput
```
3. Memory considerations
XCOMET-XL requires ~8-10GB RAM
For large batches (500 pairs), ensure sufficient memory
If memory is limited, split into smaller batches (100-200 pairs)
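Splitting a large job into memory-friendly chunks is a one-liner; this sketch assumes you then pass each chunk to a separate xcomet_batch_evaluate call:

```python
def chunk_pairs(pairs, chunk_size=200):
    """Split a large list of translation pairs into fixed-size chunks."""
    return [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]

pairs = [{"source": f"s{i}", "translation": f"t{i}"} for i in range(500)]
chunks = chunk_pairs(pairs, 200)
print([len(c) for c in chunks])  # [200, 200, 100]
```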
Auto-Restart (v0.3.1+)
The server automatically recovers from failures:
Monitors health every 30 seconds
Restarts after 3 consecutive health check failures
Up to 3 restart attempts before giving up
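The recovery policy above (3 consecutive failures trigger a restart, up to 3 restarts before giving up) can be sketched as a small state machine. This is an illustrative sketch, not the server's actual implementation:

```python
class RestartPolicy:
    """Track consecutive health-check failures and restart attempts."""

    def __init__(self, max_failures=3, max_restarts=3):
        self.max_failures = max_failures
        self.max_restarts = max_restarts
        self.failures = 0
        self.restarts = 0

    def record(self, healthy):
        """Process one health check; return 'ok', 'restart', or 'give_up'."""
        if healthy:
            self.failures = 0  # any success resets the failure streak
            return "ok"
        self.failures += 1
        if self.failures < self.max_failures:
            return "ok"  # tolerate isolated failures
        self.failures = 0  # threshold reached: act, then reset the streak
        if self.restarts >= self.max_restarts:
            return "give_up"
        self.restarts += 1
        return "restart"

policy = RestartPolicy()
print([policy.record(h) for h in [True, False, False, False]])
# ['ok', 'ok', 'ok', 'restart']
```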
📊 Quality Score Interpretation
| Score Range | Quality | Recommendation |
|-------------|---------|----------------|
| 0.9 - 1.0 | Excellent | Ready for use |
| 0.7 - 0.9 | Good | Minor review recommended |
| 0.5 - 0.7 | Fair | Post-editing needed |
| 0.0 - 0.5 | Poor | Re-translation recommended |
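If you automate accept/reject decisions, the bands above translate directly into a threshold check; a minimal sketch:

```python
def interpret_score(score):
    """Map an xCOMET quality score (0-1) to the bands in the table above."""
    if score >= 0.9:
        return "Excellent: ready for use"
    if score >= 0.7:
        return "Good: minor review recommended"
    if score >= 0.5:
        return "Fair: post-editing needed"
    return "Poor: re-translation recommended"

print(interpret_score(0.847))  # Good: minor review recommended
```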
🔍 Troubleshooting
Common Issues
"No module named 'comet'"
Cause: Python environment without unbabel-comet installed.
Solution:
```bash
# Check which Python is being used
python3 -c "import sys; print(sys.executable)"

# If using a virtual environment, make sure it's activated
source .venv/bin/activate
pip install -r python/requirements.txt

# For MCP hosts (e.g., Claude Desktop), specify the venv Python path
export XCOMET_PYTHON_PATH=~/.xcomet-venv/bin/python3
```
Model download fails or times out
Cause: Large model files (~14GB for XL) require stable internet connection. XCOMET models also require HuggingFace authentication (see Model Download).
Solution:
```bash
# Login to HuggingFace (required for XCOMET-XL/XXL)
huggingface-cli login

# Pre-download the model manually
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```
GPU not detected
Cause: PyTorch not installed with CUDA support.
Solution:
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# If False, reinstall PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
Slow performance on Mac (MPS)
Cause: Mac MPS (Metal Performance Shaders) has compatibility issues with some operations.
Solution: The server automatically uses num_workers=1 for Mac MPS compatibility. For best performance on Mac, use CPU mode (use_gpu: false).
High memory usage or crashes
Cause: XCOMET-XL requires ~8-10GB RAM.
Solutions:
Use the persistent server (v0.3.0+): Model loads once and stays in memory, avoiding repeated memory spikes
Use a lighter model: Set `XCOMET_MODEL=Unbabel/wmt22-comet-da` for lower memory usage (~3GB)
Reduce batch size: For large batches, process in smaller chunks (100-200 pairs)
Close other applications: Free up RAM before running large evaluations
```bash
# Check available memory
free -h            # Linux
vm_stat | head -5  # macOS
```
VS Code or IDE crashes during evaluation
Cause: High memory usage from the xCOMET model (~8-10GB for XL).
Solution:
With v0.3.0+, the model loads once and stays in memory (no repeated loading)
If memory is still an issue, use a lighter model: `XCOMET_MODEL=Unbabel/wmt22-comet-da`
Close other memory-intensive applications before evaluation
Getting Help
If you encounter issues:
Check the GitHub Issues
Enable debug logging (check Claude Desktop's Developer Mode logs, or set `XCOMET_DEBUG=true`)
Open a new issue with:
Your OS and Python version
The error message
Your configuration (without sensitive data)
🧪 Development
```bash
# Install dependencies
npm install

# Build TypeScript
npm run build

# Watch mode
npm run dev

# Run tests
npm test

# Test with MCP Inspector
npm run inspect
```
📋 Changelog
See CHANGELOG.md for version history and updates.
📝 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
Unbabel for the xCOMET model
Anthropic for the MCP protocol
Model Context Protocol community