evaluate
Measures the quality of a quantized model using perplexity scoring to assess how much performance was lost during compression. Provides a quality rating and evaluation metadata for GGUF, GPTQ, or AWQ formats.
Instructions
Run perplexity evaluation on a quantized model.
Measures model quality after quantization using perplexity scoring; lower perplexity indicates better quality. The result includes a quality assessment (EXCELLENT/GOOD/FAIR/DEGRADED/POOR).
Args:
- model_path: Path to the quantized model file (GGUF) or directory (GPTQ/AWQ).
- format: Format of the quantized model. One of 'gguf', 'gptq', 'awq'.
- bits: Bit width used during quantization (for quality context).
Returns: Perplexity score, quality assessment, and evaluation metadata.
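A minimal example of calling the tool's handler directly. The model path and perplexity value are hypothetical; the result keys match those set in the implementation shown below:

```python
result = evaluate(model_path="models/llama-7b-q4.gguf", format="gguf", bits=4)

# A successful run returns a dict shaped roughly like:
# {
#     "success": True,
#     "perplexity": 8.7,          # hypothetical score
#     "quality": "EXCELLENT",     # ppl < 10
#     "assessment": "Minimal quality loss from quantization.",
#     "format": "gguf",
#     "bits": 4,
# }
```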
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model_path | Yes | Path to the quantized model file (GGUF) or directory (GPTQ/AWQ). | |
| format | No | Format of the quantized model. One of 'gguf', 'gptq', 'awq'. | gguf |
| bits | No | Bit width used during quantization (for quality context). | 4 |
Output Schema
No structured output schema is defined. The tool returns a dict: on failure, success=False with an error message; on success, the perplexity score plus the quality, assessment, format, and bits keys set by the handler below.
Implementation Reference
- mcp_turboquant/server.py:293-320 (handler): Tool registration for 'evaluate' in the MCP server.

```python
def evaluate(
    model_path: str,
    format: str = "gguf",
    bits: int = 4,
) -> dict[str, Any]:
    """Run perplexity evaluation on a quantized model.

    Measures model quality after quantization using perplexity scoring.
    Lower perplexity = better quality. Includes a quality assessment
    (EXCELLENT/GOOD/FAIR/DEGRADED/POOR).

    Args:
        model_path: Path to the quantized model file (GGUF) or directory (GPTQ/AWQ).
        format: Format of the quantized model. One of 'gguf', 'gptq', 'awq'.
        bits: Bit width used during quantization (for quality context).

    Returns:
        Perplexity score, quality assessment, and evaluation metadata.
    """
    if not os.path.exists(model_path):
        return {
            "success": False,
            "error": f"Model path does not exist: {model_path}",
        }
    return evaluate_model(model_path, format.lower(), bits)
```

- mcp_turboquant/evaluate.py:173-226 (handler): Core logic for evaluating model perplexity, delegating to evaluate_gguf or evaluate_transformers.

```python
def evaluate_model(
    model_path: str, fmt: str, bits: int
) -> dict[str, Any]:
    """Run perplexity evaluation on a quantized model.

    Args:
        model_path: Path to the quantized model file or directory.
        fmt: Format of the model ('gguf', 'gptq', or 'awq').
        bits: Bit width used for quantization.

    Returns:
        Result dict with perplexity score and quality assessment.
    """
    if fmt == "gguf":
        result = evaluate_gguf(model_path)
    elif fmt in ("gptq", "awq"):
        result = evaluate_transformers(model_path, fmt)
    else:
        return {
            "success": False,
            "error": f"Evaluation not supported for format '{fmt}'.",
        }

    # Add quality assessment if we got a perplexity score
    if result.get("success") and result.get("perplexity"):
        ppl = result["perplexity"]
        if ppl < 10:
            result["quality"] = "EXCELLENT"
            result["assessment"] = "Minimal quality loss from quantization."
        elif ppl < 20:
            result["quality"] = "GOOD"
            result["assessment"] = "Acceptable quality for most use cases."
        elif ppl < 50:
            result["quality"] = "FAIR"
            result["assessment"] = (
                f"Some quality degradation at {bits}-bit. "
                f"Consider using higher bits."
            )
        elif ppl < 100:
            result["quality"] = "DEGRADED"
            result["assessment"] = (
                f"Significant quality loss at {bits}-bit. "
                f"Recommend {min(bits + 1, 8)}-bit or higher."
            )
        else:
            result["quality"] = "POOR"
            result["assessment"] = (
                "Severe quality loss. Model may produce incoherent output. "
                "Use higher bit quantization."
            )

    result["format"] = fmt
    result["bits"] = bits
    return result
```
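The referenced evaluate_gguf and evaluate_transformers functions are not shown here. For context, perplexity is the exponential of the mean per-token negative log-likelihood. Below is a minimal sketch of how such a score is typically computed with a Hugging Face causal LM; this illustrates the general technique only and is not the project's implementation (the function name and chunking strategy are assumptions):

```python
import math

import torch


def perplexity_sketch(model, tokenizer, text: str, max_len: int = 512) -> float:
    """Illustrative perplexity: exp of the mean per-token negative log-likelihood."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    total_nll, total_tokens = 0.0, 0
    for start in range(0, input_ids.size(1), max_len):
        chunk = input_ids[:, start : start + max_len]
        if chunk.size(1) < 2:  # need at least one predicted token
            break
        with torch.no_grad():
            # With labels=chunk, Hugging Face causal LMs return the mean
            # cross-entropy over the chunk's shifted target tokens.
            loss = model(chunk, labels=chunk).loss
        n_predicted = chunk.size(1) - 1
        total_nll += loss.item() * n_predicted
        total_tokens += n_predicted
    if total_tokens == 0:
        raise ValueError("Text too short to score.")
    return math.exp(total_nll / total_tokens)
```

Under this metric, the thresholds in evaluate_model map a score of, say, 8.7 to EXCELLENT and 35 to FAIR.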