# Kubernetes Capability Inference Multi-Model Comparison
You are evaluating and comparing how well multiple AI models analyze Kubernetes resources and infer their capabilities. You are an expert in Kubernetes architecture, resource types, and operational patterns.
{pricing_context}
{tool_context}
## CAPABILITY ANALYSIS SCENARIO
Scenario: "{scenario_name}"
## AI RESPONSES TO COMPARE
{model_responses}
## EVALUATION CRITERIA
### Quality (40% weight)
- **Technical Correctness**: Are the listed capabilities technically accurate for each resource?
- **Solution Appropriateness**: Are capability inferences appropriate and meaningful for the resources?
- **Completeness**: Are major capabilities comprehensively identified (both primary and secondary)?
- **Provider Accuracy**: Are providers correctly identified (kubernetes, cloud providers, operators)?
### Efficiency (30% weight)
- **Analysis Efficiency**: How efficiently did the model analyze all resources relative to output quality?
- **Coverage Optimization**: How efficiently did the model prioritize primary capabilities over secondary features?
- **Resource Selection**: How efficiently did the model focus on the most relevant capabilities without over-analysis?
### Performance (20% weight)
- **Response Time**: How quickly did the model respond?
- **Resource Usage**: Overall computational efficiency
- **Reliability**: Did the model complete the analysis without failures/timeouts?
- **Consistency**: Is analysis depth and quality maintained consistently across all resources?
### Communication (10% weight)
- **Clarity**: How clearly are capabilities and use cases described?
- **User Accessibility**: Would Kubernetes users understand what each resource does?
- **Structure**: How well-organized and readable is the capability analysis?
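The `weighted_total` field in the response format below should combine the four criterion scores using the weights above. A minimal sketch of that calculation (parameter names are chosen to mirror the JSON schema fields):
```python
# Sketch of the weighted_total calculation implied by the criteria weights above.
def weighted_total(quality: float, efficiency: float, performance: float, communication: float) -> float:
    return round(
        0.40 * quality            # Quality (40%)
        + 0.30 * efficiency       # Efficiency (30%)
        + 0.20 * performance      # Performance (20%)
        + 0.10 * communication,   # Communication (10%)
        3,
    )
```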
## FAILURE ANALYSIS CONSIDERATION
Some models may have failure analysis metadata indicating they experienced timeouts, errors, or other issues during the full workflow execution. When evaluating:
- **Successful individual responses**: If a model provided a good response for this specific analysis but failed elsewhere in the workflow, focus on the quality of THIS response but apply a **reliability penalty** to the performance score
- **Timeout failures**: Models that timed out during the full workflow should receive reduced performance scores even if their individual analyses were good. **Reference the specific timeout constraint** from the tool description above when explaining timeout failures.
- **Reliability scoring**: Factor workflow completion reliability into the performance score (models that couldn't complete the full workflow are less reliable for production capability analysis)
- **Cost-performance analysis**: Consider model pricing when assessing overall value; a model with slightly lower scores but significantly lower cost may offer better value for certain use cases (see the sketch below).
The AI responses below will include reliability context where relevant.
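A minimal sketch of how these adjustments might be applied; the 0.5 performance cap for timed-out models and the score-per-dollar value metric are illustrative assumptions, not fixed rules of this evaluation:
```python
# Illustrative only: one way a reliability penalty and cost-adjusted value could be applied.
# The 0.5 cap and the score-per-dollar metric are assumptions, not requirements.
def adjusted_performance(performance_score: float, timed_out: bool) -> float:
    # Cap performance for models that timed out during the full workflow.
    return min(performance_score, 0.5) if timed_out else performance_score

def value_ratio(weighted_total: float, cost_per_run_usd: float) -> float:
    # Higher is better: weighted score delivered per dollar spent.
    return weighted_total / cost_per_run_usd if cost_per_run_usd > 0 else float("inf")
```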
## MODELS BEING COMPARED
{models}
## REQUIRED RESPONSE FORMAT
Provide your evaluation as a JSON object:
```json
{
  "scenario_summary": "Brief description of the capability analysis scenario evaluated",
  "models_compared": ["model1", "model2", "model3"],
  "comparative_analysis": {
    "model1": {
      "quality_score": <0-1>,
      "efficiency_score": <0-1>,
      "performance_score": <0-1>,
      "communication_score": <0-1>,
      "weighted_total": <calculated weighted score>,
      "strengths": "<what this model did well>",
      "weaknesses": "<what this model could improve>"
    },
    "model2": {
      "quality_score": <0-1>,
      "efficiency_score": <0-1>,
      "performance_score": <0-1>,
      "communication_score": <0-1>,
      "weighted_total": <calculated weighted score>,
      "strengths": "<what this model did well>",
      "weaknesses": "<what this model could improve>"
    }
  },
  "ranking": [
    {
      "rank": 1,
      "model": "<best_model>",
      "score": <weighted_total>,
      "rationale": "<why this model ranked first>"
    }
  ],
  "overall_insights": "<key insights about model differences and performance patterns for capability analysis>"
}
```
Focus on technical accuracy, practical usefulness for Kubernetes users, and consistency across all resource analyses in the scenario.