compare_llm_responses
Compare Claude's and another LLM's responses to the same prompt in parallel, returning a structured analysis of their differences along with basic metrics (response lengths and success status).
Instructions
Compare how Claude and a second agent (defaults to Ollama) respond to the same prompt.
Sends the same prompt to both Claude (via `ctx.sample`) and the second agent in parallel, returning a structured comparison of their responses; a sketch of this pattern appears after the Raises section below.
Args:
- `prompt`: The prompt to send to both LLMs.
- `llm_model`: Which second model to use (default: `llama3.2:latest`).
- `temperature`: Temperature for both LLMs (default: 0.7).
- `max_tokens`: Maximum tokens for responses (default: 500).
Returns: A dictionary of the form:

```json
{
  "prompt": "original prompt text",
  "claude_response": {
    "text": "Claude's response...",
    "model": "claude-sonnet-4-5",
    "error": null
  },
  "alternative_response": {
    "text": "Ollama's response...",
    "model": "llama3.2:latest",
    "error": null
  },
  "comparison": {
    "claude_length": 150,
    "alternative_length": 142,
    "both_succeeded": true
  }
}
```
Raises:
- `ValueError`: if `prompt` is empty or invalid parameters are provided.
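For orientation, here is a minimal sketch of how a tool like this could be implemented with FastMCP. Only `ctx.sample`, the parameter names and defaults, and the return shape come from the description above; the Ollama HTTP call, the `query_ollama` helper, and the `"claude"` model label are illustrative assumptions, not the tool's actual source.

```python
import asyncio

import httpx
from fastmcp import Context, FastMCP

mcp = FastMCP("llm-compare")


async def query_ollama(
    prompt: str, model: str, temperature: float, max_tokens: int
) -> str:
    """Illustrative helper (an assumption, not part of the tool): call a local
    Ollama server's /api/generate endpoint and return the generated text."""
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            "http://localhost:11434/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "stream": False,
                "options": {"temperature": temperature, "num_predict": max_tokens},
            },
        )
        resp.raise_for_status()
        return resp.json()["response"]


@mcp.tool()
async def compare_llm_responses(
    prompt: str,
    ctx: Context,
    llm_model: str = "llama3.2:latest",
    temperature: float = 0.7,
    max_tokens: int = 500,
) -> dict:
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    # Dispatch both requests concurrently; return_exceptions=True captures a
    # failure from either provider without cancelling the other request.
    claude_result, ollama_result = await asyncio.gather(
        ctx.sample(prompt, temperature=temperature, max_tokens=max_tokens),
        query_ollama(prompt, llm_model, temperature, max_tokens),
        return_exceptions=True,
    )

    claude = {"text": None, "model": "claude", "error": None}
    if isinstance(claude_result, Exception):
        claude["error"] = str(claude_result)
    else:
        claude["text"] = claude_result.text  # assuming ctx.sample returns text content

    alternative = {"text": None, "model": llm_model, "error": None}
    if isinstance(ollama_result, Exception):
        alternative["error"] = str(ollama_result)
    else:
        alternative["text"] = ollama_result

    return {
        "prompt": prompt,
        "claude_response": claude,
        "alternative_response": alternative,
        "comparison": {
            "claude_length": len(claude["text"] or ""),
            "alternative_length": len(alternative["text"] or ""),
            "both_succeeded": claude["error"] is None and alternative["error"] is None,
        },
    }
```

Gathering with `return_exceptions=True` lets one provider fail without sinking the whole comparison, which matches the per-response `error` fields in the return value.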
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The prompt to send to both LLMs | |
| llm_model | No | Which second model to use | llama3.2:latest |
| temperature | No | Sampling temperature for both LLMs | 0.7 |
| max_tokens | No | Maximum tokens for each response | 500 |
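A hypothetical client-side invocation, assuming the server is exposed over FastMCP's HTTP transport (the URL and prompt text are placeholders; the argument names match the schema above):

```python
import asyncio

from fastmcp import Client


async def main() -> None:
    # Placeholder URL; point this at wherever the MCP server is actually served.
    async with Client("http://localhost:8000/mcp") as client:
        result = await client.call_tool(
            "compare_llm_responses",
            {
                "prompt": "Explain recursion in one paragraph.",
                "llm_model": "llama3.2:latest",
                "temperature": 0.7,
                "max_tokens": 500,
            },
        )
        print(result)


asyncio.run(main())
```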