compare_llm_responses
Compare Claude's and another LLM's responses to the same prompt in parallel, returning a structured analysis of their differences along with basic metrics (response lengths and success status).
Instructions
Compare how Claude and a second agent (defaults to Ollama) respond to the same prompt.
Sends the same prompt to both Claude (via `ctx.sample`) and the second agent in parallel, returning a structured comparison of their responses; a sketch of this pattern appears after the Raises section below.
Args:
- `prompt`: The prompt to send to both LLMs.
- `llm_model`: Which second model to use (default: `llama3.2:latest`).
- `temperature`: Temperature for both LLMs (default: 0.7).
- `max_tokens`: Maximum tokens for responses (default: 500).
Returns: A dictionary of the form:

```json
{
  "prompt": "original prompt text",
  "claude_response": {
    "text": "Claude's response...",
    "model": "claude-sonnet-4-5",
    "error": null
  },
  "alternative_response": {
    "text": "Ollama's response...",
    "model": "llama3.2:latest",
    "error": null
  },
  "comparison": {
    "claude_length": 150,
    "alternative_length": 142,
    "both_succeeded": true
  }
}
```
Raises:
- `ValueError`: if `prompt` is empty or invalid parameters are provided.
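For orientation, here is a minimal sketch of how a tool like this could be implemented with FastMCP. Only `ctx.sample`, the parameter names and defaults, and the return shape come from the description above; the Ollama HTTP call, the `query_ollama` helper, and the `"claude"` model label are illustrative assumptions, not the tool's actual source.

```python
import asyncio

import httpx
from fastmcp import Context, FastMCP

mcp = FastMCP("llm-compare")


async def query_ollama(
    prompt: str, model: str, temperature: float, max_tokens: int
) -> str:
    """Illustrative helper (an assumption, not part of the tool): call a local
    Ollama server's /api/generate endpoint and return the generated text."""
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            "http://localhost:11434/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "stream": False,
                "options": {"temperature": temperature, "num_predict": max_tokens},
            },
        )
        resp.raise_for_status()
        return resp.json()["response"]


@mcp.tool()
async def compare_llm_responses(
    prompt: str,
    ctx: Context,
    llm_model: str = "llama3.2:latest",
    temperature: float = 0.7,
    max_tokens: int = 500,
) -> dict:
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    # Dispatch both requests concurrently; return_exceptions=True captures a
    # failure from either provider without cancelling the other request.
    claude_result, ollama_result = await asyncio.gather(
        ctx.sample(prompt, temperature=temperature, max_tokens=max_tokens),
        query_ollama(prompt, llm_model, temperature, max_tokens),
        return_exceptions=True,
    )

    claude = {"text": None, "model": "claude", "error": None}
    if isinstance(claude_result, Exception):
        claude["error"] = str(claude_result)
    else:
        claude["text"] = claude_result.text  # assuming ctx.sample returns text content

    alternative = {"text": None, "model": llm_model, "error": None}
    if isinstance(ollama_result, Exception):
        alternative["error"] = str(ollama_result)
    else:
        alternative["text"] = ollama_result

    return {
        "prompt": prompt,
        "claude_response": claude,
        "alternative_response": alternative,
        "comparison": {
            "claude_length": len(claude["text"] or ""),
            "alternative_length": len(alternative["text"] or ""),
            "both_succeeded": claude["error"] is None and alternative["error"] is None,
        },
    }
```

Gathering with `return_exceptions=True` lets one provider fail without sinking the whole comparison, which matches the per-response `error` fields in the return value.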
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The prompt to send to both LLMs | |
| llm_model | No | Which second model to use | llama3.2:latest |
| temperature | No | Sampling temperature for both LLMs | 0.7 |
| max_tokens | No | Maximum tokens for each response | 500 |
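A hypothetical client-side invocation, assuming the server is exposed over FastMCP's HTTP transport (the URL and prompt text are placeholders; the argument names match the schema above):

```python
import asyncio

from fastmcp import Client


async def main() -> None:
    # Placeholder URL; point this at wherever the MCP server is actually served.
    async with Client("http://localhost:8000/mcp") as client:
        result = await client.call_tool(
            "compare_llm_responses",
            {
                "prompt": "Explain recursion in one paragraph.",
                "llm_model": "llama3.2:latest",
                "temperature": 0.7,
                "max_tokens": 500,
            },
        )
        print(result)


asyncio.run(main())
```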