compare_code
Compare code versions to evaluate change quality. Detect improvements or regressions to guide effective code review.
Instructions
Compares two versions of code and evaluates the quality of changes.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| code_before | Yes | Old version of the code. | |
| code_after | Yes | New version of the code. | |
| language | No | Programming language. | python |
| context | No | Description of what changed and why. | |
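For orientation, a direct call to the handler might look like the sketch below. The before/after snippets and the context string are invented for illustration, and a working call assumes GROQ_API_KEY is configured; in normal use the tool is invoked through an MCP client rather than imported directly.
```python
import asyncio

from tools.compare import compare_code

# Hypothetical before/after snippets, invented for this example.
OLD = "def add(a, b): return a+b"
NEW = "def add(a: int, b: int) -> int:\n    return a + b"

result = asyncio.run(
    compare_code(OLD, NEW, language="python", context="Added type hints")
)
print(result)  # JSON string: verdict, improvements, regressions, scores, recommendation
```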
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | JSON string with verdict, summary, improvements, regressions, neutral_changes, score_before, score_after, and recommendation. | |
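`result` follows the COMPARE schema shown under Implementation Reference. Parsed, it might look like the sketch below (all values invented):
```python
# Hypothetical parsed `result`, shaped per the COMPARE prompt schema (values invented).
example = {
    "verdict": "improvement",
    "summary": "Type hints and input validation were added without changing behavior.",
    "improvements": [{"title": "Type hints", "description": "The signature is now annotated."}],
    "regressions": [],
    "neutral_changes": ["Whitespace reformatted"],
    "score_before": 62,
    "score_after": 78,
    "recommendation": "merge",
}
```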
Implementation Reference
- tools/compare.py:8-50 (handler): The main handler for the compare_code tool. It takes code_before, code_after, language (default "python"), and an optional context; validates the inputs, checks the cache, builds the prompt, calls the Groq API via `call()`, parses the JSON response, caches it, and returns the result.
````python
async def compare_code(
    code_before: str,
    code_after: str,
    language: str = "python",
    context: str = "",
) -> str:
    """
    Compares two versions of code and evaluates the quality of changes.

    Args:
        code_before: Old version of the code.
        code_after: New version of the code.
        language: Programming language.
        context: Description of what changed and why (optional).

    Returns:
        JSON with improvements, regressions, verdict, recommendation.
    """
    if not code_before.strip() or not code_after.strip():
        return error_response("Both code_before and code_after must be provided.")

    key = cache.make_key("compare_code", code_before, code_after, language, context)
    if hit := cache.get(key):
        return hit

    context_block = f"\nChange context: {context}" if context else ""
    user = (
        f"Language: {language}{context_block}\n\n"
        f"OLD CODE:\n```{language}\n{code_before}\n```\n\n"
        f"NEW CODE:\n```{language}\n{code_after}\n```"
    )

    try:
        raw = await call(COMPARE, user)
        result = json.loads(raw)
    except httpx.HTTPStatusError as e:
        return error_response(f"Groq API error {e.response.status_code}", e.response.text[:300])
    except json.JSONDecodeError as e:
        return error_response("Groq returned invalid JSON", str(e))
    except ValueError as e:
        return error_response(str(e))

    out = json.dumps(result, ensure_ascii=False, indent=2)
    cache.set(key, out)
    return out
````
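Note that the empty-input check runs before the cache lookup and the API call, so the validation path can be exercised without a GROQ_API_KEY. A minimal sketch, assuming the module is importable from the project root:
```python
import asyncio

from tools.compare import compare_code

# An empty code_before short-circuits into error_response() before any API call.
print(asyncio.run(compare_code("", "print('hi')")))
# -> {"error": "Both code_before and code_after must be provided.", "detail": ""}
```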
COMPARE = """\ You are an experienced code reviewer. Compare the old and new versions of code. Return ONLY valid JSON, no Markdown. Format: { "verdict": "improvement|regression|neutral change", "summary": "One-sentence overall conclusion", "improvements": [{"title": "...", "description": "..."}], "regressions": [{"severity": "critical|high|medium|low", "title": "...", "description": "..."}], "neutral_changes": ["..."], "score_before": <0-100>, "score_after": <0-100>, "recommendation": "merge|request_changes|needs_discussion" } """ - server.py:31-39 (registration)Registration of compare_code as an MCP tool via `mcp.tool()(compare_code)` in the FastMCP server.
- server.py:31-39 (registration): compare_code is registered as an MCP tool via `mcp.tool()(compare_code)` in the FastMCP server, alongside the other tools.
```python
mcp.tool()(compare_code)
mcp.tool()(explain_code)
mcp.tool()(generate_tests)
mcp.tool()(analyze_file)
mcp.tool()(cache_info)
mcp.tool()(generate_report)

if __name__ == "__main__":
    mcp.run(transport="stdio")
```
- tools/__init__.py:2-12 (registration): Re-exports compare_code from tools/compare.py and lists it in `__all__` so it can be imported from the package root.
```python
from .compare import compare_code
from .explain import explain_code
from .tests import generate_tests
from .file_tool import analyze_file
from .cache_tool import cache_info
from .report import generate_report

__all__ = [
    "analyze_code",
    "compare_code",
    "explain_code",
    "generate_tests",
    "analyze_file",
    "cache_info",
    "generate_report",
]
```
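With the re-export in place, call sites can import the tool from the package root:
```python
from tools import compare_code  # same object as tools.compare.compare_code
```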
- groq_client.py:13-67 (helper): The `call()` helper that compare_code uses to send requests to the Groq API, plus `error_response()` for formatting error JSON.
````python
async def call(system: str, user: str, json_mode: bool = True) -> str:
    """Send a request to Groq. Automatically retries on 429 rate limit errors."""
    if not GROQ_API_KEY:
        raise ValueError("GROQ_API_KEY is not set. Add it to .env or set it as an environment variable.")

    payload: dict = {
        "model": GROQ_MODEL,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0.1,
        "max_tokens": MAX_TOKENS,
    }
    if json_mode:
        payload["response_format"] = {"type": "json_object"}

    response = None
    for attempt in range(MAX_RETRIES):
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                GROQ_API_URL,
                headers={"Authorization": f"Bearer {GROQ_API_KEY}", "Content-Type": "application/json"},
                json=payload,
            )
        if response.status_code == 429 and attempt < MAX_RETRIES - 1:
            await asyncio.sleep(_parse_retry_after(response))
            continue
        response.raise_for_status()
        break

    raw: str = response.json()["choices"][0]["message"]["content"]
    return _strip_fences(raw)


def _parse_retry_after(response: httpx.Response) -> float:
    try:
        msg = response.json().get("error", {}).get("message", "")
        match = re.search(r"try again in ([0-9.]+)s", msg)
        if match:
            return float(match.group(1)) + 1.0
    except Exception:
        pass
    return 20.0


def _strip_fences(raw: str) -> str:
    raw = re.sub(r"^```(?:json)?\s*", "", raw.strip())
    return re.sub(r"\s*```$", "", raw)


def error_response(msg: str, detail: str = "") -> str:
    return json.dumps({"error": msg, "detail": detail}, ensure_ascii=False)
````
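The fence-stripping and error-formatting helpers are pure functions, so they are easy to sanity-check in isolation (a sketch, assuming groq_client is importable from the project root):
```python
from groq_client import _strip_fences, error_response

# _strip_fences() removes a leading ``` or ```json fence and a trailing ``` fence.
assert _strip_fences('```json\n{"ok": true}\n```') == '{"ok": true}'

# error_response() produces the uniform error JSON that the tools return on failure.
print(error_response("Groq API error 429", "rate limited"))
# -> {"error": "Groq API error 429", "detail": "rate limited"}
```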