
MCP Code Sanitizer

compare_code

Compare code versions to evaluate change quality. Detect improvements or regressions to guide effective code review.

Instructions

Compares two versions of code and evaluates the quality of changes.

Input Schema

Name         Required  Description                                       Default
code_before  Yes       Old version of the code.
code_after   Yes       New version of the code.
language     No        Programming language.                             python
context      No        Description of what changed and why (optional).
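
An illustrative MCP tools/call payload for this schema (the argument values are invented for illustration; the name/arguments envelope follows the standard MCP tool-call shape):

    {
      "name": "compare_code",
      "arguments": {
        "code_before": "def add(a, b): return a+b",
        "code_after": "def add(a: int, b: int) -> int:\n    return a + b",
        "language": "python",
        "context": "Added type hints"
      }
    }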

Output Schema

Name    Required  Description  Default
result  Yes
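
The result is a JSON string following the format defined by the COMPARE prompt template under Implementation Reference below. An illustrative value (numbers and wording invented):

    {
      "verdict": "improvement",
      "summary": "Type hints were added without changing behavior.",
      "improvements": [{"title": "Type annotations", "description": "Parameters and the return value are now typed."}],
      "regressions": [],
      "neutral_changes": ["Signature formatting changed"],
      "score_before": 60,
      "score_after": 75,
      "recommendation": "merge"
    }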

Implementation Reference

  • The main handler function for the compare_code tool. It takes code_before, code_after, language (default 'python'), and optional context; it validates the inputs, checks the cache, builds a prompt, calls the Groq API via `call()`, parses the JSON response, caches it, and returns the result. (A direct-invocation sketch follows this list.)
    async def compare_code(
        code_before: str, code_after: str,
        language: str = "python", context: str = "",
    ) -> str:
        """
        Compares two versions of code and evaluates the quality of changes.
    
        Args:
            code_before: Old version of the code.
            code_after:  New version of the code.
            language:    Programming language.
            context:     Description of what changed and why (optional).
    
        Returns:
            JSON with improvements, regressions, verdict, recommendation.
        """
        if not code_before.strip() or not code_after.strip():
            return error_response("Both code_before and code_after must be provided.")
    
        key = cache.make_key("compare_code", code_before, code_after, language, context)
        if hit := cache.get(key):
            return hit
    
        context_block = f"\nChange context: {context}" if context else ""
        user = (
            f"Language: {language}{context_block}\n\n"
            f"OLD CODE:\n```{language}\n{code_before}\n```\n\n"
            f"NEW CODE:\n```{language}\n{code_after}\n```"
        )
    
        try:
            raw = await call(COMPARE, user)
            result = json.loads(raw)
        except httpx.HTTPStatusError as e:
            return error_response(f"Groq API error {e.response.status_code}", e.response.text[:300])
        except json.JSONDecodeError as e:
            return error_response("Groq returned invalid JSON", str(e))
        except ValueError as e:
            return error_response(str(e))
    
        out = json.dumps(result, ensure_ascii=False, indent=2)
        cache.set(key, out)
        return out
  • The COMPARE prompt template defines the expected JSON output schema for the compare_code tool: verdict, summary, improvements, regressions (with severity), neutral_changes, score_before, score_after, and recommendation.
    COMPARE = """\
    You are an experienced code reviewer. Compare the old and new versions of code.
    Return ONLY valid JSON, no Markdown.
    
    Format:
    {
      "verdict": "improvement|regression|neutral change",
      "summary": "One-sentence overall conclusion",
      "improvements": [{"title": "...", "description": "..."}],
      "regressions": [{"severity": "critical|high|medium|low", "title": "...", "description": "..."}],
      "neutral_changes": ["..."],
      "score_before": <0-100>,
      "score_after": <0-100>,
      "recommendation": "merge|request_changes|needs_discussion"
    }
    """
  • server.py:31-39 (registration)
    Registration of compare_code as an MCP tool via `mcp.tool()(compare_code)` in the FastMCP server. (A client-connection sketch follows this list.)
    mcp.tool()(compare_code)
    mcp.tool()(explain_code)
    mcp.tool()(generate_tests)
    mcp.tool()(analyze_file)
    mcp.tool()(cache_info)
    mcp.tool()(generate_report)
    
    if __name__ == "__main__":
        mcp.run(transport="stdio")
  • tools/__init__.py:2-12 (registration)
    Exports compare_code from tools/compare.py and includes it in __all__ for easy importing.
    from .compare   import compare_code
    from .explain   import explain_code
    from .tests     import generate_tests
    from .file_tool import analyze_file
    from .cache_tool import cache_info
    from .report    import generate_report
    
    __all__ = [
        "analyze_code", "compare_code", "explain_code",
        "generate_tests", "analyze_file", "cache_info", "generate_report",
    ]
  • The `call()` helper that sends requests to the Groq API (used by compare_code to invoke the LLM), together with `error_response()`, which formats error JSON.
    async def call(system: str, user: str, json_mode: bool = True) -> str:
        """Send a request to Groq. Automatically retries on 429 rate limit errors."""
        if not GROQ_API_KEY:
            raise ValueError("GROQ_API_KEY is not set. Add it to .env or set it as an environment variable.")
    
        payload: dict = {
            "model": GROQ_MODEL,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user",   "content": user},
            ],
            "temperature": 0.1,
            "max_tokens": MAX_TOKENS,
        }
        if json_mode:
            payload["response_format"] = {"type": "json_object"}
    
        response = None
        for attempt in range(MAX_RETRIES):
            async with httpx.AsyncClient(timeout=60.0) as client:
                response = await client.post(
                    GROQ_API_URL,
                    headers={"Authorization": f"Bearer {GROQ_API_KEY}", "Content-Type": "application/json"},
                    json=payload,
                )
    
            if response.status_code == 429 and attempt < MAX_RETRIES - 1:
                await asyncio.sleep(_parse_retry_after(response))
                continue
    
            response.raise_for_status()
            break
    
        raw: str = response.json()["choices"][0]["message"]["content"]
        return _strip_fences(raw)
    
    
    def _parse_retry_after(response: httpx.Response) -> float:
        try:
            msg = response.json().get("error", {}).get("message", "")
            match = re.search(r"try again in ([0-9.]+)s", msg)
            if match:
                return float(match.group(1)) + 1.0
        except Exception:
            pass
        return 20.0
    
    
    def _strip_fences(raw: str) -> str:
        raw = re.sub(r"^```(?:json)?\s*", "", raw.strip())
        return re.sub(r"\s*```$", "", raw)
    
    
    def error_response(msg: str, detail: str = "") -> str:
        return json.dumps({"error": msg, "detail": detail}, ensure_ascii=False)
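
A minimal sketch of invoking the compare_code handler directly, outside of MCP, assuming the export path shown in tools/__init__.py above. With GROQ_API_KEY unset, call() raises ValueError and the handler returns an error JSON rather than raising.

    import asyncio
    from tools import compare_code  # exported via tools/__init__.py above

    old = "def add(a, b): return a+b"
    new = "def add(a: int, b: int) -> int:\n    return a + b"

    # compare_code is async; drive it with asyncio.run for a one-off call.
    # The return value is the JSON string produced (and cached) by the handler.
    print(asyncio.run(compare_code(old, new, language="python", context="added type hints")))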
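
And a sketch of calling the registered tool over stdio, assuming the official MCP Python SDK client API (StdioServerParameters, stdio_client, ClientSession) and that server.py is the entry point shown above:

    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Launch server.py as a subprocess and talk to it over stdio.
        params = StdioServerParameters(command="python", args=["server.py"])
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "compare_code",
                    arguments={
                        "code_before": "def add(a, b): return a+b",
                        "code_after": "def add(a: int, b: int) -> int:\n    return a + b",
                    },
                )
                print(result.content)

    asyncio.run(main())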
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states that the tool 'evaluates the quality of changes,' which hints at read-only, assessment-only behavior. However, it does not detail the output format, performance characteristics, or any side effects. The output schema exists but is not referenced in the description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
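
For example, read-only behavior could be declared in tool metadata rather than prose alone. A sketch, assuming a recent MCP Python SDK in which mcp.types exposes ToolAnnotations and mcp.tool() accepts an annotations argument (not part of this project's current code; the server name is illustrative):

    from mcp.server.fastmcp import FastMCP
    from mcp.types import ToolAnnotations
    from tools import compare_code

    mcp = FastMCP("code-sanitizer")  # illustrative server name

    # Hint to clients that the tool only reads its inputs and mutates nothing.
    mcp.tool(annotations=ToolAnnotations(readOnlyHint=True, destructiveHint=False))(compare_code)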

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that communicates the core purpose and contains no extraneous information. While very brief, it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and four parameters, the description provides minimal context. It does not explain what the evaluation returns (e.g., a score or a comparison report) or how the optional context parameter influences results. The tool's purpose, comparing code, is clear, but the description is not complete enough for an agent to fully anticipate its behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
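
One way to close the gap (wording invented for illustration, not the project's actual text) is a docstring that names the returned fields and the role of context, mirroring the COMPARE output format:

    """
    Compare two versions of code and evaluate the quality of the changes.

    Returns a JSON string with: verdict (improvement | regression | neutral change),
    a one-sentence summary, lists of improvements and regressions (each regression
    carries a severity), neutral_changes, score_before and score_after (0-100), and
    a recommendation (merge | request_changes | needs_discussion). The optional
    context string is forwarded to the reviewer prompt to explain intent.
    """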

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, meaning all parameters are documented in the input schema. The description adds no meaning beyond the schema (e.g., 'two versions of code' simply maps to code_before and code_after). The baseline score of 3 applies because the description neither adds to the schema nor needs to compensate for missing schema details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
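
Parameter intent can also live in the schema itself. A sketch, assuming a FastMCP-style server where pydantic Field metadata on annotated parameters is surfaced in the generated input schema (not in the project's current code):

    from typing import Annotated
    from pydantic import Field

    async def compare_code(
        code_before: Annotated[str, Field(description="Old version of the code; must be non-empty.")],
        code_after: Annotated[str, Field(description="New version of the code; must be non-empty.")],
        language: Annotated[str, Field(description="Programming language of both snippets.")] = "python",
        context: Annotated[str, Field(description="Why the change was made; included in the review prompt.")] = "",
    ) -> str:
        ...  # body unchanged from the handler shown above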

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action (compare and evaluate) and its resource (two versions of code). It is distinguishable from sibling tools such as analyze_code (single-code analysis) and explain_code (code explanation). However, it could be more specific about what 'evaluates' entails, such as returning a quality score or identifying individual changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when comparing code versions, but it does not explicitly state when to prefer this tool over alternatives such as analyze_code (for a single piece of code) or generate_report (for generating reports). No exclusions or conditions are mentioned, leaving the agent to infer the right context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
