judge_workplan

Compares code changes between two git refs against a workplan, evaluating implementation adherence, quality, and completeness, and suggesting improvements.

Instructions

Triggers an asynchronous code judgement comparing two git refs against a workplan.

This tool will:

  1. Create a sub-issue linked to the workplan immediately

  2. Launch a background AI process to analyze the code changes

  3. Update the sub-issue with the judgement once complete

The judgement will evaluate:

  • Whether the implementation follows the workplan

  • Code quality and completeness

  • Missing or incomplete items

  • Suggestions for improvement

Supports comparing:

  • Branches (e.g., feature-branch vs main)

  • Commits (e.g., abc123 vs def456)

  • PR changes (automatically uses PR's base and head)

Returns the sub-issue URL immediately.
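Because this is an MCP tool, it is invoked through the protocol's `tools/call` method rather than called directly. The sketch below builds such a request payload; the issue number and branch name are placeholder values, not real identifiers.

```python
import json

# Hypothetical JSON-RPC request an MCP client would send to invoke this tool.
# The "tools/call" method and params shape follow the MCP specification;
# argument values here are illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "judge_workplan",
        "arguments": {
            "issue_number": "42",          # workplan issue to judge against
            "base_ref": "main",            # diff base (the default)
            "head_ref": "feature-branch",  # diff head
        },
    },
}
print(json.dumps(request, indent=2))
```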

Input Schema

  • issue_number (required): the workplan issue number to judge against

  • base_ref (optional, default: main): base git reference for the diff

  • head_ref (optional, default: HEAD): head git reference for the diff

  • codebase_reasoning (optional, default: full): codebase analysis mode ("full", "lsp", "file_structure", or "none")

  • debug (optional, default: false): if true, adds a comment with the full prompt used for generation

  • disable_search_grounding (optional, default: false): if true, disables Google Search Grounding

  • subissue_to_update (optional): existing sub-issue number to update instead of creating a new one

  • pr_url (optional): GitHub PR URL whose diff should be judged instead of git refs
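The tool returns a JSON string immediately. On success it carries the sub-issue URL and number; when the diff is empty, it carries an `"error"` key instead (as the handler below shows). A caller might unpack it like this, with an illustrative URL:

```python
import json

# Parse the JSON string returned by judge_workplan. The URL and number
# here are placeholder values for illustration.
result = json.loads(
    '{"subissue_url": "https://github.com/owner/repo/issues/43",'
    ' "subissue_number": "43"}'
)
if "error" in result:
    # Empty-diff case: no judgement sub-issue was generated.
    print(result["error"])
else:
    print(result["subissue_url"], result["subissue_number"])
```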

Implementation Reference

  • Main MCP tool handler for 'judge_workplan'. Orchestrates fetching the workplan issue body, generating git diff between refs, creating a placeholder judgement sub-issue, and launching the asynchronous LLM judgement process via process_judgement_async.
    async def judge_workplan(
        ctx: Context,
        issue_number: str,
        base_ref: str = "main",
        head_ref: str = "HEAD",
        codebase_reasoning: str = "full",
        debug: bool = False,
        disable_search_grounding: bool = False,
        subissue_to_update: str | None = None,
        pr_url: str | None = None,
    ) -> str:
        """Triggers an asynchronous code judgement for changes against a workplan.
    
        Args:
            ctx: Server context.
            issue_number: The workplan issue number to judge against.
            base_ref: The base git reference (default: "main").
            head_ref: The head git reference (default: "HEAD").
            codebase_reasoning: Reasoning mode for codebase analysis:
                   - "full": Include complete file contents and full diff
                   - "lsp": Include function signatures and diff of changed functions
                   - "file_structure": Include only file structure and list of changed files
                   - "none": No codebase context, only diff summary
            debug: If True, adds a comment with the full prompt used for generation.
            disable_search_grounding: If True, disables Google Search Grounding.
    
        Returns:
            JSON string containing the sub-issue URL and number.
    
        Raises:
            YellhornMCPError: If judgement creation fails.
        """
        original_search_grounding = True
        try:
            repo_path: Path = ctx.request_context.lifespan_context["repo_path"]
            model = ctx.request_context.lifespan_context["model"]
            llm_manager = ctx.request_context.lifespan_context.get("llm_manager")
            reasoning_effort = ctx.request_context.lifespan_context.get("reasoning_effort")
    
            # Handle search grounding override if specified
            original_search_grounding = ctx.request_context.lifespan_context.get(
                "use_search_grounding", True
            )
            if disable_search_grounding:
                ctx.request_context.lifespan_context["use_search_grounding"] = False
                await ctx.log(
                    level="info",
                    message="Search grounding temporarily disabled for this request",
                )
    
            # Use default branch if base_ref is "main" but the repo uses "master"
            if base_ref == "main":
                default_branch = await get_default_branch(repo_path)
                if default_branch != "main":
                    await ctx.log(
                        level="info",
                        message=f"Using default branch '{default_branch}' instead of 'main'",
                    )
                    base_ref = default_branch
    
            # Check if issue_number is a PR URL
            if issue_number.startswith("http") and "/pull/" in issue_number:
                # This is a PR URL, we need to extract the diff and find the related workplan
                pr_diff = await get_github_pr_diff(repo_path, issue_number)
    
                # Extract PR number for finding related workplan
                import re
    
                pr_match = re.search(r"/pull/(\d+)", issue_number)
                if not pr_match:
                    raise YellhornMCPError(f"Invalid PR URL: {issue_number}")
    
                pr_number = pr_match.group(1)
    
                # Try to find workplan issue number in PR description or title
                # For now, we'll ask the user to provide the workplan issue number
                raise YellhornMCPError(
                    f"PR URL detected. Please provide the workplan issue number instead of PR URL. "
                    f"You can find the workplan issue number in the PR description."
                )
    
            # Fetch the workplan
            workplan = await get_issue_body(repo_path, issue_number)
    
            # Handle PR URL or git refs for diff generation
            if pr_url:
                # Use PR diff instead of git refs
                diff = await get_github_pr_diff(repo_path, pr_url)
                # For PR, use placeholder commit hashes
                base_commit_hash = "pr_base"
                head_commit_hash = "pr_head"
            else:
                # Resolve git references to commit hashes
                base_commit_hash = await run_git_command(
                    repo_path,
                    ["rev-parse", base_ref],
                    ctx.request_context.lifespan_context.get("git_command_func"),
                )
                head_commit_hash = await run_git_command(
                    repo_path,
                    ["rev-parse", head_ref],
                    ctx.request_context.lifespan_context.get("git_command_func"),
                )
                # Generate diff for review
                diff = await get_git_diff(
                    repo_path,
                    base_ref,
                    head_ref,
                    codebase_reasoning,
                    ctx.request_context.lifespan_context.get("git_command_func"),
                )
    
            # Check if diff is empty or only contains the header for file_structure mode
            is_empty = not diff.strip() or (
                codebase_reasoning in ["file_structure", "none"]
                and diff.strip() == f"Changed files between {base_ref} and {head_ref}:"
            )
    
            if is_empty:
                # No changes to judge
                return json.dumps(
                    {
                        "error": f"No changes found between {base_ref} and {head_ref}",
                        "base_commit": base_commit_hash,
                        "head_commit": head_commit_hash,
                    }
                )
    
            # Extract URLs from the workplan
            submitted_urls = extract_urls(workplan)
    
            # Create a placeholder sub-issue immediately
            submission_metadata = SubmissionMetadata(
                status="Generating judgement...",
                model_name=model,
                search_grounding_enabled=ctx.request_context.lifespan_context.get(
                    "use_search_grounding", False
                ),
                yellhorn_version=__version__,
                submitted_urls=submitted_urls if submitted_urls else None,
                codebase_reasoning_mode=codebase_reasoning,
                timestamp=datetime.now(timezone.utc),
            )
    
            submission_comment = format_submission_comment(submission_metadata)
            placeholder_body = f"Parent workplan: #{issue_number}\n\n## Status\nGenerating judgement...\n\n{submission_comment}"
            judgement_title = f"Judgement for #{issue_number}: {head_ref} vs {base_ref}"
    
            # Create or update the sub-issue
            if subissue_to_update:
                # Update existing subissue
                subissue_number = subissue_to_update
                subissue_url = f"https://github.com/{repo_path.name}/issues/{subissue_number}"
                await update_github_issue(repo_path, subissue_number, placeholder_body)
            else:
                # Create new sub-issue
                from yellhorn_mcp.integrations.github_integration import create_judgement_subissue
    
                subissue_url = await create_judgement_subissue(
                    repo_path, issue_number, judgement_title, placeholder_body
                )
    
                # Extract sub-issue number from URL
                import re
    
                issue_match = re.search(r"/issues/(\d+)", subissue_url)
                subissue_number = issue_match.group(1) if issue_match else None
    
            await ctx.log(
                level="info",
                message=f"Created judgement sub-issue: {subissue_url}",
            )
    
            # Launch background task to generate judgement
            await ctx.log(
                level="info",
                message=f"Launching background task to generate judgement with AI model {model}",
            )
    
            # Prepare metadata for async processing
            start_time = datetime.now(timezone.utc)
    
            asyncio.create_task(
                process_judgement_async(
                    repo_path,
                    llm_manager,
                    model,
                    workplan,
                    diff,
                    base_ref,
                    head_ref,
                    base_commit_hash,
                    head_commit_hash,
                    issue_number,
                    subissue_to_update=subissue_number,
                    debug=debug,
                    codebase_reasoning=codebase_reasoning,
                    disable_search_grounding=disable_search_grounding,
                    reasoning_effort=reasoning_effort,
                    _meta={
                        "original_search_grounding": original_search_grounding,
                        "start_time": start_time,
                        "submitted_urls": submitted_urls,
                    },
                    ctx=ctx,
                    github_command_func=ctx.request_context.lifespan_context.get("github_command_func"),
                    git_command_func=ctx.request_context.lifespan_context.get("git_command_func"),
                )
            )
    
            # Restore original search grounding setting if modified
            if disable_search_grounding:
                ctx.request_context.lifespan_context["use_search_grounding"] = original_search_grounding
    
            # Return the sub-issue URL and number as JSON
            return json.dumps({"subissue_url": subissue_url, "subissue_number": subissue_number})
    
        except Exception as e:
            # Restore original search grounding setting on error
            if disable_search_grounding:
                try:
                    ctx.request_context.lifespan_context["use_search_grounding"] = (
                        original_search_grounding
                    )
                except NameError:
                    pass  # original_search_grounding was not defined yet
            raise YellhornMCPError(f"Failed to create judgement: {str(e)}")
  • FastMCP tool registration decorator that binds the judge_workplan function to the tool name 'judge_workplan' with the full description; input parameters are inferred from the function signature.
    @mcp.tool(
        name="judge_workplan",
        description="""Triggers an asynchronous code judgement comparing two git refs against a workplan.
    
    This tool will:
    1. Create a sub-issue linked to the workplan immediately
    2. Launch a background AI process to analyze the code changes
    3. Update the sub-issue with the judgement once complete
    
    The judgement will evaluate:
    - Whether the implementation follows the workplan
    - Code quality and completeness
    - Missing or incomplete items
    - Suggestions for improvement
    
    Supports comparing:
    - Branches (e.g., feature-branch vs main)
    - Commits (e.g., abc123 vs def456)
    - PR changes (automatically uses PR's base and head)
    
    Returns the sub-issue URL immediately.""",
    )
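The decorator supplies only the name and description; the Input Schema above is derived from the function signature. The toy sketch below illustrates that idea (it is not the real FastMCP internals): a registry decorator that reads each parameter's default to decide whether it is required.

```python
import inspect

# Toy illustration of signature-based schema inference, NOT FastMCP's
# actual implementation: parameters without defaults become required,
# parameters with defaults record their default value.
def tool(name):
    registry = tool.registry = getattr(tool, "registry", {})

    def wrap(fn):
        params = inspect.signature(fn).parameters
        registry[name] = {
            p: ("required" if v.default is inspect.Parameter.empty else v.default)
            for p, v in params.items()
        }
        return fn

    return wrap

@tool("judge_workplan")
def judge_workplan(issue_number: str, base_ref: str = "main", head_ref: str = "HEAD"):
    ...

print(tool.registry["judge_workplan"])
# {'issue_number': 'required', 'base_ref': 'main', 'head_ref': 'HEAD'}
```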
  • Core asynchronous helper that performs the LLM call to judge the code diff against the workplan, formats the response with metadata, handles citations/search grounding, calculates costs, and updates the GitHub judgement sub-issue.
    async def process_judgement_async(
        repo_path: Path,
        llm_manager: LLMManager,
        model: str,
        workplan_content: str,
        diff_content: str,
        base_ref: str,
        head_ref: str,
        base_commit_hash: str,
        head_commit_hash: str,
        parent_workplan_issue_number: str,
        subissue_to_update: str | None = None,
        debug: bool = False,
        codebase_reasoning: str = "full",
        disable_search_grounding: bool = False,
        _meta: dict[str, object] | None = None,
        ctx: Context | None = None,
        github_command_func: Callable | None = None,
        git_command_func: Callable | None = None,
        reasoning_effort: ReasoningEffort | None = None,
    ) -> None:
        """Judge a code diff against a workplan asynchronously.
    
        Args:
            repo_path: Path to the repository.
            llm_manager: LLM Manager instance for API calls.
            model: Model name to use (Gemini or OpenAI).
            workplan_content: The original workplan content.
            diff_content: The code diff to judge.
            base_ref: Base reference name.
            head_ref: Head reference name.
            base_commit_hash: Base commit hash.
            head_commit_hash: Head commit hash.
            parent_workplan_issue_number: Parent workplan issue number.
            subissue_to_update: Optional existing sub-issue to update.
            debug: If True, add a comment with the full prompt.
            codebase_reasoning: Mode for codebase context.
            disable_search_grounding: If True, disables search grounding.
            _meta: Optional metadata from the caller.
            ctx: Optional context for logging.
            github_command_func: Optional GitHub command function (for mocking).
            git_command_func: Optional Git command function (for mocking).
            reasoning_effort: Optional reasoning effort to apply for supported models.
        """
        try:
    
            # Construct prompt
            prompt = f"""You are an expert software reviewer tasked with judging whether a code diff successfully implements a given workplan.
    
    # Original Workplan
    {workplan_content}
    
    # Code Diff
    {diff_content}
    
    # Task
    Review the code diff against the original workplan and provide a detailed judgement. Consider:
    
    1. **Completeness**: Does the diff implement all the steps and requirements outlined in the workplan?
    2. **Correctness**: Is the implementation technically correct and does it follow best practices?
    3. **Missing Elements**: What parts of the workplan, if any, were not addressed?
    4. **Additional Changes**: Were there any changes made that weren't part of the original workplan?
    5. **Quality**: Comment on code quality, testing, documentation, and any potential issues.
    
    The diff represents changes between '{base_ref}' and '{head_ref}'.
    
    Structure your response with these clear sections:
    
    ## Judgement Summary
    Provide a clear verdict: APPROVED, NEEDS_WORK, or INCOMPLETE, followed by a brief explanation.
    
    ## Implementation Analysis
    Detail what was successfully implemented from the workplan.
    
    ## Missing or Incomplete Items
    List specific items from the workplan that were not addressed or were only partially implemented.
    
    ## Code Quality Assessment
    Evaluate the quality of the implementation including:
    - Code style and consistency
    - Error handling
    - Test coverage
    - Documentation
    
    ## Recommendations
    Provide specific, actionable recommendations for improvement.
    
    ## References
    Extract any URLs mentioned in the workplan or that would be helpful for understanding the implementation and list them here. This ensures important links are preserved.
    
    IMPORTANT: Respond *only* with the Markdown content for the judgement. Do *not* wrap your entire response in a single Markdown code block (```). Start directly with the '## Judgement Summary' heading.
    """
            # Check if we should use search grounding
            use_search_grounding = not disable_search_grounding
            if _meta and "original_search_grounding" in _meta:
                use_search_grounding = (
                    _meta["original_search_grounding"] and not disable_search_grounding
                )
    
            # Prepare optional generation config for the LLM call
            generation_config = None
            is_openai_model = llm_manager._is_openai_model(model)
    
            # Handle search grounding for Gemini models
            if not is_openai_model and use_search_grounding:
                if ctx:
                    await ctx.log(
                        level="info", message=f"Attempting to enable search grounding for model {model}"
                    )
                try:
                    from google.genai.types import GenerateContentConfig
    
                    from yellhorn_mcp.utils.search_grounding_utils import _get_gemini_search_tools
    
                    search_tools = _get_gemini_search_tools(model)
                    if search_tools:
                        generation_config = GenerateContentConfig(tools=search_tools)
                        if ctx:
                            await ctx.log(
                                level="info", message=f"Search grounding enabled for model {model}"
                            )
                except ImportError:
                    if ctx:
                        await ctx.log(
                            level="warning",
                            message="GenerateContentConfig not available, skipping search grounding",
                        )
    
            # Call LLM through the manager with citation support
            effective_reasoning: ReasoningEffort | None = None
            if is_openai_model:
                # OpenAI models don't support citations
                if reasoning_effort is not None:
                    usage_result: UsageResult = await llm_manager.call_llm_with_usage(
                        prompt=prompt,
                        model=model,
                        temperature=0.0,
                        ctx=ctx,
                        generation_config=generation_config,
                        reasoning_effort=reasoning_effort,
                    )
                else:
                    usage_result = await llm_manager.call_llm_with_usage(
                        prompt=prompt,
                        model=model,
                        temperature=0.0,
                        ctx=ctx,
                        generation_config=generation_config,
                    )
                usage_metadata: UsageMetadata = usage_result["usage_metadata"]
                content_value = usage_result["content"]
                judgement_content = (
                    content_value if isinstance(content_value, str) else str(content_value)
                )
                effective_reasoning = usage_result.get("reasoning_effort")
                completion_metadata = CompletionMetadata(
                    model_name=model,
                    status="✅ Judgement generated successfully",
                    generation_time_seconds=0.0,  # Will be calculated below
                    input_tokens=usage_metadata.prompt_tokens,
                    output_tokens=usage_metadata.completion_tokens,
                    total_tokens=usage_metadata.total_tokens,
                    timestamp=datetime.now(timezone.utc),
                )
            else:
                # Gemini models - use citation-aware call
                if reasoning_effort is not None:
                    citation_result: CitationResult = await llm_manager.call_llm_with_citations(
                        prompt=prompt,
                        model=model,
                        temperature=0.0,
                        ctx=ctx,
                        generation_config=generation_config,
                        reasoning_effort=reasoning_effort,
                    )
                else:
                    citation_result = await llm_manager.call_llm_with_citations(
                        prompt=prompt,
                        model=model,
                        temperature=0.0,
                        ctx=ctx,
                        generation_config=generation_config,
                    )
    
                content_val = citation_result.get("content", "")
                judgement_content = content_val if isinstance(content_val, str) else str(content_val)
                usage_metadata = citation_result.get("usage_metadata", UsageMetadata())
    
                # Process citations if available
                grounding_metadata = citation_result.get("grounding_metadata")
                if grounding_metadata is not None:
                    from yellhorn_mcp.utils.search_grounding_utils import add_citations_from_metadata
    
                    judgement_content = add_citations_from_metadata(
                        judgement_content, cast(GroundingMetadata, grounding_metadata)
                    )
    
                # Create completion metadata
                if isinstance(grounding_metadata, GroundingMetadata):
                    sr_used = (
                        len(grounding_metadata.grounding_chunks)
                        if grounding_metadata.grounding_chunks is not None
                        else None
                    )
                else:
                    sr_used = None
                effective_reasoning = None
    
                completion_metadata = CompletionMetadata(
                    model_name=model,
                    status="✅ Judgement generated successfully",
                    generation_time_seconds=0.0,  # Will be calculated below
                    input_tokens=usage_metadata.prompt_tokens,
                    output_tokens=usage_metadata.completion_tokens,
                    total_tokens=usage_metadata.total_tokens,
                    search_results_used=sr_used,
                    timestamp=datetime.now(timezone.utc),
                )
    
            if not judgement_content:
                api_name = "OpenAI" if is_openai_model else "Gemini"
                raise YellhornMCPError(
                    f"Failed to generate judgement: Received an empty response from {api_name} API."
                )
    
            # Calculate generation time if we have metadata
            if (
                completion_metadata
                and _meta
                and "start_time" in _meta
                and isinstance(_meta["start_time"], datetime)
            ):
                generation_time = (datetime.now(timezone.utc) - _meta["start_time"]).total_seconds()
                completion_metadata.generation_time_seconds = generation_time
                completion_metadata.timestamp = datetime.now(timezone.utc)
    
            # Calculate cost if we have token counts
            if (
                completion_metadata
                and completion_metadata.input_tokens
                and completion_metadata.output_tokens
            ):
                completion_metadata.estimated_cost = calculate_cost(
                    model,
                    int(completion_metadata.input_tokens or 0),
                    int(completion_metadata.output_tokens or 0),
                    effective_reasoning.value if effective_reasoning else None,
                )
    
            # Add context size
            if completion_metadata:
                completion_metadata.context_size_chars = len(prompt)
    
            # Construct metadata section for the final body
            metadata_section = f"""## Comparison Metadata
    - **Workplan Issue**: `#{parent_workplan_issue_number}`
    - **Base Ref**: `{base_ref}` (Commit: `{base_commit_hash}`)
    - **Head Ref**: `{head_ref}` (Commit: `{head_commit_hash}`)
    - **Codebase Reasoning Mode**: `{codebase_reasoning}`
    - **AI Model**: `{model}`
    
    """
    
            # Add parent issue link at the top
            parent_link = f"Parent workplan: #{parent_workplan_issue_number}\n\n"
    
            # Construct the full body (no metrics in body)
            full_body = f"{parent_link}{metadata_section}{judgement_content}"
    
            # Construct title
            judgement_title = f"Judgement for #{parent_workplan_issue_number}: {head_ref} vs {base_ref}"
    
            # Create or update the sub-issue
            if subissue_to_update:
                # Update existing issue
                await update_github_issue(
                    repo_path=repo_path,
                    issue_number=subissue_to_update,
                    title=judgement_title,
                    body=full_body,
                    github_command_func=github_command_func,
                )
    
                # Construct the URL for the updated issue
                repo_info = await run_git_command(
                    repo_path, ["remote", "get-url", "origin"], git_command_func
                )
                # Clean up the repo URL to get the proper format
                if repo_info.endswith(".git"):
                    repo_info = repo_info[:-4]
                if repo_info.startswith("git@github.com:"):
                    repo_info = repo_info.replace("git@github.com:", "https://github.com/")
    
                subissue_url = f"{repo_info}/issues/{subissue_to_update}"
            else:
                subissue_url = await create_judgement_subissue(
                    repo_path,
                    parent_workplan_issue_number,
                    judgement_title,
                    full_body,
                    github_command_func=github_command_func,
                )
    
            if ctx:
                await ctx.log(
                    level="info",
                    message=f"Successfully created judgement sub-issue: {subissue_url}",
                )
    
            # Add debug comment if requested
            if debug:
                # Extract issue number from URL
                issue_match = re.search(r"/issues/(\d+)", subissue_url)
                if issue_match:
                    sub_issue_number = issue_match.group(1)
                    debug_comment = f"<details>\n<summary>Debug: Full prompt used for generation</summary>\n\n```\n{prompt}\n```\n</details>"
                    await add_issue_comment(
                        repo_path,
                        sub_issue_number,
                        debug_comment,
                        github_command_func=github_command_func,
                    )
    
            # Add completion comment to the PARENT issue, not the sub-issue
            if completion_metadata and _meta:
                _urls_obj = _meta.get("submitted_urls")
                urls = (
                    [u for u in _urls_obj if isinstance(u, str)]
                    if isinstance(_urls_obj, list)
                    else None
                )
                _ts_obj = _meta.get("start_time")
                ts = _ts_obj if isinstance(_ts_obj, datetime) else datetime.now(timezone.utc)
                submission_metadata = SubmissionMetadata(
                    status="Generating judgement...",
                    model_name=model,
                    search_grounding_enabled=not disable_search_grounding,
                    yellhorn_version=__version__,
                    submitted_urls=urls,
                    codebase_reasoning_mode=codebase_reasoning,
                    timestamp=ts,
                )
    
                # Post completion comment to the sub-issue
                completion_comment = format_completion_comment(completion_metadata)
                # Extract sub-issue number from URL or use the provided one
                if subissue_to_update:
                    sub_issue_number = subissue_to_update
                else:
                    # Extract issue number from URL
                    issue_match = re.search(r"/issues/(\d+)", subissue_url)
                    if issue_match:
                        sub_issue_number = issue_match.group(1)
                    else:
                        # Fallback to parent if we can't extract sub-issue number
                        sub_issue_number = parent_workplan_issue_number
    
                await add_issue_comment(
                    repo_path,
                    sub_issue_number,
                    completion_comment,
                    github_command_func=github_command_func,
                )
    
        except Exception as e:
            error_msg = f"Error processing judgement: {str(e)}"
            if ctx:
                await ctx.log(level="error", message=error_msg)
    
            # Try to add error comment to parent issue
            try:
                error_comment = f"❌ **Error generating judgement**\n\n{str(e)}"
                await add_issue_comment(
                    repo_path,
                    parent_workplan_issue_number,
                    error_comment,
                    github_command_func=github_command_func,
                )
            except Exception:
                # If we can't even add a comment, just log
                if ctx:
                    await ctx.log(
                        level="error", message=f"Failed to add error comment to issue: {str(e)}"
                    )
    
            # Re-raise as YellhornMCPError to signal failure outward
            raise YellhornMCPError(error_msg)
