Yellhorn MCP

curate_context

Analyzes your codebase to create a context whitelist file, reducing AI token usage by focusing only on relevant directories for specific tasks.

Instructions

Analyzes the codebase and creates a .yellhorncontext file listing directories to be included in AI context.

This tool helps optimize AI context by:

1. Analyzing your codebase structure
2. Understanding the task you want to accomplish
3. Creating a .yellhorncontext file that lists relevant directories
4. Subsequent workplan/judgement calls will only include files from these directories

The .yellhorncontext file acts as a whitelist - only files matching the patterns will be included. This significantly reduces token usage and improves AI focus on relevant code.

Example .yellhorncontext:
src/api/
src/models/
tests/api/
*.config.js
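
To make the whitelist idea concrete, here is a minimal sketch of how entries like the ones in the example could be matched against file paths. The matching rules (entries ending in `/` match by path prefix, all other entries are treated as glob patterns against the full path or the file name) and the function name `is_whitelisted` are assumptions for illustration only. Yellhorn's actual matcher is not shown on this page and may behave differently.

```python
from fnmatch import fnmatch


def is_whitelisted(path: str, patterns: list[str]) -> bool:
    """Return True if `path` matches any whitelist entry (assumed semantics)."""
    for pattern in patterns:
        if pattern.endswith("/"):
            # Assumption: a trailing slash means "everything under this directory".
            if path.startswith(pattern):
                return True
        else:
            # Assumption: other entries are globs, tried against the full
            # path and against the bare file name.
            file_name = path.rsplit("/", 1)[-1]
            if fnmatch(path, pattern) or fnmatch(file_name, pattern):
                return True
    return False


patterns = ["src/api/", "src/models/", "tests/api/", "*.config.js"]
for candidate in ["src/api/routes.py", "docs/intro.md", "webpack.config.js"]:
    print(candidate, is_whitelisted(candidate, patterns))
```

Under these assumed rules, only `docs/intro.md` is excluded from the three candidates.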

Input Schema


Name	Required	Default
`user_task`	Yes
`codebase_reasoning`	No	file_structure
`ignore_file_path`	No	.yellhornignore
`output_path`	No	.yellhorncontext
`disable_search_grounding`	No	false
`debug`	No	false
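
The table can be read as a set of call arguments. The sketch below assembles them in Python, using the defaults from the handler signature in the Implementation Reference. The helper name `build_curate_context_args` and the client-side validation of `codebase_reasoning` are hypothetical. The four accepted modes come from the handler's docstring, but the handler excerpt itself does not validate them.

```python
# Modes listed in the handler docstring.
VALID_REASONING_MODES = {"file_structure", "lsp", "full", "none"}


def build_curate_context_args(
    user_task: str,
    codebase_reasoning: str = "file_structure",
    ignore_file_path: str = ".yellhornignore",
    output_path: str = ".yellhorncontext",
    disable_search_grounding: bool = False,
    debug: bool = False,
) -> dict:
    """Build the argument payload for a curate_context call (hypothetical helper)."""
    if not user_task.strip():
        raise ValueError("user_task is required and must not be empty")
    if codebase_reasoning not in VALID_REASONING_MODES:
        raise ValueError(f"Unknown codebase_reasoning mode: {codebase_reasoning!r}")
    return {
        "user_task": user_task,
        "codebase_reasoning": codebase_reasoning,
        "ignore_file_path": ignore_file_path,
        "output_path": output_path,
        "disable_search_grounding": disable_search_grounding,
        "debug": debug,
    }


args = build_curate_context_args("Add rate limiting to the public API")
print(args)
```

Only `user_task` has to be supplied; everything else falls back to the defaults shown in the table.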

Output Schema


Name	Required	Description	Default
`result`	Yes
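
Judging by the handler in the Implementation Reference, `result` is a JSON-encoded string with `status` and `message` keys. The sketch below shows one way a caller might unpack it. The helper name `parse_curate_context_result` and the sample file path are illustrative assumptions, not part of Yellhorn.

```python
import json


def parse_curate_context_result(raw: str) -> tuple[str, str]:
    """Split the JSON string returned by curate_context into (status, message)."""
    payload = json.loads(raw)
    return payload["status"], payload["message"]


# Sample payload shaped like the handler's return value; the path is made up.
sample = json.dumps(
    {
        "status": "✅ Context curation completed successfully",
        "message": "Successfully created .yellhorncontext file at /repo/.yellhorncontext",
    }
)
status, message = parse_curate_context_result(sample)
print(status)
print(message)
```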

Implementation Reference

yellhorn_mcp/server.py:544-645 (handler)

MCP tool handler for 'curate_context': decorated with @mcp.tool(name="curate_context"), defines input parameters (schema), and implements the entry point by calling process_context_curation_async from context_processor.

@mcp.tool(
    name="curate_context",
    description="""Analyzes the codebase and creates a .yellhorncontext file listing directories to be included in AI context.

This tool helps optimize AI context by:
1. Analyzing your codebase structure
2. Understanding the task you want to accomplish
3. Creating a .yellhorncontext file that lists relevant directories
4. Subsequent workplan/judgement calls will only include files from these directories

The .yellhorncontext file acts as a whitelist - only files matching the patterns will be included.
This significantly reduces token usage and improves AI focus on relevant code.

Example .yellhorncontext:
src/api/
src/models/
tests/api/
*.config.js""",
)
async def curate_context(
    ctx: Context,
    user_task: str,
    codebase_reasoning: str = "file_structure",
    ignore_file_path: str = ".yellhornignore",
    output_path: str = ".yellhorncontext",
    disable_search_grounding: bool = False,
    debug: bool = False,
) -> str:
    """Analyzes codebase structure and creates a context curation file.

    Args:
        ctx: Server context.
        user_task: Description of the task the user wants to accomplish.
        codebase_reasoning: How to analyze the codebase:
               - "file_structure": Only directory structure (recommended, fastest)
               - "lsp": Include function signatures (slower)
               - "full": Include file contents (slowest, not recommended)
               - "none": No codebase analysis (not recommended)
        ignore_file_path: Path to the ignore file. Defaults to ".yellhornignore".
        output_path: Path where the .yellhorncontext file will be created.
        depth_limit: Maximum directory depth to analyze (0 means no limit).
        disable_search_grounding: If True, disables Google Search Grounding.
        debug: If True, logs the full prompt sent to the LLM.

    Returns:
        Success message with the created file path.

    Raises:
        YellhornMCPError: If context curation fails.
    """
    original_search_grounding = True
    try:
        # Get repository path from context
        repo_path: Path = ctx.request_context.lifespan_context["repo_path"]
        llm_manager: LLMManager = ctx.request_context.lifespan_context.get("llm_manager")
        model: str = ctx.request_context.lifespan_context["model"]

        if not llm_manager:
            raise YellhornMCPError("LLM Manager not initialized")

        # Handle search grounding override if specified
        original_search_grounding = ctx.request_context.lifespan_context.get(
            "use_search_grounding", True
        )
        if disable_search_grounding:
            ctx.request_context.lifespan_context["use_search_grounding"] = False
            await ctx.log(
                level="info",
                message="Search grounding temporarily disabled for this request",
            )

        # Delegate to the processor
        result = await process_context_curation_async(
            repo_path=repo_path,
            llm_manager=llm_manager,
            model=model,
            user_task=user_task,
            output_path=output_path,
            codebase_reasoning=codebase_reasoning,
            debug=debug,
            ctx=ctx,
        )

        # Restore original search grounding setting if modified
        if disable_search_grounding:
            ctx.request_context.lifespan_context["use_search_grounding"] = original_search_grounding

        return json.dumps(
            {"status": "✅ Context curation completed successfully", "message": result}
        )

    except Exception as e:
        # Restore original search grounding setting on error
        if disable_search_grounding:
            try:
                ctx.request_context.lifespan_context["use_search_grounding"] = (
                    original_search_grounding
                )
            except NameError:
                pass  # original_search_grounding was not defined yet
        raise YellhornMCPError(f"Failed to curate context: {str(e)}")
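
The handler restores `use_search_grounding` in two places, once on success and once in the `except` branch. The same save-and-restore pattern can be sketched as a context manager. This is an illustrative alternative, not the project's code: `temporary_setting` is a hypothetical helper and a plain dict stands in for `lifespan_context`.

```python
from contextlib import contextmanager
from typing import Any, Iterator, MutableMapping


@contextmanager
def temporary_setting(
    settings: MutableMapping[str, Any], key: str, value: Any, default: Any = None
) -> Iterator[None]:
    """Set settings[key] to value for the duration of the block, then restore it."""
    # Like the handler, fall back to a default if the key was never set.
    original = settings.get(key, default)
    settings[key] = value
    try:
        yield
    finally:
        # Runs on both normal exit and exceptions, so one restore site suffices.
        settings[key] = original


lifespan_context = {"use_search_grounding": True}
with temporary_setting(lifespan_context, "use_search_grounding", False, default=True):
    print("inside:", lifespan_context["use_search_grounding"])
print("after:", lifespan_context["use_search_grounding"])
```

The `finally` clause guarantees the restore even when the wrapped call raises, which removes the need for a separate restore in the error path.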

yellhorn_mcp/processors/context_processor.py:431-597 (handler)

Core handler logic delegated from server.py: process_context_curation_async builds a codebase snapshot, asks the LLM to select the relevant directories and files, then generates and saves the .yellhorncontext file.

async def process_context_curation_async(
    repo_path: Path,
    llm_manager: LLMManager,
    model: str,
    user_task: str,
    output_path: str = ".yellhorncontext",
    codebase_reasoning: str = "file_structure",
    disable_search_grounding: bool = False,
    debug: bool = False,
    ctx: Context | None = None,
) -> str:
    """Analyze codebase and create a context curation file.

    Args:
        repo_path: Path to the repository.
        llm_manager: LLM Manager instance.
        model: Model name to use.
        user_task: Description of the task to accomplish.
        output_path: Path where the .yellhorncontext file will be created.
        codebase_reasoning: How to analyze the codebase.
        ignore_file_path: Path to the ignore file.
        disable_search_grounding: Whether to disable search grounding.
        debug: Whether to log the full prompt sent to the LLM.
        ctx: Optional context for logging.

    Returns:
        Success message with the created file path.

    Raises:
        YellhornMCPError: If context curation fails.
    """
    # Check if LLM manager is provided
    if not llm_manager:
        raise YellhornMCPError("LLM Manager not initialized")

    try:
        # Store original search grounding setting
        original_search_grounding = None
        if disable_search_grounding and ctx:
            original_search_grounding = ctx.request_context.lifespan_context.get(
                "use_search_grounding", True
            )
            ctx.request_context.lifespan_context["use_search_grounding"] = False

        if ctx:
            await ctx.log(level="info", message="Starting context curation process")

        # Get git command function from context if available
        git_command_func = (
            ctx.request_context.lifespan_context.get("git_command_func") if ctx else None
        )

        # Determine the codebase reasoning mode to use
        codebase_reasoning_mode = (
            ctx.request_context.lifespan_context.get("codebase_reasoning", codebase_reasoning)
            if ctx
            else codebase_reasoning
        )

        # Delete existing .yellhorncontext file to prevent it from influencing file filtering
        context_file_path = repo_path / output_path
        if context_file_path.exists():
            try:
                context_file_path.unlink()
                if ctx:
                    await ctx.log(
                        level="info",
                        message=f"Deleted existing {output_path} file before analysis",
                    )
            except Exception as e:
                if ctx:
                    await ctx.log(
                        level="warning",
                        message=f"Could not delete existing {output_path} file: {e}",
                    )

        # Step 1: Build the codebase context
        directory_context, file_paths, all_dirs = await build_codebase_context(
            repo_path=repo_path,
            codebase_reasoning_mode=codebase_reasoning_mode,
            model=model,
            ctx=ctx,
            git_command_func=git_command_func,
        )

        # Log peek of directory context
        if ctx:
            await ctx.log(
                level="info",
                message=(
                    f"Directory context:\n{directory_context[:500]}..."
                    if len(directory_context) > 500
                    else f"Directory context:\n{directory_context}"
                ),
            )

        # Step 2: Analyze with LLM
        all_important_dirs = set()
        try:
            llm_result = await analyze_with_llm(
                llm_manager=llm_manager,
                model=model,
                directory_context=directory_context,
                user_task=user_task,
                debug=debug,
                ctx=ctx,
            )

            # Step 3: Parse LLM output for directories
            all_important_dirs = await parse_llm_directories(
                llm_result=llm_result,
                all_dirs=all_dirs,
                ctx=ctx,
            )

            # Log the directories found
            if ctx:
                dirs_str = ", ".join(sorted(list(all_important_dirs))[:5])
                if len(all_important_dirs) > 5:
                    dirs_str += f", ... ({len(all_important_dirs) - 5} more)"

                await ctx.log(
                    level="info",
                    message=f"Analysis complete, found {len(all_important_dirs)} important directories: {dirs_str}",
                )

        except Exception as e:
            if ctx:
                await ctx.log(
                    level="error",
                    message=f"Error during LLM analysis: {str(e)} ({type(e).__name__})",
                )
            # Fallback to all directories
            all_important_dirs = set(all_dirs)

        # If no directories identified, use all (already handled in parse_llm_directories)
        if not all_important_dirs:
            all_important_dirs = set(all_dirs)

        if ctx:
            await ctx.log(
                level="info",
                message=f"Processing complete, identified {len(all_important_dirs)} important directories",
            )

        # Step 4: Save the context file
        result = await save_context_file(
            repo_path=repo_path,
            output_path=output_path,
            user_task=user_task,
            all_important_dirs=all_important_dirs,
            file_paths=file_paths,
            ctx=ctx,
        )

        # Restore original search grounding setting if modified
        if disable_search_grounding and ctx:
            ctx.request_context.lifespan_context["use_search_grounding"] = original_search_grounding

        return result

    except Exception as e:
        error_message = f"Failed to generate .yellhorncontext file: {str(e)}"
        if ctx:
            await ctx.log(level="error", message=error_message)
        raise YellhornMCPError(error_message)
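
One behavior in this function is easy to miss: if the LLM analysis raises, or yields no directories, the processor falls back to every directory it found, so the resulting whitelist never ends up empty. A minimal sketch of that fallback in isolation follows. `select_directories` and the three stand-in analyzers are hypothetical names used only for this illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def select_directories(
    analyze: Callable[[], Awaitable[set[str]]], all_dirs: set[str]
) -> set[str]:
    """Return the analyzer's selection, or all directories if it fails or is empty."""
    try:
        selected = await analyze()
    except Exception:
        # Mirrors the processor: an analysis error falls back to everything.
        selected = set()
    return selected or set(all_dirs)


async def failing() -> set[str]:
    raise RuntimeError("LLM unavailable")


async def empty() -> set[str]:
    return set()


async def picky() -> set[str]:
    return {"src/api"}


all_dirs = {"src", "src/api", "docs"}
for analyzer in (failing, empty, picky):
    print(analyzer.__name__, sorted(asyncio.run(select_directories(analyzer, all_dirs))))
```

The trade-off is that a failed analysis silently produces a whitelist covering the whole repository, which keeps later calls working but gives no token savings.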

yellhorn_mcp/processors/context_processor.py:28-97 (helper)

build_codebase_context: Builds the initial codebase context (directory structure/snapshot) based on reasoning mode for LLM analysis.

async def build_codebase_context(
    repo_path: Path,
    codebase_reasoning_mode: str,
    model: str,
    ctx: Context | None = None,
    git_command_func=None,
) -> tuple[str, list[str], set[str]]:
    """Build the codebase context for analysis.

    Args:
        repo_path: Path to the repository.
        codebase_reasoning_mode: How to analyze the codebase.
        model: Model name for token counting.
        ctx: Optional context for logging.
        git_command_func: Optional git command function.

    Returns:
        Tuple of (directory_context, file_paths, all_dirs)
    """

    # Define log function for get_codebase_context
    def sync_context_log(msg: str):
        if ctx:
            asyncio.create_task(ctx.log(level="info", message=msg))

    if ctx:
        await ctx.log(
            level="info",
            message=f"Getting codebase context using {codebase_reasoning_mode} mode",
        )

    # Get the codebase context
    directory_context, context_file_paths = await get_codebase_context(
        repo_path=repo_path,
        reasoning_mode=codebase_reasoning_mode,
        log_function=sync_context_log if ctx else None,
        git_command_func=git_command_func,
    )

    # Log key metrics
    if ctx:
        token_counter = TokenCounter()
        token_count = token_counter.count_tokens(directory_context, model)
        file_count = len(directory_context.split("\n")) if directory_context else 0
        await ctx.log(
            level="info",
            message=f"Codebase context metrics: {file_count} files, {token_count} tokens based on ({model})",
        )

    # Extract directories from file paths
    all_dirs = set()
    for file_path in context_file_paths:
        parts = file_path.split("/")
        for i in range(1, len(parts)):
            dir_path = "/".join(parts[:i])
            if dir_path:
                all_dirs.add(dir_path)

    # Add root directory if there are root-level files
    if any("/" not in f for f in context_file_paths):
        all_dirs.add(".")

    if ctx:
        await ctx.log(
            level="info",
            message=f"Extracted {len(all_dirs)} directories from {len(context_file_paths)} filtered files",
        )

    return directory_context, context_file_paths, all_dirs
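
The directory-extraction loop at the end of build_codebase_context can be tried on its own. The sketch below lifts that loop into a standalone function (the name `extract_directories` is ours) and runs it on a small made-up file list.

```python
def extract_directories(file_paths: list[str]) -> set[str]:
    """Collect every ancestor directory of each path, plus "." for root-level files."""
    dirs: set[str] = set()
    for file_path in file_paths:
        parts = file_path.split("/")
        # Every proper prefix of the path is a directory: "a/b/c.py" -> "a", "a/b".
        for i in range(1, len(parts)):
            dir_path = "/".join(parts[:i])
            if dir_path:
                dirs.add(dir_path)
    # Files without a slash live in the repository root.
    if any("/" not in f for f in file_paths):
        dirs.add(".")
    return dirs


paths = ["src/api/routes.py", "tests/api/test_routes.py", "README.md"]
print(sorted(extract_directories(paths)))
# -> ['.', 'src', 'src/api', 'tests', 'tests/api']
```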

yellhorn_mcp/processors/context_processor.py:99-177 (helper)

analyze_with_llm: Prompts the LLM with codebase context and user task to identify important directories.

async def analyze_with_llm(
    llm_manager: LLMManager,
    model: str,
    directory_context: str,
    user_task: str,
    debug: bool = False,
    ctx: Context | None = None,
) -> str:
    """Analyze the codebase with LLM to identify important directories.

    Args:
        llm_manager: LLM Manager instance.
        model: Model name to use.
        directory_context: The codebase context string.
        user_task: Description of the task.
        debug: Whether to log debug information.
        ctx: Optional context for logging.

    Returns:
        LLM response containing directory analysis.
    """
    # Construct the system message
    system_message = f"""You are an expert software developer tasked with analyzing a codebase structure to identify important directories for building and executing a workplan.

Your goal is to identify the most important directories that should be included for the user's task.

Analyze the directories and identify the ones that:
1. Contain core application code relevant to the user's task
2. Likely contain important business logic
3. Would be essential for understanding the codebase architecture
4. Are needed to implement the requested task
5. Contain SDKs or libraries relevant to the user's task

Ignore directories that:
1. Contain only build artifacts or generated code
2. Store dependencies or vendor code
3. Contain temporary or cache files
4. Probably aren't relevant to the user's specific task

User Task: {user_task}

Return your analysis as a list of important directories, one per line, without any additional text or formatting as below:

```context
dir1/subdir1/
dir2/
dir3/subdir3/file3.filetype
```

Prefer to include directories, and not just file paths but include just file paths when appropriate. 
IMPORTANT: Select only the most relevant directories or files.
Don't include explanations for your choices, just return the list in the specified format."""

    prompt = f"""{directory_context}"""

    if ctx:
        await ctx.log(
            level="info",
            message=f"Analyzing directory structure with {model}",
        )

    # Debug logging
    if debug and ctx:
        await ctx.log(level="info", message=f"[DEBUG] System message: {system_message}")
        await ctx.log(
            level="info", message=f"[DEBUG] User prompt ({len(prompt)} chars): {prompt[:5000]}..."
        )

    # Call LLM
    result = await llm_manager.call_llm(
        model=model,
        prompt=prompt,
        system_message=system_message,
        temperature=0.0,
        ctx=ctx,
    )

    return result if isinstance(result, str) else str(result)
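
The system message asks the model to answer with a fenced block tagged `context`, one entry per line. The function that consumes this reply, parse_llm_directories, is not reproduced on this page, so the sketch below is only a guess at the first step such a parser would need: pulling the entries out of the fenced block. The name `extract_context_entries` and the fallback to the raw text are assumptions.

```python
import re

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3
CONTEXT_BLOCK = re.compile(
    re.escape(FENCE) + r"context\s*\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_context_entries(llm_result: str) -> list[str]:
    """Return the non-empty lines inside the context block of an LLM reply."""
    match = CONTEXT_BLOCK.search(llm_result)
    # Assumption: if the model omitted the fence, treat the whole reply as the list.
    body = match.group(1) if match else llm_result
    return [line.strip() for line in body.splitlines() if line.strip()]


reply = FENCE + "context\nsrc/api/\nsrc/models/\n" + FENCE
print(extract_context_entries(reply))
# -> ['src/api/', 'src/models/']
```

A real parser would presumably also check each entry against the known directories, since parse_llm_directories receives `all_dirs`, but that step is not shown here.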

yellhorn_mcp/processors/context_processor.py:304-429 (helper)

save_context_file: Formats and writes the selected directories/files to the .yellhorncontext file.

async def save_context_file(
    repo_path: Path,
    output_path: str,
    user_task: str,
    all_important_dirs: set[str],
    file_paths: list[str],
    ctx: Context | None = None,
) -> str:
    """Save the context file with important directories.

    Args:
        repo_path: Path to the repository.
        output_path: Path where the context file will be created.
        user_task: Description of the task.
        all_important_dirs: Set of important directories.
        file_paths: List of all file paths.
        ctx: Optional context for logging.

    Returns:
        Success message with the created file path.

    Raises:
        YellhornMCPError: If writing fails.
    """
    # Generate file content
    final_content = "# Yellhorn Context File - AI context optimization\n"
    final_content += f"# Generated by yellhorn-mcp curate_context tool\n"
    final_content += f"# Based on task: {user_task[:80]}\n\n"

    # Sort directories for consistent output
    # Separate files from directories
    important_dirs = set()
    important_files = set()

    for item in all_important_dirs:
        # Check if this looks like a file (has extension or is a dot file)
        if "/" in item:
            parts = item.split("/")
            last_part = parts[-1]
            is_file = (
                "." in last_part
                and not last_part.endswith("/")
                and (last_part.count(".") == 1 or last_part.startswith("."))
            )
        else:
            # Special case: "." alone means root directory, not a file
            if item == ".":
                is_file = False
            else:
                is_file = "." in item and (item.count(".") == 1 or item.startswith("."))

        if is_file:
            important_files.add(item)
        else:
            important_dirs.add(item)

    sorted_important_dirs = sorted(list(important_dirs))
    sorted_important_files = sorted(list(important_files))

    # Generate .yellhorncontext file content
    if sorted_important_dirs or sorted_important_files:
        final_content += "# Important directories to specifically include\n"
        dir_includes = []

        # Add specific files first
        for file_path in sorted_important_files:
            dir_includes.append(file_path)

        # Add directories
        for dir_path in sorted_important_dirs:
            # Check if directory has files
            has_files = False
            if dir_path == ".":
                has_files = any("/" not in f for f in file_paths)
            else:
                has_files = any(f.startswith(dir_path + "/") for f in file_paths)

            if dir_path == ".":
                if has_files:
                    dir_includes.append("./")
                else:
                    dir_includes.append("./**")
            else:
                if has_files:
                    dir_includes.append(f"{dir_path}/")
                else:
                    dir_includes.append(f"{dir_path}/**")

        final_content += "\n".join(dir_includes) + "\n\n"

    # Remove duplicate lines
    content_lines = final_content.splitlines()
    content_lines.reverse()

    seen_lines = set()
    unique_lines = []

    for line in content_lines:
        if line.strip() == "" or line.strip().startswith("#"):
            unique_lines.append(line)
            continue

        if line not in seen_lines:
            seen_lines.add(line)
            unique_lines.append(line)

    unique_lines.reverse()
    final_content = "\n".join(unique_lines)

    # Write the file
    output_file_path = repo_path / output_path
    try:
        with open(output_file_path, "w", encoding="utf-8") as f:
            f.write(final_content)

        if ctx:
            await ctx.log(
                level="info",
                message=f"Successfully wrote .yellhorncontext file to {output_file_path}",
            )

        return f"Successfully created .yellhorncontext file at {output_file_path} with {len(sorted_important_files)} files and {len(sorted_important_dirs)} directories."

    except Exception as e:
        raise YellhornMCPError(f"Failed to write .yellhorncontext file: {str(e)}")
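
The file-versus-directory check in save_context_file is a naming heuristic, and it has edge cases worth knowing. The sketch below is a condensed restatement of that check (the name `looks_like_file` is ours). It should be equivalent to the branches above, because the last path segment never contains a slash and an item without a slash is its own last segment.

```python
def looks_like_file(item: str) -> bool:
    """Condensed form of the is_file heuristic in save_context_file."""
    if item == ".":
        return False  # "." alone means the repository root
    last_part = item.split("/")[-1]
    return "." in last_part and (last_part.count(".") == 1 or last_part.startswith("."))


examples = ["src/app.py", "src/api", "src/api/", ".", ".github", "src/archive.tar.gz"]
for item in examples:
    print(f"{item!r}: {'file' if looks_like_file(item) else 'directory'}")
```

Two results may be surprising: `.github` is classified as a file because it contains exactly one dot, and `src/archive.tar.gz` is classified as a directory because it contains two dots and does not start with one.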

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it analyzes codebase structure, creates a whitelist file, reduces token usage, and improves AI focus. However, it lacks details on potential side effects (e.g., file overwriting), error handling, or performance characteristics like rate limits, leaving some gaps for a tool with significant impact.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by a bulleted list of steps and benefits. It's appropriately sized for a complex tool, though the example section could be slightly trimmed without losing clarity. Every sentence contributes to understanding, with minimal redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, no annotations) and the presence of an output schema (which handles return values), the description is mostly complete. It explains the tool's role, benefits, and output format (.yellhorncontext file). However, it lacks details on parameter interactions or edge cases, which could aid in more robust usage, leaving room for slight improvement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It mentions analyzing 'codebase structure' and 'the task you want to accomplish,' which loosely relates to parameters like user_task and codebase_reasoning, but doesn't explain specific semantics (e.g., what user_task entails or how ignore_file_path works). The example .yellhorncontext file adds some context but doesn't directly clarify parameters, resulting in marginal value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: 'Analyzes the codebase and creates a .yellhorncontext file listing directories to be included in AI context.' It uses specific verbs ('analyzes,' 'creates') and identifies the resource (codebase, .yellhorncontext file), clearly distinguishing it from sibling tools like create_workplan or judge_workplan, which focus on different aspects of the workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'This tool helps optimize AI context' and 'Subsequent workplan/judgement calls will only include files from these directories.' It implicitly distinguishes the tool from its siblings by highlighting its role in context curation before other steps. Although it doesn't explicitly name alternatives, the context makes the workflow clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/msnidal/yellhorn-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.