curate_context

Analyzes your codebase to create a context whitelist file, reducing AI token usage by focusing only on relevant directories for specific tasks.

Instructions

Analyzes the codebase and creates a .yellhorncontext file listing directories to be included in AI context.

This tool helps optimize AI context by:

  1. Analyzing your codebase structure

  2. Understanding the task you want to accomplish

  3. Creating a .yellhorncontext file that lists relevant directories

  4. Restricting subsequent workplan/judgement calls to files from these directories

The .yellhorncontext file acts as a whitelist: only files matching the listed patterns are included. This significantly reduces token usage and keeps the AI focused on relevant code.

Example .yellhorncontext:

  src/api/
  src/models/
  tests/api/
  *.config.js
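
To illustrate the whitelist semantics, here is a minimal sketch of the matching, assuming directory entries act as path prefixes and glob entries follow fnmatch semantics; the actual matcher in yellhorn-mcp may differ in detail:

  import fnmatch

  def is_whitelisted(path: str, patterns: list[str]) -> bool:
      """Return True if path matches any whitelist pattern."""
      for pattern in patterns:
          # Directory patterns ("src/api/") match everything beneath them.
          if pattern.endswith("/") and path.startswith(pattern):
              return True
          # Glob patterns ("*.config.js") are matched against the full path.
          if fnmatch.fnmatch(path, pattern):
              return True
      return False

  patterns = ["src/api/", "src/models/", "tests/api/", "*.config.js"]
  assert is_whitelisted("src/api/handlers.py", patterns)
  assert is_whitelisted("webpack.config.js", patterns)
  assert not is_whitelisted("docs/readme.md", patterns)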

Input Schema

Name                       Required   Default            Description
user_task                  Yes        —                  The task the user wants to accomplish.
codebase_reasoning         No         file_structure     Analysis mode: "file_structure" (fastest), "lsp", "full", or "none".
ignore_file_path           No         .yellhornignore    Path to the ignore file.
output_path                No         .yellhorncontext   Where the context file will be created.
disable_search_grounding   No         false              If true, disables Google Search Grounding.
debug                      No         false              If true, logs the full prompt sent to the LLM.
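
As a usage sketch, an MCP client would invoke the tool with arguments matching the schema above; the exact client API depends on your MCP SDK, so the call shown in the comment is illustrative:

  # Illustrative arguments for a curate_context tool call.
  # Only user_task is required; the others fall back to the defaults above.
  arguments = {
      "user_task": "Add pagination to the REST API list endpoints",
      "codebase_reasoning": "file_structure",  # fastest mode, recommended
      "output_path": ".yellhorncontext",
      "debug": False,
  }

  # With the official MCP Python SDK, the request would be sent roughly as
  #   result = await session.call_tool("curate_context", arguments=arguments)
  # where `session` is an initialized mcp.ClientSession (setup omitted).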

Implementation Reference

  • MCP tool handler for curate_context: decorated with @mcp.tool(name="curate_context"), it defines the input schema and implements the entry point, delegating to process_context_curation_async from context_processor.
    @mcp.tool(
        name="curate_context",
        description="""Analyzes the codebase and creates a .yellhorncontext file listing directories to be included in AI context.

    This tool helps optimize AI context by:
    1. Analyzing your codebase structure
    2. Understanding the task you want to accomplish
    3. Creating a .yellhorncontext file that lists relevant directories
    4. Subsequent workplan/judgement calls will only include files from these directories

    The .yellhorncontext file acts as a whitelist - only files matching the patterns will be included. This significantly reduces token usage and improves AI focus on relevant code.

    Example .yellhorncontext:
    src/api/
    src/models/
    tests/api/
    *.config.js""",
    )
    async def curate_context(
        ctx: Context,
        user_task: str,
        codebase_reasoning: str = "file_structure",
        ignore_file_path: str = ".yellhornignore",
        output_path: str = ".yellhorncontext",
        disable_search_grounding: bool = False,
        debug: bool = False,
    ) -> str:
        """Analyzes codebase structure and creates a context curation file.

        Args:
            ctx: Server context.
            user_task: Description of the task the user wants to accomplish.
            codebase_reasoning: How to analyze the codebase:
                - "file_structure": Only directory structure (recommended, fastest)
                - "lsp": Include function signatures (slower)
                - "full": Include file contents (slowest, not recommended)
                - "none": No codebase analysis (not recommended)
            ignore_file_path: Path to the ignore file. Defaults to ".yellhornignore".
            output_path: Path where the .yellhorncontext file will be created.
            disable_search_grounding: If True, disables Google Search Grounding.
            debug: If True, logs the full prompt sent to the LLM.

        Returns:
            Success message with the created file path.

        Raises:
            YellhornMCPError: If context curation fails.
        """
        original_search_grounding = True
        try:
            # Get repository path from context
            repo_path: Path = ctx.request_context.lifespan_context["repo_path"]
            llm_manager: LLMManager = ctx.request_context.lifespan_context.get("llm_manager")
            model: str = ctx.request_context.lifespan_context["model"]

            if not llm_manager:
                raise YellhornMCPError("LLM Manager not initialized")

            # Handle search grounding override if specified
            original_search_grounding = ctx.request_context.lifespan_context.get(
                "use_search_grounding", True
            )
            if disable_search_grounding:
                ctx.request_context.lifespan_context["use_search_grounding"] = False
                await ctx.log(
                    level="info",
                    message="Search grounding temporarily disabled for this request",
                )

            # Delegate to the processor
            result = await process_context_curation_async(
                repo_path=repo_path,
                llm_manager=llm_manager,
                model=model,
                user_task=user_task,
                output_path=output_path,
                codebase_reasoning=codebase_reasoning,
                debug=debug,
                ctx=ctx,
            )

            # Restore original search grounding setting if modified
            if disable_search_grounding:
                ctx.request_context.lifespan_context["use_search_grounding"] = original_search_grounding

            return json.dumps(
                {"status": "✅ Context curation completed successfully", "message": result}
            )
        except Exception as e:
            # Restore original search grounding setting on error
            if disable_search_grounding:
                try:
                    ctx.request_context.lifespan_context["use_search_grounding"] = (
                        original_search_grounding
                    )
                except NameError:
                    pass  # original_search_grounding was not defined yet
            raise YellhornMCPError(f"Failed to curate context: {str(e)}")
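
    Note the symmetric error path: when disable_search_grounding is set, the handler restores the saved use_search_grounding value before wrapping the exception in YellhornMCPError, so a failed curation does not leave the session's grounding setting flipped.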
  • Core handler logic, delegated from server.py: process_context_curation_async snapshots the codebase, runs LLM analysis to select the relevant directories and files, and generates and saves the .yellhorncontext file.
    async def process_context_curation_async(
        repo_path: Path,
        llm_manager: LLMManager,
        model: str,
        user_task: str,
        output_path: str = ".yellhorncontext",
        codebase_reasoning: str = "file_structure",
        disable_search_grounding: bool = False,
        debug: bool = False,
        ctx: Context | None = None,
    ) -> str:
        """Analyze codebase and create a context curation file.

        Args:
            repo_path: Path to the repository.
            llm_manager: LLM Manager instance.
            model: Model name to use.
            user_task: Description of the task to accomplish.
            output_path: Path where the .yellhorncontext file will be created.
            codebase_reasoning: How to analyze the codebase.
            disable_search_grounding: Whether to disable search grounding.
            debug: Whether to log the full prompt sent to the LLM.
            ctx: Optional context for logging.

        Returns:
            Success message with the created file path.

        Raises:
            YellhornMCPError: If context curation fails.
        """
        # Check if LLM manager is provided
        if not llm_manager:
            raise YellhornMCPError("LLM Manager not initialized")

        try:
            # Store original search grounding setting
            original_search_grounding = None
            if disable_search_grounding and ctx:
                original_search_grounding = ctx.request_context.lifespan_context.get(
                    "use_search_grounding", True
                )
                ctx.request_context.lifespan_context["use_search_grounding"] = False

            if ctx:
                await ctx.log(level="info", message="Starting context curation process")

            # Get git command function from context if available
            git_command_func = (
                ctx.request_context.lifespan_context.get("git_command_func") if ctx else None
            )

            # Determine the codebase reasoning mode to use
            codebase_reasoning_mode = (
                ctx.request_context.lifespan_context.get("codebase_reasoning", codebase_reasoning)
                if ctx
                else codebase_reasoning
            )

            # Delete existing .yellhorncontext file to prevent it from influencing file filtering
            context_file_path = repo_path / output_path
            if context_file_path.exists():
                try:
                    context_file_path.unlink()
                    if ctx:
                        await ctx.log(
                            level="info",
                            message=f"Deleted existing {output_path} file before analysis",
                        )
                except Exception as e:
                    if ctx:
                        await ctx.log(
                            level="warning",
                            message=f"Could not delete existing {output_path} file: {e}",
                        )

            # Step 1: Build the codebase context
            directory_context, file_paths, all_dirs = await build_codebase_context(
                repo_path=repo_path,
                codebase_reasoning_mode=codebase_reasoning_mode,
                model=model,
                ctx=ctx,
                git_command_func=git_command_func,
            )

            # Log peek of directory context
            if ctx:
                await ctx.log(
                    level="info",
                    message=(
                        f"Directory context:\n{directory_context[:500]}..."
                        if len(directory_context) > 500
                        else f"Directory context:\n{directory_context}"
                    ),
                )

            # Step 2: Analyze with LLM
            all_important_dirs = set()
            try:
                llm_result = await analyze_with_llm(
                    llm_manager=llm_manager,
                    model=model,
                    directory_context=directory_context,
                    user_task=user_task,
                    debug=debug,
                    ctx=ctx,
                )

                # Step 3: Parse LLM output for directories
                all_important_dirs = await parse_llm_directories(
                    llm_result=llm_result,
                    all_dirs=all_dirs,
                    ctx=ctx,
                )

                # Log the directories found
                if ctx:
                    dirs_str = ", ".join(sorted(list(all_important_dirs))[:5])
                    if len(all_important_dirs) > 5:
                        dirs_str += f", ... ({len(all_important_dirs) - 5} more)"
                    await ctx.log(
                        level="info",
                        message=f"Analysis complete, found {len(all_important_dirs)} important directories: {dirs_str}",
                    )
            except Exception as e:
                if ctx:
                    await ctx.log(
                        level="error",
                        message=f"Error during LLM analysis: {str(e)} ({type(e).__name__})",
                    )
                # Fallback to all directories
                all_important_dirs = set(all_dirs)

            # If no directories identified, use all (already handled in parse_llm_directories)
            if not all_important_dirs:
                all_important_dirs = set(all_dirs)

            if ctx:
                await ctx.log(
                    level="info",
                    message=f"Processing complete, identified {len(all_important_dirs)} important directories",
                )

            # Step 4: Save the context file
            result = await save_context_file(
                repo_path=repo_path,
                output_path=output_path,
                user_task=user_task,
                all_important_dirs=all_important_dirs,
                file_paths=file_paths,
                ctx=ctx,
            )

            # Restore original search grounding setting if modified
            if disable_search_grounding and ctx:
                ctx.request_context.lifespan_context["use_search_grounding"] = original_search_grounding

            return result
        except Exception as e:
            error_message = f"Failed to generate .yellhorncontext file: {str(e)}"
            if ctx:
                await ctx.log(level="error", message=error_message)
            raise YellhornMCPError(error_message)
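
    If the LLM call or directory parsing fails, the function falls back to treating every discovered directory as important, so a curation failure degrades to an unfiltered context rather than an empty one.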
  • build_codebase_context: builds the initial codebase context (directory structure/snapshot) for LLM analysis, based on the selected reasoning mode.
    async def build_codebase_context(
        repo_path: Path,
        codebase_reasoning_mode: str,
        model: str,
        ctx: Context | None = None,
        git_command_func=None,
    ) -> tuple[str, list[str], set[str]]:
        """Build the codebase context for analysis.

        Args:
            repo_path: Path to the repository.
            codebase_reasoning_mode: How to analyze the codebase.
            model: Model name for token counting.
            ctx: Optional context for logging.
            git_command_func: Optional git command function.

        Returns:
            Tuple of (directory_context, file_paths, all_dirs)
        """

        # Define log function for get_codebase_context
        def sync_context_log(msg: str):
            if ctx:
                asyncio.create_task(ctx.log(level="info", message=msg))

        if ctx:
            await ctx.log(
                level="info",
                message=f"Getting codebase context using {codebase_reasoning_mode} mode",
            )

        # Get the codebase context
        directory_context, context_file_paths = await get_codebase_context(
            repo_path=repo_path,
            reasoning_mode=codebase_reasoning_mode,
            log_function=sync_context_log if ctx else None,
            git_command_func=git_command_func,
        )

        # Log key metrics
        if ctx:
            token_counter = TokenCounter()
            token_count = token_counter.count_tokens(directory_context, model)
            file_count = len(directory_context.split("\n")) if directory_context else 0
            await ctx.log(
                level="info",
                message=f"Codebase context metrics: {file_count} files, {token_count} tokens based on ({model})",
            )

        # Extract directories from file paths
        all_dirs = set()
        for file_path in context_file_paths:
            parts = file_path.split("/")
            for i in range(1, len(parts)):
                dir_path = "/".join(parts[:i])
                if dir_path:
                    all_dirs.add(dir_path)

        # Add root directory if there are root-level files
        if any("/" not in f for f in context_file_paths):
            all_dirs.add(".")

        if ctx:
            await ctx.log(
                level="info",
                message=f"Extracted {len(all_dirs)} directories from {len(context_file_paths)} filtered files",
            )

        return directory_context, context_file_paths, all_dirs
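
    To make the prefix extraction concrete, here is the same loop rerun on a few sample paths (the paths are invented for illustration):

    context_file_paths = ["src/api/handlers.py", "src/models/user.py", "README.md"]
    all_dirs = set()
    for file_path in context_file_paths:
        parts = file_path.split("/")
        for i in range(1, len(parts)):
            all_dirs.add("/".join(parts[:i]))
    # Root-level README.md has no "/" in its path, so "." is added too.
    if any("/" not in f for f in context_file_paths):
        all_dirs.add(".")
    print(sorted(all_dirs))  # ['.', 'src', 'src/api', 'src/models']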
  • analyze_with_llm: Prompts the LLM with codebase context and user task to identify important directories.
    async def analyze_with_llm(
        llm_manager: LLMManager,
        model: str,
        directory_context: str,
        user_task: str,
        debug: bool = False,
        ctx: Context | None = None,
    ) -> str:
        """Analyze the codebase with LLM to identify important directories.

        Args:
            llm_manager: LLM Manager instance.
            model: Model name to use.
            directory_context: The codebase context string.
            user_task: Description of the task.
            debug: Whether to log debug information.
            ctx: Optional context for logging.

        Returns:
            LLM response containing directory analysis.
        """
        # Construct the system message
        system_message = f"""You are an expert software developer tasked with analyzing a codebase structure to identify important directories for building and executing a workplan.

    Your goal is to identify the most important directories that should be included for the user's task.

    Analyze the directories and identify the ones that:
    1. Contain core application code relevant to the user's task
    2. Likely contain important business logic
    3. Would be essential for understanding the codebase architecture
    4. Are needed to implement the requested task
    5. Contain SDKs or libraries relevant to the user's task

    Ignore directories that:
    1. Contain only build artifacts or generated code
    2. Store dependencies or vendor code
    3. Contain temporary or cache files
    4. Probably aren't relevant to the user's specific task

    User Task: {user_task}

    Return your analysis as a list of important directories, one per line, without any additional text or formatting as below:
    ```context
    dir1/subdir1/
    dir2/
    dir3/subdir3/file3.filetype
    ```

    Prefer to include directories, and not just file paths but include just file paths when appropriate.

    IMPORTANT: Select only the most relevant directories or files. Don't include explanations for your choices, just return the list in the specified format."""

        prompt = f"""{directory_context}"""

        if ctx:
            await ctx.log(
                level="info",
                message=f"Analyzing directory structure with {model}",
            )

        # Debug logging
        if debug and ctx:
            await ctx.log(level="info", message=f"[DEBUG] System message: {system_message}")
            await ctx.log(
                level="info", message=f"[DEBUG] User prompt ({len(prompt)} chars): {prompt[:5000]}..."
            )

        # Call LLM
        result = await llm_manager.call_llm(
            model=model,
            prompt=prompt,
            system_message=system_message,
            temperature=0.0,
            ctx=ctx,
        )

        return result if isinstance(result, str) else str(result)
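
    parse_llm_directories is called by the orchestrator above but not reproduced on this page. As a hypothetical stand-in (not the real implementation), a minimal parser could extract the body of the requested ```context fence like this:

    import re

    def parse_context_block(llm_result: str) -> set[str]:
        # Pull the body of the ```context fence; fall back to the whole reply.
        match = re.search(r"```context\n(.*?)```", llm_result, re.DOTALL)
        body = match.group(1) if match else llm_result
        return {
            line.strip().rstrip("/")
            for line in body.splitlines()
            if line.strip() and not line.strip().startswith("```")
        }

    reply = "```context\nsrc/api/\nsrc/models/\n```"
    print(parse_context_block(reply))  # {'src/api', 'src/models'} (set order varies)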
  • save_context_file: Formats and writes the selected directories/files to the .yellhorncontext file.
    async def save_context_file(
        repo_path: Path,
        output_path: str,
        user_task: str,
        all_important_dirs: set[str],
        file_paths: list[str],
        ctx: Context | None = None,
    ) -> str:
        """Save the context file with important directories.

        Args:
            repo_path: Path to the repository.
            output_path: Path where the context file will be created.
            user_task: Description of the task.
            all_important_dirs: Set of important directories.
            file_paths: List of all file paths.
            ctx: Optional context for logging.

        Returns:
            Success message with the created file path.

        Raises:
            YellhornMCPError: If writing fails.
        """
        # Generate file content
        final_content = "# Yellhorn Context File - AI context optimization\n"
        final_content += f"# Generated by yellhorn-mcp curate_context tool\n"
        final_content += f"# Based on task: {user_task[:80]}\n\n"

        # Sort directories for consistent output
        # Separate files from directories
        important_dirs = set()
        important_files = set()

        for item in all_important_dirs:
            # Check if this looks like a file (has extension or is a dot file)
            if "/" in item:
                parts = item.split("/")
                last_part = parts[-1]
                is_file = (
                    "." in last_part
                    and not last_part.endswith("/")
                    and (last_part.count(".") == 1 or last_part.startswith("."))
                )
            else:
                # Special case: "." alone means root directory, not a file
                if item == ".":
                    is_file = False
                else:
                    is_file = "." in item and (item.count(".") == 1 or item.startswith("."))

            if is_file:
                important_files.add(item)
            else:
                important_dirs.add(item)

        sorted_important_dirs = sorted(list(important_dirs))
        sorted_important_files = sorted(list(important_files))

        # Generate .yellhorncontext file content
        if sorted_important_dirs or sorted_important_files:
            final_content += "# Important directories to specifically include\n"
            dir_includes = []

            # Add specific files first
            for file_path in sorted_important_files:
                dir_includes.append(file_path)

            # Add directories
            for dir_path in sorted_important_dirs:
                # Check if directory has files
                has_files = False
                if dir_path == ".":
                    has_files = any("/" not in f for f in file_paths)
                else:
                    has_files = any(f.startswith(dir_path + "/") for f in file_paths)

                if dir_path == ".":
                    if has_files:
                        dir_includes.append("./")
                    else:
                        dir_includes.append("./**")
                else:
                    if has_files:
                        dir_includes.append(f"{dir_path}/")
                    else:
                        dir_includes.append(f"{dir_path}/**")

            final_content += "\n".join(dir_includes) + "\n\n"

        # Remove duplicate lines
        content_lines = final_content.splitlines()
        content_lines.reverse()
        seen_lines = set()
        unique_lines = []
        for line in content_lines:
            if line.strip() == "" or line.strip().startswith("#"):
                unique_lines.append(line)
                continue
            if line not in seen_lines:
                seen_lines.add(line)
                unique_lines.append(line)
        unique_lines.reverse()
        final_content = "\n".join(unique_lines)

        # Write the file
        output_file_path = repo_path / output_path
        try:
            with open(output_file_path, "w", encoding="utf-8") as f:
                f.write(final_content)

            if ctx:
                await ctx.log(
                    level="info",
                    message=f"Successfully wrote .yellhorncontext file to {output_file_path}",
                )

            return f"Successfully created .yellhorncontext file at {output_file_path} with {len(sorted_important_files)} files and {len(sorted_important_dirs)} directories."
        except Exception as e:
            raise YellhornMCPError(f"Failed to write .yellhorncontext file: {str(e)}")
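
    Tracing the formatting above: a run that selected the directories src/api and src/models (both containing files) plus the standalone file setup.py would produce roughly the following .yellhorncontext (the task text is illustrative):

    # Yellhorn Context File - AI context optimization
    # Generated by yellhorn-mcp curate_context tool
    # Based on task: Add pagination to the REST API list endpoints

    # Important directories to specifically include
    setup.py
    src/api/
    src/models/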
