Glama
safurrier

MCP Filesystem Server

find_large_files

Identify files exceeding a specified size threshold within a directory structure to manage disk space and locate storage-intensive items.

Instructions

Find files larger than the specified size.

Args:
    path: Starting directory
    min_size_mb: Minimum file size in megabytes
    recursive: Whether to search subdirectories
    max_results: Maximum number of results to return
    exclude_patterns: Optional patterns to exclude
    format: Output format ('text' or 'json')
    ctx: MCP context

Returns:
    Large file information

Input Schema

Name              Required  Description                          Default
path              Yes       Starting directory                   -
min_size_mb       No        Minimum file size in megabytes       100
recursive         No        Whether to search subdirectories     true
max_results       No        Maximum number of results to return  100
exclude_patterns  No        Optional patterns to exclude         -
format            No        Output format ('text' or 'json')     text
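For illustration, a call to this tool might pass arguments like the following. The path and values here are hypothetical; only "path" is required, and the remaining fields fall back to the defaults above.

```python
import json

# Hypothetical argument set for a find_large_files call.
arguments = {
    "path": "/var/log",               # starting directory (required)
    "min_size_mb": 50,                # report files of 50 MB or more
    "recursive": True,                # descend into subdirectories
    "max_results": 25,                # cap the result list
    "exclude_patterns": [r"\.gz$"],   # regex patterns to skip
    "format": "json",                 # return JSON instead of text
}

print(json.dumps(arguments, indent=2))
```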

Implementation Reference

  • Core handler implementation in AdvancedFileOperations class. Recursively scans directories, identifies files exceeding the minimum size threshold, collects file information, and returns a sorted list by size (largest first).
    async def find_large_files(
        self,
        root_path: Union[str, Path],
        min_size_mb: float = 100,
        recursive: bool = True,
        max_results: int = 100,
        exclude_patterns: Optional[List[str]] = None,
    ) -> List[Dict]:
        """Find files larger than the specified size.
    
        Args:
            root_path: Starting directory
            min_size_mb: Minimum file size in megabytes
            recursive: Whether to search subdirectories
            max_results: Maximum number of results to return
            exclude_patterns: Optional patterns to exclude
    
        Returns:
            List of file information dictionaries for large files
    
        Raises:
            ValueError: If root_path is outside allowed directories
        """
        min_size_bytes = int(min_size_mb * 1024 * 1024)
    
        abs_path, allowed = await self.validator.validate_path(root_path)
        if not allowed:
            raise ValueError(f"Path outside allowed directories: {root_path}")
    
        if not abs_path.is_dir():
            raise ValueError(f"Not a directory: {root_path}")
    
        # Compile exclude patterns if provided
        exclude_regexes = []
        if exclude_patterns:
            for exclude in exclude_patterns:
                try:
                    exclude_regexes.append(re.compile(exclude))
                except re.error:
                    logger.warning(f"Invalid exclude pattern: {exclude}")
    
        # Find large files
        results: List[Dict[str, Any]] = []
    
        async def scan_for_large_files(dir_path: Path) -> None:
            if len(results) >= max_results:
                return
    
            try:
                entries = await anyio.to_thread.run_sync(list, dir_path.iterdir())
    
                for entry in entries:
                    if len(results) >= max_results:
                        return
    
                    # Skip if matched by exclude pattern
                    path_str = str(entry)
                    excluded = False
                    for exclude_re in exclude_regexes:
                        if exclude_re.search(path_str):
                            excluded = True
                            break
    
                    if excluded:
                        continue
    
                    try:
                        if entry.is_file():
                            size = entry.stat().st_size
                            if size >= min_size_bytes:
                                info = FileInfo(entry)
                                results.append(info.to_dict())
    
                        elif entry.is_dir() and recursive:
                            # Check if this path is still allowed
                            (
                                entry_abs,
                                entry_allowed,
                            ) = await self.validator.validate_path(entry)
                            if entry_allowed:
                                await scan_for_large_files(entry)
    
                    except (PermissionError, FileNotFoundError):
                        # Skip entries we can't access
                        pass
    
            except (PermissionError, FileNotFoundError):
                # Skip directories we can't access
                pass
    
        await scan_for_large_files(abs_path)
    
        # Sort by size (largest first)
        return sorted(results, key=lambda x: x["size"], reverse=True)
  • MCP tool registration with @mcp.tool() decorator. Thin wrapper around the advanced handler that handles input/output formatting (text or JSON) and error handling.
    @mcp.tool()
    async def find_large_files(
        path: str,
        ctx: Context,
        min_size_mb: float = 100,
        recursive: bool = True,
        max_results: int = 100,
        exclude_patterns: Optional[List[str]] = None,
        format: str = "text",
    ) -> str:
        """Find files larger than the specified size.
    
        Args:
            path: Starting directory
            min_size_mb: Minimum file size in megabytes
            recursive: Whether to search subdirectories
            max_results: Maximum number of results to return
            exclude_patterns: Optional patterns to exclude
            format: Output format ('text' or 'json')
            ctx: MCP context
    
        Returns:
            Large file information
        """
        try:
            components = get_components()
            results = await components["advanced"].find_large_files(
                path, min_size_mb, recursive, max_results, exclude_patterns
            )
    
            if format.lower() == "json":
                return json.dumps(results, indent=2)
    
            # Format as text
            if not results:
                return f"No files larger than {min_size_mb} MB found"
    
            lines = []
            for file in results:
                size_mb = file["size"] / (1024 * 1024)
                lines.append(f"{file['path']} - {size_mb:.2f} MB")
    
            return (
                f"Found {len(results)} files larger than {min_size_mb} MB:\n\n"
                + "\n".join(lines)
            )
    
        except Exception as e:
            return f"Error finding large files: {str(e)}"
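The text branch of the wrapper reduces each result dict to a one-line "path - size" summary. A standalone sketch of that formatting step, assuming the list-of-dicts shape returned by the core handler:

```python
from typing import Dict, List


def format_large_files_text(results: List[Dict], min_size_mb: float) -> str:
    """Render results the way the tool's text branch does (sketch)."""
    if not results:
        return f"No files larger than {min_size_mb} MB found"
    lines = [
        f"{f['path']} - {f['size'] / (1024 * 1024):.2f} MB" for f in results
    ]
    return (
        f"Found {len(results)} files larger than {min_size_mb} MB:\n\n"
        + "\n".join(lines)
    )
```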
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions the basic operation but doesn't cover important behavioral aspects: whether this is a read-only operation, potential performance implications for large directories, permission requirements, error handling, or what 'Large file information' specifically includes. The description provides minimal behavioral context beyond the basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with a clear purpose statement, then efficiently documents parameters in a bullet-like format, and ends with return information. Every sentence serves a purpose with zero wasted words. The formatting with 'Args:' and 'Returns:' sections enhances readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter tool with no annotations and no output schema, the description provides adequate but incomplete coverage. It documents parameters well but lacks behavioral context about safety, performance, and error handling. The return description 'Large file information' is vague without an output schema. Given the complexity, it should provide more guidance on usage context and result interpretation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate, and it does so effectively by listing all 6 parameters with brief explanations. It clarifies 'min_size_mb' is in megabytes, 'recursive' searches subdirectories, 'exclude_patterns' is optional, and 'format' has two output options. This adds substantial meaning beyond the bare schema, though it could provide more detail about pattern syntax or result formatting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
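One detail worth spelling out for pattern syntax: the implementation compiles each entry in exclude_patterns with re.compile and matches it against the full path string with re.search, so the patterns behave as unanchored regular expressions rather than shell globs. A small self-contained illustration (the paths are hypothetical):

```python
import re

# Regex patterns, as the handler compiles them; note re.search is unanchored,
# so a pattern matches anywhere in the full path string.
patterns = [r"\.git/", r"node_modules"]
compiled = [re.compile(p) for p in patterns]


def is_excluded(path: str) -> bool:
    return any(rx.search(path) for rx in compiled)


print(is_excluded("/repo/.git/objects/pack/big.pack"))  # True
print(is_excluded("/repo/node_modules/lodash/lodash.js"))  # True
print(is_excluded("/repo/src/main.py"))  # False
```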

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Find files larger than the specified size.' This is a specific verb+resource combination that indicates it's a search/filtering operation. However, it doesn't explicitly differentiate from sibling tools like 'search_files' or 'find_duplicate_files' beyond the size-based filtering focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'search_files' for general searches, 'find_duplicate_files' for duplicate detection, or 'calculate_directory_size' for size analysis. There's no context about when this specific size-based filtering is appropriate versus other file-finding operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
