Glama
safurrier

MCP Filesystem Server

find_large_files

Identify files exceeding a specified size threshold within a directory structure to manage disk space and locate storage-intensive items.

Instructions

Find files larger than the specified size.

Args:
    path: Starting directory
    min_size_mb: Minimum file size in megabytes
    recursive: Whether to search subdirectories
    max_results: Maximum number of results to return
    exclude_patterns: Optional patterns to exclude
    format: Output format ('text' or 'json')
    ctx: MCP context

Returns:
    Large file information

Input Schema

Name              Required  Description                          Default
path              Yes       Starting directory                   -
min_size_mb       No        Minimum file size in megabytes       100
recursive         No        Whether to search subdirectories     true
max_results       No        Maximum number of results to return  100
exclude_patterns  No        Optional patterns to exclude         -
format            No        Output format ('text' or 'json')     text
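For illustration, a call to this tool might pass arguments like the following. The path and values here are hypothetical; only "path" is required, and the remaining fields fall back to the defaults above.

```python
import json

# Hypothetical argument set for a find_large_files call.
arguments = {
    "path": "/var/log",               # starting directory (required)
    "min_size_mb": 50,                # report files of 50 MB or more
    "recursive": True,                # descend into subdirectories
    "max_results": 25,                # cap the result list
    "exclude_patterns": [r"\.gz$"],   # regex patterns to skip
    "format": "json",                 # return JSON instead of text
}

print(json.dumps(arguments, indent=2))
```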

Implementation Reference

  • Core handler implementation in AdvancedFileOperations class. Recursively scans directories, identifies files exceeding the minimum size threshold, collects file information, and returns a sorted list by size (largest first).
    async def find_large_files(
        self,
        root_path: Union[str, Path],
        min_size_mb: float = 100,
        recursive: bool = True,
        max_results: int = 100,
        exclude_patterns: Optional[List[str]] = None,
    ) -> List[Dict]:
        """Find files larger than the specified size.
    
        Args:
            root_path: Starting directory
            min_size_mb: Minimum file size in megabytes
            recursive: Whether to search subdirectories
            max_results: Maximum number of results to return
            exclude_patterns: Optional patterns to exclude
    
        Returns:
            List of file information dictionaries for large files
    
        Raises:
            ValueError: If root_path is outside allowed directories
        """
        min_size_bytes = int(min_size_mb * 1024 * 1024)
    
        abs_path, allowed = await self.validator.validate_path(root_path)
        if not allowed:
            raise ValueError(f"Path outside allowed directories: {root_path}")
    
        if not abs_path.is_dir():
            raise ValueError(f"Not a directory: {root_path}")
    
        # Compile exclude patterns if provided
        exclude_regexes = []
        if exclude_patterns:
            for exclude in exclude_patterns:
                try:
                    exclude_regexes.append(re.compile(exclude))
                except re.error:
                    logger.warning(f"Invalid exclude pattern: {exclude}")
    
        # Find large files
        results: List[Dict[str, Any]] = []
    
        async def scan_for_large_files(dir_path: Path) -> None:
            if len(results) >= max_results:
                return
    
            try:
                entries = await anyio.to_thread.run_sync(list, dir_path.iterdir())
    
                for entry in entries:
                    if len(results) >= max_results:
                        return
    
                    # Skip if matched by exclude pattern
                    path_str = str(entry)
                    excluded = False
                    for exclude_re in exclude_regexes:
                        if exclude_re.search(path_str):
                            excluded = True
                            break
    
                    if excluded:
                        continue
    
                    try:
                        if entry.is_file():
                            size = entry.stat().st_size
                            if size >= min_size_bytes:
                                info = FileInfo(entry)
                                results.append(info.to_dict())
    
                        elif entry.is_dir() and recursive:
                            # Check if this path is still allowed
                            (
                                entry_abs,
                                entry_allowed,
                            ) = await self.validator.validate_path(entry)
                            if entry_allowed:
                                await scan_for_large_files(entry)
    
                    except (PermissionError, FileNotFoundError):
                        # Skip entries we can't access
                        pass
    
            except (PermissionError, FileNotFoundError):
                # Skip directories we can't access
                pass
    
        await scan_for_large_files(abs_path)
    
        # Sort by size (largest first)
        return sorted(results, key=lambda x: x["size"], reverse=True)
  • MCP tool registration with @mcp.tool() decorator. Thin wrapper around the advanced handler that handles input/output formatting (text or JSON) and error handling.
    @mcp.tool()
    async def find_large_files(
        path: str,
        ctx: Context,
        min_size_mb: float = 100,
        recursive: bool = True,
        max_results: int = 100,
        exclude_patterns: Optional[List[str]] = None,
        format: str = "text",
    ) -> str:
        """Find files larger than the specified size.
    
        Args:
            path: Starting directory
            min_size_mb: Minimum file size in megabytes
            recursive: Whether to search subdirectories
            max_results: Maximum number of results to return
            exclude_patterns: Optional patterns to exclude
            format: Output format ('text' or 'json')
            ctx: MCP context
    
        Returns:
            Large file information
        """
        try:
            components = get_components()
            results = await components["advanced"].find_large_files(
                path, min_size_mb, recursive, max_results, exclude_patterns
            )
    
            if format.lower() == "json":
                return json.dumps(results, indent=2)
    
            # Format as text
            if not results:
                return f"No files larger than {min_size_mb} MB found"
    
            lines = []
            for file in results:
                size_mb = file["size"] / (1024 * 1024)
                lines.append(f"{file['path']} - {size_mb:.2f} MB")
    
            return (
                f"Found {len(results)} files larger than {min_size_mb} MB:\n\n"
                + "\n".join(lines)
            )
    
        except Exception as e:
            return f"Error finding large files: {str(e)}"
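The text branch of the wrapper reduces each result dict to a one-line "path - size" summary. A standalone sketch of that formatting step, assuming the list-of-dicts shape returned by the core handler:

```python
from typing import Dict, List


def format_large_files_text(results: List[Dict], min_size_mb: float) -> str:
    """Render results the way the tool's text branch does (sketch)."""
    if not results:
        return f"No files larger than {min_size_mb} MB found"
    lines = [
        f"{f['path']} - {f['size'] / (1024 * 1024):.2f} MB" for f in results
    ]
    return (
        f"Found {len(results)} files larger than {min_size_mb} MB:\n\n"
        + "\n".join(lines)
    )
```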
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions the basic operation but doesn't cover important behavioral aspects: whether this is a read-only operation, potential performance implications for large directories, permission requirements, error handling, or what 'Large file information' specifically includes. The description provides minimal behavioral context beyond the basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with a clear purpose statement, then efficiently documents parameters in a bullet-like format, and ends with return information. Every sentence serves a purpose with zero wasted words. The formatting with 'Args:' and 'Returns:' sections enhances readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter tool with no annotations and no output schema, the description provides adequate but incomplete coverage. It documents parameters well but lacks behavioral context about safety, performance, and error handling. The return description 'Large file information' is vague without an output schema. Given the complexity, it should provide more guidance on usage context and result interpretation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate, and it does so effectively by listing all 6 parameters with brief explanations. It clarifies 'min_size_mb' is in megabytes, 'recursive' searches subdirectories, 'exclude_patterns' is optional, and 'format' has two output options. This adds substantial meaning beyond the bare schema, though it could provide more detail about pattern syntax or result formatting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
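One detail worth spelling out for pattern syntax: the implementation compiles each entry in exclude_patterns with re.compile and matches it against the full path string with re.search, so the patterns behave as unanchored regular expressions rather than shell globs. A small self-contained illustration (the paths are hypothetical):

```python
import re

# Regex patterns, as the handler compiles them; note re.search is unanchored,
# so a pattern matches anywhere in the full path string.
patterns = [r"\.git/", r"node_modules"]
compiled = [re.compile(p) for p in patterns]


def is_excluded(path: str) -> bool:
    return any(rx.search(path) for rx in compiled)


print(is_excluded("/repo/.git/objects/pack/big.pack"))  # True
print(is_excluded("/repo/node_modules/lodash/lodash.js"))  # True
print(is_excluded("/repo/src/main.py"))  # False
```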

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Find files larger than the specified size.' This is a specific verb+resource combination that indicates it's a search/filtering operation. However, it doesn't explicitly differentiate from sibling tools like 'search_files' or 'find_duplicate_files' beyond the size-based filtering focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'search_files' for general searches, 'find_duplicate_files' for duplicate detection, or 'calculate_directory_size' for size analysis. There's no context about when this specific size-based filtering is appropriate versus other file-finding operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
