mcp-server-requests

by coucya

fetch_to_file

Fetch web content from any URL and save it directly to a file in your workspace, with options for raw HTML, cleaned HTML, or Markdown formats.

Instructions

Fetch web content and save it to a file in the workspace

Function/Features:

  • Retrieves web content from any HTTP/HTTPS URL and saves it to a file

  • Automatic directory creation for nested file paths

Notes:

  • Automatically creates parent directories if they don't exist

  • Uses UTF-8 encoding for all saved files

  • The file_path parameter must be an absolute path

Args:

  • url (str): The URL to fetch content from.

  • file_path (str): File path where the content will be saved.

  • return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional): Processing format for HTML content. Defaults to "markdown".

      - "raw": Saves unmodified HTML content
      - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.) while preserving structure
      - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes removed, keeping only essential structure
      - "markdown": Converts HTML content to clean, readable Markdown format before saving
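To make the difference between the HTML modes concrete, here is a minimal sketch built on Python's standard `html.parser`. It is not the server's implementation, which this page does not show. The `clean_html` function, the `NON_DISPLAY_TAGS` set, and the choice to drop every attribute in "strict_clean" (the tool is described as dropping most, not all) are assumptions for illustration, and the "markdown" mode is left out.

```python
from html.parser import HTMLParser

# Assumed set of non-displaying tags; the server's real list is not shown here.
NON_DISPLAY_TAGS = {"script", "style", "noscript", "template"}


class _Cleaner(HTMLParser):
    """Re-emits HTML while skipping non-displaying elements (comments are dropped too)."""

    def __init__(self, keep_attrs: bool) -> None:
        super().__init__(convert_charrefs=False)
        self.keep_attrs = keep_attrs
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a non-displaying element

    def _emit_tag(self, tag: str, self_closing: bool) -> None:
        if self.keep_attrs:
            # Reuse the tag exactly as it appeared in the input.
            self.out.append(self.get_starttag_text())
        else:
            self.out.append(f"<{tag}/>" if self_closing else f"<{tag}>")

    def handle_starttag(self, tag, attrs):
        if tag in NON_DISPLAY_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self._emit_tag(tag, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        if tag not in NON_DISPLAY_TAGS and self.skip_depth == 0:
            self._emit_tag(tag, self_closing=True)

    def handle_endtag(self, tag):
        if tag in NON_DISPLAY_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def clean_html(html: str, mode: str) -> str:
    """Apply one of the HTML modes; 'markdown' is outside this sketch."""
    if mode == "raw":
        return html
    if mode not in ("basic_clean", "strict_clean"):
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    cleaner = _Cleaner(keep_attrs=(mode == "basic_clean"))
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

Under these assumptions, "basic_clean" keeps `<p class="x">` intact while "strict_clean" reduces it to `<p>`, and both drop the contents of `<script>` and `<style>`.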

Examples:

// Save web page as markdown
fetch_to_file({url: "https://example.com", file_path: "/home/user/content/example.md"})

// Save raw HTML content
fetch_to_file({url: "https://api.example.com/data", file_path: "C:\\data\\response.html", return_content: "raw"})

// Save cleaned content
fetch_to_file({url: "https://example.com/docs", file_path: "/tmp/docs/cleaned.html", return_content: "strict_clean"})
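
The examples above use a shorthand call syntax. On the wire, an MCP client sends a tool invocation as a JSON-RPC 2.0 `tools/call` request. The sketch below builds that message for the first example; the helper name `build_fetch_to_file_request` and the request `id` are illustrative, and a client library would normally construct and send this message for you.

```python
import json


def build_fetch_to_file_request(url, file_path, return_content="markdown", request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for the fetch_to_file tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # arbitrary identifier chosen by the client
        "method": "tools/call",
        "params": {
            "name": "fetch_to_file",
            "arguments": {
                "url": url,
                "file_path": file_path,
                "return_content": return_content,
            },
        },
    }


request = build_fetch_to_file_request(
    "https://example.com", "/home/user/content/example.md"
)
print(json.dumps(request, indent=2))
```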

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | (require) The URL to fetch content from | |
| file_path | Yes | (require) Absolute file path where the content will be saved. The path must be absolute and will be validated for security | |
| return_content | No | (optional, Defaults to "markdown") processing format for HTML content | markdown |
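
As a rough illustration of what this schema implies for a caller, the sketch below checks an argument dictionary before it is sent: both required strings must be present, and `return_content` falls back to "markdown" and must be one of the four listed values. The function name `normalize_arguments` and the rule that empty strings are rejected are assumptions made for this example, not part of the server.

```python
ALLOWED_FORMATS = ("raw", "basic_clean", "strict_clean", "markdown")


def normalize_arguments(arguments: dict) -> dict:
    """Validate fetch_to_file arguments and fill in the documented default."""
    for name in ("url", "file_path"):
        value = arguments.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing required string argument: {name}")

    return_content = arguments.get("return_content", "markdown")
    if return_content not in ALLOWED_FORMATS:
        raise ValueError(
            f"return_content must be one of {ALLOWED_FORMATS}, got {return_content!r}"
        )

    return {
        "url": arguments["url"],
        "file_path": arguments["file_path"],
        "return_content": return_content,
    }
```

Calling `normalize_arguments({"url": "https://example.com", "file_path": "/tmp/docs/page.md"})` returns the same two values plus `"return_content": "markdown"`.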

Implementation Reference

  • Core implementation logic for the fetch_to_file tool. Handles path validation (workspace root and protected paths), fetches content using mcp_http_request, creates directories, writes file, and returns success message.
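    import os
    from typing import Literal

    # Context and mcp_http_request are defined or imported elsewhere in the server module (not shown here).
    async def fetch_content_and_write_to_file(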
        url: str,
        file_path: str,
        return_content: Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
        ctx: Context,
        use_workspace_root: bool = False,
        allow_external_file_access: bool = False,
        user_agent: str = "mcp-server-requests",
        force_user_agent: bool = False
    ) -> str:
        try:
            # Validate file path
            validated_path = file_path
    
            if use_workspace_root and ctx:
                roots = await ctx.list_roots()
                if len(roots) == 0:
                    return "Error: No workspace root available"
                if len(roots) > 1:
                    return "Error: Multiple workspace roots found, which is not supported"
    
                if roots[0].uri.scheme != "file":
                    return "Error: Workspace root is not a file:// URI"
                root = roots[0].uri.path or "/"
    
                if not os.path.isabs(file_path):
                    validated_path = os.path.normpath(os.path.abspath(os.path.join(root, file_path)))
    
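                # Reject paths that escape the workspace root unless external access is explicitly allowed
                if not allow_external_file_access:
                    rel = os.path.relpath(validated_path, root)
                    # Compare whole components so a name such as '..data' is not treated as a parent reference
                    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
                        return f"Error: Access denied - path '{validated_path}' is outside workspace root '{root}'"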
    
            if not os.path.isabs(validated_path):
                return f"Error: Path must be absolute: {validated_path}"
    
            # Set protected paths based on operating system
            protected_paths = []
            if os.name == 'nt':  # Windows
                # Spell out the separator: os.path.join('C:', 'Windows') yields the drive-relative 'C:Windows'
                protected_paths.extend([
                    'C:\\Windows',
                    'C:\\Program Files',
                    'C:\\Program Files (x86)',
                ])
            else:  # Linux/Mac
                protected_paths.extend([
                    '/etc', '/usr', '/bin', '/sbin', '/lib', '/root',
                ])
    
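            # Compare normalized paths and require a separator boundary, so '/etcetera' is not mistaken for '/etc'
            normalized_path = os.path.normcase(os.path.normpath(validated_path))
            for protected in protected_paths:
                protected_norm = os.path.normcase(os.path.normpath(protected))
                if normalized_path == protected_norm or normalized_path.startswith(protected_norm + os.sep):
                    return f"Error: Do not allow writing to protected paths: {protected}"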
    
            # Fetch content
            content = mcp_http_request(
                "GET", url,
                return_content=return_content,
                user_agent=user_agent,
                force_user_agnet=force_user_agent,  # keyword spelled as in the source; it must match the name mcp_http_request declares
                format_status=False,
                format_headers=False
            )
    
            # Create parent directories if needed
            try:
                dir_path = os.path.dirname(validated_path)
                if dir_path:
                    os.makedirs(dir_path, exist_ok=True)
            except OSError as e:
                return f"Error: Unable to create directory for path '{validated_path}': {e}"
    
            # Write content to file
            try:
                with open(validated_path, 'w', encoding='utf-8', newline='') as f:
                    f.write(content)
            except OSError as e:
                return f"Error: Unable to write to file '{validated_path}': {e}"
    
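            # len(content) counts characters; encode to report the number of bytes written as UTF-8
            content_size = len(content.encode('utf-8'))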
            return f"Content from '{url}' ({content_size:,} bytes) successfully written to: {validated_path}"
    
        except Exception as e:
            return f"Error: Failed to fetch content or write file: {e}"
  • Registration of fetch_to_file tool (workspace root mode) using @mcp.tool(). Includes input schema via Annotated parameters and comprehensive docstring with examples. Dispatches to core helper.
    @mcp.tool()
    async def fetch_to_file(
        url: Annotated[str, "(require) The URL to fetch content from"],
        file_path: Annotated[str, "(require) File path where the content will be saved"],
        *,
        return_content: Annotated[Literal['raw', 'basic_clean', 'strict_clean', 'markdown'], "(optional, Defaults to \"markdown\") processing format for HTML content"] = "markdown",
        ctx: Context,
    ) -> str:
        """Fetch web content and save it to a file in the workspace
    
        Function/Features:
        - Retrieves web content from any HTTP/HTTPS URL and saves it to a file
        - Automatic directory creation for nested file paths
    
        Notes:
        - Automatically creates parent directories if they don't exist
        - Uses UTF-8 encoding for all saved files
        - parameter `file_path` **must** be a relative path (relative to the workspace root)
    
        Args:
            url (str): The URL to fetch content from.
            file_path (str): File path where the content will be saved.
            return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional): Processing format for HTML content. Defaults to "markdown".
              - "raw": Saves unmodified HTML content
              - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.) while preserving structure
              - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes removed, keeping only essential structure
              - "markdown": Converts HTML content to clean, readable Markdown format before saving
    
        Examples:
            // Save web page as markdown in workspace
            fetch_to_file({url: "https://example.com", file_path: "content/example.md"})
    
            // Save raw HTML content
            fetch_to_file({url: "https://api.example.com/data", file_path: "data/response.html", return_content: "raw"})
    
            // Save cleaned content
            fetch_to_file({url: "https://example.com/docs", file_path: "docs/cleaned.html", return_content: "strict_clean"})
        """
        return await fetch_content_and_write_to_file(
            url=url,
            file_path=file_path,
            return_content=return_content,
            ctx=ctx,
            use_workspace_root=True,
            allow_external_file_access=bool(allow_external_file_access),
            user_agent=ua,
            force_user_agent=ua_force if ua_force is not None else False
        )
  • Registration of fetch_to_file tool (absolute path mode) using @mcp.tool(). Includes input schema via Annotated parameters and comprehensive docstring with examples. Dispatches to core helper.
    else:
        @mcp.tool()
        async def fetch_to_file(
            url: Annotated[str, "(require) The URL to fetch content from"],
            file_path: Annotated[str, "(require) Absolute file path where the content will be saved. The path must be absolute and will be validated for security"],
            *,
            return_content: Annotated[Literal['raw', 'basic_clean', 'strict_clean', 'markdown'], "(optional, Defaults to \"markdown\") processing format for HTML content"] = "markdown",
            ctx: Context,
        ) -> str:
            """Fetch web content and save it to a file in the workspace
    
            Function/Features:
            - Retrieves web content from any HTTP/HTTPS URL and saves it to a file
            - Automatic directory creation for nested file paths
    
            Notes:
            - Automatically creates parent directories if they don't exist
            - Uses UTF-8 encoding for all saved files
            - parameter `file_path` **must** be an absolute path
    
            Args:
                url (str): The URL to fetch content from.
                file_path (str): File path where the content will be saved.
                return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional): Processing format for HTML content. Defaults to "markdown".
                  - "raw": Saves unmodified HTML content
                  - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.) while preserving structure
                  - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes removed, keeping only essential structure
                  - "markdown": Converts HTML content to clean, readable Markdown format before saving
    
            Examples:
                // Save web page as markdown
                fetch_to_file({url: "https://example.com", file_path: "/home/user/content/example.md"})
    
                // Save raw HTML content
                fetch_to_file({url: "https://api.example.com/data", file_path: "C:\\data\\response.html", return_content: "raw"})
    
                // Save cleaned content
                fetch_to_file({url: "https://example.com/docs", file_path: "/tmp/docs/cleaned.html", return_content: "strict_clean"})
            """
            return await fetch_content_and_write_to_file(
                url=url,
                file_path=file_path,
                return_content=return_content,
                ctx=ctx,
                use_workspace_root=False,
                allow_external_file_access=False,
                user_agent=ua,
                force_user_agent=ua_force if ua_force is not None else False
            )
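
The path checks in the core helper (staying inside the workspace root and refusing protected system directories) can be sketched in isolation as follows. This is an illustrative variant, not the server's code: the names `is_under`, `resolve_in_workspace`, and `is_protected` are hypothetical, and it uses `posixpath` so that it behaves the same on any platform, whereas the real helper uses `os.path` and also handles Windows paths. It compares whole path components through `commonpath` rather than raw string prefixes.

```python
import posixpath

# Same POSIX list as the helper above; the Windows entries are omitted in this sketch.
PROTECTED_PATHS = ("/etc", "/usr", "/bin", "/sbin", "/lib", "/root")


def is_under(path: str, base: str) -> bool:
    """True if `path` equals `base` or lies below it, compared component by component."""
    path = posixpath.normpath(path)
    base = posixpath.normpath(base)
    return posixpath.commonpath([path, base]) == base


def resolve_in_workspace(root: str, file_path: str) -> str:
    """Join a path onto the workspace root and refuse results that escape it."""
    candidate = posixpath.normpath(posixpath.join(root, file_path))
    if not is_under(candidate, root):
        raise PermissionError(
            f"path '{candidate}' is outside workspace root '{root}'"
        )
    return candidate


def is_protected(path: str) -> bool:
    """True if `path` is one of the protected directories or lies inside one."""
    return any(is_under(path, protected) for protected in PROTECTED_PATHS)
```

With these definitions, `resolve_in_workspace("/home/user/project", "docs/a.md")` yields `/home/user/project/docs/a.md`, while `"../../etc/passwd"` raises `PermissionError`. Neither this sketch nor a purely lexical check follows symbolic links; resolving them first (for example with `os.path.realpath`) would be needed to cover that case.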

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/coucya/mcp-server-requests'

If you have feedback or need assistance with the MCP directory API, please join our Discord server