fetch_to_file
Fetch web content from any URL and save it directly to a file in your workspace, with options for raw HTML, cleaned HTML, or Markdown formats.
Instructions
Fetch web content and save it to a file in the workspace
Function/Features:
- Retrieves web content from any HTTP/HTTPS URL and saves it to a file
- Automatic directory creation for nested file paths

Notes:
- Automatically creates parent directories if they don't exist
- Uses UTF-8 encoding for all saved files
- Parameter `file_path` must be an absolute path
Args:
- url (str): The URL to fetch content from.
- file_path (str): File path where the content will be saved.
- return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional): Processing format for HTML content. Defaults to "markdown".
  - "raw": Saves unmodified HTML content
  - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.) while preserving structure
  - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes removed, keeping only essential structure
  - "markdown": Converts HTML content to clean, readable Markdown format before saving
Examples:

    // Save web page as markdown
    fetch_to_file({url: "https://example.com", file_path: "/home/user/content/example.md"})
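
The notes above (parent directories created on demand, UTF-8 output) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the server's own code: `save_text` is a hypothetical helper name, and the absolute-path check simply mirrors the requirement stated for `file_path`.

```python
import os


def save_text(file_path: str, content: str) -> int:
    """Write content as UTF-8, creating parent directories first.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    if not os.path.isabs(file_path):
        raise ValueError(f"Path must be absolute: {file_path}")

    # Create any missing parent directories; exist_ok avoids an error
    # when the directory is already there.
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # newline="" disables newline translation so the text is stored as given.
    with open(file_path, "w", encoding="utf-8", newline="") as f:
        f.write(content)

    # Encode before measuring: len() on a str counts characters, not bytes.
    return len(content.encode("utf-8"))
```

Calling `save_text("/tmp/demo/nested/page.md", "# Title")` would create `/tmp/demo/nested/` if needed and write the file there. The sketch reports an encoded byte count because a character count and a byte count differ for non-ASCII text.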
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to fetch content from | |
| file_path | Yes | Absolute file path where the content will be saved. The path must be absolute and is validated for security | |
| return_content | No | Processing format for HTML content: `raw`, `basic_clean`, `strict_clean`, or `markdown` | markdown |
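
As a rough client-side illustration of the schema above, the arguments can be checked before the tool is called. This is a sketch only: `build_arguments` and `ALLOWED_FORMATS` are hypothetical names that are not part of the server, and the server performs its own validation regardless.

```python
import os

# The four processing formats listed in the schema.
ALLOWED_FORMATS = ("raw", "basic_clean", "strict_clean", "markdown")


def build_arguments(url: str, file_path: str, return_content: str = "markdown") -> dict:
    """Return an argument dict for fetch_to_file after basic checks (hypothetical helper)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"url must be an HTTP/HTTPS URL: {url}")
    if not os.path.isabs(file_path):
        raise ValueError(f"file_path must be absolute: {file_path}")
    if return_content not in ALLOWED_FORMATS:
        raise ValueError(f"return_content must be one of {ALLOWED_FORMATS}")
    return {"url": url, "file_path": file_path, "return_content": return_content}
```

For example, `build_arguments("https://example.com", "/home/user/content/example.md")` yields a dict with `return_content` filled in as `"markdown"`, matching the documented default.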
Implementation Reference
- mcp_server_requests/server.py:166-249 (handler) Core implementation logic for the fetch_to_file tool. Handles path validation (workspace root and protected paths), fetches content using mcp_http_request, creates directories, writes the file, and returns a success message.

      async def fetch_content_and_write_to_file(
          url: str,
          file_path: str,
          return_content: Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
          ctx: Context,
          use_workspace_root: bool = False,
          allow_external_file_access: bool = False,
          user_agent: str = "mcp-server-requests",
          force_user_agent: bool = False
      ) -> str:
          try:
              # Validate file path
              validated_path = file_path
              if use_workspace_root and ctx:
                  roots = await ctx.list_roots()
                  if len(roots) == 0:
                      return "Error: No workspace root available"
                  if len(roots) > 1:
                      return "Error: Multiple workspace roots found, which is not supported"
                  if roots[0].uri.scheme != "file":
                      return "Error: Workspace root is not a file:// URI"

                  root = roots[0].uri.path or "/"
                  if not os.path.isabs(file_path):
                      validated_path = os.path.normpath(os.path.abspath(os.path.join(root, file_path)))

                  if allow_external_file_access:
                      rel = os.path.relpath(validated_path, root)
                      if rel.startswith(".."):
                          return f"Error: Access denied - path '{validated_path}' is outside workspace root '{root}'"

              if not os.path.isabs(validated_path):
                  return f"Error: Path must be absolute: {validated_path}"

              # Set protected paths based on operating system
              protected_paths = []
              if os.name == 'nt':  # Windows
                  protected_paths.extend([
                      os.path.join('C:', 'Windows'),
                      os.path.join('C:', 'Program Files'),
                      os.path.join('C:', 'Program Files (x86)'),
                  ])
              else:  # Linux/Mac
                  protected_paths.extend([
                      '/etc',
                      '/usr',
                      '/bin',
                      '/sbin',
                      '/lib',
                      '/root',
                  ])

              for protected in protected_paths:
                  if validated_path.startswith(protected):
                      return f"Error: Do not allow writing to protected paths: {protected}"

              # Fetch content
              content = mcp_http_request(
                  "GET",
                  url,
                  return_content=return_content,
                  user_agent=user_agent,
                  force_user_agnet=force_user_agent,
                  format_status=False,
                  format_headers=False
              )

              # Create parent directories if needed
              try:
                  dir_path = os.path.dirname(validated_path)
                  if dir_path:
                      os.makedirs(dir_path, exist_ok=True)
              except OSError as e:
                  return f"Error: Unable to create directory for path '{validated_path}': {e}"

              # Write content to file
              try:
                  with open(validated_path, 'w', encoding='utf-8', newline='') as f:
                      f.write(content)
              except OSError as e:
                  return f"Error: Unable to write to file '{validated_path}': {e}"

              content_size = len(content)
              return f"Content from '{url}' ({content_size:,} bytes) successfully written to: {validated_path}"

          except Exception as e:
              return f"Error: Failed to fetch content or write file: {e}"
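
The handler's protected-path check compares raw string prefixes with `str.startswith`, so a path such as `/etcetera/notes.md` also matches the `/etc` entry. One possible alternative, offered here only as a sketch and not as the project's method, compares whole path components with `os.path.commonpath`. The helper name `is_inside` is hypothetical.

```python
import os


def is_inside(path: str, directory: str) -> bool:
    """Return True if path equals directory or lies beneath it.

    Comparison is component-wise after normalisation, so "/etcetera"
    is not treated as being inside "/etc". Symlinks are not resolved.
    """
    path = os.path.normpath(path)
    directory = os.path.normpath(directory)
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:
        # commonpath raises for a mix of absolute and relative paths,
        # or for paths on different drives on Windows.
        return False


if __name__ == "__main__":
    for candidate in ("/etc/passwd", "/etcetera/notes.md", "/home/user/page.md"):
        print(candidate, is_inside(candidate, "/etc"))
```

The same helper could stand in for the `os.path.relpath(...).startswith("..")` workspace check, since both ask whether one path is contained in another. Resolving symlinks first (for example with `os.path.realpath`) would be a further hardening step that this sketch leaves out.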
- mcp_server_requests/server.py:295-342 (registration) Registration of the fetch_to_file tool (workspace root mode) using @mcp.tool(). Includes the input schema via Annotated parameters and a comprehensive docstring with examples. Dispatches to the core helper.

      @mcp.tool()
      async def fetch_to_file(
          url: Annotated[str, "(require) The URL to fetch content from"],
          file_path: Annotated[str, "(require) File path where the content will be saved"],
          *,
          return_content: Annotated[Literal['raw', 'basic_clean', 'strict_clean', 'markdown'], "(optional, Defaults to \"markdown\") processing format for HTML content"] = "markdown",
          ctx: Context,
      ) -> str:
          """Fetch web content and save it to a file in the workspace

          Function/Features:
          - Retrieves web content from any HTTP/HTTPS URL and saves it to a file
          - Automatic directory creation for nested file paths

          Notes:
          - Automatically creates parent directories if they don't exist
          - Uses UTF-8 encoding for all saved files
          - parameter `file_path` **must** be a relative path (relative to the workspace root)

          Args:
              url (str): The URL to fetch content from.
              file_path (str): File path where the content will be saved.
              return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown'], optional): Processing format for HTML content. Defaults to "markdown".
                  - "raw": Saves unmodified HTML content
                  - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.) while preserving structure
                  - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes removed, keeping only essential structure
                  - "markdown": Converts HTML content to clean, readable Markdown format before saving

          Examples:
              // Save web page as markdown in workspace
              fetch_to_file({url: "https://example.com", file_path: "content/example.md"})

              // Save raw HTML content
              fetch_to_file({url: "https://api.example.com/data", file_path: "data/response.html", return_content: "raw"})

              // Save cleaned content
              fetch_to_file({url: "https://example.com/docs", file_path: "docs/cleaned.html", return_content: "strict_clean"})
          """
          return await fetch_content_and_write_to_file(
              url=url,
              file_path=file_path,
              return_content=return_content,
              ctx=ctx,
              use_workspace_root=True,
              allow_external_file_access=bool(allow_external_file_access),
              user_agent=ua,
              force_user_agent=ua_force if ua_force is not None else False
          )
- mcp_server_requests/server.py:343-391 (registration) Registration of the fetch_to_file tool (absolute path mode) using @mcp.tool(). Includes the input schema via Annotated parameters and a comprehensive docstring with examples. Dispatches to the core helper.

      else:
          @mcp.tool()
          async def fetch_to_file(
              url: Annotated[str, "(require) The URL to fetch content from"],
              file_path: Annotated[str, "(require) Absolute file path where the content will be saved. The path must be absolute and will be validated for security"],
              *,
              return_content: Annotated[Literal['raw', 'basic_clean', 'strict_clean', 'markdown'], "(optional, Defaults to \"markdown\") processing format for HTML content"] = "markdown",
              ctx: Context,
          ) -> str:
              """Fetch web content and save it to a file in the workspace

              Function/Features:
              - Retrieves web content from any HTTP/HTTPS URL and saves it to a file
              - Automatic directory creation for nested file paths

              Notes:
              - Automatically creates parent directories if they don't exist
              - Uses UTF-8 encoding for all saved files
              - parameter `file_path` **must** be a absolute path

              Args:
                  url (str): The URL to fetch content from.
                  file_path (str): File path where the content will be saved.
                  return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown'], optional): Processing format for HTML content. Defaults to "markdown".
                      - "raw": Saves unmodified HTML content
                      - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.) while preserving structure
                      - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes removed, keeping only essential structure
                      - "markdown": Converts HTML content to clean, readable Markdown format before saving

              Examples:
                  // Save web page as markdown
                  fetch_to_file({url: "https://example.com", file_path: "/home/user/content/example.md"})

                  // Save raw HTML content
                  fetch_to_file({url: "https://api.example.com/data", file_path: "C:\\data\\response.html", return_content: "raw"})

                  // Save cleaned content
                  fetch_to_file({url: "https://example.com/docs", file_path: "/tmp/docs/cleaned.html", return_content: "strict_clean"})
              """
              return await fetch_content_and_write_to_file(
                  url=url,
                  file_path=file_path,
                  return_content=return_content,
                  ctx=ctx,
                  use_workspace_root=False,
                  allow_external_file_access=False,
                  user_agent=ua,
                  force_user_agent=ua_force if ua_force is not None else False
              )
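
The two registrations define the same tool name in different branches, so only one variant exists at runtime. The pattern can be illustrated with a toy registry. Everything below is an assumption made for illustration: `make_registry` and `tool` stand in for `@mcp.tool()` and are not the MCP SDK, and the branch condition `use_workspace_root` is a guess, since the excerpts do not show the `if` that precedes the `else:`.

```python
from typing import Callable, Dict, Tuple


def make_registry() -> Tuple[Dict[str, Callable], Callable]:
    """Create a tiny name -> function registry with a decorator (stand-in for @mcp.tool())."""
    tools: Dict[str, Callable] = {}

    def tool(fn: Callable) -> Callable:
        tools[fn.__name__] = fn
        return fn

    return tools, tool


def register(use_workspace_root: bool) -> Dict[str, Callable]:
    """Register exactly one fetch_to_file variant, chosen by the mode flag."""
    tools, tool = make_registry()

    if use_workspace_root:
        @tool
        def fetch_to_file(url: str, file_path: str):
            # Workspace mode: file_path is relative to the workspace root.
            return ("workspace", file_path)
    else:
        @tool
        def fetch_to_file(url: str, file_path: str):
            # Absolute mode: file_path must already be absolute.
            return ("absolute", file_path)

    return tools
```

Because both variants share one name, clients see a single `fetch_to_file` tool whose `file_path` description depends on how the server was started.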
- Type annotations defining the input schema for the core fetch_to_file logic, including return_content options.

      async def fetch_content_and_write_to_file(
          url: str,
          file_path: str,
          return_content: Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
          ctx: Context,