Glama

fetch_to_file

Retrieve web content from a URL and save it to a specified file, with options for raw HTML, cleaned HTML, or Markdown conversion.

Instructions

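Fetch web page content and save it to a file.
- If the content is HTML, the appropriate form is saved according to return_content.
- If it is not HTML but is text or JSON, its content is saved as-is.
- For any other content type, an error message is returned.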
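The three rules above amount to a dispatch on the response's Content-Type. The tool's own dispatch code (presumably inside mcp_http_request) is not shown on this page, so the following is only a minimal sketch under stated assumptions: the function name classify_content_type is hypothetical, and treating every text/* type and any +json suffix as "text" is an assumption about what "Text or JSON" covers.

```python
def classify_content_type(content_type: str) -> str:
    """Hypothetical helper: map a Content-Type header to 'html', 'text', or 'error'."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    # HTML is checked first, because "text/html" would also match the text/* rule.
    if mime in ("text/html", "application/xhtml+xml"):
        return "html"   # processed according to return_content
    # Assumption: any text/* type and any JSON flavour is saved unchanged.
    if mime.startswith("text/") or mime == "application/json" or mime.endswith("+json"):
        return "text"
    return "error"      # any other type: an error message is returned
```

Checking HTML before the generic text/* rule keeps "text/html" from being saved raw by accident.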

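Args:
    url (str): The URL of the web page to fetch.
    file_path (str): Path of the file to save to; must be an absolute path.
    return_content ("raw" | "basic_clean" | "strict_clean" | "markdown", optional): Defaults to "markdown". Controls how HTML content is returned:
        - raw: returns the original HTML.
        - basic_clean: returns filtered HTML with all non-displaying tags, such as script and style, removed.
        - strict_clean: returns filtered HTML with all non-displaying tags, such as script and style, removed, and with most unnecessary HTML attributes removed as well.
        - markdown: converts the HTML to Markdown and returns it.
Returns:
    - On success, the path the file was saved to.
    - If the path is unsafe, an error message.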
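The cleaning code behind basic_clean and strict_clean is not shown on this page. As a rough illustration of the difference between the two modes, here is a minimal sketch using only the standard library's html.parser. The tag set NON_DISPLAY_TAGS, the attribute whitelist KEEP_ATTRS, and the names Cleaner and clean_html are all hypothetical choices, not the tool's actual rules.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists, modeled on the "script, style, etc." wording above.
NON_DISPLAY_TAGS = {"script", "style", "noscript", "template"}
KEEP_ATTRS = {"href", "src", "alt", "title"}  # attributes kept in strict mode


class Cleaner(HTMLParser):
    def __init__(self, strict: bool) -> None:
        super().__init__(convert_charrefs=True)
        self.strict = strict
        self.skip_depth = 0          # > 0 while inside a non-displaying element
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_DISPLAY_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if self.strict:
            # strict_clean-style: drop every attribute not on the whitelist.
            attrs = [(k, v) for k, v in attrs if k in KEEP_ATTRS]
        rendered = "".join(f' {k}="{escape(v or "", quote=True)}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, never open a skip region.
        if tag not in NON_DISPLAY_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in NON_DISPLAY_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html: str, strict: bool = False) -> str:
    parser = Cleaner(strict)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Comments and doctype declarations are dropped implicitly, because HTMLParser's default handlers for them do nothing.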

Input Schema

Name            Required  Default
file_path       Yes
return_content  No        markdown
url             Yes

Implementation Reference

  • Handler function for the 'fetch_to_file' tool (workspace root mode). Defines input schema via Annotated parameters, includes comprehensive docstring with examples, and delegates execution to the helper function fetch_content_and_write_to_file.
    async def fetch_to_file(
        url: Annotated[str, "(required) The URL to fetch content from"],
        file_path: Annotated[str, "(required) File path where the content will be saved"],
        *,
        return_content: Annotated[
            Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
            "(optional, Defaults to \"markdown\") processing format for HTML content",
        ] = "markdown",
        ctx: Context,
    ) -> str:
        """Fetch web content and save it to a file in the workspace.

        Function/Features:
        - Retrieves web content from any HTTP/HTTPS URL and saves it to a file
        - Automatic directory creation for nested file paths

        Notes:
        - Automatically creates parent directories if they don't exist
        - Uses UTF-8 encoding for all saved files
        - parameter `file_path` **must** be a relative path (relative to the workspace root)

        Args:
            url (str): The URL to fetch content from.
            file_path (str): File path where the content will be saved.
            return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional):
                Processing format for HTML content. Defaults to "markdown".
                - "raw": Saves unmodified HTML content
                - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.)
                  while preserving structure
                - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes
                  removed, keeping only essential structure
                - "markdown": Converts HTML content to clean, readable Markdown format before saving

        Examples:
            // Save web page as markdown in workspace
            fetch_to_file({url: "https://example.com", file_path: "content/example.md"})
            // Save raw HTML content
            fetch_to_file({url: "https://api.example.com/data", file_path: "data/response.html", return_content: "raw"})
            // Save cleaned content
            fetch_to_file({url: "https://example.com/docs", file_path: "docs/cleaned.html", return_content: "strict_clean"})
        """
        # `allow_external_file_access`, `ua` and `ua_force` are not defined in this
        # snippet; they are presumably captured from the enclosing server-setup scope.
        return await fetch_content_and_write_to_file(
            url=url,
            file_path=file_path,
            return_content=return_content,
            ctx=ctx,
            use_workspace_root=True,
            allow_external_file_access=bool(allow_external_file_access),
            user_agent=ua,
            force_user_agent=ua_force if ua_force is not None else False,
        )
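In workspace mode, fetch_content_and_write_to_file resolves file_path against the single file:// workspace root. The following is a minimal sketch of that resolution with a component-wise containment check. It assumes POSIX-style paths (hence posixpath), and the name resolve_in_workspace and its allow_external flag are hypothetical stand-ins for the helper's allow_external_file_access parameter.

```python
import posixpath
from urllib.parse import unquote, urlparse


def resolve_in_workspace(root_uri: str, file_path: str, allow_external: bool = False) -> str:
    """Hypothetical helper: resolve file_path against a file:// root (POSIX paths assumed)."""
    parsed = urlparse(root_uri)
    if parsed.scheme != "file":
        raise ValueError("workspace root is not a file:// URI")
    # Percent-decode the URI path, e.g. "My%20Proj" -> "My Proj".
    root = posixpath.normpath(unquote(parsed.path) or "/")
    # join() keeps an absolute file_path as-is, so absolute paths are checked too;
    # normpath() collapses ".." segments before the containment test.
    target = posixpath.normpath(posixpath.join(root, file_path))
    # Compare whole path components, so "/home/u/proj2" is not inside "/home/u/proj".
    if not allow_external and posixpath.commonpath([root, target]) != root:
        raise PermissionError(f"{target!r} is outside workspace root {root!r}")
    return target
```

Using commonpath rather than a plain string prefix test avoids treating a sibling directory that shares a name prefix as part of the workspace.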
  • Alternative handler function for the 'fetch_to_file' tool (absolute path mode). Similar to the workspace version but requires absolute paths and disables workspace root usage.
    async def fetch_to_file(
        url: Annotated[str, "(required) The URL to fetch content from"],
        file_path: Annotated[str, "(required) Absolute file path where the content will be saved. The path must be absolute and will be validated for security"],
        *,
        return_content: Annotated[
            Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
            "(optional, Defaults to \"markdown\") processing format for HTML content",
        ] = "markdown",
        ctx: Context,
    ) -> str:
        """Fetch web content and save it to a file.

        Function/Features:
        - Retrieves web content from any HTTP/HTTPS URL and saves it to a file
        - Automatic directory creation for nested file paths

        Notes:
        - Automatically creates parent directories if they don't exist
        - Uses UTF-8 encoding for all saved files
        - parameter `file_path` **must** be an absolute path

        Args:
            url (str): The URL to fetch content from.
            file_path (str): File path where the content will be saved.
            return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional):
                Processing format for HTML content. Defaults to "markdown".
                - "raw": Saves unmodified HTML content
                - "basic_clean": Saves HTML with non-displaying tags removed (script, style, etc.)
                  while preserving structure
                - "strict_clean": Saves HTML with non-displaying tags and most HTML attributes
                  removed, keeping only essential structure
                - "markdown": Converts HTML content to clean, readable Markdown format before saving

        Examples:
            // Save web page as markdown
            fetch_to_file({url: "https://example.com", file_path: "/home/user/content/example.md"})
            // Save raw HTML content
            fetch_to_file({url: "https://api.example.com/data", file_path: "C:\\data\\response.html", return_content: "raw"})
            // Save cleaned content
            fetch_to_file({url: "https://example.com/docs", file_path: "/tmp/docs/cleaned.html", return_content: "strict_clean"})
        """
        # `ua` and `ua_force` are not defined in this snippet; they are presumably
        # captured from the enclosing server-setup scope.
        return await fetch_content_and_write_to_file(
            url=url,
            file_path=file_path,
            return_content=return_content,
            ctx=ctx,
            use_workspace_root=False,
            allow_external_file_access=False,
            user_agent=ua,
            force_user_agent=ua_force if ua_force is not None else False,
        )
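The absolute-path handler promises that file_path "will be validated for security". A prefix check on the raw string is easy to bypass ("/tmp/../etc/passwd" does not start with "/etc") and over-matches ("/binary" starts with "/bin"). Here is a minimal sketch of a stricter check for POSIX paths. The protected list is copied from the helper's Linux/Mac branch, while the function name is_protected and the use of posixpath are assumptions for illustration.

```python
import posixpath

# Copied from the Linux/Mac branch of fetch_content_and_write_to_file.
PROTECTED_POSIX = ("/etc", "/usr", "/bin", "/sbin", "/lib", "/root")


def is_protected(path: str, protected=PROTECTED_POSIX) -> bool:
    """Hypothetical check: does an absolute POSIX path fall under a protected directory?"""
    if not posixpath.isabs(path):
        raise ValueError(f"path must be absolute: {path!r}")
    # Collapse "." and ".." so "/tmp/../etc/passwd" cannot slip past the check.
    norm = posixpath.normpath(path)
    # Compare whole components: "/usr" covers "/usr/x" but not "/usr2/x".
    return any(posixpath.commonpath([norm, p]) == p for p in protected)
```

This sketch does not follow symlinks; os.path.realpath would be needed for that. On Windows, note that os.path.join('C:', 'Windows') yields the drive-relative 'C:Windows' rather than 'C:\Windows', so protected Windows roots should be spelled out with the separator.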
  • Core helper function implementing the tool logic: validates and normalizes the file path, rejects protected directories, fetches content using mcp_http_request, creates parent directories, writes the content to the file, and returns a success or error message.
    # Module-level names used here: os, typing.Literal, the MCP Context type, mcp_http_request.
    async def fetch_content_and_write_to_file(
        url: str,
        file_path: str,
        return_content: Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
        ctx: Context,
        use_workspace_root: bool = False,
        allow_external_file_access: bool = False,
        user_agent: str = "mcp-server-requests",
        force_user_agent: bool = False,
    ) -> str:
        try:
            # Validate file path
            validated_path = file_path
            if use_workspace_root and ctx:
                roots = await ctx.list_roots()
                if len(roots) == 0:
                    return "Error: No workspace root available"
                if len(roots) > 1:
                    return "Error: Multiple workspace roots found, which is not supported"
                if roots[0].uri.scheme != "file":
                    return "Error: Workspace root is not a file:// URI"
                root = os.path.normpath(roots[0].uri.path or "/")
                # Relative paths are resolved against the workspace root;
                # os.path.join keeps an absolute file_path unchanged.
                validated_path = os.path.normpath(os.path.abspath(os.path.join(root, file_path)))
                # Unless external access is explicitly allowed, refuse paths outside the root.
                if not allow_external_file_access and os.path.commonpath([root, validated_path]) != root:
                    return f"Error: Access denied - path '{validated_path}' is outside workspace root '{root}'"
            if not os.path.isabs(validated_path):
                return f"Error: Path must be absolute: {validated_path}"
            # Normalize so that ".." segments cannot hide a protected prefix.
            validated_path = os.path.normpath(validated_path)

            # Set protected paths based on operating system
            if os.name == 'nt':  # Windows
                # Spelled out in full: os.path.join('C:', 'Windows') gives 'C:Windows'.
                protected_paths = ['C:\\Windows', 'C:\\Program Files', 'C:\\Program Files (x86)']
            else:  # Linux/Mac
                protected_paths = ['/etc', '/usr', '/bin', '/sbin', '/lib', '/root']
            check_path = os.path.normcase(validated_path)
            for protected in protected_paths:
                prot = os.path.normcase(protected)
                # Match whole components: '/usr' blocks '/usr/x' but not '/usr2/x'.
                if check_path == prot or check_path.startswith(prot + os.sep):
                    return f"Error: Do not allow writing to protected paths: {protected}"

            # Fetch content
            content = mcp_http_request(
                "GET",
                url,
                return_content=return_content,
                user_agent=user_agent,
                # Spelling kept as in the original call; it has to match
                # mcp_http_request's parameter name, which is not shown here.
                force_user_agnet=force_user_agent,
                format_status=False,
                format_headers=False,
            )

            # Create parent directories if needed
            try:
                dir_path = os.path.dirname(validated_path)
                if dir_path:
                    os.makedirs(dir_path, exist_ok=True)
            except OSError as e:
                return f"Error: Unable to create directory for path '{validated_path}': {e}"

            # Write content to file (UTF-8, no newline translation)
            try:
                with open(validated_path, 'w', encoding='utf-8', newline='') as f:
                    f.write(content)
            except OSError as e:
                return f"Error: Unable to write to file '{validated_path}': {e}"

            # len(content) counts characters; the message reports bytes, so encode first.
            content_size = len(content.encode('utf-8'))
            return f"Content from '{url}' ({content_size:,} bytes) successfully written to: {validated_path}"
        except Exception as e:
            return f"Error: Failed to fetch content or write file: {e}"
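Character and byte counts diverge for non-ASCII text ("é" is one character but two bytes in UTF-8), so a size reported in bytes has to come from the encoded data. The directory-creation and write steps of the helper can be sketched in isolation as follows. write_text_file is a hypothetical name, and writing pre-encoded bytes in binary mode is one assumed way to get the same "no newline translation" effect as opening with newline=''.

```python
import os
import tempfile


def write_text_file(path: str, content: str) -> int:
    """Hypothetical helper: write content as UTF-8, creating parent dirs; return bytes written."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)   # no error if the directories already exist
    data = content.encode("utf-8")
    # Binary mode writes the bytes untouched, so "\n" is never translated to "\r\n".
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Usage: write into a throwaway directory with nested, not-yet-existing folders.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "docs", "nested", "page.md")
    size = write_text_file(target, "héllo\n")
    with open(target, encoding="utf-8", newline="") as f:
        read_back = f.read()
```

Here size is 7 while len("héllo\n") is 6, which is exactly the difference a character-based count would hide.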
  • Tool registration decorator @mcp.tool() for the workspace version of fetch_to_file.
    @mcp.tool()
  • Tool registration decorator @mcp.tool() for the absolute path version of fetch_to_file.
    @mcp.tool()


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/coucya/mcp-server-requests'

If you have feedback or need assistance with the MCP directory API, please join our Discord server