fetch
Retrieve web page content from any HTTP/HTTPS URL and return it as raw HTML, cleaned HTML, or readable Markdown.
Instructions
Fetch web page content
Function/Features:
- Retrieves web page content from any HTTP/HTTPS URL
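Args:
- `url` (str): The URL to fetch content from.
- `return_content` (`'raw'` | `'basic_clean'` | `'strict_clean'` | `'markdown'`, optional): Processing format for HTML content. Defaults to `"markdown"`.
  - `"raw"`: Returns the HTML unmodified.
  - `"basic_clean"`: Removes non-displaying tags (script, style, meta, etc.) while preserving structure.
  - `"strict_clean"`: Removes non-displaying tags and all HTML attributes except `id`, `src`, and `href`, keeping only essential structure.
  - `"markdown"`: Converts HTML content to clean, readable Markdown.

These modes apply only to `text/html` responses. Other `text/*` and `application/json` bodies are returned as UTF-8 text without processing, and any other content type yields an error message instead of content. The output starts with the HTTP status line; response headers are not included, because the handler calls `mcp_http_request` with `format_headers=False`.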
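Examples:
```
// Returns content as markdown
fetch({url: "https://example.com"})

// Returns raw HTML content
fetch({url: "https://api.example.com/data", return_content: "raw"})
```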
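To make the difference between `basic_clean` and `strict_clean` concrete, here is a minimal sketch of an attribute-whitelisting HTML cleaner built on Python's standard `html.parser`. It is not the server's `clean_html`, whose source is not shown here. The name `clean_html_sketch`, the exact sets of dropped tags, and the reading of `allowed_attrs=True` as "keep every attribute" are assumptions chosen to mirror the two `clean_html` calls visible in `format_response_result`.

```python
from html import escape
from html.parser import HTMLParser

# Assumed sets of non-displaying tags; the real clean_html may use different ones.
DROP_WITH_CONTENT = {"script", "style", "noscript", "template"}  # tag and its content dropped
DROP_TAG_ONLY = {"meta", "link", "base"}  # void metadata tags dropped


class _Cleaner(HTMLParser):
    def __init__(self, allowed_attrs):
        super().__init__(convert_charrefs=True)
        self.allowed_attrs = allowed_attrs  # True = keep all attributes, tuple = whitelist
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside an element whose content is dropped

    def _attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if self.allowed_attrs is True or name in self.allowed_attrs:
                # Attributes without a value (e.g. "disabled") are kept bare.
                kept.append(f" {name}" if value is None else f' {name}="{escape(value)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth and tag not in DROP_TAG_ONLY:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<img ... />) have no content, so nothing to skip afterwards.
        if not self.skip_depth and tag not in DROP_WITH_CONTENT | DROP_TAG_ONLY:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in DROP_TAG_ONLY:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments and <!DOCTYPE> are dropped because HTMLParser's default handlers ignore them.


def clean_html_sketch(html: str, allowed_attrs=True) -> str:
    """Drop non-displaying tags; keep all attributes (True) or only a whitelist (tuple)."""
    parser = _Cleaner(allowed_attrs)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `clean_html_sketch(page)` approximates `basic_clean`, while `clean_html_sketch(page, ("id", "src", "href"))` approximates `strict_clean`: on `<p class="x" id="intro">`, the first keeps both attributes and the second keeps only `id`.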
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The HTTP or HTTPS URL to fetch content from | |
| return_content | No | Processing format for HTML content: `raw`, `basic_clean`, `strict_clean`, or `markdown` | `markdown` |
Implementation Reference
- `mcp_server_requests/server.py:266-293` (handler): The `fetch` tool handler, registered via `@mcp.tool()`. It performs an HTTP GET request to the specified URL and returns the content in the requested format (raw, cleaned HTML, or Markdown). Its input schema is derived from the `Annotated` type hints, and it carries a comprehensive docstring.

```python
@mcp.tool()
def fetch(
    url: Annotated[str, "(require) The URL to fetch content from"],
    return_content: Annotated[
        Literal['raw', 'basic_clean', 'strict_clean', 'markdown'],
        "(optional, Defaults to \"markdown\") processing format for HTML content",
    ] = "markdown",
) -> str:
    """Fetch web page content

    Function/Features:
    - Retrieves web page content from any HTTP/HTTPS URL

    Args:
        url (str): The URL to fetch content from.
        return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional):
            Processing format for HTML content. Defaults to "markdown".
            - "raw": Returns unmodified HTML content with full response headers
            - "basic_clean": Removes non-displaying tags (script, style, meta, etc.) while preserving structure
            - "strict_clean": Removes non-displaying tags and most HTML attributes, keeping only essential structure
            - "markdown": Converts HTML content to clean, readable Markdown format

    Examples:
        // Returns content as markdown
        fetch({url: "https://example.com"})

        // Returns raw HTML content
        fetch({url: "https://api.example.com/data", return_content: "raw"})
    """
    # `ua` and `ua_force` are module-level settings defined elsewhere in server.py.
    # format_headers=False omits response headers; the status line is still emitted
    # because format_status defaults to True in mcp_http_request.
    return mcp_http_request(
        "GET",
        url,
        return_content=return_content,
        user_agent=ua,
        force_user_agnet=ua_force,
        format_headers=False,
    )
```
- Helper function that executes the actual HTTP request, formats the response, and converts any exception into an error result. Called by the `fetch` tool.

```python
def mcp_http_request(
    method: str,
    url: str,
    *,
    query: Optional[dict] = None,
    data: Optional[str | bytes | bytearray] = None,
    json: Optional[dict] = None,
    headers: Optional[dict] = None,
    user_agent: Optional[str] = None,
    force_user_agnet: Optional[bool] = None,
    format_status: bool = True,
    format_headers: bool = True,
    return_content: Literal['raw', 'basic_clean', 'strict_clean', 'markdown'] = "raw",
) -> str:
    hs = {}
    if headers:
        hs.update(headers)
    if force_user_agnet:
        # Forced: the configured User-Agent overrides any caller-supplied one.
        if user_agent:
            hs["User-Agent"] = user_agent
    else:
        # Not forced: the configured User-Agent is only a fallback.
        if "User-Agent" not in hs and user_agent:
            hs["User-Agent"] = user_agent
    try:
        response = http_request(method, url, query=query, headers=hs, data=data, json_=json)
        return format_response_result(
            response,
            format_status=format_status,
            format_headers=format_headers,
            return_content=return_content,
        )
    except Exception as e:
        # Any failure (network, decoding, unsupported content type) is returned as text.
        return format_error_result(e)
```
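The `User-Agent` branch in `mcp_http_request` decides which value wins when both the caller and the server configuration supply one. The sketch below isolates that branch as a pure function so its precedence can be checked without making a request. `resolve_headers` is a hypothetical name, and it spells its flag `force_user_agent` where the source uses `force_user_agnet`; the body otherwise mirrors the `hs` logic above.

```python
from typing import Optional


def resolve_headers(
    headers: Optional[dict],
    user_agent: Optional[str],
    force_user_agent: Optional[bool],
) -> dict:
    """Mirror of the header-merging branch in mcp_http_request (hypothetical helper)."""
    hs = dict(headers or {})  # copy so the caller's dict is not mutated
    if force_user_agent:
        # Forced mode: the configured User-Agent replaces whatever the caller sent.
        if user_agent:
            hs["User-Agent"] = user_agent
    elif "User-Agent" not in hs and user_agent:
        # Default mode: the configured User-Agent is used only if none was given.
        hs["User-Agent"] = user_agent
    return hs
```

For `fetch` itself, which passes no `headers`, both modes behave the same: the configured `ua` is sent if it is set, and no `User-Agent` is added otherwise. The forcing flag only matters for callers that supply their own headers.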
- `mcp_server_requests/server.py:44-99` (helper): Formats the HTTP response according to the requested `return_content` type, handling HTML cleaning and Markdown conversion. Used by `mcp_http_request`, which is called by `fetch`.

```python
def format_response_result(
    response: Response,
    *,
    format_status: bool | None = None,
    format_headers: bool | None = None,
    return_content: Literal["raw", "basic_clean", "strict_clean", "markdown"] = "raw",
) -> str:
    http_version = response.version
    status = response.status_code
    reason = response.reason
    headers = response.headers
    content = response.content
    content_type = response.content_type
    if not isinstance(content_type, str):
        content_type = 'application/octet-stream'

    # Only textual bodies (text/* and application/json) are decoded; anything else is an error.
    if content_type.startswith("text/") or content_type.startswith("application/json"):
        try:
            if isinstance(content, (bytes, bytearray)):
                content = content.decode('utf-8')
            else:
                content = str(content)
        except UnicodeDecodeError as e:
            err_message = f"response content type is \"{content_type}\", but not utf-8 encoded"
            raise ResponseError(response, err_message) from e
        except Exception as e:
            err_message = f"response content type is \"{content_type}\", but cannot be converted to a string"
            raise ResponseError(response, err_message) from e
    else:
        err_message = f'response content type is "{content_type}", cannot be converted to a string'
        raise ResponseError(response, err_message)

    # return_content only affects HTML; other text is returned as decoded.
    if content_type.startswith("text/html"):
        if return_content == "raw":
            pass
        elif return_content == "basic_clean":
            content = clean_html(content, allowed_attrs=True)
        elif return_content == "strict_clean":
            content = clean_html(content, allowed_attrs=("id", "src", "href"))
        elif return_content == "markdown":
            content = html_to_markdown(content)

    # Assemble: optional status line, optional headers, separator, then the body.
    strs = []
    if format_status:
        strs.append(f"{http_version} {status} {reason}\r\n")
    if format_headers:
        response_header_str = "\r\n".join(f"{k}: {v}" for k, v in headers)
        strs.append(response_header_str)
    if len(strs) > 0:
        strs.append("\r\n\r\n")
    strs.append(content)
    return "\r\n".join(strs)
```
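The content-type gate and the final string assembly determine what `fetch` actually returns. Below is a self-contained sketch of those two steps with the `Response` object replaced by plain arguments. `render_body` and `assemble` are hypothetical names, the HTML step is passed in as a callable standing in for the server's `clean_html`/`html_to_markdown`, and `ValueError` stands in for `ResponseError`.

```python
from typing import Callable, Optional


def render_body(
    content: bytes,
    content_type: Optional[str],
    html_transform: Callable[[str], str] = lambda s: s,
) -> str:
    """Decode textual bodies as UTF-8 and transform only text/html (sketch)."""
    if not isinstance(content_type, str):
        content_type = "application/octet-stream"
    if not content_type.startswith(("text/", "application/json")):
        raise ValueError(f'response content type is "{content_type}", cannot be converted to a string')
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(f'response content type is "{content_type}", but not utf-8 encoded') from e
    # As in format_response_result, the return_content step applies only to HTML.
    if content_type.startswith("text/html"):
        text = html_transform(text)
    return text


def assemble(status_line: Optional[str], header_pairs: Optional[list], body: str) -> str:
    """Reproduce the strs/join logic; None disables that part (sketch)."""
    strs = []
    if status_line is not None:
        strs.append(status_line + "\r\n")
    if header_pairs is not None:
        strs.append("\r\n".join(f"{k}: {v}" for k, v in header_pairs))
    if strs:
        strs.append("\r\n\r\n")
    strs.append(body)
    return "\r\n".join(strs)
```

With the settings `fetch` uses (status on, headers off), `assemble("HTTP/1.1 200 OK", None, body)` yields the status line, five CRLF sequences, and then the body: the pieces already end in `\r\n` before the final `"\r\n".join` adds another separator between each of them.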