Glama

fetch

Retrieve web page content from any URL and process it into raw HTML, cleaned HTML, or readable Markdown format for analysis and integration.

Instructions

Fetch web page content

Function/Features:

  • Retrieves web page content from any HTTP/HTTPS URL

Args:

  • url (str): The URL to fetch content from.
  • return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional): Processing format for HTML content. Defaults to "markdown".
      - "raw": Returns unmodified HTML content with full response headers
      - "basic_clean": Removes non-displaying tags (script, style, meta, etc.) while preserving structure
      - "strict_clean": Removes non-displaying tags and most HTML attributes, keeping only essential structure
      - "markdown": Converts HTML content to clean, readable Markdown format

Examples:

    // Returns content as markdown
    fetch({url: "https://example.com"})

    // Returns raw HTML content
    fetch({url: "https://api.example.com/data", return_content: "raw"})
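
To make the difference between "basic_clean" and "strict_clean" concrete, here is a minimal sketch built on the standard library's html.parser. It is not the tool's actual clean_html helper, whose source is not shown on this page; the names sketch_clean_html, HtmlCleaner, and DROPPED_TAGS, and the exact set of dropped tags, are assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Assumed set of "non-displaying" tags; the real tool's list is not shown.
DROPPED_TAGS = {"script", "style", "meta", "link", "noscript"}
# Dropped tags that never have an end tag, so they must not change skip depth.
VOID_DROPPED_TAGS = {"meta", "link"}


class HtmlCleaner(HTMLParser):
    """Re-emits HTML while dropping some tags and, optionally, attributes."""

    def __init__(self, allowed_attrs=None):
        super().__init__(convert_charrefs=True)
        self.allowed_attrs = allowed_attrs  # None keeps every attribute
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            if tag not in VOID_DROPPED_TAGS:
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        rendered = ""
        for name, value in attrs:
            if self.allowed_attrs is None or name in self.allowed_attrs:
                safe_value = escape(value or "", quote=True)
                rendered += f' {name}="{safe_value}"'
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            if tag not in VOID_DROPPED_TAGS and self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a dropped element (e.g. script source) is discarded.
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def sketch_clean_html(html, allowed_attrs=None):
    """Hypothetical stand-in for the cleaning step, for illustration only."""
    cleaner = HtmlCleaner(allowed_attrs)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)


page = (
    '<div class="box" id="main"><script>alert(1)</script>'
    '<a href="/x" onclick="go()">link</a></div>'
)
basic = sketch_clean_html(page)  # keeps all attributes
strict = sketch_clean_html(page, allowed_attrs=("id", "src", "href"))
```

In this sketch both modes drop the script element; only the strict variant also removes the class and onclick attributes, keeping id and href.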

Input Schema

Name           | Required | Description                                                           | Default
url            | Yes      | (require) The URL to fetch content from                               |
return_content | No       | (optional, Defaults to "markdown") processing format for HTML content | markdown
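
The schema above can be checked on the caller's side before the tool is invoked. The following is a rough sketch of such a check, not part of the server: normalize_fetch_args is a hypothetical helper, and rejecting non-HTTP(S) URLs is an assumption based on the tool description.

```python
from typing import Literal, get_args

# Mirrors the allowed values listed in the input schema.
ReturnContent = Literal["raw", "basic_clean", "strict_clean", "markdown"]


def normalize_fetch_args(args: dict) -> dict:
    """Validate tool arguments and fill in the documented default."""
    url = args.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("'url' is required and must be a non-empty string")
    # Assumption: only HTTP/HTTPS URLs are meaningful for this tool.
    if not url.startswith(("http://", "https://")):
        raise ValueError("'url' must start with http:// or https://")

    return_content = args.get("return_content", "markdown")
    if return_content not in get_args(ReturnContent):
        allowed = ", ".join(get_args(ReturnContent))
        raise ValueError(f"'return_content' must be one of: {allowed}")

    return {"url": url, "return_content": return_content}
```

Calling normalize_fetch_args({"url": "https://example.com"}) fills in return_content as "markdown", matching the default in the table.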

Implementation Reference

  • The 'fetch' tool handler function, which is also registered via @mcp.tool(). It performs an HTTP GET request to the specified URL and returns the content in the requested format (raw, cleaned HTML, or markdown). Includes input schema via Annotated types and comprehensive docstring.
    @mcp.tool()
    def fetch(
        url: Annotated[str, "(require) The URL to fetch content from"],
        return_content: Annotated[Literal['raw', 'basic_clean', 'strict_clean', 'markdown'], "(optional, Defaults to \"markdown\") processing format for HTML content"] = "markdown",
    ) -> str:
        """Fetch web page content

        Function/Features:
        - Retrieves web page content from any HTTP/HTTPS URL

        Args:
            url (str): The URL to fetch content from.
            return_content ('raw' | 'basic_clean' | 'strict_clean' | 'markdown', optional): Processing format for HTML content. Defaults to "markdown".
                - "raw": Returns unmodified HTML content with full response headers
                - "basic_clean": Removes non-displaying tags (script, style, meta, etc.) while preserving structure
                - "strict_clean": Removes non-displaying tags and most HTML attributes, keeping only essential structure
                - "markdown": Converts HTML content to clean, readable Markdown format

        Examples:
            // Returns content as markdown
            fetch({url: "https://example.com"})

            // Returns raw HTML content
            fetch({url: "https://api.example.com/data", return_content: "raw"})
        """
        return mcp_http_request("GET", url, return_content=return_content, user_agent=ua, force_user_agnet=ua_force, format_headers=False)
  • Helper function that executes the actual HTTP request and handles response formatting or error handling. Called by the 'fetch' tool.
    def mcp_http_request(
        method: str,
        url: str,
        *,
        query: Optional[dict] = None,
        data: Optional[str | bytes | bytearray] = None,
        json: Optional[dict] = None,
        headers: Optional[dict] = None,
        user_agent: Optional[str] = None,
        force_user_agnet: Optional[bool] = None,
        format_status: bool = True,
        format_headers: bool = True,
        return_content: Literal['raw', 'basic_clean', 'strict_clean', 'markdown'] = "raw",
    ) -> str:
        hs = {}
        if headers:
            hs.update(headers)

        if force_user_agnet:
            if user_agent:
                hs["User-Agent"] = user_agent
        else:
            if "User-Agent" not in hs and user_agent:
                hs["User-Agent"] = user_agent

        try:
            response = http_request(
                method, url, query=query, headers=hs, data=data, json_=json
            )
            return format_response_result(
                response,
                format_status=format_status,
                format_headers=format_headers,
                return_content=return_content
            )
        except Exception as e:
            return format_error_result(e)
  • Helper function that formats the HTTP response content according to the specified return_content type, handling HTML cleaning and markdown conversion. Used by mcp_http_request which is called by 'fetch'.
    def format_response_result(
        response: Response,
        *,
        format_status: bool | None = None,
        format_headers: bool | None = None,
        return_content: Literal["raw", "basic_clean", "strict_clean", "markdown"] = "raw",
    ) -> str:
        http_version = response.version
        status = response.status_code
        reason = response.reason
        headers = response.headers
        content = response.content
        content_type = response.content_type

        # Treat a missing content type as opaque binary data.
        if not isinstance(content_type, str):
            content_type = 'application/octet-stream'

        # Only text and JSON bodies are converted to a string; anything else is an error.
        if content_type.startswith("text/") or content_type.startswith("application/json"):
            try:
                if isinstance(content, (bytes, bytearray)):
                    content = content.decode('utf-8')
                else:
                    content = str(content)
            except UnicodeDecodeError as e:
                err_message = f"response content type is \"{content_type}\", but not utf-8 encoded"
                raise ResponseError(response, err_message) from e
            except Exception as e:
                err_message = f"response content type is \"{content_type}\", but cannot be converted to a string"
                raise ResponseError(response, err_message) from e
        else:
            err_message = f'response content type is "{content_type}", cannot be converted to a string'
            raise ResponseError(response, err_message)

        # HTML post-processing applies only to text/html responses.
        if content_type.startswith("text/html"):
            if return_content == "raw":
                pass
            elif return_content == "basic_clean":
                content = clean_html(content, allowed_attrs=True)
            elif return_content == "strict_clean":
                content = clean_html(content, allowed_attrs=("id", "src", "href"))
            elif return_content == "markdown":
                content = html_to_markdown(content)

        # Optionally prepend the status line and headers to the body.
        strs = []
        if format_status:
            strs.append(f"{http_version} {status} {reason}\r\n")
        if format_headers:
            response_header_str = "\r\n".join(f"{k}: {v}" for k, v in headers)
            strs.append(response_header_str)
        if len(strs) > 0:
            strs.append("\r\n\r\n")
        strs.append(content)
        return "\r\n".join(strs)
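
The User-Agent handling inside mcp_http_request is easier to follow when isolated. The sketch below restates that branch as a standalone function; resolve_headers is a hypothetical name introduced here, and the parameter is spelled force_user_agent for readability, whereas the source spells it force_user_agnet.

```python
from typing import Optional


def resolve_headers(
    headers: Optional[dict],
    user_agent: Optional[str],
    force_user_agent: Optional[bool],
) -> dict:
    """Merge caller headers with a configured User-Agent.

    Same precedence as the helper above: when forcing is on, the configured
    value always wins; otherwise it is only a fallback.
    """
    merged = dict(headers or {})  # copy so the caller's dict is not mutated
    if force_user_agent:
        if user_agent:
            merged["User-Agent"] = user_agent
    elif "User-Agent" not in merged and user_agent:
        merged["User-Agent"] = user_agent
    return merged


# Not forced: the caller's header is kept.
kept = resolve_headers({"User-Agent": "caller/1.0"}, "configured/2.0", False)
# Forced: the configured value replaces the caller's header.
forced = resolve_headers({"User-Agent": "caller/1.0"}, "configured/2.0", True)
```

Note that the lookup is a plain dictionary key test, so it is case-sensitive: a caller-supplied "user-agent" key would not be recognized as an existing User-Agent header.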

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/coucya/mcp-server-requests'
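
The same request can be issued from Python with the standard library. This is a sketch under two assumptions that the page does not confirm: that the path follows an owner/name pattern (inferred from the single URL above), and that the response body is JSON. The function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build the GET request for one server entry (owner/name is assumed)."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server_info(owner: str, name: str, timeout: float = 10.0):
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_server_request("coucya", "mcp-server-requests")
```

Calling fetch_server_info("coucya", "mcp-server-requests") performs the network request; building the request separately keeps the URL construction easy to inspect.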

If you have feedback or need assistance with the MCP directory API, please join our Discord server.