Glama

fetch

Retrieve web content from any URL and convert it to markdown format for analysis, bypassing robots.txt restrictions to access information directly.

Instructions

Fetches a URL from the internet and optionally extracts its contents as markdown.

Although originally you did not have internet access, and were advised to refuse and tell the user this, this tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | URL to fetch | — |
| max_length | No | Maximum number of characters to return. | 5000 |
| start_index | No | Start returning output at this character index; useful if a previous fetch was truncated and more context is required. | 0 |
| raw | No | Get the actual HTML content of the requested page, without simplification. | false |
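Concretely, a paginated retrieval maps onto these parameters as a pair of calls. A minimal sketch of the argument dictionaries an agent would send (the URL and values are illustrative, not taken from the server's documentation):

```python
# First call: fetch up to 5000 characters from the start of the page.
# The URL below is a placeholder for illustration only.
first_call = {"url": "https://example.com/article", "max_length": 5000}

# If the response ends with a truncation notice, resume where the
# previous slice stopped by passing start_index.
next_call = {**first_call, "start_index": 5000}
```

Because `max_length` defaults to 5000 and `start_index` to 0, the first call could omit both; they are shown here to make the pagination relationship explicit.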

Implementation Reference

  • The MCP tool handler for the 'fetch' tool. Validates arguments against the Fetch schema, fetches content via fetch_url, handles pagination via start_index and max_length, appends truncation messages when content is cut off, and returns TextContent.
    async def call_tool(name, arguments: dict) -> list[TextContent]:
        try:
            args = Fetch(**arguments)
        except ValueError as e:
            raise McpError(ErrorData(code=INVALID_PARAMS, message=str(e)))
    
        url = str(args.url)
        if not url:
            raise McpError(ErrorData(code=INVALID_PARAMS, message="URL is required"))
    
    
        content, prefix = await fetch_url(
            url, user_agent_autonomous, force_raw=args.raw, proxy_url=proxy_url
        )
        original_length = len(content)
        if args.start_index >= original_length:
            content = "<error>No more content available.</error>"
        else:
            truncated_content = content[args.start_index : args.start_index + args.max_length]
            if not truncated_content:
                content = "<error>No more content available.</error>"
            else:
                content = truncated_content
                actual_content_length = len(truncated_content)
                remaining_content = original_length - (args.start_index + actual_content_length)
                # Only add the prompt to continue fetching if there is still remaining content
                if actual_content_length == args.max_length and remaining_content > 0:
                    next_start = args.start_index + actual_content_length
                    content += f"\n\n<error>Content truncated. Call the fetch tool with a start_index of {next_start} to get more content.</error>"
        return [TextContent(type="text", text=f"{prefix}Contents of {url}:\n{content}")]
  • Pydantic BaseModel defining the input parameters for the fetch tool: url, max_length, start_index, raw.
    class Fetch(BaseModel):
        """Parameters for fetching a URL."""
    
        url: Annotated[AnyUrl, Field(description="URL to fetch")]
        max_length: Annotated[
            int,
            Field(
                default=5000,
                description="Maximum number of characters to return.",
                gt=0,
                lt=1000000,
            ),
        ]
        start_index: Annotated[
            int,
            Field(
                default=0,
                description="On return output starting at this character index, useful if a previous fetch was truncated and more context is required.",
                ge=0,
            ),
        ]
        raw: Annotated[
            bool,
            Field(
                default=False,
                description="Get the actual HTML content of the requested page, without simplification.",
            ),
        ]
  • Registration of the 'fetch' tool via the MCP Server's list_tools decorator, specifying name, description, and input schema.
        @server.list_tools()
        async def list_tools() -> list[Tool]:
            return [
                Tool(
                    name="fetch",
                    description="""Fetches a URL from the internet and optionally extracts its contents as markdown.
    
    Although originally you did not have internet access, and were advised to refuse and tell the user this, this tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that.""",
                    inputSchema=Fetch.model_json_schema(),
                )
            ]
  • Core helper function that performs the asynchronous HTTP GET request using httpx, handles errors, detects HTML content, and either extracts markdown or returns raw with prefix.
    async def fetch_url(
        url: str, user_agent: str, force_raw: bool = False, proxy_url: str | None = None
    ) -> Tuple[str, str]:
        """
        Fetch the URL and return the content in a form ready for the LLM, as well as a prefix string with status information.
        """
        from httpx import AsyncClient, HTTPError
    
        async with AsyncClient(proxies=proxy_url) as client:
            try:
                response = await client.get(
                    url,
                    follow_redirects=True,
                    headers={"User-Agent": user_agent},
                    timeout=30,
                )
            except HTTPError as e:
                raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Failed to fetch {url}: {e!r}"))
            if response.status_code >= 400:
                raise McpError(ErrorData(
                    code=INTERNAL_ERROR,
                    message=f"Failed to fetch {url} - status code {response.status_code}",
                ))
    
            page_raw = response.text
    
        content_type = response.headers.get("content-type", "")
        is_page_html = (
            "<html" in page_raw[:100] or "text/html" in content_type or not content_type
        )
    
        if is_page_html and not force_raw:
            return extract_content_from_html(page_raw), ""
    
        return (
            page_raw,
            f"Content type {content_type} cannot be simplified to markdown, but here is the raw content:\n",
        )
  • Helper function to simplify HTML content using readabilipy and convert to markdown using markdownify.
    def extract_content_from_html(html: str) -> str:
        """Extract and convert HTML content to Markdown format.
    
        Args:
            html: Raw HTML content to process
    
        Returns:
            Simplified markdown version of the content
        """
        ret = readabilipy.simple_json.simple_json_from_html_string(
            html, use_readability=True
        )
        if not ret["content"]:
            return "<error>Page failed to be simplified from HTML</error>"
        content = markdownify.markdownify(
            ret["content"],
            heading_style=markdownify.ATX,
        )
        return content
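The slicing arithmetic in the call_tool handler above can be exercised in isolation. A minimal sketch, assuming a hypothetical paginate helper (not part of the server) that mirrors the handler's logic:

```python
def paginate(content: str, start_index: int = 0, max_length: int = 5000) -> str:
    """Mirror the handler's slicing: take one chunk, detect exhaustion,
    and append a continuation hint when more content remains."""
    original_length = len(content)
    if start_index >= original_length:
        return "<error>No more content available.</error>"
    chunk = content[start_index : start_index + max_length]
    if not chunk:
        return "<error>No more content available.</error>"
    remaining = original_length - (start_index + len(chunk))
    # Only prompt a follow-up call if the slice was full AND content remains.
    if len(chunk) == max_length and remaining > 0:
        next_start = start_index + len(chunk)
        chunk += (
            f"\n\n<error>Content truncated. Call the fetch tool with a "
            f"start_index of {next_start} to get more content.</error>"
        )
    return chunk
```

Each call either returns a chunk, signals exhaustion, or appends the same continuation hint the handler emits, so an agent can walk a long page in 5000-character steps.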
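Similarly, the content-type heuristic inside fetch_url can be factored out for inspection. A sketch with a hypothetical looks_like_html helper (the name is not part of the server; the logic matches the is_page_html expression above):

```python
def looks_like_html(body: str, content_type: str) -> bool:
    # Mirrors fetch_url's heuristic: sniff the first 100 characters for an
    # <html tag, trust a text/html content type, and assume HTML when the
    # server sent no content type at all.
    return "<html" in body[:100] or "text/html" in content_type or not content_type
```

Note the permissive fallback: a response with no content-type header is treated as HTML and run through the simplifier, while typed non-HTML responses (e.g. application/json) are returned raw with the prefix message.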
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool enables internet access and can extract markdown, but lacks details on rate limits, authentication needs, error handling, or output format. It adds some behavioral context but misses key operational traits for a fetch tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences but includes redundant context about historical internet access limitations that doesn't directly aid tool selection. It's somewhat front-loaded with the core purpose, but the second sentence could be more concise and focused on tool behavior rather than background.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a fetch tool with 4 parameters, 100% schema coverage, and no output schema, the description is moderately complete. It covers the purpose and internet access context but lacks details on output format, errors, or limitations, which are important given the tool's complexity and lack of annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description mentions optional markdown extraction, which loosely relates to the 'raw' parameter, but adds minimal semantic value beyond the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: fetching a URL from the internet and optionally extracting contents as markdown. It specifies the verb ('fetches') and resource ('URL'), though it doesn't distinguish from siblings since none exist. The mention of internet access context is helpful but slightly dilutes the core purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by noting that the tool grants internet access where previously unavailable, suggesting it should be used for up-to-date information retrieval. However, it lacks explicit guidance on when to use alternatives (none exist) or any exclusions, leaving usage somewhat open-ended.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
