Multi Fetch MCP Server

by alexyangjie

fetch

Retrieve web content from a URL and convert it to markdown format for analysis or integration, with options to control output length and format.

Instructions

Fetches a single URL from the internet and optionally extracts its contents as markdown. This tool grants you internet access, so you can fetch the most up-to-date information and let the user know.

Input Schema

| Name        | Required | Description | Default |
|-------------|----------|-------------|---------|
| url         | Yes      | URL to fetch | |
| max_length  | No       | Maximum number of characters to return. | 50000 |
| start_index | No       | Start output at this character index; useful if a previous fetch was truncated and more context is required. | 0 |
| raw         | No       | Get the actual HTML content of the requested page, without simplification. | false |
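The start_index and max_length parameters work together to page through long documents. For instance, a follow-up call after a truncated fetch might pass arguments like this (the URL and index values here are purely illustrative):

```python
# Illustrative arguments for a follow-up 'fetch' tool call after truncation.
# The start_index value would come from the truncation notice in the
# previous response; 5000 is made up for this example.
follow_up_arguments = {
    "url": "https://example.com/article",
    "max_length": 5000,   # return at most 5000 characters this time
    "start_index": 5000,  # resume where the previous fetch stopped
    "raw": False,         # keep the simplified markdown output
}
```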

Implementation Reference

  • The tool handler for the 'fetch' tool within the `call_tool` function. It validates parameters, checks robots.txt, performs the fetch, and handles content truncation.
    if name == "fetch":
        try:
            args = Fetch(**arguments)
        except ValueError as e:
            raise McpError(ErrorData(code=INVALID_PARAMS, message=str(e)))
    
        url = str(args.url)
        if not url:
            raise McpError(ErrorData(code=INVALID_PARAMS, message="URL is required"))
    
        if not ignore_robots_txt:
            await check_may_autonomously_fetch_url(url, user_agent_autonomous, proxy_url)
    
        content, prefix = await fetch_url(
            url, user_agent_autonomous, force_raw=args.raw, proxy_url=proxy_url
        )
        original_length = len(content)
        if args.start_index >= original_length:
            content = "<error>No more content available.</error>"
        else:
            truncated_content = content[args.start_index : args.start_index + args.max_length]
            if not truncated_content:
                content = "<error>No more content available.</error>"
            else:
                content = truncated_content
                actual_content_length = len(truncated_content)
                remaining_content = original_length - (args.start_index + actual_content_length)
                if actual_content_length == args.max_length and remaining_content > 0:
                    next_start = args.start_index + actual_content_length
                    content += f"\n\n<error>Content truncated. Call the fetch tool with a start_index of {next_start} to get more content.</error>"
        return [TextContent(type="text", text=f"{prefix}Contents of {url}:\n{content}")]
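The truncation logic in the handler can be isolated as a pure function. This is a hypothetical mirror of the slicing and continuation behaviour above, not code from the server itself:

```python
def paginate(content: str, start_index: int, max_length: int) -> str:
    """Mirror the 'fetch' handler's truncation behaviour on a string."""
    original_length = len(content)
    if start_index >= original_length:
        return "<error>No more content available.</error>"
    chunk = content[start_index : start_index + max_length]
    if not chunk:
        return "<error>No more content available.</error>"
    remaining = original_length - (start_index + len(chunk))
    # Only append a continuation notice when the slice was filled to
    # max_length and content remains beyond it.
    if len(chunk) == max_length and remaining > 0:
        next_start = start_index + len(chunk)
        chunk += (
            f"\n\n<error>Content truncated. Call the fetch tool with a "
            f"start_index of {next_start} to get more content.</error>"
        )
    return chunk
```

Note that a slice ending exactly at the last character produces no notice, since remaining is zero.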
  • The Pydantic schema class defining input parameters for the 'fetch' tool.
    class Fetch(BaseModel):
        """Parameters for fetching a URL."""
    
        url: Annotated[AnyUrl, Field(description="URL to fetch")]
        max_length: Annotated[
            int,
            Field(
                default=50000,
                description="Maximum number of characters to return.",
                gt=0,
                lt=1000000,
            ),
        ]
        start_index: Annotated[
            int,
            Field(
                default=0,
                description="Start output at this character index; useful if a previous fetch was truncated and more context is required.",
                ge=0,
            ),
        ]
        raw: Annotated[
            bool,
            Field(
                default=False,
                description="Get the actual HTML content of the requested page, without simplification.",
            ),
        ]
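The defaults and bounds that Pydantic enforces above can be mirrored in a dependency-free sketch. This is a hypothetical stand-in for the schema validation, not the server's actual code:

```python
def validate_fetch_args(arguments: dict) -> dict:
    """Apply the Fetch schema's defaults and range checks without Pydantic."""
    url = arguments.get("url")
    if not url:
        raise ValueError("URL is required")
    max_length = int(arguments.get("max_length", 50000))
    if not (0 < max_length < 1000000):  # schema: gt=0, lt=1000000
        raise ValueError("max_length must be between 1 and 999999")
    start_index = int(arguments.get("start_index", 0))
    if start_index < 0:  # schema: ge=0
        raise ValueError("start_index must be >= 0")
    raw = bool(arguments.get("raw", False))
    return {"url": url, "max_length": max_length,
            "start_index": start_index, "raw": raw}
```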
  • The registration of the 'fetch' tool in the `list_tools` function.
                Tool(
                    name="fetch",
                    description="""Fetches a single URL from the internet and optionally extracts its contents as markdown.
    This tool grants you internet access, so you can fetch the most up-to-date information and let the user know.""",
                    inputSchema=Fetch.model_json_schema(),
                ),
  • The helper function `fetch_url` which interacts with the Firecrawl API to perform the actual scraping.
    async def fetch_url(
        url: str, user_agent: str, force_raw: bool = False, proxy_url: str | None = None
    ) -> Tuple[str, str]:
        """
        Fetch the URL and return the content in a form ready for the LLM, as well as a prefix string with status information.
        """
        # Use Firecrawl (SDK or HTTP) to scrape the URL for markdown or raw HTML
        if firecrawl_client is None:
            raise McpError(ErrorData(code=INTERNAL_ERROR, message="Firecrawl client is not initialised"))
        try:
            formats = ["rawHtml"] if force_raw else ["markdown"]
            # Firecrawl v2: scrape(url, options?) where options has 'formats'
            data = await firecrawl_client.scrape(url, options={"formats": formats})
        except Exception as e:
            raise McpError(ErrorData(
                code=INTERNAL_ERROR,
                message=f"Failed to fetch {url} via Firecrawl SDK: {e!r}"
            ))
    
        if force_raw:
            # Prefer rawHtml when requested; fall back to html if backend provides only that
            if isinstance(data, dict):
                content = data.get("rawHtml") or data.get("html") or ""
            else:
                content = getattr(data, 'rawHtml', None) or getattr(data, 'html', None) or ""
        else:
            # Same dict-vs-attribute handling for the markdown output
            if isinstance(data, dict):
                content = data.get("markdown") or ""
            else:
                content = getattr(data, "markdown", None) or ""
    
        if not content:
            raise McpError(ErrorData(
                code=INTERNAL_ERROR,
                message=f"No {'HTML' if force_raw else 'Markdown'} content returned for {url}"
            ))
        return content, ""
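Because the Firecrawl response may arrive as a plain dict or as an object with attributes, the extraction in `fetch_url` branches on the type. That handling could be factored into a small helper; this is a hypothetical refactor, not part of the server:

```python
from types import SimpleNamespace  # only used in the usage example below

def extract_content(data, force_raw: bool) -> str:
    """Pull markdown or raw HTML from a Firecrawl response, whether the
    response is a plain dict or an object exposing attributes."""
    keys = ("rawHtml", "html") if force_raw else ("markdown",)
    for key in keys:
        if isinstance(data, dict):
            value = data.get(key)
        else:
            value = getattr(data, key, None)
        if value:
            return value
    return ""
```

The same fallback order is preserved: rawHtml before html when raw output is requested, and an empty string when nothing usable is present.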