fetch
Retrieve web content from a URL and convert it to markdown format for analysis or integration, with options to control output length and format.
Instructions
Fetches a single URL from the internet and optionally extracts its contents as markdown. This tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to fetch | |
| max_length | No | Maximum number of characters to return. | 50000 |
| start_index | No | On return output starting at this character index, useful if a previous fetch was truncated and more context is required. | 0 |
| raw | No | Get the actual HTML content of the requested page, without simplification. | false |
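As an illustration of how these parameters combine, the sketch below builds the arguments for a first call and for a follow-up call that resumes where a truncated response ended. The helper `build_fetch_args` and the example URL are hypothetical; only the parameter names come from the table above.

```python
from typing import Any, Dict, Optional


def build_fetch_args(
    url: str,
    max_length: Optional[int] = None,
    start_index: Optional[int] = None,
    raw: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build the arguments object for a `fetch` call (hypothetical helper).

    Optional parameters are left out when not given, so the server
    applies its own defaults.
    """
    args: Dict[str, Any] = {"url": url}
    if max_length is not None:
        args["max_length"] = max_length
    if start_index is not None:
        args["start_index"] = start_index
    if raw is not None:
        args["raw"] = raw
    return args


# First call: ask for at most 5000 characters of markdown.
first = build_fetch_args("https://example.com/docs", max_length=5000)

# Follow-up call: resume at character 5000 if the first response was truncated.
second = build_fetch_args(
    "https://example.com/docs", max_length=5000, start_index=5000
)
```

Setting `raw` to true in the same way would request the page's HTML instead of markdown.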
Implementation Reference
- The tool handler for the 'fetch' tool within the `call_tool` function. It validates parameters, checks robots.txt, performs the fetch, and handles content truncation.

      if name == "fetch":
          try:
              args = Fetch(**arguments)
          except ValueError as e:
              raise McpError(ErrorData(code=INVALID_PARAMS, message=str(e)))

          url = str(args.url)
          if not url:
              raise McpError(ErrorData(code=INVALID_PARAMS, message="URL is required"))

          if not ignore_robots_txt:
              await check_may_autonomously_fetch_url(url, user_agent_autonomous, proxy_url)

          content, prefix = await fetch_url(
              url, user_agent_autonomous, force_raw=args.raw, proxy_url=proxy_url
          )
          original_length = len(content)
          if args.start_index >= original_length:
              content = "<error>No more content available.</error>"
          else:
              truncated_content = content[args.start_index : args.start_index + args.max_length]
              if not truncated_content:
                  content = "<error>No more content available.</error>"
              else:
                  content = truncated_content
                  actual_content_length = len(truncated_content)
                  remaining_content = original_length - (args.start_index + actual_content_length)
                  if actual_content_length == args.max_length and remaining_content > 0:
                      next_start = args.start_index + actual_content_length
                      content += f"\n\n<error>Content truncated. Call the fetch tool with a start_index of {next_start} to get more content.</error>"
          return [TextContent(type="text", text=f"{prefix}Contents of {url}:\n{content}")]

- The Pydantic schema class defining input parameters for the 'fetch' tool.

      class Fetch(BaseModel):
          """Parameters for fetching a URL."""

          url: Annotated[AnyUrl, Field(description="URL to fetch")]
          max_length: Annotated[
              int,
              Field(
                  default=50000,
                  description="Maximum number of characters to return.",
                  gt=0,
                  lt=1000000,
              ),
          ]
          start_index: Annotated[
              int,
              Field(
                  default=0,
                  description="On return output starting at this character index, useful if a previous fetch was truncated and more context is required.",
                  ge=0,
              ),
          ]
          raw: Annotated[
              bool,
              Field(
                  default=False,
                  description="Get the actual HTML content of the requested page, without simplification.",
              ),
          ]

- src/mcp_server_multi_fetch/server.py:236-241 (registration) The registration of the 'fetch' tool in the `list_tools` function.

      Tool(
          name="fetch",
          description="""Fetches a single URL from the internet and optionally extracts its contents as markdown. This tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that.""",
          inputSchema=Fetch.model_json_schema(),
      ),

- The helper function `fetch_url` which interacts with the Firecrawl API to perform the actual scraping.

      async def fetch_url(
          url: str, user_agent: str, force_raw: bool = False, proxy_url: str | None = None
      ) -> Tuple[str, str]:
          """
          Fetch the URL and return the content in a form ready for the LLM, as well as a prefix string with status information.
          """
          # Use Firecrawl (SDK or HTTP) to scrape the URL for markdown or raw HTML
          if firecrawl_client is None:
              raise McpError(ErrorData(code=INTERNAL_ERROR, message="Firecrawl client is not initialised"))
          try:
              formats = ["rawHtml"] if force_raw else ["markdown"]
              # Firecrawl v2: scrape(url, options?) where options has 'formats'
              data = await firecrawl_client.scrape(url, options={"formats": formats})
          except Exception as e:
              raise McpError(ErrorData(
                  code=INTERNAL_ERROR,
                  message=f"Failed to fetch {url} via Firecrawl SDK: {e!r}"
              ))

          if force_raw:
              # Prefer rawHtml when requested; fall back to html if backend provides only that
              if isinstance(data, dict):
                  content = data.get("rawHtml") or data.get("html") or ""
              else:
                  content = getattr(data, 'rawHtml', None) or getattr(data, 'html', None) or ""
          else:
              content = getattr(data, 'markdown', None) or (data.get("markdown") if isinstance(data, dict) else "") or ""

          if not content:
              raise McpError(ErrorData(
                  code=INTERNAL_ERROR,
                  message=f"No {'HTML' if force_raw else 'Markdown'} content returned for {url}"
              ))
          return content, ""
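The handler's truncation rules can be tried out in isolation, without a network call or the Firecrawl client. The following is a minimal sketch, not the server's code: `paginate` and `read_all` are hypothetical names, the content is an in-memory string, and `max_length` is assumed to be positive (as the schema's `gt=0` constraint requires).

```python
from typing import List, Optional, Tuple


def paginate(content: str, start_index: int, max_length: int) -> Tuple[str, Optional[int]]:
    """Return (chunk, next_start), mirroring the handler's slicing rules.

    next_start is None when no content remains after this chunk.
    Assumes max_length > 0.
    """
    original_length = len(content)
    if start_index >= original_length:
        return "<error>No more content available.</error>", None

    chunk = content[start_index : start_index + max_length]
    remaining = original_length - (start_index + len(chunk))
    # A continuation is offered only when the chunk is full and content remains.
    if len(chunk) == max_length and remaining > 0:
        return chunk, start_index + len(chunk)
    return chunk, None


def read_all(content: str, max_length: int) -> str:
    """Follow next_start from chunk to chunk and join the pieces."""
    parts: List[str] = []
    start: Optional[int] = 0
    while start is not None:
        chunk, start = paginate(content, start, max_length)
        parts.append(chunk)
    return "".join(parts)


chunk, next_start = paginate("abcdefghij", 0, 4)
```

A caller of the real tool would do the same thing by reading the `start_index` suggested in the "Content truncated" message and passing it in the next call.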