s_fetch_page
Fetch web page content with pagination support and bot-detection avoidance. Retrieve website data in HTML or markdown format with configurable modes for different complexity levels.
Instructions
Fetches a complete web page with pagination support. Retrieves content from websites with bot-detection avoidance. For best performance, start with 'basic' mode (fastest), then only escalate to 'stealth' or 'max-stealth' modes if basic mode fails. Content is returned as 'METADATA: {json}\n\n[content]' where metadata includes length information and truncation status.
Args:
url: URL to fetch
mode: Fetching mode (basic, stealth, or max-stealth)
format: Output format (html or markdown)
max_length: Maximum number of characters to return.
start_index: Character index at which the returned output starts; useful when a previous fetch was truncated and more content is needed.
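The response format and the `start_index` argument described above suggest a simple client-side pagination loop. The following is a minimal sketch, not part of the tool itself: `split_response` and `fetch_all` are hypothetical helper names, and `fetch` stands in for whatever function the caller uses to invoke `s_fetch_page`. Because the metadata key names are not shown here, the loop stops when a chunk comes back shorter than `max_length` rather than reading the truncation flag.

```python
import json
from typing import Callable

PREFIX = "METADATA: "


def split_response(raw: str) -> tuple[dict, str]:
    # Split "METADATA: {json}<blank line>[content]" into (metadata, content).
    # Only the first blank line separates the two parts; any later blank
    # lines belong to the page content.
    header, _, content = raw.partition("\n\n")
    if not header.startswith(PREFIX):
        raise ValueError("response does not start with 'METADATA: '")
    return json.loads(header[len(PREFIX):]), content


def fetch_all(fetch: Callable[[int, int], str], max_length: int = 5000) -> str:
    # `fetch(start_index, max_length)` is a hypothetical callable that returns
    # one raw tool response; it is an assumption, not part of the server.
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    parts: list[str] = []
    start_index = 0
    while True:
        _, content = split_response(fetch(start_index, max_length))
        parts.append(content)
        start_index += len(content)
        # A chunk shorter than max_length means the end was reached.
        if len(content) < max_length:
            break
    return "".join(parts)
```

A client that knows the metadata keys could instead stop as soon as the metadata reports that the content is no longer truncated, which saves the final empty request when the page length is an exact multiple of `max_length`.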
Input Schema
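| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to fetch | |
| mode | No | Fetching mode (basic, stealth, or max-stealth) | basic |
| format | No | Output format (html or markdown) | markdown |
| max_length | No | Maximum number of characters to return | 5000 |
| start_index | No | Character index at which the returned output starts | 0 |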
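The instructions recommend starting with `basic` mode and escalating only on failure. One way a caller might implement that is sketched below. The name `fetch_with_escalation` and the `fetch` callable are hypothetical, and treating any raised exception as a "failure" is an assumption; a real client may also want to treat empty or blocked-looking pages as failures.

```python
import asyncio
from typing import Awaitable, Callable

# Modes in order of increasing cost, as listed in the schema.
MODES = ("basic", "stealth", "max-stealth")


async def fetch_with_escalation(
    fetch: Callable[[str, str], Awaitable[str]], url: str
) -> tuple[str, str]:
    # `fetch(url, mode)` is a hypothetical async callable wrapping the tool call.
    # Returns (mode_that_succeeded, raw_response).
    last_error: Exception | None = None
    for mode in MODES:
        try:
            return mode, await fetch(url, mode)
        except Exception as exc:  # assumption: failure surfaces as an exception
            last_error = exc
    raise RuntimeError(f"all modes failed for {url}") from last_error
```

Returning the mode alongside the response lets the caller remember which mode worked for a given site and skip the cheaper attempts next time.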
Implementation Reference
- src/scrapling_fetch_mcp/mcp.py:14-38 (handler) The main handler and registration point for the 's_fetch_page' tool, using the @mcp.tool() decorator. Defines the input parameters (schema) and delegates execution to the core implementation.

  ```python
  @mcp.tool()
  async def s_fetch_page(
      url: str,
      mode: str = "basic",
      format: str = "markdown",
      max_length: int = 5000,
      start_index: int = 0,
  ) -> str:
      """Fetches a complete web page with pagination support. Retrieves content from websites with bot-detection avoidance.
      For best performance, start with 'basic' mode (fastest), then only escalate to 'stealth' or 'max-stealth' modes if basic mode fails.
      Content is returned as 'METADATA: {json}\\n\\n[content]' where metadata includes length information and truncation status.

      Args:
          url: URL to fetch
          mode: Fetching mode (basic, stealth, or max-stealth)
          format: Output format (html or markdown)
          max_length: Maximum number of characters to return.
          start_index: On return output starting at this character index, useful if a previous fetch was truncated and more content is required.
      """
      try:
          result = await fetch_page_impl(url, mode, format, max_length, start_index)
          return result
      except Exception as e:
          logger = getLogger("scrapling_fetch_mcp")
          logger.error("DETAILED ERROR IN s_fetch_page: %s", str(e))
          logger.error("TRACEBACK: %s", format_exc())
          raise
  ```
- Core implementation of the fetching logic delegated by the handler. Fetches the page using browse_url, optionally converts it to markdown, applies truncation and pagination, and formats the output with metadata.

  ```python
  async def fetch_page_impl(
      url: str, mode: str, format: str, max_length: int, start_index: int
  ) -> str:
      page = await browse_url(url, mode)
      is_markdown = format == "markdown"
      full_content = (
          _html_to_markdown(page.html_content) if is_markdown else page.html_content
      )

      total_length = len(full_content)
      truncated_content = full_content[start_index : start_index + max_length]
      is_truncated = total_length > (start_index + max_length)

      metadata_json = _create_metadata(
          total_length, len(truncated_content), is_truncated, start_index
      )
      return f"METADATA: {metadata_json}\n\n{truncated_content}"
  ```
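To make the truncation arithmetic in the core implementation concrete, here is a small standalone sketch that reproduces the same slicing and `is_truncated` test on an in-memory string. `create_metadata_stub` is a hypothetical stand-in for `_create_metadata`, whose body is not shown in this reference; the JSON key names it emits are assumptions chosen for illustration only.

```python
import json


def create_metadata_stub(
    total_length: int, retrieved_length: int, is_truncated: bool, start_index: int
) -> str:
    # Hypothetical replacement for _create_metadata; key names are assumed.
    return json.dumps(
        {
            "total_length": total_length,
            "retrieved_length": retrieved_length,
            "is_truncated": is_truncated,
            "start_index": start_index,
        }
    )


def paginate(full_content: str, max_length: int, start_index: int) -> str:
    # Same window and truncation logic as the implementation excerpt above.
    total_length = len(full_content)
    truncated_content = full_content[start_index : start_index + max_length]
    is_truncated = total_length > (start_index + max_length)
    metadata_json = create_metadata_stub(
        total_length, len(truncated_content), is_truncated, start_index
    )
    return f"METADATA: {metadata_json}\n\n{truncated_content}"


if __name__ == "__main__":
    for start in (0, 4, 8):
        print(paginate("abcdefghij", max_length=4, start_index=start))
```

With the 10-character input and `max_length=4`, the windows starting at 0 and 4 return `abcd` and `efgh` and are flagged as truncated (10 > 4 and 10 > 8), while the window starting at 8 returns `ij` and is not (10 > 12 is false). Python slicing tolerates an end index past the string, so no explicit bounds check is needed.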