
Scrapling Fetch MCP

by cyberchitta

s_fetch_page

Fetch web page content with pagination support and bot-detection avoidance. Retrieve website data in HTML or markdown format with configurable modes for different complexity levels.

Instructions

Fetches a complete web page with pagination support. Retrieves content from websites with bot-detection avoidance. For best performance, start with 'basic' mode (fastest), then only escalate to 'stealth' or 'max-stealth' modes if basic mode fails. Content is returned as 'METADATA: {json}\n\n[content]' where metadata includes length information and truncation status.

Args:
    url: URL to fetch
    mode: Fetching mode (basic, stealth, or max-stealth)
    format: Output format (html or markdown)
    max_length: Maximum number of characters to return.
    start_index: Start returning output at this character index; useful if a previous fetch was truncated and more content is required.
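
A client can split the returned payload back into metadata and content. A minimal sketch, assuming illustrative metadata field names (`total_length`, `returned`, `truncated`, `start_index` are assumptions here; the actual keys come from the server's metadata builder and are not documented on this page):

```python
import json

def parse_fetch_result(result: str):
    # The response has the shape "METADATA: {json}" followed by a
    # blank line and then the content body.
    header, _, content = result.partition("\n\n")
    metadata = json.loads(header.removeprefix("METADATA: "))
    return metadata, content

# Fabricated example response; the field names are assumptions.
sample = (
    'METADATA: {"total_length": 12000, "returned": 5000,'
    ' "truncated": true, "start_index": 0}\n\n<first chunk>'
)
meta, body = parse_fetch_result(sample)

# When the metadata reports truncation, re-call the tool with
# start_index advanced past the characters already received.
next_start = meta["start_index"] + meta["returned"]
```

A follow-up call would then pass `start_index=next_start` to continue from where the previous fetch left off.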

Input Schema

Name         Required  Description  Default
url          Yes
mode         No                     basic
format       No                     markdown
max_length   No
start_index  No
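
Per the schema above, only `url` is required; the other arguments fall back to their defaults. A sketch of the arguments an MCP client might send (the URL is a placeholder):

```python
# Example arguments for s_fetch_page; defaults mirror the schema above.
args = {
    "url": "https://example.com/docs",  # required
    "mode": "basic",       # default; escalate to "stealth"/"max-stealth" on failure
    "format": "markdown",  # default; "html" also accepted
    "max_length": 5000,    # character cap on returned content
    "start_index": 0,      # resume point for paginated fetches
}
```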

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • The main handler and registration point for the 's_fetch_page' tool using @mcp.tool() decorator. Defines input parameters (schema) and delegates execution to the core implementation.
    from logging import getLogger
    from traceback import format_exc

    @mcp.tool()
    async def s_fetch_page(
        url: str,
        mode: str = "basic",
        format: str = "markdown",
        max_length: int = 5000,
        start_index: int = 0,
    ) -> str:
        """Fetches a complete web page with pagination support. Retrieves content from websites with bot-detection avoidance. For best performance, start with 'basic' mode (fastest), then only escalate to 'stealth' or 'max-stealth' modes if basic mode fails. Content is returned as 'METADATA: {json}\\n\\n[content]' where metadata includes length information and truncation status.
    
        Args:
            url: URL to fetch
            mode: Fetching mode (basic, stealth, or max-stealth)
            format: Output format (html or markdown)
            max_length: Maximum number of characters to return.
            start_index: Start returning output at this character index; useful if a previous fetch was truncated and more content is required.
        """
        try:
            result = await fetch_page_impl(url, mode, format, max_length, start_index)
            return result
        except Exception as e:
            logger = getLogger("scrapling_fetch_mcp")
            logger.error("DETAILED ERROR IN s_fetch_page: %s", str(e))
            logger.error("TRACEBACK: %s", format_exc())
            raise
  • Core implementation of the fetching logic delegated by the handler. Fetches the page using browse_url, optionally converts to markdown, applies truncation and pagination, and formats output with metadata.
    async def fetch_page_impl(
        url: str, mode: str, format: str, max_length: int, start_index: int
    ) -> str:
        page = await browse_url(url, mode)
        is_markdown = format == "markdown"
        full_content = (
            _html_to_markdown(page.html_content) if is_markdown else page.html_content
        )
    
        total_length = len(full_content)
        truncated_content = full_content[start_index : start_index + max_length]
        is_truncated = total_length > (start_index + max_length)
    
        metadata_json = _create_metadata(
            total_length, len(truncated_content), is_truncated, start_index
        )
        return f"METADATA: {metadata_json}\n\n{truncated_content}"
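
The `_create_metadata` helper is not shown in the excerpt above. A minimal sketch consistent with its call site, where the field names are assumptions rather than the actual keys, might be:

```python
import json

def _create_metadata(total_length: int, returned_length: int,
                     is_truncated: bool, start_index: int) -> str:
    # Hypothetical reconstruction: the argument order matches the call
    # site above, but the real JSON keys are not shown in the excerpt.
    return json.dumps({
        "total_length": total_length,
        "returned_length": returned_length,
        "is_truncated": is_truncated,
        "start_index": start_index,
    })

# With the slicing logic above: a 12000-character page fetched with
# max_length=5000 and start_index=0 is truncated.
meta = json.loads(_create_metadata(12000, 5000, True, 0))
```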

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: bot-detection avoidance, performance characteristics of modes, pagination support, output format structure ('METADATA: {json}\n\n[content]'), and truncation handling. It doesn't mention rate limits, authentication needs, or error conditions, but covers most essential operational aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with the core purpose, adds usage guidelines, describes output format, then details parameters in a clear 'Args:' section. Every sentence adds value, though the parameter explanations could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters with 0% schema coverage and no annotations, the description does an excellent job covering the tool's functionality. It explains purpose, usage, behavior, parameters, and output structure. The presence of an output schema means return values don't need explanation. The main gap is lack of error handling or edge case information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides meaningful context for all 5 parameters: explains what 'url' is for, defines 'mode' options and their purpose, specifies 'format' choices, clarifies 'max_length' as character limit, and describes 'start_index' for handling truncation. This adds substantial value beyond the bare schema, though some details like default values are only in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Fetches a complete web page with pagination support. Retrieves content from websites with bot-detection avoidance.' This specifies the verb ('fetches'), resource ('web page'), and key capabilities ('pagination support', 'bot-detection avoidance'). It doesn't explicitly differentiate from sibling tool 's_fetch_pattern', but the purpose is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use different modes: 'For best performance, start with 'basic' mode (fastest), then only escalate to 'stealth' or 'max-stealth' modes if basic mode fails.' This gives clear operational advice and distinguishes between modes based on performance and fallback scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
