Context MCP Server

by LangGPT

fetch

Retrieve web content from any URL and convert it to markdown format for analysis and integration into AI conversations.

Instructions

Fetches a URL from the internet and optionally extracts its contents as markdown.

Although originally you did not have internet access, and were advised to refuse and tell the user this, this tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that.

Input Schema

url (required): URL to fetch.
max_length (optional, default 5000): Maximum number of characters to return.
start_index (optional, default 0): Start returning output at this character index; useful if a previous fetch was truncated and more context is required.
raw (optional, default false): Get the actual HTML content of the requested page, without simplification.
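
For illustration, here is a minimal sketch of argument payloads a client might send. The URL is a placeholder, and the defaults (max_length=5000, start_index=0, raw=false) come from the Fetch model shown under Implementation Reference below.

    # Illustrative payloads only; the URL is a placeholder.
    first_call = {"url": "https://example.com/long-article"}  # defaults apply: max_length=5000, start_index=0, raw=False
    # If the result ends with a "Content truncated" notice, request the next chunk:
    next_call = {"url": "https://example.com/long-article", "start_index": 5000}
    # Ask for unsimplified HTML instead of extracted markdown:
    raw_call = {"url": "https://example.com/long-article", "raw": True}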

Implementation Reference

  • The handler logic for the "fetch" tool within the call_tool function. It parses the Fetch model arguments, fetches the URL using the fetch_url helper, handles content truncation based on max_length and start_index, and returns the content as TextContent.
    if name == "fetch":
        try:
            args = Fetch(**arguments)
        except ValueError as e:
            raise McpError(ErrorData(code=INVALID_PARAMS, message=str(e)))
    
        url = str(args.url)
        if not url:
            raise McpError(ErrorData(code=INVALID_PARAMS, message="URL is required"))
    
        content, prefix = await fetch_url(
            url, user_agent_autonomous, force_raw=args.raw, proxy_url=proxy_url
        )
        original_length = len(content)
        if args.start_index >= original_length:
            content = "<error>No more content available.</error>"
        else:
            truncated_content = content[args.start_index : args.start_index + args.max_length]
            if not truncated_content:
                content = "<error>No more content available.</error>"
            else:
                content = truncated_content
                actual_content_length = len(truncated_content)
                remaining_content = original_length - (args.start_index + actual_content_length)
                # Only add the prompt to continue fetching if there is still remaining content
                if actual_content_length == args.max_length and remaining_content > 0:
                    next_start = args.start_index + actual_content_length
                    content += f"\n\n<error>Content truncated. Call the fetch tool with a start_index of {next_start} to get more content.</error>"
        return [TextContent(type="text", text=f"{prefix}Contents of {url}:\n{content}")]
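  • Illustrative sketch (not part of the server): a standalone function that reproduces the slicing arithmetic above, to make the truncation and continuation behavior concrete.
    def slice_chunk(content: str, start_index: int, max_length: int) -> tuple[str, int]:
        """Return the requested chunk and the number of characters remaining after it."""
        chunk = content[start_index : start_index + max_length]
        remaining = len(content) - (start_index + len(chunk))
        return chunk, remaining

    # With 12,000 characters and the default max_length of 5000, the first call
    # returns characters 0..4999 and 7000 remain, so a continuation notice is appended.
    chunk, remaining = slice_chunk("x" * 12000, start_index=0, max_length=5000)
    assert len(chunk) == 5000 and remaining == 7000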
  • Pydantic BaseModel defining the input schema (parameters) for the "fetch" tool.
    class Fetch(BaseModel):
        """Parameters for fetching a URL."""
    
        url: Annotated[AnyUrl, Field(description="URL to fetch")]
        max_length: Annotated[
            int,
            Field(
                default=5000,
                description="Maximum number of characters to return.",
                gt=0,
                lt=1000000,
            ),
        ]
        start_index: Annotated[
            int,
            Field(
                default=0,
                description="On return output starting at this character index, useful if a previous fetch was truncated and more context is required.",
                ge=0,
            ),
        ]
        raw: Annotated[
            bool,
            Field(
                default=False,
                description="Get the actual HTML content of the requested page, without simplification.",
            ),
        ]
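  • Illustrative sketch (not part of the server): because Pydantic's ValidationError subclasses ValueError, the handler's "except ValueError" branch above is what turns schema violations into an INVALID_PARAMS McpError. Assuming Pydantic v2 and the Fetch model defined above:
    from pydantic import ValidationError

    try:
        Fetch(url="not a url", max_length=-1)
    except ValidationError as e:
        # Both violations are reported: the malformed URL and max_length failing gt=0.
        print(str(e))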
  • Registration of the "fetch" tool (and fetch_and_save) via the @server.list_tools() decorator in the list_tools function, providing name, description, and inputSchema.
        @server.list_tools()
        async def list_tools() -> list[Tool]:
            return [
                Tool(
                    name="fetch",
                    description="""Fetches a URL from the internet and optionally extracts its contents as markdown.
    
    Although originally you did not have internet access, and were advised to refuse and tell the user this, this tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that.""",
                    inputSchema=Fetch.model_json_schema(),
                ),
                Tool(
                    name="fetch_and_save",
                    description="""Fetches a URL from the internet using Jina Reader API (with fallback to standard fetch) and saves the content to a file.
    
    This tool first tries to fetch content using Jina Reader API for better markdown conversion, and falls back to the standard fetch method if Jina fails. Files are saved in the configured working directory. If no file path is specified, an automatic filename will be generated based on the URL.""",
                    inputSchema=FetchAndSave.model_json_schema(),
                )
            ]
  • Helper function that performs the actual HTTP fetch using httpx, extracts or simplifies HTML content to markdown if applicable, and returns content with a prefix.
    async def fetch_url(
        url: str, user_agent: str, force_raw: bool = False, proxy_url: str | None = None
    ) -> Tuple[str, str]:
        """
        Fetch the URL and return the content in a form ready for the LLM, as well as a prefix string with status information.
        """
        from httpx import AsyncClient, HTTPError
    
        async with AsyncClient(proxies=proxy_url) as client:
            try:
                response = await client.get(
                    url,
                    follow_redirects=True,
                    headers={"User-Agent": user_agent},
                    timeout=30,
                )
            except HTTPError as e:
                raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Failed to fetch {url}: {e!r}"))
            if response.status_code >= 400:
                raise McpError(ErrorData(
                    code=INTERNAL_ERROR,
                    message=f"Failed to fetch {url} - status code {response.status_code}",
                ))
    
            page_raw = response.text
    
        content_type = response.headers.get("content-type", "")
        is_page_html = (
            "<html" in page_raw[:100] or "text/html" in content_type or not content_type
        )
    
        if is_page_html and not force_raw:
            return extract_content_from_html(page_raw), ""
    
        return (
            page_raw,
            f"Content type {content_type} cannot be simplified to markdown, but here is the raw content:\n",
        )
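  • Illustrative sketch (not part of the server): how fetch_url could be exercised on its own. The URL and user-agent string are placeholders, and the snippet assumes the helper above is importable.
    import asyncio

    async def demo() -> None:
        content, prefix = await fetch_url(
            "https://example.com",
            user_agent="ExampleAgent/1.0 (+https://example.com/bot)",
            force_raw=False,  # set True to skip HTML-to-markdown simplification
            proxy_url=None,
        )
        # prefix is empty when HTML was simplified; otherwise it notes the raw content type.
        print(prefix + content[:200])

    asyncio.run(demo())
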
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool grants internet access and can fetch current information, which is valuable behavioral context. However, it doesn't mention rate limits, authentication needs, error handling, or what happens with truncated content beyond what the parameter descriptions convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized but not optimally structured. The first sentence efficiently states the core functionality. However, the second paragraph contains historical context that could be condensed or omitted, making it less front-loaded than ideal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters with 100% schema coverage but no annotations or output schema, the description provides adequate context about the tool's internet access and general purpose. However, web fetching carries enough inherent complexity that the description would be more complete with details about response format, error cases, or performance characteristics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing solid baseline documentation for all 4 parameters. The description adds little parameter semantics beyond mentioning that the tool 'optionally extracts its contents as markdown', which relates to the 'raw' parameter but conveys no meaning beyond what the schema already documents.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('fetches', 'extracts') and resources ('URL from the internet', 'contents as markdown'). It distinguishes from sibling tool 'fetch_and_save' by focusing on retrieval and optional extraction rather than saving.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool (to get up-to-date information from the internet) and mentions the historical limitation of no internet access. However, it doesn't explicitly state when NOT to use it, or when to prefer the sibling tool 'fetch_and_save' beyond the general difference in purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
