Glama
988664li-star

DuckDuckGo MCP Server

fetch_content

Extract and parse webpage content from a URL for analysis or integration with language models.

Instructions

Fetch and parse content from a webpage URL.

Args:
    url: The webpage URL to fetch content from
    ctx: MCP context for logging

Input Schema

Name    Required    Description    Default
url     Yes         (none)         (none)

Implementation Reference

  • The 'fetch_content' tool handler, registered via the @mcp.tool() decorator. It delegates all of the work to WebContentFetcher.fetch_and_parse().
    @mcp.tool()
    async def fetch_content(url: str, ctx: Context) -> str:
        """
        Fetch and parse content from a webpage URL.
    
        Args:
            url: The webpage URL to fetch content from
            ctx: MCP context for logging
        """
        return await fetcher.fetch_and_parse(url, ctx)
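  • Core fetching and parsing logic behind the 'fetch_content' tool. It waits on the rate limiter, fetches the page with httpx (redirects followed, 30-second timeout), strips non-content elements with BeautifulSoup, normalizes whitespace, and truncates the text to 8,000 characters. Failures are returned as "Error: ..." strings rather than raised.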
    async def fetch_and_parse(self, url: str, ctx: Context) -> str:
        """Fetch and parse content from a webpage"""
        try:
            await self.rate_limiter.acquire()
    
            await ctx.info(f"Fetching content from: {url}")
    
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    url,
                    headers={
                        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
                    },
                    follow_redirects=True,
                    timeout=30.0,
                )
                response.raise_for_status()
    
            # Parse the HTML
            soup = BeautifulSoup(response.text, "html.parser")
    
            # Remove script, style, and page-chrome elements (nav, header, footer)
            for element in soup(["script", "style", "nav", "header", "footer"]):
                element.decompose()
    
            # Get the text content
            text = soup.get_text()
    
            # Clean up the text
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = " ".join(chunk for chunk in chunks if chunk)
    
            # Remove extra whitespace
            text = re.sub(r"\s+", " ", text).strip()
    
            # Truncate if too long
            if len(text) > 8000:
                text = text[:8000] + "... [content truncated]"
    
            await ctx.info(
                f"Successfully fetched and parsed content ({len(text)} characters)"
            )
            return text
    
        except httpx.TimeoutException:
            await ctx.error(f"Request timed out for URL: {url}")
            return "Error: The request timed out while trying to fetch the webpage."
        except httpx.HTTPError as e:
            await ctx.error(f"HTTP error occurred while fetching {url}: {str(e)}")
            return f"Error: Could not access the webpage ({str(e)})"
        except Exception as e:
            await ctx.error(f"Error fetching content from {url}: {str(e)}")
            return f"Error: An unexpected error occurred while fetching the webpage ({str(e)})"
  • Class providing the fetch_and_parse method and rate limiting for web content fetching, instantiated globally as 'fetcher'.
    class WebContentFetcher:
        def __init__(self):
            self.rate_limiter = RateLimiter(requests_per_minute=20)
  • Global instantiation of WebContentFetcher used by the fetch_content tool.
    fetcher = WebContentFetcher()
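
The whitespace cleanup and truncation at the end of fetch_and_parse can be tried without any network access. The following is a minimal sketch of those two steps only; the function name clean_and_truncate and the two module-level constants are hypothetical names introduced here, while the regex, the 8000-character limit, and the marker text are taken from the code above.

```python
import re

# Values copied from fetch_and_parse; the constant names are hypothetical.
MAX_CHARS = 8000
TRUNCATION_MARKER = "... [content truncated]"


def clean_and_truncate(text: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse whitespace and cap the length of already-extracted page text."""
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    text = re.sub(r"\s+", " ", text).strip()

    # Cut long text and append the marker, so the result can exceed max_chars
    # by the length of the marker.
    if len(text) > max_chars:
        text = text[:max_chars] + TRUNCATION_MARKER
    return text
```

Because the marker is appended after slicing, the character count logged by fetch_and_parse for a truncated page includes the marker's length.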

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It states the tool fetches and parses content, implying network I/O and data processing, but doesn't mention error handling, rate limits, authentication needs, timeouts, or what 'parse' entails (e.g., HTML extraction, text cleaning). This leaves significant gaps for safe and effective use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
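
On the rate-limit point: the implementation reference constructs RateLimiter(requests_per_minute=20) and awaits rate_limiter.acquire() before each fetch, but the class itself is not shown. The sketch below is one way such a limiter could work, assuming a sliding window; the window_seconds parameter and the internals are assumptions made for illustration, not the server's actual code.

```python
import asyncio
import time
from collections import deque


class RateLimiter:
    """Hypothetical sliding-window limiter; the server's real class is not shown."""

    def __init__(self, requests_per_minute: int = 20, window_seconds: float = 60.0):
        self.max_requests = requests_per_minute
        self.window_seconds = window_seconds  # assumption: configurable for testing
        self._timestamps = deque()  # monotonic times of recent acquisitions

    async def acquire(self) -> None:
        """Return once a request slot is free, sleeping if the window is full."""
        while True:
            now = time.monotonic()
            # Forget acquisitions that have left the window.
            while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
                self._timestamps.popleft()
            if len(self._timestamps) < self.max_requests:
                self._timestamps.append(now)
                return
            # Window is full: sleep until the oldest entry expires, then re-check.
            await asyncio.sleep(self.window_seconds - (now - self._timestamps[0]))
```

Under this design a call does not fail when the limit is reached; it simply waits, which is a behavior a tool description could state explicitly.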

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately brief and front-loaded with the core purpose in the first sentence. The Args section is structured but includes an extraneous 'ctx' parameter not in the schema, slightly reducing efficiency. Overall, it avoids unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (network operations, parsing), lack of annotations, no output schema, and incomplete parameter documentation (schema coverage 0% with a mismatched 'ctx' mention), the description is insufficient. It doesn't cover return values, error cases, or behavioral constraints needed for reliable agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
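
Regarding return values and error cases: per the implementation reference, fetch_and_parse never raises to the caller. It returns either the cleaned text, the text cut at 8000 characters with a marker appended, or a string beginning with "Error: ". A caller could tell these apart roughly as follows; the helper name classify_result and its labels are hypothetical.

```python
# Strings copied from the implementation reference; names are hypothetical.
ERROR_PREFIX = "Error: "
TRUNCATION_MARKER = "... [content truncated]"


def classify_result(result: str) -> str:
    """Label a fetch_content return value as 'error', 'truncated', or 'ok'."""
    if result.startswith(ERROR_PREFIX):
        return "error"
    if result.endswith(TRUNCATION_MARKER):
        return "truncated"
    return "ok"
```

This is a heuristic: a page whose own text happens to start with "Error: " would be misclassified, which is one reason a structured output schema would be more reliable than string matching.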

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds basic meaning for the 'url' parameter ('The webpage URL to fetch content from'), which is helpful since schema description coverage is 0%. However, it doesn't clarify format requirements (e.g., must be HTTP/HTTPS, encoding), and the 'ctx' parameter is mentioned in the description but absent from the input schema, creating confusion without additional context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
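
To illustrate the missing format requirements: the handler passes url straight to client.get() with no validation of its own, so a malformed value only surfaces as an "Error: ..." string after the attempt. The sketch below shows the kind of pre-check a caller might run, assuming that only absolute http and https URLs are meaningful for this tool; validate_url is a hypothetical helper, not part of the server.

```python
from urllib.parse import urlsplit


def validate_url(url: str) -> str:
    """Return the trimmed URL if it is an absolute http(s) URL, else raise ValueError."""
    parts = urlsplit(url.strip())
    # Assumption: the tool is only useful for http and https pages.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://, got: {url!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    return parts.geturl()
```

A value such as "example.com/page" is rejected here because it has no scheme; whether the server should instead add a default scheme is a design choice this listing does not settle.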

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('fetch and parse') and resource ('content from a webpage URL'), making it immediately understandable. However, it doesn't differentiate from its sibling tool 'search', which might have overlapping functionality for web content retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus the sibling 'search' tool. The description mentions what it does but offers no context about appropriate use cases, prerequisites, or alternatives, leaving the agent to guess about tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/988664li-star/duckduckgo-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.