DuckDuckGo MCP Server (nickclyde)

fetch_content

Extract and parse webpage content from any URL to retrieve structured information for analysis or integration.

Instructions

Fetch and parse content from a webpage URL.

Args:
  • url: The webpage URL to fetch content from
  • ctx: MCP context for logging

Input Schema

Name | Required | Description                           | Default
url  | Yes      | The webpage URL to fetch content from |
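Assuming a standard MCP tools/call request (the exact wire format depends on the client and transport), an invocation of this tool could look like the following sketch; the request id and URL are illustrative:

```python
import json

# Hypothetical JSON-RPC payload an MCP client would send to invoke
# the fetch_content tool; only "url" is accepted as an argument.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "fetch_content",
        "arguments": {"url": "https://example.com"},
    },
}
print(json.dumps(request, indent=2))
```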

Implementation Reference

  • The handler function for the 'fetch_content' tool, registered via the @mcp.tool() decorator. The input schema is derived from the function's type annotations and docstring, and execution is delegated to WebContentFetcher.fetch_and_parse.
    @mcp.tool()
    async def fetch_content(url: str, ctx: Context) -> str:
        """
        Fetch and parse content from a webpage URL.
    
        Args:
            url: The webpage URL to fetch content from
            ctx: MCP context for logging
        """
        return await fetcher.fetch_and_parse(url, ctx)
  • The helper method that does the actual work: it fetches the page with httpx, parses the HTML with BeautifulSoup, cleans the extracted text, and handles errors. It is invoked by the fetch_content tool handler.
    async def fetch_and_parse(self, url: str, ctx: Context) -> str:
        """Fetch and parse content from a webpage"""
        try:
            await self.rate_limiter.acquire()
    
            await ctx.info(f"Fetching content from: {url}")
    
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    url,
                    headers={
                        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
                    },
                    follow_redirects=True,
                    timeout=30.0,
                )
                response.raise_for_status()
    
            # Parse the HTML
            soup = BeautifulSoup(response.text, "html.parser")
    
            # Remove script and style elements
            for element in soup(["script", "style", "nav", "header", "footer"]):
                element.decompose()
    
            # Get the text content
            text = soup.get_text()
    
            # Clean up the text
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = " ".join(chunk for chunk in chunks if chunk)
    
            # Remove extra whitespace
            text = re.sub(r"\s+", " ", text).strip()
    
            # Truncate if too long
            if len(text) > 8000:
                text = text[:8000] + "... [content truncated]"
    
            await ctx.info(
                f"Successfully fetched and parsed content ({len(text)} characters)"
            )
            return text
    
        except httpx.TimeoutException:
            await ctx.error(f"Request timed out for URL: {url}")
            return "Error: The request timed out while trying to fetch the webpage."
        except httpx.HTTPError as e:
            await ctx.error(f"HTTP error occurred while fetching {url}: {str(e)}")
            return f"Error: Could not access the webpage ({str(e)})"
        except Exception as e:
            await ctx.error(f"Error fetching content from {url}: {str(e)}")
            return f"Error: An unexpected error occurred while fetching the webpage ({str(e)})"
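The cleanup steps in the middle of fetch_and_parse (strip lines, split on double spaces, collapse whitespace, truncate) can be exercised in isolation. Below, the same sequence is extracted into a standalone helper for illustration; clean_text and max_len are names introduced here, not part of the server:

```python
import re

def clean_text(text: str, max_len: int = 8000) -> str:
    """Same cleanup pipeline as fetch_and_parse, extracted for illustration."""
    # Strip each line, then split on double spaces to separate run-together phrases
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = " ".join(chunk for chunk in chunks if chunk)
    # Collapse any remaining whitespace runs into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Truncate overly long results, matching the 8000-character cap
    if len(text) > max_len:
        text = text[:max_len] + "... [content truncated]"
    return text

print(clean_text("  Hello   world \n\n  foo  bar  "))  # prints: Hello world foo bar
```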
  • The WebContentFetcher class that provides the fetch_and_parse method and initializes a RateLimiter instance for rate limiting requests.
    class WebContentFetcher:
        def __init__(self):
            self.rate_limiter = RateLimiter(requests_per_minute=20)
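The RateLimiter class itself is not shown on this page; a minimal sliding-window implementation matching the acquire() interface used above might look like the sketch below. This is an assumption about the interface, not the server's actual code:

```python
import asyncio
import time

class RateLimiter:
    """Hypothetical sliding-window limiter sketch; the real class may differ."""

    def __init__(self, requests_per_minute: int = 20):
        self.requests_per_minute = requests_per_minute
        self.requests: list[float] = []  # timestamps of recent requests

    async def acquire(self) -> None:
        now = time.time()
        # Keep only timestamps within the last 60 seconds
        self.requests = [t for t in self.requests if now - t < 60]
        if len(self.requests) >= self.requests_per_minute:
            # Wait until the oldest request falls out of the window
            await asyncio.sleep(60 - (now - self.requests[0]))
        self.requests.append(time.time())

async def main() -> int:
    limiter = RateLimiter(requests_per_minute=20)
    for _ in range(3):
        await limiter.acquire()  # well under the limit, so no sleeping
    return len(limiter.requests)

print(asyncio.run(main()))  # prints: 3
```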

