DuckDuckGo MCP Server

fetch_content

Extract and parse webpage content from a URL for analysis or integration with language models.

Instructions

Fetch and parse content from a webpage URL.

Args:
    url: The webpage URL to fetch content from
    ctx: MCP context for logging

Input Schema

Name    Required    Description                              Default
url     Yes         The webpage URL to fetch content from    (none)
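Rendered as JSON Schema, the input contract above would look roughly like the following. This is a sketch inferred from the parameter table, not the server's published schema:

```python
# Hypothetical JSON Schema for the fetch_content input, inferred
# from the parameter table above (name: url, required: yes).
FETCH_CONTENT_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "url": {
            "type": "string",
            "description": "The webpage URL to fetch content from",
        }
    },
    "required": ["url"],
}

print(FETCH_CONTENT_INPUT_SCHEMA["required"])  # ['url']
```

MCP clients typically receive a schema of this shape when listing the server's tools, which is how they know that `url` is the only (and mandatory) argument.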

Implementation Reference

  • The 'fetch_content' tool handler, registered via the @mcp.tool() decorator; it delegates the tool logic to WebContentFetcher.fetch_and_parse().
    @mcp.tool()
    async def fetch_content(url: str, ctx: Context) -> str:
        """
        Fetch and parse content from a webpage URL.
    
        Args:
            url: The webpage URL to fetch content from
            ctx: MCP context for logging
        """
        return await fetcher.fetch_and_parse(url, ctx)
  • Core implementation of the webpage fetching and parsing logic used by the 'fetch_content' tool: rate limiting, HTML parsing with BeautifulSoup, and text extraction and cleanup.
    async def fetch_and_parse(self, url: str, ctx: Context) -> str:
        """Fetch and parse content from a webpage"""
        try:
            await self.rate_limiter.acquire()
    
            await ctx.info(f"Fetching content from: {url}")
    
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    url,
                    headers={
                        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
                    },
                    follow_redirects=True,
                    timeout=30.0,
                )
                response.raise_for_status()
    
            # Parse the HTML
            soup = BeautifulSoup(response.text, "html.parser")
    
            # Remove script and style elements
            for element in soup(["script", "style", "nav", "header", "footer"]):
                element.decompose()
    
            # Get the text content
            text = soup.get_text()
    
            # Clean up the text
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = " ".join(chunk for chunk in chunks if chunk)
    
            # Remove extra whitespace
            text = re.sub(r"\s+", " ", text).strip()
    
            # Truncate if too long
            if len(text) > 8000:
                text = text[:8000] + "... [content truncated]"
    
            await ctx.info(
                f"Successfully fetched and parsed content ({len(text)} characters)"
            )
            return text
    
        except httpx.TimeoutException:
            await ctx.error(f"Request timed out for URL: {url}")
            return "Error: The request timed out while trying to fetch the webpage."
        except httpx.HTTPError as e:
            await ctx.error(f"HTTP error occurred while fetching {url}: {str(e)}")
            return f"Error: Could not access the webpage ({str(e)})"
        except Exception as e:
            await ctx.error(f"Error fetching content from {url}: {str(e)}")
            return f"Error: An unexpected error occurred while fetching the webpage ({str(e)})"
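The whitespace normalization and truncation steps above can be exercised in isolation. A minimal standard-library sketch (the sample input is hypothetical; the limit mirrors the 8000-character cutoff in fetch_and_parse):

```python
import re

MAX_LEN = 8000  # same truncation limit used by fetch_and_parse

def clean_text(text: str, max_len: int = MAX_LEN) -> str:
    """Mirror the cleaning pipeline: strip each line, split on
    double spaces, re-join non-empty chunks, collapse remaining
    whitespace, then truncate with a marker if too long."""
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = " ".join(chunk for chunk in chunks if chunk)
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) > max_len:
        text = text[:max_len] + "... [content truncated]"
    return text

sample = "  Heading  \n\n  Body   text\twith   gaps  \n"
print(clean_text(sample))  # -> Heading Body text with gaps
```

The final `re.sub(r"\s+", " ", ...)` makes the earlier generator steps largely redundant, but keeping both stages matches the tool's behavior exactly.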
  • Class providing the fetch_and_parse method and rate limiting for web content fetching, instantiated globally as 'fetcher'.
    class WebContentFetcher:
        def __init__(self):
            self.rate_limiter = RateLimiter(requests_per_minute=20)
  • Global instantiation of WebContentFetcher used by the fetch_content tool.
    fetcher = WebContentFetcher()
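The RateLimiter class referenced above is not shown in this excerpt. A minimal sliding-window sketch of what `requests_per_minute=20` could mean in practice — the class body here is an assumption for illustration, not the server's actual code:

```python
import asyncio
import time

class RateLimiter:
    """Hypothetical sliding-window limiter: permits at most
    requests_per_minute acquisitions in any 60-second window."""

    def __init__(self, requests_per_minute: int = 20):
        self.requests_per_minute = requests_per_minute
        self._timestamps: list[float] = []

    async def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        self._timestamps = [t for t in self._timestamps if now - t < 60]
        if len(self._timestamps) >= self.requests_per_minute:
            # Wait until the oldest request leaves the window.
            await asyncio.sleep(60 - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())

async def demo() -> int:
    limiter = RateLimiter(requests_per_minute=20)
    for _ in range(5):
        await limiter.acquire()  # well under the limit, so no sleeping
    return len(limiter._timestamps)

print(asyncio.run(demo()))  # 5
```

Because `acquire()` is awaited at the top of fetch_and_parse, a burst of tool calls beyond the per-minute budget would queue rather than fail, which suits long-lived MCP server processes.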