
DuckDuckGo MCP Server

fetch_content

Extract and parse webpage content from a URL for analysis or integration with language models.

Instructions

Fetch and parse content from a webpage URL.

Args:
    url: The webpage URL to fetch content from
    ctx: MCP context for logging

Input Schema

Name | Required | Description                           | Default
url  | Yes      | The webpage URL to fetch content from | (none)
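
Given a URL, the tool returns the page's visible text, whitespace-normalized and truncated to 8,000 characters. The normalization steps from the implementation below can be sketched as a standalone function (a minimal sketch; `clean_text` is a hypothetical name, not part of the server's API):

```python
import re

def clean_text(text: str, max_len: int = 8000) -> str:
    """Normalize whitespace in extracted page text and truncate long output."""
    # Strip each line, split fragments on double spaces, and drop empties
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = " ".join(chunk for chunk in chunks if chunk)
    # Collapse any remaining runs of whitespace
    text = re.sub(r"\s+", " ", text).strip()
    # Truncate if too long, matching the server's 8000-character cap
    if len(text) > max_len:
        text = text[:max_len] + "... [content truncated]"
    return text

print(clean_text("  Hello   world \n\n Example  page  "))  # → Hello world Example page
```

The double-space split mirrors a common BeautifulSoup text-cleanup recipe; the final `re.sub` makes the double-space step largely redundant but harmless.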

Implementation Reference

  • The 'fetch_content' tool handler function. Registered via @mcp.tool() decorator. Executes the tool logic by delegating to WebContentFetcher.fetch_and_parse().
    @mcp.tool()
    async def fetch_content(url: str, ctx: Context) -> str:
        """
        Fetch and parse content from a webpage URL.
    
        Args:
            url: The webpage URL to fetch content from
            ctx: MCP context for logging
        """
        return await fetcher.fetch_and_parse(url, ctx)
  • Core implementation of webpage fetching and parsing logic used by the 'fetch_content' tool. Includes rate limiting, HTML parsing with BeautifulSoup, text extraction and cleaning.
    async def fetch_and_parse(self, url: str, ctx: Context) -> str:
        """Fetch and parse content from a webpage"""
        try:
            await self.rate_limiter.acquire()
    
            await ctx.info(f"Fetching content from: {url}")
    
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    url,
                    headers={
                        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
                    },
                    follow_redirects=True,
                    timeout=30.0,
                )
                response.raise_for_status()
    
            # Parse the HTML
            soup = BeautifulSoup(response.text, "html.parser")
    
            # Remove scripts, styles, and page chrome that rarely hold content
            for element in soup(["script", "style", "nav", "header", "footer"]):
                element.decompose()
    
            # Get the text content
            text = soup.get_text()
    
            # Clean up the text
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = " ".join(chunk for chunk in chunks if chunk)
    
            # Remove extra whitespace
            text = re.sub(r"\s+", " ", text).strip()
    
            # Truncate if too long
            if len(text) > 8000:
                text = text[:8000] + "... [content truncated]"
    
            await ctx.info(
                f"Successfully fetched and parsed content ({len(text)} characters)"
            )
            return text
    
        except httpx.TimeoutException:
            await ctx.error(f"Request timed out for URL: {url}")
            return "Error: The request timed out while trying to fetch the webpage."
        except httpx.HTTPError as e:
            await ctx.error(f"HTTP error occurred while fetching {url}: {str(e)}")
            return f"Error: Could not access the webpage ({str(e)})"
        except Exception as e:
            await ctx.error(f"Error fetching content from {url}: {str(e)}")
            return f"Error: An unexpected error occurred while fetching the webpage ({str(e)})"
  • Class providing the fetch_and_parse method and rate limiting for web content fetching, instantiated globally as 'fetcher'.
    class WebContentFetcher:
        def __init__(self):
            self.rate_limiter = RateLimiter(requests_per_minute=20)
  • Global instantiation of WebContentFetcher used by the fetch_content tool.
    fetcher = WebContentFetcher()
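
The `RateLimiter` class referenced by `WebContentFetcher` is not shown on this page. A plausible sliding-window sketch consistent with the `requests_per_minute=20` usage above (an assumption about its design, not the server's actual implementation):

```python
import asyncio
import time

class RateLimiter:
    """Hypothetical sliding-window limiter: allows at most
    `requests_per_minute` acquisitions per rolling 60-second window,
    sleeping until a slot frees up otherwise."""

    def __init__(self, requests_per_minute: int = 20):
        self.requests_per_minute = requests_per_minute
        self.timestamps: list[float] = []

    async def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.requests_per_minute:
            # Sleep until the oldest request leaves the window
            await asyncio.sleep(60 - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

async def demo() -> int:
    limiter = RateLimiter(requests_per_minute=20)
    for _ in range(5):
        await limiter.acquire()
    return len(limiter.timestamps)

print(asyncio.run(demo()))  # 5 acquisitions recorded, none delayed
```

Because `fetcher` is a module-level global, the 20-requests-per-minute budget is shared across all concurrent `fetch_content` calls handled by the server process.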


MCP directory API

All information about listed MCP servers, including this one, is available via the Glama MCP directory API:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/988664li-star/duckduckgo-mcp-server'
