
MCP DuckDuckGo Search Plugin

get_page_content

Extract web page content including title, description, and main text from any URL for analysis and information retrieval.

Instructions

Fetch and extract content from a web page.

Returns the page title, description, and main content.

Input Schema

| Name | Required | Description                | Default |
|------|----------|----------------------------|---------|
| url  | Yes      | URL to fetch content from  |         |
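
For context, a client invokes this tool over MCP with a single `url` argument. The sketch below uses the Python MCP SDK's stdio client; the launch command `"mcp-duckduckgo"` is an assumption for illustration, not taken from this project's docs.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # The server command here is assumed, not verified against the project.
    params = StdioServerParameters(command="mcp-duckduckgo")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_page_content",
                {"url": "https://example.com"},
            )
            print(result.content)


asyncio.run(main())
```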

Implementation Reference

  • The main asynchronous handler for the 'get_page_content' tool. It fetches the page with httpx, parses it with BeautifulSoup, extracts the title, meta description, and main content via a cascade of selectors, and returns a structured result whose domain field comes from the extract_domain helper (a standalone sketch of the selector fallback follows the listing).
```python
@mcp_server.tool()
async def get_page_content(
    url: str = Field(..., description="URL to fetch content from"),
    ctx: Context = Field(default_factory=Context),
) -> Dict[str, Any]:
    """
    Fetch and extract content from a web page.

    Returns the page title, description, and main content.
    """
    logger.info("Fetching content from: %s", url)

    try:
        # Get HTTP client from context
        http_client = getattr(ctx, "http_client", None)
        if not http_client:
            http_client = httpx.AsyncClient(timeout=15.0)
            close_client = True
        else:
            close_client = False

        try:
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
            }
            response = await http_client.get(url, headers=headers, timeout=15)
            response.raise_for_status()

            soup = BeautifulSoup(response.text, "html.parser")

            # Extract title
            title = ""
            title_tag = soup.find("title")
            if title_tag:
                title = title_tag.get_text().strip()

            # Extract description from meta tags
            description = ""
            meta_desc = soup.find("meta", attrs={"name": "description"})
            if meta_desc:
                description = meta_desc.get("content", "").strip()  # type: ignore[union-attr]

            # Extract main content (try common content selectors)
            content_text = ""
            content_selectors = [
                "main article",
                "article",
                '[role="main"]',
                ".content",
                ".article-content",
                ".post-content",
                "#content",
                "#article",
                ".entry-content",
            ]

            for selector in content_selectors:
                main_content = soup.select_one(selector)
                if main_content:
                    content_text = main_content.get_text().strip()
                    break

            # If no content found, get all paragraphs
            if not content_text:
                paragraphs = soup.find_all("p")[:5]  # First 5 paragraphs
                content_text = "\n\n".join(p.get_text().strip() for p in paragraphs)

            # Clean up content (first 500 chars for preview)
            content_preview = (
                content_text[:500] + "..." if len(content_text) > 500 else content_text
            )

            return {
                "url": url,
                "title": title,
                "description": description,
                "content": content_text,
                "content_preview": content_preview,
                "domain": extract_domain(url),
                "status": "success",
            }
        finally:
            if close_client:
                await http_client.aclose()
    except Exception as e:
        logger.error("Failed to fetch content from %s: %s", url, e)
        return {
            "url": url,
            "title": "",
            "description": "",
            "content": "",
            "content_preview": f"Error: {str(e)}",
            "domain": extract_domain(url),
            "status": "error",
            "error": str(e),
        }
```
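
The selector cascade above is the core of the extraction strategy: try progressively more generic content containers, then fall back to the first five paragraphs. A minimal standalone sketch of that logic, runnable without the MCP server (the helper name extract_main_text is ours, not the project's):

```python
from bs4 import BeautifulSoup

CONTENT_SELECTORS = [
    "main article", "article", '[role="main"]',
    ".content", ".article-content", ".post-content",
    "#content", "#article", ".entry-content",
]


def extract_main_text(html: str) -> str:
    """Return text from the first matching container, else the first five <p> tags."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in CONTENT_SELECTORS:
        node = soup.select_one(selector)
        if node:
            return node.get_text().strip()
    # Fallback mirrors the handler: join the first five paragraphs.
    return "\n\n".join(p.get_text().strip() for p in soup.find_all("p")[:5])


print(extract_main_text("<article><p>Hello</p></article>"))  # -> Hello
```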
  • The tool is registered in create_mcp_server(), which calls register_search_tools(server); that function defines get_page_content and registers it with the @mcp_server.tool() decorator (a wiring sketch follows the snippet).
```python
# Register tools directly with the server instance
register_search_tools(server)
```
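
A hedged sketch of how that wiring might fit together, assuming the server is a FastMCP instance; the factory body and the import path below are illustrative, not the verbatim project source.

```python
from mcp.server.fastmcp import FastMCP

# Module path is assumed for illustration; the real project layout may differ.
from mcp_duckduckgo.tools import register_search_tools


def create_mcp_server() -> FastMCP:
    server = FastMCP("mcp-duckduckgo")
    # register_search_tools defines get_page_content and decorates it
    # with @server.tool(), making it discoverable by MCP clients.
    register_search_tools(server)
    return server
```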
  • Helper function used by get_page_content to extract the domain from the URL for the response dictionary (a short usage example follows the listing).
```python
def extract_domain(url: str) -> str:
    """
    Extract domain from URL.

    Args:
        url: URL string to extract domain from

    Returns:
        Lowercase domain name or empty string if parsing fails
    """
    try:
        parsed = urllib.parse.urlparse(url)
        return parsed.netloc.lower()
    except Exception as e:
        logger.debug("Failed to extract domain from URL %s: %s", url, e)
        return ""
```
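
Example behavior, assuming extract_domain from the snippet above. Note that urlparse's netloc keeps any explicit port, so the result is the lowercased host (plus port) rather than a registrable domain:

```python
print(extract_domain("https://Example.COM/page"))    # -> example.com
print(extract_domain("http://example.com:8080/x"))   # -> example.com:8080
print(extract_domain("not a url"))                   # -> "" (no netloc parsed)
```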
  • The input schema is defined with Pydantic Field annotations, which supply validation and parameter descriptions; the output type is Dict[str, Any].
```python
async def get_page_content(
    url: str = Field(..., description="URL to fetch content from"),
    ctx: Context = Field(default_factory=Context),
) -> Dict[str, Any]:
```
