fetch
Convert webpage HTML content into markdown format by providing the URL. Designed for web scraping and content extraction within the DuckDuckGo Web Search MCP Server.
Instructions
Scrape the HTML content of the page and return it in markdown format using the Jina API.
Args:
url: The URL of the webpage to fetch
Returns:
text: the page content in markdown format
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | | |
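
The table describes an input with a single required field. As a rough illustration, the sketch below writes that input as a JSON Schema dictionary and checks arguments against it using only the standard library. The exact schema shape is an assumption (the page's JSON Schema view is not reproduced here), and `validate_arguments` is a hypothetical helper, not part of the server.

```python
# Assumed JSON Schema for the tool input: one required string field, `url`.
FETCH_INPUT_SCHEMA = {
    "type": "object",
    "properties": {"url": {"type": "string"}},
    "required": ["url"],
}

# Mapping from JSON Schema type names to Python types (only what is needed here).
_JSON_TYPES = {"string": str}


def validate_arguments(arguments: dict, schema: dict = FETCH_INPUT_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the arguments fit the schema."""
    problems = []
    for name in schema["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"{name} must be of type {spec['type']}")
    return problems
```

A call such as `validate_arguments({"url": "https://example.com"})` returns an empty list, while omitting `url` or passing a non-string value returns a description of the problem.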
Implementation Reference
- main.py:154-170 (handler) The handler function for the 'fetch' MCP tool. Decorated with `@mcp.tool()`, it checks that `url` is a string and delegates to the `fetch_url` helper, which retrieves the page and converts it to markdown. The check does not reject an empty string, despite the wording of the error message, and the docstring labels `url` as a search query string although the argument is the page URL.

  ```python
  @mcp.tool()
  async def fetch(url: str):
      """
      scrape the html content and return the markdown format using jina api.

      Args:
          url: The search query string

      Returns:
          text : html in markdown format
      """
      if not isinstance(url, str):
          raise ValueError("Query must be a non-empty string")

      text = await fetch_url(url)
      return text
  ```
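
The handler's check only confirms that `url` is a string. A stricter check might look like the sketch below, which also rejects empty strings and anything that is not an absolute http(s) URL. `check_url` is a hypothetical helper introduced for illustration; it does not appear in `main.py`.

```python
from urllib.parse import urlparse


def check_url(url: object) -> str:
    """Return the stripped URL, or raise ValueError if it is not a usable http(s) URL."""
    if not isinstance(url, str) or not url.strip():
        raise ValueError("url must be a non-empty string")
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    # Require an explicit scheme and host so that bare words or search
    # queries are not forwarded as if they were page addresses.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    return cleaned
```

Under this assumption the handler would call `check_url(url)` before delegating to `fetch_url`, so malformed input fails early with a message that matches the actual condition.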
- main.py:59-80 (helper) Supporting helper function used by the 'fetch' tool to asynchronously fetch webpage content. It first requests the page through the Jina AI reader endpoint (`https://r.jina.ai/`) for markdown conversion with a 15-second timeout. If that request times out, it retries with a 5-second timeout and extracts plain text with BeautifulSoup, returning "Timeout error" if the retry also times out. Because `url` is reassigned to the `r.jina.ai` address before the first request, the retry as written targets that same address rather than the original page.

  ```python
  async def fetch_url(url: str):
      jina_timeout = 15.0
      raw_html_timeout = 5.0
      url = f"https://r.jina.ai/{url}"
      async with httpx.AsyncClient() as client:
          try:
              print(f"fetching result from\n{url}")
              response = await client.get(url, timeout=jina_timeout)
              """ using jina api to convert html to markdown """
              text = response.text
              return text
          except httpx.TimeoutException:
              try:
                  print("Jina API timed out, fetching raw HTML...")
                  response = await client.get(url, timeout=raw_html_timeout)
                  """ using raw html """
                  soup = BeautifulSoup(response.text, "html.parser")
                  text = soup.get_text()
                  return text
              except httpx.TimeoutException:
                  return "Timeout error"
  ```
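
The fallback pattern described for the helper can be sketched without network access by keeping the original address separate from the reader address. This is an illustration under stated assumptions, not the server's code: `fetch_with_fallback`, the `Getter` type, and `fake_get` are hypothetical names, the built-in `TimeoutError` stands in for `httpx.TimeoutException`, and a standard-library `HTMLParser` subclass stands in for `BeautifulSoup(...).get_text()`.

```python
import asyncio
from html.parser import HTMLParser
from typing import Awaitable, Callable

JINA_PREFIX = "https://r.jina.ai/"  # reader prefix used by fetch_url

# Hypothetical stand-in for an HTTP GET: takes (url, timeout), returns the body.
Getter = Callable[[str, float], Awaitable[str]]


class _TextExtractor(HTMLParser):
    """Collects text nodes, a rough stdlib substitute for soup.get_text()."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


async def fetch_with_fallback(url: str, get: Getter,
                              jina_timeout: float = 15.0,
                              raw_html_timeout: float = 5.0) -> str:
    try:
        # First attempt: the reader endpoint, which returns markdown.
        return await get(JINA_PREFIX + url, jina_timeout)
    except TimeoutError:
        pass
    try:
        # Fallback: request the original address, not the prefixed one.
        html = await get(url, raw_html_timeout)
    except TimeoutError:
        return "Timeout error"
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.parts)


async def fake_get(url: str, timeout: float) -> str:
    """Test double: the reader endpoint times out, the origin answers."""
    if url.startswith(JINA_PREFIX):
        raise TimeoutError
    return "<p>Hello <b>world</b></p>"


print(asyncio.run(fetch_with_fallback("https://example.com", fake_get)))
# prints "Hello world"
```

Passing the getter as a parameter keeps the sketch testable offline. With `httpx`, a small wrapper around `client.get` that returns `response.text` and re-raises `httpx.TimeoutException` as `TimeoutError` would fill the same role.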