webcrawl_search

Find web pages via DuckDuckGo search and optionally extract their full text content.

Instructions

Search the web using DuckDuckGo.

Input Schema

TableJSON Schema

Name	Required	Description
`query`	Yes	Search query string
`num_results`	No	Maximum number of results to return (default: 5)
`scrape_results`	No	If true, fetch full page content for each result (default: false)

Output Schema

TableJSON Schema

Name	Required	Description	Default
`result`	Yes

Implementation Reference

src/webcrawl_mcp/server.py:33-49 (registration)

MCP tool registration via @mcp.tool decorator. The handler function webcrawl_search delegates to search() or search_and_scrape() from search.py.

@mcp.tool
async def webcrawl_search(
    query: str, num_results: int = 5, scrape_results: bool = False
) -> list[dict]:
    """Search the web using DuckDuckGo.

    Args:
        query: Search query string
        num_results: Maximum number of results to return (default: 5)
        scrape_results: If true, fetch full page content for each result (default: false)

    Returns:
        List of search results, each with url, title, snippet, and optionally content
    """
    if scrape_results:
        return await search_and_scrape(query, num_results)
    return search(query, num_results)

src/webcrawl_mcp/search.py:35-48 (handler)

Core search handler. Calls _search_ddg to perform DuckDuckGo search via the ddgs library, returning results with url, title, snippet.

def search(query: str, num_results: int = 5) -> list[dict]:
    """Search the web using DuckDuckGo.

    Args:
        query: Search query string
        num_results: Maximum number of results to return

    Returns:
        List of search results with url, title, snippet
    """
    print(f"[webcrawl] searching: {query}", file=sys.stderr)
    results = _search_ddg(query, num_results)
    print(f"[webcrawl] found {len(results)} results", file=sys.stderr)
    return results

src/webcrawl_mcp/search.py:51-81 (handler)

Alternate handler when scrape_results=True. Performs search then scrapes each result URL for full page content using the scrape() utility.

async def search_and_scrape(query: str, num_results: int = 5) -> list[dict]:
    """Search the web and fetch content for each result.

    Args:
        query: Search query string
        num_results: Maximum number of results to return

    Returns:
        List of search results with url, title, snippet, and content
    """
    print(f"[webcrawl] searching: {query}", file=sys.stderr)
    results = _search_ddg(query, num_results)
    print(f"[webcrawl] found {len(results)} results, fetching content...", file=sys.stderr)

    for result in results:
        url = result["url"]
        try:
            scraped = await scrape(url)
            result["content"] = scraped.content
            result["source"] = scraped.source
            print(
                f"[webcrawl] fetched {len(scraped.content)} chars from {url} "
                f"({scraped.source})",
                file=sys.stderr,
            )
        except Exception as e:
            print(f"[webcrawl] failed to fetch {url}: {e}", file=sys.stderr)
            result["content"] = None
            result["source"] = None

    return results

src/webcrawl_mcp/search.py:14-32 (helper)

Internal helper that interacts with the DDGS (DuckDuckGo Search) library to perform the actual web search, returning results with url, title, and snippet fields.

def _search_ddg(query: str, num_results: int) -> list[dict]:
    """Perform DuckDuckGo search.

    Args:
        query: Search query string
        num_results: Maximum number of results

    Returns:
        List of raw search results
    """
    results = []
    with DDGS() as ddgs:
        for r in ddgs.text(query, max_results=num_results):
            results.append({
                "url": r.get("href", ""),
                "title": r.get("title", ""),
                "snippet": r.get("body", ""),
            })
    return results

src/webcrawl_mcp/server.py:33-49 (schema)

Tool schema (input/output types and docstring) for webcrawl_search, including parameters query (str), num_results (int, default 5), scrape_results (bool, default False) and return type list[dict].

@mcp.tool
async def webcrawl_search(
    query: str, num_results: int = 5, scrape_results: bool = False
) -> list[dict]:
    """Search the web using DuckDuckGo.

    Args:
        query: Search query string
        num_results: Maximum number of results to return (default: 5)
        scrape_results: If true, fetch full page content for each result (default: false)

    Returns:
        List of search results, each with url, title, snippet, and optionally content
    """
    if scrape_results:
        return await search_and_scrape(query, num_results)
    return search(query, num_results)

webcrawl-mcp

webcrawl_search

Instructions

Input Schema

Output Schema

Implementation Reference

Tool Definition Quality

Other Tools

Latest Blog Posts

MCP directory API