# crawl_url
Fetch a URL and return its page text as clean output for research, quoting, or data processing.
## Instructions
Fetch a URL with crawl4ai when you need the actual page text for quoting or analysis.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | HTTP(S) URL to crawl (ideally taken from web_search output) | |
| reasoning | Yes | Why you're crawling this URL (required for analytics) | |
| max_chars | No | Trim the textual result to this many characters | `CRAWL_MAX_CHARS` |
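
The required `reasoning` string is only used for usage analytics, and `max_chars` falls back to the server-side `CRAWL_MAX_CHARS` default when omitted. For illustration, a call's arguments might look like the sketch below; the URL and reasoning text are made up:

```python
# Illustrative crawl_url arguments; the URL and reasoning string are invented
# for the example, and max_chars can be omitted to use CRAWL_MAX_CHARS.
arguments = {
    "url": "https://example.com/pricing",
    "reasoning": "Need the exact pricing text to quote in the report",
    "max_chars": 4000,
}
```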
## Implementation Reference
- `src/searxng_mcp/server.py:102-135` (handler): The main handler for the `crawl_url` MCP tool. It takes a URL, fetches cleaned markdown through the `CrawlerClient`, handles errors, clamps the output length, and tracks usage. (A sketch of the `clamp_text` helper it relies on follows this list.)

  ```python
  @mcp.tool()
  async def crawl_url(
      url: Annotated[str, "HTTP(S) URL (ideally from web_search output)"],
      reasoning: Annotated[str, "Why you're crawling this URL (required for analytics)"],
      max_chars: Annotated[int, "Trim textual result to this many characters"] = CRAWL_MAX_CHARS,
  ) -> str:
      """Fetch a URL with crawl4ai when you need the actual page text for quoting or analysis."""
      start_time = time.time()
      success = False
      error_msg = None
      result = ""
      try:
          text = await crawler_client.fetch(url, max_chars=max_chars)
          result = clamp_text(text, MAX_RESPONSE_CHARS)
          success = True
      except Exception as exc:  # noqa: BLE001
          error_msg = str(exc)
          result = f"Crawl failed for {url}: {exc}"
      finally:
          # Track usage
          response_time = (time.time() - start_time) * 1000
          tracker.track_usage(
              tool_name="crawl_url",
              reasoning=reasoning,
              parameters={"url": url, "max_chars": max_chars},
              response_time_ms=response_time,
              success=success,
              error_message=error_msg,
              response_size=len(result.encode("utf-8")),
          )
      return result
  ```
- `src/searxng_mcp/server.py:102` (registration): The `@mcp.tool()` decorator registers the `crawl_url` function as an available tool on the FastMCP server.

  ```python
  @mcp.tool()
  ```
- `src/searxng_mcp/server.py:103-107` (schema): Input schema defined via `Annotated` type hints: `url` (str, required), `reasoning` (str, required), `max_chars` (int, optional, default `CRAWL_MAX_CHARS`). Output: `str` (markdown content).

  ```python
  async def crawl_url(
      url: Annotated[str, "HTTP(S) URL (ideally from web_search output)"],
      reasoning: Annotated[str, "Why you're crawling this URL (required for analytics)"],
      max_chars: Annotated[int, "Trim textual result to this many characters"] = CRAWL_MAX_CHARS,
  ) -> str:
  ```
- `src/searxng_mcp/crawler.py:14-37` (helper): The `fetch` method of `CrawlerClient` performs the actual crawl with crawl4ai's `AsyncWebCrawler`, extracts markdown/content, and applies the character limit. It is called by the tool handler. (A standalone usage sketch follows this list.)

  ```python
  async def fetch(self, url: str, *, max_chars: int | None = None) -> str:
      """Fetch *url* and return cleaned markdown, trimmed to *max_chars*."""
      run_config = CrawlerRunConfig(cache_mode=self.cache_mode)
      async with AsyncWebCrawler() as crawler:
          result = await crawler.arun(url=url, config=run_config)
      if getattr(result, "error", None):
          raise RuntimeError(str(result.error))  # type: ignore
      text = (
          getattr(result, "markdown", None)
          or getattr(result, "content", None)
          or getattr(result, "html", None)
          or ""
      )
      text = text.strip()
      if not text:
          raise RuntimeError("Crawl completed but returned no readable content.")
      limit = max_chars or CRAWL_MAX_CHARS
      return clamp_text(text, limit)
  ```
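
Both the handler and the helper call a `clamp_text` utility that is not shown in this section. A minimal sketch consistent with how it is called might look like the following; the exact truncation behaviour (ellipsis, word boundaries, and so on) is an assumption:

```python
def clamp_text(text: str, limit: int) -> str:
    # Hypothetical sketch of the clamp_text helper referenced above: return
    # the text unchanged when it fits, otherwise cut it at `limit` characters.
    # The real implementation may trim differently (e.g. append a marker).
    if len(text) <= limit:
        return text
    return text[:limit]
```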
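
For ad-hoc testing, `CrawlerClient.fetch` can also be exercised outside the MCP server. The sketch below assumes `CrawlerClient` can be constructed without arguments and that it sets the `cache_mode` attribute used by `fetch`; only the `fetch` method itself is shown above.

```python
import asyncio

from searxng_mcp.crawler import CrawlerClient  # module path taken from the reference above


async def main() -> None:
    # Assumption: CrawlerClient() needs no constructor arguments; only its
    # fetch() method is documented in this section.
    client = CrawlerClient()
    text = await client.fetch("https://example.com", max_chars=2000)
    print(text[:500])


if __name__ == "__main__":
    asyncio.run(main())
```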