crawl
Extract content from websites by crawling multiple pages from a starting URL, with configurable depth and page limits for structured data collection.
Instructions
Crawls a website starting from the specified URL and extracts content from multiple pages.

Args:
- url: The complete URL of the web page to start crawling from
- maxDepth: The maximum depth level for crawling linked pages
- limit: The maximum number of pages to crawl
Returns:
- Content extracted from the crawled pages in markdown and HTML format
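For context, here is a minimal client-side sketch of invoking this tool through the official MCP Python SDK. The launch command (`python main.py`) and the argument values are assumptions for illustration only, not taken from this repository.

```python
# Hypothetical sketch: call the 'crawl' tool over stdio with the MCP Python SDK.
# The server command and the argument values are assumed for illustration.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool(
                "crawl",
                arguments={"url": "https://example.com", "maxDepth": 2, "limit": 10},
            )
            # The handler returns a single string, surfaced as text content items.
            for item in result.content:
                print(item.text)


asyncio.run(main())
```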
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The complete URL of the web page to start crawling from | |
| maxDepth | Yes | The maximum depth level for crawling linked pages | |
| limit | Yes | The maximum number of pages to crawl | |
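The input schema that FastMCP generates from the handler's type hints should look roughly like the following. This is a hand-written approximation reconstructed from the signature (`url: str`, `maxDepth: int`, `limit: int`), not the exact schema emitted by the server.

```python
# Approximate JSON Schema for the tool's input, reconstructed by hand from the
# handler signature. The real generated schema may include extra metadata.
crawl_input_schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "maxDepth": {"type": "integer"},
        "limit": {"type": "integer"},
    },
    "required": ["url", "maxDepth", "limit"],
}
```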
Implementation Reference
- main.py:39-54 (registration): Registration and handler wrapper for the 'crawl' MCP tool, which delegates to the WebTools.crawl implementation.

  ```python
  @mcp.tool()
  async def crawl(url: str, maxDepth: int, limit: int) -> str:
      """Crawls a website starting from the specified URL and extracts content from multiple pages.

      Args:
      - url: The complete URL of the web page to start crawling from
      - maxDepth: The maximum depth level for crawling linked pages
      - limit: The maximum number of pages to crawl

      Returns:
      - Content extracted from the crawled pages in markdown and HTML format
      """
      try:
          crawl_results = webtools.crawl(url, maxDepth, limit)
          return crawl_results
      except Exception as e:
          return f"Error crawling pages: {str(e)}"
  ```
- tools/webtools.py:23-36 (handler): Core implementation of the crawl functionality using FirecrawlApp.crawl_url, handling parameters for limit, maxDepth, and formats.

  ```python
  def crawl(self, url: str, maxDepth: int, limit: int):
      try:
          crawl_page = self.firecrawl.crawl_url(
              url,
              params={
                  "limit": limit,
                  "maxDepth": maxDepth,
                  "scrapeOptions": {"formats": ["markdown", "html"]},
              },
              poll_interval=30,
          )
          return crawl_page
      except Exception as e:
          return f"Error crawling pages: {str(e)}"
  ```
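The handler above relies on a `self.firecrawl` client that is not shown in this excerpt. A minimal sketch of how WebTools might construct it is given below; the constructor details and the FIRECRAWL_API_KEY environment variable are assumptions for illustration, only the crawl method (shown in the excerpt above) is confirmed by the source.

```python
# Hypothetical sketch of how WebTools might initialize the Firecrawl client used
# by crawl(); the api_key handling is an assumption, not taken from the source.
import os

from firecrawl import FirecrawlApp


class WebTools:
    def __init__(self) -> None:
        self.firecrawl = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])

    # crawl() is the method shown in the excerpt above (tools/webtools.py:23-36).
```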