# crawl_recursive
Deep crawl websites by following internal links to map entire sites, find all pages, and build comprehensive indexes with configurable depth and page limits.
## Instructions
[STATELESS] Deep crawl a website following internal links. Use when: mapping entire sites, finding all pages, building comprehensive indexes. Control with max_depth (default 3) and max_pages (default 50). Note: May need JS execution for dynamic sites. Each page gets a fresh browser. For persistent operations use create_session + crawl.
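As an illustration, here is a hedged sketch of a typical argument payload for this tool. It mirrors the input schema below; the URL and patterns are made-up placeholders, not values from this repository.

```typescript
// Hypothetical crawl_recursive arguments: crawl a docs site, follow links up to
// two levels deep, stop after 30 pages, and skip direct links to PDFs.
// The URL and patterns are illustrative placeholders.
const crawlRecursiveArgs = {
  url: 'https://example.com/docs/', // required: starting point of the crawl
  max_depth: 2,                     // optional: defaults to 3 in the handler
  max_pages: 30,                    // optional: defaults to 50 in the handler
  include_pattern: '.*/docs/.*',    // optional: only crawl URLs under /docs/
  exclude_pattern: '.*\\.pdf$',     // optional: skip PDF links
};
```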
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| exclude_pattern | No | Regex to skip URLs. Example: ".*\/(login|admin).*" to avoid auth pages, ".*\.pdf$" to skip PDFs | |
| include_pattern | No | Regex to match URLs to crawl. Example: ".*\/blog\/.*" for blog posts only, ".*\.html$" for HTML pages | |
| max_depth | No | Maximum depth to follow links | 3 |
| max_pages | No | Maximum number of pages to crawl | 50 |
| url | Yes | Starting URL to crawl from | |
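In the handler (see the Implementation Reference below), both patterns are compiled with `new RegExp` and exclude_pattern is tested before include_pattern, so a URL matching both is skipped. A minimal sketch of that filtering logic, using hypothetical patterns and URLs:

```typescript
// Minimal sketch of the URL filtering applied during the crawl: exclude_pattern
// is checked first, then include_pattern. Patterns and URLs are hypothetical.
function shouldCrawl(url: string, includePattern?: string, excludePattern?: string): boolean {
  const includeRegex = includePattern ? new RegExp(includePattern) : null;
  const excludeRegex = excludePattern ? new RegExp(excludePattern) : null;
  if (excludeRegex && excludeRegex.test(url)) return false;  // explicit skips win
  if (includeRegex && !includeRegex.test(url)) return false; // must match include, if given
  return true;
}

shouldCrawl('https://example.com/blog/post-1', '.*/blog/.*');              // true
shouldCrawl('https://example.com/login', undefined, '.*/(login|admin).*'); // false: excluded
shouldCrawl('https://example.com/about', '.*/blog/.*');                    // false: include given, no match
```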
## Implementation Reference
- src/handlers/crawl-handlers.ts:196-293 (handler): The primary handler implementing recursive web crawling. It uses a BFS queue to explore internal links from the starting URL, respecting the max_depth and max_pages limits, with optional regex filtering. Each page is fetched via the /crawl endpoint, and its markdown content and internal links are extracted for further recursion.

  ```typescript
  async crawlRecursive(options: {
    url: string;
    max_depth?: number;
    max_pages?: number;
    include_pattern?: string;
    exclude_pattern?: string;
  }) {
    try {
      const startUrl = new URL(options.url);
      const visited = new Set<string>();
      const toVisit: Array<{ url: string; depth: number }> = [{ url: options.url, depth: 0 }];
      const results: Array<{ url: string; content: string; internal_links_found: number; depth: number }> = [];
      let maxDepthReached = 0;

      const includeRegex = options.include_pattern ? new RegExp(options.include_pattern) : null;
      const excludeRegex = options.exclude_pattern ? new RegExp(options.exclude_pattern) : null;
      const maxDepth = options.max_depth !== undefined ? options.max_depth : 3;
      const maxPages = options.max_pages || 50;

      while (toVisit.length > 0 && results.length < maxPages) {
        const current = toVisit.shift();
        if (!current || visited.has(current.url) || current.depth > maxDepth) {
          continue;
        }
        visited.add(current.url);

        try {
          // Check URL patterns
          if (excludeRegex && excludeRegex.test(current.url)) continue;
          if (includeRegex && !includeRegex.test(current.url)) continue;

          // Crawl the page using the crawl endpoint to get links
          const response = await this.axiosClient.post('/crawl', {
            urls: [current.url],
            crawler_config: {
              cache_mode: 'BYPASS',
            },
          });

          const crawlResults = response.data.results || [response.data];
          const result: CrawlResultItem = crawlResults[0];

          if (result && result.success) {
            const markdownContent = result.markdown?.fit_markdown || result.markdown?.raw_markdown || '';
            const internalLinksCount = result.links?.internal?.length || 0;
            maxDepthReached = Math.max(maxDepthReached, current.depth);

            results.push({
              url: current.url,
              content: markdownContent,
              internal_links_found: internalLinksCount,
              depth: current.depth,
            });

            // Add internal links to crawl queue
            if (current.depth < maxDepth && result.links?.internal) {
              for (const linkObj of result.links.internal) {
                const linkUrl = linkObj.href || linkObj;
                try {
                  const absoluteUrl = new URL(linkUrl, current.url).toString();
                  if (!visited.has(absoluteUrl) && new URL(absoluteUrl).hostname === startUrl.hostname) {
                    toVisit.push({ url: absoluteUrl, depth: current.depth + 1 });
                  }
                } catch (e) {
                  // Skip invalid URLs
                  console.debug('Invalid URL:', e);
                }
              }
            }
          }
        } catch (error) {
          // Log but continue crawling other pages
          console.error(`Failed to crawl ${current.url}:`, error instanceof Error ? error.message : error);
        }
      }

      // Prepare the output text
      let outputText = `Recursive crawl completed:\n\nPages crawled: ${results.length}\nStarting URL: ${options.url}\n`;

      if (results.length > 0) {
        outputText += `Max depth reached: ${maxDepthReached} (limit: ${maxDepth})\n\nNote: Only internal links (same domain) are followed during recursive crawling.\n\nPages found:\n${results.map((r) => `- [Depth ${r.depth}] ${r.url}\n  Content: ${r.content.length} chars\n  Internal links found: ${r.internal_links_found}`).join('\n')}`;
      } else {
        outputText += `\nNo pages could be crawled. This might be due to:\n- The starting URL returned an error\n- No internal links were found\n- All discovered links were filtered out by include/exclude patterns`;
      }

      return {
        content: [
          {
            type: 'text',
            text: outputText,
          },
        ],
      };
    } catch (error) {
      throw this.formatError(error, 'crawl recursively');
    }
  }
  ```
- Zod validation schema defining the input parameters for the crawl_recursive tool, including the required URL and the optional limits and patterns:

  ```typescript
  export const CrawlRecursiveSchema = createStatelessSchema(
    z.object({
      url: z.string().url(),
      max_depth: z.number().optional(),
      max_pages: z.number().optional(),
      include_pattern: z.string().optional(),
      exclude_pattern: z.string().optional(),
    }),
    'crawl_recursive',
  );
  ```
- src/server.ts:870-873 (registration): Registration and dispatching logic in the MCP server switch statement, validating args with CrawlRecursiveSchema and calling the crawlHandlers.crawlRecursive method:

  ```typescript
  case 'crawl_recursive':
    return await this.validateAndExecute(
      'crawl_recursive',
      args,
      CrawlRecursiveSchema,
      async (validatedArgs) => this.crawlHandlers.crawlRecursive(validatedArgs),
    );
  ```
- src/server.ts:309-343 (registration): Tool metadata registration in the MCP server's listTools response, providing the name, description, and inputSchema for the crawl_recursive tool:

  ```typescript
  {
    name: 'crawl_recursive',
    description: '[STATELESS] Deep crawl a website following internal links. Use when: mapping entire sites, finding all pages, building comprehensive indexes. Control with max_depth (default 3) and max_pages (default 50). Note: May need JS execution for dynamic sites. Each page gets a fresh browser. For persistent operations use create_session + crawl.',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'Starting URL to crawl from',
        },
        max_depth: {
          type: 'number',
          description: 'Maximum depth to follow links',
          default: 3,
        },
        max_pages: {
          type: 'number',
          description: 'Maximum number of pages to crawl',
          default: 50,
        },
        include_pattern: {
          type: 'string',
          description: 'Regex to match URLs to crawl. Example: ".*\\/blog\\/.*" for blog posts only, ".*\\.html$" for HTML pages',
        },
        exclude_pattern: {
          type: 'string',
          description: 'Regex to skip URLs. Example: ".*\\/(login|admin).*" to avoid auth pages, ".*\\.pdf$" to skip PDFs',
        },
      },
      required: ['url'],
    },
  },
  ```
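For context, here is a hedged sketch of invoking the registered tool from the TypeScript MCP SDK client. The transport command, script path, and client name are assumptions for illustration, not values taken from this repository; consult the project's README for the actual connection details.

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Sketch only: the server command and script path below are placeholders.
const transport = new StdioClientTransport({
  command: 'node',
  args: ['dist/server.js'],
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Invoke crawl_recursive with the parameters documented above; the tool returns
// a single text content block summarizing pages crawled, depths, and link counts.
const result = await client.callTool({
  name: 'crawl_recursive',
  arguments: {
    url: 'https://example.com/docs/',
    max_depth: 2,
    max_pages: 30,
  },
});

console.log(result.content);
```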