
MCP Server for Crawl4AI

by omgwtfwow

crawl_recursive

Deep crawl websites by following internal links to map entire sites, find all pages, and build comprehensive indexes with configurable depth and page limits.

Instructions

[STATELESS] Performs a deep crawl of a website by following internal links. Use when mapping entire sites, finding all pages, or building comprehensive indexes. Control the crawl with `max_depth` (default 3) and `max_pages` (default 50). Note: dynamic sites may require JS execution. Each page is crawled with a fresh browser; for persistent operations, use `create_session` followed by `crawl`.
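
As a sketch, a call that maps only the blog section of a site while skipping PDFs might pass arguments like these (the URL and patterns are illustrative, not defaults):

```json
{
  "url": "https://example.com",
  "max_depth": 2,
  "max_pages": 25,
  "include_pattern": ".*/blog/.*",
  "exclude_pattern": ".*\\.pdf$"
}
```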

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `exclude_pattern` | No | Regex to skip URLs. Example: `.*\/(login\|admin).*` to avoid auth pages, `.*\.pdf$` to skip PDFs | |
| `include_pattern` | No | Regex to match URLs to crawl. Example: `.*\/blog\/.*` for blog posts only, `.*\.html$` for HTML pages | |
| `max_depth` | No | Maximum depth to follow links | 3 |
| `max_pages` | No | Maximum number of pages to crawl | 50 |
| `url` | Yes | Starting URL to crawl from | |

Input Schema (JSON Schema)

```json
{
  "properties": {
    "exclude_pattern": {
      "description": "Regex to skip URLs. Example: \".*\\/(login|admin).*\" to avoid auth pages, \".*\\.pdf$\" to skip PDFs",
      "type": "string"
    },
    "include_pattern": {
      "description": "Regex to match URLs to crawl. Example: \".*\\/blog\\/.*\" for blog posts only, \".*\\.html$\" for HTML pages",
      "type": "string"
    },
    "max_depth": {
      "default": 3,
      "description": "Maximum depth to follow links",
      "type": "number"
    },
    "max_pages": {
      "default": 50,
      "description": "Maximum number of pages to crawl",
      "type": "number"
    },
    "url": {
      "description": "Starting URL to crawl from",
      "type": "string"
    }
  },
  "required": ["url"],
  "type": "object"
}
```
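
To illustrate how `include_pattern` and `exclude_pattern` interact, here is a minimal Python sketch of the filtering logic, using the example patterns from the schema above. The server itself is written in TypeScript, and the `should_crawl` helper is hypothetical, not the server's actual implementation; it only models the stated semantics (a URL is crawled if it matches the include pattern, when given, and does not match the exclude pattern, when given):

```python
import re


def should_crawl(url, include_pattern=None, exclude_pattern=None):
    """Decide whether a discovered URL should be crawled.

    include_pattern: URL must match it (when provided).
    exclude_pattern: URL must NOT match it (when provided).
    """
    if include_pattern and not re.search(include_pattern, url):
        return False
    if exclude_pattern and re.search(exclude_pattern, url):
        return False
    return True


# Patterns taken from the schema examples:
should_crawl("https://example.com/blog/post-1",
             include_pattern=r".*/blog/.*")          # matches include -> crawl
should_crawl("https://example.com/login",
             exclude_pattern=r".*/(login|admin).*")  # matches exclude -> skip
should_crawl("https://example.com/report.pdf",
             exclude_pattern=r".*\.pdf$")            # matches exclude -> skip
```

When both patterns are supplied, the exclude pattern wins: a URL that matches both is skipped.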

MCP directory API

We provide all the information about MCP servers via our MCP API.

```bash
curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.