
MCP Server for Crawl4AI

by omgwtfwow

Server Configuration

Describes the environment variables required to run the server.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| SERVER_NAME | No | Custom name for the MCP server | crawl4ai-mcp |
| SERVER_VERSION | No | Custom version for the MCP server | 1.0.0 |
| CRAWL4AI_API_KEY | No | API key for authentication, if your server requires it | |
| CRAWL4AI_BASE_URL | Yes | URL of the Crawl4AI server | http://localhost:11235 |
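
For reference, a minimal connection sketch using the official MCP TypeScript SDK, passing these variables through the transport's environment. The launch command, package name, and values are illustrative assumptions; adjust them to your installation.

```ts
// Sketch: connect to the server over stdio and set the variables above.
// The command/args are assumptions, not a documented launch method.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "npx",
  args: ["mcp-crawl4ai-ts"], // hypothetical launch command for this server
  env: {
    CRAWL4AI_BASE_URL: "http://localhost:11235", // required
    CRAWL4AI_API_KEY: "",                        // optional
    SERVER_NAME: "crawl4ai-mcp",                 // optional
    SERVER_VERSION: "1.0.0",                     // optional
  },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);
```

The tool sketches below reuse this connected `client`.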

Schema

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources

Tools

Functions exposed to the LLM to take actions

get_markdown

[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.
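
A hedged call sketch, reusing the connected `client` from the configuration example above; the `filter` and `query` argument names are assumptions inferred from the modes listed here.

```ts
// Assumed arguments: "filter" picks the mode (raw/fit/bm25/llm),
// "query" drives bm25 keyword search or llm extraction.
const md = await client.callTool({
  name: "get_markdown",
  arguments: {
    url: "https://example.com/docs",
    filter: "bm25",
    query: "installation steps",
  },
});
```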

capture_screenshot

[STATELESS] Capture webpage screenshot. Returns base64-encoded PNG data. Creates new browser each time. Optionally saves screenshot to local directory. IMPORTANT: Chained calls (execute_js then capture_screenshot) will NOT work - the screenshot won't see JS changes! For JS changes + screenshot use create_session + crawl(session_id, js_code, screenshot:true) in ONE call.
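
A sketch of decoding the returned base64 PNG to a local file. Where the image data sits in the result is an assumption; inspect the content blocks your server version actually returns.

```ts
import { writeFile } from "node:fs/promises";

const shot = await client.callTool({
  name: "capture_screenshot",
  arguments: { url: "https://example.com" },
});
// Assumption: the PNG arrives as an MCP image content block.
const content = (shot.content ?? []) as Array<{ type: string; data?: string }>;
const image = content.find((c) => c.type === "image");
if (image?.data) {
  await writeFile("page.png", Buffer.from(image.data, "base64"));
}
```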

generate_pdf

[STATELESS] Convert webpage to PDF. Returns base64-encoded PDF data. Creates new browser each time. Cannot capture form fills or JS changes. For persistent PDFs use create_session + crawl(session_id, pdf:true).

execute_js

[STATELESS] Execute JavaScript and get return values + page content. Creates new browser each time. Use for: extracting data, triggering dynamic content, checking page state. Scripts with "return" statements return actual values (strings, numbers, objects, arrays). Note: a null return comes back as {"success": true}. Return values are preserved, but page state is lost afterward. For persistent JS execution, use crawl with session_id.
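
A sketch of a script whose return value comes back in the result; the `scripts` argument name is an assumption.

```ts
// Scripts with a `return` statement send values back to the caller.
const result = await client.callTool({
  name: "execute_js",
  arguments: {
    url: "https://example.com",
    scripts: ["return document.title"], // assumed argument name
  },
});
// Per the note above, a script returning null comes back as {"success": true}.
```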

batch_crawl

[STATELESS] Crawl multiple URLs concurrently for efficiency. Use when: processing URL lists, comparing multiple pages, or bulk data extraction. Faster than sequential crawling. Max 5 concurrent by default. Each URL gets a fresh browser. Cannot maintain state between URLs. For persistent operations use create_session + crawl.
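
A sketch crawling three URLs concurrently; the `urls` and concurrency argument names are assumptions.

```ts
const batch = await client.callTool({
  name: "batch_crawl",
  arguments: {
    urls: [
      "https://example.com/a",
      "https://example.com/b",
      "https://example.com/c",
    ],
    max_concurrent: 5, // assumed name for the documented concurrency cap
  },
});
```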

smart_crawl

[STATELESS] Auto-detect and handle different content types (HTML, sitemap, RSS, text). Use when: URL type is unknown, crawling feeds/sitemaps, or want automatic format handling. Adapts strategy based on content. Creates new browser each time. For persistent operations use create_session + crawl.

get_html

[STATELESS] Get sanitized/processed HTML for inspection and automation planning. Use when: finding form fields/selectors, analyzing page structure before automation, building schemas. Returns cleaned HTML showing element names, IDs, and classes - perfect for identifying selectors for subsequent crawl operations. Commonly used before crawl to find selectors for automation. Creates new browser each time.

extract_links

[STATELESS] Extract and categorize all page links. Use when: building sitemaps, analyzing site structure, finding broken links, or discovering resources. Groups by internal/external/social/documents. Creates new browser each time. For persistent operations use create_session + crawl.

crawl_recursive

[STATELESS] Deep crawl a website following internal links. Use when: mapping entire sites, finding all pages, building comprehensive indexes. Control with max_depth (default 3) and max_pages (default 50). Note: May need JS execution for dynamic sites. Each page gets a fresh browser. For persistent operations use create_session + crawl.
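
A sketch tightening the documented limits for a small site map; the argument names mirror the defaults mentioned above but are assumptions.

```ts
const site = await client.callTool({
  name: "crawl_recursive",
  arguments: {
    url: "https://example.com",
    max_depth: 2,  // documented default: 3
    max_pages: 20, // documented default: 50
  },
});
```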

parse_sitemap

[STATELESS] Extract URLs from XML sitemaps. Use when: discovering all site pages, planning crawl strategies, or checking sitemap validity. Supports regex filtering. Try sitemap.xml or robots.txt first. Creates new browser each time.
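
A sketch using the regex filtering mentioned above; the filter argument name is an assumption.

```ts
const urls = await client.callTool({
  name: "parse_sitemap",
  arguments: {
    url: "https://example.com/sitemap.xml",
    filter: ".*/blog/.*", // assumed name for the regex filter
  },
});
```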

crawl

[SUPPORTS SESSIONS] THE ONLY TOOL WITH BROWSER PERSISTENCE

RECOMMENDED PATTERNS:

• Inspect-first workflow:

  1. get_html(url) → find selectors & verify elements exist

  2. create_session() → "session-123"

  3. crawl({url, session_id: "session-123", js_code: ["action 1"]})

  4. crawl({url: "/page2", session_id: "session-123", js_code: ["action 2"]})

• Multi-step with state:

  1. create_session() → "session-123"

  2. crawl({url, session_id: "session-123"}) → inspect current state

  3. crawl({url, session_id: "session-123", js_code: ["verified actions"]})

WITH session_id: Maintains browser state (cookies, localStorage, page) across calls.
WITHOUT session_id: Creates a fresh browser each time (like other tools).

WHEN TO USE SESSIONS vs STATELESS:

• Need state between calls? → create_session + crawl
• Just extracting data? → Use stateless tools
• Filling forms? → Inspect first, then use sessions
• Taking a screenshot after JS? → Must use crawl with a session
• Unsure if elements exist? → Always use get_html first

CRITICAL FOR js_code: Always use screenshot: true when running js_code; this avoids server serialization errors and gives visual confirmation (a full session workflow is sketched below).
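
The inspect-first pattern above, written as one hedged sketch reusing the connected `client`; the selectors and argument shapes are illustrative assumptions.

```ts
// 1. Inspect: find selectors before automating (stateless call).
const html = await client.callTool({
  name: "get_html",
  arguments: { url: "https://example.com/login" },
});

// 2. Create a persistent session.
await client.callTool({
  name: "manage_session",
  arguments: { action: "create", session_id: "session-123" },
});

// 3. Act in the session; screenshot: true is recommended with js_code.
await client.callTool({
  name: "crawl",
  arguments: {
    url: "https://example.com/login",
    session_id: "session-123",
    js_code: ["document.querySelector('#user').value = 'demo'"], // illustrative selector
    screenshot: true,
  },
});

// 4. Continue in the same browser: cookies and localStorage persist.
await client.callTool({
  name: "crawl",
  arguments: { url: "https://example.com/account", session_id: "session-123" },
});
```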

manage_session

[SESSION MANAGEMENT] Unified tool for managing browser sessions. Supports three actions:

• CREATE: Start a persistent browser session that maintains state across calls
• CLEAR: Remove a session from local tracking
• LIST: Show all active sessions with age and usage info

USAGE EXAMPLES:

  1. Create session: {action: "create", session_id: "my-session", initial_url: "https://example.com"}

  2. Clear session: {action: "clear", session_id: "my-session"}

  3. List sessions: {action: "list"}

Browser sessions maintain ALL state (cookies, localStorage, page) across multiple crawl calls. Essential for: forms, login flows, multi-step processes, maintaining state across operations.
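
The list and clear actions as a sketch, matching the usage examples above.

```ts
// Show active sessions, then remove one from local tracking.
const sessions = await client.callTool({
  name: "manage_session",
  arguments: { action: "list" },
});
await client.callTool({
  name: "manage_session",
  arguments: { action: "clear", session_id: "my-session" },
});
```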

extract_with_llm

[STATELESS] Ask questions about webpage content using AI. Returns natural language answers. Crawls fresh each time. For dynamic content or sessions, use crawl with session_id first.
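
A sketch asking a question of a page; the `query` argument name is an assumption.

```ts
const answer = await client.callTool({
  name: "extract_with_llm",
  arguments: {
    url: "https://example.com/pricing",
    query: "What does the monthly plan cost?", // assumed argument name
  },
});
```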

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.