Schema | Crawl-MCP

Server Configuration

Environment variables used to configure the server. None of them is required.

Name	Required	Description	Default
`CRAWL4AI_LANG`	No	The language for the interface (e.g., 'en' for English, 'ja' for Japanese)	en
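A minimal sketch of how a client might set this variable before launching the server. The helper name `build_server_env` is hypothetical, and since the full list of accepted language codes is not given here, the sketch does not validate the value.

```python
import os


def build_server_env(lang=None, base_env=None):
    """Return a copy of the environment with CRAWL4AI_LANG set.

    Falls back to the documented default 'en' when no language is given
    and the variable is not already present in the base environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    if lang is not None:
        env["CRAWL4AI_LANG"] = lang          # explicit choice wins
    else:
        env.setdefault("CRAWL4AI_LANG", "en")  # documented default
    return env


if __name__ == "__main__":
    print(build_server_env(lang="ja", base_env={})["CRAWL4AI_LANG"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` when the client starts the server process; the launch command itself is not specified on this page.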

Capabilities

Features and capabilities supported by this server

Capability	Details
`tasks`	{ "list": {}, "cancel": {}, "requests": { "tools": { "call": {} }, "prompts": { "get": {} }, "resources": { "read": {} } } }
`tools`	{ "listChanged": true }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{}
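The following sketch shows one way a client could query the capability object above. The `supports` helper is hypothetical, and it assumes the usual MCP convention that an empty object means "supported" while an explicit `false` means "not supported".

```python
# Capability values copied from the table above.
CAPABILITIES = {
    "tasks": {
        "list": {},
        "cancel": {},
        "requests": {
            "tools": {"call": {}},
            "prompts": {"get": {}},
            "resources": {"read": {}},
        },
    },
    "tools": {"listChanged": True},
    "prompts": {"listChanged": False},
    "resources": {"subscribe": False, "listChanged": False},
    "experimental": {},
}


def supports(capabilities, *path):
    """Walk a nested capability path; a missing key or a False leaf means unsupported."""
    node = capabilities
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not False


if __name__ == "__main__":
    print(supports(CAPABILITIES, "tools", "listChanged"))
    print(supports(CAPABILITIES, "resources", "subscribe"))
```

Read this way, the table says the server notifies clients when its tool list changes, but offers no resource subscriptions and no change notifications for prompts or resources.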

Tools

Functions exposed to the LLM to take actions

Name	Description
crawl_url	Extract web page content with JavaScript support. Use wait_for_js=true for SPAs. Use content_offset/content_limit to paginate the response. Use output_path to persist the full unsliced content to disk as markdown and receive a slim metadata-only response.
deep_crawl_site	Crawl multiple pages from a site with configurable depth. Use output_path (directory) to persist per-URL markdown files + index.json; the response is then slimmed to metadata only.
crawl_url_with_fallback	Crawl with fallback strategies for anti-bot sites. Use content_offset/content_limit to paginate the response. Use output_path to persist the full unsliced content to disk as markdown and receive a slim response.
intelligent_extract	Extract specific data from web pages using an LLM. Use output_path to persist the full extraction output to disk as JSON and receive a slim response.
extract_entities	Extract entities (emails, phones, etc.) from web pages. Use output_path to persist the full entity extraction output to disk as JSON and receive a slim response.
extract_structured_data	Extract structured data using CSS selectors or an LLM. Use output_path to persist the full extraction (including table_data) to disk as JSON and receive a slim response.
extract_youtube_transcript	Extract YouTube transcripts with timestamps. Works with public captioned videos. Supports fallback to page crawl. Use output_path to persist the full unsliced transcript to disk as markdown.
batch_extract_youtube_transcripts	Extract transcripts from multiple YouTube videos. Max 3 URLs per call. Use output_path (directory) to persist per-video markdown files + index.json and receive a slim response.
get_youtube_video_info	Get YouTube video metadata and transcript availability. Use output_path to persist the full transcript to disk as markdown and receive a slim response.
extract_youtube_comments	Extract YouTube video comments. Supports pagination via comment_offset. Use output_path to persist the full unsliced comment list to disk as JSON; the response is then slimmed to metadata only.
process_file	Convert PDF, Word, Excel, PowerPoint, and ZIP files to markdown. Use output_path to persist the full unsliced converted markdown to disk and receive a slim response.
get_supported_file_formats	Get supported file formats (PDF, Office, ZIP) and their capabilities.
enhanced_process_large_content	Process large content with chunking and BM25 filtering. Use output_path to persist chunks + summaries to disk as JSON and receive a slim response.
search_google	Search Google with genre filtering. Genres: academic, news, technical, commercial, social. Use output_path to persist the full unsliced result set to disk as JSON and receive a slim response.
batch_search_google	Perform multiple Google searches. Max 3 queries per call. Use output_path to persist the full result set to disk as JSON and receive a slim response.
search_and_crawl	Search Google and crawl top results. Combines search with full content extraction. Use output_path (directory) to persist per-page markdown (unsliced) + index.json and receive a slim response.
get_search_genres	Get available search genres for targeted searching.
batch_crawl	Crawl multiple URLs with fallback. Max 3 URLs per call. Use output_path (directory) to persist full per-URL markdown + index.json; the return shape stays a list, and each success item gets an output_file key.
multi_url_crawl	Multi-URL crawl with pattern-based config. Max 5 URL patterns per call. Use output_path (directory) to persist full per-URL markdown + index.json; the return shape stays a list, and each success item gets an output_file key.
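As a rough illustration of the pagination and batch limits described in the table, the sketch below builds MCP `tools/call` requests (JSON-RPC 2.0) without sending them. The argument names `url` and `urls` are assumptions, as are the helper names and the idea that the client already knows the total content length; only `content_offset`, `content_limit`, and the limit of 3 URLs per `batch_crawl` call come from the descriptions above.

```python
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def paged_crawl_requests(url, total_length, page_size):
    """Yield crawl_url requests that cover the content in page_size slices.

    `url` is an assumed argument name; the unit of the offsets is whatever
    the server uses for content_offset/content_limit.
    """
    for offset in range(0, total_length, page_size):
        yield tool_call(
            "crawl_url",
            {"url": url, "content_offset": offset, "content_limit": page_size},
        )


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_crawl_requests(urls, max_per_call=3):
    """Yield batch_crawl requests, each within the documented 3-URL limit.

    `urls` is an assumed argument name.
    """
    for group in chunked(list(urls), max_per_call):
        yield tool_call("batch_crawl", {"urls": group})


if __name__ == "__main__":
    for request in paged_crawl_requests("https://example.com", 2500, 1000):
        print(request["params"]["arguments"])
```

When the total length is not known in advance, a client would more likely keep requesting the next offset until a response comes back shorter than `content_limit`; alternatively, passing `output_path` avoids pagination by writing the full content to disk.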

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/walksoda/crawl-mcp'