Crawl-MCP
Server Configuration
Environment variables used to configure the server. The only variable listed is optional.
| Name | Required | Description | Default |
|---|---|---|---|
| CRAWL4AI_LANG | No | The language for the interface (e.g., 'en' for English, 'ja' for Japanese) | en |
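As a minimal sketch of setting this variable, the snippet below builds a child-process environment with `CRAWL4AI_LANG` overridden. The helper name `build_server_env` is hypothetical, and the command that launches the server is not given in this listing, so it is left out.

```python
import os


def build_server_env(lang: str = "en") -> dict[str, str]:
    """Return a copy of the current environment with CRAWL4AI_LANG set.

    `lang` defaults to "en", matching the default in the table above.
    """
    env = dict(os.environ)  # copy so the parent process is left untouched
    env["CRAWL4AI_LANG"] = lang
    return env


env = build_server_env("ja")
print(env["CRAWL4AI_LANG"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` once the actual launch command is known.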
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tasks | {"list": {}, "cancel": {}, "requests": {"tools": {"call": {}}, "prompts": {"get": {}}, "resources": {"read": {}}}} |
| tools | {"listChanged": true} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| crawl_url | Extract web page content with JavaScript support. Use wait_for_js=true for SPAs. Use content_offset/content_limit to paginate the response. Use output_path to persist the full unsliced content to disk as markdown and receive a slim metadata-only response. |
| deep_crawl_site | Crawl multiple pages from a site with configurable depth. Use output_path (directory) to persist per-URL markdown files + index.json; the response is then slimmed to metadata only. |
| crawl_url_with_fallback | Crawl with fallback strategies for anti-bot sites. Use content_offset/content_limit to paginate the response. Use output_path to persist the full unsliced content to disk as markdown and receive a slim response. |
| intelligent_extract | Extract specific data from web pages using LLM. Use output_path to persist the full extraction output to disk as JSON and receive a slim response. |
| extract_entities | Extract entities (emails, phones, etc.) from web pages. Use output_path to persist the full entity extraction output to disk as JSON and receive a slim response. |
| extract_structured_data | Extract structured data using CSS selectors or LLM. Use output_path to persist the full extraction (including table_data) to disk as JSON and receive a slim response. |
| extract_youtube_transcript | Extract YouTube transcripts with timestamps. Works with public captioned videos. Supports fallback to page crawl. Use output_path to persist the full unsliced transcript to disk as markdown. |
| batch_extract_youtube_transcripts | Extract transcripts from multiple YouTube videos. Max 3 URLs per call. Supply output_path (directory) in the request to persist per-video markdown files + index.json and receive a slim response. |
| get_youtube_video_info | Get YouTube video metadata and transcript availability. Use output_path to persist the full transcript to disk as markdown and receive a slim response. |
| extract_youtube_comments | Extract YouTube video comments. Supports pagination via comment_offset. Use output_path to persist the full unsliced comment list to disk as JSON; the response is then slimmed to metadata only. |
| process_file | Convert PDF, Word, Excel, PowerPoint, ZIP to markdown. Use output_path to persist the full unsliced converted markdown to disk and receive a slim response. |
| get_supported_file_formats | Get supported file formats (PDF, Office, ZIP) and their capabilities. |
| enhanced_process_large_content | Process large content with chunking and BM25 filtering. Use output_path to persist chunks + summaries to disk as JSON and receive a slim response. |
| search_google | Search Google with genre filtering. Genres: academic, news, technical, commercial, social. Supply output_path in the request to persist the full unsliced result set to disk as JSON and receive a slim response. |
| batch_search_google | Perform multiple Google searches. Max 3 queries per call. Supply output_path in the request to persist the full result set to disk as JSON and receive a slim response. |
| search_and_crawl | Search Google and crawl top results. Combines search with full content extraction. Supply output_path (directory) in the request to persist per-page markdown (unsliced) + index.json and receive a slim response. |
| get_search_genres | Get available search genres for targeted searching. |
| batch_crawl | Crawl multiple URLs with fallback. Max 3 URLs per call. Use output_path (directory) to persist full per-URL markdown + index.json; the return shape stays a list, each success item gets an output_file key. |
| multi_url_crawl | Multi-URL crawl with pattern-based config. Max 5 URL patterns per call. Use output_path (directory) to persist full per-URL markdown + index.json; the return shape stays a list, each success item gets an output_file key. |
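The pagination and per-call limits described in the table can be sketched from the client side as follows. This is an illustration under assumptions, not the server's documented schema: the `url` argument name, the unit of `content_offset`/`content_limit`, and the helper names `tool_call_request`, `page_arguments`, and `chunk` are all hypothetical. The request envelope follows the standard MCP `tools/call` JSON-RPC 2.0 shape.

```python
import json
from typing import Any, Iterator


def tool_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def page_arguments(url: str, total_length: int, page_size: int) -> Iterator[dict[str, Any]]:
    """Yield crawl_url arguments that walk the content in fixed-size slices.

    The unit of the offset and limit is an assumption; check the tool schema.
    """
    for offset in range(0, total_length, page_size):
        yield {
            "url": url,  # assumed argument name, not confirmed by the listing
            "wait_for_js": True,
            "content_offset": offset,
            "content_limit": page_size,
        }


def chunk(items: list[str], size: int = 3) -> list[list[str]]:
    """Split a list into groups no larger than a per-call maximum (3 for batch tools)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Build one request per slice of a page.
for i, args in enumerate(page_arguments("https://example.com", 250, 100), start=1):
    print(tool_call_request(i, "crawl_url", args))
```

Calling `chunk(urls)` before `batch_crawl` keeps each call within the stated limit of 3 URLs; pass `size=5` for `multi_url_crawl` patterns.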
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/walksoda/crawl-mcp'
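The same request can be sketched in Python with only the standard library. The helper names `server_url` and `fetch_server` are hypothetical, and the assumption that the endpoint returns a JSON object is not confirmed here.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server record and decode it, assuming a JSON object response."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return json.load(resp)


print(server_url("walksoda", "crawl-mcp"))
```

Calling `fetch_server("walksoda", "crawl-mcp")` performs the network request; it is left out of the snippet so the example runs offline.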
If you have feedback or need assistance with the MCP directory API, please join our Discord server.