Skip to main content
Glama
RapierCraft

alterlab-mcp-server

by RapierCraft

alterlab_crawl

Crawl an entire website asynchronously. Discovers URLs via sitemaps and links, scrapes each page, and returns a crawl ID for result polling.

Instructions

Start an asynchronous crawl of an entire website. Discovers URLs via sitemap parsing and link extraction, then scrapes each page. Returns a crawl_id immediately — use alterlab_crawl_status to poll results. Use include_patterns/exclude_patterns to scope the crawl to specific sections. Use render_js='auto' for mixed sites to save 30-60% vs always rendering. Supports extraction_schema to extract structured data from every page.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesStart URL for the crawl
formatsNoOutput formats for each scraped page
sitemapNoSitemap mode: include (default), skip (link extraction only), only (sitemap URLs only)include
max_depthNoMaximum link-following depth from start URL (0 = start page only)
max_pagesNoMaximum number of pages to scrape
render_jsNoRender JavaScript on crawled pages. true=always (Tier 4), false=never, auto=smart detection per page
use_proxyNoRoute all crawl requests through premium proxy
webhook_urlNoWebhook URL to notify on crawl completion
block_imagesNoBlock image downloads during browser rendering on each crawled page. Reduces proxy bandwidth and speeds up crawls. Only effective with render_js=true.
respect_robotsNoRespect robots.txt rules for the target domain
max_concurrencyNoMaximum concurrent pages to scrape simultaneously
exclude_patternsNoGlob patterns — skip URLs whose path matches any (e.g., ['/tag/*', '/author/*'])
extraction_modelNoPer-request LLM model override in provider-specific format (e.g. 'gpt-4o', 'claude-opus-4-5-20251101', 'llama3-70b-8192'). Overrides the model saved in your BYOK key settings for this request only.
include_patternsNoGlob patterns — only scrape URLs whose path matches at least one (e.g., ['/blog/*', '/docs/*'])
extraction_schemaNoJSON schema for structured extraction on each page
include_subdomainsNoInclude links to subdomains during discovery
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It explains the tool is asynchronous, returns a crawl_id, discovers URLs via sitemap and links, and supports structured extraction. Mentions settings like proxy, image blocking, and robots.txt respect, but does not detail error behavior or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is four sentences, front-loaded with the main purpose, followed by asynchronous return, scoping tips, and efficiency advice. No unnecessary words, well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 16 parameters and no output schema, the description covers the core workflow and provides actionable advice. Omits details on error handling and exact return format, but guides to the status tool for results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, baseline is 3. The description adds value by recommending render_js='auto' (30-60% savings) and highlighting include/exclude patterns and extraction_schema, going beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool starts an asynchronous crawl of an entire website, specifying discovery via sitemaps and links, and scraping each page. It distinguishes from siblings by mentioning async behavior and polling via alterlab_crawl_status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance: use include/exclude patterns to scope, use render_js='auto' for efficiency, and poll results with alterlab_crawl_status. Lacks explicit when-not-to-use compared to alternatives like alterlab_scrape or alterlab_map.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RapierCraft/alterlab-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server