Skip to main content
Glama
RapierCraft

alterlab-mcp-server

by RapierCraft

alterlab_crawl

Initiate an asynchronous website crawl that discovers URLs via sitemaps and link extraction, returning a crawl ID for later status polling. Supports pattern-based scoping and optional JavaScript rendering.

Instructions

Start an asynchronous crawl of an entire website. Discovers URLs via sitemap parsing and link extraction, then scrapes each page. Returns a crawl_id immediately — use alterlab_crawl_status to poll results. Use include_patterns/exclude_patterns to scope the crawl to specific sections. Use render_js='auto' for mixed sites to save 30-60% vs always rendering. Supports extraction_schema to extract structured data from every page.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesStart URL for the crawl
max_pagesNoMaximum number of pages to scrape
max_depthNoMaximum link-following depth from start URL (0 = start page only)
include_patternsNoGlob patterns — only scrape URLs whose path matches at least one (e.g., ['/blog/*', '/docs/*'])
exclude_patternsNoGlob patterns — skip URLs whose path matches any (e.g., ['/tag/*', '/author/*'])
sitemapNoSitemap mode: include (default), skip (link extraction only), only (sitemap URLs only)include
formatsNoOutput formats for each scraped page
extraction_schemaNoJSON schema for structured extraction on each page
extraction_modelNoPer-request LLM model override in provider-specific format (e.g. 'gpt-4o', 'claude-opus-4-5-20251101', 'llama3-70b-8192'). Overrides the model saved in your BYOK key settings for this request only.
render_jsNoRender JavaScript on crawled pages. true=always (Tier 4), false=never, auto=smart detection per page
use_proxyNoRoute all crawl requests through premium proxy
max_concurrencyNoMaximum concurrent pages to scrape simultaneously
respect_robotsNoRespect robots.txt rules for the target domain
include_subdomainsNoInclude links to subdomains during discovery
webhook_urlNoWebhook URL to notify on crawl completion
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses asynchronous nature, discovery methods (sitemap and links), scoping with patterns, and performance advice for render_js. Does not mention robots.txt respect or concurrency, but these are in schema parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise and front-loaded. Five sentences cover purpose, key features, and optimization tips without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (15 parameters, no output schema), the description adequately explains the async flow, how to poll results, key parameter use cases, and even provides optimization percentages. Sufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds significant value beyond the schema: concrete advice on include/exclude patterns for scoping, render_js='auto' for cost savings (30-60%), and extraction_schema for structured data extraction.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool starts an asynchronous crawl of an entire website, discovers URLs via sitemap parsing and link extraction, and returns a crawl_id immediately. It distinguishes itself by referencing the polling sibling tool (alterlab_crawl_status).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage tips: use include/exclude patterns to scope, render_js='auto' for mixed sites to save cost, and extraction_schema for structured data. Could be improved by explicitly contrasting with single-page scrape or site map tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RapierCraft/alterlab-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server