# scrape_documentation
Scrape website documentation using automated sub-agents. Supports CSS selectors and URL filtering for targeted content extraction.
## Instructions
Scrape documentation from a website using intelligent sub-agents. Jobs are queued and processed automatically by the background worker. Content is extracted using plain CSS selector strings supplied via the selectors parameter.
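A minimal invocation might look like the sketch below. All values are illustrative, only url is required, and selectors is shown as an array of plain CSS selector strings on the assumption that the plural parameter name accepts a list:

```json
{
  "url": "https://docs.example.com/",
  "name": "Example Docs",
  "source_type": "reference",
  "max_pages": 200,
  "selectors": ["main article", ".content"]
}
```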
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL of the website to scrape. Must be a valid HTTP/HTTPS URL. This is the starting point for the scraping process. | |
| name | No | Optional human-readable name for this documentation source. If not provided, the hostname from the URL will be used. | |
| source_type | No | Type of documentation being scraped. Used for optimization and categorization. Choose "api" for API documentation, "guide" for tutorials/guides, "reference" for reference docs, or "tutorial" for step-by-step tutorials. | guide |
| max_pages | No | Maximum number of pages to scrape from the website. Helps prevent runaway scraping. Range: 1-1000 pages. | |
| selectors | No | CSS selectors to target specific content areas on pages. Use standard CSS selector syntax (e.g., "main article", ".content", "#documentation"). If not provided, the entire page content will be extracted. | |
| allow_patterns | No | Legacy pattern support for URL filtering; prefer the typed parameters (allow_path_segments, allow_url_contains, etc.) instead. Patterns can be glob patterns (*/docs/*), regex patterns (/api\/v[0-9]+\/.*/), or JSON objects with specific matching rules. Examples of the typed parameters appear after this table. | |
| ignore_patterns | No | Legacy pattern support for URL exclusion; prefer the typed parameters (ignore_path_segments, ignore_url_contains, etc.) instead. Patterns can be glob patterns (*/private/*), regex patterns (/login\|admin/), or JSON objects with specific matching rules. | |
| allow_path_segments | No | Array of path segments that URLs must contain to be scraped. For example, ["docs", "api"] will only scrape URLs containing /docs/ or /api/ in their path. | |
| ignore_path_segments | No | Array of path segments to exclude from scraping. For example, ["admin", "private"] will skip URLs containing /admin/ or /private/ in their path. | |
| allow_file_extensions | No | Array of file extensions to include in scraping. For example, ["html", "php"] will only scrape URLs ending with .html or .php. Do not include the dot prefix. | |
| ignore_file_extensions | No | Array of file extensions to exclude from scraping. For example, ["js", "css", "png"] will skip JavaScript, CSS, and image files. Do not include the dot prefix. | |
| allow_url_contains | No | Array of substrings that URLs must contain to be scraped. For example, ["documentation", "guide"] will only scrape URLs containing these terms anywhere in the URL. | |
| ignore_url_contains | No | Array of substrings that will exclude URLs from scraping. For example, ["login", "signup", "404"] will skip URLs containing these terms anywhere in the URL. | |
| allow_url_starts_with | No | Array of URL prefixes that must match for URLs to be scraped. For example, ["https://docs.example.com/v2/"] will only scrape URLs starting with this prefix. | |
| ignore_url_starts_with | No | Array of URL prefixes that will exclude URLs from scraping. For example, ["https://example.com/legacy/"] will skip URLs starting with this prefix. | |
| allow_version_patterns | No | Array of version patterns to include in scraping. Useful for versioned documentation. For example, to scrape only v2.x.x docs, use: [{"prefix": "https://docs.example.com/v", "major": 2}] (see the version-pattern example after this table). | |
| ignore_version_patterns | No | Array of version patterns to exclude from scraping. Useful for skipping deprecated versions. For example, to skip v1.x.x docs, use: [{"prefix": "https://docs.example.com/v", "major": 1}] | |
| allow_glob_patterns | No | Array of glob patterns for URLs to include in scraping. Supports wildcards: * (match any characters), ? (match single character), [abc] (match any character in brackets). For example, ["*/docs/*", "*/api/v*"] | |
| ignore_glob_patterns | No | Array of glob patterns for URLs to exclude from scraping. Supports wildcards: * (match any characters), ? (match single character), [abc] (match any character in brackets). For example, ["*/private/*", "*/admin/*"] | |
| allow_regex_patterns | No | Array of regular expressions for URLs to include in scraping. Use standard regex syntax. For example, ["/api/v[0-9]+/", "/docs/[a-z]+/"] will match versioned API paths and alphabetic doc paths. | |
| ignore_regex_patterns | No | Array of regular expressions for URLs to exclude from scraping. Use standard regex syntax. For example, ["/login", "/admin", "/\\.(js\|css\|png\|jpg)$"] will skip login, admin, and static asset URLs. | |
| include_subdomains | No | Whether to include subdomains in the scraping process. If true, links to subdomains (e.g., api.example.com when scraping docs.example.com) will be followed. | |
| force_refresh | No | Whether to force refresh of previously scraped pages. If true, pages will be re-scraped even if they already exist in the database. | |
| agent_id | No | Optional agent ID for tracking and memory storage. If provided, scraping insights and results will be stored in the agent's memory for future reference. | |
| enable_sampling | No | Whether to enable intelligent parameter optimization through website sampling. When enabled, the scraper will analyze the website structure and optimize filtering parameters automatically. | |
| sampling_timeout | No | Timeout in milliseconds for the sampling/optimization process. Only used when enable_sampling is true. | 30000 (30 seconds) |
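## Examples

The calls below are illustrative sketches rather than verified tool output; every URL and value is hypothetical, and only url is required. This call narrows the crawl with the typed filtering parameters, keeping documentation and API paths while skipping auth pages and static assets:

```json
{
  "url": "https://docs.example.com/",
  "allow_path_segments": ["docs", "api"],
  "ignore_path_segments": ["admin", "private"],
  "ignore_file_extensions": ["js", "css", "png"],
  "ignore_url_contains": ["login", "signup", "404"]
}
```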
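The same kind of filtering can be expressed with glob or regex patterns. Note that regex strings must be JSON-escaped, so a literal dot written as \. in regex becomes \\. inside the JSON string:

```json
{
  "url": "https://example.com/",
  "allow_glob_patterns": ["*/docs/*", "*/api/v*"],
  "ignore_regex_patterns": ["/login", "/admin", "/\\.(js|css|png|jpg)$"]
}
```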
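For versioned documentation, version patterns can pin the crawl to a single major version. The {"prefix": ..., "major": ...} object shape is taken directly from the parameter descriptions above; whether other keys (for example a minor field) are accepted is not documented. The sampling fields echo the documented 30-second default:

```json
{
  "url": "https://docs.example.com/",
  "allow_version_patterns": [{"prefix": "https://docs.example.com/v", "major": 2}],
  "ignore_version_patterns": [{"prefix": "https://docs.example.com/v", "major": 1}],
  "enable_sampling": true,
  "sampling_timeout": 30000
}
```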