OpenChrome

crawl

Recursively crawl websites to extract content from multiple pages when the URL structure is unknown, respecting robots.txt and scope constraints.

Instructions

Recursively crawl a website via BFS. Opens pages in new tabs, extracts text and links, follows them up to max_depth. Respects robots.txt and scope constraints.

When to use: Extracting content from multiple pages of a site when the URL structure is not known in advance. When NOT to use: Use crawl_sitemap when the site has a sitemap.xml, or navigate for a single page.

Input Schema

TableJSON Schema

Name	Required	Description
`url`	Yes	Starting URL to crawl
`max_depth`	No	Maximum link-follow depth (0 = start page only). Default: 2
`max_pages`	No	Maximum number of pages to crawl. Default: 20
`cursor`	No	Opaque pagination cursor returned as nextCursor from a prior crawl call. Cursoring paginates returned pages after the crawl completes.
`scope`	No	URL glob pattern limiting which URLs to follow (e.g. "https://docs.example.com/**"). Default: same origin as start URL.
`include_patterns`	No	URL glob patterns — only follow links matching at least one
`exclude_patterns`	No	URL glob patterns — skip links matching any of these
`output_format`	No	Content format per page. "markdown-clean" uses cheerio+turndown to strip nav/footer/ads. Default: markdown
`onlyMainContent`	No	markdown-clean only: strip nav/header/footer/aside/ads. Default: true.
`includeLinks`	No	markdown-clean only: preserve <a> as markdown links. Default: true.
`content_filter`	No	markdown-clean only: deterministic fit_markdown filter. Default: none.
`return_raw`	No	markdown-clean only: include raw_markdown in each page. Default: false.
`return_fit`	No	markdown-clean only: include fit_markdown and use it as content when filtering. Default: true when filtered.
`respect_robots`	No	Whether to fetch and obey robots.txt. Default: true
`delay_ms`	No	Delay between page fetches in milliseconds. Default: 1000
`concurrency`	No	Max parallel tab fetches. Default: 3
`engine`	No	Fetch engine: "cdp" (default, opens a Chrome tab per page), "static" (Node fetch only, fails closed on insufficient pages), or "auto" (static first, fall back to CDP when static is insufficient).
`include_metrics`	No	When true, include approximate output size/token metrics in the JSON result. Default: false.
`strategy`	No	Crawl traversal strategy. Default: bfs. best_first scores discovered URLs by query/url_score and visits highest-scoring URLs first.
`query`	No	Optional query terms used by strategy=best_first URL scoring.
`url_score`	No	Optional strategy=best_first URL scoring hints: keywords, prefer_paths, exclude_paths, same_depth_bias.
`dispatcher`	No	Crawl concurrency dispatcher. Default: fixed. adaptive reduces concurrency on memory/error pressure and records origin backoff for 429/503 responses.
`dispatcher_options`	No	dispatcher=adaptive options: min_concurrency, max_concurrency, memory_pressure_mb, origin_backoff_ms, rate_limit_statuses.
`cache_mode`	No	Opt-in crawl content cache mode. Default: disabled.
`cache_ttl_ms`	No	Maximum age for enabled/read_only cache hits. Negative or omitted means no TTL expiry.
`cache_scope`	No	Cache namespace/safety scope. Default: public; session adds a session fingerprint to the key.

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the sparse annotations: opening new tabs, respecting robots.txt and scope constraints, following links up to max_depth. It does not contradict annotations. Some details like engine behavior or caching are not mentioned, but the core behavior is well-covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short paragraphs: core functionality, usage guidance, and a trailing note on scope/responsiveness. Every sentence adds value, front-loaded with the most important information. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 26 parameters and no output schema, the description provides a solid high-level understanding. It covers the main use case, BFS traversal, robots.txt handling, and links to sibling tools. Advanced features like caching or best-first strategy are not detailed, but the schema handles those. The description is sufficiently complete for an agent to decide when to invoke it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter is already described in the schema. The description adds minimal extra semantics for individual parameters, mentioning only max_depth implicitly. Baseline of 3 is appropriate since the schema carries the burden.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: recursively crawl a website via BFS, extracting text and links. It distinguishes itself from siblings like navigate (single page) and crawl_sitemap (when sitemap exists). The specific verb 'crawl' and resource 'website' with method 'BFS' make it unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides 'When to use' and 'When NOT to use' sections, with clear alternatives: use crawl_sitemap when a sitemap is available, or navigate for a single page. This helps the agent choose correctly among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

Who's Calling? MCP Hosts Are an Identity Blind Spot (And the Spec Knows It)
By Om-Shree-0709 on July 25, 2026.
mcp
Agent Identity
OAuth 2.1
Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shaun0927/openchrome'

If you have feedback or need assistance with the MCP directory API, please join our Discord server