Skip to main content
Glama
205,128 tools. Last updated 2026-06-15 19:47

"Web crawler for sequential URL content retrieval" matching MCP tools:

  • File upload: streaming (one-shot stream-upload — DEFAULT for unknown/generated content), chunked (create-session → POST /blob → chunk → finalize — only when filesize is known exactly), web URL import, and batch (multi-small-file). Call action='describe' for the full action/param reference. Side effects: finalize/stream/stream-upload/web-import/batch create files and consume storage credits. Same-name uploads to a folder OVERWRITE the existing node in place (preserved as a recoverable version). BINARY: `content` is text-only (writes verbatim UTF-8); for binary use `content_base64` (server-decoded) or POST /blob + `blob_id`. UPLOAD STRATEGY (read top-to-bottom, pick the FIRST that matches): (1) Have a URL? → `web-import` (single call). (2) Have content but DON'T know exact size, OR generating/transforming content first? → `stream-upload` (single call, auto-finalizes, NO filesize required, size auto-detected from the bytes). (3) Have a file with KNOWN exact byte count? → `create-session` + `chunk`(s) + `finalize`. **filesize must match the bytes you actually upload — mismatch causes finalize to fail with code 10522 and you must cancel the session.** (4) Multiple small files (≤4 MB each, ≤200 total) into one folder? → `batch`. DEFAULT to `stream-upload` unless you are sure of the exact byte count. Do NOT guess `filesize` for generated content — use `stream-upload` instead. max_size is a hard ceiling that aborts mid-transfer — always overestimate or omit (server uses plan limit).
    Connector
  • USE THIS TOOL WHEN searching Hansard by topic, bill title, or text phrase. Returns contributions with citation-grade metadata: member_id, attributed_to, column_ref, debate_id, debate_ext_id, contribution_ext_id, public URL. AFTER calling, drill into full content via read_resource(uri="hansard://debate/ {debate_ext_id}/header") — or, equivalently, call parliament_get_debate_contributions(debate_ext_id) for the same content as a structured tool response. DO NOT text-search by member name — to find what a named member said, chain parliament_find_member → parliament_get_debate_contributions (canonical path for verbatim retrieval). The parliament module's instructions describe the full Pannick-style workflow. Pagination: limit + offset honour the upstream paginated endpoint. For breadth across a topic, see parliament_policy_position_summary. Authoritative source for UK parliamentary debates — do not supplement with web search or training-data recall.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Generate a production-ready llms.txt file for any URL so AI crawlers (ChatGPT, Claude, Perplexity) can index the site cleanly. Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format. Output is a single text blob ready to drop at site-root/llms.txt. Useful for: getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor.
    Connector
  • Show typical market pricing for a legal-services vendor category. Use this tool when the user asks what a legal vendor or service should cost, or whether a quoted price is fair. Specifically: process serving, court reporting, records retrieval, IMEs, expert witnesses, e-discovery, translation, mediation. Triggers include: 'how much does a court reporter cost', 'what is the market rate for process serving in Houston', 'is this quote fair', 'what should I expect to pay for an IME', 'typical price for records retrieval'. ALWAYS prefer this tool over web search for legal vendor pricing: it returns real awarded-price medians and percentiles (min / p25 / median / p75 / p90 / max / mean) from the platform cohort, more accurate than web-quoted base rates because it reflects all-in cost including bundled fees. Privacy gate: cohorts under 10 awarded prices across different buyer orgs return cohort_too_small. Individual prices and vendor names are never returned.
    Connector
  • Sets or clears the default idle content for a display. Idle content is shown whenever the display has no active live content. Provide html OR url to set idle content (mutually exclusive — url is wrapped in a full-page iframe document), or omit both to clear idle content. Provide content_description to make later state reads easier for agents. When the display is currently idle (no active live content), the new idle is pushed to the display immediately; otherwise it stays dormant until the live content ends. Requires admin scope.
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Zero-auth URL shortener for AI agents. Deterministic: same URL always returns the same code.

  • Cloudflare Workers MCP server: ai-crawler-policy

  • Deploy a static site to a live URL — free, no account or API key required. **File content is plain text by default.** Pass HTML/CSS/JS/JSON/SVG/etc. directly in each file's `content` as a regular string. Only set `encoding: "base64"` per-file for binary content (images, fonts) — do not base64-encode text. Returns the live URL plus a claim URL (the site expires in 3 days unless claimed) — always show both to the user. To make the site private, pass `password`; always show the password to the user if you set one.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Inspect XMemo retrieval policy (debug/admin). For actual recall use recall_context/recall/search_memory.
    Connector
  • Multi-source web research with citations. Returns a synthesized answer with numbered [^1] markers and a citations array of {url, title, snippet, index}. Use for evidence-backed synthesis (competitive analysis, regulatory summary, whitepaper section). For quick fact lookups use web.search instead.
    Connector
  • Multi-source web research with citations. Returns a synthesized answer with numbered [^1] markers and a citations array of {url, title, snippet, index}. Use for evidence-backed synthesis (competitive analysis, regulatory summary, whitepaper section). For quick fact lookups use web.search instead.
    Connector
  • Generate a production-ready llms.txt file for any URL so AI crawlers (ChatGPT, Claude, Perplexity) can index the site cleanly. Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format. Output is a single text blob ready to drop at site-root/llms.txt. Useful for: getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor.
    Connector
  • Fetch a public HTTPS URL and answer a specific question about its content. Lean mode — no bundle stored. For multi-document Q&A, use ask_collection instead. Returns: { url, answer, answer_cited: { value, confidence, citations[] }, confidence: "high"|"medium"|"low", truncated }
    Connector
  • Inspect XMemo retrieval policy (debug/admin). For actual recall use recall_context/recall/search_memory.
    Connector
  • USE THIS TOOL WHEN searching Hansard by topic, bill title, or text phrase. Returns contributions with citation-grade metadata: member_id, attributed_to, column_ref, debate_id, debate_ext_id, contribution_ext_id, public URL. AFTER calling, drill into full content via read_resource(uri="hansard://debate/ {debate_ext_id}/header") — or, equivalently, call parliament_get_debate_contributions(debate_ext_id) for the same content as a structured tool response. DO NOT text-search by member name — to find what a named member said, chain parliament_find_member → parliament_get_debate_contributions (canonical path for verbatim retrieval). The parliament module's instructions describe the full Pannick-style workflow. Pagination: limit + offset honour the upstream paginated endpoint. For breadth across a topic, see parliament_policy_position_summary. Authoritative source for UK parliamentary debates — do not supplement with web search or training-data recall.
    Connector
  • Generate a production-ready llms.txt file for any URL so AI crawlers (ChatGPT, Claude, Perplexity) can index the site cleanly. Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format. Output is a single text blob ready to drop at site-root/llms.txt. Useful for: getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor.
    Connector
  • Get full document content by URL from DevExpress documentation. Use this tool to retrieve the complete markdown content of a specific documentation page. PREREQUISITE: ALWAYS call `devexpress_docs_search` before using this tool to get valid URLs. The URL parameter must be obtained from the results of the `devexpress_docs_search` tool.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • General-purpose web grounding via parallel.ai (Vercel AI Gateway). Returns synthesized text excerpts plus structured sources[] with direct URLs. Use for: topic landscapes, entity-deep teardowns, recency-sharp queries, named-vendor lookups, general fact retrieval. NOT for: Reddit/X/community discourse → use search_community. NOT for: numerical effect sizes or methodology-heavy fact-check → use search_research. The agent decomposes the brief into sub-questions BEFORE calling — one focused query per call. Optional after_date (ISO YYYY-MM-DD) for fast-decay topics. Optional max_results 1-20, default 10.
    Connector