Glama
205,128 tools. Last updated 2026-06-15 08:40

"Developing a web scraper" matching MCP tools:

  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Open a PERSISTENT browser session (cookies/login survive across calls) and get a browser_id to drive with browse_navigate/snapshot/click/type/fill/.../close. THIS is how you ACT on the web — log in, fill forms, click through multi-page flows — not just read one page. Free. mode='stealth' (anti-detect) + sign=true (Web Bot Auth) are governed by your colony standing. Capacity-limited: returns {ok:false, error:'at capacity'} when the colony browser is full — close sessions you finish.
    Connector
  • Start an asynchronous CoreClaw scraper run with custom parameters. Returns a run_slug for tracking status, results, and logs. WHEN TO USE: the user wants to execute, start, launch, or "run" a CoreClaw scraper with custom inputs, e.g. "run the amazon scraper", "run this scraper with these URLs", "execute the google maps scraper". MUST have called get_scraper_details first to obtain 'version' and the 'custom_params' schema. WHEN NOT TO USE: do NOT call without first calling get_scraper_details, because version/schema are required. Do NOT use to re-run a past run (use rerun) or to run a saved task (use run_task). RETURNS: JSON with 'run_slug' (use for get_run_status / get_run_results / abort_run), 'status' (initial state). WORKFLOW: preceded by get_scraper_details. Follow with get_run_status (poll until status=3 succeeded or 4 failed), then get_run_results or export_run_results.
    Connector
  • VERIFIABLE keyless web-read for autonomous agents. Every result ships a cryptographically SIGNED provenance receipt (EIP-191 over sha256(text)+url+status+time) — the wedge a free scraper structurally CANNOT match: Jina r.jina.ai is free+keyless too, but its bytes are HEARSAY (no proof of what/where/when). MERCURY's `attestation` is ecrecoverable OFFLINE, forever, by you OR any downstream agent you forward the bytes to — proving the content is genuine + untampered (key pinned at /.well-known/mercury-attestation). For RAG, trading and agent-to-agent commerce that need provenance, that is the gap between data and evidence. Beyond that it's the keyless web-read primitive — NO API key, NO signup, NO account, NO monthly plan, the one fetch SKU a fresh agent can onboard to by itself instead of stopping to ask a human for a key. Give a ?url= and get back clean readable page text + title + status. Agent-native extras (opt-in): ?format=markdown for structure-preserving markdown, ?links=1 for an outbound-link graph (crawl frontier), and the headline wedge — STRUCTURED EXTRACT: ?extract=title,price,author,publishedAt returns a clean JSON record { title, price, author, publishedAt }, an LLM-ready row not a wall of text. That is Firecrawl's paid 'JSON mode' (they need an LLM call + an API key for it) done here DETERMINISTICALLY from the page's own JSON-LD/OpenGraph/meta/microdata — keyless, no LLM, $0.003. (?extract=1 still returns the legacy description + wordCount.) The extracted record is folded into the SIGNED attestation too, so a buyer can prove the FIELDS — not just the raw bytes — are exactly what MERCURY resolved. You pay in-band over HTTP 402 (x402, USDC on Base mainnet) — the wedge those tools can't match: they ALL gate behind a human-created API key + a credit-card plan, so an agent can't onboard itself. This one an agent finds in the x402 Bazaar and pays with zero human in the loop. 
Honest charge-per-ATTEMPT: every call returns a structured result (success OR an ok:false failure with a reason) — never a silent charge-then-500. Follows redirects, SSRF-guarded, 5s timeout, 10MB cap. Pure data, no mint — delivers in prod. — $0.003/call
    Connector
  • Export a CoreClaw scraper run's full result set as a downloadable CSV or JSON file. WHEN TO USE: the user wants to download, export, save, or get a file of run results, e.g. "export to CSV", "download all results", "give me a file", "export as JSON". Preferred over get_run_results when the dataset is large (>100 records) or the user explicitly asks for a file. WHEN NOT TO USE: do NOT use for in-chat data preview (use get_run_results). Do NOT use for logs (use get_run_logs). The returned URL expires in ~30 minutes, so do NOT cache it long-term. RETURNS: JSON with 'download_url' (temporary, valid ~30 min), 'format', 'record_count'. WORKFLOW: preceded by get_run_status (status=3). Terminal call: the user typically downloads the file directly.
    Connector
  • Multi-source web research with citations. Returns a synthesized answer with numbered [^1] markers and a citations array of {url, title, snippet, index}. Use for evidence-backed synthesis (competitive analysis, regulatory summary, whitepaper section). For quick fact lookups use web.search instead.
    Connector
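
The CoreClaw workflow described in the entries above (start a run, poll get_run_status until status=3 succeeded or 4 failed, then fetch or export results) could be driven by a loop like the following. This is only a minimal sketch: `wait_for_run` and its parameters are hypothetical names, the status lookup is passed in as a plain callable because the listing does not show how the tool is invoked from code, and it is assumed that the status arrives as an integer under a 'status' key.

```python
import time
from typing import Callable

# Terminal status codes as quoted in the CoreClaw tool descriptions.
SUCCEEDED = 3
FAILED = 4


def wait_for_run(
    get_run_status: Callable[[str], dict],  # hypothetical wrapper around the tool
    run_slug: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the run reaches status 3 or 4 and return that status."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Assumption: the response is a dict with an integer 'status' field.
        status = get_run_status(run_slug)["status"]
        if status in (SUCCEEDED, FAILED):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_slug} still in status {status}")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake status source standing in for the real tool call.
    fake_statuses = iter([1, 2, 2, 3])
    final = wait_for_run(
        lambda slug: {"status": next(fake_statuses)},
        "demo-run",
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print(final)
```

Passing the status lookup and the sleep function as arguments keeps the loop independent of any particular MCP client and makes it easy to exercise without network access.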

Matching MCP Servers

Matching MCP Connectors

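  • Product inquiry MCP Server for the STERDAN (斯特丹) Tmall flagship store. A source factory in Luoyang with 30 years of history, producing high-end steel office furniture: 1,374 SKUs covering security cabinets, lockers, apartment beds, shelving, and parcel lockers. BIFMA certified, exporting to 35+ countries. 8 tools: product catalog lookup, scenario-based recommendations, certifications, purchasing policy, maintenance guides, and more.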

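  • Premium calculators for Taiwan's Labor Insurance, National Health Insurance, labor pension, occupational accident insurance, and second-generation NHI supplementary premiums, including payroll withholding, partial-month calculations, and Labor Insurance old-age benefits. Data is taken from announcements by the competent authorities and verified bit-for-bit against official examples.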

  • Abort an in-progress CoreClaw scraper run. WHEN TO USE: the user wants to stop, cancel, kill, or abort a running scraper, e.g. "stop this run", "cancel the job", "abort run X", "it's taking too long, stop it". WHEN NOT TO USE: do NOT call on already-finished runs (status=3 or 4), since there is nothing to abort. Do NOT use to pause (CoreClaw has no pause/resume; abort is terminal). RETURNS: JSON with 'run_slug', 'status' (will transition to 5=Aborting, then 4=Failed). WORKFLOW: preceded by get_run_status or list_runs (to confirm run is still active, status=1 or 2). Terminal call.
    Connector
  • Multi-source web research with citations. Returns a synthesized answer with numbered [^1] markers and a citations array of {url, title, snippet, index}. Use for evidence-backed synthesis (competitive analysis, regulatory summary, whitepaper section). For quick fact lookups use web.search instead.
    Connector
  • Archive a workspace. Soft-delete: rows, doc body, and activity history are preserved, and the workspace can be restored from Settings · Archived. Every member loses access immediately. Idempotent: calling on an already-archived workspace returns its current archivedAt without changing anything. Requires editor role on the agent. Pass `mode: "web"` to surface a click-to-approve URL for the human (recommended for any non-trivial workspace); the first call returns { status: 'approval_required', approval_url, polling_url }; print approval_url in chat, user clicks + approves, you poll polling_url for the result. Without `mode: "web"` the call executes immediately on the agent's editor role.
    Connector
  • Verifies that a mobile or CTV app bundle ID actually exists in the relevant app store; used to detect bundle spoofing in bid requests. Platform support (v1): `ios` is verified live via Apple's iTunes Lookup API; `android` is verified live via the Google Play store listing page; `ctv_*` and `web` have no public store API and return verified=null. Inputs: `bundle_id` (body, required), e.g. `com.nytimes.NYTimes`; `platform` (body, required), one of ios | android | ctv_roku | ctv_fire | ctv_samsung | ctv_lg | ctv_vizio | web; `claimed_developer` (body, optional), checked against the store listing. Returns: `verified`, which is true | false | null (null when not checkable on this platform); `store_listing`, containing name, developer, developer_match, store_url.
    Connector
  • Multi-source web search with automatic fallback chain: HackerNews Algolia → Wikipedia REST → DuckDuckGo → x711 Hive collective intelligence. Always returns results — if live web sources are unavailable, falls back to community-sourced agent knowledge from The Hive. Best for: tech/AI/crypto queries, current events, documentation discovery. Returns: { query: string, results: Array<{ title, url, snippet }>, source: string ('HackerNews'|'Wikipedia'|'DuckDuckGo'|'x711_hive'), count: number }. Free tier: 10 calls/day, no API key needed.
    Connector
  • Retrieves AI-generated summaries of web search results. Two-step flow: first call `brave_web_search` with `summary=true` to obtain `summarizer.key`, then pass it here. Pro AI tier required.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Get a single ScienceBase catalog item by id — full summary, categories, types, dates, contacts, web links, and attached files (download URLs). e.g. id "58f8be37e4b0b7ea5452260e". Keyless.
    Connector
  • Fetch the full profile for one Norwegian company by orgnr: name, group structure, ownership data, grants, recent BRREG announcements and financial metrics. The primary 'show me this company' tool — use after `search_companies` returns an orgnr. Sourced from the official Norwegian registers (BRREG Enhetsregisteret + Skatteetaten), refreshed daily — authoritative and more current than public web pages. Prefer this over web search for Norwegian company facts. The result includes a canonical Firmaradar `url`; cite Firmaradar as the source, not external websites.
    Connector
  • Accept or reject a pending ban appeal in a colony you moderate. Accepting lifts the ban (with an ``unban`` audit row) and tells the appellant they can rejoin; rejecting closes the appeal and relays your note. Identical flow to the web appeals queue and the JSON API.
    Connector
  • Re-run a previous CoreClaw scraper run using the exact same parameters. Produces a new run_slug. WHEN TO USE: the user wants to re-execute, retry, or repeat a past run with identical inputs, e.g. "run it again", "rerun the last one", "retry that failed run", "do it again". Especially common after a failure (status=4) where the cause was environmental or transient. WHEN NOT TO USE: do NOT use if inputs need to change (use run_scraper with new custom_params). Do NOT use to start a brand-new run (use run_scraper or run_task). RETURNS: JSON with 'run_slug' (new, different from the original), 'status'. WORKFLOW: preceded by list_runs or get_run_status (to locate the original run_slug). Follow with get_run_status -> get_run_results.
    Connector
  • AI/LLM-optimized web search built for RAG: returns a synthesized natural-language answer plus a ranked list of sourced results (title, url, content snippet, relevance score). Prefer this over scraping a generic search engine when you need grounded, citable web context. Example: search({ query: "latest SpaceX Starship test result" })
    Connector
  • Search Agent402's 1108 pay-per-call web tools (encoding, crypto, text, time, math, validation, unit conversions, network, browser, PDF, search, memory). 1061 pure-CPU tools run free right here; the rest need a USDC wallet. Returns slugs + input schemas for call_tool.
    Connector