306,652 tools. Last updated 2026-07-25 18:37

"Tools for simulating browser behavior to scrape web pages" matching MCP tools:

recon_match
AINumbers Fintech Intelligence Suite
Parses a camt.053.001 statement (same schema-subset checks as camt053_parse) and a counterpart expectation set (CSV, header EndToEndId,Amount,Currency,Date -- same strict RFC 4180 parser as the WORKBOOK-1 CSV tools), then runs the SAME deterministic match engine as the tools/565 browser workbench: EndToEndId exact match first (statement order), then amount+date tolerance + currency match (first unmatched expectation in CSV order). Returns matches, exceptions (unmatched entries/expectations), and a reconciliation receipt (statement digest, expectation-set digest, match-rule declaration, counts, exception digests, execution_hash) -- byte-identical to the browser tool for the same inputs. Prepare/hash/receipt only; per-exception disposition receipts are the browser workbench's interactive follow-on.
Connector
upload_media
SendIt
FOR CLAUDE DESKTOP ONLY (with filesystem access). For Claude.ai/web: Use create_upload_session instead - it provides a browser upload link. Upload local media to cloud storage, returning a public HTTPS URL. WHEN TO USE: • Instagram, LinkedIn, Threads, X: REQUIRED for local files before calling publish_content • TikTok: NOT NEEDED - pass local path directly to publish_content SUPPORTED FORMATS: • Images: jpg, png, gif, webp (max 10MB) • Videos: mp4, mov, webm (max 100MB) Returns { url: 'https://...' } for use in publish_content mediaUrl parameter.
Connector
send_html
agentView
Shows HTML content on a display: menus, dashboards, welcome pages, schedules or any custom design. slot 'live' (default) replaces the current content; slot 'idle' stores the default/fallback content shown when nothing live is active (idle requires admin scope). Always pass a short description so later content reads stay meaningful. Exactly one of html or base64_html. For external web pages use send_url; to edit current content call read_display_html first. For polished results load prompt render_premium_display_html or resource agentview://public/design-system. Requires content scope.
Connector
lion_deep_research
LION — Verified Company & Compliance Data
Company research for AI agents $0.03: web + scrape + firmographics + domain trust in one call. Alias: /api/x402/company-research. Ed25519-attested. Use before/after people enrichment. Upsell verified company file $0.95. ?q=company or domain. [x402 paid: GET /api/x402/deep-research-json price $0.03 on Base]
Connector
calls_meet_browser
DialogBrain
Attach to a Google Meet bot's live browser to diagnose and recover a bot that isn't visibly joining. Pass the meet session's call_id; returns a page_id. Then drive the bot's Meet page with the generic browser tools (browser.snapshot / browser.click / browser.take_screenshot / browser.evaluate / browser.console_messages / browser.network_requests) using that page_id — read the snapshot to see whether the bot is in the lobby, blocked, or admitted, and click guest-side controls to recover a stalled join. Note: host admission ('Admit') happens in the host's own browser and is not present on the bot's page.
Connector
authorize
attest-mcp-remote
Start the device-flow authorization to attest works in this session (up to 20 attestations, 24h). Returns a link the USER must open in a browser and approve (anti-bot check included). After the user approves, call `complete_authorization`. Not needed if the connection already carries an API key header, or for verification tools.
Connector

Matching MCP Servers

web-scrape
teslashibe
-
license
-
quality
-
maintenance
Enables scraping and fetching websites with protection handling like Cloudflare and captchas, via an MCP interface.
Last updated 2026-07-18
1
MCP Web Scrape
Web Scraping Browser Automation Search
mukul975
A
license
B
quality
D
maintenance
A comprehensive web scraping server that transforms web content into clean, agent-ready Markdown with automatic citations and efficient caching. It features a robust suite of tools for metadata extraction, sentiment analysis, SEO auditing, and security scanning while strictly adhering to robots.txt policies.
Last updated 2026-02-16
48
28
29
MIT

Matching MCP Connectors

foundrynet-scrape
x402-gated web extraction gateway. Tools: extract, extract_batch.
Tldr Pages
tldr-pages community simplified man pages (cached 24h)

emit_chaingraph_artifact
AINumbers Fintech Intelligence Suite
Makes ChainGraph tools agent-callable (ChainGraph Standard v0.1 §3.1). Mode 1 — supply pre_computed_artifact (exported from the browser tool): validates §4 schema fields, recomputes execution_hash via SHA-256 over canonical {policy_parameters, output_payload}, returns verified structuredContent. Mode 2 — supply tool_id + policy_parameters: returns an artifact template envelope and browser prefill URL so an agent can hand the user a pre-filled link; GPU sims always delegate to the browser per §9.2. Mode 3 — supply tool_id only: returns node metadata and artifact schema scaffold. Mode 4 (Compute Binding, v0.4) — supply tool_id + policy_parameters + compute:"server" (or compute:"auto" for gpu:false nodes): runs the registered kernel server-side and returns a verified v0.4 artifact with execution_hash + output_payload in one round-trip. No browser required. gpu:true nodes always delegate to browser. readOnlyHint: true. Zero PII, zero payload logging. Pair with verify_execution_hash (independent hash verification) and build_chaingraph (DAG wiring).
Connector
snapforge_signup
SnapForge
Create a free SnapForge account (100 renders/month) with just an email address and get the API key instantly. The key is bound to the current MCP session, so the screenshot/pdf/markdown tools work immediately after signup — no browser needed.
Connector
firecrawl_scrape
xpay✦ Web Scraping Collection
Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
Connector
list_tags
Cataas
List all available cat tags for filtering. Use tag names with cat_by_tag to find cats by appearance or behavior.
Connector
get_subscribe_link
FreightUtils MCP Server
Get the URL where the user can subscribe to FreightUtils Pro for higher API limits (50,000 requests/month). Use when the user asks how to upgrade or about pricing, or after any other tool errors with a 429 rate_limited body. Behavior: static local response — no API call, never rate-limited. Returns: url, tier, monthly_limit, monthly_price, currency and note under result. Hand the URL to the USER to open in a browser — agents must NOT attempt to complete the subscription themselves.
Connector
stealth_markdown_parser
NodeProxy Web Surface Markdown Parser
Hardened headless-browser fetch with full JavaScript/SPA rendering and a realistic browser profile, returning fully rendered Markdown. Best for JavaScript-heavy/SPA pages and light bot checks; not guaranteed against advanced anti-bot walls (e.g. Cloudflare/Akamai). Price: $0.05 USDC per call.
Connector
get_site_analytics
vibedeploy
Return a privacy-safe traffic summary for a site over the last `period` days (default 7): total page views, distinct-visitor count, top pages, daily counts, device/browser breakdowns, and Web Vitals averages. Never exposes raw visitor IPs or user-agents.
Connector
scrapingdog_scrape
Scrapingdog
Scrape any website through Scrapingdog's rotating proxies and return its content. Returns HTML by default, or clean markdown with format:"markdown" (ideal for feeding an LLM). Set dynamic:true to render JavaScript in a headless browser for SPAs and dynamic pages (costs 5 credits instead of 1), premium:true for hard-to-scrape sites (residential proxies, 10 credits), and country to geotarget the proxy. Example: scrapingdog_scrape({ url: "https://example.com", format: "markdown", dynamic: true, _apiKey: "your-key" })
Connector
decodo_scrape
Decodo
Scrape any web page through Decodo (formerly Smartproxy) rotating proxies and return its content. Handles anti-bot pages; use render_js:true for JavaScript-heavy sites (headless-browser rendering) and markdown:true for clean LLM-ready markdown instead of raw HTML. BYOK — _apiKey is your Decodo Web Scraping API "username:password" credentials. Example: decodo_scrape({ url: "https://example.com", render_js: true, _apiKey: "user:pass" })
Connector
camt053_parse
AINumbers Fintech Intelligence Suite
Parses a camt.053.001 bank-to-customer statement XML document with the same schema-subset structural and facet checks as the tools/565 browser reconciliation workbench (IBAN mod-97, BIC, currency, date/decimal facets), returning the extracted statement (message id, statement id, account IBAN/currency, balances, entries) on success or the structural error list on failure. Byte-identical extraction to the browser tool for the same input. Read-only parse -- feed the result to recon_match for reconciliation.
Connector
pageviews_count
convalytics
Count page views for a specific project in a time window. Page views are the automatic hits captured by the browser script tag (separate from custom events). Use this for web-traffic questions like 'how many pageviews in the last 24 hours'. Default window is the last 7 days. Pass `user` to scope to one visitor.
Connector
web_url_reader
Inferventis MCP Server
Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
Connector
web_url_reader
Inferventis — Financial Data, News & Web MCP
Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
Connector
list_pages
California Justice Watch
Return the canonical list of pages on cajusticewatch.com — slug, URL, label, and purpose. Use this when the user asks about features/pages/tools of the site, OR when you need to recommend a page, OR before saying "I do not have access to X" — the page may actually exist.
Connector