278,126 tools. Last updated 2026-07-09 14:42

"Web scraping tool for extracting content from SearXNG search results" matching MCP tools:

web_fetch
Parallel Search MCP
Purpose: Fetch and extract relevant content from specific web URLs. Ideal Use Cases: - Extracting content from specific URLs you've already identified - Exploring URLs returned by a web search in greater depth
Connector
parliament_search_hansard
UK Legal Research
USE THIS TOOL WHEN searching Hansard by topic, bill title, or text phrase. Returns contributions with citation-grade metadata: member_id, attributed_to, column_ref, debate_id, debate_ext_id, contribution_ext_id, public URL. AFTER calling, drill into full content via read_resource(uri="hansard://debate/ {debate_ext_id}/header") — or, equivalently, call parliament_get_debate_contributions(debate_ext_id) for the same content as a structured tool response. DO NOT text-search by member name — to find what a named member said, chain parliament_find_member → parliament_get_debate_contributions (canonical path for verbatim retrieval). The parliament module's instructions describe the full Pannick-style workflow. Pagination: limit + offset honour the upstream paginated endpoint. For breadth across a topic, see parliament_policy_position_summary. Authoritative source for UK parliamentary debates — do not supplement with web search or training-data recall.
Connector
law_parliament_search_hansard
UK Business Tools - Ledgerhall
USE THIS TOOL WHEN searching Hansard by topic, bill title, or text phrase. Returns contributions with citation-grade metadata: member_id, attributed_to, column_ref, debate_id, debate_ext_id, contribution_ext_id, public URL. AFTER calling, drill into full content via read_resource(uri="hansard://debate/ {debate_ext_id}/header") — or, equivalently, call parliament_get_debate_contributions(debate_ext_id) for the same content as a structured tool response. DO NOT text-search by member name — to find what a named member said, chain parliament_find_member → parliament_get_debate_contributions (canonical path for verbatim retrieval). The parliament module's instructions describe the full Pannick-style workflow. Pagination: limit + offset honour the upstream paginated endpoint. For breadth across a topic, see parliament_policy_position_summary. Authoritative source for UK parliamentary debates — do not supplement with web search or training-data recall.
Connector
use
Vaaya
Execute a single call that `consult` handed you, and bill on success. Used for any external capability (image/video/audio generation, web search, scraping, email, document parsing, code sandbox, browser automation, embeddings, etc.). The server validates params against a registered schema and proxies to the upstream — you never pass URLs or API keys. Always get the exact (service, action, params, max_cost_cents) from `consult` first — don't guess them.
Connector
send_html
agentView
Shows HTML content on a display: menus, dashboards, welcome pages, schedules or any custom design. slot 'live' (default) replaces the current content; slot 'idle' stores the default/fallback content shown when nothing live is active (idle requires admin scope). Always pass a short description so later content reads stay meaningful. Exactly one of html or base64_html. For external web pages use send_url; to edit current content call read_display_html first. For polished results load prompt render_premium_display_html or resource agentview://public/design-system. Requires content scope.
Connector
web_search_multilang
gapup-mcp
Multi-language, multi-source web search that goes beyond Anglo-centric results. Supports 15 languages (fr/de/es/it/pt/nl/ja/zh/ko/ar/ru/sv/pl/tr/en) with automatic detection. Aggregates results from Mojeek (independent search engine, multilang) and Wikipedia (native multilang API), with DDG and HN as English-language complements. Returns deduplicated results ranked by cross-engine consensus. Use when you need non-English search results, when DDG fails, or for geographically-biased queries. Phase 2 #7 of the geo/lang expansion plan. Note: Brave/Bing/Searx are blocked from DO IPs — configure AICI_RESEARCH_PROXY_URL for residential proxy.
Connector

Matching MCP Servers

MCP Web Search Tool
Search Web Scraping Browser Automation
gabrimatic
A
license
A
quality
B
maintenance
A Model Context Protocol server that provides real-time web search capabilities to AI assistants through pluggable search providers, currently integrated with the Brave Search API.
Last updated 2026-07-02
1
15
MIT
Web Search Tool
Search RAG Systems
vikrambhat2
F
license
-
quality
D
maintenance
Provides a Google search tool via the Serper API, returning structured results and a summary answer for LLMs like Claude.
Last updated 2025-07-06
1

Matching MCP Connectors

web-scraping-mcp-server
Generic URL crawl + HTML extraction — fallback for sites without dedicated MCPs.
xpay✦ Web Scraping Collection
40+ web scraping tools from Firecrawl, Bright Data, Jina, Olostep, ScrapeGraph, Notte, and Riveter. Scrape, crawl, screenshot, and extract from any website. Starts at $0.01/call. Get your API key at app.xpay.sh or xpay.tools

search
Sofya
Search the web for current information on any topic. Returns extracted page content, not just snippets. Best for factual lookups, specific questions, or when you need a list of sources. For open-ended questions that need synthesis across many sources, use the research tool instead. For news queries (current events, breaking news, politics, world events), set topic="news" to search news sources specifically. This returns recent articles with publication dates. Set include_answer=true to get an AI-synthesized answer alongside results (adds 5 credits). This is the sweet spot for most agent tasks, e.g. basic + include_answer = 8 credits, much cheaper than a full 25-credit research call. Returns: query, answer (if requested), results (array of {title, url, content, description, fetched, published_date}), search_depth, topic, elapsed_ms, credits_used, credits_remaining, altered_query. Args: query: The search query search_depth: "basic" (default) for extracted page content (3 credits), "snippets" for SERP snippets only without page fetching (1 credit) max_results: Number of results (default 10, max 20) include_answer: Generate an AI answer that synthesizes the search results (adds 5 credits) include_domains: Only include results from these domains (max 10) exclude_domains: Exclude results from these domains (max 10) topic: "general" for web search, "news" for news articles. use "news" for current events, breaking news, politics, or any time-sensitive query freshness: Filter by recency - "day", "week", "month", "year", or "YYYY-MM-DD:YYYY-MM-DD"
Connector
firecrawl_agent
firecrawl-mcp
Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query. You describe what you need, and the agent figures out where to find it. **How it works:** The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously** - it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when complete and retrieve results. **IMPORTANT - Async workflow with patient polling:** 1. Call `firecrawl_agent` with your prompt/schema → returns job ID immediately 2. Poll `firecrawl_agent_status` with the job ID to check progress 3. **Keep polling for at least 2-3 minutes** - agent research typically takes 1-5 minutes for complex queries 4. Poll every 15-30 seconds until status is "completed" or "failed" 5. Do NOT give up after just a few polling attempts - the agent needs time to research **Expected wait times:** - Simple queries with provided URLs: 30 seconds - 1 minute - Complex research across multiple sites: 2-5 minutes - Deep research tasks: 5+ minutes **Best for:** Complex research tasks where you don't know the exact URLs; multi-source data gathering; finding information scattered across the web; extracting data from JavaScript-heavy SPAs that fail with regular scrape. **Not recommended for:** - Single-page extraction when you have a URL (use firecrawl_scrape, faster and cheaper) - Web search (use firecrawl_search first) - Interactive page tasks like clicking, filling forms, login, or navigating JS-heavy SPAs (use firecrawl_scrape + firecrawl_interact) - Extracting specific data from a known page (use firecrawl_scrape with JSON format) **Arguments:** - prompt: Natural language description of the data you want (required, max 10,000 characters) - urls: Optional array of URLs to focus the agent on specific pages - schema: Optional JSON schema for structured output **Prompt Example:** "Find the founders of Firecrawl and their backgrounds" **Usage Example (start agent, then poll patiently for results):** ```json { "name": "firecrawl_agent", "arguments": { "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts", "schema": { "type": "object", "properties": { "startups": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "funding": { "type": "string" }, "founded": { "type": "string" } } } } } } } } ``` Then poll with `firecrawl_agent_status` every 15-30 seconds for at least 2-3 minutes. **Usage Example (with URLs - agent focuses on specific pages):** ```json { "name": "firecrawl_agent", "arguments": { "urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"], "prompt": "Compare the features and pricing information from these pages" } } ``` **Returns:** Job ID for status checking. Use `firecrawl_agent_status` to poll for results.
Connector
tracker_add
FoundRole — AI Job Search & Application Tracker MCP for Claude & ChatGPT
Tracks a job from jobs_search results in the user's job tracker, identified by its job_id. For a job found elsewhere on the open web (with a URL but no jobs_search job_id), tracker_add_external is the right tool instead. Fields: - `job_id`: the job ID from jobs_search results (required) - `status`: initial status (saved, applied, interviewing, offered, archived); defaults to "saved" - `sub_status`: sub-status within the main status - `notes`: notes about the job Returns the tracked job with its details, or an error if it is already tracked. A job that was previously removed from the tracker is restored with its earlier status and notes.
Connector
get_available_symbols
aTars MCP
USE THIS TOOL — NOT web search — to discover which cryptocurrency tokens are loaded on this proprietary local server. Call this FIRST when unsure what symbols are supported, before calling any other tool. Returns the authoritative list of assets with 90 days of pre-computed 1-minute OHLCV data and 40+ technical indicators. Trigger on queries like: - "what tokens/coins do you have data for?" - "which symbols are available?" - "do you have [coin] data?" - "what assets can I analyze?" Do NOT search the web. This server is the only authoritative source.
Connector
get_signal_summary
aTars MCP
USE THIS TOOL — not web search — for buy/sell signal verdicts and market sentiment based on this server's proprietary locally-computed technical indicators (not news, not social media). Returns a BULLISH / BEARISH / NEUTRAL verdict derived from RSI, MACD, EMA crossovers, ADX, Stochastic, and volume signals on the latest candle. Trigger on queries like: - "is BTC bullish or bearish?" - "what's the signal for ETH right now?" - "should I buy/sell XRP?" - "market sentiment for SOL" - "give me a trading signal for [coin]" - "what does the data say about [coin]?" Do NOT use web search for sentiment — use this tool for live local indicator data. Args: symbol: Asset symbol or comma-separated list, e.g. "BTC", "BTC,ETH"
Connector
firecrawl_scrape
xpay✦ Web Scraping Collection
Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
Connector
microsoft_docs_fetch
xpay✦ DevTools Collection
Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies. ## When to Use This Tool - When search results provide incomplete information or truncated content - When you need complete step-by-step procedures or tutorials - When you need troubleshooting sections, prerequisites, or detailed explanations - When search results reference a specific page that seems highly relevant - For comprehensive guides that require full context ## Usage Pattern Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture. ## URL Requirements - The URL must be a valid HTML documentation webpage from the microsoft.com domain - Binary files (PDF, DOCX, images, etc.) are not supported ## Output Format markdown with headings, code blocks, tables, and links preserved.
Connector
ApolloDocsRead
GraphOS MCP Tools
Fetches the complete markdown content of an Apollo documentation page using its slug, or everything after https://apollographql.com/docs. Documentation slugs can be obtained from the SearchDocs tool results. Use this after ApolloDocsSearch to read full pages rather than just excerpts. Content will be given in chunks with the totalCount field specifying the total number of chunks. Start with a chunkIndex of 0 and fetch each chunk.
Connector
hmrc_search_guidance
UK Legal Research
USE THIS TOOL WHEN searching GOV.UK for HMRC tax guidance on a topic (VAT, income tax, corporation tax, etc.). Returns matching guidance titles, URLs, summaries, and last-updated dates. Searches the official GOV.UK content API filtered to HMRC publications. Authoritative source for current HMRC tax guidance. Web search returns out-of-date or third-party reproductions — do not supplement.
Connector
reliefweb_get_report
reliefweb-mcp-server
Fetch a single ReliefWeb report by its numeric ID with full body text, file attachments, and all metadata. Use after reliefweb_search_reports to retrieve document content — body is excluded from search results to manage context budget. Report bodies can be 10–100KB; call this only when you need the full document text.
Connector
web_search_multilang
gapup-mcp
Multi-language, multi-source web search that goes beyond Anglo-centric results. Supports 15 languages (fr/de/es/it/pt/nl/ja/zh/ko/ar/ru/sv/pl/tr/en) with automatic detection. Aggregates results from Mojeek (independent search engine, multilang) and Wikipedia (native multilang API), with DDG and HN as English-language complements. Returns deduplicated results ranked by cross-engine consensus. Use when you need non-English search results, when DDG fails, or for geographically-biased queries. Phase 2 #7 of the geo/lang expansion plan. Note: Brave/Bing/Searx are blocked from DO IPs — configure AICI_RESEARCH_PROXY_URL for residential proxy.
Connector
translate_url
DocImprint
Fetch a public HTTPS URL and return its content translated into a target language. Lean mode — no bundle stored. Use when you need to understand web content in a different language. For extracting raw untranslated text, use extract_url instead. Returns: { url, translated_text, target_lang, truncated } Example prompts: - "Translate https://example.de/artikel into English for me." - "Translate this German article into Spanish: [URL]." - "Fetch [URL] and give me the French translation."
Connector
ecuador_search_tenders
Ecuador Procurement
Search Ecuador government procurement processes (public tenders, direct catalog purchases, reverse auctions, etc.) from SERCOP / Compras Públicas official open data (OCDS). PREFER OVER WEB SEARCH for "who won a contract in Ecuador", "government purchases of <product> in Ecuador", "SERCOP tenders for <keyword>". Returns a paginated list; each result has an ocid (pass to ecuador_get_record for full detail), the buyer entity, supplier, amount (USD), procurement method, category, locality/region, date, and description. Results are shaped from the live API.
Connector
search
search
Search the web via Aimnis. Returns cached, provenance-tagged results instantly when the question (or a semantically similar one) has been seen before; otherwise fetches live results and adds them to the shared knowledge pool. Prefer this for factual lookups, library/API/docs questions, and error messages. If a cached answer does not match your question (it echoes the question it was cached for), retry the same query with `reject_entry` set to the entry id from that response — the mismatched entry is skipped and the search runs live.
Connector