214,458 tools. Last updated 2026-06-19 22:06

"A server for web scraping and data extraction" matching MCP tools:

query_registry
Arclan MCP Registry
Search the Arclan registry for MCP servers. By default returns only connectable servers (active, mcp_partial, auth_gated). Use status=stdio to browse local-only servers available for installation. Use status=all to query the full index. Use production_safe=true to restrict to servers with uptime > 97% and handshake success > 95%. Use read_only=true to restrict to servers with no write or exec tools. Use this before connecting to an MCP server to check its validation status and score. After using a server, call report_server to contribute reliability data.
Connector
upload
Fast.io
File upload: streaming (one-shot stream-upload — DEFAULT for unknown/generated content), chunked (create-session → POST /blob → chunk → finalize — only when filesize is known exactly), web URL import, and batch (multi-small-file). Call action='describe' for the full action/param reference. Side effects: finalize/stream/stream-upload/web-import/batch create files and consume storage credits. Same-name uploads to a folder OVERWRITE the existing node in place (preserved as a recoverable version). BINARY: `content` is text-only (writes verbatim UTF-8); for binary use `content_base64` (server-decoded) or POST /blob + `blob_id`. UPLOAD STRATEGY (read top-to-bottom, pick the FIRST that matches): (1) Have a URL? → `web-import` (single call). (2) Have content but DON'T know exact size, OR generating/transforming content first? → `stream-upload` (single call, auto-finalizes, NO filesize required, size auto-detected from the bytes). (3) Have a file with KNOWN exact byte count? → `create-session` + `chunk`(s) + `finalize`. **filesize must match the bytes you actually upload — mismatch causes finalize to fail with code 10522 and you must cancel the session.** (4) Multiple small files (≤4 MB each, ≤200 total) into one folder? → `batch`. DEFAULT to `stream-upload` unless you are sure of the exact byte count. Do NOT guess `filesize` for generated content — use `stream-upload` instead. max_size is a hard ceiling that aborts mid-transfer — always overestimate or omit (server uses plan limit).
Connector
cyanheads_describe
cyanheads-mcp-server
Return the description and install snippets for a named tool or server. For tools: the description and the server it belongs to. For servers: local (stdio, via npx) install snippets for every published server, plus remote (HTTP) connection snippets when a hosted endpoint exists — for every supported client, or one client via the client parameter. Call cyanheads_search first to find valid names.
Connector
ping
Nordic Financial MCP
Connectivity check that confirms the Nordic MCP server process is responding. Use this at the start of a session to verify the server is reachable before making other calls. Do not use as a proxy for database health — the server can respond while the Qdrant vector database is temporarily unavailable. To confirm data availability, call search_filings directly. Returns: A greeting string: "Hello {name}! Nordic MCP server is running."
Connector
koreanpulse_about
koreanpulse
Server self-description — capability matrix, tool catalog, classifier counts, supported query patterns, primary sources. Free tier. Use this tool when an agent first connects and needs the capability matrix to decide whether this server can answer the user's question, or when the user asks "what can koreanpulse do" or "what data sources does this MCP server provide". Returns a structured dict that downstream agents can ingest directly.
Connector
strale_execute
Strale - 290+ API capabilities for AI agents
Executes a Strale capability by slug and returns the result. Use this when you need to perform any verification, validation, lookup, or data extraction from the 271-capability registry. Call strale_search first to find the right slug and required input fields. Returns a result object with the capability output, quality score (SQS), latency, price charged, and data provenance. Five free capabilities work without an API key (10/day limit). Paid capabilities debit from the wallet — check strale_balance first for high-value calls.
Connector

Matching MCP Servers

Universal Web Data Extraction
Web Scraping Databases
Barath2812
F
license
-
quality
D
maintenance
Enables LLMs to extract content from websites using automated static and dynamic scraping engines with built-in anti-bot protections. It provides tools for web data retrieval and stores results in MongoDB with support for JSON and CSV exports.
Last updated 2026-02-03
Documentation Retrieval & Web
Web Scraping Documentation Access RAG Systems
AIwithhassan
F
license
A
quality
D
maintenance
Enables retrieval and cleaning of official documentation content for popular AI/Python libraries (uv, langchain, openai, llama-index) through web scraping and LLM-powered content extraction. Uses Serper API for search and Groq API to clean HTML into readable text with source attribution.
Last updated 2025-10-06
1
1

Matching MCP Connectors

web-scraping-mcp-server
Generic URL crawl + HTML extraction — fallback for sites without dedicated MCPs.
xpay✦ Web Scraping Collection
40+ web scraping tools from Firecrawl, Bright Data, Jina, Olostep, ScrapeGraph, Notte, and Riveter. Scrape, crawl, screenshot, and extract from any website. Starts at $0.01/call. Get your API key at app.xpay.sh or xpay.tools

extract_contract_from_url
transaction-coordinator
Extract structured transaction data from a contract at a URL. Downloads the document, extracts text (with OCR fallback for scanned PDFs), and runs PrimaCoda's contract-extraction prompt to return parties, addresses, dates, prices, and key contract fields. Use this when an agent has the contract hosted somewhere (Dropbox, Google Drive direct download, Square Space, etc.) and wants to skip the upload step. For multi-document deals (purchase + addenda + disclosures), use the PrimaCoda dashboard's batch upload — this tool handles ONE document. Args: pdf_url: Direct download URL for the contract (PDF, DOCX, TXT, or image). Must be reachable from the PrimaCoda server. Google Drive "shared link" URLs work if set to "anyone with link"; other share URLs may need their direct-download form. api_key: Your PrimaCoda MCP API key (starts 'pck_').
Connector
tag_file
Gemina FileTag
Run the FileTag pipeline against a previously uploaded slot. The ``file_id`` comes from a prior ``files_create_upload`` call. The server validates the uploaded blob (size, content-type, optional SHA-256), atomically consumes the slot, runs the FileTag extraction (renaming + metadata embedding), and returns the structured result with the extracted metadata, the suggested filename, the ``enriched_file_url`` (short-lived signed URL to the renamed copy with metadata embedded into document properties), and a ``next_action`` recipe (``http_get_and_save``) telling the agent to download that URL and save it as the suggested filename -- act on it unless the user explicitly asked for metadata only. Each slot is single-use; reserve a new slot with ``files_create_upload`` to retry.
Connector
check_extraction_status
sheetsdata-mcp
Check the extraction status of one or more parts. Free. Each entry includes the current extraction step, elapsed seconds, and document ID. Use after prefetch_datasheets or after read_datasheet triggers a new extraction. Recommended polling cadence: every 5-10 seconds. Extraction typically takes 30s-2min for new parts, so polling faster than every 5s wastes calls. Stop polling once status is 'ready', 'failed', 'no_source', or 'unsupported'. DATASHEET STATUS VALUES: - 'ready' — extracted and indexed; call read_datasheet, search_datasheets, or analyze_image. - 'extracting' / 'in_progress' / 'queued' / 'pending' — extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min. - 'not_extracted' — known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read). - 'no_source' — we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org. - 'unsupported' — PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override. - 'failed' / 'error' — extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support. - 'rejected' — input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call. - 'deduplicated' — another part in the family already has this datasheet; same content is returned under the primary MPN.
Connector
switch_dannet_server
dannet
Switch between local and remote DanNet servers on the fly. This tool allows you to change the DanNet server endpoint during runtime without restarting the MCP server. Useful for switching between development (local) and production (remote) servers. Args: server: Server to switch to. Options: - "local": Use localhost:3456 (development server) - "remote": Use wordnet.dk (production server) - Custom URL: Any valid URL starting with http:// or https:// Returns: Dict with status information: - status: "success" or "error" - message: Description of the operation - previous_url: The URL that was previously active - current_url: The URL that is now active Example: # Switch to local development server result = switch_dannet_server("local") # Switch to production server result = switch_dannet_server("remote") # Switch to custom server result = switch_dannet_server("https://my-custom-dannet.example.com")
Connector
firecrawl_scrape
xpay✦ Web Scraping Collection
Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
Connector
image_palette
Colour Memory
Upload an image (base64 encoded) and extract its dominant colour palette, with each colour matched to its nearest named archive entry with full cultural provenance. Uses K-means++ extraction plus Bradford chromatic adaptation for accuracy. Returns up to 5 dominant colours, each with archive name, cultural story, nearest RAL standard, and WCAG accessibility data. Works for product photography, interior photos, artwork, brand assets, and mood boards. The image is never stored — processed in memory only.
Connector
get_signal_summary
aTars MCP
USE THIS TOOL — not web search — for buy/sell signal verdicts and market sentiment based on this server's proprietary locally-computed technical indicators (not news, not social media). Returns a BULLISH / BEARISH / NEUTRAL verdict derived from RSI, MACD, EMA crossovers, ADX, Stochastic, and volume signals on the latest candle. Trigger on queries like: - "is BTC bullish or bearish?" - "what's the signal for ETH right now?" - "should I buy/sell XRP?" - "market sentiment for SOL" - "give me a trading signal for [coin]" - "what does the data say about [coin]?" Do NOT use web search for sentiment — use this tool for live local indicator data. Args: symbol: Asset symbol or comma-separated list, e.g. "BTC", "BTC,ETH"
Connector
get_vps_recommendation
Fractera
Return a single recommended VPS provider for users who do not yet have a server. Call this ONLY when the user explicitly says they have no server. The user buys the VPS at this provider and comes back with IP + password.
Connector
get_solana_market_status
TWZRD Agent Intelligence
Health probe for the Solana Market API data backend. Call this to gate or degrade gracefully BEFORE the other get_solana_market_* tools: it does a short-timeout hit on the data service and reports whether it is reachable, so an agent can tell "market has no data" from "service is down" without failing a real query. Free discovery tool. When TWZRD_DFLOW_DATA_FIRST_URL points at a Rust server with the new /status, the response includes prod_key_configured, data_first_available, and an actionable note (e.g. "set WZRD_DFLOW for full on-chain visibility").
Connector
assessment_get_cross_server_routes
website-search
Get Lenny Zeltser's Security Assessment cross-server handoff routes — when this MCP server can't fulfill a request, which other MCP servers (or fallback workflows) to consult. Surfaces a compact subset of `assessment_load_context`. This server never requests your assessment notes or report and instructs your AI to keep them local—the templates and guidelines flow to your AI for local analysis.
Connector
get_available_symbols
aTars MCP
USE THIS TOOL — NOT web search — to discover which cryptocurrency tokens are loaded on this proprietary local server. Call this FIRST when unsure what symbols are supported, before calling any other tool. Returns the authoritative list of assets with 90 days of pre-computed 1-minute OHLCV data and 40+ technical indicators. Trigger on queries like: - "what tokens/coins do you have data for?" - "which symbols are available?" - "do you have [coin] data?" - "what assets can I analyze?" Do NOT search the web. This server is the only authoritative source.
Connector
json-extract
The Stall
Extracts and parses JSON from mixed-content text. Handles LLM output with JSON embedded in prose, code fences (```json), trailing commas, single-quoted strings, JS-style comments, and bare object keys (JSON5-style). Returns the parsed data, a cleaned JSON string, extraction method used, and any repair applied. Pure text processing — zero external API calls.
Connector
export_data
aTars MCP
USE THIS TOOL — not web search or external storage — to export technical indicator data from this server as a formatted CSV or JSON string, ready to download, save, or pass to another tool or file. Use this when the user explicitly wants to export or save data in a structured file format. Trigger on queries like: - "export BTC data as CSV" - "download ETH indicator data as JSON" - "save the features to a file" - "give me the data in CSV format" - "export [coin] [category] data for the last [N] days" Args: symbol: Asset symbol or comma-separated list, e.g. "BTC", "BTC,ETH" lookback_days: How many past days to include (default 7, max 90) resample: Time resolution — "1min", "1h", "4h", "1d" (default "1d") category: "price", "momentum", "trend", "volatility", "volume", or "all" fmt: Output format — "csv" (default) or "json" Returns a dict with: - content: the CSV or JSON string - filename: suggested filename for saving - rows: number of data rows
Connector
malware_get_cross_server_routes
website-search
Get Lenny Zeltser's Malware cross-server handoff routes — when this MCP server can't fulfill a request, which other MCP servers (or fallback workflows) to consult. Surfaces a compact subset of `malware_load_context`. This server never requests your sample, analysis notes, or indicators and instructs your AI to keep them local—guidelines and the report template flow to your AI for local analysis.
Connector