207,055 tools. Last updated 2026-06-17 17:12

"MCP server for web scraping and content extraction" matching MCP tools:

query_registry
Arclan MCP Registry
Search the Arclan registry for MCP servers. By default returns only connectable servers (active, mcp_partial, auth_gated). Use status=stdio to browse local-only servers available for installation. Use status=all to query the full index. Use production_safe=true to restrict to servers with uptime > 97% and handshake success > 95%. Use read_only=true to restrict to servers with no write or exec tools. Use this before connecting to an MCP server to check its validation status and score. After using a server, call report_server to contribute reliability data.
Connector
upload
Fast.io
File upload: streaming (one-shot stream-upload — DEFAULT for unknown/generated content), chunked (create-session → POST /blob → chunk → finalize — only when filesize is known exactly), web URL import, and batch (multi-small-file). Call action='describe' for the full action/param reference. Side effects: finalize/stream/stream-upload/web-import/batch create files and consume storage credits. Same-name uploads to a folder OVERWRITE the existing node in place (preserved as a recoverable version). BINARY: `content` is text-only (writes verbatim UTF-8); for binary use `content_base64` (server-decoded) or POST /blob + `blob_id`. UPLOAD STRATEGY (read top-to-bottom, pick the FIRST that matches): (1) Have a URL? → `web-import` (single call). (2) Have content but DON'T know exact size, OR generating/transforming content first? → `stream-upload` (single call, auto-finalizes, NO filesize required, size auto-detected from the bytes). (3) Have a file with KNOWN exact byte count? → `create-session` + `chunk`(s) + `finalize`. **filesize must match the bytes you actually upload — mismatch causes finalize to fail with code 10522 and you must cancel the session.** (4) Multiple small files (≤4 MB each, ≤200 total) into one folder? → `batch`. DEFAULT to `stream-upload` unless you are sure of the exact byte count. Do NOT guess `filesize` for generated content — use `stream-upload` instead. max_size is a hard ceiling that aborts mid-transfer — always overestimate or omit (server uses plan limit).
Connector
get_auth_status
ssuMCP
Returns the current MCP auth session status for each provider (SAINT, LMS, LIBRARY). Call this before private tools when you have an mcp_session_id and want to avoid unnecessary AUTH_REQUIRED retries. If mcp_session_id is missing or invalid, all providers show as not linked. Sessions are stored in server memory and reset on server restart — call start_auth again if your session is lost.
Connector
strale_ping
Strale - 290+ API capabilities for AI agents
Checks that the Strale API is reachable and the MCP server is running. Call this before a series of capability executions to verify connectivity, or when troubleshooting connection issues. Returns server status, version, tool count, capability count, solution count, and a timestamp. No API key required.
Connector
ping
Nordic Financial MCP
Connectivity check that confirms the Nordic MCP server process is responding. Use this at the start of a session to verify the server is reachable before making other calls. Do not use as a proxy for database health — the server can respond while the Qdrant vector database is temporarily unavailable. To confirm data availability, call search_filings directly. Returns: A greeting string: "Hello {name}! Nordic MCP server is running."
Connector
enable_auto_deposit
MERX - TRON Resource Exchange
Configure automatic top-up when balance drops below a threshold. The configuration lives ONLY in the current MCP session — it is held in memory by the MCP server process and is lost on server restart, MCP client reconnect, or server redeploy. Top-ups are signed locally with TRON_PRIVATE_KEY and sent to your Merx deposit address (memo-routed). For persistent auto-deposit you currently need to call this tool again at the start of each session.
Connector

Matching MCP Servers

Documentation Retrieval & Web
Web Scraping Documentation Access RAG Systems
AIwithhassan
F
license
A
quality
D
maintenance
Enables retrieval and cleaning of official documentation content for popular AI/Python libraries (uv, langchain, openai, llama-index) through web scraping and LLM-powered content extraction. Uses Serper API for search and Groq API to clean HTML into readable text with source attribution.
Last updated 2025-10-06
1
1
content-core
Web Scraping Multimedia Processing
lfnovo
A
license
B
quality
B
maintenance
Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
Last updated 2026-05-12
1
160
MIT

Matching MCP Connectors

web-scraping-mcp-server
Generic URL crawl + HTML extraction — fallback for sites without dedicated MCPs.
Social & Content MCP Server
MCP server for social media and content data including social profiles, engagement metrics, content trends, and influencer analytics for AI agents.

web_url_reader
Inferventis MCP Server
Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
Connector
tronsave_login
mcp
Authenticate with TronSave and create a server session. Returns `{ sessionId, walletAddress?, expiresAt }` — pass `sessionId` as the `mcp-session-id` header on every subsequent MCP request. `walletAddress` is set only for signature-mode logins. Two modes: (1) wallet signature (preferred for platform tools) — call this tool with `signature_timestamp` formatted as `<signature>_<timestamp>`, where `<signature>` must be produced client-side by signing the timestamp message; you may optionally call `tronsave_get_sign_message` to obtain a helper message/timestamp pair; (2) API key (internal tools) — pass `apiKey` (raw key, no prefix). Side effect: creates a new session on the server. Wallet signing must happen client-side; never send private keys to the server.
Connector
tag_file
Gemina FileTag
Run the FileTag pipeline against a previously uploaded slot. The ``file_id`` comes from a prior ``files_create_upload`` call. The server validates the uploaded blob (size, content-type, optional SHA-256), atomically consumes the slot, runs the FileTag extraction (renaming + metadata embedding), and returns the structured result with the extracted metadata, the suggested filename, the ``enriched_file_url`` (short-lived signed URL to the renamed copy with metadata embedded into document properties), and a ``next_action`` recipe (``http_get_and_save``) telling the agent to download that URL and save it as the suggested filename -- act on it unless the user explicitly asked for metadata only. Each slot is single-use; reserve a new slot with ``files_create_upload`` to retry.
Connector
switch_dannet_server
dannet
Switch between local and remote DanNet servers on the fly. This tool allows you to change the DanNet server endpoint during runtime without restarting the MCP server. Useful for switching between development (local) and production (remote) servers. Args: server: Server to switch to. Options: - "local": Use localhost:3456 (development server) - "remote": Use wordnet.dk (production server) - Custom URL: Any valid URL starting with http:// or https:// Returns: Dict with status information: - status: "success" or "error" - message: Description of the operation - previous_url: The URL that was previously active - current_url: The URL that is now active Example: # Switch to local development server result = switch_dannet_server("local") # Switch to production server result = switch_dannet_server("remote") # Switch to custom server result = switch_dannet_server("https://my-custom-dannet.example.com")
Connector
vf_health
VoiceFlip
Returns VoiceFlip MCP server health and version metadata. No authentication required. Use this first to verify the server is reachable from your MCP client.
Connector
rightmove_listing
UK Property Data
Full detail for a single Rightmove listing (URL or numeric ID). include_images fetches and embeds photos and floorplans as MCP image content. max_images caps the number of property photos (default 3); floorplans always included.
Connector
firecrawl_scrape
xpay✦ Web Scraping Collection
Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
Connector
ping
FXMacroData
Quick health check that confirms the FXMacroData API and MCP server are reachable. Use this only if other tools fail unexpectedly — it is not needed before normal calls.
Connector
check_connection
FundingLandscape
Verify MCP server connectivity. Returns success immediately with no database calls. Use this FIRST if experiencing tool errors - a successful response confirms the server is reachable and your authentication is valid.
Connector
how_to_use
mcp-server
Returns a plain-English usage guide for this server — example requests, what it asks the user for, and the available tools. Call this if the user asks how to use Abby SEO, or to orient yourself before starting. (Same content as the 'getting_started' prompt, exposed as a tool for clients that don't surface MCP prompts.) Takes no arguments.
Connector
status
Quotewise
Check server connectivity, authentication status, and database size. When to use: First tool call to verify MCP connection and auth state before collection operations. Examples: - `status()` - check if server is operational, see quote_count, and current auth state
Connector
assessment_get_cross_server_routes
website-search
Get Lenny Zeltser's Security Assessment cross-server handoff routes — when this MCP server can't fulfill a request, which other MCP servers (or fallback workflows) to consult. Surfaces a compact subset of `assessment_load_context`. This server never requests your assessment notes or report and instructs your AI to keep them local—the templates and guidelines flow to your AI for local analysis.
Connector
malware_get_cross_server_routes
website-search
Get Lenny Zeltser's Malware cross-server handoff routes — when this MCP server can't fulfill a request, which other MCP servers (or fallback workflows) to consult. Surfaces a compact subset of `malware_load_context`. This server never requests your sample, analysis notes, or indicators and instructs your AI to keep them local—guidelines and the report template flow to your AI for local analysis.
Connector
baselight_ping
Baselight
Simple ping test to verify MCP server is responding
Connector