Skip to main content
Glama
207,055 tools. Last updated 2026-06-17 17:12

"MCP server for web scraping and content extraction" matching MCP tools:

  • Search the Arclan registry for MCP servers. By default returns only connectable servers (active, mcp_partial, auth_gated). Use status=stdio to browse local-only servers available for installation. Use status=all to query the full index. Use production_safe=true to restrict to servers with uptime > 97% and handshake success > 95%. Use read_only=true to restrict to servers with no write or exec tools. Use this before connecting to an MCP server to check its validation status and score. After using a server, call report_server to contribute reliability data.
    Connector
  • File upload: streaming (one-shot stream-upload — DEFAULT for unknown/generated content), chunked (create-session → POST /blob → chunk → finalize — only when filesize is known exactly), web URL import, and batch (multi-small-file). Call action='describe' for the full action/param reference. Side effects: finalize/stream/stream-upload/web-import/batch create files and consume storage credits. Same-name uploads to a folder OVERWRITE the existing node in place (preserved as a recoverable version). BINARY: `content` is text-only (writes verbatim UTF-8); for binary use `content_base64` (server-decoded) or POST /blob + `blob_id`. UPLOAD STRATEGY (read top-to-bottom, pick the FIRST that matches): (1) Have a URL? → `web-import` (single call). (2) Have content but DON'T know exact size, OR generating/transforming content first? → `stream-upload` (single call, auto-finalizes, NO filesize required, size auto-detected from the bytes). (3) Have a file with KNOWN exact byte count? → `create-session` + `chunk`(s) + `finalize`. **filesize must match the bytes you actually upload — mismatch causes finalize to fail with code 10522 and you must cancel the session.** (4) Multiple small files (≤4 MB each, ≤200 total) into one folder? → `batch`. DEFAULT to `stream-upload` unless you are sure of the exact byte count. Do NOT guess `filesize` for generated content — use `stream-upload` instead. max_size is a hard ceiling that aborts mid-transfer — always overestimate or omit (server uses plan limit).
    Connector
  • Returns the current MCP auth session status for each provider (SAINT, LMS, LIBRARY). Call this before private tools when you have an mcp_session_id and want to avoid unnecessary AUTH_REQUIRED retries. If mcp_session_id is missing or invalid, all providers show as not linked. Sessions are stored in server memory and reset on server restart — call start_auth again if your session is lost.
    Connector
  • Checks that the Strale API is reachable and the MCP server is running. Call this before a series of capability executions to verify connectivity, or when troubleshooting connection issues. Returns server status, version, tool count, capability count, solution count, and a timestamp. No API key required.
    Connector
  • Connectivity check that confirms the Nordic MCP server process is responding. Use this at the start of a session to verify the server is reachable before making other calls. Do not use as a proxy for database health — the server can respond while the Qdrant vector database is temporarily unavailable. To confirm data availability, call search_filings directly. Returns: A greeting string: "Hello {name}! Nordic MCP server is running."
    Connector
  • Configure automatic top-up when balance drops below a threshold. The configuration lives ONLY in the current MCP session — it is held in memory by the MCP server process and is lost on server restart, MCP client reconnect, or server redeploy. Top-ups are signed locally with TRON_PRIVATE_KEY and sent to your Merx deposit address (memo-routed). For persistent auto-deposit you currently need to call this tool again at the start of each session.
    Connector

Matching MCP Servers

  • F
    license
    A
    quality
    D
    maintenance
    Enables retrieval and cleaning of official documentation content for popular AI/Python libraries (uv, langchain, openai, llama-index) through web scraping and LLM-powered content extraction. Uses Serper API for search and Groq API to clean HTML into readable text with source attribution.
    Last updated
    1
    1
  • A
    license
    B
    quality
    B
    maintenance
    Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
    Last updated
    1
    160
    MIT

Matching MCP Connectors

  • Generic URL crawl + HTML extraction — fallback for sites without dedicated MCPs.

  • MCP server for social media and content data including social profiles, engagement metrics, content trends, and influencer analytics for AI agents.

  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Authenticate with TronSave and create a server session. Returns `{ sessionId, walletAddress?, expiresAt }` — pass `sessionId` as the `mcp-session-id` header on every subsequent MCP request. `walletAddress` is set only for signature-mode logins. Two modes: (1) wallet signature (preferred for platform tools) — call this tool with `signature_timestamp` formatted as `<signature>_<timestamp>`, where `<signature>` must be produced client-side by signing the timestamp message; you may optionally call `tronsave_get_sign_message` to obtain a helper message/timestamp pair; (2) API key (internal tools) — pass `apiKey` (raw key, no prefix). Side effect: creates a new session on the server. Wallet signing must happen client-side; never send private keys to the server.
    Connector
  • Run the FileTag pipeline against a previously uploaded slot. The ``file_id`` comes from a prior ``files_create_upload`` call. The server validates the uploaded blob (size, content-type, optional SHA-256), atomically consumes the slot, runs the FileTag extraction (renaming + metadata embedding), and returns the structured result with the extracted metadata, the suggested filename, the ``enriched_file_url`` (short-lived signed URL to the renamed copy with metadata embedded into document properties), and a ``next_action`` recipe (``http_get_and_save``) telling the agent to download that URL and save it as the suggested filename -- act on it unless the user explicitly asked for metadata only. Each slot is single-use; reserve a new slot with ``files_create_upload`` to retry.
    Connector
  • Switch between local and remote DanNet servers on the fly. This tool allows you to change the DanNet server endpoint during runtime without restarting the MCP server. Useful for switching between development (local) and production (remote) servers. Args: server: Server to switch to. Options: - "local": Use localhost:3456 (development server) - "remote": Use wordnet.dk (production server) - Custom URL: Any valid URL starting with http:// or https:// Returns: Dict with status information: - status: "success" or "error" - message: Description of the operation - previous_url: The URL that was previously active - current_url: The URL that is now active Example: # Switch to local development server result = switch_dannet_server("local") # Switch to production server result = switch_dannet_server("remote") # Switch to custom server result = switch_dannet_server("https://my-custom-dannet.example.com")
    Connector
  • Returns VoiceFlip MCP server health and version metadata. No authentication required. Use this first to verify the server is reachable from your MCP client.
    Connector
  • Full detail for a single Rightmove listing (URL or numeric ID). include_images fetches and embeds photos and floorplans as MCP image content. max_images caps the number of property photos (default 3); floorplans always included.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Quick health check that confirms the FXMacroData API and MCP server are reachable. Use this only if other tools fail unexpectedly — it is not needed before normal calls.
    Connector
  • Verify MCP server connectivity. Returns success immediately with no database calls. Use this FIRST if experiencing tool errors - a successful response confirms the server is reachable and your authentication is valid.
    Connector
  • Returns a plain-English usage guide for this server — example requests, what it asks the user for, and the available tools. Call this if the user asks how to use Abby SEO, or to orient yourself before starting. (Same content as the 'getting_started' prompt, exposed as a tool for clients that don't surface MCP prompts.) Takes no arguments.
    Connector
  • Check server connectivity, authentication status, and database size. When to use: First tool call to verify MCP connection and auth state before collection operations. Examples: - `status()` - check if server is operational, see quote_count, and current auth state
    Connector
  • Get Lenny Zeltser's Security Assessment cross-server handoff routes — when this MCP server can't fulfill a request, which other MCP servers (or fallback workflows) to consult. Surfaces a compact subset of `assessment_load_context`. This server never requests your assessment notes or report and instructs your AI to keep them local—the templates and guidelines flow to your AI for local analysis.
    Connector
  • Get Lenny Zeltser's Malware cross-server handoff routes — when this MCP server can't fulfill a request, which other MCP servers (or fallback workflows) to consult. Surfaces a compact subset of `malware_load_context`. This server never requests your sample, analysis notes, or indicators and instructs your AI to keep them local—guidelines and the report template flow to your AI for local analysis.
    Connector