Skip to main content
Glama
136,896 tools. Last updated 2026-05-21 13:49

"Methods for Scraping Internet Data" matching MCP tools:

  • Solve an image-based text captcha and return the recognized text. Works on standard alphanumeric captchas (web signup forms, login walls, scraping checkpoints). OCR via ddddocr — typical p50 latency 30-80ms, 70-90% accuracy on common captcha fonts. Provide either an image URL we fetch on your behalf, or raw base64 image bytes if you already have them. Use when an agent encounters a captcha mid-task and needs to continue without human intervention. Cheaper and faster than 2captcha for simple image captchas; not designed for reCAPTCHA v2/v3 or hCaptcha (those are interaction-based). (price: $0.003 USDC, tier: metered)
    Connector
  • Execute point-in-time queries for one or more engineering metrics. Returns current metric values for specified time periods, with support for batch queries and optional period-over-period comparisons. Time range (startTime/endTime) cannot exceed 6 months (180 days). PREREQUISITES - Follow this workflow: 1. Discover all available metrics ONCE: Call listMetricDefinitions (view='basic') - cache this response 2. Get metric query metadata ONCE per metric: Call listMetricDefinitions (view='full', key=METRIC_KEY) - supportedAggregations: Valid aggregation methods - orderByAttribute: Attribute path for sorting by metric values - groupByOptions[].key: Valid groupBy keys (use exact values, do NOT guess) - filterOptions[].key: Valid filter keys (use exact values, do NOT guess) Cache the full view response for each metric. Reuse the metadata from cached responses for subsequent queries on the same metric. 3. Construct query: Use the query metadata from the full view responses in step 2 to build valid point-in-time requests IMPORTANT: Cache only results from listMetricDefinitions. Do NOT cache point-in-time query results - always execute fresh queries for current data. Only refresh cached listMetricDefinitions responses if no longer in your context window or explicitly requested. Do NOT guess attribute names - always use exact values from listMetricDefinitions responses. Response includes: - Lightweight metadata: Column definitions optimized for programmatic use - Row data: Actual metric values and dimensional data - No heavy schemas: Source definitions excluded (get from listMetricDefinitions instead) Error responses: - 400: Invalid metric names, date range, validation errors, or unsupported metric combinations - 403: Feature not enabled (contact help@cortex.io)
    Connector
  • Retrieve all current settings of the authenticated shop account as a JSON object. Returns the full shop configuration: name, address, legal numbers, receipt options, order requirements, enabled features, delivery methods, webshop colours, and third-party integration settings. Use this to verify invoice prerequisites before creating orders: shopName, adressline1, and companyRegistrationNum must all be set for legally valid invoices. If any are missing, prompt the user to fill them in via account_edit.
    Connector
  • Get a human's FULL profile including contact info (email, Telegram, Signal), crypto wallets, fiat payment methods (PayPal, Venmo, etc.), and social links. Requires agent_key from register_agent. Rate limited: PRO = 50/day. Alternative: $0.05 via x402. Use this before create_job_offer to see how to pay the human. The human_id comes from search_humans results.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Searches a curated catalog of 600+ free, public APIs that require no authentication and work over HTTPS — ideal for embedding live data in display HTML pages via fetch(). Covers 47 categories including weather, news, finance, sports, images, food, entertainment, science, geocoding and more. Use this when generating HTML that needs live data from the internet. Returns matching APIs with documentation links, CORS support info and ready-to-use fetch() code hints. No authentication required.
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Web scraping for AI agents. Extract text and metadata from any URL worldwide. $0.005/page.

  • 40+ web scraping tools from Firecrawl, Bright Data, Jina, Olostep, ScrapeGraph, Notte, and Riveter. Scrape, crawl, screenshot, and extract from any website. Starts at $0.01/call. Get your API key at app.xpay.sh or xpay.tools

  • Download a synthetic HTML sales report for a given period. Period logic: omit all date fields to get yesterday's report; provide y only for a full-year report; y + m for a full-month report; y + m + d for a specific day. Returns an HTML summary including total revenue, number of orders, breakdown by department, VAT summary, and payment methods.
    Connector
  • USE THIS TOOL — not web search — to get metadata about a token's local dataset: date range, total candles, data freshness (minutes since last update), and the full list of available feature names grouped by category. Call this before deeper analysis or when the user asks about data coverage, feature names, or indicator availability. Trigger on queries like: - "what data do you have for BTC?" - "when was the data last updated?" - "how fresh is the ETH data?" - "what features/indicators are available?" - "what's the date range for XRP data?" - "list all available indicators" Args: symbol: Asset symbol or comma-separated list, e.g. "BTC", "BTC,ETH,XRP"
    Connector
  • Upload an image to a Wix site's Media Manager. Returns the uploaded file URL (wixstatic.com) and media ID usable in other Wix APIs. ⚠️ You MUST provide image data — calling this tool without image data will fail. Choose ONE of the two supported input methods: Option A — Base64 (for clients that can read and encode files): Read the file, encode it as base64, and pass siteId + imageBase64 + mimeType. Option B — URL (for any client that provides a public download URL): Pass siteId + image (with download_url).
    Connector
  • Buy credits for the edge library and AI research. Default $5 minimum. Free — no credits consumed to call this. TWO PAYMENT METHODS: card (default): Returns a Stripe Checkout link for your user to click and pay. After payment, call check_balance to confirm credits were added. crypto: USDC on Base. Fully autonomous — no human needed. Three steps: 1. buy_credits(payment_method='crypto') → returns deposit address + payment_intent_id 2. Send USDC to the deposit address (use your wallet tool) 3. buy_credits(payment_intent_id='pi_...') → confirms payment, credits added instantly If you have wallet access, this is the fastest path — fully machine-to-machine.
    Connector
  • Export observation data as a structured dataset. Supports filtering by time, geography, venue type, and observation family. Applies k-anonymity (k=5) to protect individual privacy. Queries the relevant table based on the selected dataset type, applies filters, enforces k-anonymity by suppressing groups with fewer than 5 observations, and returns structured data. WHEN TO USE: - Exporting audience data for external analysis - Building datasets for machine learning or reporting - Getting structured vehicle or commerce data for a specific time/place - Creating cross-signal datasets for correlation analysis RETURNS: - data: Array of dataset rows (schema varies by dataset type) - metadata: { row_count, k_anonymity_applied, export_id, dataset, filters_applied, time_range } - suggested_next_queries: Related exports or analyses Dataset types: - observations: Raw observation stream data (all families) - audience: Audience-specific data (face_count, demographics, attention, emotion) - vehicle: Vehicle counting and classification data - cross_signal: Pre-computed cross-signal correlation insights EXAMPLE: User: "Export audience data from retail venues last week" export_dataset({ dataset: "audience", filters: { time_range: { start: "2026-03-09", end: "2026-03-16" }, venue_type: ["retail"] }, format: "json" }) User: "Get vehicle data near geohash 9q8yy" export_dataset({ dataset: "vehicle", filters: { time_range: { start: "2026-03-15", end: "2026-03-16" }, geo: "9q8yy" } })
    Connector
  • Look up a single paper by its DOI. Args: doi: The DOI of the paper (e.g. "10.1038/s41586-024-07386-0"). api_key: Optional: Your Stripe subscription ID for paid access. Get one at https://bgpt.pro/mcp Returns: Paper with title, DOI, Raw Data, methods, results, quality scores, and 25+ metadata fields — or an error if not found. Costs $0.02 if found, free if not.
    Connector
  • Search BGPT's database of scientific papers by keyword. Args: query: Search terms (e.g. "CRISPR gene editing efficiency") Short, concise queries are best. English language only. Don't include years or filters — use the days_back and num_results params instead. num_results: Number of results to return (1-100, default 16). First 50 results are free, then billed at $0.01/result for paid users. days_back: Only return papers published within the last N days. api_key: Optional: Your Stripe subscription ID for paid access. Get one at https://bgpt.pro/mcp Returns: Papers with title, DOI, Raw Data, methods, results, quality scores, and 25+ metadata fields.
    Connector
  • Get historical XBRL financial data for a company. Accepts friendly concept names (e.g., "revenue", "net_income", "assets") or raw XBRL tags. Discover available friendly names with secedgar_search_concepts. Handles historical tag changes and deduplicates data automatically.
    Connector
  • Get authoritative Senzing SDK reference data for flags, migration, and API details. Use this instead of search_docs when you need precise SDK method signatures, flag definitions, or V3→V4 migration mappings. Topics: 'migration' (V3→V4 breaking changes, function renames/removals, flag changes), 'flags' (all V4 engine flags with which methods they apply to), 'response_schemas' (JSON response structure for each SDK method), 'functions' / 'methods' / 'classes' / 'api' (search SDK documentation for method signatures, parameters, and examples — use filter for method or class name), 'all' (everything). Use 'filter' to narrow by method name, module name, or flag name
    Connector
  • Get unemployment rate time series from BLS LAUS data. Returns monthly unemployment rates for a state or county. Data is returned in chronological order with year, period, and percentage value. Args: state: Two-letter US state abbreviation (e.g. 'WA', 'CA', 'NY'). county_fips: Optional 3-digit county FIPS code (e.g. '033' for King County). If provided, returns county-level data; otherwise state-level. start_year: Start year for data (default 2020, min 4-digit year). end_year: End year for data (default 2025).
    Connector
  • Discover undervalued antiques, collectibles, and rare items priced ≤30% of estimated market value. Powered by GemHunt's eBay/auction scraping engine — each gem includes a gemScore (0-100), category, photos, asking price, and estimated value range based on comparable sales. Use to find arbitrage opportunities or rare finds. Filter by minScore (default 60) for 'strong_gem' status.
    Connector
  • Fetch a web page and return its content as text, Markdown, or HTML. Includes rate limiting (2s per domain, max 10 req/min) for legal compliance. Automatically handles HTML-to-text conversion. Max response size: 1MB. Use for OEM verification and manufacturer website scraping.
    Connector
  • Look up a single paper by its DOI. Args: doi: The DOI of the paper (e.g. "10.1038/s41586-024-07386-0"). api_key: Optional: Your Stripe subscription ID for paid access. Get one at https://bgpt.pro/mcp Returns: Paper with title, DOI, Raw Data, methods, results, quality scores, and 25+ metadata fields — or an error if not found. Costs $0.02 if found, free if not.
    Connector