Glama
214,374 tools. Last updated 2026-06-19 21:24

"Techniques for Scraping Dynamic Websites with JavaScript and Handling CAPTCHA or Proxy Issues" matching MCP tools:

  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Run JavaScript in the page context and return the result. Use for state not in the a11y tree, captcha iframe inspection, DOM events. Expression is either a plain JS value ('document.title') or a zero-arg IIFE ('(() => { … })()'). Inline any runtime values into the expression itself. Result is JSON-serialized; non-serializable values become strings. 256KB cap on output.
    Connector
  • Search the MITRE ATLAS catalog of AI/ML attack techniques by keyword, tactic, or maturity. Default response is SLIM (description truncated to 240 chars per row); pass include='full' for the verbose record. Pass exclude_id when chaining from atlas_technique_lookup to skip self in sibling-tactic searches. Use this to discover techniques matching a threat-model question, e.g. 'what techniques target LLM serving infrastructure?'. Drill into atlas_technique_lookup with any returned technique_id for the full description, ATT&CK bridge, and pivot hints. For broader cross-referencing: when a result has attack_reference_id, that bridges to D3FEND mitigations via d3fend_defense_for_attack. Free: 30/hr, Pro: 500/hr. Returns {query (echoed filters), total, results [{technique_id, name, description (truncated by default), tactics, inherited_tactics, maturity, attack_reference_id, subtechnique_of}], next_calls}.
    Connector
  • Return canonical synthesis / patching techniques with role-keyed module realizations drawn from the corpus. Use this when the user asks "how do I do X?" with X being a recognisable technique (low-pass-gate plucks, pinged-filter percussion, parallel multiband processing, complex-oscillator FM, karplus-strong pluck, clocked-delay feedback, modal-resonator excitation, wavefolder harmonics, envelope-follower ducking, Maths-style function-generator omnibus). It's also the right tool when the user has a module and asks "what's this good for?" — pass filter.module_id to retrieve every technique that references the module via its role_realizations. Each technique declares role_definitions (the roles the technique uses, each with required and optional affordances) and role_realizations (concrete modules that fill each role, with the affordances they provide). The model substitutes modules from the user's rack into roles by affordance match — DO NOT treat the realization list as exhaustive or as a recipe. Args: - filter (optional): { capability?, module_id?, text? } - capability: kebab-case capability id (see search_modules _meta.taxonomy). Returns techniques whose required *or* optional capability list includes this id. - module_id: "<manufacturer>/<module-slug>". Returns techniques that have a role_realization referencing this module. - text: free-text phrase. Substring-matches against technique id/label/description AND a curated alias table (technique_aliases) — that's the right surface when a user types evocative prose like "stuttering delay", "plucked string", "source of uncertainty" that doesn't grep against any kebab-case id. Two-way alias match: long alias ("source of uncertainty") matches short query ("uncertainty"), and vice versa. - When multiple filters supplied, AND-intersects. - Omit filter entirely to list all techniques. 
Returns: { "techniques": [ { "id": "low-pass-gate-pluck", "label": "Low-Pass Gate Pluck", "description": "Send a short envelope...", "required_capabilities": ["lowpass-gate"], "optional_capabilities": ["envelope-generator", "function-generator"], "role_definitions": [ { "role_id": "lpg", "description": "The vactrol-based or vactrol-emulating element. Strictly required...", "required_affordances": ["lowpass-gate"], "optional_affordances": [] }, ... ], "role_realizations": [ { "role_id": "lpg", "module_id": "make-noise/optomix", "affordances_provided": ["lowpass-gate"], "notes": "Two-channel vactrol-based LPG..." }, ... ], "canonical_instance": { "rationale": "...", "lineage": [ { "position": 1, "label": "Buchla 292 (1970)", "module_id": null, "notes": "..." }, { "position": 2, "label": "Tiptop Audio Buchla 292t", "module_id": "tiptop-audio/buchla-292t" }, ... ] }, "counter_canonical_notes": [ { "claim_pushed_back_against": "Optomix is the canonical pairing with Plaits...", "evidence": "The corpus catalogs 19 LPG-capable modules..." } ], "coverage": [ { "role_id": "voice", "realizations_count": 3 }, { "role_id": "lpg", "realizations_count": 19 }, { "role_id": "env", "realizations_count": 6 }, { "role_id": "clock", "realizations_count": 2 } ] } ], "_meta": { "filter": {...}, "feedback_hint"?: string } } How to use role data: - role_realizations are CURATORIAL SAMPLES, not exhaustive lists. The coverage[].realizations_count tells you how many are documented; other modules may fill the same role. - To find modules in the user's rack that can fill a role, use find_role_realizations(technique_id, role_id, available_modules). - canonical_instance is opt-in and sparse. Most techniques don't have one; that absence is information. When present, it documents a documented historical lineage (e.g., Buchla 292 → 292t → MMG → Optomix for low-pass-gate-pluck) — NOT a prescription. - counter_canonical_notes push back on likely training-data priors. 
When the user invokes a canonical-sounding claim that has a counter_canonical_note, surface the pushback. Errors: - "Module not found: <id>" if filter.module_id is supplied and unknown. - Empty techniques[] with a feedback_hint when filters produce no matches — call report_gap if the user expected coverage.
    Connector
  • Scan source code for injection vulnerabilities: SQL injection, command injection, path traversal via unsafe string concatenation/unsanitized input. Supports Python, JavaScript, TypeScript, Java, Go, Ruby, Shell, Bash. Use to detect input-handling bugs; for secrets use check_secrets. Companion code-security tools: check_secrets (hard-coded credential detection), check_dependencies (known-CVE vulnerability audit), check_headers (live HTTP security-header validation), scan_headers (live HTTP scan via domain). Free: 30/hr, Pro: 500/hr. Returns {total, by_severity, findings}. No data stored.
    Connector
  • AUTHORITATIVE list of recent SEC filings for a specific US public company. Pass a ticker ("AAPL") or CIK ("320193"). Filter by form type — "10-K" (annual report), "10-Q" (quarterly), "8-K" (material event — but for severity-classified 8-Ks specifically, prefer sec_8k_recent), "DEF 14A" (proxy), "S-1" (IPO registration), etc. Returns filing dates, form types, accession numbers, document links. Use for "what did $TICKER recently file" or "show me the last N proxy statements for $TICKER". For specific financial metrics over time use edgar_company_concept; for the full XBRL dump use edgar_company_facts.
    Connector
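
The page-context JavaScript tool listed above takes a single expression, either a plain value or a zero-arg IIFE, and asks callers to inline runtime values into the expression itself. A minimal sketch of how a caller might build such an expression follows. The helper name `build_iife` and the example selector are hypothetical and not part of any listed tool; values are inlined with `json.dumps`, on the assumption that they are JSON-serializable.

```python
import json


def build_iife(body: str, **values: object) -> str:
    """Wrap a JS function body in a zero-arg IIFE (hypothetical helper).

    Each keyword argument is inlined as a `const` declaration, because the
    tool description says runtime values must be part of the expression.
    """
    decls = "".join(
        f"const {name} = {json.dumps(value)}; " for name, value in values.items()
    )
    return f"(() => {{ {decls}{body} }})()"


# Example: count iframes whose src mentions "captcha" (illustrative selector).
expr = build_iife(
    "return document.querySelectorAll(selector).length;",
    selector="iframe[src*='captcha']",
)
print(expr)
# (() => { const selector = "iframe[src*='captcha']"; return document.querySelectorAll(selector).length; })()
```

Serializing through `json.dumps` keeps quoting and escaping out of hand-written string concatenation, which matters when a selector or search string itself contains quotes.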

Matching MCP Servers

Matching MCP Connectors

  • Web scraping for AI agents. Extract text and metadata from any URL worldwide. $0.005/page.

  • LLM caching proxy (x402 USDC on Base) - exact + semantic cache. Free health.

  • List and filter issues from a single ACC project (limit 50 per call) via the APS Construction Issues API. When to use: The user or upstream agent needs to review open issues, count issues by status/priority, or look up an issue_id before calling acc_update_issue. E.g. 'show me all critical open issues on the Tower project'. When NOT to use: Do not use to fetch RFIs (use acc_list_rfis) or to search documents. APS scopes: data:read account:read. No write scope required. Rate limits: ACC Issues API ~100 req/min per app; results pageable (limit 50 here, max 200 upstream). For large projects, call once and filter client-side instead of looping. Errors: 401 (APS token expired — refresh); 403 (user lacks 'View Issues' permission on project or scope insufficient); 404 (project_id not found — verify 'b.' prefix and hub membership via acc_list_projects); 422 (invalid filter value — check status/priority spelling); 429 (rate limit — back off 60s); 5xx (ACC upstream — retry with jitter). Side effects: None. Read-only and idempotent.
    Connector
  • [Auth Required + Active] Get credentials to rent a real Chrome browser. Install CLI: `pip install ceki-sdk` (Python) or `npm install -g @ceki/sdk` (Node). Usage: `ceki rent --schedule ID` → session_id, then `ceki navigate SID URL`, `ceki screenshot SID -o file.png`, `ceki stop SID`. Per-minute billing from AgentWallet. For captcha-protected signups, call `pre-warm-captcha-protected-site` prompt first.
    Connector
  • Roll (regenerate) the personal proxy credential for a firewall. This invalidates the previous password and returns a new one with ready-to-use configuration commands. Only call this when the user explicitly needs new credentials — it will break any existing package manager configuration using the old password.
    Connector
  • Scrape and parse a competitor pricing page from a URL or domain. Fetches via proxy-aware timedFetch (tries /pricing, /plans, homepage fallback), then extracts: plan names, prices, billing cadence (monthly/annual/usage-based/one-time), key features, free tier presence, enterprise tier, estimated price range. Returns structured pricing tiers. If unfetchable or no pricing found (anti-bot, SPA, auth wall): returns a clear degraded result with warnings and signals — never fake success. ICP: founders, product managers, pricing strategists, competitive intel teams. Proxy-aware (AICI_RESEARCH_PROXY_URL). Cache TTL 6h.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool; if available, you should always default to using this tool for any web scraping needs.

    **Best for:** Single page content extraction, when you know exactly which page contains the information.
    **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search).
    **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead).
    **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication.

    **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content.

    **Use JSON format when user asks for:**
    - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields")
    - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details")
    - API details, endpoints, or technical specs (e.g., "find the authentication endpoint")
    - Lists of items or properties (e.g., "list the features", "get all the options")
    - Any specific piece of information from a page

    **Use markdown format ONLY when:**
    - User wants to read/summarize an entire article or blog post
    - User needs to see all content on a page without specific extraction
    - User explicitly asks for the full page content

    **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER:
    1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction
    2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL
    3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page.
    4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent, which can autonomously navigate and research

    **Usage Example (JSON format - REQUIRED for specific data extraction):**
    ```json
    {
      "name": "firecrawl_scrape",
      "arguments": {
        "url": "https://example.com/api-docs",
        "formats": ["json"],
        "jsonOptions": {
          "prompt": "Extract the header parameters for the authentication endpoint",
          "schema": {
            "type": "object",
            "properties": {
              "parameters": {
                "type": "array",
                "items": {
                  "type": "object",
                  "properties": {
                    "name": { "type": "string" },
                    "type": { "type": "string" },
                    "required": { "type": "boolean" },
                    "description": { "type": "string" }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```

    **Prefer markdown format by default.** You can read and reason over the full page content directly, with no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page.

    **Use JSON format when user needs:**
    - Structured data with specific fields (extract all products with name, price, description)
    - Data in a specific schema for downstream processing

    **Use query format only when:**
    - The page is extremely long and you need a single targeted answer without processing the full content
    - You want a quick factual answer and don't need to retain the page content

    **Usage Example (markdown format - default for most tasks):**
    ```json
    {
      "name": "firecrawl_scrape",
      "arguments": {
        "url": "https://example.com/article",
        "formats": ["markdown"],
        "onlyMainContent": true
      }
    }
    ```

    **Usage Example (branding format - extract brand identity):**
    ```json
    {
      "name": "firecrawl_scrape",
      "arguments": {
        "url": "https://example.com",
        "formats": ["branding"]
      }
    }
    ```

    **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
    **Performance:** Add maxAge parameter for 500% faster scrapes using cached data.
    **Returns:** JSON structured data, markdown, branding profile, or other formats as specified.
    **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Given a profile of the authorized test target (technology stack, exposed services, authentication type, OS), return a ranked list of ATT&CK techniques and OWASP test cases most relevant to that profile — not a generic dump of all techniques. Ranking factors: platform match, service match, auth type exposure, technique prevalence. Each result includes why it is relevant to this specific profile, the detection opportunity, and the recommended mitigation. Use when starting an authorized engagement to prioritize the testing scope; pair with pentest_guide to get the full methodology for each top-ranked vector.
    Connector
  • Look up a MITRE ATLAS technique — the AI/ML adversarial attack catalog. ATLAS catalogues TTPs targeting machine learning systems: prompt injection, model evasion, training data poisoning, model theft, etc. Roughly 80% of ATLAS techniques are AI/ML-specific (no ATT&CK bridge); 20% mirror an enterprise ATT&CK technique via attack_reference_id — use that to pivot to D3FEND defenses (d3fend_defense_for_attack) and CVE search. Sub-techniques inherit `tactics` from the parent (inherited_tactics=true flag) when ATLAS upstream leaves them empty. Use this tool when the user asks about AI/ML threats, LLM red-teaming, or adversarial ML; for multiple techniques in one call (e.g. drilling into a case study's techniques_used), prefer bulk_atlas_technique_lookup. Returns 404 when the id is not in the synced ATLAS catalog. Free: 30/hr, Pro: 500/hr. Returns {technique_id, name, description, tactics, inherited_tactics, maturity (demonstrated|feasible|realized), attack_reference_id, attack_reference_url, subtechnique_of, created_date, modified_date, next_calls}.
    Connector
  • Get Container Freight Station (CFS) handling tariffs — charges for LCL (Less than Container Load) cargo consolidation and deconsolidation at port warehouses. Use this for LCL shipments to estimate warehouse handling costs. Returns per-unit handling rates, minimum charges, and storage fees at the specified port. Not relevant for FCL (Full Container Load) shipments. PAID: $0.05/call via x402 (USDC on Base or Solana). Without payment, returns 402 with payment instructions. Returns: Array of { facility, service_type, cargo_type, rate_per_unit, unit, minimum_charge, currency }.
    Connector
  • Look up a MITRE ATT&CK technique by ID or keyword for authorized penetration testing and security research. Returns the full technique record: name, associated tactics, description, detection opportunities (log sources, behavioral indicators), real-world procedure examples from public reporting, recommended mitigations, and related sub-techniques. The detection and mitigation sections make this equally useful for defenders building detection coverage. Accepts exact IDs (T1190, T1059.001) or keyword search (e.g., "sql injection", "pass the hash", "web shell upload").
    Connector
  • Bulk ATLAS technique lookup — retrieve full records for up to 50 techniques in a single request instead of N separate atlas_technique_lookup calls. Designed as the natural follow-up to atlas_case_study_lookup, whose techniques_used array can be passed directly. Each item is the same shape as atlas_technique_lookup, including parent-tactics inheritance for sub-techniques (inherited_tactics=true flag) and per-item next_calls (D3FEND bridge when attack_reference_id present, sibling-technique search by tactic, parent lookup for sub-techniques). Free: 30/hr (1 per item), Pro: 500/hr. Returns {results [{technique_id, status (ok|not_found|invalid_format), technique, error}], total, successful, failed, partial, summary}.
    Connector
  • Execute custom JavaScript/Node.js code in a secure sandbox with access to popular NPM packages. Use this for data transformations, API calls, calculations, or any Node.js logic. Your code receives an 'input' variable and should return a value. Available packages: axios, lodash, cheerio, date-fns, uuid, moment, and more.
    Connector
  • List all issues for a task list (event). Returns open, acknowledged, and resolved issues with severity, type, and category. Use this to discover issues that need AI analysis via tascan_analyze_issue.
    Connector
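
The `firecrawl_scrape` entry above lists adding `waitFor` (5000, then up to 10000) as the first fallback when JSON extraction on a JavaScript-rendered page comes back empty. The sketch below only builds the sequence of request payloads a caller might try in that order; it does not send them. The function name `scrape_attempts` is hypothetical, and placing `waitFor` at the top level of `arguments` is an assumption, since the description names the parameter but its usage examples do not show it.

```python
import json
from typing import Iterator, Optional


def scrape_attempts(
    url: str,
    prompt: str,
    schema: dict,
    waits: tuple[Optional[int], ...] = (None, 5000, 10000),
) -> Iterator[dict]:
    """Yield firecrawl_scrape payloads with progressively longer waitFor.

    Hypothetical helper: the first attempt omits waitFor; later attempts add
    the values suggested in the tool description.
    """
    for wait in waits:
        arguments = {
            "url": url,
            "formats": ["json"],
            "jsonOptions": {"prompt": prompt, "schema": schema},
        }
        if wait is not None:
            # Assumed placement: top-level key inside "arguments".
            arguments["waitFor"] = wait
        yield {"name": "firecrawl_scrape", "arguments": arguments}


calls = list(
    scrape_attempts(
        "https://example.com/api-docs",
        "Extract the header parameters for the authentication endpoint",
        {"type": "object", "properties": {"parameters": {"type": "array"}}},
    )
)
print(json.dumps(calls[1], indent=2))
```

A caller would send each payload in turn and stop at the first result with real content; if all three come back empty, the description's next steps (a different URL, `firecrawl_map`, then `firecrawl_agent`) apply.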