Glama
126,802 tools. Last updated 2026-05-05 05:52

"How to fetch or scrape data from a website for use in training an LLM" matching MCP tools:

  • Initiate a partnerships handoff for design partner, ecosystem, training, or advisory conversations requiring human review. Provide the reason, organization, role, and website to trigger operator review.
    MIT
  • Use an LLM to extract structured fields from crawled pages. Define fields or let the LLM auto-discover by sampling. Results saved to extracted.jsonl. Ideal for competitive research, API analysis, and dataset creation.
  • Scrape B2B leads from Apollo.io by submitting a search URL. Returns a runId to check status later. Use webhooks for async delivery instead of polling.
    MIT
  • Retrieve website HTML content directly from URLs for web scraping, data extraction, or content analysis purposes.
    MIT
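Several of the tools above share the same underlying pattern: fetch a page's raw HTML from a URL, then extract the content you need. A minimal sketch of that pattern using only the Python standard library (the helper names and the parsing rules here are illustrative, not part of any listed tool's API):

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Strip markup from an HTML string, returning visible text only."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Fetch a page and reduce it to plain text (network call)."""
    with urlopen(url) as resp:
        return extract_text(resp.read().decode("utf-8", errors="replace"))
```

In practice an MCP scraping tool would handle the fetch step server-side; the extraction step is where most cleanup for LLM training data happens (dropping scripts, styles, and navigation chrome).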

Matching MCP Connectors

  • Fetch web pages and extract exactly the content you need. Select elements with CSS and retrieve co…

  • Improve security writing, score it against rubrics, and plan incident response (IR) and product strategy.

  • Convert website content to Markdown format by fetching URLs, enabling structured extraction of web data for documentation or analysis.
    MIT
  • Extract content from multiple pages on a website by starting a crawl job. Use to comprehensively gather data from related pages with configurable depth and limits.
    MIT
  • Crawl a website to gather content from multiple pages. Returns a job ID for async polling. Best for whole-site extraction; for single pages use scrape, for URL discovery use map.
  • Extract content from multiple website pages by starting an asynchronous crawl job to comprehensively gather data across related webpages.
    MIT
  • Track how LLM brand perception changes over time. Analyze up to 200 data points per query, with optional provider filter for OpenAI, Claude, or Gemini. Ideal for time-series analysis of AI visibility.
    MIT
  • Extract content from multiple website pages by starting a crawl job. Use for comprehensive coverage of related pages, with options to control depth and scope.
  • Discover all indexed URLs on a website to identify pages for scraping or locate specific content when scrape results are incomplete.
  • Discover all indexed URLs on a website to identify pages for scraping or locate specific sections. Returns an array of found URLs.
    MIT
  • Fine-tune an LLM on a GitHub repository to learn code patterns and conventions. Choose a training agent: Cody for code autocomplete or SIERA for bug-fix specialization.
    MIT
  • Discover all indexed URLs on a website to identify pages for scraping or locate specific site sections. Returns a list of found URLs.
    MIT
  • Monitor situations by scraping web content, analyzing with an LLM, and cross-referencing prediction markets for insights. Use for scheduled or one-shot URL ingestion with optional analysis and market enrichment.
    MIT