LLM Inference Pricing Research Server

by Fadi88

scrape_websites

Scrape multiple websites to collect and store pricing data for LLM inference services using the Firecrawl API, enabling cost comparison across providers.

Instructions

Scrape multiple websites using Firecrawl and store their content.

Args:
    websites: Dictionary of provider_name -> URL mappings
    formats: List of formats to scrape ['markdown', 'html'] (default: both)
    api_key: Firecrawl API key (if None, expects the FIRECRAWL_API_KEY environment variable)

Returns:
    List of provider names for successfully scraped websites

Input Schema

Name      Required  Description                                                        Default
websites  Yes       Dictionary of provider_name -> URL mappings                        -
formats   No        List of formats to scrape ('markdown', 'html')                     ['markdown', 'html']
api_key   No        Firecrawl API key (falls back to the FIRECRAWL_API_KEY env var)    None
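
As an illustration, a call that satisfies this schema might pass arguments shaped like the following; the provider names and pricing-page URLs are hypothetical examples, not values defined by the server.

    # Hypothetical argument payload for the scrape_websites tool.
    # Only "websites" is required; the other fields fall back to their defaults.
    example_arguments = {
        "websites": {
            "openai": "https://openai.com/api/pricing/",        # illustrative URL
            "anthropic": "https://www.anthropic.com/pricing",   # illustrative URL
        },
        "formats": ["markdown"],        # omit to scrape both 'markdown' and 'html'
        # "api_key": "your-firecrawl-key",  # omit to use the FIRECRAWL_API_KEY environment variable
    }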

Implementation Reference

  • The primary handler for the 'scrape_websites' tool. It uses FirecrawlApp to scrape the specified websites, saves the content in markdown and/or html format to files in the 'scraped_content' directory, updates a metadata JSON file, and returns the list of successfully scraped providers (a sketch of reading that metadata file back follows after this list). The @mcp.tool() decorator registers the tool with the MCP server.
    @mcp.tool()
    def scrape_websites(
        websites: Dict[str, str],
        formats: List[str] = ['markdown', 'html'],
        api_key: Optional[str] = None
    ) -> List[str]:
        """
        Scrape multiple websites using Firecrawl and store their content.

        Args:
            websites: Dictionary of provider_name -> URL mappings
            formats: List of formats to scrape ['markdown', 'html'] (default: both)
            api_key: Firecrawl API key (if None, expects environment variable)

        Returns:
            List of provider names for successfully scraped websites
        """
        if api_key is None:
            api_key = os.getenv('FIRECRAWL_API_KEY')
        if not api_key:
            raise ValueError("API key must be provided or set as FIRECRAWL_API_KEY environment variable")

        app = FirecrawlApp(api_key=api_key)

        path = os.path.join(SCRAPE_DIR)
        os.makedirs(path, exist_ok=True)

        # Save the scraped content to files and then create scraped_metadata.json as a summary file.
        # Check if the provider has already been scraped and decide whether to overwrite.
        metadata_file = os.path.join(path, "scraped_metadata.json")
        existing_metadata = {}
        if os.path.exists(metadata_file):
            try:
                with open(metadata_file, "r") as f:
                    existing_metadata = json.load(f)
            except json.JSONDecodeError:
                logger.warning(f"Could not decode {metadata_file}, starting fresh.")

        scraped_providers = []

        for provider, url in websites.items():
            logger.info(f"Scraping {provider} at {url}")
            try:
                scrape_result = app.scrape(url, formats=formats)

                # Prepare metadata entry
                timestamp = datetime.now().isoformat()
                domain = urlparse(url).netloc

                content_files = {}
                for fmt in formats:
                    content = getattr(scrape_result, fmt, "")
                    if content:
                        filename = f"{provider}_{fmt}.txt"
                        file_path = os.path.join(path, filename)
                        with open(file_path, "w", encoding="utf-8") as f:
                            f.write(content)
                        content_files[fmt] = filename

                # Handle metadata safely
                title = "Unknown Title"
                description = "No description"
                if hasattr(scrape_result, 'metadata'):
                    title = getattr(scrape_result.metadata, 'title', "Unknown Title")
                    description = getattr(scrape_result.metadata, 'description', "No description")

                metadata_entry = {
                    "provider_name": provider,
                    "url": url,
                    "domain": domain,
                    "scraped_at": timestamp,
                    "formats": formats,
                    "success": "true",
                    "content_files": content_files,
                    "title": title,
                    "description": description
                }

                existing_metadata[provider] = metadata_entry
                scraped_providers.append(provider)
                logger.info(f"Successfully scraped {provider}")

            except Exception as e:
                logger.error(f"Failed to scrape {provider}: {e}")
                # Optionally record failure in metadata
                existing_metadata[provider] = {
                    "provider_name": provider,
                    "url": url,
                    "scraped_at": datetime.now().isoformat(),
                    "success": "false",
                    "error": str(e)
                }

        with open(metadata_file, "w", encoding="utf-8") as f:
            json.dump(existing_metadata, f, indent=4)

        return scraped_providers
  • Input schema defined by the type hints and docstring: a required websites dict, an optional formats list, and an optional api_key; the function returns a List[str] of scraped provider names.
    def scrape_websites(
        websites: Dict[str, str],
        formats: List[str] = ['markdown', 'html'],
        api_key: Optional[str] = None
    ) -> List[str]:
        """
        Scrape multiple websites using Firecrawl and store their content.

        Args:
            websites: Dictionary of provider_name -> URL mappings
            formats: List of formats to scrape ['markdown', 'html'] (default: both)
            api_key: Firecrawl API key (if None, expects environment variable)

        Returns:
            List of provider names for successfully scraped websites
        """
  • The @mcp.tool() decorator registers the scrape_websites function as an MCP tool.
    @mcp.tool()
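
The decorator follows the standard FastMCP pattern from the MCP Python SDK: the function name becomes the tool name and the docstring becomes its description. The server's actual module setup is not shown on this page, so the following is only a minimal sketch of that pattern, with the server name assumed.

    # Minimal sketch of the usual FastMCP registration pattern (assumed; the
    # server's real module setup is not shown on this page).
    from typing import Dict, List, Optional
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("LLM Inference Pricing Research Server")  # server name assumed

    @mcp.tool()
    def scrape_websites(
        websites: Dict[str, str],
        formats: List[str] = ['markdown', 'html'],
        api_key: Optional[str] = None,
    ) -> List[str]:
        """Scrape multiple websites using Firecrawl and store their content."""
        ...  # handler body as shown above

    if __name__ == "__main__":
        mcp.run()  # defaults to the stdio transport

Based on the handler above, each run rewrites scraped_metadata.json with one entry per provider, recording success as the string "true" or "false" alongside the saved content_files. A minimal sketch of reading that summary back, assuming the 'scraped_content' directory named in the description, could look like this:

    # Minimal sketch (not part of the server): list which providers were scraped
    # successfully by reading the summary file the handler writes. Assumes the
    # 'scraped_content' directory named in the description above.
    import json
    import os

    metadata_path = os.path.join("scraped_content", "scraped_metadata.json")
    with open(metadata_path, "r", encoding="utf-8") as f:
        metadata = json.load(f)

    for provider, entry in metadata.items():
        # The handler records success as the strings "true" / "false".
        if entry.get("success") == "true":
            print(provider, "->", entry.get("content_files", {}))
        else:
            print(provider, "failed:", entry.get("error"))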

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Fadi88/UDACITY_MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.