# scrape_websites
Scrape multiple websites to collect and store pricing data for LLM inference services using the Firecrawl API, enabling cost comparison across providers.
## Instructions

Scrape multiple websites using Firecrawl and store their content.

**Args:**

- `websites`: Dictionary of `provider_name -> URL` mappings
- `formats`: List of formats to scrape, `'markdown'` and/or `'html'` (default: both)
- `api_key`: Firecrawl API key (if `None`, the tool reads the `FIRECRAWL_API_KEY` environment variable)

**Returns:**

- List of provider names for successfully scraped websites
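A minimal usage sketch for local testing, assuming `FIRECRAWL_API_KEY` is set in the environment. The provider names and URLs are illustrative, and the direct call assumes the MCP SDK's `@mcp.tool()` decorator returns the underlying function unchanged (the reference Python SDK does; some FastMCP variants wrap it instead):

```python
# Illustrative direct invocation; an MCP client would normally call the tool
# over the protocol rather than importing it.
from starter_server import scrape_websites

# Hypothetical provider -> pricing-page URL mappings (not from the source).
websites = {
    "provider_a": "https://example.com/provider-a/pricing",
    "provider_b": "https://example.com/provider-b/pricing",
}

# api_key omitted: the tool falls back to the FIRECRAWL_API_KEY env var.
scraped = scrape_websites(websites, formats=["markdown"])
print("Scraped:", scraped)  # e.g. ['provider_a', 'provider_b'] on success
```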
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| `websites` | Yes | Dictionary of `provider_name -> URL` mappings | (none) |
| `formats` | No | Formats to scrape: `'markdown'` and/or `'html'` | `['markdown', 'html']` |
| `api_key` | No | Firecrawl API key; falls back to the `FIRECRAWL_API_KEY` environment variable | `None` |
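For orientation, the JSON Schema implied by the Python type hints is roughly the following. This is a hand-written approximation, not the exact output of the MCP SDK's schema generator, which may differ in titles, null handling, and defaults:

```python
# Approximate input schema derived by hand from the type hints; the SDK's
# generated schema may differ in detail.
approx_input_schema = {
    "type": "object",
    "properties": {
        "websites": {
            "type": "object",
            "additionalProperties": {"type": "string"},
        },
        "formats": {
            "type": "array",
            "items": {"type": "string"},
            "default": ["markdown", "html"],
        },
        "api_key": {
            "type": ["string", "null"],
            "default": None,
        },
    },
    "required": ["websites"],
}
```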
## Implementation Reference
- `starter_server.py:26-125` (handler): The primary handler for the `scrape_websites` tool. It uses `FirecrawlApp` to scrape the specified websites, saves content in markdown and/or HTML format to files in the `scraped_content` directory, updates a metadata JSON file, and returns the list of successfully scraped providers. The `@mcp.tool()` decorator registers the tool with the MCP server.

```python
@mcp.tool()
def scrape_websites(
    websites: Dict[str, str],
    formats: List[str] = ['markdown', 'html'],
    api_key: Optional[str] = None
) -> List[str]:
    """
    Scrape multiple websites using Firecrawl and store their content.

    Args:
        websites: Dictionary of provider_name -> URL mappings
        formats: List of formats to scrape ['markdown', 'html'] (default: both)
        api_key: Firecrawl API key (if None, expects environment variable)

    Returns:
        List of provider names for successfully scraped websites
    """
    if api_key is None:
        api_key = os.getenv('FIRECRAWL_API_KEY')
    if not api_key:
        raise ValueError("API key must be provided or set as FIRECRAWL_API_KEY environment variable")

    app = FirecrawlApp(api_key=api_key)
    path = os.path.join(SCRAPE_DIR)
    os.makedirs(path, exist_ok=True)

    # Save the scraped content to files, then create scraped_metadata.json as a summary file.
    # Check if the provider has already been scraped and decide whether to overwrite.
    metadata_file = os.path.join(path, "scraped_metadata.json")
    existing_metadata = {}
    if os.path.exists(metadata_file):
        try:
            with open(metadata_file, "r") as f:
                existing_metadata = json.load(f)
        except json.JSONDecodeError:
            logger.warning(f"Could not decode {metadata_file}, starting fresh.")

    scraped_providers = []
    for provider, url in websites.items():
        logger.info(f"Scraping {provider} at {url}")
        try:
            scrape_result = app.scrape(url, formats=formats)

            # Prepare metadata entry
            timestamp = datetime.now().isoformat()
            domain = urlparse(url).netloc

            content_files = {}
            for fmt in formats:
                content = getattr(scrape_result, fmt, "")
                if content:
                    filename = f"{provider}_{fmt}.txt"
                    file_path = os.path.join(path, filename)
                    with open(file_path, "w", encoding="utf-8") as f:
                        f.write(content)
                    content_files[fmt] = filename

            # Handle metadata safely
            title = "Unknown Title"
            description = "No description"
            if hasattr(scrape_result, 'metadata'):
                title = getattr(scrape_result.metadata, 'title', "Unknown Title")
                description = getattr(scrape_result.metadata, 'description', "No description")

            metadata_entry = {
                "provider_name": provider,
                "url": url,
                "domain": domain,
                "scraped_at": timestamp,
                "formats": formats,
                "success": "true",
                "content_files": content_files,
                "title": title,
                "description": description
            }
            existing_metadata[provider] = metadata_entry
            scraped_providers.append(provider)
            logger.info(f"Successfully scraped {provider}")
        except Exception as e:
            logger.error(f"Failed to scrape {provider}: {e}")
            # Optionally record failure in metadata
            existing_metadata[provider] = {
                "provider_name": provider,
                "url": url,
                "scraped_at": datetime.now().isoformat(),
                "success": "false",
                "error": str(e)
            }

    with open(metadata_file, "w", encoding="utf-8") as f:
        json.dump(existing_metadata, f, indent=4)

    return scraped_providers
```
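Because the handler persists everything under the scrape directory (per-provider `{provider}_{format}.txt` files plus a `scraped_metadata.json` summary), results can be inspected after the fact by reading that metadata file back. A minimal sketch, assuming `SCRAPE_DIR` resolves to `scraped_content` as the description above indicates:

```python
import json
import os

SCRAPE_DIR = "scraped_content"  # assumed value of the module-level constant

metadata_file = os.path.join(SCRAPE_DIR, "scraped_metadata.json")
with open(metadata_file, "r", encoding="utf-8") as f:
    metadata = json.load(f)

for provider, entry in metadata.items():
    # The handler stores success as the string "true"/"false", not a boolean.
    if entry.get("success") == "true":
        for fmt, filename in entry.get("content_files", {}).items():
            print(f"{provider}: {fmt} -> {os.path.join(SCRAPE_DIR, filename)}")
    else:
        print(f"{provider}: failed ({entry.get('error', 'unknown error')})")
```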
- `starter_server.py:27-42` (schema): The input schema is defined by type hints and the docstring: a required `websites` dict, an optional `formats` list, and an optional `api_key`; the tool returns a `List[str]` of scraped providers.

```python
def scrape_websites(
    websites: Dict[str, str],
    formats: List[str] = ['markdown', 'html'],
    api_key: Optional[str] = None
) -> List[str]:
    """
    Scrape multiple websites using Firecrawl and store their content.

    Args:
        websites: Dictionary of provider_name -> URL mappings
        formats: List of formats to scrape ['markdown', 'html'] (default: both)
        api_key: Firecrawl API key (if None, expects environment variable)

    Returns:
        List of provider names for successfully scraped websites
    """
```
- `starter_server.py:26` (registration): The `@mcp.tool()` decorator registers the `scrape_websites` function as an MCP tool.

```python
@mcp.tool()
```
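For context, the decorator presupposes a module-level FastMCP server instance named `mcp`. A minimal sketch of that scaffolding, assuming the reference MCP Python SDK; everything except the `@mcp.tool()` line itself is an assumption about the surrounding module, and the server name is illustrative:

```python
# Hypothetical module scaffolding around the tool; only the @mcp.tool()
# registration line is confirmed by the source.
from typing import Dict, List, Optional

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("pricing-scraper")  # illustrative server name

@mcp.tool()
def scrape_websites(
    websites: Dict[str, str],
    formats: List[str] = ['markdown', 'html'],
    api_key: Optional[str] = None,
) -> List[str]:
    ...  # body as shown in the handler reference above

if __name__ == "__main__":
    mcp.run()  # serves registered tools over stdio by default
```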