# extract_scraped_info
Extract structured pricing data from scraped LLM inference provider websites to enable cost comparison across services like CloudRift, DeepInfra, Fireworks, and Groq.
## Instructions
Extract information about a scraped website.

Args:
- `identifier`: The provider name, full URL, or domain to look for

Returns:
- Formatted JSON string with the scraped information
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| identifier | Yes | The provider name, full URL, or domain to look for | — |
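
The same entry can be reached by provider key, full URL, or bare domain. Below is a minimal usage sketch, assuming the tool function is called directly in the same process as `starter_server.py` and that a provider named `DeepInfra` has already been scraped; the URL and domain values are illustrative assumptions, not captured data:

```python
import json

# Each of these identifier forms should resolve to the same metadata entry,
# provided it matches what was recorded in scraped_metadata.json.
for identifier in ("DeepInfra", "https://deepinfra.com/pricing", "deepinfra.com"):
    result = json.loads(extract_scraped_info(identifier))
    if "error" in result:
        print(f"{identifier}: {result['error']}")
    else:
        # "content" maps a format name (e.g. "markdown") to the raw scraped text.
        formats = ", ".join(result.get("content", {}).keys()) or "no content files"
        print(f"{identifier}: {result.get('url')} ({formats})")
```

In normal operation the function is not imported directly; it is registered via `@mcp.tool()` and invoked by an MCP client, which receives the same JSON string.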
## Implementation Reference
- `starter_server.py:127-180` (handler): The handler function, decorated with `@mcp.tool()`, that implements the lookup of scraped website information by identifier. It loads metadata from `scraped_metadata.json`, matches by provider name, URL, or domain, attaches content from scraped files if available, and returns formatted JSON.

```python
@mcp.tool()
def extract_scraped_info(identifier: str) -> str:
    """
    Extract information about a scraped website.

    Args:
        identifier: The provider name, full URL, or domain to look for

    Returns:
        Formatted JSON string with the scraped information
    """
    logger.info(f"Extracting information for identifier: {identifier}")

    if not os.path.exists(SCRAPE_DIR):
        return json.dumps({"error": "No scraped content found."})

    metadata_file = os.path.join(SCRAPE_DIR, "scraped_metadata.json")
    if not os.path.exists(metadata_file):
        return json.dumps({"error": "Metadata file not found."})

    try:
        with open(metadata_file, "r") as f:
            metadata = json.load(f)
    except json.JSONDecodeError:
        return json.dumps({"error": "Invalid metadata file."})

    # Search for the identifier
    found_data = None

    # Check if identifier matches a provider key directly
    if identifier in metadata:
        found_data = metadata[identifier]
    else:
        # Search by url or domain
        for provider, data in metadata.items():
            if identifier == data.get("url") or identifier == data.get("domain"):
                found_data = data
                break

    if found_data:
        # Read content files
        content_data = {}
        if "content_files" in found_data:
            for fmt, filename in found_data["content_files"].items():
                file_path = os.path.join(SCRAPE_DIR, filename)
                if os.path.exists(file_path):
                    with open(file_path, "r", encoding="utf-8") as f:
                        content_data[fmt] = f.read()
        found_data["content"] = content_data
        return json.dumps(found_data, indent=2)
    else:
        return json.dumps({"error": f"No information found for {identifier}"})
```
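
The handler only reads a few fields, so the expected shape of `scraped_metadata.json` can be inferred: a top-level object keyed by provider name, where each entry carries `url`, `domain`, and an optional `content_files` map from format name to a filename stored in `SCRAPE_DIR`. The following is a sketch of a compatible metadata file written from Python; the directory name and all field values are illustrative assumptions, not values taken from the project:

```python
import json
import os

SCRAPE_DIR = "scraped_content"  # assumption; the real path is configured in starter_server.py

# Shape the handler expects: provider key -> {url, domain, content_files}.
# content_files maps a format name to a file saved alongside the metadata.
metadata = {
    "DeepInfra": {
        "url": "https://deepinfra.com/pricing",   # matched when the identifier is a full URL
        "domain": "deepinfra.com",                # matched when the identifier is a domain
        "content_files": {"markdown": "deepinfra_pricing.md"},
    },
}

os.makedirs(SCRAPE_DIR, exist_ok=True)
with open(os.path.join(SCRAPE_DIR, "scraped_metadata.json"), "w") as f:
    json.dump(metadata, f, indent=2)
```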