Analyzes a website's robots.txt file to determine crawl permissions and support compliance with ethical web scraping practices. Reports which paths are allowed and which are disallowed for crawling.
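A robots.txt permission check of this kind can be sketched with Python's standard `urllib.robotparser`. This is only an illustration, not the tool's actual implementation: the function name `crawl_permissions`, the user agent `example-bot`, and the sample rules are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def crawl_permissions(robots_txt: str, user_agent: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to True (allowed) or False (disallowed) for the given agent.

    Hypothetical helper: parses robots.txt text that was already downloaded,
    so no network access is needed here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Sample rules, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /
"""

result = crawl_permissions(
    SAMPLE_ROBOTS, "example-bot", ["/index.html", "/private/data.html"]
)
print(result)
```

Note that `urllib.robotparser` applies the first matching rule in file order, which can differ from crawlers that use longest-match precedence.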
Retrieves French tax information from cached data when web scraping fails, so that official tax brackets and calculations remain available to residents.
Enables retrieval and cleaning of official documentation content for popular AI/Python libraries (uv, langchain, openai, llama-index) through web scraping and LLM-powered content extraction. Uses the Serper API for search and the Groq API to clean HTML into readable text with source attribution.
Enables web page scraping via the Jina Reader API and searching of FastMCP documentation using minsearch. Supports fetching Markdown content from URLs and querying indexed documentation files.
A headless web scraping server that uses Playwright and BeautifulSoup to extract the main content of web pages as Markdown, text, or HTML for AI and automation integration. It features per-domain rate limiting and robust error handling.
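Per-domain rate limiting can be illustrated with a small scheduler that enforces a minimum interval between requests to the same host. This is a minimal sketch under stated assumptions, not the server's own code: the class name `DomainRateLimiter`, the `min_interval` parameter, and the injectable `clock` are hypothetical and chosen for the example.

```python
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Hypothetical limiter: at most one request per `min_interval` seconds per domain."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be checked without sleeping
        self._next_allowed: dict[str, float] = {}

    def wait_time(self, url: str) -> float:
        """Reserve the next slot for the URL's domain and return the seconds to wait."""
        domain = urlparse(url).netloc.lower()
        now = self._clock()
        # The slot is either right now or the earliest time the domain is free again.
        slot = max(now, self._next_allowed.get(domain, now))
        self._next_allowed[domain] = slot + self.min_interval
        return slot - now


# Usage with a frozen clock: repeated requests to one domain queue up,
# while a different domain is not delayed.
limiter = DomainRateLimiter(min_interval=2.0, clock=lambda: 100.0)
first = limiter.wait_time("https://a.example/page1")
second = limiter.wait_time("https://a.example/page2")
other = limiter.wait_time("https://b.example/")
```

A caller would typically pass the returned delay to `time.sleep` (or `asyncio.sleep` in an async scraper) before issuing the request.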