fetch_page
Convert any webpage or PDF into clean, ad-free Markdown for use in LLM prompts, with optional token-based truncation.
Instructions
Fetch a single web page or PDF and return its main content as clean, ad-free Markdown — ready to drop into an LLM prompt.
Args:
- url: A fully qualified http(s) URL.
- max_tokens: Optional soft cap on the returned Markdown, counted in whitespace-delimited tokens. When exceeded, the body is truncated and a [...truncated] marker is appended.

Returns: The cleaned Markdown body of the page.
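
The truncation itself happens in the upstream service, whose code is not shown here. The following is only a rough sketch of what a soft cap on whitespace-delimited tokens could look like. The helper name `truncate_markdown`, the treatment of non-positive caps, and the blank line before the marker are all assumptions made for illustration.

```python
import re
from typing import Optional

# Marker text taken from the tool description above.
TRUNCATION_MARKER = "[...truncated]"


def truncate_markdown(body: str, max_tokens: Optional[int] = None) -> str:
    """Hypothetical helper: cap `body` at `max_tokens` whitespace-delimited tokens."""
    # Assumption: a missing or non-positive cap means "do not truncate".
    if max_tokens is None or max_tokens <= 0:
        return body

    end = None
    for count, match in enumerate(re.finditer(r"\S+", body), start=1):
        if count == max_tokens:
            # Remember where the last token that fits ends.
            end = match.end()
        elif count > max_tokens:
            # At least one more token exists, so the body exceeds the cap.
            # Slicing the original string keeps its Markdown line breaks intact.
            return body[:end].rstrip() + "\n\n" + TRUNCATION_MARKER
    return body
```

Slicing at the character offset of the last allowed token, rather than splitting and rejoining, preserves headings and paragraph breaks in the part that is kept.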
Input Schema
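| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | A fully qualified http(s) URL. | |
| max_tokens | No | Optional soft cap on the returned Markdown, in whitespace tokens. | None |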
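
The `url` argument is documented as a fully qualified http(s) URL. A caller that wants to reject obviously malformed input before invoking the tool might use a check like the one below. The function `is_fully_qualified_http_url` is a hypothetical helper, not part of the server, and the rule it applies (an http or https scheme plus a non-empty host) is an assumed reading of "fully qualified".

```python
from urllib.parse import urlsplit


def is_fully_qualified_http_url(url: str) -> bool:
    """Hypothetical pre-check: True if `url` has an http(s) scheme and a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 literals).
        return False
    # urlsplit lowercases the scheme, so "HTTP://..." is accepted as well.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, `is_fully_qualified_http_url("example.com/page")` returns `False` because the scheme is missing.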
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | The cleaned Markdown body of the page. | |
Implementation Reference
- src/ai_first_scraper_mcp/server.py:36-55 (handler): The fetch_page tool handler accepts a URL and an optional max_tokens, sends a GET request to the upstream SCRAPER_URL/raw endpoint, and returns the response body as clean Markdown text.

      async def fetch_page(url: str, max_tokens: Optional[int] = None) -> str:
          """Fetch a single web page or PDF and return its main content as clean,
          ad-free Markdown — ready to drop into an LLM prompt.

          Args:
              url: A fully-qualified http(s) URL.
              max_tokens: Optional soft cap on the returned Markdown (whitespace
                          tokens). When exceeded, the body is truncated and a
                          `[...truncated]` marker is appended.

          Returns:
              The cleaned Markdown body of the page.
          """
          params: dict[str, str | int] = {"url": url}
          if max_tokens:
              params["max_tokens"] = max_tokens
          async with httpx.AsyncClient(timeout=DEFAULT_TIMEOUT) as client:
              resp = await client.get(f"{SCRAPER_URL}/raw", params=params)
              resp.raise_for_status()
              return resp.text

- src/ai_first_scraper_mcp/server.py:35-35 (registration): The tool is registered with FastMCP via the @mcp.tool() decorator on the fetch_page function.

      @mcp.tool()

- Configuration constants: SCRAPER_URL (target endpoint for fetch_page), SEARCH_URL, and DEFAULT_TIMEOUT, all configurable via environment variables.

      SCRAPER_URL = os.getenv("SCRAPER_URL", "https://ai-first-scraper.onrender.com").rstrip("/")
      SEARCH_URL = os.getenv("SEARCH_URL", "https://ai-first-search.onrender.com").rstrip("/")
      DEFAULT_TIMEOUT = float(os.getenv("AFS_TIMEOUT", "45"))
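
To make the request shape concrete, here is a standard-library sketch of how the configuration and the handler's parameters combine into the GET URL for the `/raw` endpoint. The helper names `resolve_scraper_url`, `resolve_timeout`, and `build_raw_url` are hypothetical and do not exist in the server. The real handler passes `params` to httpx, whose percent-encoding may differ in detail from `urlencode`.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode

# Defaults copied from the configuration constants above.
DEFAULT_SCRAPER_URL = "https://ai-first-scraper.onrender.com"
DEFAULT_TIMEOUT_SECONDS = "45"


def resolve_scraper_url(env: Mapping[str, str] = os.environ) -> str:
    """Read SCRAPER_URL, falling back to the default and dropping trailing slashes."""
    return env.get("SCRAPER_URL", DEFAULT_SCRAPER_URL).rstrip("/")


def resolve_timeout(env: Mapping[str, str] = os.environ) -> float:
    """Read AFS_TIMEOUT (seconds) as a float."""
    return float(env.get("AFS_TIMEOUT", DEFAULT_TIMEOUT_SECONDS))


def build_raw_url(
    page_url: str,
    max_tokens: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: the URL a GET to the /raw endpoint would target."""
    params = {"url": page_url}
    # Same truthiness check as the handler: both None and 0 omit the parameter.
    if max_tokens:
        params["max_tokens"] = max_tokens
    return f"{resolve_scraper_url(env)}/raw?{urlencode(params)}"
```

Passing the environment as a mapping keeps the sketch easy to exercise, for instance `build_raw_url("https://example.com/doc", 500, env={"SCRAPER_URL": "http://localhost:8000/"})` to point at a locally hosted scraper.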