estimate_tokens
Estimate the token count of a URL's content without performing a full fetch. This allows you to check if the content fits your remaining context window before deciding to retrieve it.
Instructions
Estimate token count of a URL's content WITHOUT fetching the body.
WHEN TO USE:
- You're considering fetching a URL but unsure if it fits your remaining context window. This call is ~10x cheaper than a full fetch.
- You want to triage a list of candidate URLs before deciding which to actually retrieve.
IMPORTANT: Many servers omit Content-Length on dynamic / chunked responses. When that happens, this tool returns confident=false and estimated_tokens=null. In that case, call fetch_url with a max_tokens cap instead of trusting the estimate.
Args: url: The URL to estimate.
Returns: { "url": str, "success": bool, "estimated_tokens": int | null, "byte_size": int | null, "content_type": str, "confident": bool, "note": str }
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to estimate. | (none) |
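A call therefore carries a single argument; for example (the URL itself is illustrative):

```python
# The only field estimate_tokens accepts; it has no default, so it must be set.
arguments = {"url": "https://example.com/docs/guide.html"}
```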
Implementation Reference
- agentfetch/mcp/server.py:106-134 (registration): Registers the 'estimate_tokens' FastMCP tool, decorated with @mcp.tool(), which calls the _estimate_tokens helper.
```python
@mcp.tool()
def estimate_tokens(url: str) -> dict:
    """Estimate token count of a URL's content WITHOUT fetching the body.

    WHEN TO USE:
    - You're considering fetching a URL but unsure if it fits your remaining
      context window. This call is ~10x cheaper than a full fetch.
    - You want to triage a list of candidate URLs before deciding which to
      actually retrieve.

    IMPORTANT: Many servers omit Content-Length on dynamic / chunked
    responses. When that happens, this tool returns confident=false and
    estimated_tokens=null. In that case, call fetch_url with a max_tokens
    cap instead of trusting the estimate.

    Args:
        url: The URL to estimate.

    Returns:
        {
            "url": str,
            "success": bool,
            "estimated_tokens": int | null,
            "byte_size": int | null,
            "content_type": str,
            "confident": bool,
            "note": str
        }
    """
    return _estimate_tokens(url=url)
```
- agentfetch/mcp/tools/estimate.py:7-14 (handler): MCP tool handler that delegates to estimate_url_tokens from the core pipeline.

```python
def estimate_tokens(url: str) -> dict:
    """Estimate the token count of a URL's content WITHOUT fetching the body.

    Use this to decide whether a URL is worth fetching given your context
    budget. Returns the estimated token count along with the byte size and
    content type that informed the estimate.
    """
    return estimate_url_tokens(url)
```
- agentfetch/core/pipeline.py:295-365 (helper): Core implementation: sends a HEAD request (or a fallback streamed GET) to the URL, extracts Content-Length and Content-Type, then calls estimate_tokens_from_size to produce the estimate.

```python
# Excerpt; pipeline.py's module-level imports (time, typing's Dict/Any,
# validate_url, UnsafeURLError, estimate_tokens_from_size) are assumed.
def estimate_url_tokens(url: str, timeout: int = 10) -> Dict[str, Any]:
    """HEAD the URL and convert Content-Length to a token estimate.

    Cheap and fast — no body fetch. Falls back to a tiny GET if HEAD isn't
    supported by the server.
    """
    started = time.time()
    import httpx

    try:
        validate_url(url)
    except UnsafeURLError as e:
        return {
            "url": url,
            "success": False,
            "estimated_tokens": None,
            "error": f"URL rejected: {e}",
            "fetch_time_ms": int((time.time() - started) * 1000),
        }

    try:
        resp = httpx.head(url, timeout=timeout, follow_redirects=True)
        if resp.status_code == 405 or resp.status_code >= 400:
            # Some servers reject HEAD; do a streamed GET and just read headers.
            with httpx.stream("GET", url, timeout=timeout, follow_redirects=True) as s:
                headers = s.headers
                status = s.status_code
        else:
            headers = resp.headers
            status = resp.status_code
    except httpx.HTTPError as e:
        return {
            "url": url,
            "success": False,
            "estimated_tokens": 0,
            "error": str(e),
            "fetch_time_ms": int((time.time() - started) * 1000),
        }

    if status >= 400:
        return {
            "url": url,
            "success": False,
            "estimated_tokens": 0,
            "error": f"HEAD returned {status}",
            "fetch_time_ms": int((time.time() - started) * 1000),
        }

    content_length = int(headers.get("content-length", 0) or 0)
    content_type = headers.get("content-type", "text/html")
    estimate = estimate_tokens_from_size(content_length, content_type)

    # Many servers omit Content-Length on dynamic / chunked / compressed
    # responses. Don't lie to the agent — flag the estimate as unavailable.
    confident = content_length > 0

    return {
        "url": url,
        "success": True,
        "estimated_tokens": estimate if confident else None,
        "byte_size": content_length if confident else None,
        "content_type": content_type,
        "fetch_time_ms": int((time.time() - started) * 1000),
        "confident": confident,
        "note": (
            "Estimate based on Content-Length + content type. Actual count after "
            "cleaning may be lower."
            if confident
            else "Server did not return Content-Length. Estimate unavailable — "
            "fetch the URL to get the actual token count."
        ),
    }
```
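To see both paths in action, a quick sanity check like the following works (a sketch, assuming the helper is importable as `agentfetch.core.pipeline.estimate_url_tokens` per the path above; the URL is illustrative):

```python
from agentfetch.core.pipeline import estimate_url_tokens

result = estimate_url_tokens("https://example.com/")
if result["success"] and result["confident"]:
    # Static responses usually carry Content-Length, so we get a real number.
    print(f'{result["estimated_tokens"]} tokens from {result["byte_size"]} bytes')
else:
    # Dynamic / chunked / compressed responses often omit Content-Length;
    # estimated_tokens is None and the note (or error) explains what to do.
    print(result.get("note") or result.get("error"))
```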
Returns: { "url": str, "success": bool, "estimated_tokens": int | null, "byte_size": int | null, "content_type": str, "confident": bool, "note": str } - agentfetch/core/tokenizer.py:36-61 (helper)Low-level heuristic: converts byte_size + content_type to estimated token count based on content-type survival ratios (HTML ~30%, PDF ~50%, text ~95%, etc.).
- agentfetch/core/tokenizer.py:36-61 (helper): Low-level heuristic: converts byte_size + content_type to an estimated token count based on content-type survival ratios (HTML ~30%, PDF ~50%, text ~95%, etc.).

```python
def estimate_tokens_from_size(byte_size: int, content_type: str = "text/html") -> int:
    """Rough estimate from a Content-Length header before fetching the body.

    Cheap heuristic — actual token count after cleaning will be lower because
    we strip nav/ads/scripts. We bias the estimate slightly high so agents
    don't over-fetch into a context-window blowout.

    Heuristics:
    - HTML: ~30% of bytes survive cleaning, ~4 chars/token → bytes * 0.3 / 4
    - Plain text: ~95% survives, ~4 chars/token → bytes * 0.95 / 4
    - PDF: ~50% extractable text, ~4 chars/token → bytes * 0.5 / 4
    """
    if byte_size <= 0:
        return 0
    ct = (content_type or "").lower()
    if "html" in ct:
        return int(byte_size * 0.30 / 4)
    if "pdf" in ct:
        return int(byte_size * 0.50 / 4)
    if "json" in ct or "xml" in ct:
        return int(byte_size * 0.80 / 4)
    if "text" in ct:
        return int(byte_size * 0.95 / 4)
    # Unknown — assume HTML-ish
    return int(byte_size * 0.30 / 4)
```
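Plugging a few concrete sizes through these ratios makes the heuristic tangible (the numbers below simply follow the arithmetic in the code):

```python
# 400 KB of HTML: ~30% survives cleaning, ~4 chars per token.
estimate_tokens_from_size(400_000, "text/html")         # 400_000 * 0.30 / 4 = 30_000
# The same bytes as JSON keep ~80% of their content.
estimate_tokens_from_size(400_000, "application/json")  # 400_000 * 0.80 / 4 = 80_000
# A PDF of that size yields roughly half extractable text.
estimate_tokens_from_size(400_000, "application/pdf")   # 400_000 * 0.50 / 4 = 50_000
```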