estimate_tokens

Estimate the token count of a URL's content without performing a full fetch. This allows you to check if the content fits your remaining context window before deciding to retrieve it.

Instructions

Estimate token count of a URL's content WITHOUT fetching the body.

WHEN TO USE:

- You're considering fetching a URL but are unsure whether it fits your remaining context window. This call is ~10x cheaper than a full fetch.
- You want to triage a list of candidate URLs before deciding which to actually retrieve.

IMPORTANT: Many servers omit Content-Length on dynamic / chunked responses. When that happens, this tool returns confident=false and estimated_tokens=null. In that case, call fetch_url with a max_tokens cap instead of trusting the estimate.

Args:
    url: The URL to estimate.

Returns:
    {
      "url": str, "success": bool,
      "estimated_tokens": int | null,
      "byte_size": int | null,
      "content_type": str,
      "confident": bool,
      "note": str
    }
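
As a rough illustration of the guidance above, an agent-side caller might turn the result into a fetch decision along these lines. This is a sketch, not part of agentfetch: `plan_fetch`, `remaining_tokens`, and `safety_margin` are hypothetical names, and the only fields it relies on are the ones listed under Returns.

```python
from typing import Any, Dict


def plan_fetch(estimate: Dict[str, Any], remaining_tokens: int,
               safety_margin: float = 0.8) -> Dict[str, Any]:
    """Decide what to do with a URL given an estimate_tokens result.

    Hypothetical helper: only the field names come from the tool's
    documented return value; the policy itself is an assumption.
    """
    # Keep some headroom, since the estimate is only a heuristic.
    budget = int(remaining_tokens * safety_margin)

    if not estimate.get("success"):
        # The probe itself failed (rejected URL, network error, HTTP error).
        return {"action": "skip", "max_tokens": None}

    tokens = estimate.get("estimated_tokens")
    if not estimate.get("confident") or tokens is None:
        # No usable Content-Length: fetch, but cap the size.
        return {"action": "fetch_capped", "max_tokens": budget}

    if tokens <= budget:
        return {"action": "fetch", "max_tokens": None}
    return {"action": "fetch_capped", "max_tokens": budget}
```

The `fetch_capped` branch corresponds to the advice in the IMPORTANT note: when the estimate is unavailable, call fetch_url with a max_tokens cap. The same function can be mapped over a list of candidate URLs to triage them before any body is retrieved.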

Input Schema


Name	Required	Description	Default
`url`	Yes	The URL to estimate.	

Implementation Reference

agentfetch/mcp/server.py:106-134 (registration)

Registers the 'estimate_tokens' FastMCP tool, decorated with @mcp.tool(), which calls the _estimate_tokens helper.

@mcp.tool()
def estimate_tokens(url: str) -> dict:
    """Estimate token count of a URL's content WITHOUT fetching the body.

    WHEN TO USE:
    - You're considering fetching a URL but unsure if it fits your remaining
      context window. This call is ~10x cheaper than a full fetch.
    - You want to triage a list of candidate URLs before deciding which to
      actually retrieve.

    IMPORTANT: Many servers omit Content-Length on dynamic / chunked
    responses. When that happens, this tool returns confident=false and
    estimated_tokens=null. In that case, call fetch_url with a max_tokens
    cap instead of trusting the estimate.

    Args:
        url: The URL to estimate.

    Returns:
        {
          "url": str, "success": bool,
          "estimated_tokens": int | null,
          "byte_size": int | null,
          "content_type": str,
          "confident": bool,
          "note": str
        }
    """
    return _estimate_tokens(url=url)

agentfetch/mcp/tools/estimate.py:7-14 (handler)

MCP tool handler that delegates to estimate_url_tokens from the core pipeline.

def estimate_tokens(url: str) -> dict:
    """Estimate the token count of a URL's content WITHOUT fetching the body.

    Use this to decide whether a URL is worth fetching given your context
    budget. Returns the estimated token count along with the byte size and
    content type that informed the estimate.
    """
    return estimate_url_tokens(url)

agentfetch/core/pipeline.py:295-365 (helper)

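Core implementation: validates the URL, sends a HEAD request with redirects followed (falling back to a streamed GET that reads only the headers when HEAD returns a 4xx/5xx status), extracts Content-Length and Content-Type, and calls estimate_tokens_from_size to produce the estimate. If Content-Length is missing or zero, the result is returned with confident=false and null estimated_tokens and byte_size.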

def estimate_url_tokens(url: str, timeout: int = 10) -> Dict[str, Any]:
    """HEAD the URL and convert Content-Length to a token estimate.

    Cheap and fast — no body fetch. Falls back to a tiny GET if HEAD isn't
    supported by the server.
    """
    started = time.time()
    import httpx

    try:
        validate_url(url)
    except UnsafeURLError as e:
        return {
            "url": url,
            "success": False,
            "estimated_tokens": None,
            "error": f"URL rejected: {e}",
            "fetch_time_ms": int((time.time() - started) * 1000),
        }

    try:
        resp = httpx.head(url, timeout=timeout, follow_redirects=True)
        if resp.status_code >= 400:
            # Some servers reject HEAD (e.g. with 405); retry with a streamed
            # GET and read only the headers.
            with httpx.stream("GET", url, timeout=timeout, follow_redirects=True) as s:
                headers = s.headers
                status = s.status_code
        else:
            headers = resp.headers
            status = resp.status_code
    except httpx.HTTPError as e:
        return {
            "url": url,
            "success": False,
            # None rather than 0: a failed probe says nothing about size.
            "estimated_tokens": None,
            "error": str(e),
            "fetch_time_ms": int((time.time() - started) * 1000),
        }

    if status >= 400:
        return {
            "url": url,
            "success": False,
            "estimated_tokens": None,
            # The status may come from the GET fallback, not only from HEAD.
            "error": f"Request returned HTTP {status}",
            "fetch_time_ms": int((time.time() - started) * 1000),
        }

    content_length = int(headers.get("content-length", 0) or 0)
    content_type = headers.get("content-type", "text/html")
    estimate = estimate_tokens_from_size(content_length, content_type)

    # Many servers omit Content-Length on dynamic / chunked / compressed
    # responses. Don't lie to the agent — flag the estimate as unavailable.
    confident = content_length > 0
    return {
        "url": url,
        "success": True,
        "estimated_tokens": estimate if confident else None,
        "byte_size": content_length if confident else None,
        "content_type": content_type,
        "fetch_time_ms": int((time.time() - started) * 1000),
        "confident": confident,
        "note": (
            "Estimate based on Content-Length + content type. Actual count after "
            "cleaning may be lower."
            if confident
            else "Server did not return Content-Length. Estimate unavailable — "
            "fetch the URL to get the actual token count."
        ),
    }
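
One detail worth noting in the listing above: `int(...)` raises `ValueError` if a server sends a malformed Content-Length value. A more defensive parse could look like the sketch below. `parse_content_length` is a hypothetical helper, not a function in agentfetch, and it assumes a mapping keyed by lowercase header names.

```python
from typing import Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as a positive int, or None if unusable.

    Hypothetical helper (not in agentfetch). None covers a missing header,
    a non-numeric value, and zero or negative sizes, which are the cases
    where the estimate should be reported as not confident.
    """
    raw = headers.get("content-length")
    if raw is None:
        return None
    try:
        size = int(raw.strip())
    except ValueError:
        # Malformed header value, e.g. "abc" or "12.5".
        return None
    return size if size > 0 else None


size = parse_content_length({"content-length": "20480"})
confident = size is not None
```

With this shape, the `confident` flag follows directly from whether a size could be parsed, and a bad header degrades to "estimate unavailable" instead of an exception.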

agentfetch/mcp/server.py:124-132 (schema)

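Return type schema documented in the tool's docstring (url, success, estimated_tokens, byte_size, content_type, confident, note). The implementation in pipeline.py also returns fetch_time_ms, and failure responses carry an error field in place of byte_size, content_type, confident, and note.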

Returns:
    {
      "url": str, "success": bool,
      "estimated_tokens": int | null,
      "byte_size": int | null,
      "content_type": str,
      "confident": bool,
      "note": str
    }
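
For callers that want static typing, the documented shape could be written as a TypedDict. This is a sketch under the assumption that the docstring plus the estimate_url_tokens listing define the contract. `EstimateResult` and `has_estimate` are hypothetical names, and the grouping of fields reflects the success and failure branches of that listing.

```python
from typing import Optional, TypedDict


class EstimateResult(TypedDict, total=False):
    # Returned by every branch of estimate_url_tokens.
    url: str
    success: bool
    estimated_tokens: Optional[int]
    fetch_time_ms: int
    # Returned only by the success branch.
    byte_size: Optional[int]
    content_type: str
    confident: bool
    note: str
    # Returned only by the failure branches.
    error: str


def has_estimate(result: EstimateResult) -> bool:
    """True when the result carries a usable token estimate."""
    return (
        bool(result.get("success"))
        and bool(result.get("confident"))
        and result.get("estimated_tokens") is not None
    )
```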

agentfetch/core/tokenizer.py:36-61 (helper)

Low-level heuristic: converts byte_size + content_type to an estimated token count using per-content-type survival ratios (HTML ~30%, PDF ~50%, JSON/XML ~80%, plain text ~95%; unknown types are treated as HTML) and an assumed ~4 characters per token.

def estimate_tokens_from_size(byte_size: int, content_type: str = "text/html") -> int:
    """Rough estimate from a Content-Length header before fetching the body.

    Cheap heuristic — actual token count after cleaning will be lower because
    we strip nav/ads/scripts. We bias the estimate slightly high so agents
    don't over-fetch into a context-window blowout.

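    Heuristics (substring match on the content type, checked in this order):
      - HTML: ~30% of bytes survive cleaning, ~4 chars/token → bytes * 0.3 / 4
      - PDF: ~50% extractable text, ~4 chars/token → bytes * 0.5 / 4
      - JSON / XML: ~80% survives, ~4 chars/token → bytes * 0.8 / 4
      - Plain text: ~95% survives, ~4 chars/token → bytes * 0.95 / 4
      - Unknown types are treated like HTML.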
    """
    if byte_size <= 0:
        return 0

    ct = (content_type or "").lower()
    if "html" in ct:
        return int(byte_size * 0.30 / 4)
    if "pdf" in ct:
        return int(byte_size * 0.50 / 4)
    if "json" in ct or "xml" in ct:
        return int(byte_size * 0.80 / 4)
    if "text" in ct:
        return int(byte_size * 0.95 / 4)
    # Unknown — assume HTML-ish
    return int(byte_size * 0.30 / 4)
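
To make the ratios concrete, the sketch below applies the same arithmetic to one size across several content types. It restates the heuristic as a lookup table purely for illustration: `RATIOS`, `CHARS_PER_TOKEN`, and `estimate` are hypothetical names, and the 4-characters-per-token figure is the assumption stated in the docstring.

```python
# Survival ratios, in the same order the function above checks them.
RATIOS = [("html", 0.30), ("pdf", 0.50), ("json", 0.80), ("xml", 0.80), ("text", 0.95)]
CHARS_PER_TOKEN = 4


def estimate(byte_size: int, content_type: str) -> int:
    """Table-driven restatement of the heuristic (illustrative only)."""
    if byte_size <= 0:
        return 0
    ct = (content_type or "").lower()
    for needle, ratio in RATIOS:
        if needle in ct:
            return int(byte_size * ratio / CHARS_PER_TOKEN)
    # Unknown content type: treated like HTML.
    return int(byte_size * 0.30 / CHARS_PER_TOKEN)


# Compare the estimate for the same byte size across content types.
for ct in ("text/html", "application/pdf", "application/json", "text/plain"):
    print(ct, estimate(400_000, ct))
```

For a 400,000-byte PDF this works out to 400000 × 0.5 / 4 = 50,000 tokens, while the same number of bytes served as HTML is estimated at roughly 30,000. Because matching is by substring and order, `text/html` takes the HTML ratio and `text/xml` takes the JSON/XML ratio before the generic `text` rule is reached.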
