ai-first-scraper-mcp

fetch_pages_batch

Fetch up to 25 web pages in parallel and return each as clean Markdown. Reduces wait time for batch scraping tasks.

Instructions

Fetch many web pages in parallel and return each one's clean Markdown.

Use this whenever you need to read more than one URL at once — it is far faster than calling fetch_page in a loop because the upstream scraper handles the concurrency.

Args:
    urls: Up to 25 URLs.
    max_tokens: Optional per-URL soft cap on the returned Markdown.

Returns:
    A list of {url, ok, data?, error?} objects in the same order as the input URLs. data is {title, word_count, markdown, links, ...} on success; error contains the failure reason otherwise.
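
Because each entry carries its own ok flag, a caller will usually want to separate the pages that succeeded from the ones that failed. The following is a minimal sketch of that step, assuming only the documented {url, ok, data?, error?} shape. The helper name partition_results and the sample data are hypothetical and are not part of the server.

```python
from typing import Any


def partition_results(
    results: list[dict[str, Any]],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split batch results into {url: data} successes and {url: error} failures."""
    pages: dict[str, Any] = {}
    failures: dict[str, Any] = {}
    for item in results:
        if item.get("ok"):
            # `data` is documented as present on success; default to {} defensively.
            pages[item["url"]] = item.get("data", {})
        else:
            failures[item["url"]] = item.get("error")
    return pages, failures


# Hand-written sample in the documented shape (not real scraper output).
sample = [
    {
        "url": "https://example.com/a",
        "ok": True,
        "data": {"title": "A", "word_count": 2, "markdown": "# A\n\nhello world", "links": []},
    },
    {"url": "https://example.com/b", "ok": False, "error": "timeout"},
]

pages, failures = partition_results(sample)
print(sorted(pages))   # URLs that were fetched successfully
print(failures)        # URLs that failed, with the reported reason
```

Keying by URL is a convenience for this sketch; if the same URL appears twice in the input, index-based handling that relies on the documented input order would be safer.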

Input Schema

Name	Required	Description	Default
`urls`	Yes
`max_tokens`	No
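
The schema itself does not encode the 25-URL limit mentioned in the description, so a caller with a longer list would need to split it before invoking the tool. A minimal sketch of that splitting, assuming the limit is exactly 25 per call as the description states; chunk_urls and MAX_URLS_PER_CALL are hypothetical names used only for illustration.

```python
from typing import Iterator

MAX_URLS_PER_CALL = 25  # limit taken from the tool description


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_CALL) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/page/{i}" for i in range(60)]
batches = list(chunk_urls(urls))
print([len(batch) for batch in batches])  # [25, 25, 10]
```

Each yielded list could then be passed as the urls argument of one fetch_pages_batch call, keeping input order intact across batches.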

Output Schema

Name	Required	Description	Default
`result`	Yes

Implementation Reference

src/ai_first_scraper_mcp/server.py:59-81 (handler)

The async handler that fetches many web pages in parallel. It POSTs the list of URLs to the upstream scraper's /batch endpoint and returns the JSON response containing clean Markdown for each page.

```
async def fetch_pages_batch(urls: list[str], max_tokens: Optional[int] = None) -> list[dict]:
    """Fetch many web pages in parallel and return each one's clean Markdown.

    Use this whenever you need to read more than one URL at once — it is far
    faster than calling fetch_page in a loop because the upstream scraper
    handles the concurrency.

    Args:
        urls: Up to 25 URLs.
        max_tokens: Optional per-URL soft cap on the returned Markdown.

    Returns:
        A list of `{url, ok, data?, error?}` objects in the same order as the
        input URLs. `data` is `{title, word_count, markdown, links, ...}` on
        success; `error` contains the failure reason otherwise.
    """
    body: dict = {"urls": urls}
    if max_tokens:
        body["max_tokens"] = max_tokens
    async with httpx.AsyncClient(timeout=DEFAULT_TIMEOUT) as client:
        resp = await client.post(f"{SCRAPER_URL}/batch", json=body)
        resp.raise_for_status()
        return resp.json()
```
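
One detail of the handler worth seeing in isolation is how the request body is built: max_tokens is added only when it is truthy, so both None and 0 leave the key out. The sketch below copies just that step into a standalone function so it can be run without a network call. The name build_batch_body is hypothetical; the handler itself inlines this logic.

```python
from typing import Optional


def build_batch_body(urls: list[str], max_tokens: Optional[int] = None) -> dict:
    """Standalone copy of the handler's body-building step, for illustration."""
    body: dict = {"urls": urls}
    if max_tokens:  # same truthiness check as the handler: None and 0 are skipped
        body["max_tokens"] = max_tokens
    return body


no_cap = build_batch_body(["https://example.com"])
with_cap = build_batch_body(["https://example.com"], max_tokens=2000)
zero_cap = build_batch_body(["https://example.com"], max_tokens=0)

print(no_cap)    # only the "urls" key
print(with_cap)  # "urls" plus "max_tokens"
print(zero_cap)  # identical to no_cap, because 0 is falsy
```

How the upstream scraper behaves when max_tokens is absent is not stated in the source, so the default applied on that side remains unknown here.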

src/ai_first_scraper_mcp/server.py:59-81 (schema)

The function signature defines the input schema (list[str] urls, Optional[int] max_tokens) and return type (list[dict]). The docstring documents the output shape as {url, ok, data?, error?} with data containing {title, word_count, markdown, links, ...}.
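
Since the declared return type is a plain list[dict], the documented output shape can be written down as typed dictionaries for readers who want editor or type-checker support. This is a sketch only: the class names are hypothetical, the field types are assumptions inferred from the field names, and the "..." in the docstring means data may carry further keys not listed here.

```python
from typing import Any, TypedDict


class PageData(TypedDict, total=False):
    """Assumed shape of `data`; the docstring's "..." implies more keys may exist."""
    title: str
    word_count: int
    markdown: str
    links: list[Any]  # element type is not documented


class BatchItem(TypedDict, total=False):
    """One entry of the returned list: `data` on success, `error` on failure."""
    url: str
    ok: bool
    data: PageData
    error: Any  # documented only as "the failure reason"


def summarize(item: BatchItem) -> str:
    """Render one batch entry as a single human-readable line."""
    if item.get("ok"):
        data = item.get("data", {})
        title = data.get("title", "(untitled)")
        return f"{item['url']}: {title} ({data.get('word_count', 0)} words)"
    return f"{item['url']}: failed ({item.get('error')})"


ok_item: BatchItem = {
    "url": "https://example.com",
    "ok": True,
    "data": {"title": "Example", "word_count": 3, "markdown": "# Example", "links": []},
}
bad_item: BatchItem = {"url": "https://example.org", "ok": False, "error": "timeout"}

print(summarize(ok_item))
print(summarize(bad_item))
```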


src/ai_first_scraper_mcp/server.py:32-32 (registration)

The tool is registered via the @mcp.tool() decorator on line 58, applied to the fetch_pages_batch function. The FastMCP instance is created on line 32.
```
mcp = FastMCP("ai-first-scraper")
```

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries the full burden. It declares concurrency (parallel), input limits (up to 25 URLs), the per-URL soft cap (max_tokens), and output structure (ordered list with success/error). Missing: rate limits, auth, retries, timeouts. For a batch fetch, though, the details it does disclose are sufficient to inform the agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is compact (~6 lines), with clear Args and Returns sections. No redundant or irrelevant information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

An output schema exists, and the description also explains the return values. Inputs are fully described. Completeness is high for a straightforward batch fetch tool; the one minor omission is any explanation of potential error types or of edge cases such as invalid URLs.
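
Because the description does not say how invalid URLs are handled, a cautious caller might filter the input before sending it. The following is a rough sketch of such a pre-check using only the standard library. The function name split_valid_urls is hypothetical, and restricting input to http and https is an assumption about what the upstream scraper accepts, not something the source states.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = ("http", "https")  # assumption: the scraper only fetches web URLs


def split_valid_urls(urls: list[str]) -> tuple[list[str], list[str]]:
    """Return (valid, rejected), keeping the original order within each list."""
    valid: list[str] = []
    rejected: list[str] = []
    for url in urls:
        candidate = url.strip()
        try:
            parts = urlsplit(candidate)
        except ValueError:
            # urlsplit raises on malformed input such as an unclosed IPv6 bracket.
            rejected.append(url)
            continue
        if parts.scheme in ALLOWED_SCHEMES and parts.netloc:
            valid.append(candidate)
        else:
            rejected.append(url)
    return valid, rejected


valid, rejected = split_valid_urls(
    ["https://example.com/a", "example.com/b", "ftp://example.com/c", "http://[broken"]
)
print(valid)     # only the first entry passes the check
print(rejected)  # the other three are set aside
```

A check like this only catches syntactically unusable input; pages that fail for other reasons would still surface through the per-URL error field.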

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description must compensate. It explains 'urls: Up to 25 URLs' and 'max_tokens: Optional per-URL soft cap on the returned Markdown.' This adds crucial constraints and semantics beyond the schema types.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Fetch many web pages in parallel and return each one's clean Markdown.' It distinguishes from sibling tools: fetch_page (single page) and search_web (search). The verb 'fetch' and resource 'web pages' are specific, and the batch aspect is highlighted.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Use this whenever you need to read more than one URL at once — it is far faster than calling fetch_page in a loop.' This directly tells when to use and why, contrasting with the sibling fetch_page.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yubinkim444/ai-first-scraper-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.