
batch-get-text

Extract raw text content from multiple URLs concurrently to improve efficiency. Simplifies data retrieval by processing batch URL requests simultaneously.

Instructions

Batch get raw text content from multiple URLs. Executes requests concurrently for better performance.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| `requests` | Yes | Array of URLs (strings) or request objects | |

Implementation Reference

  • The `batch_fetch_urls` function implements the core logic of the batch-fetch-text tool: it fetches text content concurrently from multiple URLs or request objects, handling per-request output formats and isolating per-request errors.
```python
async def batch_fetch_urls(
    requests: list[str | dict[str, Any]],
    as_json: bool = True,
    output_format: str = "markdown",
) -> list[dict[str, Any]]:
    """
    Batch fetch content from multiple URLs concurrently.

    Args:
        requests: List of URLs (strings) or request objects with url, method,
            data, headers, output_format
        as_json: If True, validates content as JSON; if False, returns text content
        output_format: Default output format - "markdown", "clean_text", or
            "raw_html" (can be overridden per request)

    Returns:
        List of dictionaries with 'url', 'success', 'content', and optional
        'error' keys
    """
    async def fetch_single(request: str | dict[str, Any]) -> dict[str, Any]:
        try:
            if isinstance(request, str):
                # Simple URL string
                content = await fetch_url_content(
                    request, as_json=as_json, output_format=output_format
                )
                return {"url": request, "success": True, "content": content}
            else:
                # Request object with additional parameters
                url = request.get("url", "")
                method = request.get("method", "GET")
                data = request.get("data")
                headers = request.get("headers")
                request_output_format = request.get("output_format", output_format)
                content = await fetch_url_content(
                    url,
                    as_json=as_json,
                    method=method,
                    data=data,
                    headers=headers,
                    output_format=request_output_format,
                )
                return {"url": url, "success": True, "content": content}
        except Exception as e:
            url = request if isinstance(request, str) else request.get("url", "")
            return {"url": url, "success": False, "error": str(e)}

    tasks = [fetch_single(request) for request in requests]
    results = await asyncio.gather(*tasks)
    return list(results)
```
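The key property of this design is that each request is wrapped in its own try/except, so one failing URL cannot abort the batch. A minimal, self-contained sketch of the same `asyncio.gather`-based pattern (the `fake_fetch` stub is hypothetical and stands in for `fetch_url_content`; no real network I/O is performed):

```python
import asyncio

async def fake_fetch(url: str) -> str:
    # Stub standing in for fetch_url_content: fails for URLs containing "bad".
    if "bad" in url:
        raise ValueError("simulated network error")
    return f"content of {url}"

async def batch_fetch(urls: list[str]) -> list[dict]:
    async def fetch_single(url: str) -> dict:
        # Per-request error isolation: exceptions become error entries,
        # never propagated out of the batch.
        try:
            content = await fake_fetch(url)
            return {"url": url, "success": True, "content": content}
        except Exception as e:
            return {"url": url, "success": False, "error": str(e)}

    return list(await asyncio.gather(*(fetch_single(u) for u in urls)))

results = asyncio.run(batch_fetch(["https://a.example", "https://bad.example"]))
print(results)
```

Results come back in the same order as the input list, since `asyncio.gather` preserves ordering even when requests complete out of order.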
  • Input schema definition for the batch-fetch-text tool, specifying the structure of the 'requests' array supporting simple URLs or detailed request objects.
```python
types.Tool(
    name="batch-fetch-text",
    description=(
        "Batch fetch raw text content from multiple URLs using various HTTP methods. "
        "Executes requests concurrently for better performance."
    ),
    inputSchema={
        "type": "object",
        "properties": {
            "requests": {
                "type": "array",
                "description": "Array of URLs (strings) or request objects",
                "items": {
                    "oneOf": [
                        {"type": "string"},
                        {
                            "type": "object",
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "description": "The URL to get text content from",
                                },
                                "method": {
                                    "type": "string",
                                    "description": "HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.). Default is GET.",
                                    "default": "GET",
                                },
                                "data": {
                                    "type": ["object", "string", "null"],
                                    "description": "Request body data for POST/PUT/PATCH requests. Can be a JSON object or string.",
                                },
                                "headers": {
                                    "type": "object",
                                    "description": "Additional HTTP headers to include in the request",
                                    "additionalProperties": {"type": "string"},
                                },
                                "output_format": {
                                    "type": "string",
                                    "description": "Output format: 'markdown' (default), 'clean_text', or 'raw_html'.",
                                    "enum": ["markdown", "clean_text", "raw_html"],
                                    "default": "markdown",
                                },
                            },
                            "required": ["url"],
                        },
                    ]
                },
            },
        },
        "required": ["requests"],
    },
),
```
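Because the schema's `oneOf` accepts either a bare string or a request object, a single call can mix both forms. A hypothetical payload that this schema would accept (the URLs, bearer token placeholder, and query body are illustrative only):

```python
import json

# Example "requests" array mixing a plain URL string with a request object
# that overrides method, body, headers, and output_format per entry.
payload = {
    "requests": [
        "https://example.com/page.html",
        {
            "url": "https://api.example.com/items",
            "method": "POST",
            "data": {"query": "fetch"},
            "headers": {"Authorization": "Bearer <token>"},
            "output_format": "clean_text",
        },
    ]
}
print(json.dumps(payload, indent=2))
```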
  • Registration and dispatch logic in handle_call_tool that invokes batch_fetch_urls when the batch-fetch-text tool is called.
```python
elif tool_name == "batch-fetch-text":
    requests = args.get("requests", [])
    if not isinstance(requests, list) or not requests:
        result = "Failed to call tool, error: Missing or empty 'requests' array"
    else:
        output_format = args.get("output_format", "markdown")
        response_result = await batch_fetch_urls(
            requests, as_json=False, output_format=output_format
        )
        result = json.dumps(response_result)
```
  • Core helper function fetch_url_content used by batch_fetch_urls to fetch and process individual URL content, including text extraction and formatting.
```python
async def fetch_url_content(
    url: str,
    as_json: bool = True,
    method: str = "GET",
    data: dict | str | None = None,
    headers: dict[str, str] | None = None,
    output_format: str = "markdown",
) -> str:
    """
    Fetch content from a URL using different HTTP methods.

    Args:
        url: URL to fetch content from
        as_json: If True, validates content as JSON; if False, returns text content
        method: HTTP method (GET, POST, PUT, DELETE, etc.)
        data: Request body data (for POST/PUT requests)
        headers: Additional headers to include in the request
        output_format: If as_json=False, output format - "markdown",
            "clean_text", or "raw_html"

    Returns:
        String content from the URL (JSON, Markdown, clean text, or raw HTML)

    Raises:
        httpx.RequestError: For network-related errors
        json.JSONDecodeError: If as_json=True and content is not valid JSON
        ValueError: If URL is invalid or unsafe
    """
    # Validate URL first
    validate_url(url)

    config = await get_http_client_config()
    max_size = config.pop("max_size", 10 * 1024 * 1024)  # Remove from client config

    # Merge additional headers with config headers (user headers override defaults)
    if headers:
        if config.get("headers"):
            config["headers"].update(headers)
        else:
            config["headers"] = headers

    async with httpx.AsyncClient(**config) as client:
        # Handle different HTTP methods
        method = method.upper()
        if method == "GET":
            response = await client.get(url)
        elif method == "POST":
            if isinstance(data, dict):
                response = await client.post(url, json=data)
            else:
                response = await client.post(url, content=data)
        elif method == "PUT":
            if isinstance(data, dict):
                response = await client.put(url, json=data)
            else:
                response = await client.put(url, content=data)
        elif method == "DELETE":
            response = await client.delete(url)
        elif method == "PATCH":
            if isinstance(data, dict):
                response = await client.patch(url, json=data)
            else:
                response = await client.patch(url, content=data)
        elif method == "HEAD":
            response = await client.head(url)
        elif method == "OPTIONS":
            response = await client.options(url)
        else:
            # For any other method, use the generic request method
            if isinstance(data, dict):
                response = await client.request(method, url, json=data)
            else:
                response = await client.request(method, url, content=data)

        response.raise_for_status()

        # Check response size
        content_length = len(response.content)
        if content_length > max_size:
            raise ValueError(
                f"Response size ({content_length} bytes) exceeds "
                f"maximum allowed ({max_size} bytes)"
            )

        if as_json:
            # For JSON responses, use response.text directly (no compression expected)
            content_to_parse = response.text
            if not content_to_parse:
                # If response.text is empty, try decoding content directly
                try:
                    content_to_parse = response.content.decode('utf-8')
                except UnicodeDecodeError:
                    content_to_parse = ""
            if content_to_parse:
                try:
                    json.loads(content_to_parse)
                    return content_to_parse
                except json.JSONDecodeError:
                    # If text parsing fails, try content decoding as fallback
                    if content_to_parse == response.text:
                        try:
                            fallback_content = response.content.decode('utf-8')
                            json.loads(fallback_content)
                            return fallback_content
                        except (json.JSONDecodeError, UnicodeDecodeError):
                            pass
                    raise json.JSONDecodeError(
                        "Response is not valid JSON", content_to_parse, 0
                    )
            else:
                # Empty response
                return ""
        else:
            # For text content, apply format conversion
            return extract_text_content(response.text, output_format)
```


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ackness/fetch-jsonpath-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.