# extract_batch
Extract content from multiple web pages simultaneously to gather research data efficiently. URLs are processed concurrently to compile structured information from various sources.
## Instructions
Extract content from multiple URLs concurrently. Returns a list of extraction results.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | List of URLs to extract content from. | |
| max_concurrent | No | Maximum number of concurrent extractions. | 5 |
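
An example arguments payload matching this schema, written as a Python dict for illustration; the URLs are placeholders, not taken from the source.

```python
arguments = {
    "urls": [
        "https://example.com/report",
        "https://example.com/dataset",
    ],
    "max_concurrent": 2,  # optional; the server falls back to 5 when omitted
}
```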
## Implementation Reference
- `src/interdeep/extraction/hybrid.py:65-74` (handler): Implementation of `extract_batch_async`, which performs concurrent extraction from multiple URLs.

```python
async def extract_batch_async(urls: list[str], max_concurrent: int = 5) -> list[ExtractionResult]:
    """Extract content from multiple URLs concurrently."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def _extract(url: str) -> ExtractionResult:
        async with semaphore:
            return await extract_hybrid_async(url=url)

    return await asyncio.gather(*[_extract(url) for url in urls])
```
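
For orientation, a minimal sketch of calling the batch extractor directly. The import path is assumed from the file reference above, and the URLs are placeholders; the only behavior relied on is what the function itself shows.

```python
import asyncio

# Module path assumed from src/interdeep/extraction/hybrid.py.
from interdeep.extraction.hybrid import extract_batch_async


async def main() -> None:
    # At most 2 pages are fetched at a time; the remaining URL waits on the semaphore.
    results = await extract_batch_async(
        ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
        max_concurrent=2,
    )
    # asyncio.gather preserves input order, so results[i] corresponds to urls[i].
    for result in results:
        print(result)


asyncio.run(main())
```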
- `src/interdeep/server.py:175-186` (handler): Handler function `_handle_extract_batch`, which processes tool calls for batch extraction.

```python
async def _handle_extract_batch(arguments: dict) -> list[TextContent]:
    urls = arguments.get("urls", [])
    if not urls:
        return _err("urls is required and must be non-empty")
    max_concurrent = arguments.get("max_concurrent", 5)
    try:
        results = await extract_batch_async(urls, max_concurrent=max_concurrent)
        return _ok({"results": [_result_to_dict(r) for r in results]})
    except Exception as e:
        logger.exception("extract_batch failed")
        return _err(f"Batch extraction failed: {e}")
```

- `src/interdeep/server.py:243` (registration): Registration of `extract_batch` in the `_HANDLERS` dictionary in the MCP server.

```python
"extract_batch": _handle_extract_batch,
```