crawl_web_page
Crawl a list of web pages to extract their content and data.
Instructions
Web page crawling
args:
url_list: List[str], list of web pages to crawl
returns:
dict[Any] | None: crawl result
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url_list | Yes | List of web page URLs to crawl (`List[str]`) |  |
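An MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request, and the request's `arguments` object follows the schema above. The sketch below builds such a request in Python. The request `id` and the example URLs are placeholders. The `validate_url_list` helper is hypothetical: the tool documents no validation beyond the declared `List[str]` type.

```python
import json
from typing import Any


def validate_url_list(url_list: Any) -> list[str]:
    """Hypothetical check mirroring the declared type List[str]."""
    if not isinstance(url_list, list) or not all(isinstance(u, str) for u in url_list):
        raise TypeError("url_list must be a list of strings")
    return url_list


def build_tools_call_request(url_list: list[str], request_id: int = 1) -> dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 tools/call request for crawl_web_page."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # placeholder id chosen by the client
        "method": "tools/call",
        "params": {
            "name": "crawl_web_page",
            # Argument name must match the input schema exactly
            "arguments": {"url_list": validate_url_list(url_list)},
        },
    }


if __name__ == "__main__":
    request = build_tools_call_request(["https://example.com", "https://example.org"])
    print(json.dumps(request, indent=2))
```

The only required argument is `url_list`, so the `arguments` object has a single key.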
Implementation Reference
- `data_api.py:402-402` (registration): The tool is registered via the `@mcp.tool()` decorator on line 402.

```python
@mcp.tool()
```

- `data_api.py:404-421` (handler): The handler function `crawl_web_page` runs the web-crawling logic. It accepts a list of URLs and sends them in a POST request to the `HUBBLE_API_URL/web_crawl` endpoint. It returns the raw response body (`response.text`, a string), even though its signature is annotated as `dict[Any] | None`.
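```python
# Excerpt from data_api.py; HUBBLE_API_URL and HUBBLE_API_KEY are module-level settings there
from typing import Any, List

import httpx


async def crawl_web_page(url_list: List[str]) -> dict[Any] | None:
    '''
    Web page crawling
    args:
        url_list: List[str], list of web pages to crawl
    returns:
        dict[Any] | None: crawl result
    '''
    async with httpx.AsyncClient() as client:
        # Authenticate against the Hubble API with the configured key
        headers = {"X-API-Key": HUBBLE_API_KEY}
        # Send all URLs in one POST request; httpx applies the 30 s timeout per phase
        response = await client.post(
            f"{HUBBLE_API_URL}/web_crawl",
            headers=headers,
            json={"urls": url_list},
            timeout=30.0)
        # Raise httpx.HTTPStatusError on 4xx/5xx responses
        response.raise_for_status()
        # Note: this returns the raw body as a str, not a parsed dict
        return response.text
```

- `data_api.py:403-403` (helper): The `@async_retry` decorator applied to the handler provides retry logic for transient failures: 2 tries with a 0.3 s delay.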
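```python
@async_retry(exceptions=(Exception,), tries=2, delay=0.3)  # same behavior as the original (Exception)
```

- `data_api.py:19-38` (helper): The `async_retry` helper function provides the retry logic used by the `crawl_web_page` tool. It re-invokes the wrapped coroutine up to `tries` times. After each failure it logs the error (or prints it if no logger is given) and sleeps `delay` seconds. If every attempt fails, it raises `TooManyTriesException` with the collected messages.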
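```python
import asyncio
from functools import wraps


def async_retry(exceptions=(Exception,), tries=3, delay=0.3, logger=None):
    def wrapper(func):
        @wraps(func)
        async def wrapped(*args, **kwargs):
            # One message per failed attempt
            Tries = []
            for i in range(tries):
                try:
                    return await func(*args, **kwargs)
                except exceptions as ex:
                    ex_msg = f"Tries({ex.__class__.__name__}) Cnt: {i+1}, {ex}"
                    Tries.append(ex_msg)
                    # Report through the logger if one was given, else stdout
                    if logger:
                        logger.warning(ex_msg)
                    else:
                        print(ex_msg)
                    # Note: this also sleeps after the final failed attempt
                    if delay:
                        await asyncio.sleep(delay)
            # All attempts failed; TooManyTriesException is defined elsewhere (not shown)
            raise TooManyTriesException(Tries)
        return wrapped
    return wrapper
```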
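The sketch below shows the retry behavior end to end. It re-declares a condensed copy of `async_retry` so the block runs on its own, with logger and print reporting omitted. `TooManyTriesException` is a hypothetical stand-in because its real definition is not shown. The decorator uses the handler's `tries=2`, but the delay is shortened to `0.01` s to keep the demo fast. `flaky_crawl` and `broken_crawl` are hypothetical coroutines that simulate a transient failure and a persistent outage.

```python
import asyncio
from functools import wraps


class TooManyTriesException(Exception):
    """Hypothetical stand-in; the real definition in data_api.py is not shown."""


def async_retry(exceptions=(Exception,), tries=3, delay=0.3):
    """Condensed copy of async_retry without logger/print reporting."""
    def wrapper(func):
        @wraps(func)
        async def wrapped(*args, **kwargs):
            errors = []
            for i in range(tries):
                try:
                    return await func(*args, **kwargs)
                except exceptions as ex:
                    errors.append(f"Tries({ex.__class__.__name__}) Cnt: {i + 1}, {ex}")
                    if delay:
                        await asyncio.sleep(delay)
            raise TooManyTriesException(errors)
        return wrapped
    return wrapper


calls = {"flaky": 0, "broken": 0}


@async_retry(exceptions=(Exception,), tries=2, delay=0.01)
async def flaky_crawl(url_list):
    # Hypothetical: fails on the first call, then returns a raw text body like response.text
    calls["flaky"] += 1
    if calls["flaky"] == 1:
        raise ConnectionError("simulated transient failure")
    return '{"results": %d}' % len(url_list)


@async_retry(exceptions=(Exception,), tries=2, delay=0.01)
async def broken_crawl(url_list):
    # Hypothetical: always fails, so both attempts are used up
    calls["broken"] += 1
    raise ConnectionError("simulated outage")


async def main():
    print(await flaky_crawl(["https://example.com"]))
    try:
        await broken_crawl(["https://example.com"])
    except TooManyTriesException as exc:
        print(exc.args[0])  # the list of per-attempt messages


if __name__ == "__main__":
    asyncio.run(main())
```

Because `exceptions` catches every `Exception`, the real handler also retries non-transient errors, such as the `httpx.HTTPStatusError` that `raise_for_status()` raises on a 4xx response. Passing a narrower tuple (for example, only connection and timeout errors) would make those errors fail fast instead. That is a design option, not what `data_api.py` does.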