Hubble MCP Server

by ascentkorea

crawl_web_page

Extract content from specified web pages by providing URLs, returning structured data for analysis or processing.

Instructions

Crawl web pages
args:
    url_list: List[str], list of web pages to crawl
returns:
    dict[Any] | None: crawl result

Input Schema

Name | Required | Description | Default
url_list | Yes | (none) | (none)

Implementation Reference

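  • The main handler function for the 'crawl_web_page' tool, decorated with @mcp.tool() for registration. It asynchronously posts the list of URLs as JSON ({"urls": [...]}) to the Hubble API's /web_crawl endpoint, authenticating with an X-API-Key header and using a 30-second timeout. It raises on HTTP error statuses, retries once after a 0.3-second delay via @async_retry, and returns the response body as text.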
    @mcp.tool()
    @async_retry(exceptions=(Exception,), tries=2, delay=0.3)
    async def crawl_web_page(
            url_list: List[str]) -> dict[Any] | None:
        '''
        Crawl web pages
        args:
            url_list: List[str], list of web pages to crawl
        returns:
            dict[Any] | None: crawl result
        '''
        async with httpx.AsyncClient() as client:
            # Authenticate with the Hubble API key.
            headers = {"X-API-Key": HUBBLE_API_KEY}
            response = await client.post(
                f"{HUBBLE_API_URL}/web_crawl",
                headers=headers,
                json={"urls": url_list},
                timeout=30.0)
            # Raise httpx.HTTPStatusError on 4xx/5xx responses.
            response.raise_for_status()
            # Note: this is the raw body (a str), not the dict the annotation promises.
            return response.text
  • data_api.py:402-402 (registration)
    The @mcp.tool() decorator registers the crawl_web_page function as an MCP tool.
    @mcp.tool()
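
The handler above relies on an @async_retry decorator whose source is not shown on this page. As a rough illustration of what such a decorator might do, here is a minimal sketch. The name and the keyword arguments (exceptions, tries, delay) mirror the call site, but the body is an assumption, not the server's actual implementation.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Tuple, Type


def async_retry(
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    tries: int = 2,
    delay: float = 0.3,
) -> Callable:
    """Hypothetical stand-in for the server's async_retry decorator."""
    if tries < 1:
        raise ValueError("tries must be at least 1")

    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, tries + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    # Re-raise once the final attempt has failed.
                    if attempt == tries:
                        raise
                    # Otherwise wait briefly before trying again.
                    await asyncio.sleep(delay)

        return wrapper

    return decorator
```

With tries=2 and delay=0.3, as at the call site, a failed request would be attempted one more time after 0.3 seconds before the exception reaches the caller.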
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It only states the basic action ('웹 페이지 크롤링', i.e. web page crawling) and the return type, but lacks critical behavioral details such as rate limits, authentication needs, whether it's read-only or destructive, error handling, or what the crawling entails (e.g., depth, content extraction). This is inadequate for a tool with potential complexity.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with a clear structure: a title phrase followed by args and returns sections. It conveys the core information in few words, though that brevity also leaves it under-specified. Every sentence earns its place, but more could be added for completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of web crawling (with no annotations, no output schema, and 0% schema coverage), the description is incomplete. It doesn't explain the return value's structure (beyond 'dict[Any] | None'), error cases, or behavioral aspects like concurrency or timeouts. For a tool that interacts with external resources, this leaves too much undefined.
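
Because the return structure is undocumented, and the handler shown above returns the raw response text rather than a dict, a caller may want to parse the result defensively. The sketch below assumes the Hubble API responds with JSON, which this page does not confirm. The helper name parse_crawl_result is hypothetical.

```python
import json
from typing import Any, Optional


def parse_crawl_result(raw: Optional[str]) -> Any:
    """Decode a crawl result if it is JSON; otherwise return it unchanged.

    Assumption: the tool returns either None or the response body as text.
    """
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON (for example plain text or HTML): hand back the raw string.
        return raw
```

Falling back to the raw string keeps the caller working even if the body turns out not to be JSON.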

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds minimal semantics by specifying 'url_list: List[str], 크롤링할 웹 페이지 리스트' (list of URLs to crawl), which clarifies the parameter's purpose beyond the schema's basic type. However, it doesn't explain constraints like URL format, maximum list size, or handling of invalid URLs, leaving significant gaps.
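
Since the accepted URL format is not documented, a caller could filter url_list before invoking the tool. The following is a minimal sketch that keeps only absolute http(s) URLs. That rule is an assumption chosen for illustration, not a documented constraint of the Hubble API, and validate_url_list is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlparse


def validate_url_list(url_list: List[str]) -> List[str]:
    """Return the entries that look like absolute http(s) URLs, stripped."""
    valid = []
    for raw in url_list:
        url = raw.strip()
        parsed = urlparse(url)
        # Require an http/https scheme and a host part.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url)
    return valid
```

For example, validate_url_list(["https://example.com/a", "ftp://x", "not a url"]) keeps only the first entry.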

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states '웹 페이지 크롤링' (web page crawling), which provides a basic verb and resource, but it is vague about what specific crawling operation the tool performs. It doesn't distinguish itself from sibling tools like crawl_google_serp or crawl_google_trends, whose names suggest different crawling targets. The purpose is identifiable but lacks specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description doesn't mention any context, prerequisites, or exclusions, such as whether it's for general web pages versus specific types (e.g., Google services). With siblings like crawl_google_serp, there's no indication of when to choose one over the other.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ascentkorea/hubble_mcp'
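
The same request can be made from Python with only the standard library. This is a sketch: the helper names server_url and fetch_server are hypothetical, and the response is returned as raw text because its format is not described here.

```python
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> str:
    """GET the server record and return the response body as text."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (performs a network request):
#     print(fetch_server("ascentkorea", "hubble_mcp"))
```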

If you have feedback or need assistance with the MCP directory API, please join our Discord server.