Glama
mcp-open-data-hk

Hong Kong Open Data MCP Server

Official

search_datasets

Search Hong Kong open data datasets by querying titles, descriptions, and metadata across the government portal. Supports pagination and language options.

Instructions

Search for datasets by query term using the package_search API.

This function searches across dataset titles, descriptions, and other metadata to find datasets matching the query term.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | No | The solr query string (e.g., "transport", "weather", "*:*" for all) | *:* |
| limit | No | Maximum number of datasets to return (default: 10, max: 1000) | |
| offset | No | Offset for pagination | |
| language | No | Language code (en, tc, sc) | en |
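As a rough illustration of how these inputs appear to map onto the package_search request, the sketch below mirrors the parameter handling in the handler shown under Implementation Reference. The helper name `build_search_params` and the constant `MAX_ROWS` are hypothetical and exist only for this example; the `language` input is not included because the handler uses it to pick a base URL rather than a query parameter.

```python
from typing import Any, Dict

MAX_ROWS = 1000  # mirrors the handler's min(limit, 1000) cap


def build_search_params(
    query: str = "*:*", limit: int = 10, offset: int = 0
) -> Dict[str, Any]:
    """Hypothetical helper: translate tool inputs into package_search params."""
    return {
        "q": query,                    # Solr query string, "*:*" matches everything
        "rows": min(limit, MAX_ROWS),  # values above 1000 are clamped
        "start": offset,               # index of the first result to return
    }


print(build_search_params("transport", limit=5000, offset=20))
# {'q': 'transport', 'rows': 1000, 'start': 20}
```

Note that a `limit` above 1000 is reduced silently rather than rejected, so a caller asking for 5000 rows receives at most 1000.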

Output Schema


No arguments

Implementation Reference

  • The actual handler/implementation of the 'search_datasets' tool. It is decorated with @mcp.tool and uses the data.gov.hk package_search API to search for datasets by query term. Returns count, results, and has_more flag.
    async def search_datasets(
        query: str = "*:*", limit: int = 10, offset: int = 0, language: str = "en"
    ) -> Dict[str, Any]:
        """
        Search for datasets by query term using the package_search API.
    
        This function searches across dataset titles, descriptions, and other metadata
        to find datasets matching the query term.
    
        Args:
            query: The solr query string (e.g., "transport", "weather", "*:*" for all)
            limit: Maximum number of datasets to return (default: 10, max: 1000)
            offset: Offset for pagination
            language: Language code (en, tc, sc)
    
        Returns:
            A dictionary containing:
            - count: Total number of matching datasets
            - results: List of matching datasets (up to limit)
            - has_more: Boolean indicating if there are more results available
        """
        # Using package_search API for search functionality
        base_url = BASE_URLS.get(language, BASE_URLS["en"])
        url = f"{base_url}/package_search"
    
        # Limit the maximum number of results
        rows = min(limit, 1000)
    
        params = {"q": query, "rows": rows, "start": offset}
    
        result = await make_api_request(url, params)
    
        if result.get("success"):
            search_result = result["result"]
            return {
                "count": search_result.get("count", 0),
                "results": search_result.get("results", []),
                "has_more": search_result.get("count", 0) > (offset + rows),
            }
        else:
            raise Exception(f"API Error: {result.get('error', 'Unknown error')}")
  • The make_api_request helper function used by search_datasets to make HTTP requests to the data.gov.hk API.
    async def make_api_request(
        url: str, params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """Make an API request to data.gov.hk"""
        async with httpx.AsyncClient() as client:
            # Print the request for debugging
            print(f"Making request to {url} with params {params}")
            response = await client.get(url, params=params)
            print(f"Response status: {response.status_code}")
            response.raise_for_status()
            return response.json()
  • The @mcp.tool decorator that registers search_datasets as a tool with the FastMCP server. The mcp instance is created on line 6.
    @mcp.tool
    async def search_datasets(
        query: str = "*:*", limit: int = 10, offset: int = 0, language: str = "en"
    ) -> Dict[str, Any]:
  • Input schema: query (str, default '*:*'), limit (int, default 10), offset (int, default 0), language (str, default 'en'). Output schema: Dict[str, Any] with keys 'count', 'results', 'has_more'.
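To see the handler's success and error paths without touching the network, the following sketch replays the same logic against a canned response. `fake_api_request` is a hypothetical stand-in for make_api_request, the `example.invalid` URL is a placeholder, and the response shape (`success`, `result.count`, `result.results`) is assumed from what the handler reads. `RuntimeError` is used here in place of the bare `Exception` raised in the excerpt.

```python
import asyncio
from typing import Any, Dict, Optional


async def fake_api_request(
    url: str, params: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical stub: returns a canned response instead of calling the API."""
    if params and params.get("q") == "fail":
        return {"success": False, "error": "simulated failure"}
    return {"success": True, "result": {"count": 25, "results": [{"name": "demo"}]}}


async def search(query: str = "*:*", limit: int = 10, offset: int = 0) -> Dict[str, Any]:
    """Same flow as the handler excerpt, wired to the stub above."""
    rows = min(limit, 1000)
    params = {"q": query, "rows": rows, "start": offset}
    result = await fake_api_request("https://example.invalid/package_search", params)
    if result.get("success"):
        found = result["result"]
        return {
            "count": found.get("count", 0),
            "results": found.get("results", []),
            # More pages exist when the total exceeds what has been covered so far
            "has_more": found.get("count", 0) > (offset + rows),
        }
    raise RuntimeError(f"API Error: {result.get('error', 'Unknown error')}")


ok = asyncio.run(search("transport"))
print(ok["count"], ok["has_more"])  # 25 True

try:
    asyncio.run(search("fail"))
except RuntimeError as exc:
    print(exc)  # API Error: simulated failure
```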
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It reveals that the tool uses the package_search API and searches across metadata fields, but it does not disclose behavioral traits like pagination behavior, rate limits, or what happens with excess parameters. The output is not described despite an output schema existing.
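One behavior visible in the handler but not in the description is the language fallback: an unrecognized language code does not raise an error, it resolves to the English endpoint. The sketch below reproduces that lookup. The `BASE_URLS` values are placeholders, since the real mapping is not shown on this page, and `resolve_search_url` is a hypothetical name.

```python
# Placeholder endpoints; the server's actual BASE_URLS values are not shown here.
BASE_URLS = {
    "en": "https://example.invalid/en/api",
    "tc": "https://example.invalid/tc/api",
    "sc": "https://example.invalid/sc/api",
}


def resolve_search_url(language: str = "en") -> str:
    """Hypothetical helper mirroring the handler's base URL lookup."""
    # dict.get with a default means unknown codes quietly fall back to English
    base_url = BASE_URLS.get(language, BASE_URLS["en"])
    return f"{base_url}/package_search"


print(resolve_search_url("tc"))  # https://example.invalid/tc/api/package_search
print(resolve_search_url("fr"))  # https://example.invalid/en/api/package_search
```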

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is exceptionally concise: two sentences that effectively state the purpose and the scope of search. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having an output schema and 100% schema coverage, the description omits important contextual details such as pagination (limit/offset), language filtering, and the fact that all parameters are optional. It does not prepare the agent for how to handle large result sets or filter by language.
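A caller could handle large result sets by advancing `offset` until `has_more` is false. The sketch below shows that loop against an in-memory stand-in; `fetch_all`, `fake_search`, and `DATA` are hypothetical, and the stand-in is synchronous for brevity, whereas the real tool is async.

```python
from typing import Any, Callable, Dict, List

# In-memory stand-in for the catalogue: 23 fake dataset names.
DATA = [f"dataset-{i}" for i in range(23)]


def fake_search(query: str = "*:*", limit: int = 10, offset: int = 0) -> Dict[str, Any]:
    """Hypothetical stub returning the same keys as search_datasets."""
    rows = min(limit, 1000)
    return {
        "count": len(DATA),
        "results": DATA[offset:offset + rows],
        "has_more": len(DATA) > (offset + rows),
    }


def fetch_all(
    search: Callable[..., Dict[str, Any]], query: str, page_size: int = 10
) -> List[Any]:
    """Collect every result by paging until has_more is False."""
    items: List[Any] = []
    offset = 0
    while True:
        page = search(query=query, limit=page_size, offset=offset)
        items.extend(page["results"])
        if not page["has_more"]:
            break
        offset += page_size  # keep page_size <= 1000 so it matches the row cap
    return items


all_items = fetch_all(fake_search, "*:*", page_size=10)
print(len(all_items))  # 23
```

Keeping `page_size` at or below 1000 matters here: the handler clamps larger limits, so advancing the offset by a larger step would skip results.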

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, providing clear parameter definitions. The tool description adds some context (e.g., 'searches across dataset titles, descriptions, and other metadata') but does not significantly enhance parameter understanding beyond the schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches for datasets by query term using the package_search API. It specifies the scope (titles, descriptions, metadata). However, it does not explicitly differentiate from sibling tools like search_datasets_with_facets, which may perform similar but richer searches.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as search_datasets_with_facets or list_datasets. It does not mention ideal scenarios, prerequisites, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mcp-open-data-hk/mcp-open-data-hk'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.