mcp-open-data-hk

Hong Kong Open Data MCP Server

Official

search_datasets

Search Hong Kong government open data portal datasets by query term to find relevant public data resources across titles, descriptions, and metadata.

Instructions

Search for datasets by query term using the package_search API.

This function searches across dataset titles, descriptions, and other metadata to find datasets matching the query term.

Args:
  • query: The Solr query string (e.g., "transport", "weather", "*:*" for all)
  • limit: Maximum number of datasets to return (default: 10, max: 1000)
  • offset: Offset for pagination
  • language: Language code (en, tc, sc)

Returns: A dictionary containing:
  • count: Total number of matching datasets
  • results: List of matching datasets (up to limit)
  • has_more: Boolean indicating if there are more results available
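The count, results, and has_more fields are enough to page through a full result set. The following is a minimal sketch of that loop, not part of the server: iter_datasets and fake_search are hypothetical names, and the search callable is assumed to be a synchronous wrapper that returns the dictionary described above (the actual tool is async and is invoked through an MCP client).

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical type: any callable that accepts query/limit/offset keywords and
# returns the documented {"count", "results", "has_more"} dictionary.
SearchFn = Callable[..., Dict[str, Any]]


def iter_datasets(
    search: SearchFn, query: str = "*:*", page_size: int = 10
) -> Iterator[Dict[str, Any]]:
    """Yield every matching dataset by following has_more page by page."""
    offset = 0
    while True:
        page = search(query=query, limit=page_size, offset=offset)
        yield from page["results"]
        # Stop when no further pages are reported, or when a page comes back
        # empty (guards against looping forever on an inconsistent count).
        if not page["has_more"] or not page["results"]:
            break
        offset += page_size


def fake_search(query: str, limit: int, offset: int) -> Dict[str, Any]:
    """Stand-in for the tool: 25 fake datasets, sliced like a real response."""
    data = [{"name": f"dataset-{i}"} for i in range(25)]
    return {
        "count": len(data),
        "results": data[offset:offset + limit],
        "has_more": len(data) > offset + limit,
    }


names = [d["name"] for d in iter_datasets(fake_search, page_size=10)]
```

With a page size of 10 and 25 fake datasets, the loop issues three requests (offsets 0, 10, and 20) and stops once has_more is false.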

Input Schema

Name      Required  Description  Default
query     No        -            *:*
limit     No        -            -
offset    No        -            -
language  No        -            en

Output Schema

No arguments

Implementation Reference

  • The handler function for the 'search_datasets' tool, decorated with @mcp.tool for registration. It searches datasets using the data.gov.hk package_search API, handling query, limit, offset, and language parameters, and returns formatted results with count, results, and has_more flag.
    @mcp.tool
    async def search_datasets(
        query: str = "*:*", limit: int = 10, offset: int = 0, language: str = "en"
    ) -> Dict[str, Any]:
        """
        Search for datasets by query term using the package_search API.
    
        This function searches across dataset titles, descriptions, and other metadata
        to find datasets matching the query term.
    
        Args:
            query: The solr query string (e.g., "transport", "weather", "*:*" for all)
            limit: Maximum number of datasets to return (default: 10, max: 1000)
            offset: Offset for pagination
            language: Language code (en, tc, sc)
    
        Returns:
            A dictionary containing:
            - count: Total number of matching datasets
            - results: List of matching datasets (up to limit)
            - has_more: Boolean indicating if there are more results available
        """
        # Using package_search API for search functionality
        base_url = BASE_URLS.get(language, BASE_URLS["en"])
        url = f"{base_url}/package_search"
    
        # Limit the maximum number of results
        rows = min(limit, 1000)
    
        params = {"q": query, "rows": rows, "start": offset}
    
        result = await make_api_request(url, params)
    
        if result.get("success"):
            search_result = result["result"]
            return {
                "count": search_result.get("count", 0),
                "results": search_result.get("results", []),
                "has_more": search_result.get("count", 0) > (offset + rows),
            }
        else:
            raise Exception(f"API Error: {result.get('error', 'Unknown error')}")
  • Helper function used by search_datasets to make HTTP requests to the data.gov.hk API endpoints.
    import sys
    from typing import Any, Dict, Optional

    import httpx


    async def make_api_request(
        url: str, params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """Make an API request to data.gov.hk"""
        async with httpx.AsyncClient() as client:
            # Log to stderr for debugging; stdout may carry MCP protocol
            # traffic when the server runs over the stdio transport.
            print(f"Making request to {url} with params {params}", file=sys.stderr)
            response = await client.get(url, params=params)
            print(f"Response status: {response.status_code}", file=sys.stderr)
            response.raise_for_status()
            return response.json()
  • The @mcp.tool decorator registers the search_datasets function as an MCP tool.
    @mcp.tool
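The handler's clamping and has_more arithmetic can be checked without touching the network. The sketch below isolates that logic in a pure function; shape_search_response, MAX_ROWS, and the sample payload are hypothetical names and data introduced for illustration, and RuntimeError is swapped in for the handler's bare Exception.

```python
from typing import Any, Dict

MAX_ROWS = 1000  # cap taken from the handler's min(limit, 1000)


def shape_search_response(
    api_result: Dict[str, Any], limit: int, offset: int
) -> Dict[str, Any]:
    """Mirror the handler's post-processing of a package_search response."""
    rows = min(limit, MAX_ROWS)
    if not api_result.get("success"):
        # The handler raises a bare Exception; RuntimeError is used here instead.
        raise RuntimeError(f"API Error: {api_result.get('error', 'Unknown error')}")
    search_result = api_result["result"]
    count = search_result.get("count", 0)
    return {
        "count": count,
        "results": search_result.get("results", []),
        # More pages exist only if the total exceeds what has been covered so far.
        "has_more": count > offset + rows,
    }


# Made-up payload in the shape the handler reads (success flag plus result).
sample = {"success": True, "result": {"count": 25, "results": [{"name": "a"}]}}

second_page = shape_search_response(sample, limit=10, offset=10)  # 25 > 20
third_page = shape_search_response(sample, limit=10, offset=20)   # 25 > 30 fails
```

For a total of 25 matches, the second page (offset 10) still reports has_more as true, while the third page (offset 20) reports false because offset + rows reaches 30.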
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses the API used ('package_search API'), search scope ('titles, descriptions, and other metadata'), and return structure. However, it lacks details on permissions, rate limits, error handling, or whether this is a read-only operation (implied by 'search' but not explicit). The description adds some behavioral context but misses key operational traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, args, returns) and uses bullet points for readability. It's appropriately sized but could be slightly more concise by integrating the 'Args' and 'Returns' into flowing text. Every sentence adds value, though the API name ('package_search API') might be overly technical without context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters with 0% schema coverage and an output schema (implied by 'Returns' section), the description is mostly complete. It covers parameters thoroughly and outlines return values, but lacks context on authentication, error cases, or sibling tool differentiation. The output schema description reduces the need for return value details, but behavioral gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides detailed semantics for all 4 parameters: 'query' with examples and explanation, 'limit' with default and max, 'offset' for pagination, and 'language' with codes. This adds significant meaning beyond the bare schema, fully documenting parameter usage and constraints.
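Because these constraints live only in the description and not in the schema, a caller may want to enforce them before invoking the tool. The following is a hedged sketch of such a check; normalize_search_args and its constants are hypothetical, and rejecting unknown language codes is a design assumption (the handler itself falls back to English via BASE_URLS.get).

```python
from typing import Any, Dict

VALID_LANGUAGES = ("en", "tc", "sc")  # codes listed in the tool description
MAX_LIMIT = 1000                      # documented maximum for limit


def normalize_search_args(
    query: str = "*:*", limit: int = 10, offset: int = 0, language: str = "en"
) -> Dict[str, Any]:
    """Validate and normalize arguments before calling search_datasets."""
    if language not in VALID_LANGUAGES:
        # Assumption: fail loudly rather than silently falling back to "en".
        raise ValueError(f"language must be one of {VALID_LANGUAGES}, got {language!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    return {
        # An empty query is treated as "match everything", like the default.
        "query": query.strip() or "*:*",
        "limit": min(limit, MAX_LIMIT),
        "offset": offset,
        "language": language,
    }


args = normalize_search_args(query="  transport ", limit=5000, language="tc")
```

Here the oversized limit is clamped to 1000 and the query is trimmed, while an unsupported language code raises ValueError instead of quietly searching the English portal.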

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Search for datasets by query term using the package_search API' and 'searches across dataset titles, descriptions, and other metadata.' It specifies the verb ('search') and resource ('datasets'), but doesn't explicitly differentiate from siblings like 'search_datasets_with_facets' or 'list_datasets' beyond mentioning the query-based approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through the query parameter examples (e.g., 'transport', 'weather', '*:*'), suggesting it's for keyword-based searches. However, it doesn't explicitly state when to use this tool versus alternatives like 'search_datasets_with_facets' (which likely includes facet filtering) or 'list_datasets' (which might list without querying). No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mcp-open-data-hk/mcp-open-data-hk'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.