MLIT Data Platform MCP Server

by kkawailab

get_all_data

Retrieve large datasets from Japan's MLIT Data Platform using batch processing with filters for location, attributes, and keywords.

Instructions

Retrieve a large amount of data matching the given conditions.

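Usage:
- Retrieves large result sets in batches. Internally the tool calls the GraphQL `getAllData` query and uses the returned `nextDataRequestToken` to fetch the next batch automatically.
- Filters can combine `term` / `phrase_match` with attributes (`catalog_id`, `dataset_id`, `prefecture_code`, `municipality_code`, `address`) and a rectangular area.
- The per-batch item count is set by `size` (API maximum: 1000). This tool defaults to `size=1000` (the maximum); use `max_batches` or `max_items` to control the total volume.
- If metadata is not needed, set `include_metadata=False` to reduce the transfer size.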
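The token-driven batching described above can be sketched as a small async generator. This is a minimal sketch, not the server's code: `fake_get_all_data`, its offset-based token, and the in-memory `RECORDS` are hypothetical stand-ins for the GraphQL `getAllData` call, and the first-request filters are omitted. It shows the stopping rule: keep following `nextDataRequestToken` until a batch is empty or no token comes back, capped at `max_batches`.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Optional

# Hypothetical in-memory data standing in for the MLIT Data Platform.
RECORDS = [{"id": str(i), "title": f"item {i}"} for i in range(2500)]


async def fake_get_all_data(size: int, next_token: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical getAllData stand-in; the token is just the next offset as a string."""
    start = int(next_token) if next_token else 0
    page = RECORDS[start:start + size]
    end = start + len(page)
    token = str(end) if end < len(RECORDS) else None
    return {"getAllData": {"data": page, "nextDataRequestToken": token}}


async def iter_batches(size: int, max_batches: int) -> AsyncIterator[List[Dict[str, Any]]]:
    token: Optional[str] = None
    batches = 0
    while batches < max_batches:
        # Follow-up requests carry only the token, mirroring the real API's behavior.
        resp = await fake_get_all_data(size=min(size, 1000), next_token=token)
        node = resp.get("getAllData") or {}
        batch = node.get("data") or []
        token = node.get("nextDataRequestToken")
        yield batch
        batches += 1
        if not batch or not token:  # stop on an empty page or a missing token
            break


async def main() -> List[int]:
    return [len(b) async for b in iter_batches(size=1000, max_batches=20)]


sizes = asyncio.run(main())
print(sizes)  # [1000, 1000, 500]
```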

Examples:
- Retrieve a whole dataset (with metadata):
  term="", dataset_id="mlit-001", size=1000, max_batches=10, include_metadata=True

- Retrieve by catalog ID and rectangle (central Tokyo example):
  term="", catalog_id="dimaps",
  location_rectangle_top_left_lat=35.80, location_rectangle_top_left_lon=139.55,
  location_rectangle_bottom_right_lat=35.60, location_rectangle_bottom_right_lon=139.85,
  size=1000, max_batches=5

- Scan all records for a prefecture code alone:
  term="", prefecture_code="13", size=1000, max_items=5000
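MCP clients pass arguments like these in a JSON-RPC 2.0 `tools/call` request, with the tool name under `params.name` and the arguments under `params.arguments`. The sketch below builds such a message for the central-Tokyo example; the helper `tools_call_request` and the request id are illustrative, and the transport (stdio or HTTP) is left to the client.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, arguments: Dict[str, Any]) -> str:
    """Illustrative helper: wrap get_all_data arguments in an MCP tools/call message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_all_data", "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


# The central-Tokyo rectangle example, expressed as tool arguments.
tokyo_args = {
    "term": "",
    "catalog_id": "dimaps",
    "location_rectangle_top_left_lat": 35.80,
    "location_rectangle_top_left_lon": 139.55,
    "location_rectangle_bottom_right_lat": 35.60,
    "location_rectangle_bottom_right_lon": 139.85,
    "size": 1000,
    "max_batches": 5,
}

payload = tools_call_request(1, tokyo_args)
print(payload)
```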

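Notes:
- Per the API specification, `locationFilter` (e.g. a rectangle) **cannot be used on its own**. Always combine it with `term` or `attributeFilter` (in this tool, `catalog_id` / `dataset_id` / `prefecture_code` / `municipality_code` / `address`).
- Subsequent batches are requested with `nextDataRequestToken` alone, and **all other conditions are ignored** (the tool handles this automatically). Retrieval stops as soon as a batch comes back empty or no next token is returned.
- The API maximum for `size` is 1000 (also this tool's default). For large retrievals, use `max_batches` / `max_items` to control the volume.
- Coordinates are WGS84. Specify the rectangle from northwest (top_left) to southeast (bottom_right).
- `include_metadata=False` returns a lightweight response centered on `id`/`title`.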
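The constraints in these notes can be checked before calling the tool. The following is a hypothetical client-side pre-check, not part of the server: `check_get_all_data_args` is an illustrative name, and it assumes an empty `term` does not count as a keyword for the rule that a rectangle needs `term` or an attribute filter.

```python
from typing import Any, Dict, List

# Field names taken from the tool's input schema.
ATTRIBUTE_KEYS = ("catalog_id", "dataset_id", "prefecture_code", "municipality_code", "address")
RECT_KEYS = (
    "location_rectangle_top_left_lat",
    "location_rectangle_top_left_lon",
    "location_rectangle_bottom_right_lat",
    "location_rectangle_bottom_right_lon",
)


def check_get_all_data_args(args: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-check; returns a list of problems (empty means OK)."""
    problems: List[str] = []
    if args.get("size", 1000) > 1000:
        problems.append("size must be at most 1000")

    has_rect = all(args.get(k) is not None for k in RECT_KEYS)
    has_attr = any(args.get(k) for k in ATTRIBUTE_KEYS)
    # Assumption: an empty term does not satisfy the "term or attributeFilter" rule.
    has_term = bool(args.get("term"))
    if has_rect and not (has_term or has_attr):
        problems.append("a rectangle needs a term or an attribute filter")

    if has_rect:
        tl_lat, tl_lon, br_lat, br_lon = (float(args[k]) for k in RECT_KEYS)
        lats_ok = all(-90 <= v <= 90 for v in (tl_lat, br_lat))
        lons_ok = all(-180 <= v <= 180 for v in (tl_lon, br_lon))
        if not (lats_ok and lons_ok):
            problems.append("coordinates must be valid WGS84 latitude/longitude")
        # top_left must lie northwest of bottom_right (no antimeridian crossing handled).
        if tl_lat < br_lat or tl_lon > br_lon:
            problems.append("rectangle must run from northwest (top_left) to southeast (bottom_right)")
    return problems


tokyo_rect = {
    "location_rectangle_top_left_lat": 35.80,
    "location_rectangle_top_left_lon": 139.55,
    "location_rectangle_bottom_right_lat": 35.60,
    "location_rectangle_bottom_right_lon": 139.85,
}
print(check_get_all_data_args({"term": "", "catalog_id": "dimaps", **tokyo_rect}))  # []
print(check_get_all_data_args({"term": "", **tokyo_rect}))
# ['a rectangle needs a term or an attribute filter']
```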

Input Schema

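| Name | Required | Description | Default |
|---|---|---|---|
| `size` | No | Number of items fetched per request (max 1000). Larger result sets are fetched automatically over multiple batched requests. | 1000 |
| `term` | No | Search keyword. Use an empty string `""` or omit it when filtering by attributes only. | |
| `phrase_match` | No | Phrase-match mode. | |
| `prefecture_code` | No | Prefecture code. Use a code already normalized with `normalize_codes`. | |
| `municipality_code` | No | Municipality code (5 digits), e.g. `'13101'` = Chiyoda City. | |
| `address` | No | Address search: a string containing a prefecture or municipality name. | |
| `catalog_id` | No | Catalog ID; available via `get_data_catalog_summary`. | |
| `dataset_id` | No | Dataset ID. | |
| `location_rectangle_top_left_lat` | No | Latitude of the rectangle's top-left (northwest) corner. | |
| `location_rectangle_top_left_lon` | No | Longitude of the rectangle's top-left (northwest) corner. | |
| `location_rectangle_bottom_right_lat` | No | Latitude of the rectangle's bottom-right (southeast) corner. | |
| `location_rectangle_bottom_right_lon` | No | Longitude of the rectangle's bottom-right (southeast) corner. | |
| `max_batches` | No | Maximum number of batches. At the default of 20 batches × `size` 1000, up to 20,000 items can be retrieved. | 20 |
| `include_metadata` | No | Whether to include metadata. Setting it to `false` shrinks the response. | true |
| `max_items` | No | Upper limit on the total number of items. Documented as taking precedence over `max_batches`; in the reference implementation below, `max_batches` is still enforced by the iterator, so whichever limit is reached first ends retrieval. | |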
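The limits in the table combine as simple arithmetic. A sketch, assuming both `max_batches` and `max_items` apply and the result is trimmed to `max_items` (without trimming, the last batch can overshoot by up to `size - 1` items); `retrieval_plan` is a hypothetical helper, and the figures are upper bounds because the API may run out of data earlier.

```python
import math
from typing import Optional, Tuple


def retrieval_plan(size: int = 1000, max_batches: int = 20,
                   max_items: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: return (max requests, max items) for one get_all_data call."""
    size = min(size, 1000)          # API cap per batch
    ceiling = max_batches * size    # e.g. 20 x 1000 = 20,000
    if max_items is None:
        return max_batches, ceiling
    # Both limits apply; whichever is reached first ends retrieval.
    items = min(ceiling, max_items)
    return math.ceil(items / size), items


print(retrieval_plan())                # (20, 20000)
print(retrieval_plan(max_items=5000))  # (5, 5000)
```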

Implementation Reference

  • The handler that drains the `get_all_data_iter` async iterator, tracks batch and item counts, stops early at `max_items` or when the serialized result nears a size cap, and returns a dictionary of items.
    async def get_all_data_collect(
        self,
        params: GetAllDataInput,
        *,
        max_items: Optional[int] = None
    ) -> Dict[str, Any]:
        items: List[Dict[str, Any]] = []
        batches = 0

        async for batch in self.get_all_data_iter(params):
            # GetAllDataItem is a Pydantic model; model_dump() is the v2 replacement for .dict().
            items.extend(x.model_dump() for x in batch)
            batches += 1

            if max_items is not None and len(items) >= max_items:
                # A full batch can overshoot the limit; trim to exactly max_items.
                del items[max_items:]
                break

            # Rough guard to keep the tool response under ~900 KB.
            if len(str(items).encode("utf-8")) > 900_000:
                logger.warning("get_all_data_collect: response approaching size cap, truncating further collection")
                break

        return {"batches": batches, "count": len(items), "items": items}
  • The async generator that issues the `getAllData` API requests, builds the attribute and rectangle filters, and follows `nextDataRequestToken` from batch to batch.
    async def get_all_data_iter(self, params: GetAllDataInput) -> AsyncIterator[List[GetAllDataItem]]:
        token: Optional[str] = None
        batches = 0
    
        attr_filter_str = self.make_attribute_filter_strict_for_get_all_data(
            prefecture_code=params.prefecture_code,
            municipality_code=params.municipality_code,
            address=params.address,
            catalog_id=params.catalog_id,
            dataset_id=params.dataset_id,
        )
    
        loc_filter_str: Optional[str] = None
        if all(v is not None for v in [
            params.location_rectangle_top_left_lat,
            params.location_rectangle_top_left_lon,
            params.location_rectangle_bottom_right_lat,
            params.location_rectangle_bottom_right_lon
        ]):
            tl_lat = float(params.location_rectangle_top_left_lat)  # type: ignore
            tl_lon = float(params.location_rectangle_top_left_lon)  # type: ignore
            br_lat = float(params.location_rectangle_bottom_right_lat)  # type: ignore
            br_lon = float(params.location_rectangle_bottom_right_lon)  # type: ignore
            if not (-90 <= tl_lat <= 90 and -180 <= tl_lon <= 180 and -90 <= br_lat <= 90 and -180 <= br_lon <= 180):
                raise ValueError("Invalid rectangle coordinates")
            loc_filter_str = self.make_rectangle_filter(tl_lat, tl_lon, br_lat, br_lon)
    
        # With filters but no keyword, the first request sends an empty term.
        effective_first_term: Optional[str] = params.term
        if effective_first_term is None and (attr_filter_str is not None or loc_filter_str is not None):
            effective_first_term = ""

        while batches < params.max_batches:
            if token:
                # Follow-up batches send only the token; the API ignores other conditions.
                q = self.build_get_all_data(size=params.size, next_token=token)
            else:
                # First batch: send the full search conditions.
                q = self.build_get_all_data(
                    size=params.size,
                    term=effective_first_term,
                    phrase_match=params.phrase_match,
                    attribute_filter=attr_filter_str,
                    location_filter=loc_filter_str,
                    next_token=None,
                )

            data = await self.post_query(q)
            node = data.get("getAllData") or {}
            raw_items = node.get("data") or []
            token = node.get("nextDataRequestToken")

            # Metadata is discarded client-side when include_metadata is False.
            batch: List[GetAllDataItem] = [
                GetAllDataItem(
                    id=str(it.get("id")),
                    title=it.get("title"),
                    metadata=(it.get("metadata") if params.include_metadata else None),
                )
                for it in raw_items
            ]

            yield batch

            batches += 1
            # Stop on an empty batch or when no next token is returned.
            if not batch or not token:
                break
  • The dispatch branch in `server.py` that normalizes region arguments, validates them into `GetAllDataInput`, and routes the call to `client.get_all_data_collect`.
    elif name == "get_all_data":
        arguments = await _auto_normalize_region_args(arguments, client)
        p = GetAllDataInput.model_validate({
            "size": arguments.get("size", 1000),
            "term": arguments.get("term"),
            "phrase_match": arguments.get("phrase_match"),
            "prefecture_code": arguments.get("prefecture_code"),
            "municipality_code": arguments.get("municipality_code"),
            "address": arguments.get("address"),
            "catalog_id": arguments.get("catalog_id"),
            "dataset_id": arguments.get("dataset_id"),
            "location_rectangle_top_left_lat": arguments.get("location_rectangle_top_left_lat"),
            "location_rectangle_top_left_lon": arguments.get("location_rectangle_top_left_lon"),
            "location_rectangle_bottom_right_lat": arguments.get("location_rectangle_bottom_right_lat"),
            "location_rectangle_bottom_right_lon": arguments.get("location_rectangle_bottom_right_lon"),
            "max_batches": arguments.get("max_batches", 20),
            "include_metadata": arguments.get("include_metadata", True),
        })
        max_items = arguments.get("max_items")
        data = await client.get_all_data_collect(p, max_items=max_items)
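The collector's stopping rules can be exercised offline. This sketch re-implements them against a hypothetical `fake_batches` generator standing in for `get_all_data_iter`; it trims the last batch so the result never exceeds `max_items` and reuses the rough `str()`-based byte guard (900,000 bytes by default, lowered in the usage below so it triggers). It illustrates the logic and is not the server's code.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Optional


async def fake_batches(n_batches: int, size: int) -> AsyncIterator[List[Dict[str, Any]]]:
    """Hypothetical stand-in for get_all_data_iter: yields n_batches full batches."""
    for b in range(n_batches):
        yield [{"id": str(b * size + i), "title": None, "metadata": None} for i in range(size)]


async def collect(batches: AsyncIterator[List[Dict[str, Any]]],
                  max_items: Optional[int] = None,
                  byte_cap: int = 900_000) -> Dict[str, Any]:
    items: List[Dict[str, Any]] = []
    n = 0
    async for batch in batches:
        items.extend(batch)
        n += 1
        if max_items is not None and len(items) >= max_items:
            del items[max_items:]  # trim the overshoot from the last batch
            break
        if len(str(items).encode("utf-8")) > byte_cap:
            break                  # stop before the response grows too large
    return {"batches": n, "count": len(items), "items": items}


result = asyncio.run(collect(fake_batches(10, 1000), max_items=2500))
print(result["batches"], result["count"])  # 3 2500

capped = asyncio.run(collect(fake_batches(10, 1000), byte_cap=60_000))
print(capped["batches"], capped["count"])  # 2 2000
```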

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, yet the description comprehensively discloses: internal GraphQL usage, automatic pagination via nextDataRequestToken, the API limit of 1000 items per batch, the WGS84 coordinate system, response-format differences depending on include_metadata (lightweight id/title only vs. full), and the termination condition (stops when the data is empty).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Excellent structure with clear sections (purpose, usage, examples, cautions). Information is front-loaded with the core purpose, followed by implementation details, concrete examples, and critical constraints. No redundant text; every sentence provides actionable guidance or constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex 15-parameter tool with batch processing and no output schema, the description is comprehensive. It explains the pagination mechanism, response payload variations, coordinate systems, and batch control logic. Minor gap: it could state explicitly that the tool is read-only (though 'retrieve' implies this); otherwise the behavior is fully specified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage (baseline 3), but description adds crucial semantic relationships: max_items takes priority over max_batches, size=1000 is the API maximum and tool default, coordinate ordering (NW top-left to SE bottom-right), and that prefecture_code requires prior normalization via sibling tool. Examples demonstrate valid parameter combinations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The opening sentence, 'Retrieve a large amount of data matching the given conditions', clearly states the specific action (bulk retrieval) and scope. It distinguishes the tool from siblings such as get_data or search by emphasizing batch processing ('Retrieves large result sets in batches'), automatic pagination, and large-scale retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

A dedicated 'Usage' section explains exactly when to use the tool (large batch retrieval) and how to combine filters (term + attributes/rectangle). Three concrete examples cover different scenarios (dataset retrieval, rectangle-based retrieval, prefecture scan). The 'Notes' section explicitly states critical when-not constraints: locationFilter cannot be used alone, and requests made with nextDataRequestToken ignore all other conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
