World Bank Data360 MCP Server

by llnOrmll

retrieve_data_tool

Retrieve World Bank economic and social indicator data for specific countries, years, and indicators with customizable filters and sorting options.

Instructions

[STEP 3/3] Retrieve actual data from World Bank Data360.

⚠️ PREREQUISITE: Call get_temporal_coverage first to get latest_year.

🚨 CRITICAL TYPE REQUIREMENTS 🚨

When calling this tool, you MUST pass parameters with the EXACT types shown below. Common mistakes that cause validation errors:

❌ INCORRECT: {"limit": "10"} ← limit as STRING (causes error!)
✅ CORRECT: {"limit": 10} ← limit as INTEGER

❌ INCORRECT: {"exclude_aggregates": "true"} ← boolean as STRING
✅ CORRECT: {"exclude_aggregates": true} ← boolean as BOOLEAN

❌ INCORRECT: {"year": 2023} ← year as NUMBER
✅ CORRECT: {"year": "2023"} ← year as STRING

📋 PARAMETER TYPES - MUST MATCH EXACTLY:

STRING parameters (use quotes in JSON):
  indicator: "WB_WDI_SP_POP_TOTL"
  database: "WB_WDI"
  year: "2023"
  countries: "USA,CHN,JPN"
  sex: "M" or "F" or "_T"
  age: "0-14"
  sort_order: "desc" or "asc"

INTEGER parameters (no quotes in JSON):
  limit: 10 (default: 20)

BOOLEAN parameters (no quotes in JSON):
  exclude_aggregates: true or false (default: true)
  compact_response: true or false (default: true)
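
The type rules above can be enforced on the caller's side before the tool is invoked. The following is a minimal sketch, not part of the server: `build_arguments` and the three parameter-name tuples are hypothetical names introduced here, and the grouping simply mirrors the STRING / INTEGER / BOOLEAN lists in the description.

```python
import json
from typing import Any

# Hypothetical helper (not part of the server): coerces loosely typed
# inputs into the JSON types listed in the tool description.
STRING_PARAMS = ("indicator", "database", "year", "countries", "sex", "age", "sort_order")
INTEGER_PARAMS = ("limit",)
BOOLEAN_PARAMS = ("exclude_aggregates", "compact_response")


def build_arguments(**raw: Any) -> dict[str, Any]:
    """Return a dict whose values serialize to the documented JSON types."""
    args: dict[str, Any] = {}
    for name, value in raw.items():
        if value is None:
            continue  # leave unset optional parameters out of the payload
        if name in STRING_PARAMS:
            args[name] = str(value)
        elif name in INTEGER_PARAMS:
            args[name] = int(value)
        elif name in BOOLEAN_PARAMS:
            if isinstance(value, str):
                # "true"/"false" strings become real booleans
                args[name] = value.strip().lower() == "true"
            else:
                args[name] = bool(value)
        else:
            raise KeyError(f"unknown parameter: {name}")
    return args


payload = build_arguments(
    indicator="WB_WDI_SP_POP_TOTL",
    database="WB_WDI",
    year=2023,                  # number in, string out
    limit="10",                 # string in, integer out
    exclude_aggregates="true",  # string in, boolean out
)
print(json.dumps(payload, indent=2))
```

Serializing the result with `json.dumps` yields a quoted year, an unquoted limit, and a bare `true`, which matches the CORRECT forms shown above.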

🎯 CORRECT JSON EXAMPLES:

Example 1 - Top 10 countries by population:
{
  "indicator": "WB_WDI_SP_POP_TOTL",
  "database": "WB_WDI",
  "year": "2023",
  "limit": 10,
  "sort_order": "desc",
  "exclude_aggregates": true
}

Example 2 - Specific countries GDP:
{
  "indicator": "WB_WDI_NY_GDP_MKTP_CD",
  "database": "WB_WDI",
  "year": "2023",
  "countries": "USA,CHN,JPN"
}

Example 3 - All data with aggregates:
{
  "indicator": "WB_WDI_SP_POP_TOTL",
  "database": "WB_WDI",
  "year": "2022",
  "exclude_aggregates": false
}

⚡ HOW IT WORKS:

  • countries parameter: API fetches ONLY those countries (efficient)

  • exclude_aggregates: Filters out 47 regional/income codes (ARB, AFE, WLD, HIC, etc.)
    ⚠️ DEFAULT is TRUE - only individual countries returned
    Set to false to include aggregates like "World", "High income", "Arab World"

  • sort_order: Sorts by OBS_VALUE before limiting

  • limit parameter: Returns top N records to minimize tokens
    ⚠️ DEFAULT is 20 - provides reasonable default, override if you need more

  • compact_response: Returns only essential fields (country, country_name, year, value)
    ⚠️ DEFAULT is TRUE - minimizes token usage by ~75%
    Set to false if you need all fields (REF_AREA, TIME_PERIOD, OBS_VALUE, UNIT_MEASURE, etc.)

📊 AFTER RECEIVING DATA:
Format results as markdown table:

  • Sort by value (highest to lowest)

  • Add rank numbers

  • Format with thousand separators

  • Include country names (not just codes)

Returns: Data records with summary statistics.
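
The formatting steps listed under AFTER RECEIVING DATA could look like the sketch below when applied to a compact response. `to_markdown_table` is a hypothetical helper, the column layout is one possible choice rather than something the tool prescribes, and the sample rows are placeholders, not real figures.

```python
from typing import Any


def to_markdown_table(rows: list[dict[str, Any]]) -> str:
    """Rank compact-response rows by value and render them as a markdown table."""
    # Rows with no value cannot be ranked, so they are skipped here.
    usable = [r for r in rows if r.get("value") is not None]
    ranked = sorted(usable, key=lambda r: float(r["value"]), reverse=True)

    lines = [
        "| Rank | Country | Code | Year | Value |",
        "| ---: | :--- | :--- | :--- | ---: |",
    ]
    for rank, r in enumerate(ranked, start=1):
        value = f"{float(r['value']):,.0f}"  # thousand separators, no decimals
        lines.append(f"| {rank} | {r['country_name']} | {r['country']} | {r['year']} | {value} |")
    return "\n".join(lines)


# Placeholder rows in the compact shape (country, country_name, year, value).
sample = [
    {"country": "AAA", "country_name": "Alpha", "year": "2023", "value": 1234567},
    {"country": "BBB", "country_name": "Beta", "year": "2023", "value": "2500000"},
    {"country": "CCC", "country_name": "Gamma", "year": "2023", "value": None},
]
table = to_markdown_table(sample)
print(table)
```

The `,.0f` format suits counts such as population; an indicator expressed as a ratio or percentage would need a different precision.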

Input Schema

Name                Required  Description  Default
indicator           Yes
database            Yes
year                No
countries           No
sex                 No
age                 No
limit               No
sort_order          No                     desc
exclude_aggregates  No
compact_response    No

Output Schema

No arguments

Implementation Reference

  • Handler function for 'retrieve_data_tool' registered via @server.tool() decorator. Converts year parameter to string and delegates to the retrieve_data helper function. Includes detailed docstring specifying parameter types and usage examples.
        @server.tool()
        def retrieve_data_tool(
            indicator: str,
            database: str,
            year: str | int | None = None,
            countries: str | None = None,
            sex: str | None = None,
            age: str | None = None,
            limit: int = 20,
            sort_order: str = "desc",
            exclude_aggregates: bool = True,
            compact_response: bool = True
        ) -> dict[str, Any]:
            """[STEP 3/3] Retrieve actual data from World Bank Data360.
    
    ⚠️ PREREQUISITE: Call get_temporal_coverage first to get latest_year.
    
    🚨 CRITICAL TYPE REQUIREMENTS 🚨
    
    When calling this tool, you MUST pass parameters with the EXACT types shown below.
    Common mistakes that cause validation errors:
    
    ❌ INCORRECT: {"limit": "10"}      ← limit as STRING (causes error!)
    ✅ CORRECT:   {"limit": 10}        ← limit as INTEGER
    
    ❌ INCORRECT: {"exclude_aggregates": "true"}    ← boolean as STRING
    ✅ CORRECT:   {"exclude_aggregates": true}      ← boolean as BOOLEAN
    
    ❌ INCORRECT: {"year": 2023}       ← year as NUMBER
    ✅ CORRECT:   {"year": "2023"}     ← year as STRING
    
    📋 PARAMETER TYPES - MUST MATCH EXACTLY:
    
    STRING parameters (use quotes in JSON):
      indicator: "WB_WDI_SP_POP_TOTL"
      database: "WB_WDI"
      year: "2023"
      countries: "USA,CHN,JPN"
      sex: "M" or "F" or "_T"
      age: "0-14"
      sort_order: "desc" or "asc"
    
    INTEGER parameters (no quotes in JSON):
      limit: 10 (default: 20)
    
    BOOLEAN parameters (no quotes in JSON):
      exclude_aggregates: true or false (default: true)
      compact_response: true or false (default: true)
    
    🎯 CORRECT JSON EXAMPLES:
    
    Example 1 - Top 10 countries by population:
    {
      "indicator": "WB_WDI_SP_POP_TOTL",
      "database": "WB_WDI",
      "year": "2023",
      "limit": 10,
      "sort_order": "desc",
      "exclude_aggregates": true
    }
    
    Example 2 - Specific countries GDP:
    {
      "indicator": "WB_WDI_NY_GDP_MKTP_CD",
      "database": "WB_WDI",
      "year": "2023",
      "countries": "USA,CHN,JPN"
    }
    
    Example 3 - All data with aggregates:
    {
      "indicator": "WB_WDI_SP_POP_TOTL",
      "database": "WB_WDI",
      "year": "2022",
      "exclude_aggregates": false
    }
    
    ⚡ HOW IT WORKS:
    - countries parameter: API fetches ONLY those countries (efficient)
    - exclude_aggregates: Filters out 47 regional/income codes (ARB, AFE, WLD, HIC, etc.)
      ⚠️ DEFAULT is TRUE - only individual countries returned
      Set to false to include aggregates like "World", "High income", "Arab World"
    - sort_order: Sorts by OBS_VALUE before limiting
    - limit parameter: Returns top N records to minimize tokens
      ⚠️ DEFAULT is 20 - provides reasonable default, override if you need more
    - compact_response: Returns only essential fields (country, country_name, year, value)
      ⚠️ DEFAULT is TRUE - minimizes token usage by ~75%
      Set to false if you need all fields (REF_AREA, TIME_PERIOD, OBS_VALUE, UNIT_MEASURE, etc.)
    
    📊 AFTER RECEIVING DATA:
    Format results as markdown table:
    - Sort by value (highest to lowest)
    - Add rank numbers
    - Format with thousand separators
    - Include country names (not just codes)
    
    Returns: Data records with summary statistics."""
            year_str = str(year) if year is not None else None
    
            return retrieve_data(
                indicator, database, year_str, countries, sex, age,
                limit, sort_order, exclude_aggregates, compact_response
            )
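
The handler's signature accepts `year` as either a string or an integer and converts it before delegating, even though the description marks a numeric year as incorrect. The conversion itself can be isolated as below; `normalize_year` is a hypothetical name for the one-line expression in the handler. Whether an integer reaches the handler at all depends on the validation layer in front of it, which this sketch does not model.

```python
from __future__ import annotations


def normalize_year(year: str | int | None) -> str | None:
    """Same expression the handler uses: stringify unless the year is None."""
    return str(year) if year is not None else None


for candidate in ("2023", 2023, None):
    print(repr(candidate), "->", repr(normalize_year(candidate)))
```

Passing the year as a string, as the description asks, remains the safe choice for callers.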
  • Core helper function implementing the data retrieval logic: constructs API parameters, handles pagination loop, fetches data from Data360 endpoint, applies client-side filtering (exclude aggregates), sorting by OBS_VALUE, limiting results, compacting response fields, and generating summary statistics.
    def retrieve_data(
        indicator: str,
        database: str,
        year: str | None = None,
        countries: str | None = None,
        sex: str | None = None,
        age: str | None = None,
        limit: int | None = 20,
        sort_order: str = "desc",
        exclude_aggregates: bool = True,
        compact_response: bool = True
    ) -> dict[str, Any]:
        """Retrieve actual data from World Bank API"""
        
        params = {
            "DATABASE_ID": database,
            "INDICATOR": indicator,
            "skip": 0,
        }
    
        # Apply filters
        if year:
            params["timePeriodFrom"] = year
            params["timePeriodTo"] = year
        
        if countries:
            params["REF_AREA"] = countries
        
        if sex:
            params["SEX"] = sex
        
        if age:
            params["AGE"] = age
    
        all_data = []
    
        try:
            # Pagination loop (max 10000 records as safety limit)
            while len(all_data) < 10000:
                response = requests.get(
                    DATA_ENDPOINT,
                    params=params,
                    headers={"Accept": "application/json"},
                    timeout=30,
                )
                response.raise_for_status()
    
                data = response.json()
                values = data.get("value", [])
    
                if not values:
                    break
    
                all_data.extend(values)
    
                total_count = data.get("count", 0)
                if len(all_data) >= total_count:
                    break
    
                params["skip"] = len(all_data)
    
            # CLIENT-SIDE: Filter out regional/income aggregates if requested
            if exclude_aggregates and all_data:
                all_data = [d for d in all_data if d.get("REF_AREA") not in AGGREGATE_CODES]
    
            # CLIENT-SIDE: Sort by OBS_VALUE if requested
            if all_data and sort_order:
                data_with_values = [d for d in all_data if d.get("OBS_VALUE") is not None]
                data_without_values = [d for d in all_data if d.get("OBS_VALUE") is None]
    
                reverse_order = (sort_order.lower() == "desc")
                try:
                    sorted_data = sorted(
                        data_with_values,
                        key=lambda x: float(str(x.get("OBS_VALUE", "0"))),
                        reverse=reverse_order
                    )
                    all_data = sorted_data + data_without_values
                except (ValueError, TypeError) as e:
                    # If sorting fails, return error in response
                    return {
                        "success": False,
                        "error": f"Sorting failed: {str(e)}. OBS_VALUE type: {type(data_with_values[0].get('OBS_VALUE')) if data_with_values else 'no data'}"
                    }
    
            # CLIENT-SIDE: Apply limit to reduce tokens sent to Claude
            display_data = all_data[:limit] if limit else all_data
    
            # Generate summary (before compacting to preserve field names)
            unique_countries = set(d.get("REF_AREA") for d in display_data if d.get("REF_AREA"))
            unique_years = sorted(set(d.get("TIME_PERIOD") for d in display_data if d.get("TIME_PERIOD")))
    
            # CLIENT-SIDE: Compact response to minimize tokens (only essential fields)
            if compact_response and display_data:
                display_data = [
                    {
                        "country": d.get("REF_AREA"),
                        "country_name": d.get("REF_AREA_label"),
                        "year": d.get("TIME_PERIOD"),
                        "value": d.get("OBS_VALUE"),
                    }
                    for d in display_data
                ]
    
            return {
                "success": True,
                "record_count": len(display_data),
                "total_available": len(all_data),
                "data": display_data,
                "summary": {
                    "countries": len(unique_countries),
                    "years": unique_years,
                    "applied_filters": {
                        "year": year,
                        "countries": countries,
                        "sex": sex,
                        "age": age
                    }
                }
            }
            
        except Exception as e:
            return {"success": False, "error": str(e)}
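
The pagination loop in `retrieve_data` can be exercised without network access by swapping the HTTP call for a stub. In this sketch, `fetch_all`, `fake_fetch`, `DATASET`, and `PAGE_SIZE` are all hypothetical; only the `value` / `count` response keys, the `skip` offset, and the 10000-record cap come from the code above.

```python
from typing import Any, Callable

Page = dict[str, Any]


def fetch_all(fetch_page: Callable[[int], Page], max_records: int = 10000) -> list[dict[str, Any]]:
    """Collect pages until the reported count is reached or a page comes back empty."""
    rows: list[dict[str, Any]] = []
    skip = 0
    while len(rows) < max_records:
        page = fetch_page(skip)
        values = page.get("value", [])
        if not values:
            break  # empty page: nothing more to read
        rows.extend(values)
        if len(rows) >= page.get("count", 0):
            break  # everything the API reported has been collected
        skip = len(rows)  # next request starts after what we already hold
    return rows


# Stand-in for the HTTP request: serves slices of an in-memory list.
DATASET = [{"REF_AREA": f"C{i:02d}"} for i in range(7)]
PAGE_SIZE = 3
calls: list[int] = []


def fake_fetch(skip: int) -> Page:
    calls.append(skip)
    return {"count": len(DATASET), "value": DATASET[skip:skip + PAGE_SIZE]}


rows = fetch_all(fake_fetch)
print(len(rows), calls)
```

As in the original loop, the cap is checked before each request, so the final page can push the total slightly past `max_records`.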
  • The @server.tool() decorator registers the retrieve_data_tool as an MCP tool in the FastMCP server.
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and delivers comprehensive behavioral disclosure. It explains how parameters affect API behavior (e.g., countries parameter fetches only those countries efficiently, exclude_aggregates filters out regional codes), describes defaults and their rationale (limit default 20 for token minimization), and details response formatting expectations. No contradictions exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (prerequisite, type requirements, examples, how it works, formatting instructions). However, it's quite lengthy with repetitive emphasis on type requirements. Some information could be more condensed while maintaining clarity, as multiple examples and warnings about quotes could be streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (10 parameters, 0% schema coverage, no annotations, but has output schema), the description is remarkably complete. It covers prerequisites, parameter semantics, behavioral traits, defaults, examples, and post-processing instructions. The output schema existence means return values don't need explanation, and the description provides everything else needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter semantics. It explains type requirements with examples, describes what each parameter does (e.g., 'countries parameter: API fetches ONLY those countries'), clarifies defaults and their effects, and provides correct JSON examples. This adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool retrieves data from World Bank Data360, which is a clear purpose. However, it doesn't differentiate from sibling tools like search_datasets_tool or list_popular_indicators, leaving it vague whether this is for raw data retrieval versus metadata exploration. The description focuses more on technical requirements than distinguishing functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: it states a prerequisite (call get_temporal_coverage first), gives clear examples of when to use specific parameters, and implicitly contrasts with siblings by focusing on actual data retrieval rather than metadata. The 'HOW IT WORKS' section offers detailed context for parameter selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/llnOrmll/world-bank-data-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.