unicefstats-mcp

Experimental — not an official UNICEF product. Verify retrieved values against the UNICEF Data Warehouse before citing in publications. See Limitations.

MCP server for UNICEF child development statistics. Query 790+ child-focused indicators across 200+ countries with disaggregations by sex, age, wealth quintile, and residence. No API key required.

Indicators cover child mortality, nutrition, education, child protection, WASH (water/sanitation/hygiene), HIV/AIDS, immunization, early childhood development, and more. Many align with SDG targets, but the dataset is broader than SDGs alone.

Data source: UNICEF SDMX API

Identity

| Property | Value |
|---|---|
| MCP identity | io.github.jpazvd/unicefstats-mcp |
| PyPI package | unicefstats-mcp |
| Canonical source | github.com/jpazvd/unicefstats-mcp |
| Data source | UNICEF Data Warehouse via SDMX REST API |
| Maintainer | Joao Pedro Azevedo (jpazvd) |
| Status | Experimental; not endorsed by UNICEF |

Third-party aggregator listings (LobeHub, Smithery, mcp.so, Glama) are not controlled by the maintainer. Verify against the canonical source above.

Key documents

| Document | Description |
|---|---|
| PROVENANCE.md | Data origin, ownership, distribution pipeline, verification steps |
| CHANGELOG.md | Version history with sources cited |
| RELEASE.md | Release process checklist and version management |
| CONTRIBUTING.md | Development setup, code style, PR guidelines |
| CODE_OF_CONDUCT.md | Contributor Covenant v2.1 |
| examples/RESULTS.md | Full 300-query benchmark analysis with EQA decomposition |
| examples/LITERATURE_REVIEW.md | Literature review of MCP servers for official statistics: ecosystem, patterns, evaluation, 15 papers |
| examples/LANDSCAPE.md | 20 official statistics MCP servers compared: timeline, feature matrix, strengths/weaknesses |
| examples/results/related_work.md | Annotated bibliography: 15 papers on tool-augmented hallucination |
| examples/results/statistical_summary.md | Wilcoxon, bootstrap CI, McNemar tests on benchmark results |
| examples/MCP-DIRECTORY-STATS.md | Comprehensive directory of all official statistics MCP servers |

How it relates to the unicefdata packages

unicefstats-mcp is not a replacement for the unicefdata packages in Python, R, or Stata. They serve different audiences:

| | unicefstats-mcp | unicefdata (Python/R/Stata) |
|---|---|---|
| Audience | AI assistants (Claude, Cursor, Copilot) | Data scientists, researchers, analysts |
| Interface | MCP protocol (tool calls via JSON) | Native language API (library(), import, ssc install) |
| Use case | Conversational data exploration, quick lookups, AI-assisted analysis | Reproducible research, ETL pipelines, statistical analysis |
| Output | JSON (compact or full) optimized for LLM context | DataFrames, tibbles, Stata matrices |
| Scripting | No: single queries via AI chat | Yes: full programmatic control, loops, joins, transforms |
| Caching | Delegates to unicefdata | Built-in SDMX response caching |
| Bulk download | Limited (max 500 rows per call) | Unlimited; designed for full dataset pulls |

Under the hood, unicefstats-mcp wraps the unicefdata Python package. Every tool call ultimately calls unicefdata.unicefData() or its metadata functions. Think of the MCP as a thin AI-friendly interface on top of the same data layer.

When to use which:

  • Use unicefstats-mcp when you're chatting with an AI and want to quickly explore indicators, check values, or compare countries

  • Use unicefdata (Python/R/Stata) when you're writing scripts, building dashboards, running regressions, or doing any reproducible analytical work

How it compares to other data MCPs

| Feature | unicefstats-mcp | FRED MCP | World Bank MCP |
|---|---|---|---|
| Tools | 8 (search → metadata → data → code → identity) | 3 (browse → search → get) | 1 (get only) |
| Indicators | 790+ child-focused indicators | 800,000+ economic series | ~1,600 indicators |
| Countries | 200+ (ISO3) | US-focused (some intl) | 200+ (ISO2) |
| Disaggregations | Sex, age, wealth quintile, residence | Frequency, seasonal adjustment | None |
| MCP Prompts | compare_indicators, write_unicefdata_code | None | None |
| Output modes | Compact (5 cols) / Full (all cols) | JSON | CSV |
| Data summary | Value range, year range, country count | None | None |
| Pagination metadata | total_rows_available vs rows_returned | limit/offset | None (hardcoded 20K) |
| Input validation | ISO3, sex, wealth, residence validated | Zod schemas | None |
| Error guidance | error + tip with next steps | HTTP status text | Raw exception |
| API key | Not required | FRED_API_KEY required | Not required |
| Truncation handling | rows_truncated flag + filter tips | None | None |

Landscape: MCP servers for official statistics

This project is part of a growing ecosystem of MCP servers for international and official statistics. As of March 2026:

UN Agencies

| Server | Data Source | Tools | SDMX | Published |
|---|---|---|---|---|
| unicefstats-mcp (this repo) | UNICEF Data Warehouse | 8 | Yes | PyPI |
| sdmx-mcp | Any SDMX registry | 23 | Yes | No |
| unicef-datawarehouse-mcp | UNICEF Data Warehouse | 3 | Yes | No |
| mcp_unhcr | UNHCR refugee data | 5 | No | No |
| medical-mcp | WHO GHO / FDA / PubMed | 18 | No | npm |

International Organizations

| Server | Data Source | Tools | SDMX | Published |
|---|---|---|---|---|
| fred-mcp-server | FRED (800K+ series) | 3 | No | npm |
| world_bank_mcp_server | World Bank Open Data | 1 | No | No |
| imf-data-mcp | IMF (IFS, BOP, WEO) | 10 | Yes | PyPI |
| OECD-MCP | OECD (5,000+ datasets) | 9 | Yes | npm |
| eurostat-mcp | Eurostat EU statistics | 7 | Yes | No |

National Statistics Offices

| Server | Data Source | Tools | Published |
|---|---|---|---|
| us-census-bureau-data-api-mcp | US Census Bureau (official) | 5 | No |
| us-gov-open-data-mcp | 40+ US Gov APIs | 300+ | npm |
| ibge-br-mcp | Brazil IBGE (227 tests) | 22 | npm |
| ukrainian-stats-mcp-server | Ukraine SDMX v3 | 8 | npm |
| istat_mcp_server | Italy ISTAT SDMX | 7 | No |

Known gaps

No MCP server exists for: FAO/FAOSTAT, UNESCO/UIS (4,000+ education indicators), ILO/ILOSTAT, UNSD SDG API, UN DESA Population, UNDP/HDI.

Full directory with install commands: MCP-DIRECTORY-STATS.md

Relationship to sdmx-mcp

UNICEF also maintains sdmx-mcp, a generic SDMX protocol MCP server. The two servers are complementary, not competing:

| | unicefstats-mcp (this repo) | sdmx-mcp |
|---|---|---|
| Scope | UNICEF child development data only | Any SDMX registry (UNICEF, Eurostat, OECD, ...) |
| Tools | 8 (analyst-friendly, 4-step workflow) | 23 (SDMX power-user, structural queries) |
| Data layer | Wraps unicefdata Python package | Direct SDMX REST API calls via httpx |
| Output | Formatted for LLMs (compact tables, summaries, tips) | Raw SDMX-JSON/CSV |
| Accuracy (EQA) | 0.990 | 0.074 |
| Hallucination | 7% T1 / 37% T2 | 0% T1 / 0% T2 |
| Cost per query | $0.018 | $0.087 |
| Latency | 9.8s avg | 60s avg |

Key tradeoff: unicefstats-mcp is dramatically more accurate (EQA 0.990 vs 0.074) because its formatted output is optimized for LLM parsing. sdmx-mcp has zero hallucination because its assistant_guidance fields and validate_query_scope pattern effectively prevent fabrication when data is absent.

When to use which:

  • Use unicefstats-mcp for UNICEF child development analysis — it's simpler, faster, and far more accurate

  • Use sdmx-mcp when you need to query non-UNICEF SDMX registries, explore dataflow structures, or work with hierarchical codelists

Full 3-way benchmark (LLM alone vs unicefstats-mcp vs sdmx-mcp): examples/results/

Quick Start

pip install unicefstats-mcp

Claude Code

Add to ~/.claude/.mcp.json:

{
  "mcpServers": {
    "unicefstats": {
      "command": "unicefstats-mcp"
    }
  }
}
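
If you prefer to script the setup, the sketch below adds the entry to an existing JSON config and leaves any servers already registered there untouched. It is a minimal example that rests on two assumptions: the file uses the mcpServers layout shown above, and the path you pass is the one your client actually reads. The function name add_unicefstats_server is hypothetical and is not part of this package.

```python
import json
from pathlib import Path


def add_unicefstats_server(config_path: Path, name: str = "unicefstats") -> dict:
    """Register unicefstats-mcp in an MCP JSON config, keeping other servers.

    Hypothetical helper; assumes the {"mcpServers": {...}} layout shown above.
    """
    config: dict = {}
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())
    servers = config.setdefault("mcpServers", {})
    # The stdio transport only needs the console script; no API key is required.
    servers[name] = {"command": "unicefstats-mcp"}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example: add_unicefstats_server(Path.home() / ".claude" / ".mcp.json").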

Cursor / VS Code

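Add this entry inside the server map of your MCP settings (the mcpServers object in Cursor's mcp.json, or the servers object in VS Code's mcp.json):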

{
  "unicefstats": {
    "command": "unicefstats-mcp"
  }
}

Tools

| Tool | Purpose | API call? |
|---|---|---|
| search_indicators(query, limit) | Find indicators by keyword | No |
| list_categories() | Browse thematic groups (CME, NUTRITION, EDUCATION, ...) | No |
| list_countries(region) | List countries with ISO3 codes | No |
| get_indicator_info(code) | Full metadata, SDMX details, available disaggregations | No |
| get_temporal_coverage(code) | Available year range and country count | Yes (lightweight) |
| get_data(indicator, countries, ...) | Fetch observations with optional disaggregation filters | Yes |
| get_api_reference(language, function) | unicefdata package API reference (Python/R/Stata) | No |
| get_server_metadata() | Server identity, version, provenance, data source | No |

Workflow

1. search_indicators("child mortality")     → find indicator codes
2. get_indicator_info("CME_MRY0T4")         → check disaggregations & SDMX details
3. get_temporal_coverage("CME_MRY0T4")      → check year range
4. get_data("CME_MRY0T4", ["BRA", "IND"])   → fetch data
5. get_api_reference("python", "unicefData") → get code template to continue in a script
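
The first four steps can also be driven from a script. The sketch below chains them through a hypothetical synchronous call_tool(name, **arguments) wrapper that returns each tool's parsed JSON payload. With the official MCP Python SDK you would build such a wrapper around the async ClientSession.call_tool. Two choices here are illustrative and do not reflect server behavior: the sketch takes the first search hit, and it requests a five-year window ending at the coverage end_year.

```python
from typing import Any, Callable

# Hypothetical wrapper: call_tool("get_data", indicator=..., ...) -> parsed JSON dict.
ToolCaller = Callable[..., dict[str, Any]]


def run_workflow(call_tool: ToolCaller, query: str, countries: list[str],
                 window: int = 5) -> dict[str, Any]:
    """Steps 1-4: search, inspect metadata, check coverage, fetch recent data."""
    hits = call_tool("search_indicators", query=query, limit=5)["results"]
    if not hits:
        return {"error": f"No indicator matches {query!r}"}
    code = hits[0]["code"]  # naive: trust the top-ranked match
    info = call_tool("get_indicator_info", code=code)
    coverage = call_tool("get_temporal_coverage", code=code)
    last = coverage["end_year"]  # never ask for years past the data frontier
    data = call_tool("get_data", indicator=code, countries=countries,
                     start_year=last - window + 1, end_year=last)
    return {"code": code, "name": info.get("name"), "data": data}
```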

Resources

The server exposes six MCP resources clients can load for guidance and reference data:

| URI | Purpose |
|---|---|
| unicef://system-prompt | Recommended system prompt: operating loop, temporal-frontier check, and anti-extrapolation directive (load at session start) |
| unicef://llm-instructions | Full DO/DON'T rules, common mistakes, and anti-fabrication guidance |
| unicef://context | Runtime context: current_date / current_year for temporal-query sanity checks |
| unicef://categories | All indicator categories with counts |
| unicef://countries | ISO3 codes and country names |
| unicef://glossary | Disaggregation codes and indicator-prefix legend |

The system-prompt and context resources address the T2 hallucination failure mode (model fabricating values for years beyond the data frontier). Pattern adopted from the World Bank data360-mcp server. See CHANGELOG entry for v0.5.0.

Demo

Step 1: Search for indicators

>>> search_indicators("stunting", limit=3)
{
  "query": "stunting",
  "total_matches": 11,
  "showing": 3,
  "results": [
    {"code": "FD_STUNTING", "name": "Moderate and severe stunting (Functional difficulties)"},
    {"code": "NT_ANT_HAZ_NE2", "name": "Height-for-age <-2 SD (stunting)"},
    {"code": "NT_ANT_HAZ_NE3", "name": "Height-for-age <-3 SD (severe stunting)"}
  ],
  "tip": "Use get_indicator_info('FD_STUNTING') for full details including available disaggregations."
}

Step 2: Get indicator metadata

>>> get_indicator_info("CME_MRY0T4")
{
  "code": "CME_MRY0T4",
  "name": "Under-five mortality rate",
  "description": "Probability of dying between birth and exactly 5 years of age, expressed per 1,000 live births",
  "dataflow": "GLOBAL_DATAFLOW",
  "sdmx_api": "https://sdmx.data.unicef.org/ws/public/sdmxapi/rest/data/UNICEF,GLOBAL_DATAFLOW,1.0/.CME_MRY0T4?format=csv",
  "disaggregation_filters": {
    "sex": ["_T (Total)", "M (Male)", "F (Female)"],
    "wealth_quintile": ["Q1 (Lowest)", "Q2", "Q3", "Q4", "Q5 (Highest)"],
    "residence": ["_T (Total)", "U (Urban)", "R (Rural)"]
  }
}
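
The sdmx_api field above is a plain SDMX REST data query, so you can rebuild it to check a value against the Data Warehouse directly. The sketch below is minimal and rests on several assumptions:

- The key's first two positions are REF_AREA and INDICATOR, as the URL above suggests.
- Per SDMX REST conventions, '+' joins several codes in one key position, and startPeriod/endPeriod bound the years.
- format=csv is copied from the URL above.

sdmx_data_url is a hypothetical helper. Whether the API also expects placeholders for the remaining dimensions is not verified here.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://sdmx.data.unicef.org/ws/public/sdmxapi/rest/data"


def sdmx_data_url(indicator: str, countries: Optional[list[str]] = None,
                  start_year: Optional[int] = None, end_year: Optional[int] = None,
                  dataflow: str = "UNICEF,GLOBAL_DATAFLOW,1.0") -> str:
    """Build a UNICEF SDMX data URL (hypothetical helper; key layout assumed)."""
    # Empty REF_AREA position = all countries; '+' ORs several ISO3 codes.
    ref_area = "+".join(countries) if countries else ""
    params = {"format": "csv"}
    if start_year is not None:
        params["startPeriod"] = str(start_year)
    if end_year is not None:
        params["endPeriod"] = str(end_year)
    return f"{BASE_URL}/{dataflow}/{ref_area}.{indicator}?{urlencode(params)}"
```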

Step 3: Check temporal coverage

>>> get_temporal_coverage("CME_MRY0T4")
{
  "code": "CME_MRY0T4",
  "start_year": 1931,
  "end_year": 2024,
  "latest_year": 2024,
  "countries_with_data": 249,
  "note": "Not all countries have data for all years. Coverage varies by country."
}

Step 4: Fetch data

>>> get_data("CME_MRY0T4", ["BRA", "IND", "NGA"], start_year=2018, end_year=2023)
{
  "indicator": "CME_MRY0T4",
  "countries_requested": ["BRA", "IND", "NGA"],
  "total_rows_available": 18,
  "rows_returned": 18,
  "rows_truncated": false,
  "format": "compact",
  "summary": {
    "value_range": {"min": 14.42, "max": 117.56, "mean": 54.78},
    "year_range": {"earliest": 2018, "latest": 2023},
    "countries_in_result": 3
  },
  "data": [
    {"iso3": "BRA", "country": "Brazil",  "period": 2018, "indicator": "CME_MRY0T4", "value": 15.22},
    {"iso3": "BRA", "country": "Brazil",  "period": 2019, "indicator": "CME_MRY0T4", "value": 15.03},
    {"iso3": "BRA", "country": "Brazil",  "period": 2020, "indicator": "CME_MRY0T4", "value": 14.87},
    {"iso3": "BRA", "country": "Brazil",  "period": 2021, "indicator": "CME_MRY0T4", "value": 14.72},
    {"iso3": "BRA", "country": "Brazil",  "period": 2022, "indicator": "CME_MRY0T4", "value": 14.59},
    {"iso3": "BRA", "country": "Brazil",  "period": 2023, "indicator": "CME_MRY0T4", "value": 14.42},
    {"iso3": "IND", "country": "India",   "period": 2018, "indicator": "CME_MRY0T4", "value": 36.87},
    {"iso3": "IND", "country": "India",   "period": 2019, "indicator": "CME_MRY0T4", "value": 34.86},
    {"iso3": "IND", "country": "India",   "period": 2020, "indicator": "CME_MRY0T4", "value": 32.98},
    {"iso3": "IND", "country": "India",   "period": 2021, "indicator": "CME_MRY0T4", "value": 31.19},
    {"iso3": "IND", "country": "India",   "period": 2022, "indicator": "CME_MRY0T4", "value": 29.53},
    {"iso3": "IND", "country": "India",   "period": 2023, "indicator": "CME_MRY0T4", "value": 27.99},
    {"iso3": "NGA", "country": "Nigeria", "period": 2018, "indicator": "CME_MRY0T4", "value": 117.19},
    {"iso3": "NGA", "country": "Nigeria", "period": 2019, "indicator": "CME_MRY0T4", "value": 117.37},
    {"iso3": "NGA", "country": "Nigeria", "period": 2020, "indicator": "CME_MRY0T4", "value": 117.42},
    {"iso3": "NGA", "country": "Nigeria", "period": 2021, "indicator": "CME_MRY0T4", "value": 117.56},
    {"iso3": "NGA", "country": "Nigeria", "period": 2022, "indicator": "CME_MRY0T4", "value": 117.46},
    {"iso3": "NGA", "country": "Nigeria", "period": 2023, "indicator": "CME_MRY0T4", "value": 116.82}
  ]
}
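
The summary block is simple arithmetic over the returned rows. The sketch below recomputes it from the compact rows so you can sanity-check a response. It assumes the field names shown above. It also rounds the mean to two decimals, which matches this example but is not documented server behavior. summarize_rows is a hypothetical name.

```python
from statistics import mean
from typing import Any


def summarize_rows(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Recompute value_range, year_range and countries_in_result from compact rows."""
    values = [r["value"] for r in rows if r.get("value") is not None]
    if not values:  # no usable observations
        return {"value_range": None, "year_range": None, "countries_in_result": 0}
    years = [r["period"] for r in rows]
    return {
        "value_range": {"min": min(values), "max": max(values),
                        "mean": round(mean(values), 2)},
        "year_range": {"earliest": min(years), "latest": max(years)},
        "countries_in_result": len({r["iso3"] for r in rows}),
    }
```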

Key insights an AI assistant would extract from this:

  • Brazil: 14.4 per 1,000, steadily declining and already well below the SDG 3.2 target (≤25)

  • India: 28.0 per 1,000, rapid improvement (37→28 in 5 years) but still above the SDG target of 25

  • Nigeria: 117 per 1,000, essentially flat, 4.7× the SDG target and the highest burden
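
These readings can be checked mechanically. The sketch below is a minimal version. It assumes the SDG 3.2 under-five mortality target of 25 deaths per 1,000 live births and a {year: value} series taken from the rows above. sdg_u5mr_status is a hypothetical helper, and it rounds ratios to one decimal for display.

```python
SDG_U5MR_TARGET = 25.0  # SDG 3.2: under-five mortality <= 25 per 1,000 live births


def sdg_u5mr_status(series: dict[int, float], target: float = SDG_U5MR_TARGET) -> dict:
    """Compare the latest value in a {year: value} series with the SDG target."""
    first, last = min(series), max(series)
    latest = series[last]
    return {
        "latest_year": last,
        "latest": latest,
        "change_since_first": round(latest - series[first], 2),
        "ratio_to_target": round(latest / target, 1),
        "meets_target": latest <= target,
    }
```

For India, sdg_u5mr_status({2018: 36.87, 2023: 27.99}) gives meets_target False and ratio_to_target 1.1. The country is close to the target but not yet below it.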

Step 5: Get code template to continue in a script

>>> get_api_reference("r", "unicefData")
{
  "language": "r",
  "install": "install.packages(\"unicefdata\")",
  "import": "library(unicefdata)",
  "function": "unicefData",
  "signature": "unicefData(\n    indicator = NULL,        # character — indicator code(s)\n    countries = NULL,         # character vector — ISO3 codes, NULL = all\n    year = NULL,              # numeric, character (\"2015:2023\"), or vector\n    sex = \"_T\",               # character — \"_T\", \"M\", \"F\"\n    totals = FALSE,           # logical — only return aggregate totals\n    tidy = TRUE,              # logical — standardize column names\n    country_names = TRUE,     # logical — add country name column\n    format = \"long\",          # character — \"long\", \"wide\", \"wide_indicators\"\n    latest = FALSE,           # logical — most recent value per country\n    circa = FALSE,            # logical — closest available year\n    add_metadata = NULL,      # character vector — e.g. c('region', 'income_group')\n    dropna = FALSE,           # logical — drop rows with missing values\n    simplify = FALSE,         # logical — minimal columns\n    mrv = NULL,               # integer — most recent N values per country\n    raw = FALSE,              # logical — all disaggregations, no filtering\n)",
  "returns": "tibble with columns: indicator_code, iso3, country, period, value, sex, age, wealth_quintile, residence, ...",
  "examples": [
    {"description": "Under-5 mortality for Brazil, India, Nigeria (2015–2023)", "code": "df <- unicefData(\"CME_MRY0T4\", countries = c(\"BRA\", \"IND\", \"NGA\"), year = \"2015:2023\")"},
    {"description": "Latest stunting data for all countries", "code": "df <- unicefData(\"NT_ANT_HAZ_NE2\", latest = TRUE)"},
    {"description": "Wide format with region metadata", "code": "df <- unicefData(\"CME_MRY0T4\", format = \"wide\", add_metadata = c(\"region\", \"income_group\"))"}
  ]
}

This lets the AI generate correct R/Python/Stata code using the exact parameter names and syntax — no guessing from training data.

get_data parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| indicator | str | required | Indicator code |
| countries | list[str] | required | ISO3 codes (max 30) |
| start_year | int | None | Start of year range |
| end_year | int | None | End of year range |
| sex | str | "_T" | "_T" (total), "M" (male), "F" (female) |
| wealth_quintile | str | None | "Q1"–"Q5", "B20", "B40", "T20" |
| residence | str | None | "U" (urban), "R" (rural), "_T" (total) |
| format | str | "compact" | "compact" (5 cols) or "full" (all cols) |
| limit | int | 200 | Max rows (1–500) |
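
The feature comparison says the server validates ISO3 codes, sex, wealth, and residence, and answers bad input with an error plus a tip. The sketch below applies the documented constraints client-side before get_data is called, so a malformed request fails fast. It is not the server's validation code. The function name, the uppercase-only ISO3 rule, and the tip wording are assumptions. fmt stands in for the format parameter to avoid shadowing Python's built-in.

```python
import re
from typing import Optional

ISO3 = re.compile(r"[A-Z]{3}")
VALID_SEX = {"_T", "M", "F"}
VALID_WEALTH = {"Q1", "Q2", "Q3", "Q4", "Q5", "B20", "B40", "T20"}
VALID_RESIDENCE = {"_T", "U", "R"}


def check_get_data_args(countries: list[str], sex: str = "_T",
                        wealth_quintile: Optional[str] = None,
                        residence: Optional[str] = None,
                        fmt: str = "compact", limit: int = 200) -> Optional[dict]:
    """Return None if arguments fit the documented limits, else {"error", "tip"}."""
    checks = [
        (1 <= len(countries) <= 30, "countries must contain 1-30 ISO3 codes",
         "Split the request into batches of at most 30 countries."),
        (all(ISO3.fullmatch(c) for c in countries), f"invalid ISO3 code in {countries}",
         "Use list_countries() to look up codes, e.g. 'BRA' rather than 'BR'."),
        (sex in VALID_SEX, f"invalid sex {sex!r}", "Use '_T', 'M' or 'F'."),
        (wealth_quintile is None or wealth_quintile in VALID_WEALTH,
         f"invalid wealth_quintile {wealth_quintile!r}",
         "Use Q1-Q5, B20, B40 or T20; get_indicator_info() lists what is available."),
        (residence is None or residence in VALID_RESIDENCE,
         f"invalid residence {residence!r}", "Use '_T', 'U' or 'R'."),
        (fmt in {"compact", "full"}, f"invalid format {fmt!r}", "Use 'compact' or 'full'."),
        (1 <= limit <= 500, f"limit {limit} out of range",
         "Use a limit between 1 and 500 and narrow the year range if needed."),
    ]
    for ok, error, tip in checks:
        if not ok:  # report the first failing rule only
            return {"error": error, "tip": tip}
    return None
```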

Response features

  • summary: Value range (min/max/mean), year range, country count

  • disaggregations_in_data: Which dimensions have non-trivial variation

  • total_rows_available vs rows_returned: Pagination metadata

  • tip: Contextual guidance for next steps or narrowing results
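
The pagination fields relate in a simple way:

- total_rows_available counts every matching row.
- rows_returned counts the rows that fit under limit.
- rows_truncated flags the difference.

The sketch below is a minimal version of that bookkeeping. It assumes truncation simply keeps the first limit rows. build_envelope and the tip text are hypothetical, not the server's code.

```python
from typing import Any


def build_envelope(rows: list[Any], limit: int = 200) -> dict[str, Any]:
    """Wrap rows with pagination metadata and add a tip when results are cut off."""
    returned = rows[:limit]
    truncated = len(rows) > limit
    envelope: dict[str, Any] = {
        "total_rows_available": len(rows),
        "rows_returned": len(returned),
        "rows_truncated": truncated,
        "data": returned,
    }
    if truncated:
        envelope["tip"] = (
            f"Showing {limit} of {len(rows)} rows. Narrow the request with "
            "start_year/end_year, fewer countries, or a disaggregation filter."
        )
    return envelope
```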

Prompts

compare_indicators

Pre-built analysis workflow: fetches indicator metadata and data, then produces a structured comparison.

compare_indicators(indicator="CME_MRY0T4", countries="BRA,IND,NGA", start_year="2015", end_year="2023")

write_unicefdata_code

Generate runnable Python, R, or Stata code using the unicefdata package. The AI will call get_api_reference() to get the exact function signatures, then write code matching the user's task.

write_unicefdata_code(
    task="Compare under-5 mortality for Brazil and India, 2015-2023, then plot the trends",
    language="r"
)

This bridges the gap between conversational exploration (via MCP tools) and reproducible analysis scripts (via unicefdata packages).

Benchmark Results

We benchmarked the MCP against a bare LLM (Claude Sonnet 4, no tools) using the EQA metric from Azevedo (2025). The benchmark comprises 300 queries across 10 indicators, 20 countries, 2 prompt types, and 2 hallucination test categories.

Headline numbers (300-query benchmark, v0.5.x)

| Metric | LLM alone | LLM + MCP | Improvement |
|---|---|---|---|
| EQA ("latest" prompt) | 0.172 | 0.984 | 5.7× |
| EQA ("direct" prompt) | 0.121 | 0.995 | 8.2× |
| Indicators at EQA >= 0.95 | 0/10 | 10/10 | |
| T1 hallucination (gap years) | 9% | 7% | -2pp |
| T2 hallucination (never existed) | 11% | 37% raw / ~10% corrected | See analysis |
| Cost per query | $0.003 | $0.018 | |

v0.7.1 same-day clean reproduction (n=500, 2026-05-08)

After v0.7.0 shipped the indicator-name resolver, we re-ran a 500-query subset (100 POSITIVE + 200 T1 + 200 T2) on the per-wave checkpoint architecture (PR #53), with the v0.6.4 baseline run same-day to control for upstream-model snapshot drift:

| Metric | LLM alone | LLM + MCP (v0.7.1) | Δ |
|---|---|---|---|
| POS EQA mean | 0.121 | 0.897 | +77.6 pp (~7×) |
| T1 + T2 hallucination (combined) | 2.0% | 13.0% | +11.0 pp |
| Wall-clock (parallel runs) | 3.8 h (v0.6.4) | 9.2 h (v0.7.1) | +5.4 h |

The LLM-alone (A-side) EQA was within 0.3 pp across the two runs, which indicates that the same-day discipline worked: the LLM + MCP (B-side) delta is real, not upstream snapshot drift. The v0.7.1 reproduction confirms the original 6.7×/8.2× accuracy headline at roughly 7×. It also shows that the v0.4.0 safety layer and the v0.7.0 indicator resolver brought fabrication down from 37% (T2, v0.3.0) to 13% (T1 and T2 combined, v0.7.1). The residual ~11 pp gap relative to the no-tools baseline appears structural, matching what the broader tool-augmented LLM and RAG literature documents (see Limitations).

EQA decomposition (baseline_latest prompt)

| Component | LLM alone | LLM + MCP | Gain |
|---|---|---|---|
| ER (extraction rate) | 0.50 | 1.00 | +0.50 |
| YA (year accuracy) | 0.24 | 0.99 | +0.75 |
| VA (value accuracy) | 0.37 | 1.00 | +0.63 |
| EQA = ER × YA × VA | 0.147 | 0.990 | +0.843 |
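
The LLM-alone EQA entry is not the product of the three component means above it: 0.50 × 0.24 × 0.37 ≈ 0.044, not 0.147. That gap is consistent with ER × YA × VA being scored per query and then averaged. The mean of products differs from the product of means whenever the components are correlated across queries. The sketch below illustrates the effect with toy scores, not benchmark data. It assumes every component is scored 0 when nothing is extracted. eqa_summary is a hypothetical name.

```python
from statistics import mean


def eqa_summary(scores: list[tuple[float, float, float]]) -> dict[str, float]:
    """scores: one (ER, YA, VA) triple per query, each component in [0, 1]."""
    return {
        "ER": mean(er for er, _, _ in scores),
        "YA": mean(ya for _, ya, _ in scores),
        "VA": mean(va for _, _, va in scores),
        # EQA is computed per query first, then averaged.
        "EQA": mean(er * ya * va for er, ya, va in scores),
    }


# Toy data: two fully correct answers and two non-answers.
toy = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
s = eqa_summary(toy)
print(s["EQA"], s["ER"] * s["YA"] * s["VA"])  # prints: 0.5 0.125
```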

Key findings

  1. All 10 indicators at EQA >= 0.95 with MCP, replicated across 40 countries (R1 + R2 with zero overlap). 7 of 10 achieve perfect EQA = 1.000.

  2. Year accuracy is the bare LLM's biggest weakness (YA = 0.24). It cites 2021-2022 as "latest" when IGME 2024 estimates exist. The MCP queries the API and returns the actual latest year.

  3. The direct prompt shows larger MCP gain (+0.722 vs +0.613) because it eliminates YA and isolates pure retrieval accuracy.

  4. T2 hallucination (~37%) is inflated by ground truth misclassification: the SDMX API has IGME mortality data for micro-states that the ground truth pipeline missed. After correction: MCP ~10%, LLM alone ~5%. The remaining hallucination is driven by the confidence effect — Claude overrides tool errors when it has strong domain priors.

  5. The confidence effect: When the MCP tool returns "no data" but the LLM has strong domain priors (e.g., child mortality for well-known countries), it overrides the tool and fabricates anyway. This is a fundamental LLM behavior, not MCP-specific.

3-way comparison (vs sdmx-mcp)

| Metric | LLM alone | unicefstats-mcp | sdmx-mcp |
|---|---|---|---|
| EQA (all positive) | 0.147 | 0.990 | 0.074 |
| T1 hallucination | 9% | 7% | 0% |
| T2 hallucination | 11% | 37% | 0% |
| Cost (300 queries) | $0.89 | $5.47 | $26.20 |
| Avg latency | 5s | 9.8s | 60s |

sdmx-mcp's raw SDMX-JSON output is hard for LLMs to parse (VA = 0.11), but its anti-hallucination guardrails are highly effective (0% fabrication). See Relationship to sdmx-mcp for details.

Full analysis, per-indicator decomposition, and methodology: examples/RESULTS.md

Benchmark data (parquet with full LLM responses): examples/results/

Benchmark design rationale: examples/DESIGN_ISSUES.md

Reproducing the benchmark

# Build ground truth from UNICEF SDMX API
python examples/00_build_ground_truth.py

# Run 200-query benchmark (requires ANTHROPIC_API_KEY, ~$6)
python examples/benchmark_eqa.py

# Add 100 direct-prompt queries to existing run (~$3)
python examples/01_run_direct_supplement.py

Citation

This benchmark uses the EQA metric from:

Azevedo, J.P. (2025). "AI Reliability for Official Statistics: Benchmarking Large Language Models with the UNICEF Data Warehouse." UNICEF Chief Statistician Office. github.com/jpazvd/unicef-sdg-llm-benchmark-dev

Deployment

Local (stdio)

unicefstats-mcp

Remote (SSE)

unicefstats-mcp --transport sse --port 8000

Docker

docker build -t unicefstats-mcp .
docker run -p 8000:8000 unicefstats-mcp

Development

pip install -e ".[dev]"
pytest tests/ -v
ruff check src/ tests/
mypy src/unicefstats_mcp/

Contributing

Contributions are welcome.

Ways to contribute

  • Bug reports: Open an issue with steps to reproduce

  • Feature requests: Suggest new tools, indicators, or output formats via issues

  • Code: Fork, branch, submit a PR — see development setup below

  • Benchmark: Run the EQA benchmark on different models and share results

  • Documentation: Improve examples, fix typos, add use cases

Development setup

git clone https://github.com/jpazvd/unicefstats-mcp.git
cd unicefstats-mcp
pip install -e ".[dev,benchmark]"
pytest tests/ -v
ruff check src/ tests/
mypy src/unicefstats_mcp/

Pull request guidelines

  1. One concern per PR — keep changes focused and reviewable

  2. Include tests for new tools or bug fixes

  3. Run the linter (ruff check) and type checker (mypy) before submitting

  4. Update the README if you change tool signatures or add new features

  5. Do not commit API keys or benchmark result parquets larger than 500KB

Priority areas

See the audit findings for known issues. High-impact areas:

  • MNCH dataflow bug: MNCH_CSEC and MNCH_BIRTH18 return 0 EQA due to a dataflow resolution issue in the unicefdata package

  • T2 hallucination reduction: Further reduce fabrication when API returns no results (currently ~10%; see Limitations)

Limitations and Hallucination Risks

Data limitations

  • Coverage is uneven across indicators, countries, and years. Survey-based indicators (nutrition, education, protection) have 3-5 year gaps between data points by design.

  • Mortality indicators (CME_*) are modeled estimates from the UN Inter-agency Group (IGME), with uncertainty intervals not surfaced in compact output.

  • Not all indicators support all disaggregation dimensions; get_indicator_info() lists what's available per indicator.

  • get_data() caps at 500 rows per call.

Hallucination risks

Benchmark testing (600 queries pooled across two replication samples, 10 indicators, 45 countries) identified two patterns:

| Type | Description | Rate (v0.5.x) | v0.7.1 same-day clean | Mitigation |
|---|---|---|---|---|
| T1 (gap-year) | LLM reports a value for a gap year, when data exists only for other years | ~7% | T1+T2 combined: 13% (n=400) | Server returns the actual year; the LLM occasionally ignores it |
| T2 (forward-of-frontier) | LLM fabricates a value for a year beyond the data frontier | ~36% | (included in the T1+T2 combined rate) | v0.5.0 ships an anti-extrapolation system prompt (unicef://system-prompt) and runtime context (unicef://context); load these at session start |

T2 was historically the dominant risk. It is driven by a confidence effect: having retrieved adjacent-year data, the LLM extrapolates forward. The v0.4.0 safety layer, the v0.5.0 system prompt, and the v0.7.0 indicator resolver together brought fabrication down from 37% (T2, v0.3.0) to 13% (T1 and T2 combined, v0.7.1 same-day clean baseline), a reduction of about 24 pp.

The residual ~11 pp gap relative to the no-tools baseline (2%) appears structural, not a bug we have not yet fixed. This finding aligns with what the broader tool-augmented LLM and RAG literature has been documenting in parallel:

  • The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination (ICLR 2025) — shows the relationship is causal: as models get better at tool use, tool hallucination rises proportionally with capability.

  • Reducing Tool Hallucination via Reliability Alignment (Cao et al., 2024, arXiv:2412.04141) — formalises the failure as tool-selection errors (wrong tool, failed refusal) and tool-usage errors (fabricated parameters).

  • ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability (Sun et al., 2024) — shows mechanistically that an LLM's parametric knowledge can override retrieved context inside the residual stream.

The takeaway for users: server-side guardrails reduce the magnitude of tool-augmented hallucination; they do not, on current evidence, change the direction. Any production deployment should:

  1. Load the unicef://system-prompt and unicef://context resources at session start (handles forward-of-frontier fabrication).

  2. Treat MCP results as best-effort retrieval, not infallible truth — verify load-bearing values against the UNICEF Data Warehouse before citing.

  3. Prefer queries with explicit years ("under-five mortality in Nigeria in 2023") over open-ended ones ("the latest under-five mortality in Nigeria") — the former triggers refusal more reliably when data is absent.
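
The first and third recommendations amount to a temporal-frontier check before answering. The sketch below is a minimal version. It assumes coverage is the get_temporal_coverage() payload shown in the Demo and that current_year comes from the unicef://context resource. check_requested_year is hypothetical. A year inside the covered range can still be missing for a particular country (the T1 gap-year case), so a True result only means the question is not ruled out.

```python
from typing import Any


def check_requested_year(year: int, coverage: dict[str, Any],
                         current_year: int) -> dict[str, Any]:
    """Refuse years outside the data frontier instead of extrapolating."""
    if year > current_year:
        return {"answerable": False,
                "reason": f"{year} is after the current year ({current_year})."}
    if year > coverage["latest_year"]:
        return {"answerable": False,
                "reason": f"Latest available year is {coverage['latest_year']}; "
                          f"report that value instead of estimating {year}."}
    if year < coverage["start_year"]:
        return {"answerable": False,
                "reason": f"Series starts in {coverage['start_year']}."}
    # In range overall, but the country may still have no observation that year.
    return {"answerable": True,
            "reason": "Within coverage; confirm with get_data() for the country."}
```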

Full benchmark methodology: examples/RESULTS.md

Provenance and Ownership

All data served by this MCP originates from the UNICEF Data Warehouse, accessed live via the public SDMX REST API. No observation data is stored or cached — every get_data() call results in a live SDMX request. The indicator and country registries are cached in memory at first access for performance; these are catalogue metadata, not statistical values. The MCP reformats output for LLM consumption but does not alter values.

All releases are published from GitHub Actions using PyPI Trusted Publishing (OIDC). No long-lived API tokens exist. Release provenance is verifiable via PyPI attestations.

For full details on data origin, ownership, distribution pipeline, and interpretation caveats, see PROVENANCE.md.

How to Verify This MCP

| Check | How |
|---|---|
| Source | Repository is jpazvd/unicefstats-mcp on GitHub |
| Package | Run pip show unicefstats-mcp and verify that Home-page points to the canonical repo |
| Version | Run python -c "import unicefstats_mcp; print(unicefstats_mcp.__version__)" and compare with server.json and PyPI |
| Provenance | PyPI attestations link each release to a GitHub Actions workflow |
| Runtime | Call get_server_metadata(); it returns the canonical name, version, publisher, and data source |

License

MIT
