ibge-microdata-mcp
IBGE Microdata MCP Server
Local-first MCP server for discovering, downloading, unpacking, converting, querying, and analyzing official public IBGE microdata.
This project does not host IBGE datasets. It uses IBGE download servers as the source of truth, downloads only files explicitly requested by the user, mirrors them into a local cache, and runs analysis on local files.
Why Local-First
IBGE microdata files are public, but many are large enough that an MCP server should not return them directly in chat responses. The practical workflow is:
discover official files -> inspect size -> download selected archive -> inspect/extract entries -> convert selected variables to Parquet -> query with DuckDB
In plain terms:
DuckDB is a local analytical SQL engine. It can query large local files without running a database server.
Parquet is a compressed columnar file format. Convert fixed-width TXT microdata once, then query only the columns you need.
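To make the conversion step concrete, the sketch below shows how one fixed-width record can be sliced into named columns from a layout, keeping only the selected variables. It is illustrative Python, not the server's implementation: the Field type, the LAYOUT positions, and the sample record are hypothetical, and real positions come from the official IBGE layout file.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int  # 1-based start column, as fixed-width layouts usually state it
    width: int


# Hypothetical layout; real positions come from the official layout file.
LAYOUT = [
    Field("region", 1, 2),
    Field("record_id", 3, 6),
    Field("sample_weight", 9, 5),
]


def parse_line(line: str, layout: list[Field], selected: set[str]) -> dict[str, str]:
    """Slice only the selected fields out of one fixed-width record."""
    row = {}
    for field in layout:
        if field.name in selected:
            begin = field.start - 1  # convert to a 0-based slice offset
            row[field.name] = line[begin:begin + field.width].strip()
    return row


RECORD = "3500004200150"  # made-up record: region, record_id, sample_weight
print(parse_line(RECORD, LAYOUT, {"region", "sample_weight"}))
```

Selecting variables at this stage is what keeps the resulting Parquet file small: columns that are never sliced are never written.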
Core Generic Tools
| Tool | Purpose |
| ibge_microdata_list_surveys | List survey families with convenience support. |
| ibge_microdata_list_files | List known public archive files for supported survey families. |
| | List any official |
| ibge_microdata_discover | Bounded crawl of official IBGE directories to find microdata, data, documentation, and layout files. |
| ibge_microdata_file_info | Read file size, type, update timestamp, and validators with HTTP HEAD. |
| ibge_microdata_download_file | Download or reuse one official IBGE file in a local cache. |
| ibge_microdata_list_cache | List files already downloaded into a local cache with URLs, paths, sizes, and timestamps. |
| ibge_microdata_cleanup_cache | Preview or delete selected cached files using safe filters. |
| ibge_microdata_zip_entries | List files inside a local ZIP archive without extracting all of it. |
| | Extract one selected ZIP entry to a local path. |
| ibge_microdata_inspect_layout | Parse a local IBGE fixed-width input layout and search variables. |
| | Convert a fixed-width TXT file plus official layout into a local Parquet file. |
| ibge_microdata_fixed_width_zip_to_parquet | Convert one fixed-width TXT entry inside a ZIP directly into local Parquet. |
| ibge_microdata_query_parquet | Run bounded read-only DuckDB SQL over local Parquet files exposed as |
| ibge_microdata_query_parquet_views | Run bounded read-only DuckDB SQL over multiple named Parquet views for joins. |
| ibge_microdata_weighted_distribution | Calculate weighted totals, means, group shares, and top-bracket shares over local Parquet views. |
| ibge_microdata_describe_parquet_views | Inspect schemas, row counts, and sample rows for named Parquet views. |
| ibge_microdata_profile_parquet_views | Profile local Parquet views with row counts, null counts, numeric ranges, frequent values, and samples. |
The generic path is discovery, caching, layout inspection, Parquet conversion, profiling, and DuckDB querying. These tools are the main public surface of the server.
Optional Survey-Specific Helpers
These helpers are layered on top of the same local-first workflow. They are useful shortcuts for known IBGE formats, but they are not required for the generic workflow.
| Tool | Purpose |
| ibge_microdata_pof_manifest | Parse a POF Excel dictionary and map record sheets to data ZIP entries. |
| ibge_microdata_pof_zip_record_to_parquet | Convert one POF record from a Dados ZIP to Parquet using the POF dictionary. |
| | PNAD Contínua convenience summary over an extracted fixed-width TXT file. |
| | PNAD Contínua convenience summary directly over a TXT entry inside a ZIP. |
Install
Clone the GitHub repository:
git clone https://github.com/emmanueltsallis/ibge-microdata-mcp.git
cd ibge-microdata-mcp
Install dependencies and build the local MCP server:
pnpm install
pnpm run build
Run
node dist/index.js
Example MCP client config:
{
"mcpServers": {
"ibge-microdata": {
"command": "node",
"args": ["/absolute/path/to/ibge-microdata-mcp/dist/index.js"]
}
}
}
For a shorter generic walkthrough, see examples/generic-workflow.md.
Generic Workflow
Find public files from a known survey family or an official directory:
ibge_microdata_list_surveys({})
ibge_microdata_list_files({
"survey": "pof"
})
ibge_microdata_discover({
"rootUrl": "https://ftp.ibge.gov.br/",
"maxDepth": 3,
"maxDirectories": 50
})
Inspect file metadata before downloading:
ibge_microdata_file_info({
"url": "https://ftp.ibge.gov.br/path/to/public/archive.zip"
})
Download to a local cache:
ibge_microdata_download_file({
"url": "https://ftp.ibge.gov.br/path/to/public/archive.zip",
"cacheRoot": "/Users/you/.cache/ibge-microdata-mcp"
})
The downloader mirrors the official ftp.ibge.gov.br path under cacheRoot. On repeated calls, it checks IBGE content-length metadata first and returns a cache hit when the existing local file has the expected byte size.
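The mirroring and cache-hit rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the server's code: mirror_path and is_cache_hit are hypothetical names, and the expected byte size stands in for the content-length value that a HEAD request would return.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def mirror_path(cache_root: str, url: str) -> Path:
    """Map an official URL to <cache_root>/<host>/<path>."""
    parsed = urlparse(url)
    return Path(cache_root) / parsed.netloc / parsed.path.lstrip("/")


def is_cache_hit(local_path: Path, expected_bytes: int) -> bool:
    """Reuse the local file only if it exists with the expected byte size."""
    return local_path.is_file() and local_path.stat().st_size == expected_bytes


URL = "https://ftp.ibge.gov.br/path/to/public/archive.zip"

with tempfile.TemporaryDirectory() as root:
    target = mirror_path(root, URL)
    print("before download:", is_cache_hit(target, 10))
    target.parent.mkdir(parents=True)
    target.write_bytes(b"x" * 10)  # stand-in for the downloaded archive
    print("after download:", is_cache_hit(target, 10))
```

A size comparison is cheap but cannot detect a file that changed without changing length; the validators returned by the file-info tool would be the place to look if that matters.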
List the cache later if you need to rediscover local paths:
ibge_microdata_list_cache({
"cacheRoot": "/Users/you/.cache/ibge-microdata-mcp",
"limit": 50,
"offset": 0
})
Preview cache cleanup when storage grows:
ibge_microdata_cleanup_cache({
"cacheRoot": "/Users/you/.cache/ibge-microdata-mcp",
"dryRun": true,
"olderThanDays": 30,
"minBytes": 100000000
})
The cleanup tool defaults to dryRun: true, requires at least one filter, and only considers files under cacheRoot/ftp.ibge.gov.br. Set dryRun: false only after reviewing the preview.
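As a rough illustration of the preview step, the Python sketch below selects cleanup candidates from a list of cached files. It assumes that the age and size filters combine with AND, which the text above does not state; CachedFile and select_for_cleanup are hypothetical names, and nothing is deleted.

```python
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 86_400


@dataclass
class CachedFile:
    path: str
    size_bytes: int
    modified_at: float  # Unix timestamp of the last modification


def select_for_cleanup(
    files: list[CachedFile],
    now: float,
    older_than_days: Optional[float] = None,
    min_bytes: Optional[int] = None,
) -> list[CachedFile]:
    """Return files matching every given filter (assumed AND semantics)."""
    if older_than_days is None and min_bytes is None:
        raise ValueError("at least one filter is required")
    selected = []
    for f in files:
        age_seconds = now - f.modified_at
        if older_than_days is not None and age_seconds < older_than_days * SECONDS_PER_DAY:
            continue  # too recent
        if min_bytes is not None and f.size_bytes < min_bytes:
            continue  # too small
        selected.append(f)
    return selected


NOW = 100 * SECONDS_PER_DAY
FILES = [
    CachedFile("a.zip", 200_000_000, 10 * SECONDS_PER_DAY),  # old and large
    CachedFile("b.zip", 50_000_000, 10 * SECONDS_PER_DAY),   # old but small
    CachedFile("c.zip", 200_000_000, 95 * SECONDS_PER_DAY),  # large but recent
]
preview = select_for_cleanup(FILES, NOW, older_than_days=30, min_bytes=100_000_000)
print([f.path for f in preview])
```

Requiring at least one filter means an empty argument list can never match the whole cache by accident.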
Inspect archive contents:
ibge_microdata_zip_entries({
"zipPath": "/Users/you/.cache/ibge-microdata-mcp/ftp.ibge.gov.br/path/to/public/archive.zip"
})
Inspect a fixed-width layout and choose variables:
ibge_microdata_inspect_layout({
"layoutPath": "/path/to/official-input-layout.txt",
"search": "weight",
"limit": 50
})
Convert selected variables to Parquet:
ibge_microdata_fixed_width_zip_to_parquet({
"layoutPath": "/path/to/official-input-layout.txt",
"zipPath": "/Users/you/.cache/ibge-microdata-mcp/ftp.ibge.gov.br/path/to/public/archive.zip",
"entryName": "MICRODATA.txt",
"outputPath": "/Users/you/.cache/ibge-microdata-mcp/converted/sample.parquet",
"selectedVariables": ["record_id", "region", "sample_weight", "target_value"]
})
Profile the Parquet file before writing custom SQL:
ibge_microdata_profile_parquet_views({
"views": [
{
"name": "microdata",
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/sample.parquet"]
}
],
"columns": ["region", "sample_weight", "target_value"],
"topK": 10,
"sampleRows": 3
})
If columns is omitted, the tool profiles the first 25 columns by default. This keeps wide microdata files manageable while still giving enough information to choose variables and write queries.
Query the Parquet file with DuckDB:
ibge_microdata_query_parquet({
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/sample.parquet"],
"sql": "select region, sum(sample_weight * target_value) / sum(sample_weight) as weighted_mean from microdata group by region order by region",
"maxRows": 100
})
The query tools accept only SELECT or WITH queries, reject semicolons and write-oriented keywords, and cap returned rows.
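A guard of this kind might look like the Python sketch below. It is a conservative illustration, not the server's actual validator: is_allowed_query and the WRITE_KEYWORDS list are assumptions, and a keyword scan like this one will also reject harmless queries whose string literals happen to contain a listed word.

```python
import re

# Hypothetical keyword list; the server's real list is not given in this document.
WRITE_KEYWORDS = {
    "insert", "update", "delete", "drop", "create", "alter",
    "copy", "attach", "install", "load", "export",
}


def is_allowed_query(sql: str) -> bool:
    """Accept a single SELECT/WITH statement with no write-oriented keywords."""
    text = sql.strip()
    if ";" in text:
        return False  # no statement chaining
    if not re.match(r"(select|with)\b", text, re.IGNORECASE):
        return False  # must start as a read query
    words = re.findall(r"[a-z_]+", text.lower())
    return not any(word in WRITE_KEYWORDS for word in words)


QUERY = (
    "select region, sum(sample_weight * target_value) / sum(sample_weight) "
    "as weighted_mean from microdata group by region order by region"
)
print(is_allowed_query(QUERY))
print(is_allowed_query("select 1; drop table microdata"))
```

Matching whole identifiers rather than substrings lets a column such as updated_at pass while a bare update is rejected.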
Weighted Distributions
Use ibge_microdata_weighted_distribution when a Parquet file contains one row per analytical unit, a numeric value column, and a numeric survey/sample weight column:
ibge_microdata_weighted_distribution({
"views": [
{
"name": "microdata",
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/sample.parquet"]
}
],
"unitSql": "select region, target_value as value, sample_weight as weight from microdata",
"valueColumn": "value",
"weightColumn": "weight",
"groupColumn": "region",
"topPercents": [0.01, 0.05, 0.1]
})
The tool ranks units by the value column, applies weights, reports total weight, total value, weighted mean, optional group shares, and top-bracket shares. If a top bracket cuts through tied values at the cutoff, the tied bucket is allocated proportionally.
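The top-bracket calculation can be illustrated with a small Python sketch. It shows one reading of proportional allocation: units with the same value are pooled into a bucket, and when the bracket boundary falls inside a bucket only the needed fraction of that bucket's weight is counted. The function name top_share and the sample units are hypothetical, and the server's exact arithmetic may differ.

```python
import math
from collections import defaultdict


def top_share(units: list[tuple[float, float]], top_percent: float) -> float:
    """Share of total weighted value held by the top `top_percent` of weight.

    `units` holds (value, weight) pairs.
    """
    total_weight = sum(w for _, w in units)
    total_value = sum(v * w for v, w in units)
    if total_value == 0:
        raise ValueError("total weighted value is zero")

    # Pool tied values so a bracket boundary splits the whole tie evenly.
    buckets: dict[float, float] = defaultdict(float)
    for value, weight in units:
        buckets[value] += weight

    remaining = top_percent * total_weight  # weight still to be allocated
    top_value = 0.0
    for value in sorted(buckets, reverse=True):
        if remaining <= 0:
            break
        take = min(buckets[value], remaining)  # partial take at the cutoff
        top_value += value * take
        remaining -= take
    return top_value / total_value


# Made-up data: two units tie at value 50.
UNITS = [(100.0, 1.0), (50.0, 1.0), (50.0, 1.0), (10.0, 2.0)]
print(top_share(UNITS, 0.2))
print(top_share(UNITS, 0.4))
assert math.isclose(top_share(UNITS, 1.0), 1.0)
```

In the sample data the top 40 percent of weight covers the unit worth 100 plus half of the tied bucket at 50, so the result does not depend on which tied row happens to sort first.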
Relational Records
Some surveys publish multiple record files. Convert each record to Parquet, inspect the resulting schemas, then join named views:
ibge_microdata_describe_parquet_views({
"views": [
{
"name": "record_a",
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/record_a.parquet"]
},
{
"name": "record_b",
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/record_b.parquet"]
}
],
"includeRowCounts": true,
"sampleRows": 3
})
ibge_microdata_query_parquet_views({
"views": [
{
"name": "record_a",
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/record_a.parquet"]
},
{
"name": "record_b",
"parquetPaths": ["/Users/you/.cache/ibge-microdata-mcp/converted/record_b.parquet"]
}
],
"sql": "select a.region, count(*) as rows from record_a a join record_b b using (record_id) group by a.region order by a.region",
"maxRows": 100
})
POF Dictionaries
POF editions use Excel dictionary workbooks. Use the manifest tool to map dictionary sheets to TXT entries before converting records:
ibge_microdata_pof_manifest({
"dictionaryPath": "/path/to/dictionary.xls",
"dataZipPath": "/Users/you/.cache/ibge-microdata-mcp/ftp.ibge.gov.br/path/to/Dados.zip",
"search": "weight",
"variableLimit": 20
})
ibge_microdata_pof_zip_record_to_parquet({
"dictionaryPath": "/path/to/dictionary.xls",
"zipPath": "/Users/you/.cache/ibge-microdata-mcp/ftp.ibge.gov.br/path/to/Dados.zip",
"recordName": "Domicílio",
"outputPath": "/Users/you/.cache/ibge-microdata-mcp/converted/pof_record.parquet",
"selectedVariables": ["UF", "ESTRATO_POF", "TIPO_SITUACAO_REG"]
})
The POF converter applies implied decimal scaling from the dictionary and writes DuckDB-queryable Parquet files.
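Implied decimal scaling means the fixed-width field stores digits without a decimal point and the dictionary states how many of them are decimals. The Python sketch below illustrates the idea; scale_implied_decimal is a hypothetical helper, and treating blank fields as missing is an assumption rather than documented converter behavior.

```python
from decimal import Decimal
from typing import Optional


def scale_implied_decimal(raw: str, decimals: int) -> Optional[Decimal]:
    """Turn raw digits such as '0001234' into 12.34 when decimals == 2."""
    digits = raw.strip()
    if not digits:
        return None  # assumption: a blank field means a missing value
    # scaleb shifts the exponent, which avoids binary floating-point rounding.
    return Decimal(digits).scaleb(-decimals)


print(scale_implied_decimal("0001234", 2))
print(scale_implied_decimal("   ", 2))
```

Using Decimal keeps monetary and weight values exact until they are written out, which matters when many small values are later summed.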
Tests
Offline unit tests:
pnpm test
Live smoke tests against official IBGE endpoints:
RUN_IBGE_SMOKE=1 pnpm test -- tests/smoke.test.ts
Smoke tests list official directories, read HEAD metadata, and download the smaller POF documentation ZIP to verify dictionary parsing. They do not download large microdata data ZIPs.
Current Limits
This is a local-first MCP server, not a hosted warehouse of all IBGE microdata.
Discovery is deliberately bounded; broad root crawls should use explicit maxDepth and maxDirectories values to avoid excessive requests.
Generic fixed-width conversion, Parquet profiling/querying, weighted distribution summaries, and POF dictionary conversion are implemented.
Additional survey-specific harmonized recipes can be added as optional layers without changing the generic workflow.
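To show why the discovery bounds mentioned in the limits above keep request counts predictable, here is a small Python sketch of a breadth-first crawl capped by depth and by number of directories. It runs over an in-memory tree instead of real IBGE directories; bounded_crawl, list_dir, and TREE are hypothetical, and the exact meaning the server gives to maxDepth and maxDirectories is an assumption.

```python
from collections import deque
from typing import Callable


def bounded_crawl(
    root: str,
    list_dir: Callable[[str], list[str]],
    max_depth: int,
    max_directories: int,
) -> list[str]:
    """Visit directories breadth-first, stopping at either bound."""
    visited: list[str] = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue and len(visited) < max_directories:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not list children below the depth limit
        for child in list_dir(url):  # one listing request per expanded directory
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return visited


# Made-up directory tree standing in for a remote listing.
TREE = {
    "/": ["/a/", "/b/"],
    "/a/": ["/a/x/"],
    "/b/": [],
    "/a/x/": ["/a/x/deep/"],
    "/a/x/deep/": [],
}


def fake_list_dir(url: str) -> list[str]:
    return TREE.get(url, [])


print(bounded_crawl("/", fake_list_dir, max_depth=1, max_directories=50))
print(bounded_crawl("/", fake_list_dir, max_depth=3, max_directories=2))
```

Because a directory is listed only after it has been counted as visited, the number of listing requests never exceeds max_directories.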
License
MIT. See LICENSE.