get_data
Retrieve curated ATO and ACNC data on personal tax, company tax, charity finances, and super contributions. Filter by location, time period, and measure to get targeted observations.
Instructions
Query a curated ATO/ACNC dataset and return observations.
Examples:
# Median taxable income in postcode 2000 (Sydney CBD), 2022-23
resp = await get_data(
    "IND_POSTCODE_MEDIAN",
    filters={"state": "nsw", "postcode": "2000"},
    measures="median_taxable_income_2022_23",
)
# All registered charities in NSW with size = "large"
resp = await get_data(
    "ACNC_REGISTER",
    filters={"state": "NSW", "charity_size": "Large"},
    measures=["total_gross_income", "total_employees"],
)
# 500 ACNC charity financial records (huge dataset; cap to fit context)
resp = await get_data(
    "ACNC_AIS_FINANCIALS",
    filters={"state": "NSW"},
    limit=500,
)
# 2023-24 corporate tax payable for entities with total income > $1B
resp = await get_data("CORP_TRANSPARENCY", filters={"income_year": "2023-24"})
Returns:
DataResponse with records (or csv), unit, period bounds, row_count,
source URL, and CC-BY attribution. truncated_at is set when the
underlying slice was larger than limit.
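
A caller may want to notice when a response was capped. The following is a minimal sketch, not the tool's own code: `FakeResponse` is a hypothetical stand-in carrying only the two fields the sketch reads, `truncation_note` is an illustrative helper name, and the row counts are invented for the demonstration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for DataResponse; only the fields used here."""
    records: list = field(default_factory=list)
    truncated_at: Optional[int] = None  # original row count when capped


def truncation_note(resp) -> Optional[str]:
    """Return a short warning when the slice was capped, else None."""
    if resp.truncated_at is None:
        return None
    returned = len(resp.records)
    return (
        f"showing {returned} of {resp.truncated_at} rows; "
        "narrow filters or raise limit"
    )


# Invented numbers, for illustration only.
capped = FakeResponse(records=[{}] * 500, truncated_at=812000)
full = FakeResponse(records=[{}] * 42)

print(truncation_note(capped))
print(truncation_note(full))
```

Counting `len(resp.records)` rather than reading `row_count` keeps the sketch independent of how `row_count` is defined for truncated responses.
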
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Optional cap on the number of records returned. Useful for register-shaped datasets where a filtered slice can still be very large (ACNC_AIS_FINANCIALS: ~50k charities × 16 measures = 800k+ records; ACNC_REGISTER: ~65k charities). Without a cap, the response can overflow an agent's context window. Truncated responses set DataResponse.truncated_at to the original row count. Default None means no cap, subject to the portfolio-wide 100k hard ceiling for pathological cases. | |
| format | No | Response shape. 'records' (default): flat list of observations. 'series': grouped by measure. 'csv': pandas CSV string in `csv` field. | records |
| filters | No | Dimension filters. Keys are plain-English aliases from the dataset's describe_dataset response. Values are matched against the source data; pass a list to OR across values. Examples: {'state': 'nsw'}, {'postcode': '2000'}, {'industry_broad': ['A', 'B']}. | |
| measures | No | Which measure(s) to return. Plain-English keys from describe_dataset. Omit to return all measures. | |
| dataset_id | Yes | Curated dataset ID. Use search_datasets() / list_curated(). | |
| end_period | No | Inclusive end period. Same format as start_period. | |
| start_period | No | Inclusive start period for transposed time-series datasets (GST_MONTHLY etc). Ignored for wide single-year tables. Format: 'YYYY' or 'YYYY-MM' or ATO FY 'YYYY-YY'. Bare int years like 2020 are coerced to '2020' automatically. | |
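
The period formats in the table can be checked on the client before a call is made. This is a sketch under stated assumptions: `normalize_period` and `build_arguments` are hypothetical helper names, not part of the tool, and the check only verifies the shape of the string. It does not reproduce whatever validation the server performs.

```python
import re

# Shapes listed for start_period: 'YYYY', 'YYYY-MM', or ATO FY 'YYYY-YY'.
# The last two share one shape, so this check does not tell them apart.
_PERIOD_SHAPE = re.compile(r"\d{4}(-\d{2})?")


def normalize_period(value):
    """Hypothetical client-side helper: coerce bare int years, check shape."""
    if isinstance(value, bool):
        raise TypeError("period must be an int year or a string")
    if isinstance(value, int):
        value = str(value)
    if not isinstance(value, str) or not _PERIOD_SHAPE.fullmatch(value):
        raise ValueError(f"unrecognised period: {value!r}")
    return value


def build_arguments(dataset_id, start=None, end=None):
    """Assemble keyword arguments using the parameter names from the table."""
    args = {"dataset_id": dataset_id}
    if start is not None:
        args["start_period"] = normalize_period(start)
    if end is not None:
        args["end_period"] = normalize_period(end)
    return args


args = build_arguments("GST_MONTHLY", start=2020, end="2021-06")
print(args)
```
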
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| csv | No | | |
| unit | No | | |
| query | No | | |
| stale | No | | |
| period | No | | |
| source | No | Australian Taxation Office | |
| ato_url | Yes | Click-through URL for this dataset's source page. ato-mcp legacy name — prefer source_url (canonical) for new code. Both fields are populated identically. | |
| records | No | | |
| row_count | No | | |
| dataset_id | Yes | | |
| source_url | Yes | Canonical click-through URL. Same value as ato_url; both populated for backward compat. | |
| attribution | No | Data sourced from the Australian Taxation Office (and, for charity data, the Australian Charities and Not-for-profits Commission) via data.gov.au. Licensed under Creative Commons Attribution 3.0 Australia (CC BY 3.0 AU). https://creativecommons.org/licenses/by/3.0/au/ | |
| dataset_name | Yes | | |
| retrieved_at | Yes | | |
| stale_reason | No | | |
| truncated_at | No | | |
| server_version | No | | |
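
Since `source_url` is the canonical field and `ato_url` is kept for backward compatibility, new code can read the former and fall back to the latter. The sketch below assumes the response has been converted to a plain dict (for example from JSON). `citation_line` is a hypothetical helper, and every value in `sample` is a placeholder rather than real tool output.

```python
def citation_line(resp: dict) -> str:
    """Build a one-line citation from a response dict (illustrative only)."""
    # Prefer the canonical field; fall back to the legacy name.
    url = resp.get("source_url") or resp.get("ato_url") or ""
    parts = [resp.get("dataset_name", ""), resp.get("attribution", ""), url]
    # Skip empty parts so optional fields can be absent.
    return " | ".join(p for p in parts if p)


# Placeholder values, not real output from the tool.
sample = {
    "dataset_id": "CORP_TRANSPARENCY",
    "dataset_name": "Example dataset name",
    "ato_url": "https://example.org/dataset",
    "source_url": "https://example.org/dataset",
    "attribution": "CC BY 3.0 AU",
}

print(citation_line(sample))
```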