get_data
Retrieve WGEA dataset observations by dataset ID and filters such as employer name, reporting year, and period, returning records or CSV.
Instructions
Query a curated WGEA dataset and return observations.
Examples:

    # Gender breakdown at Commonwealth Bank
    resp = await get_data(
        "WORKFORCE_COMPOSITION",
        filters={"employer_name": "Commonwealth Bank"},
    )

    # Promotions to manager by gender at Westpac in 2024-25
    resp = await get_data(
        "WORKFORCE_MANAGEMENT",
        filters={"employer_name": "Westpac", "movement_type": "Promotions",
                 "manager_category": "Managers"},
        start_period="2024-25",
        end_period="2024-25",
    )

    # Which employers in mining set gender targets?
    resp = await get_data(
        "GENDER_EQUALITY_ACTIONS",
        filters={"anzsic_division": "Mining",
                 "section": "Gender Pay Gap",
                 "response": "Yes"},
    )

    # Sexual harassment policy responses across financial services
    resp = await get_data(
        "HARM_PREVENTION",
        filters={"anzsic_division": "Financial and Insurance Services",
                 "subsection": "Sexual Harassment"},
    )

Returns: DataResponse with records (or csv), unit, reporting_year, row_count, the source URL, the actual download_url used, "did you mean?" fuzzy hints if the employer-name filter did not match exactly, and CC-BY 3.0 AU attribution.
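
The "did you mean?" hints described above suggest a simple retry pattern when an employer-name filter matches nothing. The sketch below is illustrative only: it assumes the response has already been converted to a plain dict and that `did_you_mean` is a list of candidate name strings, neither of which is specified here. The helper name `suggest_retry` and the sample payload are hypothetical.

```python
from typing import Any, Optional


def suggest_retry(response: dict[str, Any]) -> Optional[str]:
    """Return an employer name worth retrying with, or None.

    Assumption: `did_you_mean` is a list of candidate name strings.
    The real field shape is not documented in this section.
    """
    hints = response.get("did_you_mean") or []
    # Only suggest a retry when the query came back empty and hints exist.
    if response.get("row_count", 0) == 0 and hints:
        return hints[0]
    return None


# Hypothetical payload, invented for illustration (not real WGEA output).
empty_response = {
    "row_count": 0,
    "records": [],
    "did_you_mean": ["Commonwealth Bank of Australia"],
}

retry_name = suggest_retry(empty_response)
if retry_name is not None:
    print(f"No exact match; consider retrying with employer_name={retry_name!r}")
```

A caller could feed the returned name back into `filters={"employer_name": ...}` on a second `get_data` call.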
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset_id | Yes | Curated dataset ID. Use search_datasets() or list_curated() to find valid IDs. | |
| filters | No | Dimension filters. Keys are plain-English aliases from the dataset's describe_dataset response. Values are matched against the source data; pass a list to OR across values. Permissive dimensions (e.g. employer_name, question_text) accept any string and support fuzzy matching — try {'employer_name': 'CBA'} or {'employer_name': 'commonwealth*'} for wildcard substring search. | |
| start_period | No | Inclusive start reporting year. Format: 'YYYY-YY' (e.g. '2023-24') or 'YYYY' (matched against WGEA's reporting_year column). Bare int years like 2023 are coerced to '2023' automatically. | |
| end_period | No | Inclusive end reporting year. Same format as start_period. | |
| format | No | Response shape. 'records' (default): flat list of observations. 'series': grouped by measure. 'csv': pandas CSV string in `csv` field. | records |
| max_rows | No | Cap on returned rows after filtering. Default 2000. Max 10000. Tighten filters to narrow further. | 2000 |
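
To make the input rules above concrete, here is a minimal client-side sketch that assembles an argument dict before calling the tool. It mirrors the documented constraints (period format `'YYYY-YY'` or `'YYYY'`, bare ints coerced to strings, the three `format` values, `max_rows` capped at 10000). The helpers `normalize_period` and `build_args` are hypothetical and not part of the server, which may validate differently.

```python
import re
from typing import Any, Optional, Union

# Accepts 'YYYY' or 'YYYY-YY', as described for start_period / end_period.
_PERIOD_RE = re.compile(r"^\d{4}(-\d{2})?$")
_FORMATS = {"records", "series", "csv"}


def normalize_period(value: Union[int, str, None]) -> Optional[str]:
    """Coerce a bare int year to a string and check the period format."""
    if value is None:
        return None
    text = str(value)
    if not _PERIOD_RE.match(text):
        raise ValueError(f"period must be 'YYYY' or 'YYYY-YY', got {value!r}")
    return text


def build_args(
    dataset_id: str,
    filters: Optional[dict[str, Any]] = None,
    start_period: Union[int, str, None] = None,
    end_period: Union[int, str, None] = None,
    fmt: str = "records",
    max_rows: int = 2000,
) -> dict[str, Any]:
    """Assemble get_data arguments (hypothetical client-side helper)."""
    if fmt not in _FORMATS:
        raise ValueError(f"format must be one of {sorted(_FORMATS)}")
    if not 1 <= max_rows <= 10000:
        raise ValueError("max_rows must be between 1 and 10000")
    args: dict[str, Any] = {
        "dataset_id": dataset_id,
        "format": fmt,
        "max_rows": max_rows,
    }
    if filters:
        args["filters"] = dict(filters)
    for key, raw in (("start_period", start_period), ("end_period", end_period)):
        period = normalize_period(raw)
        if period is not None:
            args[key] = period
    return args


# A list value ORs across employers; the int year 2023 becomes '2023'.
args = build_args(
    "WORKFORCE_MANAGEMENT",
    filters={"employer_name": ["Westpac", "Commonwealth Bank"],
             "movement_type": "Promotions"},
    start_period=2023,
    end_period="2024-25",
    fmt="csv",
    max_rows=500,
)
print(args)
```

The resulting dict could then be unpacked into a call such as `await get_data(**args)`, assuming the client exposes the parameters as keyword arguments.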
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset_id | Yes | ||
| dataset_name | Yes | ||
| query | No | ||
| reporting_year | No | ||
| period | No | Canonical period bounds {start, end} for cross-sister consumers. Populated alongside the wgea-specific reporting_year. For a single reporting year both bounds match; for multi-year spans they bracket the range. | |
| unit | No | ||
| row_count | No | ||
| records | No | ||
| csv | No | ||
| source | No | Workplace Gender Equality Agency | |
| attribution | No | Source: Workplace Gender Equality Agency. Licensed under Creative Commons Attribution 3.0 Australia (https://creativecommons.org/licenses/by/3.0/au/). Original dataset: https://data.gov.au/data/dataset/wgea-dataset | |
| retrieved_at | Yes | ||
| source_url | Yes | ||
| download_url | No | ||
| did_you_mean | No | ||
| stale | No | ||
| stale_reason | No | ||
| truncated_at | No | ||
| server_version | No | ||
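
Since most output fields are optional, a consumer may want to read them defensively. The sketch below shows one way to load rows from either the `csv` or `records` field and to surface the `stale` and `truncated_at` caveats. It assumes the response is a plain dict, that `stale` is a boolean, and that `truncated_at` is a row count or null; none of these types are stated in the schema above. The sample payload and its column names are invented for illustration.

```python
import csv
import io
from typing import Any


def rows_from_response(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return observations as dicts, whichever format was requested."""
    csv_text = response.get("csv")
    if csv_text:
        # format='csv': parse the CSV string held in the `csv` field.
        return list(csv.DictReader(io.StringIO(csv_text)))
    # format='records': the list is already in the `records` field.
    return list(response.get("records") or [])


def caveats(response: dict[str, Any]) -> list[str]:
    """Collect warnings a caller should show next to the data.

    Assumption: `stale` is a bool and `truncated_at` is an int or None.
    """
    notes = []
    if response.get("stale"):
        reason = response.get("stale_reason") or "no reason given"
        notes.append(f"stale: {reason}")
    if response.get("truncated_at") is not None:
        notes.append(f"truncated at {response['truncated_at']} rows")
    return notes


# Hypothetical payload; column names and values are made up.
sample = {
    "csv": "employer_name,gender,value\n"
           "Example Pty Ltd,Women,10\n"
           "Example Pty Ltd,Men,12\n",
    "row_count": 2,
    "stale": True,
    "stale_reason": "cache fallback",
    "truncated_at": None,
}

rows = rows_from_response(sample)
print(len(rows), caveats(sample))
```

Whatever shape the data is consumed in, the `attribution` string should travel with it to satisfy the CC-BY 3.0 AU licence.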