
get_data

Retrieve WGEA dataset observations by dataset ID and filters such as employer name, reporting year, and period, returning records or CSV.

Instructions

Query a curated WGEA dataset and return observations.

Examples:

# Gender breakdown at Commonwealth Bank
resp = await get_data(
    "WORKFORCE_COMPOSITION",
    filters={"employer_name": "Commonwealth Bank"},
)

# Promotions to manager by gender at Westpac in 2024-25
resp = await get_data(
    "WORKFORCE_MANAGEMENT",
    filters={"employer_name": "Westpac", "movement_type": "Promotions",
             "manager_category": "Managers"},
)

# Which employers in mining set gender targets?
resp = await get_data(
    "GENDER_EQUALITY_ACTIONS",
    filters={"anzsic_division": "Mining",
             "section": "Gender Pay Gap",
             "response": "Yes"},
)

# Sexual harassment policy responses across financial services
resp = await get_data(
    "HARM_PREVENTION",
    filters={"anzsic_division": "Financial and Insurance Services",
             "subsection": "Sexual Harassment"},
)

Returns: DataResponse with records (or csv), unit, reporting_year, row_count, source URL, the actual download_url used, "did you mean?" fuzzy hints if the employer-name filter didn't match exactly, and CC-BY 3.0 AU attribution.
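
For example, a caller might check the fuzzy hints before trusting an empty result. This is a minimal sketch, assuming attribute-style access on DataResponse and that did_you_mean is a list of suggested employer names (neither is guaranteed by the schema below):

# Hedged sketch: retry with a fuzzy suggestion when the name didn't match.
resp = await get_data(
    "WORKFORCE_COMPOSITION",
    filters={"employer_name": "CBA"},
)
if resp.did_you_mean:
    # Exact match failed; retry with the first suggested employer name.
    resp = await get_data(
        "WORKFORCE_COMPOSITION",
        filters={"employer_name": resp.did_you_mean[0]},
    )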

Input Schema

dataset_id (required)
    Curated dataset ID. Use search_datasets() / list_curated().

filters (optional)
    Dimension filters. Keys are plain-English aliases from the dataset's describe_dataset response. Values are matched against the source data; pass a list to OR across values. Permissive dimensions (e.g. employer_name, question_text) accept any string and support fuzzy matching: try {'employer_name': 'CBA'} or {'employer_name': 'commonwealth*'} for wildcard substring search.

start_period (optional)
    Inclusive start reporting year. Format: 'YYYY-YY' (e.g. '2023-24') or 'YYYY' (matched against WGEA's reporting_year column). Bare int years like 2023 are coerced to '2023' automatically.

end_period (optional)
    Inclusive end reporting year. Same format as start_period.

format (optional, default: 'records')
    Response shape. 'records': flat list of observations. 'series': grouped by measure. 'csv': pandas CSV string in the `csv` field.

max_rows (optional, default: 2000)
    Cap on returned rows after filtering. Max 10000. Tighten filters to narrow further.
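
Putting these parameters together, a single call might look like the sketch below. The filter values are illustrative; a given dataset's accepted aliases come from its describe_dataset response:

# Illustrative sketch: OR-list filter, wildcard match, period range, CSV output.
resp = await get_data(
    "WORKFORCE_COMPOSITION",
    filters={
        "employer_name": "commonwealth*",               # wildcard substring search
        "anzsic_division": ["Mining", "Construction"],  # a list ORs across values
    },
    start_period="2022-23",  # inclusive, 'YYYY-YY' format
    end_period="2024-25",
    format="csv",            # CSV string returned in the `csv` field
    max_rows=5000,           # cap after filtering; default 2000, max 10000
)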

Output Schema

dataset_id (required)
dataset_name (required)
query (optional)
reporting_year (optional)

period (optional)
    Canonical period bounds {start, end} for cross-sister consumers. Populated alongside the WGEA-specific reporting_year. For a single reporting year both bounds match; for multi-year spans they bracket the range.

unit (optional)
row_count (optional)
records (optional)
csv (optional)

source (optional, default: Workplace Gender Equality Agency)

attribution (optional, default: "Source: Workplace Gender Equality Agency. Licensed under Creative Commons Attribution 3.0 Australia (https://creativecommons.org/licenses/by/3.0/au/). Original dataset: https://data.gov.au/data/dataset/wgea-dataset")

retrieved_at (required)
source_url (required)
download_url (optional)
did_you_mean (optional)
stale (optional)
stale_reason (optional)
truncated_at (optional)
server_version (optional)
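
A consumer might act on the metadata fields before using the records. Another hedged sketch, assuming attribute-style access and that period is a plain {start, end} mapping:

# Hedged sketch: inspect response metadata before consuming the records.
if resp.stale:
    print("Data may be outdated:", resp.stale_reason)
if resp.truncated_at:
    print(f"Row cap hit at {resp.truncated_at}; tighten filters for full results")
period = resp.period or {}
print("Covers", period.get("start"), "to", period.get("end"))
print(resp.attribution)  # CC BY 3.0 AU attribution should accompany reuse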
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description fully bears the transparency burden. It clearly explains the read-only nature of the tool (it only queries data), the return structure (DataResponse with records, csv, metadata, fuzzy hints, attribution), and the filtering behavior (permissive dimensions, fuzzy matching, wildcard support, OR lists). No destructive behavior or side effects are implied.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with a clear purpose, followed by examples and return details. While it is somewhat long (it includes four multi-line examples), every section adds value. It could be slightly more concise, but the examples greatly aid understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema exists), the description is complete. It covers all parameter usage, return fields, constraints (max_rows, period format), and even provides fuzzy matching hints. The existence of an output schema means the return description is sufficient without detailing every field.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. However, the description adds significant value beyond schema descriptions: it provides concrete examples for dataset_id and filters, explains fuzzy matching and wildcard syntax in filters, details period format with coercion, and lists format options. This goes well beyond what the schema alone provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Query a curated WGEA dataset and return observations') and provides multiple examples showing usage. It distinguishes itself from siblings like list_curated (which lists datasets) and describe_dataset (which describes a dataset's schema) by focusing on data retrieval with filtering and formatting options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives (e.g., search_datasets for finding datasets, top_n for simple top records). Usage is implied through examples, but no direct 'use for' or 'use when' statements are present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

