US Government Open Data MCP

Overview Schema Related Servers Score Discussions

naep_compare_groups

Analyze NAEP achievement gaps between demographic groups like race, gender, and poverty with statistical significance testing to identify educational disparities.

Instructions

Compare NAEP scores across demographic groups (race, gender, poverty) with significance testing. Shows achievement gaps between groups (e.g., White vs Black, Male vs Female, eligible vs not eligible for free lunch).

Input Schema

TableJSON Schema

Name	Required	Description
`subject`	Yes	Subject: 'reading', 'math', 'science', 'writing', 'civics', 'history', 'geography', 'economics', 'tel', 'music'. Aliases accepted.
`grade`	Yes	Grade: 4, 8, or 12. Math: 4,8 only. Economics/TEL/Music: 8 or 12 only.
`variable`	Yes	'SDRACE' (race gap), 'GENDER' (gender gap), 'SLUNCH3' (poverty gap), 'IEP' (disability gap), 'LEP' (ELL gap)
`jurisdiction`	No	'NP' (default), or state codes
`year`	No	Year: '2022'. Default: most recent

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. While it mentions 'significance testing' (a valuable behavioral detail), it lacks other important information: what format the comparison results take (tabular, visual, statistical), whether there are rate limits or authentication requirements, how missing data is handled, or what happens with invalid parameter combinations. For a statistical comparison tool with no annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with two sentences that each earn their place. The first sentence establishes the core purpose and key behavioral feature (significance testing). The second sentence provides concrete examples of the gap analyses. There's zero wasted language, and the most important information appears first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (statistical comparisons with 5 parameters), no annotations, and no output schema, the description is adequate but has clear gaps. It covers the purpose and provides usage examples well, but doesn't describe the output format, error conditions, or behavioral constraints. For a tool performing significance testing without output schema documentation, more information about result interpretation would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly with descriptions and constraints. The description adds marginal value by mentioning the demographic groups (race, gender, poverty) that map to the 'variable' parameter, and providing concrete examples (White vs Black, Male vs Female). However, it doesn't add syntax, format, or usage details beyond what the schema provides, meeting the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Compare NAEP scores across demographic groups (race, gender, poverty) with significance testing.' It specifies the exact action (compare scores), the resource (NAEP scores), and the scope (demographic groups with significance testing). It also distinguishes itself from siblings by focusing on group comparisons rather than state comparisons (naep_compare_states) or year comparisons (naep_compare_years).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: for analyzing achievement gaps between demographic groups (e.g., White vs Black, Male vs Female). It implicitly suggests alternatives by mentioning specific gap types (race, gender, poverty) that align with the 'variable' parameter options. However, it doesn't explicitly state when NOT to use it or name specific alternative tools for different comparison types.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

Tool Definition Quality Score (TDQS)
By punkpeye on April 3, 2026.
mcp
The Hackers Who Tracked My Sleep Cycle
By punkpeye on March 26, 2026.
security
Open Source Has a Bot Problem
By punkpeye on March 19, 2026.
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lzinga/us-government-open-data-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server