Apple Health MCP
Server Quality Checklist
This repository includes a README.md file.
This repository includes a LICENSE file.
Latest release: v0.1.0
No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.
Tip: use the "Try in Browser" feature on the server page to seed initial usage.
Add a glama.json file to provide metadata about your server.
This server provides 3 tools.
No known security issues or vulnerabilities reported.
Are you the author? Add related servers to improve discoverability.
Tool Scores
apple_health_workouts
- Behavior: 2/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so the description carries the full disclosure burden. 'Get' implies read-only safety, but the description omits what data structure is returned (list of workouts, duration, types, calories), the behavior when no workouts exist, and whether all activity types are included. The absence of an output schema compounds this gap.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise (6 words) and front-loaded with verb-first structure. No redundant phrases. However, the brevity arguably underserves the tool's behavioral complexity given the lack of annotations and output schema.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Acceptable for a single-parameter tool with 100% schema coverage, but gaps remain. Missing: return value description (critical without output schema), explicit sibling differentiation, and confirmation of read-only behavior. Minimum viable but not exemplary.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% (the date parameter is fully documented with format and default). The description references 'a date', acknowledging the parameter exists, but adds no semantic detail beyond the schema's 'YYYY-MM-DD, defaults to today'. The baseline score is appropriate given schema completeness.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb 'Get' and resource 'workout sessions'. The scope 'for a date' implies filtering by a specific date. It implicitly distinguishes itself from siblings by focusing on individual sessions versus aggregates (daily/trends), though explicit differentiation would strengthen this.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this versus apple_health_daily or apple_health_trends. The agent must infer that 'workouts' retrieves individual exercise sessions while its siblings handle aggregated metrics. No mention of prerequisites or date range limitations. (A sketch of a fuller definition follows these scores.)
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
apple_health_daily
- Behavior: 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full disclosure burden. It enumerates the data categories returned, which helps understand scope. However, it lacks behavioral details such as error handling for future dates, privacy/auth requirements, data availability when Health permissions are denied, or whether values are summed/averaged.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Single efficient sentence front-loaded with the action. The colon-separated list of metrics is compact and readable. No extraneous text, though slightly dense.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Without output schema, the description adequately compensates by listing the seven metric categories returned. However, for a health data tool with no annotations, it should clarify behavior regarding missing data (null values vs zeros) and permission requirements to be fully complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 100% coverage (the date field is fully documented with format and default), establishing a baseline of 3. The description does not add semantic context about the date parameter (e.g., timezone handling, how 'today' is determined), but none is required given the complete schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
Uses the specific verb 'Get' and resource 'Apple Health daily summary'. Lists specific metric categories (steps, energy, HR, HRV, sleep stages, body comp, workouts), clarifying scope. However, it does not explicitly differentiate itself from the sibling 'apple_health_workouts' despite mentioning workouts in its data list, nor does it contrast with 'apple_health_trends'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this tool versus the sibling 'trends' or 'workouts' tools. The inclusion of 'workouts' in the daily summary list without qualification creates potential ambiguity about whether this returns summary workout statistics or detailed records.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
apple_health_trends
- Behavior: 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses what metrics are returned (steps, HR, HRV, sleep, weight) which is valuable behavioral context. However, it lacks information on data availability (what happens if no data exists?), whether this is read-only (implied but not stated), or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single efficient sentence with no wasted words. The parenthetical list of metrics is compact and informative. It could benefit from a second sentence addressing sibling differentiation, but as written it is appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the low complexity (one optional parameter, no nested objects) and lack of output schema, the description adequately compensates by enumerating the specific health metrics returned. However, for a data-retrieval tool with no annotations, it should confirm the read-only nature and mention any data availability constraints.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% for the single 'days' parameter, establishing a baseline of 3. The description adds minimal meaning beyond the schema for the parameter itself, though the phrase 'date range' contextualizes the 'days' parameter appropriately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Get') and resource ('daily health metrics'), and scopes the output with specific examples (steps, HR, HRV, sleep, weight). It implicitly distinguishes from sibling 'apple_health_daily' by specifying 'date range' versus implied single-day scope, though it doesn't explicitly name siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Usage is implied by the tool name ('trends'), the parameter ('days' to look back), and the phrase 'date range,' suggesting multi-day analysis. However, there is no explicit guidance on when to select trends versus daily snapshots or workout-specific data.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
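The gaps flagged repeatedly above (no read-only annotation, no return-value description, no sibling differentiation) can all be addressed in the tool definition itself. The sketch below is illustrative only, not this server's actual definition; the rewritten description text is an assumption about what a fuller apple_health_workouts definition could look like:

// Illustrative sketch only — not this server's actual tool definition.
const workoutsTool = {
  name: "apple_health_workouts",
  description:
    "Get individual workout sessions (type, duration, calories) for a date. " +
    "Returns an empty list when no workouts exist. Read-only. " +
    "Use apple_health_daily for single-day aggregates and " +
    "apple_health_trends for multi-day ranges.",
  inputSchema: {
    type: "object",
    properties: {
      date: {
        type: "string",
        description: "YYYY-MM-DD; defaults to today",
      },
    },
  },
  // MCP tool annotations signal to clients that the call has no side effects.
  annotations: { readOnlyHint: true },
};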
GitHub Badge
Glama performs regular codebase and documentation scans to:
- Confirm that the MCP server is working as expected.
- Confirm that there are no obvious security issues.
- Evaluate tool definition quality.
Our badge communicates server capabilities, safety, and installation instructions.
Card Badge
Copy to your README.md:
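The exact snippet lives in the copy box on the server page; the markdown below is only an illustrative sketch, and the badge URL path is a hypothetical placeholder rather than the canonical one:

<!-- Hypothetical badge URL; copy the real snippet from the server page. -->
[![Apple Health MCP](https://glama.ai/mcp/servers/daveremy/apple-health-mcp/badge)](https://glama.ai/mcp/servers/daveremy/apple-health-mcp)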
Score Badge
Copy to your README.md:
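Again, the canonical snippet is in the copy box on the server page; this is a hypothetical sketch of what a score badge reference could look like:

<!-- Hypothetical badge URL; copy the real snippet from the server page. -->
![Quality score](https://glama.ai/mcp/servers/daveremy/apple-health-mcp/score-badge)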
How to claim the server?
If you are the author of the server, you simply need to authenticate using GitHub.
However, if the MCP server belongs to an organization, you first need to add a glama.json file to the root of your repository:
{
"$schema": "https://glama.ai/mcp/schemas/server.json",
"maintainers": [
"your-github-username"
]
}

Then, authenticate using GitHub.
Browse examples.
How to make a release?
A "release" on Glama is not the same as a GitHub release. To create a Glama release:
- Claim the server if you haven't already.
- Go to the Dockerfile admin page, configure the build spec, and click Deploy.
- Once the build test succeeds, click Make Release, enter a version, and publish.
This process allows Glama to run security checks on your server and enables users to deploy it.
How to add a LICENSE?
Please follow the instructions in the GitHub documentation.
Once GitHub recognizes the license, the system will automatically detect it within a few hours.
If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.
How to sync the server with GitHub?
Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.
To manually sync the server, click the "Sync Server" button in the MCP server admin interface.
How is the quality score calculated?
The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).
Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool receives a Tool Definition Quality Score (TDQS) of 1–5 across six weighted dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.
Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
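As a worked example, the TypeScript sketch below applies the published weights to the per-dimension scores listed above for the three tools. The key names and the sample coherence value of 3.5 are assumptions for illustration, not Glama's actual implementation:

// Sketch of the published scoring formula; key names and the sample
// coherence value are illustrative, not Glama's actual code.
type DimensionScores = {
  purpose: number;      // Purpose Clarity, 25%
  usage: number;        // Usage Guidelines, 20%
  behavior: number;     // Behavioral Transparency, 20%
  parameters: number;   // Parameter Semantics, 15%
  conciseness: number;  // Conciseness & Structure, 10%
  completeness: number; // Contextual Completeness, 10%
};

const tdqs = (d: DimensionScores): number =>
  0.25 * d.purpose + 0.2 * d.usage + 0.2 * d.behavior +
  0.15 * d.parameters + 0.1 * d.conciseness + 0.1 * d.completeness;

function overallScore(tools: DimensionScores[], coherence: number): number {
  const scores = tools.map(tdqs);
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const min = Math.min(...scores);
  const definitionQuality = 0.6 * mean + 0.4 * min; // one weak tool drags this down
  return 0.7 * definitionQuality + 0.3 * coherence;
}

function tier(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 3.5) return "A";
  if (score >= 3.0) return "B";
  if (score >= 2.0) return "C";
  if (score >= 1.0) return "D";
  return "F";
}

// The three tools scored above (workouts, daily, trends):
const tools: DimensionScores[] = [
  { purpose: 4, usage: 2, behavior: 2, parameters: 3, conciseness: 4, completeness: 3 },
  { purpose: 4, usage: 2, behavior: 3, parameters: 3, conciseness: 4, completeness: 3 },
  { purpose: 4, usage: 3, behavior: 3, parameters: 3, conciseness: 4, completeness: 3 },
];
// TDQS: 2.95, 3.15, 3.35 → definition quality = 0.6 * 3.15 + 0.4 * 2.95 = 3.07.
// With the hypothetical coherence of 3.5, overall ≈ 3.20 → tier "B".
console.log(tier(overallScore(tools, 3.5)));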
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/daveremy/apple-health-mcp'
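For programmatic use, here is a minimal TypeScript sketch of the same request. The response shape is not documented on this page, so the JSON is printed as-is:

// Minimal sketch: fetch this server's directory entry and print it.
// Requires a runtime with global fetch and top-level await (Node 18+ ESM, Deno, Bun).
const res = await fetch(
  "https://glama.ai/api/mcp/v1/servers/daveremy/apple-health-mcp",
);
if (!res.ok) throw new Error(`HTTP ${res.status}`);
console.log(JSON.stringify(await res.json(), null, 2));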
If you have feedback or need assistance with the MCP directory API, please join our Discord server.