Apple Mail MCP Server
Server Quality Checklist
This repository includes a README.md file.
This repository includes a LICENSE file.
Latest release: v3.1.0
- This server provides 24 tools.
No known security issues or vulnerabilities reported.
Tool Scores
- Behavior 2/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full responsibility for behavioral disclosure. While it mentions the return value (confirmation message), it fails to disclose critical file-system behaviors: whether it overwrites existing files at `save_path`, what happens if the attachment is not found, or if multiple emails match the `subject_keyword`.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured, with a clear one-sentence summary followed by Args and Returns sections. It is appropriately sized and free of redundant text, though the Args/Returns headers are slightly formal for a tool description.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description adequately covers the parameter semantics given poor schema coverage, and acknowledges the return value. However, for a destructive operation (writing to disk), it lacks necessary safety context regarding file overwriting and error conditions, leaving the agent under-informed about failure modes.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description effectively compensates by documenting all four parameters in the Args section. It adds crucial semantic meaning (e.g., explaining that `account` refers to configured account names like 'Gmail' or 'Work') that the schema titles alone do not provide.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (save), resource (attachment), and destination (disk). However, it does not explicitly differentiate from sibling tools like `export_emails` (which might export whole emails) or `list_email_attachments` (which lists without saving), though 'to disk' implies persistence.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like `list_email_attachments` (to verify attachment names first) or `export_emails`. It omits prerequisites such as needing to know the exact attachment name or account beforehand.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
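To make the gaps above concrete, here is a hypothetical revision of the attachment-saving tool's docstring that discloses the file-system behavior and adds usage guidance. The overwrite and most-recent-match policies shown are illustrative assumptions, not the server's verified behavior:

```python
def save_email_attachment(account: str, subject_keyword: str,
                          attachment_name: str, save_path: str) -> str:
    """Save a named attachment from a matching email to disk.

    Overwrites any existing file at save_path. If no email matches
    subject_keyword, or the attachment name is not found, returns an
    error message and writes nothing. When several emails match, the
    most recent match is used.  (These policies are assumptions for
    illustration; verify against the implementation.)

    Use list_email_attachments first to discover exact attachment
    names; use export_emails to save whole messages rather than a
    single attachment.

    Args:
        account: Configured account name, e.g. 'Gmail' or 'Work'.
        subject_keyword: Text to search for in email subjects.
        attachment_name: Exact attachment filename to save.
        save_path: Destination path on disk (will be overwritten).

    Returns:
        Confirmation message with the saved path, or an error message.
    """
    ...
```

A description of this shape would answer each of the Behavior and Usage Guidelines criticisms above without requiring any schema changes.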
- Behavior 3/5
With no annotations provided, the description carries the full disclosure burden. It documents the return value ('Confirmation message...') and default mailbox value, but omits critical behavioral details: what happens if subject_keyword matches multiple emails, zero emails, or partial matches. No mention of whether this creates a sent item or modifies the original.
- Conciseness 4/5
Uses structured Args/Returns format that efficiently conveys information without prose bloat. Every line provides specific constraints or examples. Slightly mechanical format but appropriate given the need to compensate for empty schema descriptions.
- Completeness 3/5
Documents all input parameters and acknowledges the output schema (confirmation message), satisfying basic requirements. However, for a complex search-and-forward operation with 7 parameters, it lacks edge case documentation and ambiguity resolution (e.g., multiple matches) that would make it complete.
- Parameters 5/5
Excellent compensation for 0% schema description coverage. The Args section documents all 7 parameters with precise semantics: account examples ('Gmail', 'Work'), subject_keyword purpose ('search for in email subjects'), format guidance for email fields ('comma-separated for multiple'), and optionality markers ('Optional').
- Purpose 3/5
The opening sentence 'Forward an email to one or more recipients' states the core action but fails to mention the critical search mechanism (via subject_keyword) used to locate the email. It reads as if forwarding a specific known email, when actually it searches first. No differentiation from sibling tools like reply_to_email.
- Usage Guidelines 2/5
No guidance on when to use this versus compose_email or reply_to_email. No mention of prerequisites like knowing the exact subject keyword or handling multiple matches. The agent must infer usage solely from the parameter list.
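A revised opening that surfaces the search mechanism could look like this hypothetical docstring fragment; the first-match and no-match behaviors below are assumptions chosen to illustrate the disclosure, not confirmed behavior of this server:

```python
def forward_email(subject_keyword: str, to: str, **kwargs) -> str:
    """Search for an email by subject keyword and forward it.

    Finds an email whose subject contains subject_keyword and
    forwards it to the given recipients. If several emails match,
    only the first match is forwarded; if none match, an error
    message is returned.  (Match policy shown is an assumption.)

    Use reply_to_email to respond to a message's sender, or
    compose_email to start a new message from scratch.
    """
    ...
```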
- Behavior 2/5
No annotations are provided, so the description carries full disclosure burden. It only mentions the return value (account names), which is likely covered by the output schema. It omits permissions required, performance characteristics, or what 'available' entails (active vs. disabled accounts).
- Conciseness 5/5
Extremely concise with two efficient sentences. Information is front-loaded with the action statement first, followed by return value. No redundant or wasted text.
- Completeness 3/5
Adequate for a zero-parameter tool with an output schema (which absolves the description from detailing return values). However, it fails to clarify the relationship between 'accounts' and 'mailboxes', given that the sibling tool 'list_mailboxes' exists.
- Parameters 4/5
Input schema contains zero parameters. Per scoring rules, 0 parameters earns a baseline score of 4.
- Purpose 4/5
States specific verb (List) and resource (Mail accounts) clearly. However, it does not explicitly differentiate from sibling tool 'list_mailboxes', which could confuse agents about whether to query accounts or mailboxes.
- Usage Guidelines 2/5
Provides no guidance on when to use this tool versus alternatives (e.g., 'list_mailboxes'), nor any prerequisites or conditions for invocation.
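One way to close the accounts/mailboxes gap is a pair of cross-referencing descriptions in the "use X instead of Y when Z" pattern the rubric recommends. The wording below is hypothetical; the sibling names are those the review itself mentions:

```python
# Hypothetical cross-referencing descriptions for the two listing tools.
TOOL_DESCRIPTIONS = {
    "list_accounts": (
        "List the configured Mail accounts (e.g. 'Gmail', 'Work'). "
        "Use list_mailboxes instead to see the folders within one account."
    ),
    "list_mailboxes": (
        "List the mailboxes (folders) of a single account. "
        "Use list_accounts first to discover valid account names."
    ),
}
```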
- Behavior 3/5
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully documents the return value structure ('names and sizes') and the default value for max_results, but omits critical operational details like which folders are searched, case sensitivity rules for the keyword, or behavior when no matches exist.
- Conciseness 5/5
Uses a standard docstring format (summary + Args + Returns) that is appropriately sized and front-loaded. Every sentence conveys necessary information; the structure efficiently separates input specifications from output expectations without redundancy.
- Completeness 4/5
Given the tool's moderate complexity (3 primitive parameters) and the existence of an output schema, the description provides sufficient context. It compensates adequately for the schema's lack of descriptions through its Args section, though it could benefit from noting edge cases or search scope limitations.
- Parameters 4/5
Excellent compensation for 0% schema description coverage. The Args section documents all three parameters with clear semantics and provides helpful examples for the 'account' parameter ('Gmail', 'Work', 'Personal'). It also surfaces the default value (1) for max_results, which isn't visible in the schema properties section provided.
- Purpose 4/5
The description clearly states the tool lists attachments filtered by subject keyword, distinguishing it from sibling tools like 'save_email_attachment' (which downloads) and 'search_emails' (which likely returns full messages). However, it doesn't explicitly clarify the difference from 'list_inbox_emails', which may overlap in functionality.
- Usage Guidelines 2/5
No guidance provided on when to use this tool versus alternatives like 'search_emails' or 'save_email_attachment'. The agent must infer that this is for discovery/inspection of attachments without downloading them, based solely on the verb 'List'.
- Behavior 3/5
No annotations provided, so description carries full burden. Discloses the safety cap (max_emails default), return value format (confirmation with location), and filesystem interaction. Missing critical behavioral details for a file-writing tool: overwrite behavior, disk space requirements, and idempotency guarantees.
- Conciseness 4/5
Uses standard Google-style docstring format (Args/Returns) which is scannable and front-loaded. Purpose statement is immediate. Minor verbosity in headers is acceptable for clarity.
- Completeness 4/5
Handles 7-parameter complexity well, particularly the conditional logic between scope types. Acknowledges output schema existence with brief return description. Could improve by noting error conditions (e.g., invalid directory) or file format specifics beyond extension names.
- Parameters 5/5
Excellent compensation for 0% schema description coverage. Documents all 7 parameters with examples (account), enum values (scope, format), defaults (mailbox, format, max_emails), and conditional requirements (subject_keyword dependency on scope).
- Purpose 4/5
States clear action (export emails to files) and use case (backup/analysis). Distinguishes implicitly from siblings like 'search_emails' by emphasizing file export, though explicit contrast with 'save_email_attachment' is absent.
- Usage Guidelines 3/5
Provides use cases (backup/analysis) and explains the conditional dependency between 'scope' and 'subject_keyword'. However, lacks explicit guidance on when to choose this over 'search_emails' or 'save_email_attachment', and omits prerequisites like directory permissions.
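The conditional dependency between 'scope' and 'subject_keyword' could also be enforced in code rather than left to prose, so invalid calls fail fast with a clear message. A minimal validator sketch, using hypothetical scope values ('all', 'search') since the actual enum values are not shown in this report:

```python
def validate_export_params(scope, subject_keyword=None):
    """Reject parameter combinations the tool description rules out.

    Assumes two scope values for illustration; substitute the
    server's real enum values.
    """
    valid_scopes = {"all", "search"}  # assumed enum values
    if scope not in valid_scopes:
        raise ValueError(f"scope must be one of {sorted(valid_scopes)}")
    if scope == "search" and not subject_keyword:
        raise ValueError("subject_keyword is required when scope='search'")
```

Encoding the rule this way means the docstring and the runtime behavior cannot drift apart silently.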
- Behavior 4/5
No annotations are provided, so the description carries the full disclosure burden. It effectively communicates cross-account aggregation behavior ('across all accounts') and the meta-behavioral trait of generating 'AI suggestions' that 'prompt the assistant to suggest relevant actions.' It also details the return structure. It omits explicit read-only/safety declarations, though 'Get' implies this.
- Conciseness 4/5
The description is appropriately sized with three sentences and a structured Returns section. It is front-loaded with the primary purpose ('Get a comprehensive overview...'). The bullet points within the Returns description are slightly informal but remain readable and purposeful, with minimal redundancy.
- Completeness 4/5
Given the zero-parameter complexity and existence of an output schema, the description provides sufficient contextual value by explaining what the overview contains (unread counts, folders, AI suggestions) and its cross-account scope. It adequately prepares an agent to interpret results without needing to detail the output schema structure.
- Parameters 4/5
The input schema contains zero parameters, which establishes a baseline score of 4 per the rubric. With no parameters requiring clarification, the description meets the expected standard without needing additional semantic content.
- Purpose 4/5
The description states the specific action ('Get a comprehensive overview') and resource ('email inbox status across all accounts'). It distinguishes itself from siblings like `get_mailbox_unread_counts` by emphasizing the cross-account scope and unique 'AI suggestions for actions' feature, though it does not explicitly differentiate from `inbox_dashboard`.
- Usage Guidelines 3/5
The description provides implied usage context ('designed to give you a complete picture'), suggesting when to use the tool for comprehensive status checks. However, it lacks explicit 'when-not' guidance or direct comparisons to similar tools like `get_statistics` or `inbox_dashboard` that would help an agent select between them.
- Behavior 3/5
No annotations provided, so description carries full burden. It establishes read-only nature via 'Get' and 'Returns' verbs, documents default values (30 days), and describes output format ('Formatted statistics report'). However, it omits safety confirmations, rate limits, or whether statistics are real-time vs. cached.
- Conciseness 4/5
Well-structured docstring format with clear 'Args' and 'Returns' sections. Information is front-loaded with the purpose statement. Minor filler ('comprehensive') but overall efficient use of space for a 5-parameter tool.
- Completeness 4/5
Adequate for the complexity: documents all parameters (compensating for empty schema), explains the scope conditional logic, and provides a high-level return value description (sufficient since output schema exists). Could improve by mentioning error conditions or data freshness.
- Parameters 5/5
With 0% schema description coverage, the description fully compensates by documenting all 5 parameters with rich semantics: examples for 'account' ('Gmail', 'Work'), valid enum values for 'scope', conditional usage rules for 'sender'/'mailbox', and special value semantics for 'days_back' (0 = all time).
- Purpose 4/5
States specific action ('Get') and resource ('email statistics and analytics') clearly. The term 'comprehensive' hints at broad capability, but the description does not explicitly differentiate this from sibling analytics tools like 'get_top_senders' or 'get_inbox_overview'.
- Usage Guidelines 3/5
Provides implicit usage guidance by mapping parameters to specific scopes (e.g., 'sender' for 'sender_stats' scope), but lacks explicit guidance on when to choose this tool over alternatives like 'get_top_senders' or search functions. No 'when not to use' guidance provided.
- Behavior 3/5
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully explains the UI behavior of the 'open' action and notes that output is formatted by action type. However, it fails to disclose that 'delete' and 'send' actions are destructive/irreversible, which is critical safety information for an agent.
- Conciseness 4/5
The docstring format with explicit 'Args' and 'Returns' sections is appropriate and scannable. While not terse, the detail is justified by the complete absence of schema descriptions; every sentence provides necessary constraint information or behavioral context without redundancy.
- Completeness 4/5
For a complex multi-action tool with eight parameters, the description adequately covers input requirements and conditional validation rules. It acknowledges the existence of formatted output (confirmed by output schema presence), though it could briefly mention error handling or the specific nature of the formatted returns.
- Parameters 5/5
Given 0% schema description coverage, the description comprehensively compensates by documenting all 8 parameters. It provides crucial conditional logic (e.g., subject/body required for 'create', draft_subject required for 'send/open/delete'), format hints (comma-separated emails), and concrete examples ('Gmail', 'Work') that the schema completely lacks.
- Purpose 4/5
The description clearly states the tool manages draft emails and enumerates the five supported actions (list, create, send, open, delete). However, it does not explicitly differentiate this tool from siblings like 'compose_email' or 'create_rich_email_draft', leaving ambiguity about which tool to use for simple draft creation.
- Usage Guidelines 3/5
It provides specific guidance for the 'open' action (opens a visible compose window for review), which is helpful. However, it lacks broader guidance on when to use this multi-purpose tool versus specialized siblings, and does not mention prerequisites like authentication or existing account requirements.
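The action-specific requirements the review credits (subject/body for 'create', draft_subject for 'send'/'open'/'delete') can be encoded as data, keeping the description and the validation in sync. Field names here are assumptions based on the review's wording:

```python
# Required fields per action, per the conditional rules the
# description documents (field names are assumptions).
REQUIRED_BY_ACTION = {
    "list": set(),
    "create": {"subject", "body"},
    "send": {"draft_subject"},
    "open": {"draft_subject"},
    "delete": {"draft_subject"},
}

def missing_fields(action, provided):
    """Return the required fields absent from a call, per action."""
    required = REQUIRED_BY_ACTION.get(action)
    if required is None:
        raise ValueError(f"unknown action: {action!r}")
    return {f for f in required if not provided.get(f)}
```

A table like this could also be rendered into the docstring automatically, so the conditional rules never drift from the enforcement.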
- Behavior 3/5
With no annotations provided, the description carries the full burden. It successfully discloses the send/draft/open modes and their interactions (mode overrides send), reply_to_all behavior, and HTML/plain text fallback requirements. However, it lacks disclosure of error behaviors (e.g., multiple subject matches), authentication requirements, or rate limits.
- Conciseness 4/5
The docstring format (Args/Returns) is well-structured and front-loaded with the core purpose. While lengthy due to the need to document 10 undocumented parameters, every sentence provides necessary value given the schema coverage gap. Minor verbosity in the Returns section.
- Completeness 4/5
For a 10-parameter tool with complex delivery modes and no annotations, the description is comprehensive. It covers all parameters, return values, and the critical mode/send interaction. Minor gaps remain regarding edge cases (multiple email matches) and authorization requirements.
- Parameters 5/5
Given 0% schema description coverage, the Args section comprehensively compensates by documenting all 10 parameters with rich semantics, including format examples (account: 'Gmail', 'Work'), delimiter specifications (comma-separated for cc/bcc/attachments), and path examples for attachments.
- Purpose 5/5
The opening sentence 'Reply to an email matching a subject keyword' provides a specific verb (Reply), resource (email), and distinguishing mechanism (subject keyword search) that clearly differentiates it from sibling tools like compose_email (new emails) and forward_email (forwarding).
- Usage Guidelines 2/5
The description provides no explicit guidance on when to use this tool versus alternatives like compose_email or forward_email, nor does it mention prerequisites (e.g., that the target email must exist). The agent must infer usage solely from the parameter names.
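Since these reviews repeatedly note that no annotations are provided, a sketch of MCP-style tool annotations for the reply tool shows how hints could carry part of the disclosure burden. The hint names follow the MCP specification's ToolAnnotations; the values are illustrative assumptions about this tool's behavior:

```python
# MCP ToolAnnotations-style hints for the reply tool (values are
# assumptions about behavior, to be verified against the server).
REPLY_TOOL_ANNOTATIONS = {
    "title": "Reply to Email",
    "readOnlyHint": False,     # sending mail changes the world
    "destructiveHint": False,  # does not delete or overwrite data
    "idempotentHint": False,   # repeating the call sends again
    "openWorldHint": True,     # interacts with external recipients
}
```

With hints like these in place, the description could spend its tokens on the error behaviors and match-ambiguity rules the review finds missing.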
- Behavior 3/5
With no annotations provided, the description carries the full burden. It successfully discloses performance characteristics (body search slowdown, cross-account penalty, content preview cost) and mentions pagination and date filtering capabilities. However, it lacks disclosure on rate limits, authentication requirements, error handling behaviors, or whether searches include archived/deleted items.
- Conciseness 4/5
The description is appropriately structured with a value-proposition headline, consolidation explanation, detailed Args section, and Returns summary. Given 17 undocumented parameters, the length is justified and information-dense. The 'Args:' and 'Returns:' pseudo-docstring format is clear and parseable. Minor deduction for slightly redundant phrasing ('Optional' appears repeatedly) and the Returns section being somewhat vague given the complexity.
- Completeness 4/5
For a complex 17-parameter search tool with no schema descriptions, the description achieves high completeness by documenting every parameter's semantics and providing performance warnings. Since an output schema exists (per context signals), the brief Returns statement is acceptable. It misses only edge-case behaviors (rate limiting, empty result handling, search scope boundaries) that would elevate it to a 5.
- Parameters 5/5
The schema has 0% description coverage (titles only), requiring the description to compensate fully. The Args section provides comprehensive semantic meaning for all 17 parameters, including format specifications (e.g., 'YYYY-MM-DD'), value examples (e.g., 'Gmail', 'Work'), default behaviors, and inter-parameter dependencies (e.g., max_content_length only applies when include_content=True). This is exemplary compensation for schema deficiencies.
Purpose4/5Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies this as a 'Unified search tool' that consolidates subject, sender, body, and cross-account search. While it implies email context through these fields and the tool name, it never explicitly states 'search for emails' in the opening, relying on theArgs section to clarify the domain. It differentiates from siblings by emphasizing its consolidated nature.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines3/5Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by stating it consolidates multiple search types (subject, sender, body) into one tool, suggesting it should be used when multiple filters are needed. It provides performance guidance (body search is 'significantly slower', cross-account is 'slower'). However, with many sibling tools like list_inbox_emails or get_email_thread, it lacks explicit when-to-use/when-not-to-use guidance comparing these alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
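The parameter interactions credited above (the 'YYYY-MM-DD' date format, max_content_length applying only when include_content=True, and slower cross-account search) can be sketched as a stub. The function name, signature, and return shape are assumptions drawn from the parameters named in this evaluation, not the server's actual code.

```python
from datetime import date

def search_emails(subject_keyword=None, sender=None, account=None,
                  date_from=None, date_to=None,
                  include_content=False, max_content_length=500):
    # Dates are documented as 'YYYY-MM-DD'; reject anything else early.
    for d in (date_from, date_to):
        if d is not None:
            date.fromisoformat(d)  # raises ValueError on a bad format
    return {
        # max_content_length only takes effect when include_content=True.
        "truncate_at": max_content_length if include_content else None,
        # account=None means cross-account search, which is slower.
        "cross_account": account is None,
    }

result = search_emails(subject_keyword="invoice", date_from="2024-01-01",
                       include_content=True, max_content_length=200)
```

Encoding the include_content gate in the stub makes the inter-parameter dependency the evaluation praises mechanically visible.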
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively explains the three delivery modes and the HTML/plain-text fallback relationship, but omits safety warnings about the irreversible nature of sending emails, rate limits, or attachment size constraints.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description employs a standard docstring structure with clear Args and Returns sections, presenting information efficiently without redundancy despite the high parameter count. Every sentence provides necessary specification details not present in the schema.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the high complexity (9 parameters, 4 required) and complete lack of schema annotations, the description provides sufficient detail for correct invocation by documenting all parameters and return behavior. It appropriately acknowledges the output without over-specifying, though it could benefit from mentioning error conditions or account validation requirements.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description comprehensively compensates by documenting all 9 parameters in the Args section, including format specifications (comma-separated values), concrete examples (account names, file paths), and semantic relationships between body and body_html.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The opening sentence 'Compose and send a new email from a specific account' clearly identifies the action and resource. However, it does not differentiate from sibling tools like `create_rich_email_draft` or `reply_to_email`, and the term 'send' is slightly misleading since the 'draft' mode does not actually send.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description documents the `mode` parameter options (send/draft/open), providing implicit guidance on delivery methods. However, it lacks explicit guidance on when to choose this tool over alternatives like `create_rich_email_draft` for draft creation, or prerequisites like account configuration.
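A minimal sketch of the behaviors this card credits: the send/draft/open mode switch, comma-separated recipients, and body serving as the plain-text fallback for body_html. The names and return shape are assumptions for illustration, not the server's implementation.

```python
def compose_email(account, to, subject, body, body_html=None, mode="send"):
    # mode selects delivery: send now, save as a draft, or open for review.
    if mode not in ("send", "draft", "open"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Recipients are documented as comma-separated values.
    recipients = [r.strip() for r in to.split(",") if r.strip()]
    # body_html is preferred when present; body is the plain-text fallback.
    content = body_html if body_html is not None else body
    return {"account": account, "recipients": recipients,
            "content": content, "mode": mode}

draft = compose_email("Gmail", "a@example.com, b@example.com",
                      "Status", "plain text", mode="draft")
```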
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses return format behavior (nested dict vs flat dict) based on the summary_only flag, but omits other behavioral traits like side effects, permission requirements, rate limits, or caching behavior that would be relevant for a read operation.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description uses a clean, standard docstring format with clear Purpose-Args-Returns sections. It is front-loaded with the core purpose and contains no redundant text; every sentence provides specific operational detail or parameter documentation.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 optional parameters, two return formats), the description is comprehensive. It documents all inputs and explains the return structure variations, even though an output schema exists (making the Returns section optional but helpful). It lacks only edge-case handling or error condition documentation.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage (only titles). The description fully compensates via the Args section, which clearly explains all three parameters: account as an optional filter, include_zero for zero-count inclusion, and summary_only for toggling between detailed and summary return formats.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Get[s] unread counts per mailbox for one account or all accounts' with a specific verb and resource. It distinguishes its granularity from siblings (get_inbox_overview, get_statistics) by emphasizing 'per mailbox' and noting that it replaces a former tool, though it doesn't explicitly contrast with all current alternatives.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implied usage guidance by explaining the summary_only parameter behavior ('When summary_only=True...'), helping users choose between detailed and summary modes. However, it lacks explicit when-to-use guidance versus sibling tools like get_inbox_overview or get_statistics, and mentions no prerequisites or exclusions.
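The nested-vs-flat return distinction described above can be illustrated with a small stand-in; the input shape and field names are assumptions, not the tool's documented contract.

```python
def unread_counts(per_mailbox, summary_only=False, include_zero=False):
    # per_mailbox: {account: {mailbox: unread_count}} -- illustrative shape.
    if not include_zero:
        per_mailbox = {a: {m: n for m, n in boxes.items() if n > 0}
                       for a, boxes in per_mailbox.items()}
    if summary_only:
        # Flat dict: one total per account.
        return {a: sum(boxes.values()) for a, boxes in per_mailbox.items()}
    return per_mailbox  # nested dict with per-mailbox detail

data = {"Gmail": {"INBOX": 3, "Receipts": 0}, "Work": {"INBOX": 1}}
```

An agent reading only the schema could not predict the shape change; this is exactly the kind of behavior the description has to carry.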
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It adds valuable behavioral context about thread matching ('same or similar subject') and return formatting ('sorted by date'). However, it omits safety characteristics (read-only status, error handling if the thread is not found) that would be necessary for complete transparency without annotations.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The docstring format with Args and Returns sections is well-structured and front-loaded. Each parameter description earns its place by providing examples or constraints. The format is slightly formal but efficiently organized without redundant prose.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 4 parameters with zero schema descriptions, the description successfully documents all inputs and the return value ('Formatted thread view'). It adequately covers the tool's contract despite lacking annotations, though it could improve by explicitly stating the read-only nature of the operation.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage (titles only). The description fully compensates by documenting all 4 parameters with clear semantics, practical examples ('Gmail', 'Re: Project Update'), and explicit default values ('default: 50', 'default: INBOX'). This adds essential meaning missing from the structured schema.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Get[s] an email conversation thread' and specifies the scope as 'all messages with the same or similar subject,' which distinguishes it from siblings like search_emails (broad search) or list_inbox_emails (general listing). The verb and resource are specific and unambiguous.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through the specific mention of 'conversation thread' and 'subject_keyword,' suggesting when to use this versus general search tools. However, it lacks explicit guidance on when to prefer this over search_emails or get_inbox_overview, and does not mention prerequisites or exclusions.
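One plausible reading of the 'same or similar subject' matching noted above is prefix-stripping before a case-insensitive comparison. The real matching rule is undocumented, so this normalizer is purely an assumption sketching how replies could group into one thread.

```python
import re

def normalize_subject(subject):
    # Strip any stack of 'Re:' / 'Fw:' / 'Fwd:' prefixes, then fold case,
    # so replies and forwards compare equal to the original subject.
    return re.sub(r"^(?:(?:re|fwd?)\s*:\s*)+", "", subject,
                  flags=re.IGNORECASE).strip().lower()

def same_thread(a, b):
    return normalize_subject(a) == normalize_subject(b)
```

Because the rule is guessed here, an agent calling the real tool cannot know its edge cases either; that is the disclosure gap the Behavior score reflects.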
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses prerequisites (package dependencies, client compatibility) and error conditions ('error message if UI is unavailable'), but does not explicitly state safety characteristics (read-only nature) or rate limits that would help an agent assess invocation risks.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with the purpose front-loaded, followed by bulleted feature details, technical requirements, and return value documentation. While slightly verbose, every sentence provides necessary context about the UI resource nature or technical prerequisites.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema, the description comprehensively documents the return value (UIResource with a specific URI and HTML content) and error states. For a zero-parameter tool, this level of detail regarding the UI package requirements provides complete contextual coverage.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters, which establishes a baseline score of 4 per the evaluation rules. The description correctly offers no parameter details since none exist, avoiding unnecessary elaboration.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with the specific action 'Get' and resource 'interactive dashboard view of your email inbox.' It clearly distinguishes itself from data-fetching siblings (like get_inbox_overview or list_inbox_emails) by emphasizing the UIResource return type and interactive visualization aspects.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context about requirements ('Requires mcp-ui-server package and a compatible MCP client') and implies usage for visual interaction rather than data retrieval. However, it lacks explicit guidance on when to choose this over get_inbox_overview for users wanting non-UI data.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses output-formatting behavior (indented format vs. path format for nested mailboxes) and default values (include_counts defaults to True). It could improve by mentioning error cases or permission requirements.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
A well-structured docstring format with clear Args/Returns sections, front-loaded with the core purpose. The Returns section is slightly verbose but provides valuable formatting context; no sentence is redundant or wasted.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriate for the tool's complexity (2 optional parameters, simple types). Despite 0% schema coverage, the description fully documents the inputs. An output schema exists per context signals, so detailed return documentation in the description is supplementary rather than required.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Excellent compensation for 0% schema description coverage. It provides detailed semantics for both parameters: account includes examples ('Gmail', 'Work') and null behavior, while include_counts explains its boolean purpose and default value.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
A clear verb-resource-scope combination: 'List all mailboxes (folders) for a specific account or all accounts.' It explicitly distinguishes itself from siblings like list_accounts (which lists accounts, not folders) and list_inbox_emails (which lists messages, not folders).
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage through parameter documentation (account filtering, include_counts), but lacks explicit when-to-use guidance versus alternatives like get_mailbox_unread_counts or list_accounts. No exclusions or prerequisites are stated.
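The two output formats this card mentions, indented nesting versus '/'-joined paths, might look like the following; the tree shape and exact rendering are assumptions used only to make the distinction concrete.

```python
def render_mailboxes(tree, style="indent", prefix="", depth=0):
    # tree: {mailbox_name: subtree} -- illustrative nesting shape.
    lines = []
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        # Indent style shows hierarchy by depth; path style joins with '/'.
        lines.append("  " * depth + name if style == "indent" else path)
        lines.extend(render_mailboxes(children, style, path, depth + 1))
    return lines

tree = {"INBOX": {"Receipts": {}, "Travel": {}}}
```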
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the full disclosure burden. It successfully documents the safety limit (default max_moves=50) and preview behavior (dry_run), but omits critical mutation details such as whether moves are reversible, error handling when destinations don't exist, or thread vs. individual email handling.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Well-structured with clear sections (purpose, usage hints, Args, Returns) and front-loaded core functionality. The length is necessarily extended to compensate for schema deficiencies, but every sentence adds value, either explaining a parameter or clarifying usage patterns.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the brief return description ('Confirmation message with details') is sufficient. The description adequately covers the complex parameter set (10 params) and primary use cases (archiving, filtered moving). Minor gaps remain regarding error states and edge-case handling.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Fully compensates for 0% schema description coverage by documenting all 10 parameters in the Args section with specific examples (account: 'Gmail', 'Work'), format guidance (to_mailbox: use '/' separator for nesting), and behavioral notes (subject_keywords: 'matches any keyword'). This provides essential semantic context missing from the structured schema.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a precise action statement: 'Move email(s) matching filters from one mailbox to another,' specifying the verb (Move), resource (emails), and scope (mailbox-to-mailbox with filtering). It clearly differentiates from sibling tools like search_emails (which only finds) or compose_email (which creates).
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides concrete usage patterns including 'Use dry_run=True to preview matches without moving' and specific guidance for archiving workflows ('For archiving to "Archive", just set to_mailbox="Archive"'). It lacks explicit contrast with alternatives like search_emails, but offers strong contextual guidance for key parameters.
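The dry_run preview and the max_moves safety cap credited above can be sketched as follows; the filter fields and return keys are assumptions, not the tool's documented output.

```python
def move_emails(emails, to_mailbox, subject_keywords=None,
                dry_run=False, max_moves=50):
    # subject_keywords is documented as matching ANY of the keywords.
    def matches(e):
        return (not subject_keywords or
                any(k.lower() in e["subject"].lower()
                    for k in subject_keywords))

    selected = [e for e in emails if matches(e)][:max_moves]  # safety cap
    if dry_run:
        # Preview only: report what would move without touching anything.
        return {"would_move": len(selected), "to": to_mailbox}
    for e in selected:
        e["mailbox"] = to_mailbox
    return {"moved": len(selected), "to": to_mailbox}

inbox = [{"subject": "Receipt #1", "mailbox": "INBOX"},
         {"subject": "Lunch?", "mailbox": "INBOX"}]
```

The preview branch returning before any mutation is the property that makes dry_run a meaningful safety disclosure for a destructive tool.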
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. While the verb 'Analyse' implies a read-only operation and the 'Returns' section documents the output format ('Ranked list of senders... with email counts'), the description omits explicit safety confirmations, performance characteristics, or side effects that would be necessary for a tool with ambiguous mutability.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description follows a clear docstring structure (summary, use cases, args, returns) with no wasted words. Information is front-loaded with the core purpose, and the length is appropriate for the 5-parameter complexity. Every sentence earns its place.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 0% input schema coverage, the description successfully fills all documentation gaps by detailing every parameter. It also explains return values despite the existence of an output schema (which may lack descriptions). For a tool of this complexity (5 flat parameters), the description is sufficiently complete.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description fully compensates by documenting all 5 parameters in the Args section. It provides semantic meaning beyond types (e.g., '0 = all time' for days_back, the domain vs. individual sender distinction for group_by_domain, and concrete examples like 'Gmail', 'Work' for account).
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb ('Analyse') and resource ('mailbox... most frequent senders') that clearly defines the scope. It distinguishes itself from siblings like get_statistics or search_emails by focusing specifically on sender frequency analysis rather than general email retrieval or broad metrics.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides concrete use cases ('identifying key contacts, high-volume senders to filter, or newsletter sources'), which helps agents understand when to invoke the tool. However, it lacks explicit exclusions or named alternatives (e.g., 'use search_emails for finding specific messages instead').
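The sender-frequency analysis, with days_back=0 meaning all time and optional domain grouping, can be sketched like this; the record fields and tie-breaking behavior are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def top_senders(emails, days_back=30, group_by_domain=False, top_n=5):
    # days_back=0 is documented as meaning 'all time'.
    cutoff = (None if days_back == 0
              else datetime.now() - timedelta(days=days_back))
    counts = Counter()
    for e in emails:
        if cutoff is not None and e["date"] < cutoff:
            continue
        # group_by_domain collapses individual senders into their domain.
        key = e["from"].split("@")[-1] if group_by_domain else e["from"]
        counts[key] += 1
    # Ranked list of (sender, count), most frequent first.
    return counts.most_common(top_n)

emails = [{"from": "news@list.example.com", "date": datetime(2020, 1, 1)},
          {"from": "boss@corp.example.com", "date": datetime(2020, 6, 1)},
          {"from": "news@list.example.com", "date": datetime(2021, 1, 1)}]
```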
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It adds valuable performance context ('slower' for include_content) and migration metadata (it replaces a former tool), but lacks an explicit safety classification (though 'List' implies read-only), auth requirements, or rate-limit disclosures.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Uses a structured docstring format (Args/Returns) with zero wasted text, front-loaded with the purpose statement. The migration note is contextually necessary, and each parameter description efficiently combines type, default, and behavioral notes.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 5 optional parameters and the existence of an output schema, the description is complete. It documents all parameters, provides a return-value summary ('subject, sender, date, and read status'), and includes migration context from the deprecated tool. No gaps identified.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0% (only titles provided). The Args section fully compensates by documenting all 5 parameters with semantics, examples (e.g., 'Gmail', 'Work'), special values (0 = all), and performance implications, which is essential for an agent to understand parameter usage.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Opens with a specific verb ('List'), resource ('emails from inbox'), and scope ('across all accounts or a specific account'). It explicitly distinguishes itself from the sibling tool 'get_recent_emails' by stating that this tool replaces it and clarifying the migration path.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on when to use this tool versus the former one: 'use account + max_emails to get recent emails from a single account.' This indicates the specific filtering pattern for targeted retrieval, though it does not explicitly contrast with other siblings like search_emails.
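The account + max_emails retrieval pattern highlighted above, together with the '0 = all' special value, might look like this; the record fields and which parameter carries the special value are assumptions.

```python
def list_inbox_emails(emails, account=None, max_emails=10):
    # account=None lists across all accounts; max_emails=0 means no limit
    # (the '0 = all' special value noted in the evaluation).
    rows = [e for e in emails if account is None or e["account"] == account]
    rows.sort(key=lambda e: e["date"], reverse=True)  # newest first
    return rows[:max_emails] if max_emails else rows

emails = [{"account": "Gmail", "subject": "Hi", "date": 2},
          {"account": "Work", "subject": "Report", "date": 3},
          {"account": "Gmail", "subject": "Old", "date": 1}]
```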
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and discloses important behavioral traits: the precedence of message_ids over filters, the safety limit of max_updates (default 10), and the apply_to_all guardrail requirement. It also describes the return value. It does not mention error handling behavior or idempotency characteristics.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description follows a logical docstring structure with the purpose upfront, followed by behavioral logic, then detailed Args, and finally Returns. It is appropriately sized for the complexity (10 parameters with zero schema documentation), though the Args section contains some repetitive 'Optional' qualifiers that could be tightened.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with 10 parameters, no annotations, and zero schema coverage, the description successfully documents all inputs and the filtering logic. The existence of an output schema satisfies return-value needs. Minor gaps remain regarding error scenarios (e.g., invalid message_ids) and atomicity guarantees.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage, the Args section comprehensively compensates by documenting all 10 parameters with semantics, examples (e.g., 'Gmail', 'Work' for account), and valid values (e.g., 'mark_read', 'flag' for action), adding essential meaning absent from the structured schema.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The opening sentence clearly states the specific action (update email status) and the exact operations supported (mark as read/unread or flag/unflag), distinguishing it from sibling tools like compose_email or move_email, which handle different email operations.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear internal logic for when to use message_ids (exact ID matching that ignores other filters) versus when to use filtering parameters (subject, sender, age). However, it does not explicitly contrast with sibling alternatives to guide selection between this and tools like manage_trash or move_email.
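The precedence and guardrail behaviors credited above, with exact message_ids overriding filters and max_updates capping bulk changes unless apply_to_all is set, are sketched below under assumed field names.

```python
def update_email_status(emails, action, message_ids=None, subject=None,
                        max_updates=10, apply_to_all=False):
    if action not in ("mark_read", "mark_unread", "flag", "unflag"):
        raise ValueError(f"unknown action: {action!r}")
    if message_ids:
        # Documented precedence: exact IDs win and other filters are ignored.
        selected = [e for e in emails if e["id"] in set(message_ids)]
    else:
        selected = [e for e in emails
                    if subject is None or subject in e["subject"]]
    if not apply_to_all:
        # Guardrail: cap bulk updates unless explicitly overridden.
        selected = selected[:max_updates]
    for e in selected:
        if action in ("mark_read", "mark_unread"):
            e["read"] = action == "mark_read"
        else:
            e["flagged"] = action == "flag"
    return {"updated": len(selected)}

msgs = [{"id": i, "subject": f"msg {i}", "read": False} for i in range(15)]
```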
- Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully discloses the matching algorithm (subject matching + same recipient check). This is crucial behavioral context that defines what 'awaiting reply' means. Minor gap: doesn't explicitly state it's read-only or describe performance characteristics.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness5/5Is the description appropriately sized, front-loaded, and free of redundancy?
Uses Google-style docstring format with Args/Returns sections. Front-loaded with the value proposition in the first sentence, followed by implementation details, use case, and structured parameter documentation. No redundant or wasted text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensive for a 4-parameter tool with cross-referencing logic. Documents all inputs and summarizes return values (subject, recipient, date) despite the existence of an output schema. Minor gap: doesn't mention error conditions (e.g., invalid account name).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 0% schema description coverage (titles only), the Args section provides complete semantic documentation for all 4 parameters including examples (account), units (days_back), and business logic (exclude_noreply). Fully compensates for the schema's lack of descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific action ('Find sent emails that haven't received a reply yet') and immediately clarifies the unique implementation (cross-referencing Sent with Inbox using subject matching). This distinguishes it from generic search tools like 'search_emails' or overview tools like 'get_inbox_overview'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context ('Useful for follow-up tracking') indicating when to use the tool. However, it lacks explicit guidance on when NOT to use it (e.g., 'use search_emails instead for content-based searching') or comparisons to sibling 'get_needs_response'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
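The matching algorithm the description discloses (normalized-subject match plus a same-recipient check) can be sketched as follows. This is one plausible reading of the heuristic for illustration only — not the server's actual implementation, and the field names (`subject`, `sender`, `recipient`) are assumptions:

```python
# Sketch of the disclosed 'awaiting reply' heuristic: a sent message is
# still awaiting a reply if no inbox message pairs a matching subject with
# the original recipient as its sender. Illustrative only.
import re

def normalize(subject: str) -> str:
    """Strip a leading Re:/Fwd: prefix and case for subject comparison."""
    return re.sub(r"^(re|fwd?):\s*", "", subject.strip(), flags=re.I).lower()

def awaiting_reply(sent: list[dict], inbox: list[dict]) -> list[dict]:
    """Cross-reference Sent with Inbox: keep sent messages with no reply."""
    replied = {(normalize(m["subject"]), m["sender"].lower()) for m in inbox}
    return [m for m in sent
            if (normalize(m["subject"]), m["recipient"].lower()) not in replied]

sent = [{"subject": "Q3 report", "recipient": "bob@example.com"},
        {"subject": "Lunch?", "recipient": "carol@example.com"}]
inbox = [{"subject": "Re: Q3 report", "sender": "bob@example.com"}]
# awaiting_reply(sent, inbox) keeps only the 'Lunch?' message
```

A heuristic like this explains the disclosed caveat that 'awaiting reply' is defined by subject matching rather than true thread IDs.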
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses key behavioral traits not in annotations: auto-creation of intermediate path segments ('each segment is created if needed') and return value format ('Confirmation with the new mailbox path'). Missing error-handling details (e.g., behavior if the mailbox already exists), which would be helpful given zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Excellent structure with clear Args/Returns sections. Front-loaded purpose statement. Every sentence adds value: examples illustrate valid inputs, slash-handling explains behavior, return description sets expectations. No redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Complete for a 3-parameter creation tool. Despite 0% schema description coverage, narrative fully documents all parameters and their relationships. Return value is summarized appropriately (output schema exists for detailed structure).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Comprehensive compensation for 0% schema coverage: provides concrete examples for account ('Gmail', 'Work'), explains name syntax including slash-handling behavior, and clarifies parent_mailbox optionality. Documents parameter interactions (mutual exclusivity patterns) not evident in schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Opens with specific verb+resource ('Create a new mailbox') and clarifies mailbox equals 'folder'. Clearly distinguishes from sibling tools like list_mailboxes (read vs. write) and compose_email (message vs. folder operations).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on two valid usage patterns for nesting (parent_mailbox param vs. slash-separated name). However, lacks explicit 'when not to use' guidance or comparison to sibling alternatives like manage_drafts or list_mailboxes.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully discloses the behavioral heuristics (filters newsletters/automated emails, prioritizes direct emails with question marks). It also describes the return value (ranked list with priority hints). It omits error handling or empty-result behavior, but the core algorithm is transparent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description follows a clear docstring structure (summary, Args, Returns) with no wasted words. The opening sentence establishes purpose immediately, followed by behavioral details, then structured parameter documentation. Each sentence earns its place by conveying unique information not present in the structured schema fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 4 parameters and an output schema, the description is complete. It compensates for the 0% schema coverage with full parameter documentation, explains the ranking algorithm, and describes the return format sufficiently (acknowledging the output schema handles detailed structure).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage, the Args section fully compensates by documenting all 4 parameters with clear semantics and helpful examples (e.g., account: 'Gmail', 'Work', 'Personal'). It explains the purpose of each parameter (mailbox to scan, days back to look) and notes defaults, effectively serving as the primary parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool 'Identify unread emails that likely need a response from you,' providing a specific verb (identify) and resource (emails). It distinguishes itself from generic email listing tools (like list_inbox_emails) by detailing the specific filtering heuristics (newsletters, noreply) and prioritization logic (question marks, direct emails).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use the tool by explaining what gets filtered out and prioritized, implying the use case (finding actionable emails vs. browsing all unread). However, it lacks explicit 'when not to use' guidance or named sibling alternatives (e.g., contrasting with get_awaiting_reply).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 5/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and excels by disclosing: (1) dry_run behavior previews without acting, (2) max_deletes is a safety limit, (3) action-specific constraints (empty_trash ignores subject_keyword, delete_permanent ignores mailbox), and (4) destructive scope of each action. It fully explains the safety model and default behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The structure is logical: summary → dry_run behavior (front-loaded safety) → Args → Returns. Given 11 parameters with complex interdependencies, the length is appropriate. The Args list is long but necessary given the poor schema coverage. The Returns section is minimal but acceptable since an output schema exists.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex tool with 11 parameters, 3 distinct actions, and multiple safety guards, the description is comprehensive. It covers all parameters, action-specific behaviors, default values, safety constraints (confirm_empty, max_deletes), and filtering logic. The output schema handles return values, so the brief Returns section is sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 0% description coverage (only titles). The Args section in the description fully compensates by explaining all 11 parameters: providing examples (account: 'Gmail'), listing valid enum values for action, explaining constraints (mailbox not used for empty_trash), and clarifying safety semantics (confirm_empty, apply_to_all, max_deletes). It adds essential meaning that the schema lacks.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The opening sentence 'Manage trash operations - delete emails or empty trash' provides specific verbs (manage, delete, empty) and resources (trash, emails). It clearly distinguishes from siblings like 'move_email' or 'compose_email' by focusing specifically on trash lifecycle operations (move_to_trash, delete_permanent, empty_trash).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides excellent safety guidance explaining when to use dry_run=True (default) for previews versus dry_run=False for actual execution. It documents safety requirements like confirm_empty and apply_to_all. However, it lacks explicit comparison to sibling alternatives (e.g., when to use 'move_to_trash' vs the general 'move_email' tool).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
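The safety model described above — preview by default, then explicit opt-in for destruction — can be illustrated with hypothetical argument payloads. Parameter names (`action`, `dry_run`, `confirm_empty`, `max_deletes`, `subject_keyword`) are taken from the review text; the values and the exact schema shape are assumptions:

```python
# Hypothetical manage_trash payloads illustrating the disclosed safety model.
# Parameter names come from the review; values are examples only.

# Step 1 — preview: dry_run defaults to True, so nothing is deleted yet.
preview = {
    "action": "delete_permanent",
    "account": "Gmail",
    "subject_keyword": "Unsubscribe",
    "dry_run": True,        # report what WOULD be deleted
    "max_deletes": 25,      # safety cap on affected messages
}

# Step 2 — execute: only after inspecting the preview, flip dry_run and
# supply the explicit confirmation that empty_trash requires. Note that
# empty_trash ignores subject_keyword, so it is omitted here.
execute = {
    "action": "empty_trash",
    "account": "Gmail",
    "dry_run": False,
    "confirm_empty": True,  # explicit opt-in for the destructive action
}
```

An agent following this two-call pattern never performs a destructive operation it has not first previewed.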
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full behavioral disclosure burden. It explains the fallback logic between text_body and html_body parameters, default behaviors for open_in_mail (True) and save_as_draft (False), and the file generation mechanism. Minor gap: it does not specify overwrite behavior or error conditions when a file already exists at `output_path`.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Excellent structure with four distinct sections (purpose/rationale, Args, Returns). Information density is high with zero waste—every sentence provides actionable guidance (e.g., explaining why .eml is preferred) or parameter constraints. Technical details (comma-separated, fallback generation) are precisely where needed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 10 parameters with complex interdependencies (text_body/html_body fallbacks) and 0% schema coverage, the description is remarkably complete. It documents all parameters, explains the return structure ('Confirmation with the generated `.eml` path'), and notes that an output schema exists. No gaps remain for agent invocation decisions.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0% (titles only), requiring full compensation. The Args section provides rich semantics for all 10 parameters: account includes examples ('Work', 'Oracle'), recipient fields specify comma-separated formatting, body fields explain mutual fallback generation, and booleans clarify default values. Comprehensive coverage of complex parameter interactions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states that the tool generates unsent `.eml` messages for rich-text emails, using specific verbs (create/generate) and resource identifiers. It clearly distinguishes itself from sibling `compose_email` by explaining that this is the 'preferred path for HTML or richly formatted emails because Mail reliably renders `.eml` content, while setting raw HTML through AppleScript often stores the literal markup instead.'
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance by contrasting with AppleScript alternatives: 'This is the preferred path for HTML or richly formatted emails.' It explains the technical rationale (reliable rendering vs. literal markup storage), giving agents clear criteria for selecting this tool over `compose_email` or other siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
GitHub Badge
Glama performs regular codebase and documentation scans to:
- Confirm that the MCP server is working as expected.
- Confirm that there are no obvious security issues.
- Evaluate tool definition quality.
Our badge communicates server capabilities, safety, and installation instructions.
Card Badge
Copy to your README.md:
Score Badge
Copy to your README.md:
How to claim the server?
If you are the author of the server, you simply need to authenticate using GitHub.
However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.
{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}
Then, authenticate using GitHub.
Browse examples.
How to make a release?
A "release" on Glama is not the same as a GitHub release. To create a Glama release:
- Claim the server if you haven't already.
- Go to the Dockerfile admin page, configure the build spec, and click Deploy.
- Once the build test succeeds, click Make Release, enter a version, and publish.
This process allows Glama to run security checks on your server and enables users to deploy it.
How to add a LICENSE?
Please follow the instructions in the GitHub documentation.
Once GitHub recognizes the license, the system will automatically detect it within a few hours.
If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.
How to sync the server with GitHub?
Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.
To manually sync the server, click the "Sync Server" button in the MCP server admin interface.
How is the quality score calculated?
The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).
Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool receives a Tool Definition Quality Score (TDQS) from 1–5, weighted across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.
Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
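The weighting above can be sketched in a few lines. This is a minimal illustration of the stated formula — the function and variable names are our own, not part of any Glama API:

```python
# Sketch of the published quality-score weighting (names are illustrative).

# Per-tool TDQS: six dimensions with the stated weights.
DIM_WEIGHTS = {"purpose": 0.25, "usage": 0.20, "behavior": 0.20,
               "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10}

def tool_tdqs(dims: dict[str, float]) -> float:
    """Weighted sum of the six dimension scores (each 1-5)."""
    return sum(DIM_WEIGHTS[k] * v for k, v in dims.items())

def server_tdq(per_tool_scores: list[float]) -> float:
    """Server-level Tool Definition Quality: 60% mean + 40% minimum,
    so one poorly described tool drags the whole score down."""
    mean = sum(per_tool_scores) / len(per_tool_scores)
    return 0.6 * mean + 0.4 * min(per_tool_scores)

def overall_score(per_tool_scores: list[float], coherence: float) -> float:
    """Overall = 70% Tool Definition Quality + 30% Server Coherence."""
    return 0.7 * server_tdq(per_tool_scores) + 0.3 * coherence

def tier(score: float) -> str:
    """Map the overall score onto the published tier cutoffs."""
    if score >= 3.5: return "A"
    if score >= 3.0: return "B"
    if score >= 2.0: return "C"
    if score >= 1.0: return "D"
    return "F"

# One weak tool (2.0) among strong ones pulls the minimum term down.
score = overall_score([5.0, 4.5, 2.0], coherence=4.0)
print(f"{score:.2f} -> tier {tier(score)}")  # prints: 3.37 -> tier B
```

Note how the 40% minimum term works: the same three tools averaged without it would score well above 3.8, but the single 2.0 tool holds the server to a B.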
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/patrickfreyer/apple-mail-mcp'
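For scripted access, the same endpoint can be queried from Python with the standard library. This sketch assumes only that the endpoint returns JSON; the response field names are not shown here because they are not documented above:

```python
# Minimal sketch of querying the MCP directory API from Python (stdlib only).
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"

def server_url(slug: str) -> str:
    """Build the per-server endpoint from an 'owner/name' slug."""
    return f"{API_BASE}/{slug}"

def fetch_server(slug: str) -> dict:
    """Fetch and decode one server record as JSON."""
    with urllib.request.urlopen(server_url(slug)) as resp:
        return json.load(resp)

# Example (performs a network request):
#   info = fetch_server("patrickfreyer/apple-mail-mcp")
#   print(json.dumps(info, indent=2))
```

The URL built by `server_url` matches the curl example above; inspect the raw response once to learn the schema before relying on specific fields.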
If you have feedback or need assistance with the MCP directory API, please join our Discord server.