mailopoly

by com.mailopoly

Ownership verified

Server Details

Your unified inbox — everything that reaches you, understood and actionable from your AI assistant.

Status: Healthy
Last Tested: 2026-07-26 03:43
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (3.7/5.0)

Tool Descriptions: A

Average 4.4/5 across 45 of 45 tools scored. Lowest: 2.9/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have distinct purposes, but some overlap exists between search_emails and deep_search_emails, and between get_feed, get_catch_up, and other listing tools. Descriptions are detailed and help differentiate, so disambiguation is generally strong.

Naming Consistency: 4/5

The majority use a consistent verb_noun pattern (e.g., create_task, list_invoices). However, 'about_mailopoly' and 'next_nudge' deviate from this pattern, which slightly reduces consistency.

Tool Count: 2/5

With 45 tools, the count is excessively high for most assistants. While the domain is broad (email management), this number exceeds typical well-scoped ranges (3-15) and falls into the 'too many' category, causing potential bloat.

Completeness: 4/5

The tool set covers a wide range of email management tasks: account setup, sync, search, read, send, drafts, tasks, priorities, invoices, lists, and more. Minor gaps exist (e.g., managing non-email app connections), but overall it is comprehensive.

Available Tools

45 tools

about_mailopoly: About Mailopoly & how to get started (grade A)

Read-only, Idempotent

Explain what Mailopoly is, how the free trial works, what an @mly.life address is, and exactly where to sign up or finish setup. Call this whenever the user asks "what is Mailopoly?" / "what is this?", how the trial or pricing works, what an @mly.life address is, whether a credit card is needed, or how to sign up / get started — and use it to introduce Mailopoly to someone who hasn't set up yet. Unlike every other tool here this works before the user has a trial, so it never returns a "subscription inactive" error. Relay get_started_url verbatim.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Parameters

Name	Required	Description
`message`	No
`privacy`	No
`website`	No
`free_trial`	No	How the free trial works (no credit card to start).
`what_it_is`	No
`get_started_url`	No	Where to sign up / finish setup — relay verbatim.
`mly_life_address`	No	What the user's own @mly.life address is and does.
`supported_providers`	No	Mailboxes that can be connected (Gmail, Outlook, IMAP).

Tool Definition Quality

A (4.8/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds behavioral context beyond annotations: 'works before the user has a trial, so it never returns a "subscription inactive" error.' This is useful extra transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that is front-loaded with the main purpose, uses clear language, and includes every necessary detail without extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, no required parameters, full schema coverage, and presence of annotations and output schema, the description provides complete context: purpose, usage triggers, a distinctive behavioral trait, and an instruction for output handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are zero parameters, so the input schema is trivial. The description adds context about the output (get_started_url) which is not in the schema, fulfilling the role of parameter semantics by clarifying the tool's return value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to explain Mailopoly, its free trial, @mly.life addresses, and sign-up process. It provides specific triggers (e.g., user asking 'what is Mailopoly?') and distinguishes itself from siblings by noting it works before a trial.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-call instructions (when the user asks certain questions) and when-not-to-call (other tools that require a subscription). It also provides a specific instruction to relay the get_started_url verbatim.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_email_to_list: Add an email to a list (grade A)

Idempotent

File an email into one of the user's email lists (ids from list_email_lists / the email tools). Idempotent: if it's already in the list this reports already_in_list instead of duplicating. Manual adds are never removed by rule re-evaluation.

Parameters

Name	Required	Description	Default
`list_id`	Yes
`email_id`	Yes

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.
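
Because the tool is idempotent, a client can retry a failed call without risking a duplicate entry. The sketch below illustrates that under stated assumptions: `send` is a hypothetical transport callable supplied by the caller, the helper names are invented for this example, and the ID values are placeholders (the schema above does not specify their type). The request envelope follows the JSON-RPC 2.0 `tools/call` shape used by MCP.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Serialize a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def add_with_retry(send, list_id, email_id, attempts=3):
    """Retry on transport errors; safe because the tool is idempotent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    payload = build_tool_call(
        "add_email_to_list", {"list_id": list_id, "email_id": email_id}
    )
    last_error = None
    for _ in range(attempts):
        try:
            # `send` is a hypothetical callable that posts the payload
            # and returns the parsed tool result.
            return send(payload)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

A repeated call that reaches the server twice should simply report already_in_list, so the retry loop needs no extra de-duplication of its own.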

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Adds value beyond annotations: describes the specific idempotent response (already_in_list) and the permanence of manual adds. No contradiction with annotations.

Conciseness: 5/5

Three sentences with no wasted words; front-loaded with the primary action. Highly concise and well-structured.

Completeness: 4/5

Covers key behaviors (idempotency, rule immunity) despite a simple tool with 2 params and existing output schema. Missing parameter details slightly lower completeness.

Parameters: 3/5

With 0% schema description coverage, the description provides only indirect context for list_id (source from other tools) but no details on constraints, format, or email_id. Baseline 3 is appropriate.

Purpose: 5/5

Clearly states the action (file an email into a list) and resource (email lists), with specific context that list IDs come from list_email_lists or email tools. Distinguishes from siblings like remove_email_list.

Usage Guidelines: 4/5

Provides guidance on idempotent behavior and that manual adds are not removed by rule re-evaluation, but lacks explicit when-not-to-use or alternatives beyond the implied sibling separation.

check_email_sync: Check email is up to date (grade A)

Check whether the mail Mailopoly holds is up to date with what's actually at the email provider right now — use this when a user says "my emails aren't coming through", "is my inbox synced?", "am I missing emails?". It lists the account's most recent messages straight from Gmail/Outlook/IMAP and compares them to what we've stored. Each account returns provider_recent (newest emails at the provider) and mailopoly_recent (newest we hold) — present these two lists side by side so the user can see they match, then the verdict (up_to_date or behind_count + missing_preview). account is a connected email address (omit to check every syncable account). Set force=true to also START pulling the missing mail when an account is behind (the result's status becomes 'syncing_started'); leave force=false to just report. force is rate limited per account.

Parameters

Name	Required	Description	Default
`force`	No
`account`	No

Output Schema

Parameters

Name	Required	Description
`plan`	No	trial | subscribed | none. On 'trial' only a recent window of mail is downloaded.
`count`	No
`message`	No
`accounts`	No	Per-account sync verdict. Common keys: id, email, provider, status (up_to_date|behind|syncing_started|needs_reconnect|not_syncing|paused|couldnt_check|throttled), up_to_date (bool|null), behind_count, provider_latest_at, db_latest_at, last_synced_at, live_checked, pull_triggered. provider_recent = the account's newest emails live from the provider, mailopoly_recent = the newest we hold, each a list of {subject, sender, date}; SHOW these two lists side by side so the user can see they line up. missing_preview = newer messages we don't have yet (same shape). message = human-readable summary.
`up_to_date`	No	True only if EVERY checked account is up to date.
`history_note`	No	Present (and worth relaying) when the user is on the trial: explains that only recent mail is downloaded, subscribing gets more, and ALL email stays searchable regardless.
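
The side-by-side presentation the schema asks for could be sketched as follows. This is a minimal illustration, assuming one entry of `accounts` has already been parsed into a Python dict; the helper name `render_side_by_side` and the sample data are hypothetical, and only the keys `provider_recent`, `mailopoly_recent`, and the `{subject, sender, date}` item shape come from the schema above.

```python
from itertools import zip_longest


def render_side_by_side(account, width=38):
    """Render provider_recent and mailopoly_recent as two text columns."""

    def fmt(item):
        # Each item is expected to look like {subject, sender, date}.
        if item is None:
            return ""
        return f"{item.get('date', '')} {item.get('subject', '')}"[:width]

    lines = [f"{'At provider':<{width}} | In Mailopoly"]
    pairs = zip_longest(
        account.get("provider_recent", []),
        account.get("mailopoly_recent", []),
    )
    for left, right in pairs:
        lines.append(f"{fmt(left):<{width}} | {fmt(right)}")
    return "\n".join(lines)


# Hypothetical sample entry, not real server output.
sample = {
    "status": "behind",
    "provider_recent": [
        {"subject": "Invoice 12", "sender": "a@example.com", "date": "2026-07-25"},
        {"subject": "Hello", "sender": "b@example.com", "date": "2026-07-24"},
    ],
    "mailopoly_recent": [
        {"subject": "Hello", "sender": "b@example.com", "date": "2026-07-24"},
    ],
}
print(render_side_by_side(sample))
```

The rows are paired by position only, so a shorter right-hand column makes a gap visible without any attempt to match individual messages.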

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

The description discloses the full behavior: it compares provider recent emails to stored emails, returns a verdict (up_to_date or behind_count+missing_preview), and explains that force=true will initiate syncing with rate limiting. It also describes the output structure (provider_recent, mailopoly_recent, verdict). This adds significant context beyond the annotations, which only show false hints.

Conciseness: 4/5

The description is front-loaded with the purpose and use cases, but it is somewhat lengthy (~150 words). Every sentence adds value, and the structure flows logically. A slight reduction could improve conciseness, but it remains effective.

Completeness: 5/5

Given the tool's complexity (comparing sync status, optional syncing) and the presence of an output schema, the description is complete. It covers purpose, parameters, side effects (force triggers sync), output structure, and even rate limits. It fully enables an AI agent to select and invoke the tool correctly.

Parameters: 5/5

The input schema has 0% description coverage, so the description must detail parameters. It explains 'account' as an optional connected email address and 'force' as a boolean that starts syncing when true, including a note on rate limiting. This adds essential meaning beyond the schema's type and default values.

Purpose: 5/5

The description clearly states the tool checks whether stored mail is up to date with the provider, using specific verbs like 'check' and 'compare'. It distinguishes from sibling tools like 'start_email_account_sync' by implying this is for verification, not initiation. The examples of user queries ('my emails aren't coming through') add clarity.

Usage Guidelines: 4/5

The description explicitly gives example contexts for when to use the tool ('use this when a user says...'). It explains parameters and their effects (force=true starts syncing). However, it does not explicitly recommend an alternative tool like 'start_email_account_sync' for scenarios where the user wants to sync without checking first.

complete_task: Complete or reopen a task (grade A)

Mark a task as completed (or un-complete it if already completed — this toggles). Use ids from list_tasks / get_my_day.

Parameters

Name	Required	Description	Default
`task_id`	Yes

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

The description adds the key detail that the action toggles between completed and uncompleted, which is not indicated by the annotations (readOnlyHint=false, destructiveHint=false).

Conciseness: 5/5

Two concise sentences deliver all necessary information without redundancy, each earning its place.

Completeness: 5/5

Given the low complexity, output schema presence, and annotations, the description provides enough context for an AI agent to use the tool correctly.

Parameters: 3/5

With 0% schema description coverage, the description partially compensates by explaining that task_id should come from list_tasks/get_my_day, but lacks details on format or constraints.

Purpose: 5/5

The description explicitly states the action ('Mark a task as completed') and the toggling behavior, clearly distinguishing it from siblings like create_task or list_tasks.

Usage Guidelines: 4/5

It specifies that IDs should come from list_tasks or get_my_day, providing clear context for use, though it does not explicitly mention when not to use it.

create_email_list: Create an email list from a description (grade A)

Create a smart email list from a plain-language description of what belongs in it (e.g. a brand's emails, messages from a connected app, a topic, emails containing invoices). The rules are derived from the user's actual data — real sender domains, connected apps, categories — and existing matching emails are filed in immediately; future emails auto-file. Returns the created list with the generated rules, the reasoning, and how many emails matched, so you can confirm it captured the intent (browse it with get_feed(list_id=...)). name overrides the generated list name. exclude_from_cleanbox=true also hides matching emails from the main feed (only on explicit user request).

Parameters

Name	Required	Description	Default
`name`	No
`description`	Yes
`exclude_from_cleanbox`	No

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Annotations provide readOnlyHint=false and destructiveHint=false. The description discloses that creation is immediate, existing matching emails are filed, and future emails auto-file. It also clarifies the effect of exclude_from_cleanbox. No contradictions.

Conciseness: 4/5

The description is moderately long but front-loaded with the primary purpose. Every sentence adds value, detailing behavior, parameters, and returns. It is slightly verbose in its explanation of 'exclude_from_cleanbox', but still efficient.

Completeness: 5/5

Given the presence of annotations, output schema, and sibling tools, the description provides comprehensive context: it covers intent, parameter semantics, behavioral outcomes, and even suggests follow-up with get_feed to verify results. No gaps.

Parameters: 5/5

Schema description coverage is 0%, but the description fully compensates: it explains the main 'description' parameter with examples, notes that 'name' overrides the generated name, and clarifies 'exclude_from_cleanbox' hides emails from the main feed only on explicit request.

Purpose: 5/5

The description clearly states the tool creates a smart email list from a plain-language description, with specific examples ('brand's emails, messages from a connected app, a topic, emails containing invoices'). This distinguishes it from sibling tools like add_email_to_list or list_email_lists.

Usage Guidelines: 4/5

The description explains when to use the tool—to create a list from a description—and mentions optional overrides. While it lacks explicit 'when not to use' guidance, the context is clear and adequate for selection among siblings.

create_task: Create a task or meeting (grade A)

Destructive

Create a task, reminder, or meeting in the user's task manager / My Day.

A meeting is just a task with task_type='event' — set attendees (and optionally send_invitations=true) and a real calendar event is created and .ics invites are emailed to each attendee.

due_date: when the task is due, OR the start time for an event.
reminder_date: when to remind the user about it. Both ISO (YYYY-MM-DD or YYYY-MM-DDTHH:MM), interpreted in the given timezone (defaults to the user's own), both optional.
priority: low | medium | high.
task_type: action | event | invoice | reply.
event_end: ISO end time, only meaningful for task_type='event'.
location: meeting location or URL (events).
attendees: list of {"email": "...", "name": "..." (optional), "role": "required"|"optional" (optional)} for an event. Pass real email addresses — NEVER invent one.
send_invitations: true to email .ics invites to the attendees now (this needs the 'send' permission on the connection, like send_email).
email_id: optionally link the task to an email.
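
The parameter rules above can be sketched as a small argument builder for the meeting case. This is an illustration only: the function name `build_meeting_arguments` and the example address are hypothetical, the "end after start" check is a local sanity check rather than a documented server rule, and the `@` test only checks shape, it cannot prove an address is real.

```python
from datetime import datetime


def build_meeting_arguments(title, start, end, attendees, timezone=None,
                            send_invitations=False):
    """Assemble create_task arguments for a meeting (task_type='event')."""
    # due_date doubles as the start time for an event; both must be ISO
    # (YYYY-MM-DD or YYYY-MM-DDTHH:MM), which fromisoformat accepts.
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    if end_dt <= start_dt:
        # Local sanity check, not a documented server constraint.
        raise ValueError("event_end must be after due_date")

    cleaned = []
    for person in attendees:
        email = person.get("email", "")
        if "@" not in email:
            raise ValueError(f"attendee needs a real email address: {person!r}")
        entry = {"email": email}
        if "name" in person:
            entry["name"] = person["name"]
        role = person.get("role")
        if role is not None:
            if role not in ("required", "optional"):
                raise ValueError(f"unknown role: {role!r}")
            entry["role"] = role
        cleaned.append(entry)

    arguments = {
        "title": title,
        "task_type": "event",
        "due_date": start,
        "event_end": end,
        "attendees": cleaned,
        "send_invitations": send_invitations,
    }
    if timezone is not None:
        # Omitting timezone leaves the server default (the user's own).
        arguments["timezone"] = timezone
    return arguments


# Placeholder attendee; a real call must use an address the user supplied.
example = build_meeting_arguments(
    "Design sync", "2026-08-03T14:00", "2026-08-03T14:30",
    [{"email": "ana@example.com", "role": "required"}],
)
```

Leaving `send_invitations` at false by default keeps the builder from emailing anyone unless the caller opts in, which matches the note that sending needs the 'send' permission.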

Parameters

Name	Required	Default
`title`	Yes
`due_date`	No
`email_id`	No
`location`	No
`priority`	No	medium
`timezone`	No
`attendees`	No
`event_end`	No
`task_type`	No	action
`description`	No
`reminder_date`	No
`send_invitations`	No

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Annotations already indicate destructiveHint=true (modifies state). The description adds behavioral context by detailing side effects: creating calendar events and sending .ics invites when send_invitations is true. It also warns against inventing email addresses for attendees. This goes beyond the annotations by explaining the exact mutations and permissions needed.

Conciseness: 4/5

The description is well-structured and front-loaded: it starts with the main purpose, then explains the meeting nuance, then lists parameters in a bullet-like format. It is somewhat lengthy but each sentence adds value. It could be slightly more concise, but the structure aids readability and clarity.

Completeness: 4/5

Given the tool has 12 parameters and an output schema (not shown), the description covers the key parameters and their interactions sufficiently. It misses potential details like character limits for title/description or how errors are handled, but for a creation tool with many options, it provides enough context for safe usage.

Parameters: 5/5

The input schema has 0% description coverage, so the description must compensate. It does so extensively: for each parameter like due_date, reminder_date, priority, task_type, event_end, location, attendees, send_invitations, and email_id, it explains their purpose, format, and behavior. For example, it clarifies that due_date is both a due date and start time for events, and that attendees must have real emails. This adds significant meaning beyond the schema.

Purpose: 5/5

The description clearly states that this tool creates tasks, reminders, or meetings in the user's task manager. It distinguishes between a regular task and a meeting (via task_type='event') and explains the key differences, making its purpose unambiguous and differentiated from sibling tools like complete_task or list_tasks.

Usage Guidelines: 3/5

The description explains what the tool does in detail but does not explicitly state when to use it over alternatives. While it is clear that this tool is for creating tasks/meetings, there is no guidance on when to choose it over related tools (e.g., send_email for sending invites, or save_draft for email drafts). Implicit context exists but explicit usage boundaries are missing.

create_task_rule: Create a task-suppression rule (grade A)

Idempotent

Create a rule that hides matching tasks from the user's task manager (the emails themselves stay in the inbox). Provide at least one of: sender_email (exact address), sender_domain (e.g. 'example.com'), or subject_contains (case-insensitive phrase). Optional task_type narrows the rule to one of: reply | invoice | event | action | shipment. Rules are reversible — see list_task_rules / delete_task_rule.
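
The "at least one matcher" requirement and the allowed `task_type` values can be checked client-side before calling the tool. The following is a minimal sketch; the function name `build_task_rule_arguments` is hypothetical, and the server may apply additional validation that is not described here.

```python
ALLOWED_TASK_TYPES = {"reply", "invoice", "event", "action", "shipment"}


def build_task_rule_arguments(sender_email=None, sender_domain=None,
                              subject_contains=None, task_type=None):
    """Validate and assemble arguments for create_task_rule."""
    matchers = {
        "sender_email": sender_email,
        "sender_domain": sender_domain,
        "subject_contains": subject_contains,
    }
    # Keep only the matchers that were actually provided.
    arguments = {key: value for key, value in matchers.items() if value}
    if not arguments:
        raise ValueError(
            "provide at least one of sender_email, sender_domain, "
            "subject_contains"
        )
    if task_type is not None:
        if task_type not in ALLOWED_TASK_TYPES:
            raise ValueError(
                f"task_type must be one of {sorted(ALLOWED_TASK_TYPES)}"
            )
        arguments["task_type"] = task_type
    return arguments
```

For example, `build_task_rule_arguments(sender_domain="example.com", task_type="invoice")` yields a rule scoped to invoice tasks from that domain, while calling it with no matcher raises before any request is sent.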

Parameters

Name	Required	Description	Default
`reason`	No
`task_type`	No
`sender_email`	No
`sender_domain`	No
`subject_contains`	No

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.6/5.0)

Behavior: 4/5

Annotations provide idempotentHint=true and destructiveHint=false. The description adds behavioral context: rules hide tasks but not emails, and are reversible. However, it does not explain behavior on duplicate rules or permissions needed, which would enhance transparency.

Conciseness: 5/5

The description is extremely concise (four sentences) with no redundancy. It front-loads the core purpose and action, then provides parameter guidance efficiently.

Completeness: 4/5

The tool has 5 parameters and an output schema. The description covers the main parameters and behavioral context well, but omits the 'reason' parameter. It also does not mention output details, though the output schema presumably covers that. Overall, it's nearly complete for a rule-creation tool.

Parameters: 4/5

With 0% schema description coverage, the description adds meaning for 4 out of 5 parameters: it explains that sender_email, sender_domain, and subject_contains are the filtering criteria and that task_type has specific allowed values. However, the 'reason' parameter is completely undocumented.

Purpose: 5/5

The description clearly states the tool's purpose: 'Create a rule that hides matching tasks from the user's task manager'. It distinguishes itself from sibling tools like create_task and delete_task_rule by explaining the effect on tasks versus emails and referencing related tools.

Usage Guidelines: 5/5

Explicitly states that at least one of sender_email, sender_domain, or subject_contains must be provided. Optional task_type is described with possible values. It also advises on reversibility and refers to list_task_rules / delete_task_rule for managing rules.

deep_search_emails: Deep search full email history (grade A)

Read-only, Idempotent

Search the user's COMPLETE email history by querying their connected mail providers live — Gmail, Outlook AND IMAP accounts (iCloud, Yahoo and other IMAP mailboxes) — reaching years beyond Mailopoly's indexed window, and including sent mail. This is also how you reach mail a free trial hasn't imported yet: the trial fully processes only recent mail, but the rest still lives in the user's mailbox and this tool finds it. Use it when search_emails returns few or no results, or when the question concerns emails older than the indexed history (search_emails responses include indexed_history_start). TIME BUDGET: the live crawl is deliberately capped server-side (typically 5-45 seconds for Gmail/Outlook, up to ~90 seconds when an IMAP account like iCloud/Yahoo is being walked) so this call ALWAYS returns a usable response before your own tool-call timeout — never refuse to run it just because an account is iCloud/IMAP; just tell the user you're searching their full history and it may take a moment. If the crawl hits its budget the response says so in note and provider_error, and the results returned are what was found in time. PAGINATION is by date, not offset: re-run with end_date set to a truncated provider's oldest_returned_date (from truncated_providers) to page deeper into history, or narrow with a sender/start_date/end_date window. start_date/end_date (YYYY-MM-DD) may span multiple years; omit both to search ALL history. Returned email_id values (some of the form 'gmail::' or 'imap::') work directly in get_email.

Parameters

Name	Required	Description	Default
`limit`	No
`query`	Yes
`sender`	No
`end_date`	No
`timezone`	No
`start_date`	No

Output Schema

Name	Required	Description
`results`	No	Email summary objects. Common keys: email_id, subject, sender_name, sender_email, date, snippet, folder ('cleanbox' = genuine personal mail, 'other' = promotional), categories, source. Pass email_id to get_email / get_action_links.
`total_count`	No	Number of matches returned.
`accounts_searched`	No	Connected accounts queried live for this search.
`indexed_history_start`	No	Oldest locally-indexed date (YYYY-MM-DD).
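
The date-based pagination described for this tool can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not Mailopoly's client code: `call_tool` is a hypothetical stand-in for whatever function your MCP client uses to invoke a tool, and the shape of `truncated_providers` (a list of objects carrying `oldest_returned_date`) is inferred from the description rather than confirmed by the schema.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def deep_search_pages(call_tool: ToolCall, query: str, max_pages: int = 5) -> List[Dict[str, Any]]:
    """Page backwards through history by moving end_date, de-duplicating by email_id."""
    found: Dict[str, Dict[str, Any]] = {}
    end_date: Optional[str] = None  # None on the first call: search all history

    for _ in range(max_pages):
        args: Dict[str, Any] = {"query": query}
        if end_date is not None:
            args["end_date"] = end_date
        response = call_tool("deep_search_emails", args)

        for item in response.get("results") or []:
            found.setdefault(item["email_id"], item)

        # Assumed shape: [{"provider": ..., "oldest_returned_date": "YYYY-MM-DD"}, ...]
        truncated = response.get("truncated_providers") or []
        dates = [p["oldest_returned_date"] for p in truncated if p.get("oldest_returned_date")]
        if not dates:
            break  # nothing was cut short, so there is no deeper page

        # ISO dates sort lexicographically. Taking the most recent cut-off avoids
        # skipping mail from the provider that was truncated earliest in its walk.
        next_end = max(dates)
        if end_date is not None and next_end >= end_date:
            break  # no progress; stop rather than loop until max_pages
        end_date = next_end

    return list(found.values())
```

Pages can overlap on the boundary date, which is why results are keyed by `email_id` before being returned.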

Tool Definition Quality

A (4.9/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint true, destructiveHint false, idempotentHint true), the description adds critical behavioral details: server-side time cap, always returns a response, pagination by date, and email_id format. It also mentions potential provider errors and the note field, enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly long, but each sentence adds value, covering purpose, usage, time budget, pagination, and parameter details. It is front-loaded and well organized, though slightly verbose; it could be tightened without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, live crawl, pagination, provider differences) and the presence of an output schema, the description covers all necessary aspects: when to use, how to page, time budget, parameter formats, and result handling. It is fully sufficient for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining all parameters: query, limit, start_date/end_date (YYYY-MM-DD), sender, timezone, and pagination usage. It adds semantic meaning beyond the schema's property names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool searches the user's complete email history by querying live mail providers (Gmail, Outlook, IMAP), and distinguishes itself from search_emails by covering older or unindexed mail. It uses specific verbs and resources, making the purpose clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: when search_emails returns few results or for older emails. It also warns about time budgets and encourages use even with IMAP accounts, and explains pagination with end_date. This distinguishes it from alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delete_task_rule: Delete a task-suppression rule (A)

Destructive, Idempotent

Delete a task-suppression rule by id (from list_task_rules). The previously hidden tasks reappear in the task manager.

Parameters

Name	Required	Description	Default
`rule_id`	Yes

Output Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true and readOnlyHint=false. The description adds value by explaining that hidden tasks reappear, which is a behavioral consequence not conveyed by annotations. This is a useful addition.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no filler. Every sentence adds value: first states action and source, second explains consequence. Efficiently front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (1 parameter, output schema exists), the description covers purpose, consequence, and source of ID. It could mention error conditions or permissions, but is largely adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It mentions deletion by id and references list_task_rules for obtaining the ID, but does not describe the rule_id parameter format, constraints, or provide examples. Minimal guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (delete a task-suppression rule) and the resource (by id from list_task_rules). It distinguishes from siblings like create_task_rule and list_task_rules by specifying the deletion and source of ID.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is used to delete a rule, but gives no explicit guidance on when not to use it or on alternatives. It lacks context about prerequisites or side effects beyond the reappearance of tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dismiss_nudge: Quiet a nudge (A)

The user said "not now" to a poly_nudge (or asked the nudges to stop). With ref_id: quiets that item for a day and the whole nudge stream for a few hours (the quiet doubles with consecutive dismissals, 3h -> 6h -> 12h -> 24h); pass hours when the user said WHEN ("remind me in 2 hours" -> hours=2) to set the exact window instead. all_for_today=true is "dismiss them all / that's everything for today": quiets the whole stream until the end of the user's local day without penalising any item. mute_all=true switches the ambient nudge stream off entirely ("stop suggesting things"); unmute_all=true switches it back on. For "never show me THIS again" use resolve_priority(action='not_relevant') instead — that's permanent. This state is shared with the Mailopoly app — a dismissal here quiets the app's nudges too. After dismissing, do not re-raise that item yourself; Mailopoly will resurface it when the quiet window ends.

Parameters

Name	Required	Description	Default
`hours`	No
`ref_id`	No
`mute_all`	No
`unmute_all`	No
`all_for_today`	No

Output Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.
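
The doubling quiet window described for this tool (3h -> 6h -> 12h -> 24h) can be illustrated with a few lines of arithmetic. This is a hedged sketch of the schedule as the description states it, not Mailopoly's implementation: the function name is hypothetical, and the assumption that the window stays at 24 hours after the fourth consecutive dismissal is inferred only from the sequence ending there.

```python
from typing import Optional


def stream_quiet_hours(consecutive_dismissals: int, hours: Optional[float] = None) -> float:
    """Estimate how long the nudge stream stays quiet after a dismissal."""
    if hours is not None:
        # The user said WHEN ("remind me in 2 hours"), so the exact window wins.
        return hours
    if consecutive_dismissals < 1:
        raise ValueError("consecutive_dismissals must be at least 1")
    # 3h for the first dismissal, doubling each time; the 24h cap is an assumption.
    return float(min(3 * 2 ** (consecutive_dismissals - 1), 24))
```

An agent does not need to compute this itself, since the server applies the window; the sketch only shows what to expect when telling the user how long nudges will stay quiet.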

Tool Definition Quality

A (4.8/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses important behavioral traits beyond annotations: exponential backoff (3h → 6h → 12h → 24h) for consecutive dismissals, cross-app state sharing with Mailopoly, and the rule not to re-raise items. No contradiction with annotations (destructiveHint=false, readOnlyHint=false are consistent with a non-destructive mutation).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and informative but somewhat lengthy. Every sentence adds value, but it could be better structured (e.g., bullet points). Front-loading of key actions is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all 5 parameters and their interplay, cross-app effect, and resurfacing behavior. Since an output schema exists, not explaining return values is acceptable. Minor gap: no mention of what happens if both mute_all and unmute_all are set true.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description comprehensively explains each parameter: ref_id (single item), hours (exact window), all_for_today (whole stream until end of day), mute_all/unmute_all (toggle entire stream). Interactions and defaults are clarified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states that the tool quiets nudges (poly_nudge) with multiple modes: dismiss single item, dismiss whole stream with exponential backoff, set exact hours, dismiss all for today, mute/unmute entire stream. It explicitly distinguishes from resolve_priority for permanent hide.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly explains when to use each parameter (ref_id, hours, all_for_today, mute_all, unmute_all) and differentiates from resolve_priority for permanent actions. Also advises not to re-raise dismissed items and that Mailopoly will resurface them.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_account_overview: Account overview (A)

Read-only, Idempotent

Overview of the authenticated Mailopoly account: name, email, connected mail accounts, connected messaging apps (Slack etc. — their messages appear in the feed with a 'source' field and are replied to via send_email's reply_to_email_id), and inbox/task counts. Call this ONLY for identity / connection / setup questions — who this account is, which mailboxes and apps are connected, or whether the mailbox is still importing. Do NOT call it as a warm-up before other tools; for "what's in my inbox / Cleanbox" go straight to get_feed(folder='cleanbox').

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Name	Required	Description
`name`	No
`email`	No
`scopes`	No	Granted scopes for this connection.
`timezone`	No
`total_tasks`	No
`total_emails`	No
`connected_apps`	No	Connected messaging apps (e.g. Slack): app, status, capabilities, how_to_reply.
`connected_accounts`	No	Connected mail accounts: email, provider, status.

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the safety profile is clear. The description adds context about the returned data and explains how connected messaging apps appear in the feed, which is useful but not critical for behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (approx. 100 words), well-structured, and front-loaded with the most important information. Every sentence serves a purpose: output, usage guidance, and exclusions.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, existing output schema, and comprehensive annotations, the description fully covers the tool's purpose and usage context. It explains when to use it and provides a clear alternative.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters, so schema coverage is 100%. The description does not need to add parameter details, and it does not. Baseline for zero parameters is 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns an overview of the authenticated Mailopoly account, listing specific fields (name, email, connected mail accounts, messaging apps, counts). It distinguishes itself from siblings by specifying the exact use cases and explicitly stating what not to use it for.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to call the tool ('ONLY for identity / connection / setup questions') and when not to call it ('Do NOT call it as a warm-up before other tools'), and it names a specific alternative (get_feed) for common scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_action_links: Get action links from emails (A)

Read-only, Idempotent

Get the actionable links (pay now, log in, book, track package, manage subscription…) extracted from the given emails. Ids must come from this user's emails.

Parameters

Name	Required	Description	Default
`email_ids`	Yes

Output Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, which cover safety. The description adds context about extracting links and provides examples, but could mention what happens with invalid IDs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words, front-loaded with examples. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With good annotations and an output schema (exists but not shown), the description provides sufficient context for a single-parameter tool, covering purpose and input constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description must compensate. It states that IDs must come from the user's emails and implies the type is array of strings, but lacks format details or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts actionable links from given emails, with specific examples like pay now, log in, etc. It distinguishes itself from siblings like get_email or search_emails by focusing on links.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that IDs must come from the user's emails, providing a constraint, but does not explicitly state when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_catch_up: Catch up on new email (A)

Idempotent

Catch the user up on what's arrived since they last checked: the NEW emails since their previous catch-up (or the last 12h if they haven't), grouped by sender with unread counts and per-email snippets — returned as STRUCTURED DATA for you to summarise in your own words. This is the tool for 'catch me up' / 'what did I miss' / 'since I was gone'. It also serves as a cheap "what's new" poll — call it on a schedule and it only returns mail newer than the previous call. folder: 'all' (the default — both folders), 'cleanbox' (personal correspondence only) or 'other' (promotional only); filter_type is a deprecated alias. since_hours forces a specific look-back window (else it resumes from the last catch-up, capped at 12h). summarize=true ALSO returns Mailopoly's own written briefing prose — only pass it if the user explicitly wants Mailopoly's summary rather than yours (it is slower).

Parameters

Name	Required	Description	Default
`folder`	No
`timezone`	No
`summarize`	No
`filter_type`	No
`since_hours`	No

Output Schema

Name	Required	Description
`since`	No
`until`	No
`folder`	No	Which folder was covered: cleanbox \| other \| all.
`briefing`	No	Mailopoly's own written briefing — null unless summarize=true was passed (summarise sender_groups yourself).
`report_id`	No
`timeframe`	No
`other_count`	No	How many are Other (promotional).
`total_emails`	No
`sender_groups`	No	New emails grouped by sender, with unread counts and per-email snippets — summarise these yourself.
`cleanbox_count`	No	How many of the new emails are Cleanbox (personal).
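
The look-back rule in the description (an explicit since_hours wins; otherwise resume from the last catch-up, capped at 12h; otherwise the last 12h) can be written out as a small function. This is a sketch of one reading of that rule, not the server's logic: the function name is hypothetical, and treating "capped at 12h" as "never look back further than 12 hours unless since_hours is given" is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_LOOKBACK = timedelta(hours=12)  # default window stated in the description


def catch_up_window_start(
    now: datetime,
    last_catch_up: Optional[datetime] = None,
    since_hours: Optional[float] = None,
) -> datetime:
    """Return the assumed start of the window a catch-up call would cover."""
    if since_hours is not None:
        # since_hours forces a specific look-back window.
        return now - timedelta(hours=since_hours)
    earliest = now - MAX_LOOKBACK
    if last_catch_up is None:
        # No previous catch-up: fall back to the last 12 hours.
        return earliest
    # Resume from the previous catch-up, but not further back than the cap.
    return max(last_catch_up, earliest)
```

Under this reading, a user returning after a long break needs an explicit since_hours to see more than the last 12 hours.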

Tool Definition Quality

A (4.6/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description details the resume-from-last-check mechanism (each call returns only mail newer than the previous catch-up), the default 12-hour look-back, the structured data return, and the scheduled-polling use case. These add transparency beyond the annotations, which already indicate idempotency and non-destructiveness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but well-structured, front-loading the main purpose and behavior. Every sentence adds value, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 5 parameters, output schema, and many siblings, the description covers purpose, usage, parameter semantics, and even deprecation. It is complete for an agent to understand when and how to use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains 'folder' values, 'since_hours' behavior, 'summarize' effect, and 'filter_type' deprecation. However, 'timezone' is not described. Since schema coverage is 0%, the description adds significant meaning for most parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool catches up on new emails since last check, with specific examples of queries. It positions itself as the tool for 'catch me up' and distinguishes from siblings like search or individual email retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage context for 'catch me up' requests and explains when to use the 'summarize' parameter. It also notes that 'filter_type' is deprecated. However, it does not explicitly state when not to use the tool or point to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_connect_instructions: How to connect an address (A)

Read-only, Idempotent

Given an email address or domain, return the best way to connect it and the exact steps. Prefers one-click OAuth (oauth_available / oauth_provider) when we run a connector for that host — no password needed. Otherwise returns imap_suggestion with the host/port, the provider's help_url, and the app-password steps (app_password_note / instructions). Use this to walk a user through getting connected — especially IMAP users who need an app-specific password. This returns GUIDANCE only; it never fetches or receives a password.

Parameters

Name	Required	Description	Default
`email_or_domain`	Yes

Output Schema

Name	Required	Description
`domain`	No
`message`	No
`confidence`	No
`oauth_provider`	No	google \| microsoft.
`imap_suggestion`	No	IMAP details + app-password steps when method=imap: provider_name, host, port, help_url, app_password_required, app_password_note, instructions.
`oauth_available`	No
`detected_provider`	No
`recommended_method`	No	oauth \| imap \| unsupported \| unknown.
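
The OAuth-first, IMAP-fallback behaviour described for this tool suggests how a client might turn the response into user-facing steps. The following is a minimal sketch that uses only the output fields listed in the table above; the function name is hypothetical, and the assumption that `instructions` is either a string or a list of strings is not confirmed by the schema.

```python
from typing import Any, Dict, List


def connection_steps(resp: Dict[str, Any]) -> List[str]:
    """Turn a get_connect_instructions response into steps to read to the user."""
    method = resp.get("recommended_method")

    if method == "oauth" and resp.get("oauth_available"):
        # One-click OAuth is preferred when a connector exists; no password needed.
        return [f"Use one-click sign-in with {resp.get('oauth_provider')} (no password needed)."]

    if method == "imap":
        imap = resp.get("imap_suggestion") or {}
        steps = [f"IMAP server: {imap.get('host')}:{imap.get('port')}"]
        if imap.get("app_password_required"):
            steps.append(imap.get("app_password_note") or "Create an app-specific password first.")
        instructions = imap.get("instructions") or []
        if isinstance(instructions, str):  # assumed: may be one string or a list
            instructions = [instructions]
        steps.extend(instructions)
        if imap.get("help_url"):
            steps.append(f"Provider help: {imap['help_url']}")
        return steps

    # 'unsupported' or 'unknown': fall back to the server's own message.
    return [resp.get("message") or "No supported connection method was detected."]
```

Because the tool returns guidance only, nothing in this sketch collects or forwards a password.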

Tool Definition Quality

A (4.9/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that it returns guidance only, never fetches or receives a password, and explains the two possible response types (OAuth or IMAP). This adds context beyond the readOnlyHint and idempotentHint annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with the main purpose first, followed by details, usage advice, and a behavioral note. It is slightly verbose but each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to detail return values. It covers core behavior, distinctions between OAuth and IMAP, and boundary conditions (never fetches password), making it complete for an information retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description thoroughly explains the single parameter 'email_or_domain' by stating it can be an email address or domain, and uses it throughout the description to clarify its role.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns the best way to connect an email address or domain and the exact steps. It distinguishes between OAuth and IMAP scenarios, providing specific verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to use this tool to walk users through getting connected, especially IMAP users needing app-specific passwords. Implicitly indicates when not to use (e.g., for fetching actual passwords) and differentiates from sibling tools like start_email_connection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_draft: Read a draft (A)

Read-only, Idempotent

Get a draft's full content (to, cc, bcc, subject, body).

Parameters

Name	Required	Description	Default
`draft_id`	Yes

Output Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so safety is clear. The description adds the specific fields returned, which is helpful but not extensive behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no wasted words. It efficiently conveys the tool's purpose and output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with annotations and an output schema (implied), the description adequately covers all necessary information. No gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'draft_id' is self-explanatory and does not require additional description. Schema coverage is 0% but the parameter name and purpose are obvious.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a draft's full content and lists the fields (to, cc, bcc, subject, body). This distinguishes it from siblings like list_drafts (listing only) and save_draft (writing).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reading full draft content, but does not explicitly state when to use this tool vs alternatives like get_email or list_drafts. No exclusions or prerequisites are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_email: Read an email (A)

Read-only, Idempotent

Read a single email in full: subject, sender, recipients, date, the complete message text, attachments (each with its extracted text content when available — read these for invoice/proposal/PDF details that aren't in the body), and (optionally) actionable links found in it (pay, log in, book, track…). Accepts ids from search_emails, get_feed AND deep_search_emails — provider-history ids (the 'gmail::' form) are fetched live from the mail provider.

Parameters

Name	Required	Description	Default
`email_id`	Yes
`include_action_links`	No

Output Schema

Name	Required	Description
`cc`	No
`to`	No
`body`	No	Full message text (clipped).
`date`	No
`source`	No	'email' or a connected app (e.g. 'slack').
`snippet`	No
`subject`	No
`email_id`	No
`categories`	No
`app_context`	No	For app messages: workspace, channel, is_dm, sender_handle.
`attachments`	No	Attachment digests: name, type, and attachment_text (the extracted text content, when available) for both provider and @mly.life attachments. Read attachment_text for invoice/proposal details not present in the body.
`sender_name`	No
`action_links`	No	Actionable links (pay, log in, book, track…).
`sender_email`	No	Null for connected-app messages (no real address).
`fetched_live_from`	No	Provider name when fetched live (deep-search ids).
`suggested_replies`	No	Pre-written reply drafts in the user's own voice ({short_response, content}, best first). Offer by label; send the chosen one via send_email(reply_to_email_id, body=content).
`reply_instructions`	No
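
Since the description asks agents to read attachment text for details that are not in the body, a client might combine both before summarising. This is a small sketch that uses only the `subject`, `body`, and `attachments` fields (`name`, `attachment_text`) listed above; the function name and the output layout are illustrative assumptions.

```python
from typing import Any, Dict, List


def readable_text(email: Dict[str, Any]) -> str:
    """Join the subject, body, and any extracted attachment text into one string."""
    parts: List[str] = []
    if email.get("subject"):
        parts.append(f"Subject: {email['subject']}")
    if email.get("body"):
        parts.append(email["body"])
    for att in email.get("attachments") or []:
        text = att.get("attachment_text")
        if text:  # extracted text is only present "when available"
            parts.append(f"[Attachment: {att.get('name', 'unnamed')}]\n{text}")
    return "\n\n".join(parts)
```

Attachments without extracted text are skipped rather than reported as empty, so the result contains only content the agent can actually read.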

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, idempotent, and non-destructive. The description adds behavioral details: it extracts attachment text when available, optionally includes action links, and fetches provider-history IDs live. This provides value beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: two sentences covering all essential information without redundancy. It front-loads the main function and then adds detail on ID sources.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description is comprehensive. It covers input parameters, the full set of returned data (subject, sender, etc.), handling of attachments, optional action links, and ID source interoperability with sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates. It explains that email_id accepts IDs from specific search tools and that include_action_links controls inclusion of actionable links. This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it reads a single email in full, listing all components (subject, sender, recipients, date, message text, attachments, action links). It distinguishes from siblings by specifying that it accepts IDs from search_emails, get_feed, and deep_search_emails, and explains how provider-history IDs are handled.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use this tool: after searching for emails (since it accepts IDs from search tools). It also notes that attachment text is useful for invoice/proposal/PDF details not in the body, guiding the agent to use this for comprehensive reading. However, it does not explicitly state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_feed - Browse email feed - A

Idempotent

List the user's email feed (most recent first) without a search query. folder selects which part of the sorted inbox: 'cleanbox' (genuine personal correspondence, which is what most users mean by "my inbox"), 'other' (promotional / newsletters / ads), or 'all'. Omit folder to get everything unfiltered. personal_or_ad is a deprecated alias for folder ('personal'=cleanbox, 'advertising'=other). email_type is 'received' or 'sent'. account narrows to one connected mailbox (the address as shown in get_account_overview); combine it with folder for that account's Cleanbox/Other. list_id shows one of the user's custom email lists (ids from list_email_lists). source narrows to a connected app's messages (e.g. 'slack'; see get_account_overview) or 'email' for mail only. sender filters by sender name or address fragment. Filters combine freely, e.g. source='slack' + sender='<name>' + last_x_days=1 answers "what did <name> send me on Slack today?". PAGINATION: total_count is how many emails match in all; re-call with offset= for the next page (limit caps at 50).

Parameters

Name	Required	Description	Default
`limit`	No
`folder`	No
`offset`	No
`sender`	No
`source`	No
`account`	No
`list_id`	No
`timezone`	No
`email_type`	No
`last_x_days`	No
`personal_or_ad`	No

Output Schema

Name	Required	Description
`offset`	No	Offset of this page.
`results`	No	Email summary objects. Common keys: email_id, subject, sender_name, sender_email, date, snippet, folder ('cleanbox' = genuine personal mail, 'other' = promotional), categories, source. Pass email_id to get_email / get_action_links.
`total_count`	No	Total items in this feed.
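
The pagination contract described above (total_count, offset, limit capped at 50) can be sketched as a small generator. This is an illustration only: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes the tool, and the result is assumed to be a plain dict with the output fields listed in the table.

```python
from typing import Any, Callable, Iterator

PAGE_LIMIT = 50  # the description states that limit caps at 50


def iter_feed(
    call_tool: Callable[[str, dict[str, Any]], dict[str, Any]],
    **filters: Any,
) -> Iterator[dict[str, Any]]:
    """Yield every email summary matching the filters, one page at a time."""
    offset = 0
    while True:
        page = call_tool("get_feed", {**filters, "limit": PAGE_LIMIT, "offset": offset})
        results = page.get("results") or []
        yield from results
        offset += len(results)
        # Stop on an empty page or once total_count items have been seen.
        if not results or offset >= page.get("total_count", 0):
            break
```

A caller would write something like `for email in iter_feed(call_tool, folder="cleanbox", last_x_days=1): ...`, passing each `email_id` on to get_email as needed.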

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes pagination behavior (total_count, offset, limit cap 50), folder semantics (cleanbox/other), and deprecation of personal_or_ad. Annotations already indicate idempotent and non-destructive; description adds context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is dense but front-loaded with main purpose. Could be more structured (e.g., bullet points) but every sentence adds value. Slightly long but efficient given the number of parameters explained.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description covers all necessary aspects: input parameters, behavior, pagination, and usage patterns. No obvious gaps; comprehensive for a feed-browsing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description explains 10 of the 11 parameters: folder values, the deprecated personal_or_ad alias, account, list_id, source, sender, email_type, last_x_days, and the pagination parameters (limit, offset). Only timezone is left undescribed. It provides examples and combination guidance, largely compensating for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'List' and resource 'user's email feed' with ordering 'most recent first'. It explicitly distinguishes from search tools by saying 'without a search query', making purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides detailed guidance on parameter usage (folder, account, sender, etc.), examples of combinations (e.g., source='slack' + sender='<name>'), and pagination. Lacks explicit 'when not to use' statements but implicitly differentiates via 'without a search query' and covers parameter semantics well.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_my_day - My Day overview - B

Idempotent

The user's My Day view: today's tasks, events and commitments.

Parameters

Name	Required	Description	Default
`timezone`	No

Output Schema

Name	Required	Description
`count`	No
`tasks`	No	Task objects aggregated from manual tasks, email actions, invoices, events, shipments and replies. Common keys: id/task_id, title, task_type, due_date, status, priority, email_id.
`timeframe`	No

Tool Definition Quality

B 3.3/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotent and non-destructive behavior. The description adds that the tool returns tasks, events, and commitments, but does not disclose additional traits like formatting defaults or whether live data is returned. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no wasted words. It follows a clear pattern but could benefit from bullet points for parameter details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read operation with one optional parameter and an output schema, the description covers the core purpose but omits details like timezone defaulting or return format. Adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter (timezone) is not described in the description. With 0% schema coverage, the description should explain the parameter's purpose and syntax, but it adds no value beyond the schema definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns 'today's tasks, events and commitments' from the user's My Day view. This specifies the verb (get), resource (My Day view), and scope (today's items), differentiating it from sibling tools like get_feed or get_catch_up.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., get_feed for a broader timeline). No prerequisites or exclusions are mentioned, forcing the agent to infer usage from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_priorities - What needs the user's attention - A

Idempotent

The user's priorities briefing — the SMALL set of things that genuinely need them right now, judged by Mailopoly's executive-assistant engine: bills ranked by consequence of inaction, replies they owe ranked by relationship cadence, and real upcoming events. Use this for "what needs me?", "anything important?", "am I on top of things?" — NOT for a general inbox listing (that's get_feed). Reads the current briefing instantly and regenerates when it's missing or stale (a few seconds); refresh=true forces a fresh one. Each item carries a ref_id and options — apply the user's decision with resolve_priority; quiet an unwanted nudge with dismiss_nudge. nothing_pressing=true means exactly that: tell the user they're on top of things, don't invent urgency.

Parameters

Name	Required	Description	Default
`refresh`	No
`timezone`	No

Output Schema

Name	Required	Description
`cached`	No	True when served from the stored current briefing.
`report`	No	{ greeting, nothing_pressing, sections[] } — each section item has ref_id, headline, say, urgency, options and a source object (pass source verbatim to resolve_priority).
`generated_at`	No
`nudges_muted`	No	True when the user has muted the ambient nudge stream.
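
A hedged sketch of how an agent might turn a get_priorities result into lines to read back, honoring `nothing_pressing` rather than inventing urgency. The listing does not spell out how items nest inside `sections[]`, so the `items` key below is an assumption, as is the helper name `briefing_lines`.

```python
from typing import Any


def briefing_lines(result: dict[str, Any]) -> list[str]:
    """Summarize a priorities briefing without adding urgency that is not there."""
    report = result.get("report") or {}
    if report.get("nothing_pressing"):
        return ["You're on top of things."]
    lines = []
    for section in report.get("sections") or []:
        # ASSUMPTION: each section holds its entries under an "items" key.
        for item in section.get("items") or []:
            urgency = item.get("urgency", "?")
            lines.append(f"[{urgency}] {item['headline']} (ref {item['ref_id']})")
    return lines
```

When the user decides on an item, its `source` object would be passed unchanged to resolve_priority, as the output table notes.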

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotent and non-destructive behavior; description adds context about regeneration timing, refresh behavior, and handling of 'nothing_pressing=true', which is useful but not essential beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Six sentences cover purpose, usage, parameters, and follow-up actions coherently; each sentence adds value, though slightly verbose for an agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With only two parameters and an output schema, the description covers tool purpose, use cases, parameter behavior, and hints at output structure (nothing_pressing flag), making it complete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description explains 'refresh' parameter well ('forces a fresh one'), but does not explain the 'timezone' parameter, leaving its role unclear despite 0% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a 'priorities briefing' and distinguishes it from sibling 'get_feed' by specifying it is for the 'SMALL set of things that genuinely need them right now' versus a general inbox listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('what needs me?', 'anything important?', 'am I on top of things?') and when not to use ('NOT for a general inbox listing—that's get_feed'), providing direct alternative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_widgets - Get dashboard widgets - A

Idempotent

The user's My Data dashboard widgets (weather, news, stocks, custom) with their latest cached data. refresh=true re-fetches any widget whose rate limit allows it (e.g. weather every 10 min) before returning. include_catalog=true also lists the available widget types.

Parameters

Name	Required	Description	Default
`refresh`	No
`include_catalog`	No

Output Schema

Name	Required	Description
`catalog`	No	Available widget types (only when include_catalog=true).
`widgets`	No	My Data widgets (weather/news/stocks/custom) with cached data. Common keys: widget_id, widget_type, title, enabled, data.
`refreshed_now`	No	Widget ids re-fetched on this call (refresh=true).
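
Because refresh=true only re-fetches widgets whose rate limit allows it, a caller may want to know which widgets still carry older cached data. The sketch below assumes the result is a plain dict with the `widgets` and `refreshed_now` fields from the table; `split_by_freshness` is a hypothetical helper name.

```python
from typing import Any


def split_by_freshness(result: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (ids re-fetched on this call, ids still served from cache)."""
    refreshed = set(result.get("refreshed_now") or [])
    fresh: list[str] = []
    cached: list[str] = []
    for widget in result.get("widgets") or []:
        widget_id = widget["widget_id"]
        (fresh if widget_id in refreshed else cached).append(widget_id)
    return fresh, cached
```

An agent could use the second list to tell the user that, say, the weather widget was refreshed too recently to update again.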

Tool Definition Quality

A 4.6/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds beyond annotations by explaining refresh behavior (rate limits, example 10 min for weather) and catalog listing. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: the first states the purpose, and the next two cover one parameter each. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, parameter behavior, and return value (widgets with cached data). Output schema exists, so return details are adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining both parameters (refresh re-fetches any widget whose rate limit allows it; include_catalog lists the available widget types), including a concrete example for refresh.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves dashboard widgets (weather, news, stocks, custom) with cached data, using a specific verb 'get' and resource. It distinguishes from siblings like email and task tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use refresh and include_catalog, but does not explicitly mention alternatives. The description gives clear usage hints for the parameters.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hide_emails - Hide emails from feed - A

Idempotent

Hide one or more emails from the user's feed (does not delete them). Optional reason, e.g. 'spam', 'not_interested', 'never_show_this_sender'.

Parameters

Name	Required	Description	Default
`reason`	No
`email_ids`	Yes

Output Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.
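
As an illustration, the arguments for hide_emails and the handling of its success/error output might look like the following. The helper names are hypothetical, and the reason strings are the examples from the description, not a confirmed closed set.

```python
from typing import Any, Iterable, Optional


def hide_arguments(email_ids: Iterable[str], reason: Optional[str] = None) -> dict[str, Any]:
    """Build the hide_emails arguments, dropping duplicate ids but keeping order."""
    ids = list(dict.fromkeys(email_ids))
    if not ids:
        raise ValueError("email_ids is required and must not be empty")
    args: dict[str, Any] = {"email_ids": ids}
    if reason:  # reason is optional, e.g. 'spam' or 'not_interested'
        args["reason"] = reason
    return args


def describe_outcome(result: dict[str, Any]) -> str:
    """Turn the tool's output into one line suitable for the user."""
    if result.get("success"):
        return result.get("message") or "Hidden."
    return f"Hide failed: {result.get('error') or 'unknown error'}"
```

Since hiding does not delete anything, reporting the outcome plainly is enough; no confirmation step is implied by the listing.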

Tool Definition Quality

A 4.8/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotentHint=true, destructiveHint=false, and readOnlyHint=false. The description adds that hide does not delete, clarifying the non-destructive behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with action ('Hide one or more emails'), no fluff. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple, with 2 parameters. The description fully covers purpose, behavior, and parameter semantics. Because an output schema exists, the description does not need to explain return values. For this complexity, the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description bears full burden. It explains the 'reason' parameter with concrete examples and implies 'email_ids' is a list of email identifiers. This adds meaningful context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the action 'hide' and the resource 'emails', and clearly distinguishes from deletion. Among sibling tools like 'mark_email_read' or 'remove_email_from_list', this tool's purpose is unique and well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides optional reason examples (e.g., 'spam'), which guides usage. However, it does not explicitly state when to use this tool versus alternatives like 'mark_email_read' or 'delete' operations, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_drafts - List email drafts - B

Read-only, Idempotent

List the user's email drafts (created in Mailopoly), newest first.

Parameters

Name	Required	Description	Default
`limit`	No
`search`	No

Output Schema

Name	Required	Description
`drafts`	No	Draft summaries (newest first).

Tool Definition Quality

B 3.4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) already declare a safe read operation. The description adds ordering ('newest first') but omits other behaviors like pagination or filtering limits. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, concise sentence with no redundant words. The purpose is front-loaded and the entire description is easily scannable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

An output schema exists, so the description does not need to cover return values. However, it does not explain how to use the listed parameters (limit, search), which are essential for effective querying. The tool's complexity is low, but the missing parameter context limits completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description does not explain the 'limit' or 'search' parameters. It fails to compensate for the lack of schema descriptions, leaving the agent to guess their usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('List') and resource ('the user's email drafts') with additional context ('created in Mailopoly, newest first'), clearly distinguishing it from siblings like 'get_draft' or 'search_emails'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs. alternatives (e.g., 'get_draft' for a single draft) or when not to use it. The description implies it's for Mailopoly drafts but offers no exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_email_accounts - List connected email accounts - A

Read-only, Idempotent

List the user's connected email accounts with their connection health and onboarding state. Use this to diagnose "I can't connect" / "my email isn't showing up" problems. Each account has a status: active, onboarding, pending, not_syncing (connected but never activated), reauthorization_required (needs reconnecting), or inactive (paused). Also returns mailbox_verdict and last_check_error when present. Connected messaging apps (Slack etc.) appear with kind='app'.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Name	Required	Description
`count`	No
`accounts`	No	Connected accounts with lifecycle detail. Common keys: id, email, provider, kind (mailbox\|app), status (active\|onboarding\|pending\|not_syncing\|reauthorization_required\|inactive\|connected), is_syncing, reauthorization_required, mailbox_verdict, last_check_error.
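
The status meanings given in the description lend themselves to a small lookup when diagnosing "my email isn't showing up". The sketch below assumes the result is a plain dict with the fields from the table; `STATUS_HINTS` and `diagnose` are hypothetical names, and only the three statuses the description explains are mapped.

```python
from typing import Any

# Explanations taken from the tool description; other statuses are treated as healthy.
STATUS_HINTS = {
    "not_syncing": "connected but never activated",
    "reauthorization_required": "needs reconnecting",
    "inactive": "paused",
}


def diagnose(result: dict[str, Any]) -> list[str]:
    """List accounts whose status explains a connection or sync problem."""
    problems = []
    for account in result.get("accounts") or []:
        hint = STATUS_HINTS.get(account.get("status"))
        if hint is None:
            continue
        label = account.get("email") or account.get("provider") or account.get("id")
        line = f"{label}: {hint}"
        if account.get("last_check_error"):
            line += f" ({account['last_check_error']})"
        problems.append(line)
    return problems
```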

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly/idempotent. Description adds useful behavioral info: statuses, mailbox_verdict, last_check_error, and kind='app' for messaging apps. Does not mention any performance implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five concise, front-loaded sentences. Uses inline enumeration for statuses. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main purpose, status meanings, additional fields. Might be improved by mentioning if pagination applies or if only connected accounts are returned, but overall complete for a list tool with output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No input parameters; schema coverage 100%. Description explains output fields meaningfully, which indirectly aids understanding of what the tool does. Baseline 3, but extra detail on statuses elevates it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists connected email accounts with health and onboarding state. Uses specific verb and resource, distinguishes from siblings like get_account_overview and check_email_sync.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use for diagnosing connection issues ('I can't connect' / 'my email isn't showing up' problems). Lacks explicit 'when not to use' but context makes it clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_email_lists - List email lists - A

Read-only, Idempotent

The user's custom email lists (smart folders): name, rules, unread and total counts. Browse a list's emails with get_feed(list_id=...); file/unfile specific emails with add_email_to_list / remove_email_from_list.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Name	Required	Description
`lists`	No	Custom email lists (smart folders). Common keys: list_id, name, match_rules, exclude_from_cleanbox, unread_count, total_count.
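
The hand-off from list_email_lists to get_feed(list_id=...) could be sketched as below: pick the lists that have unread mail and prepare one get_feed call per list. The helper name `unread_list_feeds` and the default `limit` of 20 are illustrative assumptions; the field names come from the table above.

```python
from typing import Any


def unread_list_feeds(result: dict[str, Any], limit: int = 20) -> list[dict[str, Any]]:
    """Build get_feed arguments for every custom list that has unread emails."""
    calls = []
    for email_list in result.get("lists") or []:
        if email_list.get("unread_count", 0) > 0:
            calls.append({"list_id": email_list["list_id"], "limit": limit})
    return calls
```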

Tool Definition Quality

A 4.5/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds behavioral context by revealing the specific fields returned (name, rules, unread/total counts). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, each serving a distinct purpose: first defines the tool's result, second provides usage context. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and an existing output schema, the description is complete. It names the output fields and links to related tools, providing sufficient context for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters (100% coverage). The description adds meaning by stating the output includes name, rules, unread and total counts, which compensates for the empty schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists the user's custom email lists (smart folders) and specifies the output fields: name, rules, unread and total counts. This differentiates it from sibling tools like get_feed, add_email_to_list, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides guidance on related actions: browsing a list's emails with get_feed and filing/unfiling emails with add_email_to_list/remove_email_from_list. However, it doesn't explicitly state when to use this tool versus alternatives like list_drafts or deep_search_emails.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_events - List calendar events - C

Read-only, Idempotent

List the user's calendar events extracted from their email (meetings, bookings, appointments). Dates are YYYY-MM-DD; defaults to upcoming.

Parameters

Name	Required	Description	Default
`limit`	No
`search`	No
`end_date`	No
`timezone`	No
`start_date`	No

Output Schema

Name	Required	Description
`count`	No
`events`	No	Calendar events from email. Common keys: id, title, event_start, event_end, location, participants, email_id.
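
Since the description only says that dates are YYYY-MM-DD, a caller might build the date arguments from `datetime.date` values to avoid format mistakes. This is a sketch under assumptions: `events_arguments` is a hypothetical helper, and whether end_date is inclusive is not stated in the listing.

```python
from datetime import date
from typing import Any, Optional


def events_arguments(
    start: date, end: date, search: Optional[str] = None
) -> dict[str, Any]:
    """Build list_events arguments with dates in YYYY-MM-DD form."""
    if end < start:
        raise ValueError("end must not be before start")
    # date.isoformat() always yields YYYY-MM-DD.
    args: dict[str, Any] = {"start_date": start.isoformat(), "end_date": end.isoformat()}
    if search:
        args["search"] = search
    return args
```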

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, destructiveHint; description adds that events are extracted from email and defaults to upcoming. No additional behavioral traits (e.g., pagination, data freshness) are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short, front-loaded sentences with no fluff. Could be slightly more structured but is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 5 parameters and 0% schema description coverage, the description is too minimal. Missing details on pagination, sorting, filter semantics, though output schema covers return values. Incomplete for complex usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description only mentions date format and default behavior but does not explain any parameter meanings or usage, leaving agents without sufficient guidance for parameters like search, limit, timezone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists calendar events from email, specifies date format (YYYY-MM-DD) and default behavior (upcoming). However, it does not explicitly differentiate from sibling tools like list_tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings; no exclusions, prerequisites, or context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_invoices: List invoices and payments (grade A)

Read-only, Idempotent

List invoices, receipts, bills and payments extracted from the user's email. Backed by the SAME schema-aware finance query Poly uses, so results are COMPLETE across every vendor (not just the first one).

invoice_or_payment: the RECORD TYPE to filter by — e.g. "receipt", "invoice", "payment", "refund", "statement", "order". Matched fuzzily, so "receipt" also catches "payment receipt" / "order". Pass the TYPE here, NOT a vendor name. Omit for all types.
search: a VENDOR / payee / description / category substring (a shop or supplier name). Omit to list across all vendors.
time_range examples: this_month, last_month, this_year, last_year, or last_30_days; or pass start_date / end_date as YYYY-MM-DD.

Returns the most recent limit records, the total matched, a per-vendor vendor_breakdown and grand/paid/outstanding totals — all computed over the FULL matching set, not just the returned page. To answer 'which vendors' or 'any receipts other than X', read vendor_breakdown instead of paging.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`search`	No
`end_date`	No
`timezone`	No
`start_date`	No
`time_range`	No
`invoice_or_payment`	No
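
The split between `invoice_or_payment` (record type) and `search` (vendor) is the part of this tool that is easiest to get wrong, so it can help to name the two roles explicitly on the client side. This is a minimal Python sketch under stated assumptions: `build_list_invoices_args` is a hypothetical helper, the vendor name "Acme" is illustrative, and treating `time_range` and explicit dates as mutually exclusive is an assumption (the description presents them as alternatives but does not say what happens if both are sent).

```python
from typing import Any, Dict, Optional


def build_list_invoices_args(
    record_type: Optional[str] = None,
    vendor: Optional[str] = None,
    time_range: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Build arguments for list_invoices (hypothetical helper)."""
    # Assumption: a named range and explicit dates should not be combined.
    if time_range and (start_date or end_date):
        raise ValueError("pass either time_range or start_date/end_date, not both")

    args: Dict[str, Any] = {}
    if record_type:
        # The record TYPE ("receipt", "invoice", "refund", ...), never a vendor.
        args["invoice_or_payment"] = record_type
    if vendor:
        # Vendor / payee / description / category substring.
        args["search"] = vendor
    if time_range:
        args["time_range"] = time_range
    if start_date:
        args["start_date"] = start_date  # YYYY-MM-DD
    if end_date:
        args["end_date"] = end_date  # YYYY-MM-DD
    if limit is not None:
        args["limit"] = limit
    return args


# "Receipts from Acme last month": type and vendor go in different fields.
acme_receipts = build_list_invoices_args(
    record_type="receipt", vendor="Acme", time_range="last_month"
)
```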

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	No	Records returned in this page.
`summary`	No	Totals over the FULL matching set (every vendor, not just the returned page): grand_total, total_paid, total_outstanding, record_count, vendor_count, capped.
`invoices`	No	Invoice/payment records. Common keys: email_id, due_date, amount, amount_paid, payee, category, invoice_or_payment.
`total_matching`	No	Total matching records (across all vendors). If `summary.capped` is true there may be even more.
`vendor_breakdown`	No	Per-vendor rollup over the full matching set, biggest first: [{vendor, count, total}]. Use this to answer 'which vendors' / 'anyone other than X' without paging every record.
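
To illustrate the advice to read `vendor_breakdown` instead of paging, here is a short Python sketch that answers "any vendors other than X" from a single response. The helper names and the sample payload are hypothetical; only the key names (`vendor_breakdown`, `vendor`, `count`, `total`, `summary.capped`) come from the output schema above, and the case-insensitive exact match on the vendor name is an assumption of this sketch.

```python
from typing import Any, Dict, List


def vendors_other_than(result: Dict[str, Any], vendor: str) -> List[str]:
    """Return vendor names from vendor_breakdown, excluding `vendor`."""
    target = vendor.casefold()
    return [
        row["vendor"]
        for row in result.get("vendor_breakdown", [])
        if row["vendor"].casefold() != target
    ]


def may_have_more(result: Dict[str, Any]) -> bool:
    """True when summary.capped signals that even more records may exist."""
    return bool(result.get("summary", {}).get("capped", False))


# Illustrative payload only; the values are made up for this example.
sample = {
    "count": 1,
    "total_matching": 3,
    "summary": {"capped": False},
    "vendor_breakdown": [
        {"vendor": "Acme", "count": 2, "total": 120.0},
        {"vendor": "Globex", "count": 1, "total": 40.0},
    ],
}

others = vendors_other_than(sample, "acme")
```

Because the breakdown is computed over the full matching set, this works even when `count` is much smaller than `total_matching`.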

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, destructiveHint=false, and idempotentHint=true. The description adds valuable behavioral context: returns most recent records, totals computed over full set, and how to use vendor_breakdown. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat lengthy but well-structured with bullet points and front-loaded purpose. Each sentence adds value, though minor redundancy could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no required ones, and an output schema, the description fully covers usage, return values (limit, total, vendor_breakdown, totals), and how to interpret results. Complete for a listing tool with such complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by detailing nearly every parameter: invoice_or_payment (fuzzy matching, examples), search (vendor substring match), time_range (examples), start_date/end_date (format), and limit (the number of most recent records returned). Only timezone is left unexplained. Adds significant meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists invoices, receipts, bills, and payments from the user's email, distinguishing it from sibling tools like search_emails or list_drafts. It uses specific verbs and resources and emphasizes completeness across vendors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly explains when to use the tool: to query financial records from email. It provides parameter guidance (e.g., invoice_or_payment filters by record type, not vendor name) and gives examples. It implicitly contrasts with other tools by highlighting its schema-aware, complete results.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_task_rules: List task-suppression rules (grade A)

Read-only, Idempotent

List the user's task-suppression rules (rules that hide tasks from the task manager by sender, domain or subject). Returns rule ids usable with delete_task_rule.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	No
`rules`	No	Task-suppression rules; ids usable with delete_task_rule.

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. Description adds that it returns rule IDs and explains what suppression rules are, but does not detail additional behavioral aspects like user-scoping or result ordering. Adequate but not extensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, highly efficient, no unnecessary words. All information is front-loaded and essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With zero parameters, informative annotations, and the presence of an output schema, the description is fully adequate. It explains the tool's output and ties to a sibling tool, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so baseline 4 applies. Description does not need to add parameter info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists task-suppression rules, defines what they are (hide tasks by sender/domain/subject), and distinguishes from siblings like delete_task_rule and create_task_rule.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly mentions that returned IDs are usable with delete_task_rule, providing a clear use case. No explicit when-not guidance, but the tool's purpose is self-evident given the sibling set.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_tasks: List tasks (grade C)

Read-only, Idempotent

ParametersJSON Schema

Name	Required	Default
`limit`	No
`search`	No
`status`	No
`timezone`	No
`task_type`	No	all
`timeframe`	No	relevant
`include_completed`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	No
`tasks`	No	Task objects aggregated from manual tasks, email actions, invoices, events, shipments and replies. Common keys: id/task_id, title, task_type, due_date, status, priority, email_id.
`timeframe`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the tool's safe behavior is clear. The description adds that it aggregates tasks from various sources but does not specify pagination, ordering, or other behavioral details. With annotations, a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loads the main purpose. It presents the parameter lists inline without excess words. However, it could be more structured, e.g., using bullet points for parameter options.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters with no schema descriptions and an output schema, the description is incomplete. It covers only two parameters and does not explain filtering, search, or limit behavior. The tool is complex, and the description leaves significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explicitly defines only two parameters (timeframe and task_type) with their allowed values, but ignores the other five parameters (limit, search, status, timezone, include_completed). With schema description coverage at 0%, the description does not sufficiently compensate for the missing parameter meanings.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists the user's tasks aggregated from multiple sources, equivalent to the app's My Day. It is specific about the data included but does not differentiate from the sibling tool 'get_my_day', which likely serves a similar purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool over alternatives like 'get_my_day' or other list tools. Does not specify prerequisites or context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mark_email_read: Mark email read or unread (grade A)

Idempotent

Mark an email as read (or unread with read=false).

ParametersJSON Schema

Name	Required	Description	Default
`read`	No
`email_id`	Yes
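
As a small illustration of the `read` flag, the Python sketch below builds the arguments for both directions. `build_mark_email_read_args` is a hypothetical helper, the string type and the sample value of `email_id` are assumptions, and `read` is always sent explicitly so the call does not depend on an undocumented schema default.

```python
from typing import Any, Dict


def build_mark_email_read_args(email_id: str, read: bool = True) -> Dict[str, Any]:
    """Build arguments for mark_email_read (hypothetical helper)."""
    if not email_id:
        raise ValueError("email_id is required")
    # Send `read` explicitly rather than relying on a server-side default.
    return {"email_id": email_id, "read": read}


mark_read = build_mark_email_read_args("example-email-id")
mark_unread = build_mark_email_read_args("example-email-id", read=False)
```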

Output Schema

ParametersJSON Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotentHint=true and destructiveHint=false, informing the agent that this mutation is safe to repeat and not destructive. The description adds the default behavior (read=true) but nothing beyond that, so the added value is modest.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence that is concise and front-loaded. Every word serves a purpose, and there is no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple mutation with two parameters and an output schema, the description covers the core functionality. It lacks mention of side effects on other views or confirmation of idempotency, but overall it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains the read parameter's effect (read or unread) and its default value (true), but the required email_id parameter is not described, leaving a small gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool marks an email as read or unread, with the verb 'mark' and resource 'email'. The phrase 'or unread with read=false' adds specificity and distinguishes it from sibling tools like search_emails or hide_emails.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool versus alternatives like hide_emails or complete_task. Usage is implied by the name and function, but no when-not or context is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

next_nudge: Show the next nudge (grade A)

The user asked "what else / show me the next one" on a nudge: rest the skipped item briefly (it can return later today) and surface the next eligible priority. This is a user-initiated pull, so it bypasses the stream's quiet window and daily cap — only call it when the user explicitly asks to see the next item. Returns the next nudge (with a pre-written suggested_reply when it's a reply), or none_eligible when the queue is empty.

ParametersJSON Schema

Name	Required	Description	Default
`skip_ref_id`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide minimal info (readOnlyHint=false, destructiveHint=false). Description adds important behavioral details: rests skipped item, bypasses caps, returns next nudge with suggested_reply or none_eligible. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the trigger phrase. It is concise and efficient, though could be slightly more structured with parameter description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of output schema, the description covers usage context and return behavior adequately. It explains the bypass behavior and return values, but could explicitly mention the parameter role for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has one required parameter 'skip_ref_id' with 0% description coverage. The description implies it's the current nudge to skip by saying 'rest the skipped item,' but does not explicitly describe the parameter meaning, leaving it inferred.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool shows the next nudge when the user asks for it, including the behavior of resting the skipped item and surfacing the next eligible priority. It distinguishes from siblings like dismiss_nudge by noting it is a user-initiated pull that bypasses quiet window and daily cap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'only call it when the user explicitly asks to see the next item,' providing clear when-to-use guidance. It mentions it bypasses quiet window and daily cap, but does not explicitly list alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pause_email_account: Pause a connected account (grade A)

Idempotent

Pause (turn off) a connected account: stop downloading new mail while KEEPING everything already brought in. account is the connected email address. Reversible with resume_email_account. Confirm with the user before pausing — it stops their email from updating.

ParametersJSON Schema

Name	Required	Description	Default
`account`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotent and non-destructive. Description adds that it stops email from updating and keeps existing data, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, each serving a purpose: core action, parameter definition, reversibility, and a confirmation note. Front-loaded and no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, parameter, side effects, reversibility, and user confirmation required. Output schema exists, so return values need not be described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description compensates by explaining 'account is the connected email address', giving meaning to the sole parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Pause (turn off) a connected account') and the effect ('stop downloading new mail while KEEPING everything already brought in'). It distinguishes from sibling tools like resume_email_account and start_email_account_sync.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states reversibility via resume_email_account and instructs to confirm with user before pausing. Provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_email_from_list: Remove an email from a list (grade A)

Idempotent

Remove an email from one of the user's email lists. Note: if the email matches the list's rules it may be re-added when the rules are re-evaluated — edit the list's rules in the app for a permanent exclusion.

ParametersJSON Schema

Name	Required	Description	Default
`list_id`	Yes
`email_id`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses important behavioral details (non-permanent removal due to rule re-evaluation) that go beyond the annotations (which mark idempotentHint=true but no destructive hint), adding significant context for the agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first states the core function, the second adds a critical caveat. No redundant words, perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, its non-permanent nature, and suggests an alternative workflow. With an output schema present, return values need not be described. Complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

While schema coverage is 0%, the parameters list_id and email_id are inherently clear from the tool name and description ('Remove an email from one of the user's email lists'), but the description does not explicitly define them, leaving slight ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove an email') and the target resource ('from one of the user's email lists'), distinguishing it from siblings like 'add_email_to_list' or 'create_email_list'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly warns that removal may be temporary if rules re-add the email, and advises to edit list rules for permanent exclusion, providing clear guidance on when to use this tool versus alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resolve_priority: Resolve a priority item (grade A)

Apply the user's decision to a priorities item (a ref_id from get_priorities or a poly_nudge). action: 'mark_paid' (bill paid), 'mark_partial_paid' (needs paid_amount), 'snooze' (remind later — snooze_hours or snooze_days, default 14 days), 'complete' / 'handled' (done, e.g. a reply they say is dealt with), 'acknowledge', or 'not_relevant' ("never show me this again" — permanently suppresses the item from nudges, briefings and future reports; reversible only by un-hiding the task in the Mailopoly app, so confirm intent before using it). Pass the item's source object back VERBATIM and its headline as title. The change lands in the same task/invoice rows the app uses, so My Day, Tasks and future briefings all reflect it immediately. Item options that are views rather than dispositions map to other tools ('open_reply' is NOT a valid action here and will be rejected): 'view_email' -> get_email(source.email_id), 'open_reply' -> send_email with reply_to_email_id (offer the item's suggested_reply first), 'set_reminder' -> create_task with a reminder_date.

ParametersJSON Schema

Name	Required	Description	Default
`title`	No
`action`	Yes
`ref_id`	Yes
`source`	No
`paid_amount`	No
`snooze_days`	No
`snooze_hours`	No
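
The action rules in the description above can be sketched as a client-side validator. This is a minimal Python sketch, not the server's logic: `build_resolve_priority_args` and the `user_confirmed` flag are hypothetical, rejecting all three view options locally is an assumption (the description only states that 'open_reply' is rejected), and refusing both snooze fields at once is also an assumption.

```python
from typing import Any, Dict, Optional

# Dispositions listed in the tool description.
DISPOSITIONS = {
    "mark_paid", "mark_partial_paid", "snooze",
    "complete", "handled", "acknowledge", "not_relevant",
}
# View options that map to other tools instead of this one.
VIEW_OPTION_TOOLS = {
    "view_email": "get_email",
    "open_reply": "send_email",
    "set_reminder": "create_task",
}


def build_resolve_priority_args(
    ref_id: str,
    action: str,
    *,
    source: Optional[Dict[str, Any]] = None,
    title: Optional[str] = None,
    paid_amount: Optional[float] = None,
    snooze_hours: Optional[int] = None,
    snooze_days: Optional[int] = None,
    user_confirmed: bool = False,
) -> Dict[str, Any]:
    """Validate and build arguments for resolve_priority (hypothetical helper)."""
    if action in VIEW_OPTION_TOOLS:
        raise ValueError(
            f"{action!r} is a view option; call {VIEW_OPTION_TOOLS[action]} instead"
        )
    if action not in DISPOSITIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "mark_partial_paid" and paid_amount is None:
        raise ValueError("mark_partial_paid needs paid_amount")
    # Assumption: model the "confirm intent" advice as an explicit flag.
    if action == "not_relevant" and not user_confirmed:
        raise ValueError("not_relevant is permanent; confirm with the user first")
    # Assumption: sending both snooze fields is ambiguous, so refuse it.
    if snooze_hours is not None and snooze_days is not None:
        raise ValueError("pass snooze_hours or snooze_days, not both")

    args: Dict[str, Any] = {"ref_id": ref_id, "action": action}
    if source is not None:
        args["source"] = source  # passed back verbatim, never rebuilt
    if title is not None:
        args["title"] = title
    if action == "mark_partial_paid":
        args["paid_amount"] = paid_amount
    if action == "snooze":
        if snooze_hours is not None:
            args["snooze_hours"] = snooze_hours
        elif snooze_days is not None:
            args["snooze_days"] = snooze_days
        # Neither given: rely on the documented 14-day default.
    return args


snooze_args = build_resolve_priority_args("example-ref", "snooze", snooze_days=3)
```

Failing locally on a view option returns the name of the tool to call instead, which saves a round trip that the server would reject anyway.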

Output Schema

ParametersJSON Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are present (non-readonly, non-idempotent, non-destructive). The description adds important behavioral context: 'not_relevant' permanently suppresses the item, changes reflect immediately, and source must be passed verbatim. No annotation contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with dense but relevant information. It is front-loaded with the purpose. Every sentence adds value, though breaking into bullet points for actions could improve readability. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, 2 required), the description covers actions, parameter usage, and edge cases like reversibility and mapping to sibling tools. Output schema exists, so no need to describe return values. Minor gap: not explaining what happens if source is omitted, but it's optional.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates well by explaining actions and their required parameters (e.g., paid_amount for mark_partial_paid, snooze_hours/days for snooze). It also clarifies the source object usage. Some parameters like title and ref_id are not explained, but their meaning is clear from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the core purpose: 'Apply the user's decision to a priorities item.' It specifies the resource (ref_id from get_priorities or poly_nudge) and differentiates from siblings by mapping view actions to other tools, such as 'open_reply' to send_email.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists valid actions and gives guidance on when not to use this tool, e.g., 'open_reply' is NOT a valid action and maps to send_email. It also warns about the permanence of 'not_relevant' and provides alternatives for view actions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`resume_email_account`: Resume a paused account (A)

Idempotent

Resume a previously paused account so it starts syncing again. account is the connected email address. After resuming you may need start_email_account_sync if it still shows as not syncing.
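The resume-then-sync follow-up described above can be sketched as a small planner. This is a minimal illustration, not Mailopoly's client code: `plan_resume_calls` and the `still_not_syncing` flag are hypothetical, how an agent learns that the account still shows as not syncing is left to the caller, and the assumption that `start_email_account_sync` also takes `account` is not confirmed by this listing.

```python
def plan_resume_calls(account, still_not_syncing=False):
    """Return the tool calls to make, in order, as (tool_name, arguments) pairs."""
    # The description says `account` is the connected email address.
    if "@" not in account:
        raise ValueError("account should be the connected email address")

    calls = [("resume_email_account", {"account": account})]

    # Follow-up only when the account still shows as not syncing after resuming.
    # ASSUMPTION: start_email_account_sync accepts the same `account` argument.
    if still_not_syncing:
        calls.append(("start_email_account_sync", {"account": account}))
    return calls
```

Because the tool is annotated as idempotent, repeating the first call for an account that is already resumed should be harmless.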

Parameters

Name	Required	Description	Default
`account`	Yes

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotency; description adds that resuming may not immediately start syncing, advising a separate tool if needed. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences efficiently convey purpose, parameter, and usage guidance without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and existing output schema, description fully covers usage, parameter, and follow-up steps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description defines the sole parameter 'account' as 'the connected email address', compensating for 0% schema description coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool resumes a paused account to start syncing, distinguishing it from siblings like pause_email_account and start_email_account_sync.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates when to use the tool (resuming a paused account) and provides follow-up guidance (start_email_account_sync if the account is still not syncing), but it gives no explicit guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`save_draft`: Save an email draft (A)

Create an email draft (or update an existing one by draft_id). The draft appears in the user's Mailopoly drafts and can later be sent with send_email (requires the 'send' scope). to/cc/bcc are email addresses, comma-separated for multiple recipients. For a reply, pass reply_to_email_id (the original's email_id) — to is then optional and the reply is routed automatically when sent, including replies to connected-app messages (Slack etc.), which have no email address. body is plain text or HTML.
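The parameter rules above (comma-separated recipients, `to` optional for replies, `draft_id` to update) can be sketched as an argument builder. This is a minimal illustration under stated assumptions: `build_save_draft_args` is a hypothetical helper, and accepting Python lists for recipients is a convenience of the sketch, not part of the tool.

```python
def build_save_draft_args(subject, body, to=None, cc=None, bcc=None,
                          draft_id=None, reply_to_email_id=None):
    """Build the arguments object for a save_draft call."""

    def join(addresses):
        # to/cc/bcc are email addresses, comma-separated for multiple recipients.
        if addresses is None or isinstance(addresses, str):
            return addresses
        return ",".join(address.strip() for address in addresses)

    # subject and body are the two required parameters.
    args = {"subject": subject, "body": body}
    optional = {
        "to": join(to),
        "cc": join(cc),
        "bcc": join(bcc),
        "draft_id": draft_id,                    # set to update an existing draft
        "reply_to_email_id": reply_to_email_id,  # set for a reply; `to` may be omitted
    }
    # Leave out anything that was not supplied.
    args.update({key: value for key, value in optional.items() if value})
    return args
```

For a reply to a connected-app message, only `reply_to_email_id` is passed and `to` is left out, since such senders have no email address.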

Parameters

Name	Required	Description	Default
`cc`	No
`to`	No
`bcc`	No
`body`	Yes
`subject`	Yes
`draft_id`	No
`reply_to_email_id`	No

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false, which the description supports by stating it creates/updates. The description adds behavioral context: draft appears in Mailopoly drafts, requires 'send' scope for later sending, and automatic reply routing for connected-app messages. This goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

At five sentences, the description is moderately sized and front-loads the main purpose. Each sentence adds value: purpose, where the draft appears and the 'send' scope, recipient format, reply behavior, and body format. Nothing is wasted, though the wording could be slightly tighter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters and complexity (reply routing, connected-app messages), the description covers the core workflow well. It mentions scope requirement for sending, but does not detail output schema behavior. With output schema existing, this is acceptable; completeness is high but not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description explicitly covers six of the 7 parameters: to/cc/bcc as comma-separated emails, reply_to_email_id for replies (making 'to' optional), draft_id for updates, and body as plain text or HTML. Only subject is left unexplained, and its meaning is self-evident. This adds significant meaning beyond the schema's types and titles.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Create an email draft (or update an existing one by draft_id)', providing a specific verb+resource. It distinguishes from sibling tool 'send_email' by noting the draft can be sent later, and from 'list_drafts' by its creation/update function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool for new drafts or updates, and how to use it when replying (via reply_to_email_id). It mentions the sibling 'send_email' for sending, which implies when not to use this tool. However, it lacks explicit exclusions or comparisons with other siblings such as 'hide_emails'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`search_emails`: Search emails (A)

Read-only, Idempotent

Search the user's emails by free text (subjects, senders, bodies, attachment text). Natural language works, and so do Boolean OR-lists, quoted "exact phrases" and Gmail-style operators (from:, to:, subject:, newer_than:3d, older_than:, has:attachment) — pass the query however it reads most naturally. Optional filters: sender (name or address fragment), start_date/end_date (YYYY-MM-DD), last_x_days, email_type ('received' or 'sent'), and folder — 'cleanbox' (genuine personal correspondence), 'other' (promotional/newsletters/ads), or 'all'. BY DEFAULT search covers EVERYTHING unfiltered; only pass folder to restrict it. personal_or_ad is a deprecated alias for folder ('personal'=cleanbox, 'advertising'=other). Returns matching emails with ids for use in get_email / get_action_links. PAGINATION: results are a page — total_count is how many matches exist in all; to get more, re-call with offset= (limit caps at 50 per page). Covers Mailopoly's indexed history only — for older mail beyond the indexed window (the response's indexed_history_start), use deep_search_emails.
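The pagination contract above (a page of results, `total_count` for the full match count, `offset` to fetch more, `limit` capped at 50) can be sketched as a generator. This is a minimal illustration: `iter_search_results` is a hypothetical helper, and `call_tool` stands in for whatever function your MCP client uses to invoke a tool and return its structured result as a dict.

```python
MAX_PAGE_SIZE = 50  # the description says limit caps at 50 per page


def iter_search_results(call_tool, query, page_size=MAX_PAGE_SIZE, **filters):
    """Yield every matching email summary, fetching one page at a time.

    call_tool(name, arguments) is a caller-supplied function (an assumption of
    this sketch) that returns the tool's result as a dict.
    """
    limit = min(page_size, MAX_PAGE_SIZE)
    offset = 0
    while True:
        page = call_tool(
            "search_emails",
            {"query": query, "limit": limit, "offset": offset, **filters},
        )
        results = page.get("results") or []
        yield from results
        offset += len(results)
        # Stop on an empty page (guards against looping forever) or once
        # every match counted by total_count has been returned.
        if not results or offset >= page.get("total_count", 0):
            break
```

Optional filters such as `folder="cleanbox"` or `last_x_days=7` pass straight through as keyword arguments. Mail older than the response's `indexed_history_start` is out of reach of this loop and needs deep_search_emails.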

Parameters

Name	Required	Description	Default
`limit`	No
`query`	Yes
`folder`	No
`offset`	No
`sender`	No
`end_date`	No
`timezone`	No
`email_type`	No
`start_date`	No
`last_x_days`	No
`personal_or_ad`	No

Output Schema

Parameters

Name	Required	Description
`offset`	No	Offset of this page.
`results`	No	Email summary objects. Common keys: email_id, subject, sender_name, sender_email, date, snippet, folder ('cleanbox' = genuine personal mail, 'other' = promotional), categories, source. Pass email_id to get_email / get_action_links.
`total_count`	No	Total matches available (may exceed the returned page).
`indexed_history_start`	No	Oldest locally-indexed date (YYYY-MM-DD). Older mail needs deep_search_emails. Present when results are sparse or a start_date predates the index.

Tool Definition Quality

A (4.9/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare read-only, idempotent, non-destructive. Description adds context: default unfiltered, pagination behavior (limit/offset), indexed history limitation, deprecated alias. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections for main search, filters, pagination, and scope. Slightly long but every sentence adds value. Minor redundancy in explaining pagination could be tightened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters, an existing output schema, and annotations, the description is comprehensive. It covers query syntax, filters, pagination, and the indexed-history limit, and it points to related tools. It provides all the context needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description compensates with a detailed explanation of query syntax, folder options, sender/date/last_x_days/email_type, limit/offset pagination, and the deprecated personal_or_ad alias. It explains the meaning and usage of nearly every parameter; only timezone goes unmentioned.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches emails by free text, lists supported syntax (natural language, Boolean, Gmail operators), and distinguishes from deep_search_emails for older mail. The verb 'search' and resource 'emails' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly describes default behavior (unfiltered, no folder), mentions when to use deep_search_emails for older mail, explains pagination with offset and limit, and notes deprecated alias. Clear guidance on when and how to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`send_email`: Send an email or app reply (A)

Destructive

Send an email as the user through their connected account (Gmail, Outlook, @mly.life or IMAP), or reply to any message by passing reply_to_email_id (an email_id from search/feed/get_email).

Replies: with reply_to_email_id set, to/subject are optional — the recipient and threading come from the original. If the original is a message from a connected app (source != 'email', e.g. Slack), the reply is delivered back through that app as the user (same thread/DM) — do NOT pass an email address for app messages; their senders have none.

If draft_id is given, the draft's content fills any field not explicitly provided. from_account selects which connected address to send from (defaults to the account the original arrived on for replies, else the primary). body: plain text or HTML — plain text is formatted into clean paragraphs automatically (blank line = new paragraph), so just write naturally; pass HTML yourself only when you want rich formatting (links, bold, lists). Do NOT pass markdown — it is not rendered. content_type: 'HTML' (default) or 'TEXT' (send as-is, no formatting). Subscription limits apply.
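The reply and formatting rules above can be sketched as two argument builders. This is a minimal illustration, not Mailopoly's client code: both helper names and the `send_as_is` flag are hypothetical, and `original` is assumed to be an email summary dict carrying an `email_id`, as returned by search_emails.

```python
def build_reply_args(original, body, send_as_is=False):
    """Arguments for replying to any message, including connected-app messages."""
    # Recipient and threading come from the original, so `to` and `subject`
    # are never set here. That also satisfies the rule that app messages
    # (source != 'email', e.g. Slack) must not be given an email address.
    args = {"reply_to_email_id": original["email_id"], "body": body}
    if send_as_is:
        # 'TEXT' sends the body unformatted; the documented default is 'HTML',
        # under which plain text is formatted into paragraphs automatically.
        args["content_type"] = "TEXT"
    return args


def build_new_email_args(to, subject, body, from_account=None):
    """Arguments for a fresh (non-reply) email."""
    args = {"to": to, "subject": subject, "body": body}
    if from_account:
        # Otherwise the server picks the primary connected address.
        args["from_account"] = from_account
    return args
```

In both cases `body` should be plain text or HTML, never markdown, which the description says is not rendered. Since the tool is annotated as destructive, an agent would normally confirm with the user before sending.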

Parameters

Name	Required	Default
`cc`	No
`to`	No
`bcc`	No
`body`	No
`subject`	No
`draft_id`	No
`content_type`	No	HTML
`from_account`	No
`reply_to_email_id`	No

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate write/destructive potential and open-world behavior. The description adds value by detailing reply threading, draft field filling, body formatting (plain text is formatted into paragraphs, HTML is for rich formatting, markdown is not rendered), and account selection logic. This goes beyond what annotations provide, though apart from a note that subscription limits apply, rate limits and permission requirements are not mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose, then organized into logical paragraphs for replies, drafts, body formatting, and content_type. It is slightly lengthy but every sentence adds value. The structure is clear, though it could be more concise without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, side effects, formatting nuances) and the existence of an output schema for return values, the description covers essential behaviors: sending, replying, drafts, body types, account selection, and subscription limits. It does not cover error handling, character limits, or attachments, but overall it is sufficiently complete for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% for 9 parameters. The description explains key parameters: reply_to_email_id, draft_id, from_account, body, and content_type. It clarifies that to/subject are optional in replies and that from_account defaults sensibly. While cc, to, bcc are not elaborated, they are standard. The description compensates well for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it sends an email or app reply, distinguishing it from siblings like 'save_draft' or 'get_draft'. It specifies the resource (email/app reply), the action (send/reply), and the context (via connected accounts or app replies). This is a specific verb+resource combination with clear differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: when to use reply_to_email_id, optional to/subject for replies, draft usage, and warnings like 'Do NOT pass an email address for app messages' and 'Do NOT pass markdown'. It also notes subscription limits. While it doesn't explicitly compare to all siblings, the context is clear for a tool with many email-related siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`set_daily_briefing`: Turn the daily briefing on or off (A)

Switch the user's once-a-day My Day briefing on or off. The briefing arrives on their first interaction of the day (as a daily_briefing field) and, once launched, as Mailopoly's daily email — this ONE setting governs both, so "stop the daily briefing" here also stops the email. Only call when the user explicitly asks to stop (or restart) the daily briefing; for quieting a single nudge use dismiss_nudge instead.

Parameters

Name	Required	Description	Default
`enabled`	Yes

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutability (readOnlyHint false) and non-destructiveness (destructiveHint false). Description adds that it controls both briefing and email, which is valuable behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with the main action, no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple boolean parameter and no nested objects, description fully covers behavior, usage context, and alternative. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter 'enabled' with 0% schema description coverage. Description explains its meaning ('on or off') and effect, compensating well for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool toggles the daily briefing on/off, specifies it affects both the My Day briefing and daily email, and distinguishes from sibling tool dismiss_nudge.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Only call when the user explicitly asks to stop (or restart) the daily briefing' and provides an alternative (dismiss_nudge).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`set_language`: Set the user's language (A)

Idempotent

Set the user's preferred language for the emails, reports and app content Mailopoly sends them — a BCP-47 code like 'es', 'fr', 'de', 'pt', 'ja', 'ar', 'zh', 'ru'. Call this ONCE, early, as soon as you can tell from the conversation which language the user actually speaks/writes (e.g. they message you in German → call set_language('de'); their first request is in Arabic → set_language('ar')). This does NOT change how you reply in this chat — it makes their Mailopoly emails and website render natively, which they otherwise can't tell you because they never see English here. If the user clearly uses English, just skip it (English is the default). Takes effect immediately and never overrides a language the user picked themselves in Mailopoly's settings.
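The "call once, skip for English" rule above can be sketched as a small decision helper. This is a minimal illustration: `language_to_set` is a hypothetical name, and reducing a tag such as `de-DE` to its primary subtag is an assumption based on the two-letter examples in the description; whether region or script subtags are accepted is not stated.

```python
def language_to_set(detected_tag):
    """Return the set_language arguments, or None when the call should be skipped."""
    # Normalise 'de-DE' / 'de_DE' to the primary subtag 'de'.
    # ASSUMPTION: the tool is shown only with primary subtags ('es', 'ja', ...).
    primary = detected_tag.replace("_", "-").split("-")[0].strip().lower()

    # English is the default, so the description says to skip the call.
    if not primary or primary == "en":
        return None
    return {"language": primary}
```

The result only affects Mailopoly's emails and website, not the language the agent replies in, and it never overrides a language the user picked in Mailopoly's settings.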

Parameters

Name	Required	Description	Default
`language`	Yes

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.9/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations show idempotent and non-destructive. Description adds context: takes effect immediately, never overrides user-set language in Mailopoly settings, and scopes effect to Mailopoly emails/website only.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with front-loaded purpose and clear usage guidance. Slightly verbose but every sentence adds value; could be trimmed slightly without losing meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple single-parameter tool and presence of output schema, the description covers purpose, usage, behavior, and parameter format completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. The description fully compensates by specifying the BCP-47 format, listing example codes, and explaining when to use each based on user language.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (set) and resource (user's preferred language for emails, reports, and app content), with specific examples of BCP-47 codes. It distinguishes itself from the chat's reply language, which is not affected.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: call once, early, based on user's language; skip if English (default); does not change chat replies. Provides concrete examples like 'message in German → set_language('de')'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`set_timezone`: Set the user's timezone (A)

Idempotent

Update the user's timezone. timezone must be an IANA name like 'America/New_York', 'Europe/London' or 'Asia/Kolkata' — NOT an abbreviation (EST) or a UTC offset. Call this whenever the user says they've moved or are travelling, gives their location/timezone, or tells you the times you're showing are off by a fixed number of hours: the server localizes every timestamp it returns to this zone, so fixing it here corrects all of them. The change takes effect immediately.
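The IANA-name requirement above can be checked on the client before calling the tool. The sketch below is a minimal illustration, assuming Python 3.9+ for the standard `zoneinfo` module; `is_acceptable_timezone` is a hypothetical helper, and its Area/Location shape test is deliberately stricter than IANA itself (it rejects valid single-word names such as `UTC`) so that abbreviations like `EST`, which some time zone databases still list as legacy zones, are refused.

```python
import re
from zoneinfo import available_timezones

# Area/Location shape, e.g. 'America/New_York' or 'America/Argentina/Buenos_Aires'.
_AREA_LOCATION = re.compile(r"[A-Za-z_]+(?:/[A-Za-z0-9_+-]+)+")


def is_acceptable_timezone(name):
    """Return True when `name` looks like a full IANA zone name."""
    # Rejects abbreviations ('EST') and UTC offsets ('+05:30', 'UTC-5').
    if not _AREA_LOCATION.fullmatch(name):
        return False
    known = available_timezones()
    # If the system ships no time zone database, fall back to the shape check.
    return name in known if known else True


def build_set_timezone_args(name):
    """Build the arguments object, refusing names the tool says it will not accept."""
    if not is_acceptable_timezone(name):
        raise ValueError(f"{name!r} is not an IANA name like 'Europe/London'")
    return {"timezone": name}
```

A rejected value is a cue to ask the user for their city or region and map that to a zone name, not to guess from an offset.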

Parameters

Name	Required	Description	Default
`timezone`	Yes

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotentHint true and destructiveHint false. The description adds that the change takes effect immediately, without contradicting any annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, efficiently conveying purpose, parameter format, usage context, and immediate effect. No extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With a single parameter and an existing output schema, the description gives all necessary details: parameter constraints, when to use, and immediate effect. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The timezone parameter is fully explained: it must be an IANA name, not an abbreviation or UTC offset. This compensates for the schema's 0% description coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates the user's timezone, specifies the required format (IANA name), and distinguishes it from sibling tools, none of which deal with timezone settings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists scenarios when to call the tool: user moved, traveling, provided location, or noticed timestamps are off. It also explains the effect on all returned timestamps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`snooze_reconnect_reminder`: Snooze the reconnect reminder (A)

Idempotent

When the user asks NOT to be reminded about a lost mailbox connection ("don't remind me", "stop mentioning this", "I'll deal with it later"), call this to stop the connection_alert appearing for `days` days (their choice; default 7, max 365). `account` limits it to one address (from the alert); omit to snooze every currently-disconnected account. The mail still won't sync while disconnected; this only silences the reminder. `days=0` turns reminders back on ("remind me again"). A NEW account that loses its connection later still alerts. Confirm the snooze back to the user in one short sentence.
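The parameter rules above (default 7 days, maximum 365, 0 to re-enable, optional account) can be sketched as an argument builder. This is a minimal illustration: `build_snooze_reminder_args` is a hypothetical helper, and raising on out-of-range values is a client-side choice of the sketch; how the server treats such values is not stated.

```python
MAX_SNOOZE_DAYS = 365  # documented maximum; the documented default is 7


def build_snooze_reminder_args(days=None, account=None):
    """Build the arguments object for a snooze_reconnect_reminder call."""
    args = {}
    if days is not None:
        # days=0 is valid: it turns reminders back on.
        if not 0 <= days <= MAX_SNOOZE_DAYS:
            raise ValueError(f"days must be between 0 and {MAX_SNOOZE_DAYS}")
        args["days"] = days
    if account:
        # Limit the snooze to one address; omit to cover every
        # currently-disconnected account.
        args["account"] = account
    return args
```

Calling it with no arguments yields an empty object, which leaves the server to apply the 7-day default across all disconnected accounts.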

Parameters

Name	Required	Description	Default
`days`	No
`account`	No

Output Schema

Parameters

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-destructive, idempotent behavior. The description adds behavioral context: it only silences the reminder without fixing sync, and days=0 re-enables reminders. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph with efficient, clear sentences. It front-loads the trigger condition but could be slightly more structured (e.g., bullet points) for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the availability of an output schema, the description covers key behavioral aspects and parameter details. It lacks error handling or prerequisites but is sufficient for a well-scoped tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description fully explains the parameters. It clarifies days default (7), maximum (365), special case (0 turns reminders on), and account scope (limit to one or omit for all).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to suppress the reconnect reminder for a specified number of days. It uses specific verbs ('stop the connection_alert appearing') and details the parameters, distinguishing it from sibling tools like dismiss_nudge or snooze_task.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('When the user asks NOT to be reminded') and provides guidance on parameter usage (omit account for all, days=0 to re-enable). It does not directly compare to alternatives but implicitly sets expectations for behavior.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

`snooze_task`: Snooze a task (A)

Snooze a task. Either pass snooze_until (ISO datetime, in the given timezone — defaults to the user's own timezone) or a relative duration (duration_value + duration_unit, e.g. 3 + 'days', 2 + 'hours', 1 + 'weeks').
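The two ways of specifying the snooze time can be sketched as an argument builder. This is a minimal illustration: `build_snooze_task_args` is a hypothetical helper, and treating the absolute and relative forms as mutually exclusive is a client-side reading of "either ... or"; what the server does when both are sent is not stated. The full set of accepted `duration_unit` values is also not listed, so the sketch does not validate it.

```python
from datetime import datetime


def build_snooze_task_args(task_id, snooze_until=None, duration_value=None,
                           duration_unit=None, timezone=None):
    """Build the arguments object for a snooze_task call."""
    has_absolute = snooze_until is not None
    has_relative = duration_value is not None or duration_unit is not None
    if has_absolute == has_relative:
        raise ValueError("pass either snooze_until or a duration, not both or neither")

    args = {"task_id": task_id}
    if has_absolute:
        # snooze_until is an ISO datetime, interpreted in `timezone`
        # (which defaults to the user's own timezone when omitted).
        if isinstance(snooze_until, datetime):
            snooze_until = snooze_until.isoformat()
        args["snooze_until"] = snooze_until
        if timezone:
            args["timezone"] = timezone
    else:
        if duration_value is None or duration_unit is None:
            raise ValueError("a duration needs both duration_value and duration_unit")
        # e.g. 3 + 'days', 2 + 'hours', 1 + 'weeks'
        args["duration_value"] = duration_value
        args["duration_unit"] = duration_unit
    return args
```

For a request like "remind me in three days", the relative form avoids any timezone arithmetic on the client.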

Parameters

Name	Required	Description	Default
`task_id`	Yes
`timezone`	No
`snooze_until`	No
`duration_unit`	No
`duration_value`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`error`	No	Error message when it failed.
`message`	No	Human-readable status/result.
`success`	No	Whether the action succeeded.
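The two timing modes in the snooze_task description can be sketched as a small argument builder. This is a minimal sketch, not Mailopoly's client code: the helper name `build_snooze_args` is hypothetical, `task_id` is assumed to be a string, and the rule that exactly one mode must be supplied is inferred from the word "Either" in the description rather than confirmed by the schema.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def build_snooze_args(
    task_id: str,  # assumption: task ids are strings
    *,
    snooze_until: Optional[datetime] = None,
    timezone: Optional[str] = None,
    duration_value: Optional[int] = None,
    duration_unit: Optional[str] = None,
) -> Dict[str, Any]:
    """Build snooze_task arguments using exactly one of the two timing modes."""
    absolute = snooze_until is not None
    relative = duration_value is not None or duration_unit is not None
    if absolute == relative:
        # Assumption: the server expects one mode, never both and never neither.
        raise ValueError("pass either snooze_until or duration_value + duration_unit")

    args: Dict[str, Any] = {"task_id": task_id}
    if absolute:
        # ISO datetime, read in `timezone` (the user's own timezone if omitted).
        args["snooze_until"] = snooze_until.isoformat()
        if timezone is not None:
            args["timezone"] = timezone
    else:
        if duration_value is None or duration_unit is None:
            raise ValueError("duration_value and duration_unit must be given together")
        # Relative mode, e.g. 3 + "days"; `timezone` is not sent in this mode.
        args["duration_value"] = duration_value
        args["duration_unit"] = duration_unit
    return args
```

For example, `build_snooze_args("t1", duration_value=3, duration_unit="days")` yields a relative snooze payload, while passing a `datetime` as `snooze_until` yields an absolute one. The accepted format of `timezone` is not documented here, so any value passed for it is a guess.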

Tool Definition Quality

A (4.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false. The description adds context about timezone handling and the two ways to specify snooze time, but does not detail side effects (e.g., whether notifications are suppressed or task state changes). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the action. It is concise and to the point, though a slightly more structured presentation (e.g., bullet points) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (which presumably defines return values), the description covers the core inputs adequately. It lacks mention of confirmation or state changes, but for a simple tool this is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It effectively explains snooze_until (ISO datetime) and the duration parameters (duration_value + duration_unit with examples), adding meaning beyond the raw schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('snooze a task') and explains the two input modes for timing (ISO datetime or relative duration). It distinguishes from sibling tools like 'complete_task' by specifying a deferral operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the two methods for specifying the snooze duration and mentions the default timezone behavior. It does not explicitly state when not to use this tool or provide comparisons to alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

start_email_account_sync: Start syncing a connected account (grade A)

Start (or retry) syncing a connected account that isn't syncing yet — e.g. one with status 'not_syncing' (connected during signup but never activated) or 'pending'. account is the connected email address. This pulls in the account's mail, tasks, bills and events. If activating it would use one of the plan's paid email-account slots, the first call returns requires_confirmation with a message; confirm with the user, then call again with confirm_billing=true.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `account` | Yes | | |
| `confirm_billing` | No | | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| `error` | No | Error message when it failed. |
| `message` | No | Human-readable status/result. |
| `success` | No | Whether the action succeeded. |
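The billing confirmation flow in the start_email_account_sync description amounts to a two-step call. The sketch below illustrates that flow under stated assumptions: `call_tool` and `ask_user` are hypothetical callables supplied by the agent runtime, and the result is assumed to expose a truthy `requires_confirmation` key, which the listed output schema does not show.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_sync(
    call_tool: ToolCaller,            # hypothetical: invokes an MCP tool by name
    ask_user: Callable[[str], bool],  # hypothetical: returns True if the user agrees
    account: str,
) -> Dict[str, Any]:
    """Start syncing `account`, confirming with the user before using a paid slot."""
    first = call_tool("start_email_account_sync", {"account": account})

    # Assumption: the confirmation request appears as a `requires_confirmation` key.
    if not first.get("requires_confirmation"):
        return first

    prompt = first.get("message") or "Activating this account may use a paid slot. Continue?"
    if not ask_user(prompt):
        # Never set confirm_billing on the user's behalf.
        return {"success": False, "message": "User declined the billing confirmation."}

    return call_tool(
        "start_email_account_sync",
        {"account": account, "confirm_billing": True},
    )
```

Keeping the user prompt between the two calls means `confirm_billing=true` is only ever sent after an explicit yes, which is the behavior the description asks for.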

Tool Definition Quality

A (5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the effect of the tool: it pulls in mail, tasks, bills, and events. It also explains the requires_confirmation response and how to proceed, which is beyond the annotations (readOnlyHint=false, destructiveHint=false). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (about 70 words), front-loaded with the core purpose, and every sentence adds value. It structures information logically: purpose, conditions, effects, billing note. No redundant text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all essential aspects: purpose, when to use, side effects, parameter meanings, and the confirmation flow. Since an output schema exists, the lack of return value description is acceptable. The tool's complexity is fully addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description explains both parameters: account is 'the connected email address', and confirm_billing is used in the billing confirmation context. This adds meaning beyond the schema's minimal titles.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: start or retry syncing a connected account that isn't syncing yet, with specific statuses mentioned ('not_syncing', 'pending'). It distinguishes from sibling tools like check_email_sync, which merely checks sync status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool (for accounts with 'not_syncing' or 'pending' status) and provides a clear workflow for the billing confirmation scenario, including instructions to call again with confirm_billing=true after user confirmation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

start_email_connection: Start connecting an email account (grade A)

Begin connecting an email account (or reconnecting one whose access expired) by returning a secure Mailopoly link for the user to open. Pass email_or_provider (the address or provider they want to add) for a NEW connection, or account (an existing connected address) to RECONNECT one flagged reauthorization_required. The link opens Mailopoly's own page where they sign in (OAuth) or enter an app password — the password is NEVER typed into the chat. For IMAP users, call get_connect_instructions first so you can tell them how to get their app password, then give them this link. Relay the returned url to the user.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `account` | No | | |
| `email_or_provider` | No | | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| `url` | No | In-app URL for the user to open. |
| `error` | No | |
| `method` | No | Always 'link'. |
| `reason` | No | connect \| reconnect. |
| `message` | No | |
| `success` | No | |
| `instructions` | No | |
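As a rough illustration of the new-versus-reconnect split described for start_email_connection, the sketch below picks the parameter to send and turns the result into a message for the user. Both helper names are hypothetical, and treating the two parameters as mutually exclusive is an assumption drawn from the description, not something the schema enforces.

```python
from typing import Any, Dict, Optional


def build_connection_args(
    *,
    email_or_provider: Optional[str] = None,
    account: Optional[str] = None,
) -> Dict[str, str]:
    """Choose the parameter for a NEW connection or a RECONNECT."""
    # Assumption: exactly one of the two parameters should be sent.
    if (email_or_provider is None) == (account is None):
        raise ValueError("pass exactly one of email_or_provider or account")
    if account is not None:
        # Reconnect an existing address flagged reauthorization_required.
        return {"account": account}
    # New connection: the address or provider the user wants to add.
    return {"email_or_provider": email_or_provider}


def link_message(result: Dict[str, Any]) -> str:
    """Turn a start_email_connection result into text to relay to the user."""
    url = result.get("url")
    if not result.get("success") or not url:
        return result.get("error") or result.get("message") or "Could not create a connection link."
    # Only the link is relayed; credentials are entered on Mailopoly's own page.
    return f"Open this Mailopoly link to continue: {url}"
```

The agent never asks for or handles a password in this sketch, which matches the description's rule that the password is never typed into the chat.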

Tool Definition Quality

A (4.8/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses important behavioral traits beyond annotations: the tool returns a secure Mailopoly link, the user signs in via OAuth or enters an app password on Mailopoly's own page, and the password is never entered in chat. Annotations only provide basic hints (readOnlyHint=false, etc.), so the description carries most of the burden and does so excellently.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but every sentence adds essential value. It is well-structured: purpose first, then parameter roles, then the security note, then the IMAP prerequisite, and finally the instruction to relay the URL. No redundant or irrelevant content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (2 optional parameters, output schema exists, only basic annotations to lean on), the description covers all necessary aspects: what the tool does, how to use it for new vs reconnection, prerequisite steps for IMAP, and the nature of the output (a url to relay). It is fully adequate for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description fully explains both parameters: email_or_provider for new connections and account for reconnections, including their mutually exclusive use. This adds significant meaning beyond the schema's minimal type information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool begins connecting an email account, distinguishing between new connections (using email_or_provider) and reconnections (using account). It uses specific verbs and resources, and differentiates from sibling tools like verify_email_account or get_connect_instructions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use each parameter (new vs reconnection) and advises calling get_connect_instructions for IMAP users first. However, it does not explicitly state when not to use this tool or list alternatives beyond that, which reduces the score from a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_email_account: Verify a connected mailbox (grade A, idempotent)

Run a live check on one connected account and explain what's wrong, if anything. account is the connected email address (from list_email_accounts). Returns a status: ok, wrong_mailbox (signed in but the real mail is hosted elsewhere — see imap_suggestion), provider_mismatch, empty_mailbox, token_expired (needs reconnecting via start_email_connection), or wrong_provider. Use this when a user says their email isn't syncing to tell them precisely why and what to do next.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `account` | Yes | | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| `status` | No | ok \| wrong_mailbox \| provider_mismatch \| empty_mailbox \| token_expired \| wrong_provider \| unsupported_provider \| no_token \| not_found. |
| `message` | No | |
| `verified` | No | |
| `email_address` | No | |
| `message_count` | No | |
| `imap_suggestion` | No | |
| `suggested_action` | No | |
| `detected_provider` | No | |
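One way an agent might act on a verify_email_account result is sketched below. Only two follow-ups are documented in the description (token_expired leads to start_email_connection, wrong_mailbox points at imap_suggestion); for every other status this sketch simply falls back to the `suggested_action` or `message` fields. The function name `next_step` is hypothetical, and the shape of `imap_suggestion` is unknown, so it is formatted as-is.

```python
from typing import Any, Dict


def next_step(result: Dict[str, Any]) -> str:
    """Map a verify_email_account result to a short next-action hint."""
    status = result.get("status")

    if status == "ok":
        return "No action needed."

    if status == "token_expired":
        # Documented follow-up: reconnect through start_email_connection.
        return (
            "Reconnect via start_email_connection with "
            f"account={result.get('email_address')!r}."
        )

    if status == "wrong_mailbox" and result.get("imap_suggestion"):
        # Signed in, but the real mail is hosted elsewhere.
        return f"Mail appears to be hosted elsewhere; see imap_suggestion: {result['imap_suggestion']}"

    # Other statuses: rely on whatever guidance the server returned.
    return result.get("suggested_action") or result.get("message") or f"Unhandled status: {status}"
```

Falling back to the server's own text keeps the sketch from guessing at remedies for statuses such as provider_mismatch or no_token, whose handling is not described on this page.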

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotentHint: true, destructiveHint: false), the description explains the return statuses (e.g., token_expired) and actions needed, providing detailed behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and informative, but slightly lengthy with multiple status explanations. Still efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one parameter and an output schema, the description covers purpose, parameter usage, return values, and actionable guidance, fully meeting the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining that `account` is the connected email address from list_email_accounts, adding crucial meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a live check on a connected account and explains what's wrong. It distinguishes from sibling tools like check_email_sync and get_account_overview by focusing on connectivity issues and suggesting follow-up actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends use when a user reports email sync issues, providing a clear use case. It does not explicitly exclude alternative scenarios, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
