
dialogbrain

Server Details

Unified inbox MCP for WhatsApp, Telegram, Email, voice — read/send messages, search, AI agents.

Status: Healthy
Transport: Streamable HTTP
Repository: saloprj/dialogbrain-mcp
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions (Grade: A)

Average 4.2/5 across 165 of 165 tools scored. Lowest: 2.9/5.

Server Coherence (Grade: A)
Disambiguation: 4/5

Most tools have clearly distinct purposes within their categories, aided by detailed descriptions. However, some overlapping functionalities (e.g., `web_search` vs. `web__local_search`, `agents_ask` vs. `agents_simulate_inbound`) could cause minor confusion for an agent, preventing a perfect score.

Naming Consistency: 4/5

The naming convention is predominantly snake_case with a consistent `noun_verb` pattern. However, the double underscore in `web__local_search` (vs. `web_search`) is an outlier that breaks the pattern; otherwise vague verbs such as `process` or `run` are absent and the overall pattern holds.

Tool Count: 2/5

With 165 tools, the server is oversized for a typical MCP server. While the tools are well-organized into domains, the sheer volume overwhelms an agent's ability to efficiently select the right tool, exceeding the recommended 3-15 range by a wide margin.

Completeness: 4/5

The tool surface covers a vast range of functionalities including agents, messaging, calls, browser automation, knowledge management, and integrations with LinkedIn, YouTube, and calendars. Minor gaps exist (e.g., no tool for managing YouTube channel settings or advanced web scraping), but the core workflows are well-supported.

Available Tools

165 tools
agent_handoff (Grade: A)
Read-only · Idempotent

Delegate a multi-step task (research, composing messages, booking, scheduling) to the full agentic planner. Use when a user ask needs more than a direct answer. Returns final_answer for you to narrate in one short sentence. Do NOT re-trigger the same handoff if the tool_result has status timeout or error — acknowledge and offer to retry.

Parameters

  • model (optional): Override the escalation model. Omit (recommended) to use the calling agent's configured model from settings; falls back to claude-sonnet-4-6 when no agent context. Ignored when `agent_id` is set — the target agent uses its own stored model.

  • agent_id (optional): Optional ID of another agent in the same workspace to delegate the task to. When set, the target agent runs with ITS OWN prompt, tools, and model; `task_description` becomes its user query. Spawns a new trace linked back to this trace via parent_trace_id (visible in the admin lineage card). Omit to run a sub-loop on the calling agent (default behaviour).

  • task_description (required): Plain-language description of what the planner should accomplish. Include everything the planner needs: the user's goal, constraints, and any context already gathered in this voice call.
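
For illustration, a hypothetical arguments payload in Python, assuming the tool is called through a generic MCP client; the agent ID and task text below are invented:

  # Hypothetical agent_handoff arguments; all values are illustrative.
  handoff_args = {
      # Everything the planner needs, stated in plain language.
      "task_description": (
          "Guest asked to move their booking to Friday evening; check availability, "
          "then draft a confirmation message for approval."
      ),
      # Optional: delegate to another workspace agent. Omit to run a
      # sub-loop on the calling agent (the default behaviour).
      "agent_id": "agt_booking_01",
  }
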
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, idempotentHint. Description adds key behaviors: returns final_answer for narration, handling of timeout/error, and trace spawning when agent_id is set. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, followed by usage and error handling. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, error handling, and delegation behavior. No output schema exists, but final_answer is mentioned. Minor gap: final_answer structure could be detailed, but sufficient for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions. The description adds only minimal context (e.g., task_description should include everything). Baseline 3 is appropriate; no significant extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it delegates multi-step tasks to a planner, contrasting with direct-answer tools. It specifies the verb 'delegate' and the resource 'full agentic planner', distinguishing it from sibling tools like agents_ask.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly says 'use when a user ask needs more than a direct answer' and provides error-handling guidance ('Do NOT re-trigger... acknowledge and offer to retry'). While no alternative tool is named, the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_add_file (Grade: A)

Attach a file to this agent's private knowledge (agent-specific files, not shared with other agents).

Workflow:

  1. Upload the file with files_upload (pass source_url for remote files)

  2. Index it with files_ingest (pass the file_id)

  3. Call this tool with agent_id + file_id

Returns chunk_count — shows 0 while still processing. Call agents.list_files later to see the final chunk count once indexing completes.

Parameters

  • file_id (required): file_id returned by files_upload or files_ingest

  • agent_id (required): ID of the agent to attach the file to
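
To make the three-step workflow concrete, a minimal sketch in Python, assuming an MCP client session object `session` that exposes the standard call_tool method; the URL and file_id below are invented:

  # Sketch of the upload -> ingest -> attach workflow.
  async def attach_remote_file(session, agent_id: str) -> None:
      # 1. Upload the file (pass source_url for remote files).
      await session.call_tool("files_upload", {"source_url": "https://example.com/faq.pdf"})
      file_id = "file_123"  # illustrative; read the real file_id from the upload result
      # 2. Index the uploaded file.
      await session.call_tool("files_ingest", {"file_id": file_id})
      # 3. Attach it to the agent's private knowledge.
      await session.call_tool("agents_add_file", {"agent_id": agent_id, "file_id": file_id})
      # chunk_count may be 0 at first; check agents_list_files once indexing completes.
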
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool returns chunk_count which may be 0 while indexing, and that the file is agent-specific. Annotations minimal but description adds useful behavioral context about processing delay.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five sentences, front-loaded with purpose, then workflow, then return value explanation. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Explains return value and workflow, but lacks error handling or additional notes on idempotency. Still sufficiently complete for a tool with good annotations and schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage. Description mentions parameters but adds no significant new meaning beyond schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool attaches a file to an agent's private knowledge, specifying it is agent-specific and not shared. Distinguishes from siblings like agents_remove_file and agents_list_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a three-step workflow (upload, ingest, attach) and explains that chunk_count of 0 means processing, with suggestion to check later. Does not explicitly state when not to use, but workflow guidance is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_approve_draft (Grade: A)

Approve a pending agent draft and send the message.

The draft will be sent to the conversation it was generated for. You can optionally edit the text before sending.

Use this when user says:

  • 'Approve this draft'

  • 'Send this reply'

  • 'Approve and send'

  • 'Looks good, send it'

IMPORTANT: This will send a message to a real person.

Parameters

  • draft_id (required): ID of the draft to approve

  • edited_text (optional): Optional edited response text (if user wants to modify before sending)
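
For illustration, a hypothetical payload that edits the text before approving; the draft ID and message are invented:

  # Hypothetical approval payload; approving sends a real message to a real person.
  approve_args = {
      "draft_id": "draft_42",  # obtained from agents_list_drafts
      # Optional: replace the generated text before it is sent.
      "edited_text": "Thanks for reaching out! We can fit you in at 3 pm tomorrow.",
  }
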
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds meaningful behavioral context beyond annotations: mentions sending to a real person, optional editing, and warning about real-world impact. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with main action, includes examples and a warning. Slightly repetitive but efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete description for a simple tool: explains action, outcome, optional edit, and provides usage examples. No output schema needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions fully cover both parameters (100% coverage), so baseline is 3. Description reinforces optional edit but doesn't add new details beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action: approve a pending agent draft and send the message. Distinguishes from sibling tools like agents_reject_draft and agents_list_drafts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit user phrases for when to use (e.g., 'Approve this draft'), offers context for usage, but lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_ask (Grade: A)

Send a message to an AI agent and get its response.

The agent runs with its configured prompt, tools, and knowledge. Use this to test agents or have them process a task.

Returns: {status: 'replied'|'silent', response_text, messages[], full_reply, model_used, tokens_*, send_mode, execution_mode}. messages[] carries each messages.send invocation the agent made (text, subject, reply_to_message_id, timestamp, message_id, attachments=[{file_id,name,mime}]). full_reply concatenates text only — attachment-only sends show up in messages but not full_reply. status='silent' iff both response_text is empty AND messages is empty.

Execution may take 10-60s depending on agent complexity.

Parameters

  • message (required): Message/goal to send to the agent

  • agent_id (required): ID of the AI agent to ask

  • send_mode (optional): Send mode for the agent run: 'draft' = create drafts, 'auto' = send directly. Defaults to the agent's configured default_send_mode. Does NOT change execution_mode — that is fixed by the agent's config.
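
A sketch of a call and the documented return shape; the agent ID, message, and token numbers are invented:

  # Hypothetical agents_ask payload.
  ask_args = {
      "agent_id": "agt_support",
      "message": "Summarize today's unread messages and flag anything urgent.",
      "send_mode": "draft",  # create drafts rather than sending directly
  }
  # Per the description, the result resembles:
  # {"status": "replied", "response_text": "...", "messages": [...],
  #  "full_reply": "...", "model_used": "...", "send_mode": "draft",
  #  "execution_mode": "ai_assisted", "tokens_input": 812, "tokens_output": 145}
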
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the basic annotations, the description discloses execution time (10-60s), return structure including status conditions ('replied'/'silent'), detailed attributes of messages (attachments, IDs), and how response_text and messages relate. This comprehensive behavioral disclosure adds significant value beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the purpose, followed by detailed return information. While every sentence adds value, the detailed enumeration of return fields could be slightly more concise. Overall, it is well-structured and informative without being excessively long.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description provides a thorough explanation of the return format, including edge cases (status='silent'), token usage, and execution time. It also covers the behavior of send_mode relative to agent config. This makes the tool's behavior fully predictable for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter (message, agent_id, send_mode) is already well-documented in the input schema. The description does not add new information about parameter usage or constraints beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Send a message to an AI agent and get its response.' It further clarifies use cases: 'Use this to test agents or have them process a task.' This distinguishes it from sibling tools like agents_create (creating agents) or agent_handoff (handoff actions), providing a specific verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context by stating the tool is for testing agents or processing tasks. It does not explicitly exclude scenarios or name alternatives, but the purpose is well-defined, and the context is sufficient for an AI agent to decide when to use this tool over other agent-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_create (Grade: A)

Create a new AI agent in the workspace.

Execution modes:

  • ai_assisted (default, recommended): Two-phase AI — fast pre-classifier (Haiku) for keyword filtering and simple replies, then full AI with tools for complex messages. Best for: auto-replies, group monitoring, keyword-based filtering.

  • agentic: Autonomous multi-step agent with planning and tool execution. Best for: complex scheduled tasks, multi-step automation.

  • rule_based: Simple pattern matching without AI.

For keyword filtering: use ai_assisted mode + set keywords in trigger conditions (free, deterministic) and/or auto_reply_rules (smart, LLM-based) via agents.update.

Parameters

  • name (required): Name of the AI agent (1-100 characters)

  • prompt_id (optional): ID of the prompt to assign to this agent

  • send_mode (optional): Default send mode: 'auto' or 'draft'. OMIT to use 'draft' (the default).

  • description (optional): Optional description of what this agent does

  • execution_mode (optional): Execution mode: 'rule_based', 'ai_assisted' (default), 'agentic', 'claude_channels', or 'voice'. OMIT to use 'ai_assisted'.
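
A hypothetical payload for the keyword-filtering setup recommended above; the name and description are invented, and keywords would be configured afterwards via agents.update:

  # Hypothetical agents_create payload for an ai_assisted auto-replier.
  create_args = {
      "name": "Support Auto-Reply",
      "description": "Answers common support questions in the main group",
      "execution_mode": "ai_assisted",  # two-phase AI, the recommended default
      "send_mode": "draft",             # drafts require human approval before sending
  }
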
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate it is a write operation (readOnlyHint=false). The description adds context about execution modes and defaults (send_mode), but does not cover return values or side effects. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the main purpose and organized with bullet points. A few minor verbose parts but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Missing information about return values (e.g., created agent ID) and required permissions. With no output schema, the description should cover what the tool returns. Otherwise adequate for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining execution modes beyond enum labels, and provides usage context for parameters like send_mode and execution_mode.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new AI agent in the workspace' with a specific verb and resource. It distinguishes from sibling tools like agents_update and agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use each execution mode (ai_assisted for keyword filtering, agentic for complex tasks, rule_based for simple). Does not explicitly state when not to use the tool, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_delete (Grade: B)

Permanently delete an AI agent.

WARNING: This cannot be undone. The agent and all its triggers will be removed.

Parameters

  • agent_id (required): ID of the agent to delete
Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that deletion is permanent and removes triggers, but the annotation destructiveHint=false contradicts this, indicating a serious inconsistency. The low score reflects this contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: the first states the purpose, the second provides a critical warning. No unnecessary words, front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple destructive action with no output schema, the description covers the action, irreversibility, and what is removed. It could mention the return value or confirmation, but it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There is one parameter (agent_id) with a schema description, and the tool description adds no additional meaning beyond that. Schema coverage is 100%, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Permanently delete an AI agent' with a specific verb and resource, and it distinguishes from sibling tools like agents_create and agents_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The warning 'This cannot be undone' hints at use when irreversible deletion is intended, but it lacks explicit guidance on when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_get (Grade: A)
Read-only · Idempotent

Get detailed information about a specific AI agent.

Returns full agent config including:

  • Execution configuration

  • Tool configuration

  • Knowledge configuration

  • Escalation configuration

  • Triggers list

  • Knowledge collections

  • Custom AI instructions (prompt_text)

  • Auto-reply rules override (auto_reply_rules)

Parameters

  • agent_id (required): ID of the AI agent to fetch
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, but description adds value by enumerating exact return fields (execution config, tool config, etc.). No contradictions. Provides behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is appropriately sized; first sentence clearly states purpose, followed by a bulleted list of return fields. No fluff. Every sentence adds value. Could be slightly more compact but is well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only one parameter and strong annotations, the description fully covers what the tool does and what it returns. No output schema, so the detailed list compensates. Complete enough for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Single parameter agent_id with schema description 'ID of the AI agent to fetch'. Schema coverage is 100%, so description doesn't need to add more. Description does not elaborate on the parameter, but baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description starts with clear verb 'Get' and resource 'detailed information about a specific AI agent'. Lists specific return fields, distinguishing it from siblings like agents_list (for listing) and agents_create (for creation). This provides exact, unambiguous purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage context is implied by purpose ('specific' vs list), but no explicit mention of when to use this tool vs alternatives (e.g., agents_list). No guidance on prerequisites or when not to use it. Adequate but minimal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_silence (Grade: A)
Read-only · Idempotent

End this turn without sending any message. Use when the thread is owned by a human operator after job.escalate, when the guest is self-resolving, when the message is a duplicate, or for observation-only turns. Calling this tool is the ONLY correct way to stay silent — narrated silence text (e.g. '(Staying silent…)', 'Internal:…') would be delivered to the guest verbatim.

Parameters

  • reason (required): Free-form explanation for admin audit. Stored in trace_tool_executions.tool_params (ClickHouse String; reason filters are scan-only).
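
For illustration, the shape of a correct silent turn; the reason text is invented:

  # Call the tool to stay silent; never narrate silence as message text.
  silence_args = {
      "reason": "Thread owned by a human operator after job.escalate; observation-only turn."
  }
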
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, destructiveHint=false, and idempotentHint=true, which cover safety. The description adds valuable context: it ends the turn, and that narrated silence would be delivered verbatim to the guest. This goes beyond annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each carrying weight: core action, use cases, warning about alternative, and emphasis on correctness. No wasted words, and the key point is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, clear annotations), the description is fully adequate. It covers purpose, usage, and a critical behavioral nuance, leaving no gaps for a typical use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the 'reason' parameter described in the schema. The description does not add extra meaning or context for the parameter beyond what the schema provides, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool ends the turn without sending a message, and explicitly distinguishes it from alternative actions like narrated silence. The verb 'end' and resource 'turn' are specific, and the purpose is unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists explicit scenarios for use (after job.escalate, self-resolving, duplicate, observation-only) and warns against using narrated silence as an alternative. This provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list (Grade: A)
Read-only · Idempotent

List all AI agents configured in the workspace.

Returns agents with their basic info, trigger count, and knowledge collection count.

Use this to:

  • See all configured AI agents

  • Filter by enabled/disabled status

  • Get agent IDs for further operations

Parameters

  • enabled (optional): Filter by enabled status (true = enabled only, false = disabled only, omit = all)
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that it returns basic info, trigger count, and knowledge collection count, beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three paragraphs with front-loaded purpose. The bullet list slightly repeats the first sentence but overall efficient and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool without output schema, the description explains return fields (basic info, counts) and filter option. Covers main aspects adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the 'enabled' parameter. The description restates filtering capability but does not add new semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all AI agents configured in the workspace' with a specific verb and resource. It differentiates from sibling tools like agents_create or agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases (see all agents, filter, get IDs) but does not mention when not to use or alternatives. Still clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_drafts (Grade: A)
Read-only · Idempotent

List pending agent drafts awaiting approval.

Shows drafts that have been generated by AI agents but not yet sent. Each draft includes:

  • Thread/conversation info

  • Trigger message (what prompted the reply)

  • Generated response text

  • Creation time and expiration

Use this when user asks:

  • 'Show pending agent drafts'

  • 'What messages are waiting for approval?'

  • 'List drafts to approve'

Parameters

  • limit (optional): Maximum number of drafts to return

  • thread_id (optional): Filter by specific thread ID
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds context about draft content (thread info, trigger message, generated response, creation/expiration) beyond what annotations provide, enhancing transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a concise introductory sentence followed by bullet points for included fields. It is not overly verbose, and each element serves a purpose, though the bullet list could be slightly more compact.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, input semantics via schema, and output content. It lacks explicit details about pagination or error handling, but for a read-only list tool with good annotations, it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% as both parameters have descriptions. The description does not add additional parameter-level detail beyond what is in the schema, so it meets the baseline without extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List pending agent drafts awaiting approval' with a specific verb and resource. It lists included fields and provides example user queries, distinguishing it from sibling tools like agents_approve_draft and agents_reject_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit example queries for when to use the tool, offering clear context. It does not explicitly state when not to use it or mention alternatives, but the context makes it clear that it's for listing drafts only.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_files (Grade: A)
Read-only · Idempotent

List files directly attached to this agent (agent-specific files, not shared collections).

Returns file_id, title, status, and chunk_count for each file. chunk_count shows how many indexed chunks were created — 0 means the file is still processing.

Use agents.add_file to attach a new file, or agents.remove_file to detach one.

Parameters

  • agent_id (required): ID of the agent whose files to list
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, indicating a safe read. The description adds value by explaining the return fields and that chunk_count=0 means processing. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: purpose, return fields, and related tool usage. No fluff, each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, return values, and a special case (chunk_count=0). It does not mention pagination or limits, but for a simple list tool this is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for agent_id. The description does not add further parameter meaning beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and resource 'files directly attached to this agent', with explicit differentiation from shared collections. It also specifies the return fields, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description tells when to use this tool (list agent-specific files) and provides alternatives: 'Use agents.add_file to attach a new file, or agents.remove_file to detach one.' It implicitly distinguishes from sibling list tools by stating 'not shared collections'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_history (Grade: A)
Read-only · Idempotent

List past versions of an agent's prompt_text. Every edit to the agent's prompt is snapshotted to an append-only table — use this tool to browse history, find a prior known-good version, and copy it into agents.prompt_restore.

Parameters

  • limit (optional): Max versions to return (1-200, default 50)

  • agent_id (required): ID of the agent

  • before_version (optional): Cursor: return versions strictly below this version_number
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and destructiveHint. The description adds that every edit is snapshotted to an append-only table, providing behavioral context beyond annotations. It could mention ordering or pagination, but overall sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. The purpose is front-loaded, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides a clear use case and hints at return content (versions with prompt_text). No output schema exists, but the tool is straightforward. Minor gap: ordering of results is not specified, but cursor parameter implies descending.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters (agent_id, limit, before_version). The description does not add further meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists past versions of an agent's prompt_text, using specific verbs and resource. It distinguishes itself from siblings like agents_prompt_restore by specifying its role in browsing history and finding prior versions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use the tool ('browse history, find a prior known-good version') and how to use it ('copy it into agents.prompt_restore'), providing clear alternatives and highlighting its read-only nature.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_restore (Grade: A)

Restore a past version of an agent's prompt_text by version_number. Creates a new version pointing at the restored content — history is preserved. Use agents.prompt_history first to find the version_number you want.

Parameters

  • reason (optional): Optional: why this restore is happening (shows up in history UI)

  • agent_id (required): ID of the agent

  • version_number (required): The version_number to restore (get it from agents.prompt_history)
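
A sketch of the browse-then-restore flow across the two prompt tools, assuming an MCP client session object `session` with the standard call_tool method; the IDs, version number, and reason are invented:

  # Browse history, then restore a known-good prompt version.
  async def roll_back_prompt(session, agent_id: str) -> None:
      # List recent versions to find a known-good version_number.
      await session.call_tool("agents_prompt_history", {"agent_id": agent_id, "limit": 10})
      # Restore appends a new version pointing at the old content; history is preserved.
      await session.call_tool("agents_prompt_restore", {
          "agent_id": agent_id,
          "version_number": 7,  # illustrative; taken from the history listing
          "reason": "Rolling back after the newer prompt started over-escalating",
      })
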
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds that it creates a new version pointing to restored content and that history is preserved, which goes beyond annotations (destructiveHint: false, readOnlyHint: false) by explaining the non-destructive behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no filler. First sentence states purpose, second gives usage guidance. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a restore operation: explains what it does, how to use it, and that history is preserved. Lacks return value info but no output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions, so the baseline is 3. The description adds minimal extra meaning (e.g., where to get version_number) but mostly repeats schema info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it restores a past version of an agent's prompt_text by version_number, preserving history. It distinguishes itself from siblings like agents_prompt_history (which lists versions) and prompts_prompt_restore (for prompts).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to use agents.prompt_history first to find the version_number. No explicit when-not-to-use, but the context is clear and the sibling tool prompts_prompt_restore provides an alternative for prompts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_reject_draft (Grade: A)

Reject a pending agent draft without sending.

The draft will be marked as rejected and won't be sent. Use this when the generated response isn't appropriate.

Use this when user says:

  • 'Reject this draft'

  • 'Don't send this'

  • 'Cancel this reply'

  • 'Delete this draft'

  • 'This response is wrong'

Parameters

  • reason (optional): Optional reason for rejection (for logging/feedback)

  • draft_id (required): ID of the draft to reject
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses that 'The draft will be marked as rejected and won't be sent', which is behavioral context beyond annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded, with a list of example phrases. No unnecessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple action with good schema, description adequately explains behavior. No output schema needed. Could mention logging but not essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description adds no extra meaning beyond what schema already provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Title 'Reject Agent Draft' and description 'Reject a pending agent draft without sending' clearly state the verb and resource. It distinguishes from sibling 'agents_approve_draft'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'when the generated response isn't appropriate' and provides specific user phrases. Implies not to use when approval is needed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_remove_file (Grade: A)

Remove a file from this agent's private knowledge.

The file itself is not deleted — it's just detached from this agent. Use agents.list_files to find the file_id to remove.

Parameters

  • file_id (required): ID of the file to detach (from agents.list_files)

  • agent_id (required): ID of the agent to remove the file from
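
A minimal sketch pairing this tool with agents.list_files, as the description suggests, assuming an MCP client session object `session`; the IDs are invented:

  # Detach a file from an agent; the file itself stays in the workspace.
  async def detach_file(session, agent_id: str, file_id: str) -> None:
      # Look up attached files first if the file_id is unknown.
      await session.call_tool("agents_list_files", {"agent_id": agent_id})
      await session.call_tool("agents_remove_file", {"agent_id": agent_id, "file_id": file_id})
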
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the file is not deleted but only detached, which is important behavioral context. Annotations are non-destructive, so description adds clarity beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no wasted words. Front-loads the main action and clarifies side effects immediately.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal tool with no output schema, the description fully covers behavior, prerequisite knowledge (agents.list_files), and consequences (detachment). Complete and self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description adds context for file_id (from agents.list_files), which provides additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Specifically describes the action 'Remove a file from this agent's private knowledge' and distinguishes it from deleting by noting the file is 'just detached'. Clear verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Clearly states the tool's purpose and references agents.list_files for finding file_id, but does not explicitly mention when not to use or compare to siblings like agents_add_file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_simulate_inbound (Grade: A)
Read-only · Idempotent

Replay an inbound message on a thread through the real trigger pipeline and return what would have happened. The router auto-picks the winning enabled agent + trigger by priority/specificity (same logic as production). By default send_mode='draft' so no real message is sent; pass send_mode='auto' on a test account to let the matched agent actually deliver (drafts get overwritten by the next draft, so 'auto' is the only way to verify Telegram/email delivery end-to-end).

Use to verify routing for a thread: which agent answers, which trigger wins, or — when nothing matches — the structured skip reason. Pass blockchain_tx_data instead of message_text to simulate a blockchain:transfer event on the thread.

Returns: {matched: true, matched_agent: {id, name, execution_mode}, matched_trigger: {id, trigger_type, conditions, specificity_score}, routing_reason, response_text, messages[], execution_mode, send_mode, model_used, tokens_input, tokens_output, latency_ms, rag_queries_made, rag_results_used} on a hit, or {matched: false, skip_reason, simulator_warnings} on a miss.

Parameters

  • send_mode (optional, default: 'draft'): How the matched agent should deliver its reply. 'draft' (default, safe) creates a draft only — no real send, no idempotency key. 'auto' lets the agent deliver through the channel adapter exactly as it would in production — use this on a test account to verify Telegram/email delivery end-to-end. Drafts get overwritten by the next draft on the thread, so 'auto' is required when you want to see the message persisted.

  • thread_id (required): Thread ID to route the simulated event from. Must belong to the API key's workspace.

  • message_text (optional): Inbound message body to simulate. Defaults to '[MCP simulation test]' when omitted.

  • system_message (optional): Tag the simulated inbound as a system/service-message row (missed call, group join, pinned message, etc.) so the `excluded_system_message_kinds` trigger filter can be exercised end-to-end. Shape: {"category": <one of call_event | membership_change | contact_signup | pinned_message | chat_metadata_change | voice_chat_event | other_service>, "native_kind": <free-form upstream event class name, e.g. 'MessageActionPhoneCall'>}. The category is written into `message.meta.system_message` (mirroring the real Telegram ingest path) AND surfaced on the synthetic IncomingEvent so the trigger evaluator honors the block-list. Omit for a normal text-message simulation.

  • blockchain_tx_data (optional): When set, simulate a blockchain:transfer event instead of a channel:message:new event. Expected keys: chain, to_address / from_address, tx_hash.

  • attachment_file_ids (optional): Optional list of workspace file IDs to attach to the simulated inbound message — same shape as a real Telegram message with image/document attachments. Use this to test agent behavior on incoming messages that carry images (e.g. logos for invoices) or documents the agent must reference. File IDs must belong to the API key's workspace.
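
A hypothetical simulation call using the safe defaults described above; the thread ID and message are invented:

  # Simulate an inbound message without delivering anything real.
  simulate_args = {
      "thread_id": "thr_555",
      "message_text": "Hi, is the apartment available this weekend?",
      "send_mode": "draft",  # safe: no real message is delivered
  }
  # Per the description, a hit returns matched_agent, matched_trigger, and routing_reason;
  # a miss returns {"matched": false, "skip_reason": ..., "simulator_warnings": ...}.
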
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses behavioral traits: drafts overwritten, auto mode sends real messages, router uses production logic. However, annotations (readOnlyHint=true) contradict the ability to send real messages with 'auto' mode, which is a significant inconsistency. Despite this, description adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is thorough but somewhat lengthy (3 paragraphs). It is well-structured, front-loading the main action and then detailing parameters. Could be slightly more concise, but all sentences are informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters with nested objects and no output schema, the description provides a complete picture: it explains the return structure for both matched and unmatched cases, covers edge cases (system_message, blockchain), and addresses behavior differences between draft and auto modes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the description still adds meaning: explains send_mode trade-offs, system_message shape and purpose, blockchain_tx_data usage, and attachment_file_ids for testing image-aware agents. This goes beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool replays an inbound message through the trigger pipeline and returns what would have happened. It specifies the verb (simulate), resource (inbound message routing), and distinguishes from siblings like agents_ask by focusing on verification of routing logic rather than actual message sending.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'verify routing for a thread' and provides alternatives like blockchain_tx_data for simulating blockchain events. Also gives guidance on send_mode: 'draft' for safe simulation, 'auto' for end-to-end testing on test accounts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_task_complete (A)

Report that a Claude Code agent task has been completed. Call this when you finish processing an agent_task from DialogBrain.

Parameters (JSON Schema)

- `success` (required): Whether the task completed successfully
- `summary` (optional): Brief summary of what was done
- `trace_id` (required): Trace ID from the agent task event
Behavior 4/5

Annotations indicate non-destructive, non-readOnly behavior. The description adds context that the tool reports completion, which implies a state change. It does not contradict annotations and provides adequate behavioral insight for an agent.

Conciseness 5/5

The description is two sentences long, front-loads the purpose, and contains no extraneous information. Every sentence serves a clear function.

Completeness 5/5

For a simple notification tool with no output schema and 100% parameter coverage, the description fully informs the agent about when and why to use this tool. It is complete given the tool's complexity.

Parameters 3/5

Schema coverage is 100% for all 3 parameters. The description does not add any additional meaning beyond what the schema already provides, so the baseline score of 3 applies.

Purpose 5/5

The description clearly states the tool reports agent task completion, with a specific verb ('Report') and resource ('Claude Code agent task'). It distinguishes from sibling agent tools by specifying the exact event (finishing an agent_task from DialogBrain).

Usage Guidelines 4/5

The description explicitly says 'Call this when you finish processing an agent_task from DialogBrain,' providing clear context for when to use it. It does not mention alternatives or when-not-to-use, but the context is sufficient given the tool's narrow scope.

agents_trace_get (A)
Read-only · Idempotent

Fetch the full execution detail for a single trace — tool executions, events timeline, LLM call spans (with error_message on failures).

Use after agents.traces_list identifies a specific trace of interest (failed run, slow run, unexpected outcome).

By default LLM system_prompt and prompt_messages are stripped — set include_llm_bodies=true to fetch them when diagnosing prompt engineering issues (emits a WARNING audit log). Set full=true to disable all field truncation. completion_text on failed LLM calls is always returned (capped at 8 KB).

Parameters (JSON Schema)

- `full` (optional): Disable all field truncation. Escape hatch for a human operator. OMIT for the standard truncated view.
- `agent_id` (required): Expected agent_id — used for scope validation. Mismatch returns not_found.
- `trace_id` (required): Trace identifier returned by agents.traces_list.
- `include_llm_bodies` (optional): Include system_prompt and prompt_messages in LLM spans. Audited at WARNING level. OMIT to keep them stripped (the default).
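
One hedged sketch of the diagnostic flow described above. No output schema is published for this tool, so the `llm_calls` and `error_message` result fields below are assumptions, not documented names; `call_tool` is the same hypothetical helper used earlier.

```python
# IDs are illustrative; the llm_calls / error_message result keys are
# assumptions, since no output schema is published for this tool.
trace = call_tool("agents_trace_get", {
    "agent_id": 7,
    "trace_id": "tr_abc123",
    "include_llm_bodies": True,   # fetches prompts; audited at WARNING level
})
for span in trace.get("llm_calls", []):
    if span.get("error_message"):
        print(span["error_message"])
```
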
Behavior 5/5

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds beyond that: default stripping of LLM bodies, WARNING audit log when include_llm_bodies is set, truncation behavior, and that completion_text on failed LLM calls is always returned (capped at 8KB). No contradiction.

Conciseness 5/5

Description is extremely concise: one paragraph with clear front-loading of main purpose, then usage guidelines, then parameter details. No wasted words; every sentence adds value.

Completeness 5/5

Despite no output schema, the description lists what the trace contains (tool executions, events, LLM call spans with error_message) and mentions key behavioral details. This is sufficient for an agent to understand the return value and when to use the tool, making it complete.

Parameters 4/5

Schema coverage is 100%, so baseline is 3. Description adds meaningful context: trace_id comes from agents_traces_list, agent_id for scope validation, include_llm_bodies triggers audit log, full is for human operator escape hatch. This elevates beyond schema alone, justifying a 4.

Purpose 5/5

The description clearly states 'Fetch the full execution detail for a single trace — tool executions, events timeline, LLM call spans', with a specific verb and resource. It distinguishes itself from sibling tools like agents_traces_list by specifying that this is for a single trace after identification.

Usage Guidelines 5/5

Explicitly states 'Use after agents.traces_list identifies a specific trace of interest (failed run, slow run, unexpected outcome)'. Provides context for when to set include_llm_bodies and full, offering clear usage guidance and alternatives.

agents_traces_list (A)
Read-only · Idempotent

List recent execution traces for an agent — the same data as /admin/requests, scoped to one agent and readable by an LLM.

Use this when an agent call timed out, drafted the wrong response, or you want to know which tool/LLM call burned the latency. Pair with agents.trace_get for full detail on a specific trace.

Filters: status, success, source (single value or comma-separated: agent,voice), date_from/date_to (ISO-8601), pagination via limit/offset.

Returns returned_count, dropped_on_page (should be 0 — positive means the backend agent_id predicate let something through), and has_more. Edge case: a raw page of all-dedup-dropped rows yields returned_count=0, has_more=true; re-call with offset += limit.
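
A minimal pagination sketch of that edge case, using the same hypothetical `call_tool` helper; the `traces` result key is an assumption, since no output schema is published.

```python
# Keep paging while has_more is true. Per the edge case above, a page can
# come back with returned_count == 0 and has_more == True, so the loop must
# advance the offset instead of stopping on an empty page.
offset, limit = 0, 50
while True:
    page = call_tool("agents_traces_list", {
        "agent_id": 7, "limit": limit, "offset": offset,
    })
    for row in page.get("traces", []):   # result key assumed
        print(row)
    if not page["has_more"]:
        break
    offset += limit                      # advance even on an empty page
```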

Parameters (JSON Schema)

- `limit` (optional): Max rows per page (1–100).
- `offset` (optional): Rows to skip for pagination. OMIT to start at row 0 (default).
- `source` (optional): Filter by trace source. Single value or comma-separated, e.g. 'agent,voice'. Values: agent / auto_reply / agentic / outreach / voice. Note: source='agent' also matches voice traces today (known upstream bug).
- `status` (optional): Filter by status. OMIT to include all statuses.
- `date_to` (optional): ISO-8601 upper bound on created_at.
- `success` (optional): Filter to succeeded (true) or failed (false) runs only. OMIT to include both.
- `agent_id` (required): Agent ID to pull traces for (must belong to your workspace).
- `date_from` (optional): ISO-8601 lower bound on created_at, e.g. '2026-04-10T00:00:00Z'.
Behavior 4/5

Discloses edge case of dropped_on_page (should be 0, positive means backend bug) and pagination behavior (offset increment when has_more true). Annotations already declare readOnlyHint and idempotentHint, and description adds important behavioral nuances beyond those.

Conciseness 5/5

Concise at 6 sentences, front-loaded with purpose and usage, then filters and return fields. Every sentence earns its place with no redundancy.

Completeness 5/5

Despite lack of output schema, description explains return fields and edge case. For a list tool with 8 parameters, it covers purpose, usage, filters, return format, and a specific edge condition. No obvious gaps.

Parameters 3/5

Schema coverage is 100%, so baseline is 3. Description adds minor value (source comma-separated values, known bug with source='agent' matching voice, ISO-8601 format for dates). Not transformative but helpful.

Purpose 5/5

The description clearly states it lists execution traces for an agent, distinguishing from sibling tools like agents_trace_get (for full detail) and agents_traces_stats (presumably stats). It specifies the data source (/admin/requests) and that it's scoped to one agent.

Usage Guidelines 4/5

Provides explicit when-to-use scenarios (timeout, wrong draft, latency analysis) and pairs with agents_trace_get for further detail. Does not explicitly state when not to use, but the use cases are clear and context is strong.

agents_traces_stats (A)
Read-only · Idempotent

Aggregated trace statistics for one agent over the last N days — total runs, success rate, avg duration, error breakdown, top tools used, runs-per-day histogram.

Use this when you want a bird's-eye view of an agent's health before diving into individual traces with agents.traces_list / agents.trace_get. Scoped to the target agent (exact match, no substring bleed). days is capped at 30 — matches the ClickHouse request_traces TTL.

Parameters (JSON Schema)

- `days` (optional): Rolling window in days (1–30).
- `agent_id` (required): Agent ID to compute stats for (must belong to your workspace).
Behavior 4/5

Description adds context beyond annotations (e.g., scoped to exact agent match, days capped at 30 due to TTL), but annotations already cover readOnly and idempotent hints.

Conciseness 5/5

Two sentences, no wasted words; front-loaded with purpose then usage guidance.

Completeness 4/5

With 100% schema coverage and no output schema, the description lists returned metrics adequately, though output format details are omitted.

Parameters 4/5

Schema covers both parameters with descriptions; description adds extra context like the TTL cap and workspace ownership, adding value beyond schema.

Purpose 5/5

The description uses specific verbs ('aggregated trace statistics') and lists concrete metrics (total runs, success rate, etc.), clearly distinguishing it from sibling tools like agents.traces_list and agents.trace_get.

Usage Guidelines 5/5

Explicitly tells when to use this tool ('bird's-eye view before diving into individual traces') and names alternatives (agents.traces_list / agents.trace_get).

agents_trigger_create (A)

Create a new trigger for an AI agent.

Triggers determine when the agent activates.

Trigger types:

  • incoming_message: Activates on new incoming messages

  • schedule: Activates on a schedule

  • webhook: Activates on webhook events

  • event: Activates on system events

Parameters (JSON Schema)

- `enabled` (optional): Whether the trigger is enabled. OMIT to use the default (true).
- `agent_id` (required): ID of the agent to create a trigger for
- `priority` (optional): Trigger priority — lower numbers run first (default: 100)
- `send_mode` (optional): Send mode override for this trigger. OMIT to inherit from the agent.
- `conditions` (optional): Trigger conditions (JSON); see the example payload after this list. Supported fields for incoming_message:
  - keywords: ["pricing","demo"] — message must contain keyword(s) (free, no LLM cost)
  - keyword_match: "any" (default, OR) or "all" (AND)
  - channel_types: ["telegram","whatsapp","livechat_voice","twilio_voice","telegram_voice","voice",...] — filter by channel. For voice, use EITHER the three per-channel keys (scoped) OR "voice" alone (wildcard matching all three) — mixing them is redundant. Per-channel keys: "livechat_voice" (web widget), "twilio_voice" (PSTN inbound), "telegram_voice" (Telegram p2p calls)
  - context_types: ["dm","group","channel","livechat"] — filter by chat type
  - group_mode: "mentions_only" or "questions" — for group chats
  - channel_account_ids: ["123"] — restrict to specific accounts
  - folder_ids: [5,10] — restrict to threads in folders
  - ai_tag_ids: [1,2] — restrict to threads with AI tags
  - ai_filter_ids: [1,2] — semantic intent filters (message matched via embedding similarity, works in noisy groups)
  - ai_filter_mode: "any" (default, OR) or "all" (AND) — how multiple AI filters combine
  - ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API). If a filter with the same name already exists, it is reused by id. Prefer referencing existing filters by id when available. Use ai_filters.create + ai_filters.test for fine-tuning before assigning.
  - contact_states: ["active"] — filter by contact state
  - cooldown_seconds: 30 — min gap between runs per thread
  - max_runs_per_thread_per_hour: 5 — rate limit

  Supported fields for job_completed (proactive callback when a delegated job finishes):
  - source_agent_id: <int> — fire only when this agent's job completed
  - source_agent_slug: <str> — alternate to source_agent_id
  - job_type: "agentic_session" — match a specific job type (default: any)
  - outcome: ["completed"] | ["escalated"] | ["completed","escalated"] — default ["completed"]
  - min_duration_seconds: <int> — skip very-short jobs (noise filter)
  - thread_filter: {thread_ids: [<int>...]} — restrict to specific threads
- `thread_ids` (optional): Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, the trigger only fires for messages in these threads. Maps to conditions.thread_filter.thread_ids.
- `trigger_type` (required): Type of trigger: 'incoming_message', 'incoming_call', 'voice_transcript', 'schedule', 'webhook', 'event', 'blockchain_event', or 'job_completed'
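
Referenced from the conditions item above: a sketch of one plausible incoming_message trigger payload, composed only from fields documented in this list. IDs and keywords are made up, and `call_tool` remains the hypothetical helper from earlier sketches.

```python
# Illustrative trigger: fire on Telegram DMs that mention pricing or demos,
# with a per-thread cooldown and an hourly rate limit.
call_tool("agents_trigger_create", {
    "agent_id": 7,
    "trigger_type": "incoming_message",
    "priority": 50,                        # lower number runs first
    "conditions": {
        "keywords": ["pricing", "demo"],
        "keyword_match": "any",
        "channel_types": ["telegram"],
        "context_types": ["dm"],
        "cooldown_seconds": 30,
        "max_runs_per_thread_per_hour": 5,
    },
})
```
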
Behavior 4/5

The description adds context beyond annotations, detailing trigger types and complex condition fields. Annotations only provide readOnlyHint=false and destructiveHint=false, so the description carries the burden and does so well.

Conciseness 3/5

The description is front-loaded with the purpose, but the conditions section is extensive and verbose. While structured, it could be more concise by referencing external documentation for detailed conditions.

Completeness 4/5

Given the complexity of triggers and nested conditions, the description provides substantial detail without needing an output schema. It explains trigger types and condition fields comprehensively.

Parameters 4/5

Schema coverage is 100%, but the description adds significant value with detailed explanations for the 'conditions' parameter, covering supported fields and their behavior. For other parameters, it repeats schema info but adds clarity.

Purpose 5/5

The description clearly states 'Create a new trigger for an AI agent' and explains trigger types and their activation conditions. It distinguishes from sibling tools like agents_trigger_update and agents_trigger_delete.

Usage Guidelines 3/5

The description lists trigger types and conditions but does not explicitly state when to use this tool vs alternatives like agents_trigger_update. No direct guidance on when not to use it.

agents_trigger_delete (A)

Delete a trigger from an AI agent.

WARNING: This cannot be undone.

Parameters (JSON Schema)

- `agent_id` (required): ID of the agent that owns this trigger
- `trigger_id` (required): ID of the trigger to delete
Behavior 4/5

The description includes a strong warning: 'WARNING: This cannot be undone,' which adds critical behavioral transparency about irreversibility beyond the annotations. However, the annotations set destructiveHint=false, which contradicts the implied destructiveness, slightly reducing clarity.

Conciseness 5/5

The description is extremely concise: two sentences, the first clearly stating the action and the second adding a crucial warning. No extraneous words or redundant information.

Completeness 4/5

For a simple delete operation, the description covers the essential purpose and the irreversible nature. It lacks details about success/failure conditions or side effects, but given the straightforward nature and full schema coverage, it is largely complete.

Parameters 3/5

Parameter schema coverage is 100%, and both parameters (agent_id, trigger_id) have clear descriptions in the schema. The tool description does not add any additional semantic information beyond what the schema already provides, so a baseline score of 3 is appropriate.

Purpose 5/5

The description clearly states the specific action: 'Delete a trigger from an AI agent.' The verb 'delete' and resource 'trigger' are unambiguous, and it distinguishes itself from sibling tools like agents_trigger_create and agents_trigger_update.

Usage Guidelines 2/5

The description provides no guidance on when to use this tool versus alternatives (e.g., other delete tools or update operations). It does not mention prerequisites, such as ownership or permissions, nor does it indicate when deletion is appropriate.

agents_trigger_update (A)

Update an existing AI agent trigger.

All parameters are optional — only provided fields will be updated.

Parameters (JSON Schema)

- `enabled` (optional): Enable or disable this trigger. OMIT to leave the enabled flag unchanged.
- `agent_id` (required): ID of the agent that owns this trigger
- `priority` (optional): Trigger priority — lower numbers run first
- `send_mode` (optional): New send mode override. OMIT to leave the send-mode unchanged.
- `conditions` (optional): New trigger conditions (replaces existing). Same fields as trigger_create: keywords, keyword_match, channel_types, context_types, group_mode, channel_account_ids, folder_ids, ai_tag_ids, ai_filter_ids, ai_filter_mode, ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API); if a filter with the same name already exists, it is reused by id — plus contact_states, cooldown_seconds, max_runs_per_thread_per_hour.
- `thread_ids` (optional): Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, merged into conditions.thread_filter.thread_ids. If conditions is also provided, thread_ids is merged into it.
- `trigger_id` (required): ID of the trigger to update
- `trigger_type` (optional): New trigger type. OMIT to keep the existing type unchanged.
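
One hedged sketch of the thread_ids merge described above; IDs are made up and `call_tool` is the hypothetical helper used throughout.

```python
# Narrow an existing trigger to two threads. Per the thread_ids note, the
# values are merged into conditions.thread_filter.thread_ids server-side;
# all omitted fields stay unchanged.
call_tool("agents_trigger_update", {
    "agent_id": 7,
    "trigger_id": 42,
    "thread_ids": [101, 102],
})
```
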
Behavior 3/5

Annotations indicate readOnlyHint=false, destructiveHint=false, consistent with an update operation. The description adds that parameters are optional, providing some behavioral context, but does not disclose side effects, authorization needs, or other traits beyond what annotations already convey.

Conciseness 5/5

The description is extremely concise with two short sentences. The first sentence states the purpose, and the second provides a critical usage nuance. Every word is purposeful, with no redundancy.

Completeness 4/5

Given the high schema coverage (100%) and presence of annotations, the description is largely sufficient. It covers the partial update behavior, which is key. However, it could briefly mention that the 'conditions' parameter replaces existing conditions (though detailed in schema) for completeness.

Parameters 3/5

The input schema has 100% description coverage, so the baseline is 3. The description adds that all parameters are optional, which is not explicitly stated in each schema description but is implied. It does not add significant new meaning beyond the schema.

Purpose 5/5

The description clearly states the tool updates an existing AI agent trigger, which is a specific verb-resource combination. It distinguishes itself from sibling tools like agents_trigger_create (create new trigger) and agents_trigger_delete (delete trigger) by using 'update' and 'existing'.

Usage Guidelines 3/5

The description mentions that all parameters are optional and only provided fields are updated, giving useful context for partial updates. However, it does not explicitly state when to use this tool versus alternatives (create, delete) or provide any exclusions.

agents_update (A)

Update an existing AI agent's configuration.

All parameters are optional — only provided fields will be updated.

Use this to:

  • Enable or disable an agent

  • Change agent name or description

  • Assign or detach a prompt

  • Change default send mode

  • Replace knowledge collections

  • Update agent status

  • Change agent priority for trigger matching (lower number = higher priority)

  • Override which tools the agent can/can't call on triggered runs

  • Override which context sections (situation, communication style, job state, conversation history, thread summary) the agent receives

  • Opt into boilerplate prompt sections (safety guidelines, data confidentiality, factual accuracy) — all default OFF

Parameters (JSON Schema)

- `name` (optional): New name for the agent
- `model` (optional): Canonical source for which LLM the agent runs on. To switch models pass JUST this — do NOT also rewrite prompt_text (any 'duty model' section in the prompt is stale doc, not the config). OMIT to leave the model unchanged.
- `status` (optional): Agent status: 'active', 'paused', or 'archived'. OMIT to leave the status unchanged.
- `enabled` (optional): Enable or disable the agent. OMIT to leave the enabled flag unchanged.
- `agent_id` (required): ID of the agent to update
- `priority` (optional): Agent priority for trigger matching. LOWER number = HIGHER priority (wins tiebreaks). Typical range 1-100. Fallback auto-reply agents use 10; specialised/topical agents use 100. When two agents match the same incoming message, the one with the lower priority number fires.
- `prompt_id` (optional): Prompt ID to assign (null to detach)
- `send_mode` (optional): Default send mode: 'auto' or 'draft'. OMIT to leave the send-mode unchanged.
- `fast_model` (optional): Model for the fast-path responder (voice, text auto-reply, agent executor). Defaults to claude-haiku-4-5-20251001 when unset. Non-Anthropic models (deepseek-chat, gpt-4.1-nano, kimi-k2.6) do NOT use BYOK today — they use the system API key + credits. Pass null to revert to default.
- `api_surface` (optional): OpenAI HTTPS endpoint for this agent's LLM calls (Phase 3a). 'chat_completions' (default, also when null) routes to /v1/chat/completions. 'responses' routes to /v1/responses — required for OpenAI native server tools (web_search, code_interpreter, image_generation, input_file PDFs). Capability still wins: agents whose tool list triggers the server_tool_responses_api substitution always route to Responses regardless of this setting. Ignored on non-OpenAI models (Anthropic, DeepSeek, Moonshot). OMIT to leave the api_surface unchanged.
- `description` (optional): New description for the agent
- `prompt_text` (optional): DESTRUCTIVE — REPLACES the entire system prompt. Pass ONLY when the user explicitly asks to edit/rewrite the prompt. To READ the prompt use prompts.get. When updating other fields (model, name, …) OMIT this. To append, prompts.get first then concatenate. Pass null to revert to the linked template.
- `voice_tools` (optional): Allow-list of tool IDs usable in voice mode (e.g. ['calls.end']). Empty list [] = explicit no-tools allow-list. Omit leaves unchanged. MCP cannot null-clear — use REST to revert to inherit from agent allowed_tools.
- `denied_tools` (optional): Block-list of tool IDs the agent must not call on triggered runs. Applied after allowed_tools and default visibility. Empty list [] = clear the block-list.
- `allowed_tools` (optional): Explicit allow-list of tool IDs this agent can call on triggered runs (e.g. ['messages.send', 'agents.handoff']). Empty list [] = clear the allow-list and fall back to system defaults. When set, only these tools (minus denied_tools) are exposed to the agent. Does NOT affect the My AI dropdown path.
- `execution_mode` (optional): Execution mode: 'agentic', 'ai_assisted', 'rule_based', 'claude_channels', or 'voice'. OMIT to leave the execution mode unchanged.
- `vision_enabled` (optional): Per-agent opt-in for vision content. When true, the executor splices recent image attachments from the active thread into the LLM call (Phase 3a continuous vision for Meet bot screen-share, plus any future channel that uploads images). Requires the agent's model to support vision (model_has_vision check). Default false; new calls pay zero token cost until the operator opts in. OMIT to leave the vision flag unchanged.
- `voice_greeting` (optional): Opening line the agent speaks when the call connects. Pass an empty string "" to clear. Omit or null leaves unchanged.
- `voice_stt_model` (optional): Speech-to-text model: 'flux' (LLM-powered end-of-turn) or 'nova-3' (silence-based). Flux is more responsive; nova-3 is the fallback when your Deepgram plan lacks Flux. OMIT to leave the STT model unchanged.
- `voice_tts_speed` (optional): TTS playback speed multiplier (0.5-2.0, default 1.0). Yandex/OpenAI/Cartesia only — ignored for Deepgram.
- `voice_tts_voice` (optional): TTS voice id — provider-specific (e.g. 'aura-2-thalia-en' for Deepgram, 'alloy' for OpenAI, 'alena' for Yandex, Cartesia voice UUID). Pass null to revert to provider default.
- `auto_reply_rules` (optional): Plain-English rules injected into the fast model's system prompt as a `## Rules` block. No reserved keywords — the fast model reads them as guidance and decides per turn whether to reply directly or escalate to the main model for tools. Example: '- If the user greets, reply "Hi! How can I help?"\n- If the user asks what you can do, reply with a 1-sentence summary\n- If the question needs live data (prices, stock, booking), escalate'. Engagement filtering (SKIP) belongs in trigger `conditions` (keywords, ai_filters, channel_types, cooldown), NOT here — if a message should be ignored the trigger shouldn't have fired. Pass null to clear.
- `voice_max_tokens` (optional): Max TTS tokens per voice reply (40-200, default 100). Lower = snappier, higher = more detail.
- `include_job_state` (optional): Include current job state (active job context, tasks, notes) in the agent's prompt. OMIT to leave this flag unchanged.
- `include_situation` (optional): Include situation context (channel, sender info, trigger type) in the agent's prompt. OMIT to leave this flag unchanged.
- `voice_stt_keyterms` (optional): Domain-vocab bias for STT — names, product SKUs, etc. Passed verbatim as repeated `&keyterm=<w>` query params. Works on both Nova-3 and Flux. Prefer short phrases over full sentences. Empty list [] = no bias. Omit leaves unchanged.
- `voice_stt_language` (optional): STT language hint. 'multi' (default) enables code-switching; singletons like 'en', 'ru', 'es' give higher accuracy when the caller language is known. Use 'multi' for bilingual callers. OMIT to leave the STT language unchanged.
- `voice_tts_language` (optional): TTS language code, BCP-47 lite e.g. 'en', 'es', 'pt-BR' (Cartesia only, default 'en').
- `voice_tts_provider` (optional): Text-to-speech provider: 'deepgram' (default, Aura-2 EN-only), 'openai' (multilingual), 'yandex' (best Russian), or 'cartesia' (Sonic-3 ultra-low TTFB). OMIT to leave the TTS provider unchanged.
- `voice_primary_model` (optional): Primary LLM for voice turns (e.g. 'gpt-4.1-mini', 'claude-haiku-4-5-20251001'). gpt-4.1-nano is too weak for reliable turn tracking; mini is the recommended floor. Pass null to revert to default.
- `fast_prompt_override` (optional): Full fast-path prompt override. Placeholders substituted via .replace(): {message}, {history}, {rules}, {tools}, {output_contract}. agent.prompt_text is NOT injected into fast_prompt_override — include it yourself if you want it. Pass null to clear.
- `voice_filler_enabled` (optional): Emit 'thinking' filler audio while tools run so the caller hears life on the line (default true). OMIT to leave this flag unchanged.
- `voice_max_tool_calls` (optional): Max tool calls per voice turn (1-10, default 3). OMIT to leave unchanged.
- `voice_thinking_texts` (optional): Pool of phrases spoken while the agent sets up the turn before calling the LLM (e.g. ['Hmm', 'So', 'One sec']). Pre-rendered to PCM at call start; one is picked at random per turn so the agent doesn't repeat the same word. Pass [] to clear. Omit or null leaves unchanged.
- `include_learned_style` (optional): Include learned communication style (per-contact tone, dormancy state) in the agent's prompt. OMIT to leave this flag unchanged.
- `include_thread_summary` (optional): Include condensed summary of older thread messages in the agent's prompt. OMIT to leave this flag unchanged.
- `include_factual_accuracy` (optional): Inject the Factual Accuracy block (~100 tokens, generic anti-hallucination rules) into the system prompt. Default OFF — skip if you write domain-specific accuracy rules in Instructions. Agentic mode only. OMIT to leave this flag unchanged.
- `knowledge_collection_ids` (optional): Replace all knowledge collections with these IDs (empty list = clear all)
- `include_safety_guidelines` (optional): Inject the generic Safety Guidelines block (~80 tokens) into the system prompt. Default OFF — enable only if you don't already write safety rules in your Instructions. Agentic mode only. OMIT to leave this flag unchanged.
- `include_tool_call_history` (optional): Include the agent's own tool calls and results from the last 3 runs on this thread, compacted to IDs + top hits (~200-1000 tokens). Lets the agent recall file IDs, search hits, and decisions it already made across turns. Default ON. Agentic mode only. OMIT to leave this flag unchanged.
- `voice_endpointing_min_delay` (optional): Silence after end-of-utterance before agent replies (0.1-2.0s, default 0.3). Higher = fewer false interrupts; lower = snappier.
- `voice_preemptive_generation` (optional): Speculatively start the LLM on STT partials so the agent begins responding before end-of-utterance. Matches LiveKit stock template. Default true. OMIT to leave this flag unchanged.
- `include_conversation_history` (optional): Include recent messages from this thread (up to 20) in the agent's prompt. OMIT to leave this flag unchanged.
- `include_data_confidentiality` (optional): Inject the Data Confidentiality block (~250 tokens, cross-contact PII isolation + prompt-injection defense) into the system prompt. Recommended for multi-tenant workspaces. Default OFF. Agentic mode only. OMIT to leave this flag unchanged.
- `voice_interruption_min_duration` (optional): Min caller speech duration to interrupt the agent (0.1-1.5s, default 0.25). Higher = ignore short fillers like 'uh-huh'.
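
Given the model parameter's warning above, a minimal partial-update sketch (agent ID illustrative, `call_tool` hypothetical as before):

```python
# Switch only the model: per the note above, pass JUST `model` and do not
# rewrite prompt_text. All omitted fields stay unchanged.
call_tool("agents_update", {
    "agent_id": 7,
    "model": "claude-sonnet-4-6",
})
```
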
Behavior 5/5

Description goes beyond annotations (readOnlyHint=false, destructiveHint=false) by noting 'All parameters are optional' and flagging prompt_text as 'DESTRUCTIVE', plus detailed parameter-specific behavior like model switching nuances.

Conciseness 4/5

Description is front-loaded with purpose, then optionality note, then bulleted use cases. It is slightly lengthy due to complexity but well-organized; could trim some redundancy.

Completeness 4/5

For a tool with 45 parameters, description covers major use cases and relationships (e.g., model/prompt_text interplay). Missing output schema info but annotations provide no output schema, so this is acceptable.

Parameters 4/5

With 100% schema description coverage, the tool description adds value by framing optionality and grouping use cases, though it does not detail each parameter. The bullet list helps agents understand which parameters to use together.

Purpose 5/5

Description clearly states 'Update an existing AI agent's configuration' and lists specific use cases (enable/disable, change name, etc.), distinguishing it from sibling tools like agents_create and agents_get.

Usage Guidelines 4/5

Description explicitly lists 'Use this to:' with bullet points, covering key scenarios. It notes all parameters are optional but does not explicitly exclude when not to use, though context implies for updates only.

ai_filters_create (A)

Create a new AI filter for semantic intent-based message matching.

AI filters use vector embeddings (via Voyage AI) to detect whether an incoming message matches a specific intent or topic. The filter's description is embedded as a reference vector at creation time. When a message arrives, its embedding is compared against this reference using cosine similarity.

The description field is the most important part — it becomes the reference embedding that all incoming messages are compared against. Write it as a clear statement of what kind of messages should match:

  • 'Customer asking about pricing, subscription plans, or billing'

  • 'User reporting a bug, crash, or unexpected behavior in the product'

  • 'Inbound sales lead expressing interest in purchasing or trialing'

The threshold controls sensitivity: 0.5 is a balanced default, lower values (0.3) cast a wider net, higher values (0.8) require closer matches.

Note: This tool calls the Voyage AI embedding API to generate the reference vector.
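
The matching rule is plain cosine similarity against the stored reference vector. A self-contained sketch, with toy 3-d vectors standing in for real Voyage embeddings (which are much wider):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

ref_vec = [0.2, 0.6, 0.1]   # toy embedding of the filter description
msg_vec = [0.1, 0.7, 0.2]   # toy embedding of an incoming message
matched = cosine_similarity(msg_vec, ref_vec) >= 0.5   # threshold check
print(matched)
```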

Parameters (JSON Schema)

- `name` (required): Filter name — a short, human-readable label (max 100 chars)
- `threshold` (optional; default: 0.50): Cosine similarity threshold for a message to be considered a match. Range 0.1–1.0. Lower values (e.g. 0.3) are more permissive and catch more messages. Higher values (e.g. 0.8) require closer semantic similarity.
- `description` (required): Reference text that defines what messages should match this filter. This text is embedded as a vector and used for cosine similarity comparison against all incoming messages. Be specific and descriptive — the quality of this text directly determines filter accuracy. E.g. 'Customer asking about pricing, subscription costs, or billing issues'. Max 500 chars.
Behavior 4/5

The description goes beyond annotations by explaining the use of Voyage AI embedding API, vector embeddings, cosine similarity, and how the description becomes the reference vector. Annotations only indicate it's not read-only and has side effects, but the description adds rich behavioral context.

Conciseness 4/5

The description is multi-paragraph but well-structured. It starts with the purpose, then explains the mechanism, gives examples, explains threshold, and notes the external API call. Each section adds value, though slightly verbose.

Completeness 4/5

Given the complexity of AI filters and no output schema, the description covers the creation process, parameter usage, and external API call. It is sufficient for an agent to understand when and how to use the tool effectively.

Parameters 4/5

Schema coverage is 100% with good descriptions, but the tool description adds significant value: it provides examples for the description field, explains threshold sensitivity with concrete numbers, and emphasizes the critical role of the description in filter accuracy.

Purpose 5/5

The description clearly states 'Create a new AI filter for semantic intent-based message matching' with a specific verb and resource. It distinguishes from sibling tools like ai_filters_delete, ai_filters_update, etc., by focusing on creation.

Usage Guidelines 4/5

The description explicitly explains when to use this tool (to create an AI filter) and provides guidance on how to write the description and set the threshold. It does not directly mention when not to use it or alternatives, but the context makes it clear.

ai_filters_delete (A)
Destructive · Idempotent

Permanently delete an AI filter.

When to use:

  • User wants to remove a filter they no longer need

This action cannot be undone. Any triggers that reference this filter by ID will no longer match it — review and update those triggers after deletion.

Parameters (JSON Schema)

- `filter_id` (required): ID of the filter to delete
Behavior 5/5

Discloses permanence ('cannot be undone') and side effect on triggers ('triggers that reference this filter by ID will no longer match it'). The annotations (destructiveHint=true) align and are supplemented by detailed behavioral context.

Conciseness 5/5

Extremely concise: three short paragraphs with no wasted words. The action is front-loaded, and each sentence adds value.

Completeness 5/5

For a simple delete operation with one parameter and no output schema, the description fully covers purpose, usage, irreversibility, and downstream effects. No gaps remain.

Parameters 3/5

Schema coverage is 100% with one parameter filter_id described as 'ID of the filter to delete.' The description adds no further semantics beyond the schema, meeting the baseline for high coverage.

Purpose 5/5

The description clearly states 'Permanently delete an AI filter,' specifying the action and resource. It distinguishes from sibling tools like ai_filters_create and ai_filters_list.

Usage Guidelines 4/5

Provides a clear 'When to use' section: 'User wants to remove a filter they no longer need.' Lacks explicit when-not-to-use or alternatives, but the context is sufficient.

ai_filters_list (A)
Read-only · Idempotent

List all AI filters for the current workspace.

AI filters are semantic intent-based message filters that use embeddings (vector representations) to detect whether an incoming message matches a specific intent or topic. Unlike keyword filters, they understand meaning: 'I need help with my order' and 'my package hasn't arrived' both match a 'shipping support' filter even without shared keywords.

Each filter stores a reference embedding of its description. When a message arrives, its embedding is compared via cosine similarity against the filter's reference vector. If the similarity exceeds the threshold, the filter matches.

When to use:

  • Check which semantic filters already exist before creating a new one

  • Get filter IDs for use in trigger conditions

  • Review thresholds and active status of existing filters

Returns all filters with id, name, description, threshold, and is_active.

Parameters (JSON Schema)

No parameters

Behavior 5/5

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable context: it explains that AI filters are semantic intent-based using embeddings and cosine similarity, and that the operation returns all filters. This goes beyond the annotations and provides full transparency.

Conciseness 5/5

The description is well-structured: a purpose statement, an explanation of AI filters, usage scenarios, and return details. Every sentence adds value, and it is appropriately concise.

Completeness 5/5

Given no parameters and no output schema, the description provides complete information: what it does, what it returns (fields), and contextual understanding of AI filters. It is fully sufficient for an agent to use the tool correctly.

Parameters 4/5

Input schema has zero parameters (100% coverage), so baseline is 4. The description adds meaning by explaining the nature of AI filters and the output fields, which is sufficient.

Purpose 5/5

The description clearly states 'List all AI filters for the current workspace' and specifies the returned fields (id, name, description, threshold, is_active). This is specific and distinguishes it from sibling tools like ai_filters_create, ai_filters_delete, etc.

Usage Guidelines 4/5

The description includes a 'When to use' section with explicit scenarios: check existing filters before creating, get filter IDs for triggers, review thresholds and active status. While it provides clear usage context, it does not explicitly mention when not to use or compare with alternatives.

ai_filters_test (A)
Read-only · Idempotent

Test a message against an AI filter to check whether it would match.

This tool embeds the provided message using Voyage AI and computes the cosine similarity between the message vector and the filter's stored reference vector. It returns the similarity score, whether the message would match (similarity >= threshold), and the filter's threshold value.

Use this to:

  • Verify a filter works as intended before using it in a trigger

  • Tune the threshold by testing borderline messages

  • Debug why a message did or did not match a filter in production

Returns: {similarity: float, matched: bool, threshold: float}

Note: This tool calls the Voyage AI embedding API to embed the test message.
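
A tuning sketch built on the documented return shape {similarity, matched, threshold}; the filter ID and probe messages are made up, and `call_tool` is the hypothetical helper used in earlier sketches.

```python
# Probe a pricing filter with one expected hit and one expected miss, then
# read the documented similarity/matched/threshold fields to judge fit.
probes = [
    "how much does the pro plan cost?",   # expected to match a pricing filter
    "my package hasn't arrived yet",      # expected to miss it
]
for text in probes:
    r = call_tool("ai_filters_test", {"filter_id": 12, "message": text})
    print(f"{r['similarity']:.2f}  matched={r['matched']}  threshold={r['threshold']}")
```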

Parameters (JSON Schema)

- `message` (required): The message text to test. This is embedded and compared against the filter's reference vector via cosine similarity.
- `filter_id` (required): ID of the filter to test against
Behavior 5/5

The description goes beyond annotations by revealing that the tool calls the Voyage AI embedding API and computes cosine similarity. It confirms the tool is idempotent and safe, consistent with annotations, and provides detailed behavioral context.

Conciseness 5/5

The description is efficient and well-structured: purpose, mechanism, use cases, output format, and a note on API call. No redundant sentences, and every part earns its place.

Completeness 5/5

Despite lacking an output schema, the description explicitly states the return format {similarity, matched, threshold}. It covers behavioral traits, usage guidelines, and parameter roles, making it fully complete for a test tool.

Parameters 3/5

The input schema has 100% coverage with descriptions for both parameters. The description adds marginal value by explaining the role of message in embedding and threshold comparison, but the schema already handles the semantics adequately.

Purpose 5/5

The description clearly states the tool tests a message against an AI filter to check for matches, explaining the embedding and cosine similarity computation. It distinguishes itself from sibling tools like ai_filters_create, ai_filters_delete, etc.

Usage Guidelines 4/5

The description explicitly lists three use cases: verify filter before use, tune threshold, debug production issues. It provides clear context for when to use the tool, though it doesn't explicitly state when not to use it or mention alternatives.

ai_filters_updateAInspect

Update an existing AI filter's name, description, threshold, or active state.

When to use:

  • User wants to rename a filter

  • User wants to refine the filter description to improve match accuracy

  • User wants to adjust the similarity threshold (higher = stricter matching)

  • User wants to enable or disable a filter without deleting it

Provide only the fields you want to change. At least one field is required.

Note: If the description is changed, this tool calls the Voyage AI embedding API to re-generate the reference vector with the new description text.
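
A minimal sketch of a partial update; `call_tool` is a hypothetical stand-in for a real MCP tools/call round-trip, and the filter ID is illustrative:

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Send only the fields being changed. Updating `description` would
# trigger a Voyage AI re-embedding server-side; this call does not.
call_tool("ai_filters_update", {
    "filter_id": 42,      # illustrative ID
    "threshold": 0.8,     # stricter matching; valid range 0.1-1.0
    "is_active": True,
})
```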

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoNew filter name (max 100 chars, optional)
filter_idYesID of the filter to update
is_activeNoEnable (true) or disable (false) the filter. OMIT to leave the active flag unchanged.
thresholdNoNew cosine similarity threshold. Range 0.1–1.0. Optional.
descriptionNoNew reference description text. If changed, the Voyage AI embedding API is called to re-generate the reference vector. Max 500 chars. Optional.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false, but the description adds critical behavior: calling the Voyage AI embedding API to re-generate the reference vector when description changes. This goes beyond annotations and helps the agent anticipate side effects. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three paragraphs: main action, when-to-use list, and a note about re-embedding. It front-loads the purpose and avoids unnecessary detail. Could be slightly more compact, but every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 5 parameters, the description covers the key aspects: what fields can be updated, the effect of changing description (API call), and the threshold range implied by min/max in schema. It provides enough context for an agent to invoke correctly, though it lacks clarification on whether partial updates are always allowed when some fields are omitted.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so baseline is 3. The description adds value by explaining threshold meaning ('higher = stricter matching') and clarifying that changing description triggers an embedding API call. It also reinforces that at least one field must be changed (though schema only marks filter_id as required). This enriches parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates an AI filter's name, description, threshold, or active state. It distinguishes itself from sibling tools like ai_filters_create and ai_filters_delete by focusing on modification. The verb 'Update' and resource 'AI filter' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use the tool (rename, refine description, adjust threshold, enable/disable). It also advises to provide only fields to change and that at least one field is required. However, it does not explicitly mention when not to use it or compare to alternatives like ai_filters_test, but the provided guidance is clear and helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_add_to_threadAInspect

Apply one or more AI tags to a thread (manually).

When to use:

  • User wants to label a conversation with one or more tags

  • User asks to categorize or tag a thread

Provide the thread_id (integer) and an array of tag_ids to apply. If a tag is already applied, it will be updated to is_manual=true.
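
A sketch of a call, again with a hypothetical `call_tool` helper and illustrative IDs:

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Apply two tags to one thread. Tags already on the thread are
# flipped to is_manual=true rather than duplicated.
call_tool("ai_tags_add_to_thread", {
    "thread_id": 1017,
    "tag_ids": [3, 8],  # 1-20 IDs per call
})
```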

ParametersJSON Schema
NameRequiredDescriptionDefault
tag_idsYesArray of tag IDs to apply (1–20 IDs)
thread_idYesID of the thread to tag
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutation (readOnlyHint=false). Description adds key detail: 'If a tag is already applied it will be updated to is_manual=true', which is not obvious from annotations. This enriches behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: one-line title, brief 'When to use' section, parameter instruction, and a single behavioral note. Every sentence adds value; no superfluous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 2-parameter mutation tool with no output schema, the description covers purpose, usage, parameter constraints, and idempotency behavior. Minor omission: no mention of prerequisite (e.g., tags must exist), but overall sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters are covered 100% in the input schema. The description restates them without adding new semantic meaning beyond the schema descriptions. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'apply', the resource 'AI tags to a thread', and adds 'manually' to distinguish from automatic tagging. The purpose is specific and distinct from siblings like 'ai_tags_remove_from_thread'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' scenarios: user wants to label a conversation or categorize/tag a thread. Lacks direct exclusions or alternatives, but the context is clear and useful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_createAInspect

Create a new AI tag (automatic message filter).

AI tags are lightweight classifiers that run on every incoming message. When a message matches the tag's description/criteria, the thread is automatically labelled — so AI agents can cheaply pre-filter threads instead of running full LLM analysis on everything. Good descriptions are the key: they tell the classifier exactly when to apply this tag.

When to use:

  • User wants to auto-classify incoming messages (e.g. bug reports, sales leads, support requests)

  • User wants to reduce AI agent costs by pre-filtering threads by topic or intent

Tips for the description field:

  • Be specific: 'Messages reporting errors, crashes, or unexpected behavior in the product'

  • Include examples of what qualifies and what doesn't

Limit: 20 active personal tags / 50 active team tags.
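
A sketch of a create call whose description follows the tips above (`call_tool` remains a hypothetical helper):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# The description doubles as the classifier prompt, so it states both
# what qualifies and what does not.
call_tool("ai_tags_create", {
    "name": "Bug report",
    "icon": "🐞",
    "color": "red",
    "description": ("Messages reporting errors, crashes, or unexpected "
                    "behavior in the product. Not feature requests or "
                    "general how-to questions."),
})
```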

ParametersJSON Schema
NameRequiredDescriptionDefault
iconNoEmoji icon for the tag (max 10 chars, optional)
nameYesTag name (max 100 chars)
colorNoTailwind color key for the tag badge. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to use the default color.
descriptionNoClassifier prompt: describe exactly when this tag should be applied to a thread. The more specific, the better the auto-classification accuracy. E.g. 'Messages reporting software errors, crashes, or unexpected behavior'. Max 500 chars.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate this is not read-only or destructive; description adds that tags run on every incoming message and auto-label threads when criteria match. It also mentions limits (20/50 active tags), providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a summary, conceptual explanation, usage guidance, tips, and limits. It is front-loaded with the core purpose, and each section adds value. While slightly long, it remains efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity and zero output schema, the description covers purpose, usage, key behavioral details, and parameter guidance. It provides enough context for an agent to decide when to use it and how to fill in the description field correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining that the 'description' parameter is a classifier prompt, with tips for specificity and examples. This enhances understanding beyond the schema's brief descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a clear action verb and resource ('Create a new AI tag'), immediately distinguishing it as a creation tool. It explains the concept of AI tags as lightweight classifiers that auto-label threads, distinguishing it from sibling tools like ai_filters_create which may have different behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use:' section with two explicit scenarios: auto-classifying incoming messages and reducing costs by pre-filtering. While it does not explicitly state when not to use it compared to alternatives, the context is very clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_deleteA
DestructiveIdempotent
Inspect

Delete a personal AI tag. All thread associations are removed automatically.

When to use:

  • User wants to permanently remove a tag they no longer need

This cannot be undone. Threads are NOT deleted — they just lose this tag.

ParametersJSON Schema
NameRequiredDescriptionDefault
tag_idYesID of the tag to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint: true, so the description adds value by explaining that thread associations are removed automatically and threads are not deleted. It also states the action cannot be undone. This provides behavioral context beyond what annotations offer, without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the main action, followed by a clear usage guideline and consequence statements. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple nature of the tool (one parameter, no output schema) and high schema coverage, the description fully explains the operation, when to use it, and its effects. It covers all relevant aspects for an AI agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the parameter tag_id is well-described in the schema. The description does not add extra semantics beyond what the schema already provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it deletes a personal AI tag and automatically removes thread associations. It uses the specific verb 'delete' and resource 'personal AI tag'. It distinguishes from siblings like ai_tags_remove_from_thread which only removes tag from a thread without deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a 'When to use' section specifying the scenario for permanent removal. It clarifies threads are not deleted and the action is irreversible. However, it could explicitly mention alternatives like ai_tags_remove_from_thread for when the goal is to just detach the tag.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_listA
Read-onlyIdempotent
Inspect

List all personal AI tags.

AI tags are automatic message filters: the system runs a lightweight classifier on every incoming message and applies matching tags to threads. This lets AI agents skip expensive full analysis on most messages — they only act on threads that match relevant tags, dramatically cutting LLM costs.

When to use:

  • Check which auto-classification filters exist before creating one

  • Get tag IDs for add_to_thread / remove_from_thread

  • See how many threads each tag currently matches

Returns all tags with thread counts (non-archived, included threads only).
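
A sketch of the chain this enables: list tags to find an ID, then apply it. The result shape (a `tags` list with `id`/`name` fields) is an assumption; the description promises only tags with thread counts.

```python
def call_tool(name: str, arguments: dict) -> dict:
    """Hypothetical MCP client helper; returns canned data so the sketch runs."""
    return {"tags": [{"id": 3, "name": "Bug report", "thread_count": 12}]}

tags = call_tool("ai_tags_list", {})["tags"]
bug_tag = next(t for t in tags if t["name"] == "Bug report")
call_tool("ai_tags_add_to_thread", {"thread_id": 1017,
                                    "tag_ids": [bug_tag["id"]]})
```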

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. The description adds valuable context: returns all tags with thread counts for non-archived included threads only, and explains the underlying auto-classification mechanism. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized: a one-line summary, a paragraph explaining AI tags, a bulleted 'When to use' section, and a note on return format. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no parameters and no output schema. The description fully covers what the tool does, when to use it, and what data it returns (tags with thread counts, non-archived, included threads only). No gaps remain for an agent to understand its behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist in the schema (schema coverage 100%). The description explains what the tool returns (tags with thread counts), which is the relevant semantic information for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'List all personal AI tags' with precise verb and resource, and includes a clear explanation of what AI tags are. Siblings like create/delete/update/add/remove are clearly distinct, making purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage scenarios: checking existing filters before creating, getting IDs for add/remove, and viewing thread counts. This guides the agent on when to use this tool versus related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_remove_from_threadA
DestructiveIdempotent
Inspect

Remove a specific AI tag from a thread.

When to use:

  • User wants to un-label or remove a specific tag from a conversation

  • User wants to correct an incorrectly applied tag

Provide both thread_id and tag_id.

ParametersJSON Schema
NameRequiredDescriptionDefault
tag_idYesID of the tag to remove
thread_idYesID of the thread to remove the tag from
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states the removal action, which aligns with annotations (destructiveHint: true). However, it does not add behavioral context beyond what annotations already provide (e.g., no mention of side effects or permissions).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence plus a bulleted list. No unnecessary words, and the key information is front-loaded. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has two required parameters and no output schema. The description covers purpose and usage context. It is complete enough for a simple removal operation, though it could mention the lack of return value or error cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions already present. The description adds no additional meaning beyond instructing to provide both IDs. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Remove') and resource ('a specific AI tag from a thread'). However, it does not explicitly distinguish from the sibling tool 'ai_tags_add_to_thread', though the name itself provides differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' examples (un-labeling, correcting tags) and instructs to provide both thread_id and tag_id. It does not mention when not to use or alternatives, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_updateAInspect

Update an existing personal AI tag's name, description, icon, color, or active state.

When to use:

  • User wants to rename a tag

  • User wants to change a tag's icon, color, or description

  • User wants to enable or disable a tag

Provide only the fields you want to change. At least one field is required.

ParametersJSON Schema
NameRequiredDescriptionDefault
iconNoNew emoji icon (max 10 chars, optional)
nameNoNew tag name (max 100 chars, optional)
colorNoNew color key. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to leave the color unchanged.
tag_idYesID of the tag to update
is_activeNoEnable (true) or disable (false) the tag. OMIT to leave the active flag unchanged.
descriptionNoNew LLM hint (max 500 chars; empty string clears it, optional)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=false and destructiveHint=false. The description adds context: updates apply only to existing tags, and at least one field is required. No contradictions. It does not detail side effects, but the mutation is clear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences plus a bullet list. It front-loads the purpose, lists when-to-use scenarios, and closes with a usage note. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters (1 required), no output schema, and clear sibling tools, the description covers the essential usage. It lacks details on the return value, but the update context is sufficiently explained for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description adds value by stating 'Provide only the fields you want to change' and 'At least one field is required,' which guides partial updates. Baseline for high coverage is 3, but the added guidance lifts it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Update an existing personal AI tag's name, description, icon, color, or active state.' This clearly identifies the verb (update) and resource (personal AI tag), and distinguishes it from sibling tools like ai_tags_create or ai_tags_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' bullet points (rename, change icon/color/description, enable/disable). It notes partial updates are allowed. It does not explicitly exclude cases or name alternatives, but the sibling tool names imply creation/deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_attach_identityA
Read-onlyIdempotent
Inspect

Switch the page's identity by loading saved cookies + storage. Use only when switching identity mid-page; for first navigation, pass identity_name to browser.open instead.
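
A sketch of the two paths; `call_tool` is a hypothetical helper, and the `url` parameter to browser.open plus the page ID value are assumptions:

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# First navigation: bake the identity in at open time, as advised.
call_tool("browser.open", {"url": "https://example.com",    # url param assumed
                           "identity_name": "work-account"})

# Mid-page switch only: load another identity's cookies + storage
# into a page that is already open.
call_tool("browser_attach_identity", {"page_id": "p1",      # from browser.open
                                      "identity_name": "personal-account"})
```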

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
identity_nameYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations by explaining that it loads saved cookies and storage. However, the readOnlyHint annotation arguably contradicts the act of modifying the page's identity, though the conflict is minor. The description is clear about the non-destructive nature of loading saved state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and followed by usage guideline. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with 2 required params and no output schema. The description covers purpose and usage constraints well, but lacks details on return behavior or potential side effects beyond loading cookies. Still, it is largely complete for its context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0% description coverage, and the tool description does not explain the parameters (page_id, identity_name). identity_name is mentioned in the usage guidance, but no detail is given on its format or purpose. The description fails to compensate for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Switch the page's identity') and the mechanism ('loading saved cookies + storage'). It distinguishes from the sibling tool 'browser_open' by specifying when to use which, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance: 'Use only when switching identity mid-page; for first navigation, pass identity_name to browser.open instead.' This clearly states when to use and when not, with a direct pointer to an alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_clickC
Read-onlyIdempotent
Inspect

Click an element. ref is a CSS selector (e.g., 'button.submit').

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description says 'Click an element' implying a mutating action, but annotations set readOnlyHint=true, which contradicts the tool's purpose. No additional behavioral context is provided beyond this contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with one sentence and an example, containing no unnecessary words. It is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite a simple tool and existing annotations, the description is incomplete: it omits return behavior and side effects (e.g., triggered navigation) and fails to resolve the annotation contradiction, compromising first-attempt use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, placing high burden on description. The description only explains `ref` as a CSS selector, leaving `page_id` undocumented and adding marginal value for one of two parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Click an element' and specifies that `ref` is a CSS selector, which is a specific verb-resource combination. It effectively distinguishes from sibling browser tools like browser_fill or browser_hover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., browser_fill, browser_hover). The description lacks context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_closeA
Read-onlyIdempotent
Inspect

Close a page opened by browser.open.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true, covering safety and idempotency. The description adds that the tool only applies to pages opened by browser.open, but does not elaborate on side effects or error behavior, making additional transparency limited.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that conveys the essential purpose without wasted words. Every word is necessary to specify both the action and the constraint on the page's origin.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, annotations covering behavior), the description is largely adequate. It explains what the tool does and when it's applicable. However, it could briefly mention that closing an invalid page may produce an error or no effect, but completeness is high for this simple action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage for the sole parameter 'page_id'. The description does not explain what page_id represents or how to obtain it (e.g., from the browser.open return value). With no parameter info, the description adds little over the schema's type and required flag, leaving the agent to infer from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool closes a page, specifying it must be one opened by browser.open. This verb+resource combination is specific and distinguishes it from sibling browser tools like browser_navigate_back or browser_tabs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage after browser.open, but provides no explicit guidance on when not to use it (e.g., for pages not opened by browser.open) or alternatives. It lacks exclusions or context about prerequisites like verifying the page exists.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_console_messagesA
Read-onlyIdempotent
Inspect

Return console.log/warn/error events captured since the last drain. Filter by level ('log'|'info'|'warning'|'error'|'debug') and/or pattern (regex). Buffer caps at 500 entries; oldest are dropped first. Set clear=false to peek without draining.
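
A sketch of peeking versus draining (`call_tool` hypothetical, page ID assumed):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Peek at recent errors without emptying the buffer...
call_tool("browser_console_messages", {
    "page_id": "p1",
    "level": "error",
    "pattern": r"Timeout|ECONN",  # regex filter
    "clear": False,               # peek only; entries stay buffered
})

# ...then drain everything once debugging is done (clear presumably
# defaults to draining, per "since the last drain").
call_tool("browser_console_messages", {"page_id": "p1"})
```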

ParametersJSON Schema
NameRequiredDescriptionDefault
clearNo
levelNo
page_idYes
patternNo
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds valuable context: buffer cap of 500 entries, oldest dropped first, and the effect of the clear parameter. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences. The first sentence states the purpose, the second lists filters, and the third adds behavioral details (buffer limit, clear peeking). No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, filtering, and buffer behavior. It does not specify the return format (e.g., structure of events), but for a simple read-only logging tool this is acceptable. Annotations and description together provide sufficient context for correct use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description explains three of four parameters: level (with example values), pattern (regex), and clear (peek vs drain). The page_id parameter is not explicitly described but its purpose is implied from the tool name and context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns console messages (log/warn/error) and mentions filtering. It distinguishes from sibling browser tools like browser_click or browser_snapshot by specifying the console event resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: it returns events since last drain, and advises using clear=false to peek without draining. However, it does not explicitly compare to alternatives, though none exist among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_dragA
Read-onlyIdempotent
Inspect

Drag one element onto another. source_ref is the element to grab; target_ref is where to drop. Both are CSS selectors. Used for slider captchas, kanban, drag-and-drop uploads.
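
A sketch of a kanban-style drag (`call_tool` hypothetical, selectors illustrative):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Drop a card onto the "Done" column; both refs are CSS selectors.
call_tool("browser_drag", {
    "page_id": "p1",
    "source_ref": "#card-42",
    "target_ref": "#column-done",
})
```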

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
source_refYes
target_refYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims a write operation ('Drag one element onto another'), but annotations declare readOnlyHint: true, a direct contradiction. It also omits behavioral details such as side effects or requirements beyond parameter types.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no extraneous words, front-loaded with action and immediately clarifies parameter purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explanation of return behavior since no output schema is provided. Does not cover page_id or potential failure modes. Adequate for a simple drag operation but incomplete given the annotation contradiction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description explains source_ref and target_ref as CSS selectors, adding context beyond the schema's property names. However, it does not explain the page_id parameter, which is required, and schema coverage is 0%.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action 'Drag one element onto another', explains the two CSS selector parameters, and gives explicit use cases like slider captchas, kanban, and drag-and-drop uploads. This distinguishes it from sibling tools like browser_click or browser_hover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear examples of when to use (slider captchas, kanban, drag-and-drop uploads), but does not explicitly mention when not to use or list alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_evaluateA
Read-onlyIdempotent
Inspect

Run JavaScript in the page context and return the result. Use for state not in the a11y tree, captcha iframe inspection, DOM events. Expression can be a value (e.g., 'document.title') or an arrow function ((arg) => ...) — pass arg via the arg parameter. Result is JSON-serialized; non-serializable values become strings. 256KB cap on output.
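
A sketch of both expression forms (`call_tool` hypothetical):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Plain value expression:
call_tool("browser_evaluate", {"page_id": "p1",
                               "expression": "document.title"})

# Arrow function; `arg` is passed as its single argument:
call_tool("browser_evaluate", {
    "page_id": "p1",
    "expression": "(sel) => document.querySelector(sel)?.textContent",
    "arg": "h1",
})
```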

ParametersJSON Schema
NameRequiredDescriptionDefault
argNo
page_idYes
expressionYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds beyond that: expression can be a value or arrow function, result is JSON-serialized with non-serializable becoming strings, and a 256KB output cap. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences that are well-structured: first sentence states purpose and use cases, second adds syntax details and constraints. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters and no output schema, the description covers syntax, serialization behavior, output size limit, and use cases. It provides sufficient context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% (no descriptions in schema). The description explains that 'expression' can be a value or arrow function, and the 'arg' parameter is used to pass arguments to the arrow function. This adds meaning beyond the bare schema types.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Run JavaScript in the page context and return the result.' It specifies use cases (state not in a11y tree, captcha iframe inspection, DOM events), distinguishing it from sibling browser tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides specific scenarios for use ('state not in the a11y tree, captcha iframe inspection, DOM events'), implying when to use. It does not explicitly mention when not to use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_file_uploadA
Read-onlyIdempotent
Inspect

Attach files to an <input type=file> element. Pass either local_paths (absolute host paths) or data (a list of {name, mime, base64} blobs written to /tmp). 25MB cap per file.
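
A sketch of both input modes (`call_tool` hypothetical, paths and IDs illustrative):

```python
import base64

def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Mode 1: absolute paths on the host.
call_tool("browser_file_upload", {
    "page_id": "p1",
    "ref": "input[type=file]",
    "local_paths": ["/home/user/report.pdf"],
})

# Mode 2: inline blobs, written to /tmp server-side (25MB cap per file).
call_tool("browser_file_upload", {
    "page_id": "p1",
    "ref": "input[type=file]",
    "data": [{"name": "note.txt", "mime": "text/plain",
              "base64": base64.b64encode(b"hello").decode()}],
})
```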

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
dataNo
page_idYes
local_pathsNo
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states 'Attach files', implying a mutation, but annotations declare `readOnlyHint=true`. This is an annotation contradiction per the rubric, requiring a score of 1.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no extraneous information. The main action is front-loaded. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the two input modes, temp file handling, and size limit. Missing details on the required `page_id` and `ref` parameters, but otherwise fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It explains `local_paths` and `data` but omits `page_id` and `ref`, which are required. Partial compensation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool attaches files to an <input type=file> and distinguishes the two methods of providing file content. This is specific and separates it from other browser interaction tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use `local_paths` vs `data` but does not explicitly contrast with other browser tools like `browser_fill` or `browser_click`. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fillC
Read-onlyIdempotent
Inspect

Fill an input or textarea with the given value. ref is a CSS selector (e.g., 'input[name=email]').

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
valueYes
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims the tool modifies the page by filling a field, but annotations set readOnlyHint=true, indicating no modifications. This is a contradiction. The description does not disclose any behavioral traits beyond the action itself.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that gets to the point, but it omits important details, making it adequate rather than well-rounded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 required parameters and no output schema, the description is incomplete. It misses details about the 'page_id' parameter and the effect of filling (e.g., whether it triggers events).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains that 'ref' is a CSS selector, adding value beyond the schema. However, it does not explain 'page_id' or 'value' beyond 'given value'. With 0% schema description coverage, the description should compensate more.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Fill') and the target resource ('input or textarea'), and distinguishes from sibling tools like browser_fill_form and browser_type by specifying a single element via CSS selector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (filling a single input/textarea) but does not explicitly state when not to use or compare with alternatives like browser_type or browser_fill_form.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fill_formA
Read-onlyIdempotent
Inspect

Fill multiple form fields in one call. fields is a list of {ref, value} dicts. ref is a CSS selector; value is a string (text) or boolean (checkbox). Saves N round-trips vs calling browser.fill repeatedly.
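
A sketch of a batched fill (`call_tool` hypothetical):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# One round-trip instead of three browser.fill calls; booleans drive checkboxes.
call_tool("browser_fill_form", {
    "page_id": "p1",
    "fields": [
        {"ref": "input[name=email]", "value": "user@example.com"},
        {"ref": "textarea[name=message]", "value": "Hello!"},
        {"ref": "input[name=subscribe]", "value": True},  # checkbox
    ],
})
```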

ParametersJSON Schema
NameRequiredDescriptionDefault
fieldsYes
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description says 'Fill multiple form fields' which is a write operation, but the annotation readOnlyHint=true indicates the tool is read-only. This is a serious contradiction and fails to disclose the behavioral conflict. No additional behavioral details are provided beyond the contradictory annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loads the purpose, and includes essential parameter and benefit information without any unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers input and benefit but does not mention expected return values or error handling. Without an output schema, this omission is notable, though the tool's simplicity partially mitigates it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the structure of the `fields` parameter in detail, specifying that each item is a dict with `ref` (CSS selector) and `value` (string or boolean). This adds significant meaning beyond the schema, which has no descriptions and 0% coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fills multiple form fields in one call, using a specific verb and resource. It distinguishes itself from the sibling browser_fill by highlighting the round-trip savings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage context by comparing to browser.fill, but does not explicitly state when not to use it or list other alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_handle_dialogA
Read-onlyIdempotent
Inspect

Respond to a pending JS dialog (alert/confirm/prompt). Pass accept=true for OK or false for Cancel. For prompt() dialogs also pass prompt_text. Dialogs are queued at page-open time; returns {pending: false} if none is waiting.
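
A sketch of handling a confirm() and a prompt() (`call_tool` hypothetical):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Accept a pending confirm() with OK:
call_tool("browser_handle_dialog", {"page_id": "p1", "accept": True})

# Answer a prompt() dialog with text:
call_tool("browser_handle_dialog", {"page_id": "p1",
                                    "accept": True,
                                    "prompt_text": "Jane Doe"})
# Returns {pending: false} if no dialog is queued.
```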

ParametersJSON Schema
NameRequiredDescriptionDefault
acceptYes
page_idYes
prompt_textNo
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations, such as dialog queuing and the return value when no dialog is waiting. However, it contradicts the readOnlyHint annotation, as responding to a dialog is a mutation. This contradiction reduces transparency. The description does not disclose potential side effects like page state changes after dismissing the dialog.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each serving a clear purpose: defining the tool, explaining core parameters, and adding behavioral notes. There is no redundancy or filler, making it highly efficient for the agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 parameters, no output schema), the description covers the main behavior and return value. It mentions the queuing mechanism and the case of no waiting dialog. However, it could be improved by mentioning error states or prerequisites (e.g., ensuring a dialog is actually pending). Overall, it is near complete for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, so the description must compensate. It explains two parameters (accept and prompt_text) well, providing clear usage. However, it fails to explain the required page_id parameter, which is crucial for specifying which page's dialog to handle. This omission significantly reduces parameter clarity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: responding to pending JS dialogs. It specifies the dialog types (alert/confirm/prompt), which distinguishes it from other browser tools like clicking or navigating. The verb 'respond' and resource 'pending JS dialog' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives usage hints like 'Pass accept=true for OK or false for Cancel' and explains the prompt_text parameter, but it does not explicitly state when to use this tool versus alternatives (e.g., when a dialog appears). The mention that dialogs are queued at page-open time provides some context, but there is no guidance on when not to use it or how to detect a pending dialog.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_hoverA
Read-onlyIdempotent
Inspect

Hover the mouse over an element (reveals tooltips + hover menus). ref is a CSS selector.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
page_idYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. The description adds that hovering reveals UI elements, which is behavioral context beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, concise and front-loaded. Every word adds value without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the purpose and one parameter, but omits explanation of 'page_id'. For a simple tool, this is adequate but incomplete given the 0% schema coverage. No output schema, so return values are not addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters. It explains 'ref' as a CSS selector but does not explain 'page_id' at all, leaving a gap in understanding the second required parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Hover' and resource 'element', and explains the effect: reveals tooltips and hover menus. This distinguishes it from sibling tools like browser_click or browser_fill.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to reveal tooltips/hover menus) but does not explicitly state when not to use or provide alternatives. It gives clear context but lacks exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_navigate_backA
Read-onlyIdempotent
Inspect

Navigate back in the page's history (browser back button). Returns the new URL + title.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that navigation moves within the page's history and changes the current URL, a behavioral trait not fully captured by annotations. However, it does not disclose edge cases like what happens when there is no history to go back to, so transparency is adequate but not exceptional.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, front-loading the action and the return value. Every word is necessary, with no extraneous details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the primary action and return value. It does not mention prerequisites (e.g., a page must be open with history) or error states (e.g., no history to navigate). For a straightforward tool, this is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (page_id) with 0% description coverage, meaning no textual documentation in the schema or description. The tool description does not elaborate on the parameter's purpose or valid values, leaving the agent to infer from the parameter name alone. This is insufficient for confident usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool navigates back in page history, mimicking the browser back button, and specifies it returns the new URL and title. This is a specific verb+resource pairing that distinguishes it from sibling tools like browser_open or browser_click.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives (e.g., browser_open with a specific URL). However, the purpose is straightforward enough that usage context is implied, earning a baseline score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_network_requestsA
Read-onlyIdempotent
Inspect

List HTTP requests the page made since open or last drain. Optional filters: method (GET/POST/...), url_pattern (regex), status_min (e.g. 400 for errors). Captures up to 200 most recent requests per page.

ParametersJSON Schema
NameRequiredDescriptionDefault
clearNo
methodNo
page_idYes
status_minNo
url_patternNo
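
A hedged sketch of a filtered call; the values are illustrative, and the 'clear' semantics are only presumed from the phrase 'last drain':

    # List failed POST requests to API endpoints on a page.
    args = {
        "page_id": "pg_1",           # hypothetical page ID
        "method": "POST",            # HTTP method filter
        "url_pattern": r"/api/.*",   # regex, per the description
        "status_min": 400,           # errors only
        # "clear": True,             # undocumented; presumably drains the capture buffer
    }
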
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a read-only, idempotent operation. The description adds behavioral details: capacity limit of 200 requests, data recency ('since open or last drain'), and optional filters. It does not explain the 'drain' mechanism but overall improves transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the core purpose, and includes necessary details without extraneous words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity (5 params), the description covers the tool's purpose, filters, and capacity. It lacks explanation of the 'clear' parameter and drain behavior, but overall provides a working understanding for a read-only list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description partially compensates by explaining filters 'method', 'url_pattern', and 'status_min' with examples. However, it omits the 'clear' parameter and does not fully describe all five parameters. The page_id is implied but not detailed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists HTTP requests made by a page, with specific verb 'List' and resource 'HTTP requests'. It distinguishes from sibling browser tools that perform actions like clicking or opening, and adds scope 'since open or last drain'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing network requests but does not explicitly state when to use this tool versus alternatives like browser_console_messages. No exclusions or comparisons are provided, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_openA
Read-onlyIdempotent
Inspect

Open a URL in a remote browser. Optional identity_name attaches the workspace's saved login cookies first.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYes
workspace_idYes
identity_nameNo
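
A minimal sketch; the workspace_id format is not documented, so the values here are purely illustrative:

    # Open a page with the workspace's saved login cookies attached.
    args = {
        "url": "https://example.com/dashboard",
        "workspace_id": "ws_1",        # required but unexplained by the description
        "identity_name": "acme-login", # optional saved-cookie identity
    }
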
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating a safe, non-destructive operation. The description adds behavioral context about attaching login cookies via identity_name, which is not covered by annotations. However, it does not describe other behaviors like whether the browser tab is focused or if a new window is opened.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no irrelevant details. Information is front-loaded: the primary action in the first sentence, optional feature in the second. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 3 parameters with 0% schema coverage, the description leaves workspace_id unexplained. Annotations partially compensate for safety, but the tool's return behavior (e.g., whether it returns a success status or tab info) is absent. For a simple open action, it is minimally adequate but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description should compensate. It mentions 'URL' and 'identity_name' but omits 'workspace_id', which is required. The description adds minimal meaning beyond field names; it does not specify URL format, workspace_id purpose, or identity_name constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Open a URL in a remote browser', specifying the exact action (open) and resource (URL in remote browser). It is distinct from sibling tools like browser_click or browser_fill, which involve interactions within the browser after navigation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description hints at a specific use case with 'optional identity_name attaches the workspace's saved login cookies first', but does not explicitly state when to use this tool versus other browser navigation tools like browser_navigate_back or browser_tabs. No exclusions or alternative tool references are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_press_keyA
Read-onlyIdempotent
Inspect

Press a keyboard key (e.g., 'Enter', 'Tab', 'Escape', 'ArrowDown') or a single character. Optional ref selector focuses an element first.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyYes
refNo
page_idYes
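
A short sketch based on the description's own examples; the selector and IDs are illustrative:

    # Focus a search box, then press Enter.
    args = {
        "page_id": "pg_1",
        "key": "Enter",                 # or 'Tab', 'Escape', 'ArrowDown', or a single character
        "ref": "input[type='search']",  # optional: focuses this element first
    }
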
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds that it presses keys or single characters and optionally focuses an element. No contradiction; additional context is useful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with action and examples. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with no output schema. Description covers core behavior, examples, and optional ref. Annotations cover safety. Complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%. Description explains `key` with examples and `ref` as focusing an element, but `page_id` is not mentioned. Compensates partially, but missing explanation for required page_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (press a keyboard key), gives specific examples ('Enter', 'Tab', etc.), and mentions optional focusing. It distinguishes from siblings like browser_type (which types strings) and browser_click.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly suggests using `ref` to focus an element before pressing, and examples show typical keys. Lacks explicit 'when to use vs alternatives', but context with sibling names helps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_resizeA
Read-onlyIdempotent
Inspect

Resize the page viewport. Useful when a site serves different HTML based on viewport width (mobile vs desktop) or when an anti-bot scores risk by viewport dimensions.

ParametersJSON Schema
NameRequiredDescriptionDefault
widthYes
heightYes
page_idYes
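
A sketch assuming the width and height units are CSS pixels, which the description does not actually state:

    # Emulate a phone-sized viewport to get the mobile HTML variant.
    args = {"page_id": "pg_1", "width": 390, "height": 844}
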
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and destructiveHint=false, implying no persistent data modification. The description adds value by explaining the effect on viewport and its relevance to page rendering, but it does not disclose potential side effects like triggering resize events or affecting page state beyond viewport dimensions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no wasted words. It front-loads the core action and immediately follows with practical use cases.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with three required parameters and no output schema, the description is adequate but incomplete. It lacks parameter details and does not describe the return value (or lack thereof). The use cases add context but do not fully compensate for missing parameter semantics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides no information about the parameters (page_id, width, height) such as units, valid ranges, or defaults. The agent cannot infer input details from the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Resize the page viewport.' It also provides specific use cases (mobile vs desktop HTML, anti-bot detection) that distinguish it from other browser tools. The tool name reinforces the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear context on when to use the tool (responsive design testing, anti-bot evasion) but does not explicitly mention when not to use it or list alternatives. However, the context is sufficient for an agent to infer appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_select_optionA
Read-onlyIdempotent
Inspect

Pick option(s) in a native dropdown. Pass value (matches the option's value attr) OR label (matches its visible text). Lists allowed for multi-select.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
labelNo
valueNo
page_idYes
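
Two illustrative payloads, one per addressing mode the description names (value attribute vs visible label):

    # Single selection by value attribute.
    by_value = {"page_id": "pg_1", "ref": "select#country", "value": "DE"}
    # Multi-select by visible text; a list is allowed per the description.
    by_label = {"page_id": "pg_1", "ref": "select#tags", "label": ["Sales", "Support"]}
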
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds that the tool targets native select dropdowns and supports multi-select via lists, giving useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Pithy three-sentence description with no wasted words. The action is front-loaded ('Pick option(s)') and critical usage details are provided efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is present, and the description does not mention return values, error handling, or prerequisites (e.g., element must be a select). While annotations cover safety, the description lacks complete contextual guidance for a tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, so the description must add meaning. It explains that 'value' matches the option's value attribute and 'label' matches visible text, and that arrays are allowed for multi-select. However, it does not describe 'ref' or 'page_id', leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it picks options in a native <select> dropdown, distinguishing it from other browser interaction tools like browser_click or browser_fill. The description explicitly mentions using 'value' or 'label' to specify options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explains when to use (for select dropdowns) and how to use (value vs label, lists for multi-select). Does not explicitly state when not to use or mention alternatives, but provides sufficient context for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_snapshotA
Read-onlyIdempotent
Inspect

Return a YAML aria_snapshot of the page DOM. Truncated at 32KB. Use the snapshot to find element refs for browser.click / browser.fill.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
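
A sketch of the documented workflow; the YAML parsing step in the middle is implied by the description rather than specified:

    # 1. Take the snapshot (YAML aria_snapshot, truncated at 32KB).
    args = {"page_id": "pg_1"}
    # 2. Read an element ref out of the returned YAML.
    # 3. Pass that ref to browser.click or browser.fill.
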
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that the snapshot is truncated at 32KB, which is important behavioral context. No contradictions; the description complements annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, each serving a purpose: the first defines the output, the second states the size cap, and the third gives usage guidance. No redundancy or filler. Very efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the essential aspects: output format (YAML aria_snapshot), size limit (32KB), and purpose (finding element refs). It does not detail the snapshot structure, but the term 'aria_snapshot' is self-explanatory. Slightly more detail on return structure could improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the schema provides no parameter descriptions. The description does not explain the 'page_id' parameter, assuming prior knowledge. This is a gap; the tool description should at least clarify that page_id identifies the browser page from which to take the snapshot.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a YAML aria_snapshot of the page DOM, truncated at 32KB, and explicitly mentions its use for finding element refs for browser.click and browser.fill. This provides a specific verb and resource, distinguishing it from sibling browser tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: 'Use the snapshot to find element refs for browser.click / browser.fill.' This provides clear context for usage. However, it does not mention when not to use it or provide alternatives, which could improve clarity further.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_tabsA
Read-onlyIdempotent
Inspect

Manage tabs within the same BrowserContext as page_id. action ∈ {list, switch, close, new}. For list, returns all open tab metadata; for new, returns the new tab's page_id.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNo
actionYes
tab_idNo
page_idYes
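
Sketches for the four documented actions; which optional parameter pairs with which action is inferred from the names, not stated:

    list_tabs  = {"page_id": "pg_1", "action": "list"}
    new_tab    = {"page_id": "pg_1", "action": "new", "url": "https://example.com"}  # url: assumed
    switch_tab = {"page_id": "pg_1", "action": "switch", "tab_id": "tab_2"}          # tab_id: assumed
    close_tab  = {"page_id": "pg_1", "action": "close", "tab_id": "tab_2"}           # tab_id: assumed
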
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. The description adds context about BrowserContext association and return behavior for list and new actions, providing useful supplementary information.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences that front-load the purpose and action set. No wasted words, though slightly more structure (e.g., bullet points) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the core functionality and key actions but omits details on return values for switch/close and the role of optional parameters. Lacks an output schema, leaving some gaps for a 4-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description partially explains action and page_id but fails to clarify optional parameters like url (likely for new action) and tab_id (for switch/close). This leaves ambiguity for an AI agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it manages tabs within a BrowserContext, listing specific actions (list, switch, close, new) and their outcomes, which distinguishes it from sibling browser tools like browser_open or browser_close.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use this tool (for tab management actions), anchored to a specific page_id. It doesn't explicitly exclude alternatives, but the action enumeration provides clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_take_screenshotA
Read-onlyIdempotent
Inspect

Capture a PNG screenshot of the page or a specific element. Returns base64-encoded image bytes. Use sparingly — favor browser.snapshot for structured DOM understanding.

ParametersJSON Schema
NameRequiredDescriptionDefault
refNo
page_idYes
full_pageNo
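
Illustrative payloads; the full_page semantics are inferred from the parameter name alone:

    element_shot = {"page_id": "pg_1", "ref": "#pricing-table"}  # PNG of a single element
    page_shot    = {"page_id": "pg_1", "full_page": True}        # presumably the whole scrollable page
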
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. Description adds output format (base64) and usage caution, which is helpful but does not detail potential side effects or permissions beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three succinct sentences: the first captures the action, the second the output format, and the third gives guidance. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description explains return format. Covers purpose and usage guidelines well, but lacks parameter details and error/limitation information. Mostly complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 3 parameters with 0% description coverage. Description only indirectly hints at 'page or specific element' (page_id and ref), but does not explain full_page or ref's format/meaning. Insufficient compensation for missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it captures a screenshot (PNG) of a page or specific element and returns base64-encoded image bytes. Distinguishes from sibling browser.snapshot by mentioning preference for structured DOM understanding.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to use sparingly and favor browser.snapshot for structured DOM understanding, providing clear usage context and alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_typeA
Read-onlyIdempotent
Inspect

Type text into an element with per-keystroke delay (organic). Each character dispatches keydown/keypress/keyup, unlike browser.fill which replaces .value instantly. Use when the page listens to keystroke events or for typing-speed fingerprint checks. delay_ms defaults to 50.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
textYes
page_idYes
delay_msNo
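
A sketch using the one documented default (delay_ms = 50); the other values are illustrative:

    # Type organically, slower than the default 50 ms per keystroke.
    args = {
        "page_id": "pg_1",
        "ref": "input#username",  # target element (CSS selector assumed, as in sibling tools)
        "text": "jane.doe",
        "delay_ms": 120,
    }
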
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the annotation 'readOnlyHint: true' by indicating the tool mutates the element (types text, dispatches events). Per the rubric, such a contradiction caps the score at 1.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four tight sentences, front-loaded with purpose, no wasted words. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and few annotations, the description covers purpose, usage, and key behavior. Lacks return value info but is largely complete for a typing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, requiring compensation. The description adds meaning only for delay_ms (default 50), but does not explain ref, text, or page_id. Partial compensation, thus score 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Type text into an element with per-keystroke delay (organic).' It distinguishes itself from sibling tool browser.fill by contrasting the event dispatch mechanism.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'Use when the page listens to keystroke events or for typing-speed fingerprint checks.' It also contrasts with browser.fill, providing clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_wait_forA
Read-onlyIdempotent
Inspect

Wait for a selector to appear OR a navigation URL to match a glob pattern. Provide ref (selector) OR url_pattern (glob).

ParametersJSON Schema
NameRequiredDescriptionDefault
refNo
page_idYes
timeout_msNo
url_patternNo
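
Two illustrative payloads, one per documented wait condition (provide ref OR url_pattern, not both):

    wait_for_element = {"page_id": "pg_1", "ref": ".results-row", "timeout_ms": 10000}
    wait_for_url     = {"page_id": "pg_1", "url_pattern": "https://example.com/checkout/*"}  # glob
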
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only and nondestructive behavior. Description adds the specific conditions waited for, but does not disclose failure behavior, timeout handling, or return value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, no unnecessary details. Front-loaded with purpose and clear parameter instruction.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and simple parameters, description explains the core functionality well but omits what the tool returns (e.g., boolean, timeout error). Adequate but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. Description explains the mutual exclusivity of 'ref' and 'url_pattern' and implies timeout, but does not elaborate on 'page_id' or the exact format of 'url_pattern' (glob). Adds moderate value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool waits for a selector to appear or a URL pattern match. Distinguishes from sibling browser actions by specifying the wait condition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies that either 'ref' or 'url_pattern' should be provided, but does not give guidance on when to use this tool versus alternatives like browser_click or browser_snapshot. Lacks explicit when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_check_availabilityA
Read-onlyIdempotent
Inspect

Check when you have free time in Google Calendar. Shows busy periods and free slots in a given time range. Useful for finding meeting times or checking schedule conflicts.

ParametersJSON Schema
NameRequiredDescriptionDefault
end_timeNoEnd date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to end of start_time day, or 7 days from now.
start_timeNoStart date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to start of today.
calendar_idNoCalendar ID to check. Defaults to primary calendar. (Default: primary)
working_hours_onlyNoIf true, only show free slots during working hours (9 AM - 6 PM). OMIT to show all free time (the default).
min_duration_minutesNoMinimum duration in minutes for free slots. Filters out short gaps. Default: 30 minutes.
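
A sketch built from the schema's documented defaults; the date is illustrative:

    # Find free slots of 45+ minutes during working hours on one day.
    args = {
        "start_time": "2025-01-06",   # YYYY-MM-DD or ISO 8601
        "working_hours_only": True,   # restrict to 9 AM - 6 PM
        "min_duration_minutes": 45,   # filter out shorter gaps (default 30)
    }
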
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description's safety profile is covered. The description adds limited behavioral context beyond the annotations, such as showing 'busy periods and free slots,' but does not disclose pagination or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise, front-loaded sentences with no wasted words. Every sentence serves the purpose of explaining the tool's functionality and usefulness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool, comprehensive schema with defaults, and read-only annotations, the description adequately covers what the agent needs to know. No output schema is present, but the return value is implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with detailed parameter descriptions, setting a baseline of 3. The tool description does not add additional semantic value beyond the schema, merely referencing 'time range' which is already defined.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it checks free time in Google Calendar, showing busy periods and free slots. It distinguishes from sibling tools like calendar_list_events by focusing on availability rather than listing events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly says 'Useful for finding meeting times or checking schedule conflicts,' providing clear context for when to use. However, it does not mention when not to use or alternatives, missing explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_create_eventA
Inspect

Create a new event in Google Calendar. Specify the title, start time, end time, and optionally invite attendees. Use ISO 8601 format for dates (e.g., 2024-12-15T14:00:00).

ParametersJSON Schema
NameRequiredDescriptionDefault
endNoEvent end time in ISO 8601 format. If not provided, defaults to 1 hour after start. Also accepts 'end_time' as alias.
startNoEvent start time in ISO 8601 format (e.g., 2024-12-15T14:00:00). Also accepts 'start_time' as alias.
titleNoAlias for summary - event title.
summaryNoEvent title/summary. Required. Also accepts 'title' as alias.
end_timeNoAlias for end - event end time.
locationNoEvent location (physical address or virtual meeting link).
timezoneNoTimezone for the event (e.g., 'America/New_York', 'UTC').
attendeesNoList of attendee email addresses to invite.
start_timeNoAlias for start - event start time in ISO 8601 format.
calendar_idNoCalendar ID to create event in. Defaults to primary calendar. (Default: primary)
descriptionNoEvent description/notes.
add_google_meetNoIf true, automatically creates a Google Meet link for the event. OMIT to skip Meet link.
conference_dataNoConference data for Google Meet. Alternative to add_google_meet flag.
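
A sketch using the documented fields and defaults; the names and times are illustrative:

    # 'end' omitted: defaults to one hour after start, per the schema.
    args = {
        "summary": "Design review",        # 'title' is an accepted alias
        "start": "2024-12-15T14:00:00",    # ISO 8601
        "timezone": "America/New_York",
        "attendees": ["ana@example.com"],
        "add_google_meet": True,           # auto-create a Meet link
    }
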
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-readonly and non-destructive behavior, so the description's 'Create' statement is consistent. Default behavior (a 1-hour duration when end is not provided) and alias handling ('title' for 'summary') surface in the schema rather than the description, which does not disclose other traits like permission requirements or response format. With annotations present, this is adequate but not exceptional.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with just three sentences. The first states the core action, the second the key fields, and the third provides the important date-format detail. No wasted words, and the structure is front-loaded with the most critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, aliases, nested objects), the description covers the essential points but omits details like the calendar_id default, conference_data usage, and alias explanations. However, the schema fully documents all parameters, so the description effectively complements it. It is nearly complete for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds meaning by specifying ISO 8601 format for dates and mentioning optional attendee invitations. However, the schema already documents each parameter's description and aliases, so the description does not significantly enhance understanding beyond highlighting key parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new event in Google Calendar' with a specific verb and resource. It distinguishes from sibling tools like calendar_update_event and calendar_delete_event by focusing on creation. The mention of key parameters (title, start, end, attendees) further clarifies the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool (creating events) and includes format guidelines (ISO 8601). However, it does not explicitly state when not to use it or mention alternatives like calendar_update_event for modifications. The sibling tools are distinct enough to avoid confusion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_delete_eventA
DestructiveIdempotent
Inspect

Delete an event from Google Calendar. This action cannot be undone. Use with caution.

ParametersJSON Schema
NameRequiredDescriptionDefault
event_idYesID of the event to delete. Required.
calendar_idNoCalendar ID containing the event. Defaults to primary. (Default: primary)
send_notificationsNoWhether to send cancellation notifications to attendees.
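
A sketch; the event_id value is hypothetical:

    # Irreversible delete; notify attendees of the cancellation.
    args = {"event_id": "evt_123", "send_notifications": True}
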
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true and readOnlyHint=false. The description adds that the action 'cannot be undone', which provides additional context beyond the annotations. However, it does not detail other behavioral traits like authorization needs or rate limits, keeping the score moderate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with three short sentences: one stating the purpose and two adding warnings. Every sentence adds value with no unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete operation with no output schema, the description effectively conveys the essential behavior (deletion, irreversibility) and caution. It is nearly complete, though it could briefly mention the send_notifications parameter from the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100%, so the schema already documents all parameters. The description does not add any additional meaning beyond what the schema provides, resulting in the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and the resource ('an event from Google Calendar'), distinguishing it from sibling tools like calendar_create_event or calendar_update_event.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use with caution' but does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention prerequisites or exclusion criteria. It lacks the when-to-use and when-not-to-use information expected for a high score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_list_eventsA
Read-onlyIdempotent
Inspect

List events from Google Calendar. Shows upcoming events by default. Can filter by date range and search query.

ParametersJSON Schema
NameRequiredDescriptionDefault
queryNoFree text search query to filter events.
date_toNoEnd date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to 7 days from now. Alias: time_max.
date_fromNoStart date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to now. Alias: time_min.
calendar_idNoCalendar ID to list events from. Defaults to primary calendar. (Default: primary)
max_resultsNoMaximum number of events to return.
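
A sketch using the documented filters; the dates are illustrative, and the aliases time_min/time_max would also work per the schema:

    args = {
        "query": "standup",          # free-text filter
        "date_from": "2025-01-06",   # defaults to now when omitted
        "date_to": "2025-01-10",     # defaults to 7 days from now when omitted
        "max_results": 20,
    }
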
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds beyond that by specifying default timeline (upcoming), date range filtering, and search query support. This provides useful behavioral context without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-loading the purpose, default behavior, and key capabilities. No filler or redundant information. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with no output schema, the description covers default behavior and filters. Missing explicit mention of pagination (though max_results param exists) or ordering, but still fairly complete. No major gaps given annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all 5 parameters. The description mentions date range and search query, aligning with params, but does not add new meaning beyond the schema. Baseline 3 is appropriate since schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List', the resource 'events from Google Calendar', and the scope: shows upcoming by default, with filtering by date range and search query. This distinguishes it from sibling tools like calendar_create_event and calendar_delete_event.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on default behavior and filtering options but does not explicitly state when not to use this tool or mention alternatives like calendar_check_availability for availability checks. Some implied usage, but no explicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_update_eventA
Inspect

Update an existing event in Google Calendar. Can modify title, time, location, description, and attendees. Only specified fields will be updated.

ParametersJSON Schema
NameRequiredDescriptionDefault
endNoNew end time in ISO 8601 format. Optional.
startNoNew start time in ISO 8601 format. Optional.
summaryNoNew event title/summary. Optional.
event_idYesID of the event to update. Required.
locationNoNew event location. Optional.
attendeesNoNew list of attendee emails. Replaces existing attendees.
calendar_idNoCalendar ID containing the event. Defaults to primary. (Default: primary)
descriptionNoNew event description. Optional.
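
A sketch of a partial update; note the schema's warning that attendees replaces the existing list:

    args = {
        "event_id": "evt_123",                              # hypothetical ID
        "location": "Room 4B",                              # only specified fields change
        "attendees": ["ana@example.com", "bo@example.com"], # replaces, not appends
    }
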
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description notes modifiable fields and partial update, consistent with annotations (readOnlyHint=false). Does not disclose side effects like attendee replacement (covered in schema) or return format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with action and resource, no wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description covers core functionality but lacks details on error handling, permissions, or usage of time parameters. Schema fills gaps for parameter descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%. Description maps 'title, time, location, description, attendees' to schema fields but adds no meaning beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool updates an existing Google Calendar event and lists modifiable fields (title, time, location, description, attendees). Distinguishes from sibling tools like create, delete, and list events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States 'Only specified fields will be updated,' implying partial update behavior. Does not explicitly compare to alternatives, but context makes it clear this is for modifications.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_get_transcript (A)
Read-only · Idempotent

Get the structured transcript and final state of a voice call by call_id. Returns per-turn rows in chronological order, call status (active/completed/failed/abandoned), duration, and an outcome field telling whether the recipient picked up (answered/no_answer/busy/declined/failed/unknown). answered_at is non-null once the recipient picked up. Returns active turns if the call is still in progress.

Parameters (JSON Schema)
- call_id (required): Call ID returned by calls.make in _meta.call_id.
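Since there is no output schema, a client has to rely on the fields the description names (status, outcome, answered_at, per-turn rows). A sketch of branching on the outcome; the exact response shape and the call_tool helper are assumptions, not part of the server contract:

```python
# Stub standing in for a real MCP client; the canned response mirrors
# the fields the description lists (the shape is an assumption).
def call_tool(name: str, arguments: dict) -> dict:
    return {
        "status": "completed",
        "outcome": "answered",
        "answered_at": "2025-03-14T10:01:05Z",
        "duration": 184,
        "turns": [
            {"speaker": "agent", "text": "Hi, do you have a moment?"},
            {"speaker": "callee", "text": "Sure, go ahead."},
        ],
    }

result = call_tool("calls_get_transcript", {"call_id": "call_123"})
if result["outcome"] == "answered":
    for turn in result["turns"]:      # per-turn rows, chronological order
        print(f"{turn['speaker']}: {turn['text']}")
else:
    print("No conversation:", result["outcome"])
```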
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as read-only, idempotent, and non-destructive. The description adds valuable behavior: returns structured rows in chronological order, specific fields like outcome and answered_at, and handles active calls. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no waste. The first sentence front-loads the purpose, and subsequent sentences add concise details. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no output schema, the description covers the return structure thoroughly: rows, status, duration, outcome, answered_at, and handling of active calls. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema fully describes the single parameter call_id with source info. The description does not add additional parameter semantics beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the purpose: retrieving the structured transcript and final state of a voice call by call_id. It specifies the returned fields (per-turn rows, call status, duration, outcome, answered_at), which distinguishes it from sibling tools like calls_list_active or calls_list_history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for getting transcript/state given a call_id, and notes it returns active turns for ongoing calls. However, it does not explicitly state when to use this tool versus alternatives or provide exclusion criteria. The context is clear but lacks explicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_hangup (A)
Read-only · Idempotent

Hang up an active voice call by call_id. Use after calls.make when the agent decides to terminate before the callee does, or to abort a stuck call. Idempotent: returns success if the call is already terminal.

Parameters (JSON Schema)
- call_id (required): Call ID returned by calls.make in _meta.call_id.
- reason (optional): Short internal reason for ending the call (e.g. 'campaign timeout'). Stored on voice_sessions.metadata.
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description contradicts annotations: readOnlyHint=true but tool mutates state; destructiveHint=false but hanging up is destructive. This is a serious inconsistency, warranting a score of 1 per evaluation rules.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-load the core action and context. Every sentence is meaningful with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Describes usage and idempotency, but lacks return value details (e.g., success response format). Given no output schema, the description could do more to specify what 'returns success' means.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions (100% coverage). Description adds no extra parameter meaning beyond schema, meeting baseline but not exceeding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Hang up an active voice call by call_id.' Verb and resource are specific. Distinct from sibling tools like calls_make and calls_wait.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly recommends use after calls.make for agent-initiated termination or aborting stuck calls. Also notes idempotency, implying safe reuse. Does not explicitly state when to avoid, but scenarios are clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_active (A)
Read-only · Idempotent

List active voice calls in this workspace. Use before calls.make on a Telegram account (only one MTProto call per account at a time) to check whether the line is free.

Parameters (JSON Schema)
- channel (optional): Filter by voice channel. OMIT to include both telegram and twilio.
- channel_account_id (optional): Filter by channel_account.id (the calling Telegram account or Twilio number). Combine with channel for a per-line busy check.
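The per-line busy check the description recommends can be scripted directly: filter by channel and account, and only place the call when nothing is active. call_tool and the account ID below are illustrative stand-ins:

```python
# Stub MCP client returning a canned empty list (no active calls).
def call_tool(name: str, arguments: dict) -> list:
    return []

active = call_tool("calls_list_active", {
    "channel": "telegram",
    "channel_account_id": "acct_42",   # hypothetical Telegram account
})
if active:
    print("Line busy: one MTProto call per account at a time.")
else:
    print("Line free: safe to invoke calls_make.")
```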
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description's behavioral detail about the MTProto call limit adds value beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The primary purpose is front-loaded, and the second sentence provides a critical usage hint. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with good annotations and schema, the description is complete: it explains what it does, when to use it, and how parameters relate to the use case. No output schema is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with clear descriptions. The tool description enhances this by explaining the practical use of parameters (e.g., 'Combine with channel for a per-line busy check'), adding context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List active voice calls in this workspace' with a specific verb (list) and resource (active calls). It distinguishes from siblings like calls_list_history and calls_make by providing a usage context: checking if a Telegram line is free before making a call.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends using this tool before calls_make to check line availability, citing the constraint of one MTProto call per account. It provides clear context but does not explicitly exclude other scenarios or name alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_history (A)
Read-only · Idempotent

Search historical voice calls in this workspace by participant name, contact_id, thread, channel, source, and/or date range. Returns one row per call (NOT per turn) with call_id, duration_seconds, outcome, direction, started_at, source, channel_label, and parent_thread_id (the originating chat thread for Telegram-group / Twilio-outbound / Meet calls). Pair with calls.get_transcript(call_id) for the full per-turn transcript. Use this instead of messages.read_history for cross-thread call queries — group calls and Meet sessions live on per-call sub-threads, not on the parent chat thread.

Parameters (JSON Schema)
- limit (optional): Maximum calls to return (default 20, max 100).
- since (optional): ISO date or datetime lower bound (inclusive). Default: 90 days ago. Naive timestamps are interpreted as UTC.
- until (optional): ISO date or datetime upper bound (inclusive). Default: now.
- source (optional): Filter by voice_sessions.source: 'telegram' (1:1 + group), 'twilio' (PSTN), 'meet' (Google Meet bot), 'livechat' (in-app voice). OMIT to include all sources.
- channel (optional): Filter by message-level channel of the call thread: 'telegram' (1:1 voice or group call sub-thread), 'twilio_voice', 'meet_voice', 'livechat_voice'. OMIT to include all voice channels.
- thread_id (optional): Restrict to calls on this thread OR with this thread as their originating parent (Telegram group → call sub-thread back-link, Twilio outbound source_thread_id back-link).
- contact_id (optional): Filter by exact entity_id (from contacts.find). Mutually exclusive with participant_name when both target the same person.
- participant_name (optional): Filter to calls whose parent thread has a participant matching this name (substring match against entity.title). Resolves group calls via the parent group's roster, not the per-call thread's speaker list.
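A sketch of the pairing the description suggests: filter the call history, then fetch the per-turn transcript for one hit. The row fields follow the description; call_tool and its canned data are illustrative:

```python
# Stub MCP client with canned rows; a real client issues tools/call.
def call_tool(name: str, arguments: dict):
    if name == "calls_list_history":
        return [{"call_id": "call_9", "outcome": "answered",
                 "duration_seconds": 212, "direction": "outbound"}]
    return {"status": "completed", "turns": []}

calls = call_tool("calls_list_history", {
    "source": "twilio",
    "participant_name": "Alice",   # substring match on entity.title
    "since": "2025-03-07",         # naive timestamps are read as UTC
    "limit": 20,
})
answered = [c for c in calls if c["outcome"] == "answered"]
if answered:
    transcript = call_tool("calls_get_transcript",
                           {"call_id": answered[0]["call_id"]})
    print(transcript["status"])
```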
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. Description adds that it returns one row per call (not per turn) and lists returned fields, which enhances transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four focused sentences with no filler. Front-loaded with action and filters. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the description lists the returned fields. For a list tool with 8 parameters, it provides sufficient context. Could mention the limit default, but the schema covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% parameter description coverage, so baseline is 3. The description adds value by explaining return structure and clarifying special parameters like thread_id and participant_name, earning a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches historical voice calls with specific filter parameters, and distinguishes from sibling tools like messages.read_history and calls.get_transcript.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises when to use this tool over alternatives: 'Use this instead of messages.read_history for cross-thread call queries' and pairs it with calls.get_transcript for per-turn transcripts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_make (A)

Place an outbound AUDIO/VOICE phone call via Twilio (PSTN) or Telegram (MTProto 1:1 call). Use this any time the user asks to 'call', 'ring', 'phone', 'dial', or have a spoken conversation. Do NOT use messages.send when the user asks to call someone — a call is real-time voice, not a text message. You conduct the conversation as the voice agent using the provided greeting and instructions.

Parameters (JSON Schema)
- greeting (required): The first sentence the agent speaks immediately when the call connects. ALWAYS provide a greeting — without it the caller hears silence. Keep it short and natural. Example: 'Hi, this is Diana calling from DialogBrain. Do you have a moment to chat?'
- channel (optional, default: twilio): Voice transport: 'twilio' (phone via PSTN — requires phone_number in E.164) or 'telegram' (MTProto 1:1 call — requires telegram_user_id, NOT a phone number or thread_id).
- phone_number (optional): Destination phone number in E.164 format (e.g., '+15551234567', '+66812345678'). Required when channel='twilio'.
- telegram_user_id (optional): Destination Telegram user ID (decimal int64 as string, e.g. '123456789'). Required when channel='telegram'. The caller account must have had prior interaction with this user — a cold contact cannot be reached via voice.
- instructions (optional): What to do during the call — objective, questions, tone. The AI generates a natural opening and guides the conversation. Example: 'Call about invoice #1234. Ask if they received it and when payment is expected. Be friendly and professional.'
- report_back (optional, default: on_answer): When to re-invoke you after the call ends. 'on_answer' = only if the call was answered, 'always' = even on missed/failed calls, 'never' = fire and forget. Transcript is always stored regardless of this setting.
- voice_agent_id (optional): Override: specific voice agent to conduct the call. If omitted, uses the workspace's default voice agent. Must be an agent with execution_mode='voice'.
- channel_account_id (optional): Specific calling channel_account ID. For channel='twilio' this is the Twilio number; for channel='telegram' this is the connected Telegram account. If omitted, auto-selects the first active account of the matching channel.
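The channel parameter pairs with different destination fields, which is easy to get wrong. Two hypothetical argument sets, one per transport (numbers and IDs are made up beyond the description's own examples):

```python
import json

# channel='twilio' pairs with phone_number (E.164);
# channel='telegram' pairs with telegram_user_id instead.
pstn_args = {
    "channel": "twilio",
    "phone_number": "+15551234567",
    "greeting": "Hi, this is Diana calling from DialogBrain. "
                "Do you have a moment to chat?",
    "instructions": "Call about invoice #1234. Ask if they received it "
                    "and when payment is expected.",
    "report_back": "always",   # re-invoke even on missed/failed calls
}
telegram_args = {
    "channel": "telegram",
    "telegram_user_id": "123456789",   # decimal int64 as a string
    "greeting": "Hi! Quick voice check-in, is now a good time?",
}

for args in (pstn_args, telegram_args):
    print(json.dumps({"name": "calls_make", "arguments": args}, indent=2))
```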
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false, destructiveHint=false) indicate mutation but no destruction. Description adds that the call is outbound, uses specific transports, and the agent conducts a real-time conversation. It doesn't fully disclose potential costs or rate limits, but it provides sufficient behavioral context beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no wasted words. Front-loaded with the core purpose, immediately followed by usage guidance. Each sentence serves a distinct function: defining the tool, when to use it, when not to, and how the agent conducts the call. Ideal conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 parameters, no output schema, but 100% schema coverage, the description fully covers purpose, usage, and parameter semantics. It addresses ambiguity points (channel differences, greeting necessity) and provides situational context (e.g., 'cold contact cannot be reached via voice'). Complete for agent decision-making and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. However, the description adds significant meaning: explains the channel enum (twilio/telegram), emphasizes greeting is mandatory, details report_back options, instructions purpose, phone_number format (E.164), telegram_user_id constraints, voice_agent_id override, and channel_account_id selection. Every parameter is elaborated well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Place an outbound AUDIO/VOICE phone call via Twilio or Telegram' and specifies the triggers ('call', 'ring', 'phone', 'dial', or a spoken conversation). It distinguishes itself from the sibling tool 'messages.send' by emphasizing real-time voice vs. text. This strongly differentiates it from other communication tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (user asks to call, ring, etc.) and when not to use ('Do NOT use messages.send when the user asks to call someone'). Provides clear context that it is for real-time voice conversation, not text messaging. No ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_meet (A)
Read-only · Idempotent

Dispatch a workspace AI agent into an active Google Meet call. The agent joins as a participant — it can hear the conversation, respond via TTS, see the shared screen (when vision is enabled on the agent), and answer questions about what's on screen. Use when the operator wants to delegate live meeting attendance to an agent (notes, Q&A, summarization, real-time support). The Meet URL must be in canonical 3-4-3 form, e.g. https://meet.google.com/abc-defg-hij. Lookup-redirect URLs are not supported — operator must use the share-link form.

Parameters (JSON Schema)
- agent_id (required): ID of a voice agent (execution_mode=voice, enabled) in this workspace. Get it from agents.list.
- meet_url (required): Canonical Google Meet URL — must match https://meet.google.com/<3 letters>-<4 letters>-<3 letters>, e.g. https://meet.google.com/abc-defg-hij. lookup/ redirects are NOT supported.
- vision_mode (optional, default: off): Screen-share capture mode. 'off' = no vision, 'on_demand' = the agent can call the vision_query tool for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and the executor splices the latest scene-change into each agent turn as ambient low-detail context.
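A client can pre-validate the canonical 3-4-3 form before dispatching, since lookup-redirect URLs are rejected. A sketch assuming lowercase letters, as in the description's example:

```python
import re

# Canonical share-link form: three, four, then three letters.
MEET_RE = re.compile(r"^https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}$")

for url in ("https://meet.google.com/abc-defg-hij",
            "https://meet.google.com/lookup/abc123"):
    verdict = "ok" if MEET_RE.match(url) else "not a share link"
    print(url, "->", verdict)
```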
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description richly details agent behavior (hearing, TTS, screen sharing, answering questions) and vision_mode capabilities, adding significant context beyond the annotations. Annotations already mark the tool as readOnly and idempotent, but the description clarifies the behavioral impact on the meeting, which is valuable for agent decision-making.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action, then follows with capabilities, usage context, and a specific URL constraint, all in five tight sentences without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description could briefly mention what the tool returns (e.g., join status or participant ID). However, it covers the essential aspects: action, when to use, URL constraints, and agent capabilities, leaving minimal gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful constraints for meet_url (exact format and unsupported redirects) that are not fully captured in the schema's description. For agent_id and vision_mode, the description mostly echoes schema info, but the URL clarification justifies a score above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb-resource pair 'Dispatch a workspace AI agent' and clearly distinguishes this tool from siblings like calls_make or calls_send_to_telegram_call by focusing on joining a live Google Meet with agent capabilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('when the operator wants to delegate live meeting attendance') and provides critical URL format constraints ('must be in canonical 3-4-3 form'). However, it does not explicitly state when not to use it or name alternatives, though the sibling list provides implicit differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_telegram_call (A)
Read-only · Idempotent

Dispatch a workspace AI agent into an active Telegram group call (t.me/call/ link). The agent joins as a participant via the workspace's Telegram account — it can hear the conversation, respond via TTS, see shared screens (when vision is enabled), and answer questions about what's on screen. Use when the operator wants to delegate live group-call attendance to an agent (notes, Q&A, summarization, real-time support). Pass either the full https://t.me/call/ URL or the bare slug token.

Parameters (JSON Schema)
- agent_id (required): ID of a voice agent (execution_mode=voice, enabled) in this workspace. Get it from agents.list.
- telegram_call_url (required): Telegram group-call invite — either the full URL (https://t.me/call/<slug>) or just the slug token. Slug is 12-64 chars from [A-Za-z0-9_-].
- vision_mode (optional, default: off): Screen-share capture mode. 'off' = no vision, 'on_demand' = the agent can call vision_query for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and splices the latest scene-change into each agent turn.
- channel_account_id (optional): Workspace Telegram channel account ID that joins as the bot. When the workspace has exactly one Telegram account, it's used by default. Required when multiple Telegram accounts exist.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes agent behavior (joins as participant, hears, responds via TTS, sees screens, answers questions). Annotations (readOnlyHint=true, etc.) are consistent with the description, which adds context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with main action, no redundant information. Could be slightly more structured but is efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description explains agent actions but omits what the tool returns (e.g., success status), error conditions (e.g., if call not active), or timeouts. Additional details would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage; description adds general context (e.g., agent joins via workspace account) but does not significantly enhance parameter meanings beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool dispatches a workspace AI agent into an active Telegram group call, specifying capabilities (hear, TTS, screen sharing). It distinguishes from sibling call tools by targeting Telegram group calls specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('when the operator wants to delegate live group-call attendance'), but does not mention when not to use or provide alternative tools for other call platforms.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_wait (A)
Read-only · Idempotent

Block until a voice call ends (status changes from 'active') or timeout elapses. Returns ended=true with final state when the call has ended; ended=false on timeout (re-issue to keep waiting). The returned state includes outcome so callers can branch on pickup vs. no-answer (answered/no_answer/busy/declined/failed/unknown). Default timeout 90s; cap 110s — bounded by nginx proxy_read_timeout 120s on /mcp.

Parameters (JSON Schema)
- call_id (required): Call ID returned by calls.make in _meta.call_id.
- timeout_seconds (optional, default: 90): Max seconds to wait. Cap 110 (bounded below nginx 120s proxy_read_timeout). On expiry returns ended=false with status='active' so the caller can re-issue to keep waiting.
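The ended=false-on-timeout contract makes the re-issue loop mechanical: keep calling until ended is true, with each wait capped below the proxy limit. call_tool and its canned responses are stand-ins for a real MCP client:

```python
# Stub MCP client: first wait times out, second returns the final state.
_responses = iter([
    {"ended": False, "status": "active"},
    {"ended": True, "status": "completed", "outcome": "answered"},
])

def call_tool(name: str, arguments: dict) -> dict:
    return next(_responses)

state = {"ended": False}
while not state["ended"]:   # re-issue to keep waiting, per the description
    state = call_tool("calls_wait",
                      {"call_id": "call_123", "timeout_seconds": 110})
print("Final outcome:", state.get("outcome"))
```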
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds rich behavioral details beyond annotations: blocking nature, return values (ended, outcome), timeout cap due to nginx proxy. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no fluff, front-loaded with purpose. Every sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers return values (ended, outcome) and timeout behavior despite no output schema. Complete for a wait function.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds value by linking call_id to calls.make and explaining timeout cap and default. Enhances parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it blocks until a voice call ends or timeout, specifying the resource and action. Differentiates from siblings like calls_hangup and calls_list_active.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explains when to use: wait for call to end, and how to handle timeout (re-issue to keep waiting). No explicit alternatives mentioned, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_add_file (A)

Add a file to a knowledge collection.

The file must be uploaded and indexed first (files_upload + files_ingest). If the file was previously removed, it is re-enabled.

Parameters (JSON Schema)
- file_id (required): ID of the file to add (from files_upload).
- collection_id (required): ID of the collection.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint false, destructiveHint false), description reveals re-enabling behavior and prerequisite steps. Adds meaningful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, under 30 words, front-loaded with purpose. No redundant information. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers prerequisites, behavior, and basic operation. No output schema but tool is simple. Could mention success response, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with clear descriptions. Description adds little parameter meaning but aligns well. Baseline score appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Add a file to a knowledge collection', distinguishing it from siblings like collections_remove_file. The resource and action are unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisite (file must be uploaded and indexed) and notes re-enabling behavior. Lacks explicit 'when not to use' but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_assign_agent (A)

Assign a knowledge collection to an AI agent.

Once assigned, the agent's knowledge.query will automatically scope RAG search to files in its assigned collections.

Parameters (JSON Schema)
- agent_id (required): ID of the AI agent.
- collection_id (required): ID of the collection to assign.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutation (readOnlyHint=false). The description adds value by detailing the specific behavioral effect: scoping the agent's knowledge query to the collection's files. It does not mention potential side effects like overriding existing assignments.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, with two sentences: the first stating the action and the second explaining the consequence. No superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple assignment tool, the description adequately explains the purpose and effect. It does not mention prerequisites like existence of collection/agent, but these are implied. The return value is not described but likely void.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with basic descriptions for both parameters. The description does not add extra meaning beyond what the schema already provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'assign a knowledge collection to an AI agent' and distinguishes from siblings like unassign. It also explains the effect on knowledge.query scoping.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to link a collection for RAG scoping) but does not explicitly state exclusions or alternatives, such as what happens if the agent already has collections assigned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_create (A)

Create a named knowledge collection.

Collections group files for RAG search. After creating, add files with collections.add_file and assign to agents with collections.assign_agent.

Parameters (JSON Schema)
- name (required): Collection name (must be unique per user).
- description (optional): Optional description of the collection.
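The create, add, assign workflow the description points to can be chained off the created collection's ID. A sketch with hypothetical IDs and a stubbed client; the response shape is an assumption:

```python
# Stub MCP client returning a canned resource ID for every call.
def call_tool(name: str, arguments: dict) -> dict:
    return {"id": 7}

collection = call_tool("collections_create", {
    "name": "support-docs",
    "description": "Product manuals for the support agent",
})
call_tool("collections_add_file", {
    "collection_id": collection["id"],
    "file_id": 42,          # from files_upload, after files_ingest
})
call_tool("collections_assign_agent", {
    "collection_id": collection["id"],
    "agent_id": "agent_1",  # its knowledge.query now scopes to this collection
})
```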
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false. The description adds workflow context but no additional behavioral traits like rate limits or side effects. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences, front-loaded with the main action, and no redundant information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool with only 2 parameters and no output schema, the description sufficiently explains the creation step and hints at subsequent actions, providing complete context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and both parameters have clear descriptions in the schema. The tool description does not add further meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and the resource 'named knowledge collection'. It distinguishes itself from sibling tools like collections_add_file and collections_assign_agent by placing them as subsequent steps, establishing a clear purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for usage by indicating the workflow after creation ('add files... assign to agents'), but does not explicitly state when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_delete (A)
Destructive · Idempotent

Delete a knowledge collection.

If the collection is assigned to agents, prompts, or channels, pass force=true to delete anyway. CASCADE removes all assignments automatically.

Parameters (JSON Schema)
- collection_id (required): ID of the collection to delete.
- force (optional): Force delete even if collection is in use. OMIT for the safe default (refuse to delete in-use collections).
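The safe default suggests a two-step pattern: attempt without force, surface the refusal, and escalate only on explicit confirmation. The error shape here is an assumption; call_tool is a stand-in:

```python
# Stub MCP client: refuses without force, deletes (with CASCADE) with it.
def call_tool(name: str, arguments: dict) -> dict:
    if not arguments.get("force"):
        return {"error": "collection in use by 2 agents"}
    return {"deleted": True}

first = call_tool("collections_delete", {"collection_id": 7})
if "error" in first:
    print("Refused:", first["error"])
    # ...confirm with the operator, then force; CASCADE drops assignments.
    print(call_tool("collections_delete", {"collection_id": 7, "force": True}))
```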
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds beyond annotations by explaining behavior when the collection is in use and the effects of force and cascade. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, highly efficient, with critical information (force/CASCADE) presented immediately after the purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and the simplicity of a delete operation, the description is sufficient. It addresses error conditions (in-use) and options meaningfully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the schema fully describes parameters, the description adds valuable context for 'force' (safe default vs. force delete) and introduces 'CASCADE' as a concept, enhancing understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete a knowledge collection' with a specific verb and resource. It differentiates from sibling tools like collections_create, collections_list, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use force=true or CASCADE, providing context for common scenarios. However, it does not explicitly compare to alternatives like collections_unassign_agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list (A)
Read-only · Idempotent

List all knowledge collections in the workspace.

Collections are named groups of files used for RAG search. Auto-created collections (per-agent, per-prompt) are hidden by default.

Parameters (JSON Schema)
- include_inactive (optional): Include inactive collections. OMIT to list only active collections (the default).
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint, so the tool is safe. The description adds value by explaining what collections are (RAG-related) and that auto-created ones are hidden by default, which clarifies the default behavior beyond the schema. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three sentences that front-load the action and provide essential context. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, the nature of collections, and default behavior. Since there is no output schema, mentioning the return format would improve completeness, but the tool name and context make it reasonably clear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage for the single parameter include_inactive, with a clear description. The tool description adds no additional parameter semantics, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all knowledge collections in the workspace', specifying the verb (list) and resource (knowledge collections). It distinguishes from sibling tools like collections_create or collections_delete by focusing on listing. The additional context about collections being named groups for RAG search and auto-created collections hidden by default further clarifies scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to list collections) but does not explicitly state when not to use or suggest alternatives like collections_list_files. The note about hidden auto-created collections gives some context, but no direct guidance on filtering or comparing to other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list_files (A)
Read-only · Idempotent

List all files in a knowledge collection with their indexing status and chunk counts. Each returned file has a file_id (integer) that can be passed to messages.send as attachments=[file_id] to send the file to a contact, or to files.read to read its text content.

Parameters (JSON Schema)
- collection_id (required): ID of the collection.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and not destructive. The description adds context about returned fields (indexing status, chunk counts) and the file_id's role, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences, front-loaded with core purpose, and no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple schema and no output schema, the description sufficiently covers purpose, output details, and file_id utility, making it complete for an agent to select and invoke.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100% with a clear description for collection_id. The description adds no new parameter semantics, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists files with indexing status and chunk counts, using a specific verb and resource. It differentiates from sibling tools like collections_add_file and mentions a key output field (file_id) for downstream use.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly guide when to use this tool versus alternatives. It mentions downstream uses but lacks exclusions or comparisons with other list tools like agents_list_files.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_remove_file (A)

Remove a file from a knowledge collection.

The file itself is not deleted — only the collection membership is removed.

Parameters (JSON Schema)
- file_id (required): ID of the file to remove.
- collection_id (required): ID of the collection.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide destructiveHint=false, and the description adds behavioral context by explicitly stating the file itself is not deleted. This adds value beyond annotations, though it doesn't detail other traits like permissions or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The key information is front-loaded and easily digestible.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal operation without an output schema, the description fully covers the behavior (non-destructive removal) and all necessary context. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with clear descriptions for both required parameters. The description does not add extra semantic information beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Remove a file from a knowledge collection' and clarifies that only membership is removed, not the file itself. However, it does not explicitly distinguish from sibling tools like 'agents_remove_file', though context from the name helps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description implies the use case (removing a file from a collection without deleting it), but there is no explicit when-to-use or when-not-to-use information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_unassign_agent (A)

Remove a knowledge collection from an AI agent.

The collection and its files are not deleted — only the agent assignment is removed.

Parameters (JSON Schema)
- agent_id (required): ID of the AI agent.
- collection_id (required): ID of the collection to unassign.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=false, but the description adds valuable context by specifying that the collection and its files are not deleted, only the assignment is removed. This clarifies the non-destructive behavior beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences: the first states the primary purpose, and the second clarifies what is not affected. Every word earns its place, and it is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, and the description does not mention return values, success indicators, or error cases (e.g., unassigning an unassigned collection). Given the low complexity, it covers the main action but misses some behavioral details that could be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters (agent_id and collection_id). The description does not add further meaning to the parameters beyond what the schema already provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (remove a knowledge collection from an AI agent) and distinguishes it from deletion by noting that the collection and files are not deleted, only the assignment. This differentiates it from siblings like collections_delete and collections_assign_agent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for unassigning rather than deleting, but it does not explicitly state when to use this tool versus alternatives like collections_delete or collections_assign_agent. No guidance on prerequisites or context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_add_channelAInspect

🔗 Link a new channel identity (email, phone, LinkedIn, etc.) to an existing contact.

When to use:

  • User learns a contact's email or phone and wants to save it

  • User wants to link a LinkedIn/Instagram profile to an existing contact

  • Adding a second channel for an existing person

Requires contact_id (entity_id) from contacts.find.

Parameters (JSON Schema)

  • value (required): Email address, phone number, or username for this channel
  • channel (required): Channel type to add
  • contact_id (required): entity_id from contacts.find
  • display_name (optional): Optional display label for this identity
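
A sketch of the find-then-link flow described above, reusing the hypothetical call_tool helper from the first sketch; the query, address, and response shape (a list of matches) are placeholders and assumptions:

    results = call_tool("contacts_find", {"query": "Jane", "limit": 1})
    contact = results[0]                      # assuming a list of matches is returned
    call_tool("contacts_add_channel", {
        "contact_id": contact["entity_id"],   # entity_id from contacts.find
        "channel": "email",                   # channel type to add
        "value": "jane@example.com",          # placeholder address
        "display_name": "Work email",
    })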
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description states it links a new channel, implying mutation. Annotations show readOnlyHint=false, destructiveHint=false, so non-destructive mutation. Description doesn't add detail on side effects or permissions but complements annotations well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three clear sentences plus bullet points for usage scenarios. Front-loaded with purpose, then when-to-use, then prerequisite. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description covers when to use, prerequisites, and parameter hints. Could mention return value (e.g., success or updated contact), but for a simple mutation it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description explains the purpose of contact_id (from contacts.find) and gives examples for value (email, phone, username). Adds context beyond enum and property descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it links a new channel identity to an existing contact. Verb 'Link' and resource 'contact' are specific. Distinguishes from sibling tools like contacts_find and contacts_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists scenarios when to use (learning email/phone, linking social profiles, adding second channel) and requires contact_id from contacts.find, providing clear prerequisites and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_discoverA
Read-onlyIdempotent
Inspect

Search for a contact on a live channel (Telegram, WhatsApp, etc.) before adding them. Use this to look up a person by username or phone number before calling contacts.sync.

Parameters (JSON Schema)

  • query (required): Username, phone, or name to search for
  • channel (required): Channel name: telegram, whatsapp, etc.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, openWorldHint, idempotentHint, destructiveHint=false. The description adds context about searching live channels and pre-sync usage, which aligns with and enriches the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The first sentence defines the core action and scope, the second provides usage guidance. Perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given rich annotations and full schema description, the description covers purpose and usage well. Missing explicit indication of return format, but the search nature and annotations fill the gap sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters have full descriptions in the schema (query: 'Username, phone, or name to search for', channel: 'Channel name: telegram, whatsapp, etc.'). The description does not add additional meaning beyond these descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Search for a contact on a live channel') and distinguishes it from sibling tools like contacts.sync by specifying the use case ('before adding them').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use this tool ('before calling contacts.sync') and provides context for lookup by username or phone. It lacks explicit exclusion criteria but is clear enough for decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_findA
Read-onlyIdempotent
Inspect

👤 Search for contacts in your address book by name or username.

When to use:

  • User asks 'find contact X' or 'who is Y?'

  • User wants to know someone's username or ID

  • Before sending a message to verify contact exists

  • To get contact's channel reference for messaging

Examples: ❓ User: 'find contact named [name]' → contacts_search(query='[name]', limit=5)

❓ User: 'who is [full name]?' → contacts_search(query='[full name]', limit=1)

❓ User: 'search for @username' → contacts_search(query='username', limit=10)

Returns: name, username, channel, channel_ref, similarity_score, match_type. Plus:

  • entity_id: local DB key — pass to contacts.profile. Null for live-discovered contacts (skip contacts.profile for those).

  • telegram_user_id (when channel='telegram'): the Telegram user ID — pass to calls.make / messages.send. NOT entity_id.

Parameters (JSON Schema)

  • query (required): Name or username to search for (supports partial matches)
  • limit (optional): Maximum number of results to return
  • channel (optional): Filter by channel. OMIT to search across all channels.
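
The entity_id / telegram_user_id split is the easiest thing to get wrong here, so a hedged sketch (hypothetical call_tool helper from the first sketch; the response is assumed to be a list of matches carrying the fields named above):

    results = call_tool("contacts_find", {"query": "@username", "channel": "telegram"})
    c = results[0]  # assuming a list of matches
    # entity_id is a local DB key: pass it to contacts_profile, and only when non-null.
    if c.get("entity_id") is not None:
        call_tool("contacts_profile", {"contact_id": c["entity_id"]})
    # For calls.make / messages.send on Telegram, use c["telegram_user_id"], NOT entity_id.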
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds significant behavioral context beyond annotations: explains return fields (entity_id and telegram_user_id) with specific guidance on using them with other tools. No contradiction with readOnlyHint, openWorldHint, etc.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with emoji summary, clear sections, and examples. No wasted sentences; each part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a search tool with no output schema. Covers when to use, return fields and their semantics (including behavior of entity_id for different match types), and examples.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds value by showing parameter usage in examples (e.g., query='[name]', limit=5), enhancing understanding of how to invoke the tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Search for contacts in your address book by name or username' with specific verb and resource. It distinguishes from sibling contact tools like contacts_profile or contacts_add_channel through examples and usage context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' scenarios (e.g., user asks for a contact, before sending a message) and examples. Does not mention alternatives, but usage context is clear enough for correct selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_profileA
Read-onlyIdempotent
Inspect

👤 Get full profile for a contact: all channel identities, notes, role, capabilities, birthday.

When to use:

  • After contacts.find to get complete info about a specific person

  • To see all channels a contact is reachable on

  • To read notes, role, or capabilities for a contact

Requires contact_id (entity_id) from contacts.find.

Parameters (JSON Schema)

  • contact_id (required): entity_id from contacts.find
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. The description adds value by detailing the returned data fields. It does not contradict annotations. No mention of error cases, but acceptable for a simple read tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, using an emoji, bullet points, and front-loading key info. Every sentence adds value with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers purpose, usage, and expected return fields. It lacks explicit return structure but is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the only parameter. The description reinforces that contact_id comes from contacts.find, adding minimal extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a full profile for a contact, listing specific types of information (channels, notes, role, etc.). It distinguishes itself from sibling tools like contacts.find by indicating it is used after that tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit when-to-use scenarios (after contacts.find, to see channels, etc.) and prerequisites (requires contact_id from contacts.find). However, it does not explicitly state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_syncA
Inspect

Add a discovered contact and open a conversation thread. Returns thread_id for the new conversation. Call contacts.discover first to verify the contact exists.

Parameters (JSON Schema)

  • channel (required): Channel name: telegram, whatsapp, etc.
  • identifier (required): Username or phone number to add
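
The discover-then-sync handshake in sketch form (hypothetical call_tool helper from the first sketch; the identifier is a placeholder and the truthiness check is an assumption about the discover result):

    # Verify the contact exists on the live channel before adding them.
    found = call_tool("contacts_discover", {"query": "@newperson", "channel": "telegram"})
    if found:  # assuming a truthy result signals a match
        created = call_tool("contacts_sync", {"channel": "telegram", "identifier": "@newperson"})
        thread_id = created["thread_id"]  # thread_id is the documented return value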
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-read-only and non-destructive behavior. Description adds that it opens a conversation and returns a thread_id, but doesn't detail side effects (e.g., duplicate handling). Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with main purpose, zero waste. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers key aspects: purpose, return value, prerequisite. Lacks edge cases (e.g., existing contact), but given simplicity and good schema/annotations, it's sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for both parameters. Description adds no extra meaning beyond schema. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (add discovered contact and open conversation), resource (contact/thread), and return value (thread_id). Distinguishes from siblings by referencing prerequisite contacts.discover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisite (call contacts.discover first) and the action. Lacks explicit when-not-to-use or alternatives, but context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_updateA
Inspect

✏️ Update a contact's profile: name, notes, role, capabilities, birthday, preferred channel.

When to use:

  • User wants to add notes about a contact

  • User wants to set/update role or capabilities for a contact

  • User wants to rename a contact or update birthday

Requires contact_id (entity_id) from contacts.find. At least one optional field must be provided.

ParametersJSON Schema
NameRequiredDescriptionDefault
roleNoContact role (e.g. developer, client, partner). Empty string clears role.
notesNoFree-text notes/context about this contact. Empty string clears notes.
contact_idYesentity_id from contacts.find
birthday_dayNoBirth day 1-31 (must be set together with birthday_month)
capabilitiesNoList of capabilities (e.g. ['backend', 'design'])
display_nameNoNew display name (max 255 chars)
birthday_yearNoBirth year 1900-2100 (optional, standalone)
birthday_monthNoBirth month 1-12 (must be set together with birthday_day)
preferred_channelNoPreferred channel for contacting this person. OMIT to leave the preferred channel unchanged.
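
Because birthday_day and birthday_month must travel together, a sketch of a valid update (hypothetical call_tool helper from the first sketch; the ID and field values are placeholders):

    call_tool("contacts_update", {
        "contact_id": "entity-789",          # entity_id from contacts.find
        "notes": "Met at the Lisbon meetup",
        "birthday_day": 14,                  # must be set together with birthday_month
        "birthday_month": 3,
        "role": "",                          # empty string clears the role
    })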
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate mutation (readOnlyHint=false, destructiveHint=false). The description adds context that empty strings clear role/notes, and that at least one optional field must be provided. This goes beyond the annotations by clarifying update behavior and constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: a single introductory line followed by a bulleted 'When to use' section and a sentence about requirements. No redundant information; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 9 parameters (only 1 required) and no output schema, the description covers usage intent, prerequisites, and the constraint that at least one optional field must be provided. It is sufficiently complete for an update tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the 'at least one optional field' constraint not in the schema, and clarifies that omitting preferred_channel leaves it unchanged. This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Update a contact's profile' and lists specific updatable fields, clearly indicating what the tool does. It distinguishes from sibling tools like contacts_find and contacts_profile by focusing on updating existing contacts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' scenarios (adding notes, setting role/capabilities, renaming, updating birthday) and states a prerequisite (contact_id from contacts.find). It lacks explicit when-not-to-use guidance but gives clear context for typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

documents_createA
Inspect

Generate a document (PDF / PPTX / DOCX / HTML) from markdown content authored by you.

REQUIRED parameters:

  • title: Short human-readable title.

  • content_markdown: The body. Slides separated by --- on its own line at the top level (Marp rule). Tables, code, lists, footnotes, definition lists, and {.section-header} class attrs all parse.

  • format: "document" (single flowing body) or "presentation" (slides).

  • output_type: "pdf", "pptx", "docx", or "html".

Optional:

  • theme: "default" | "corporate" | "minimal" | "pitch" | "invoice" | "contract" | "cinema" | "editorial" (default "default"). cinema/editorial are presentation-only (engine=marp).

  • language: BCP-47 tag (default "en"). Drives font fallback for Cyrillic/CJK/Arabic content.

  • engine: "marp" | "weasyprint". For format=presentation PDF/HTML only. Default "marp" (designer-grade Chromium renderer with full CSS3, web layout, and {.cover}/{.hero}/{.split}/{.stats}/{.dark} layout classes). Pass "weasyprint" for the legacy print-CSS path. Rejected for format=document or output_type=pptx.

DELIVERY CONTRACT (CRITICAL): After this tool returns a file_id, deliver the file by calling messages.send(attachments=[file_id], text="<short caption>"). Do NOT embed the file_id in a markdown link, a sandbox: URL, or /api/files/<id>/download text — those render as plain text on the recipient's channel, not as a file attachment. The attachments parameter is the ONLY way the file actually attaches.
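
To make that contract concrete, a minimal sketch of the create-then-deliver flow, reusing the hypothetical call_tool helper from the first sketch; the title and markdown body are placeholders, while attachments and text are the parameters documented above:

    doc = call_tool("documents_create", {
        "title": "Q1 Report",
        "format": "document",
        "output_type": "pdf",
        "content_markdown": "# Q1 Report\n\nRevenue grew 12% quarter over quarter.",
    })
    # Deliver via the attachments parameter: the only way the file actually attaches.
    call_tool("messages_send", {
        "attachments": [doc["file_id"]],   # file_id is the documented return value
        "text": "Q1 report attached.",
    })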

CONVENTIONS:

  • Two-column slide: wrap with ::: cols\n::: col\n…\n:::\n::: col\n…\n:::\n:::.

  • Speaker notes (presentations only): ::: notes\n…\n::: at the end of a slide block. NOT <!-- ... --> (comments are escaped, not captured).

  • Section header slide: {.section-header} on its own line directly above the heading. Block-attr form, not inline.

  • Images: only ![](file:NNN) (workspace file_id), data:image/... URIs, or hosts in DOCUMENTS_MEDIA_URL_ALLOWLIST. Other URLs are dropped with [image removed].

LAYOUT CLASSES (engine=marp only — ignored under engine=weasyprint):

  • {.cover} — title-slide layout (centered headings, gradient background).

  • {.hero image="file:NNN"} — full-bleed background image with dark overlay and white headline.

  • {.split image="file:NNN"} — 50/50 image left, content (heading/bullets) right.

  • {.stats} — 3-up KPI cards: each card is ### big-number followed by a one-line label paragraph.

  • {.dark} / {.invert} — per-slide dark mode override. Both image="file:NNN" and image=file:NNN are accepted (quoted or unquoted). Place the class line on its own row directly above the slide content.

Format × output_type rules:

  • document + pptx is rejected — set format=presentation or pick pdf/docx/html.

  • theme=invoice/contract + output_type=pptx silently uses the default PPTX master.

For theme="invoice", every invoice MUST include a "Total" row whose value equals sum(line items) + tax (within ±0.01). The renderer fails closed on missing or mismatched totals.
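Since the renderer fails closed on a missing or mismatched total, it can be worth verifying the arithmetic before rendering. A minimal self-contained check, using the exemplar's placeholder figures:

    # Placeholder line items: (qty, unit_price). Values mirror the exemplar below.
    line_items = [(1, 1500.00), (2, 500.00)]
    subtotal = sum(qty * price for qty, price in line_items)   # 2500.00
    tax = round(subtotal * 0.20, 2)                            # 500.00
    stated_total = 3000.00                                     # the "Total" row you render
    assert abs(stated_total - (subtotal + tax)) <= 0.01        # renderer's ±0.01 tolerance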

EXEMPLAR — invoice (English):

Invoice INV-{YYYYMMDD-HHMMSS}

From: {Issuer Legal Name}, {Address}, {Tax ID}
To: {Customer Name}, {Customer Address}, {Customer Tax ID}
Issue date: {YYYY-MM-DD}
Due date: {YYYY-MM-DD}

| Description | Qty | Unit price | Total |
| --- | ---: | ---: | ---: |
| {Service 1} | 1 | 1500.00 | 1500.00 |
| {Service 2} | 2 | 500.00 | 1000.00 |

Subtotal: USD 2500.00
Tax (20%): USD 500.00
Total: USD 3000.00

Payment: {bank details OR crypto wallet — never both}

EXEMPLAR — invoice (Russian):

Счёт-фактура № INV-{YYYYMMDD-HHMMSS}

От: {Юридическое название организации}, {Адрес}, ИНН {Tax ID}
Кому: {Название клиента}, {Адрес клиента}, ИНН {Tax ID}
Дата: {YYYY-MM-DD}
Срок оплаты: {YYYY-MM-DD}

| Описание | Кол-во | Цена | Сумма |
| --- | ---: | ---: | ---: |
| {Услуга 1} | 1 | 1500.00 | 1500.00 |
| {Услуга 2} | 2 | 500.00 | 1000.00 |

Подытог: USD 2500.00
НДС (20%): USD 500.00
Итого: USD 3000.00

Реквизиты: {банковские реквизиты ИЛИ криптокошелёк — не оба сразу}

EXEMPLAR — contract (English):

Service Agreement

Between: {Provider Legal Name}, {Address} ("Provider")
And: {Client Legal Name}, {Address} ("Client")
Effective date: {YYYY-MM-DD}

1. Scope of services

{Concise description of what Provider agrees to deliver.}

2. Term

This Agreement begins on the Effective date and continues until {termination condition or end date}.

3. Compensation

Client pays Provider {amount and currency} according to {payment schedule}.

4. Confidentiality

Both parties agree to keep proprietary information of the other party confidential during and after the term of this Agreement.

5. Termination

Either party may terminate with {N} days' written notice.

6. Governing law

{Jurisdiction}.


Provider: ____________________
{Provider signatory name}

Client: ____________________
{Client signatory name}

EXEMPLAR — contract (Russian):

Договор оказания услуг

Между: {Юридическое название Исполнителя}, {Адрес} ("Исполнитель")
И: {Юридическое название Заказчика}, {Адрес} ("Заказчик")
Дата вступления в силу: {YYYY-MM-DD}

1. Предмет договора

{Краткое описание услуг, которые Исполнитель обязуется оказать.}

2. Срок действия

Договор вступает в силу с указанной даты и действует до {условие прекращения или дата окончания}.

3. Стоимость и порядок оплаты

Заказчик оплачивает услуги Исполнителя в размере {сумма и валюта} в порядке {график платежей}.

4. Конфиденциальность

Стороны обязуются сохранять конфиденциальность сведений, полученных в ходе исполнения настоящего Договора, в течение срока его действия и после его прекращения.

5. Расторжение

Любая из сторон вправе расторгнуть Договор, направив письменное уведомление не менее чем за {N} дней.

6. Применимое право

{Юрисдикция}.


Исполнитель: ____________________
{ФИО подписанта Исполнителя}

Заказчик: ____________________
{ФИО подписанта Заказчика}

Parameters (JSON Schema)

  • title (required): Short human-readable title for the document.
  • format (required): 'document' for a single flowing body, 'presentation' for slides.
  • output_type (required): Renderer target: 'pdf' | 'pptx' | 'docx' | 'html'.
  • content_markdown (required): Markdown body authored by the agent. Slides separated by '---' on its own top-level line.
  • theme (optional, default "default"): Visual theme. invoice/contract trigger the corresponding exemplar styling.
  • language (optional, default "en"): BCP-47 language tag (e.g. 'en', 'ru', 'zh', 'ja'). Drives font fallback for non-Latin scripts.
  • engine (optional): PDF/HTML engine for presentations. 'marp' (default for format=presentation) renders via headless Chromium with full CSS3, web fonts, and layout classes (.cover, .hero, .split, .stats, .dark). 'weasyprint' is the legacy renderer. Rejected for output_type=pptx (PPTX always uses python-pptx for editable text; use output_type=pdf or html, or omit the engine parameter). Rejected for format=document (always weasyprint). OMIT to use the per-format default engine.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behavioral traits such as that it returns a file_id requiring attachment via messages_send, rejection scenarios, and invoice validation. Annotations already indicate non-readonly and non-destructive, and the description adds depth without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (purpose, required, optional, delivery contract, conventions, examples). Every sentence adds value for a complex tool. Appropriate length given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and complex interactions, the description is extremely complete. Covers all parameters, edge cases, delivery instructions, and provides templates for invoices and contracts. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 100% schema coverage, the description adds significant meaning: parameter interactions (engine+format rules), theme effects, language font fallback, and content_markdown syntax. Goes well beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates documents in multiple formats from markdown content, specifying the verb 'Generate' and the resource 'document'. It distinguishes from sibling tools by focusing on document creation, which is unique among the listed siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use each parameter, required vs optional, format/output_type rules, and even delivery contract. Includes examples and discusses rejected combinations like document+pptx, offering clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

feedback_saveA
Inspect

Save a behavioral rule, preference, or correction that should guide future agent behavior. Use this when the user gives explicit guidance like 'always reply in Russian', 'don't suggest meetings before 11am', or 'invoice link goes via email, not chat'. Structure the rule as: the rule itself, why it matters (if stated), and how to apply it. Scope: 'workspace' for org-wide rules, 'agent' for per-agent overrides, 'person' for per-contact preferences. Prefer feedback.save over notes.save for anything that's instructive rather than informational.

Parameters (JSON Schema)

  • key (required): Short identifier for this rule (e.g. 'reply_language', 'meeting_hours'). Must not start with '__' (reserved).
  • rule (required): The rule itself, in imperative form.
  • scope (required): Scope of the rule. 'workspace' for org-wide rules; 'agent' for per-agent overrides; 'thread' for conversation-specific guidance; 'person' for per-contact preferences. 'global' is accepted as a deprecated alias for 'agent'.
  • why (optional): Why this rule matters (recommended for the distiller).
  • how_to_apply (optional): When/how to apply the rule. Helpful for conditional rules like 'apply when speaking to Russian-speaking customers'.
  • scope_ref_id (optional): Required for scope='thread' (thread_id) and scope='person' (person_id).
  • target_agent_id (optional): Target agent. In agent mode optional (defaults to self); required from MCP. Ignored when scope='workspace'.
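
A sketch of saving a person-scoped rule (hypothetical call_tool helper from the first sketch; the IDs and rule text are placeholders):

    call_tool("feedback_save", {
        "key": "reply_language",
        "rule": "Always reply to this contact in Russian.",
        "why": "Contact explicitly asked for Russian replies.",
        "how_to_apply": "Apply to every outbound message in this person's threads.",
        "scope": "person",
        "scope_ref_id": "person-42",   # required for scope='person'
    })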
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are neutral (readOnlyHint=false, destructiveHint=false), and the description explains the tool creates behavioral guidance for future agent actions. It adds context about how the rule will be used (by distiller) and what types of input are expected. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a brief scope list cover the essential points: purpose, usage, structure, and sibling differentiation. Every sentence adds value; no fluff. Key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, usage, structure, scopes, and sibling differentiation. However, it does not mention the 'target_agent_id' parameter, though it is described in the schema. Still, for a save tool with good schema coverage, the description is nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description adds significant value by explaining the intention behind 'key', 'why', 'how_to_apply', and examples for 'scope' values. It elaborates on the rule structure and usage beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it saves behavioral rules/preferences/corrections for future agent behavior, provides concrete examples, and explicitly distinguishes from notes.save as 'instructive rather than informational'. This differentiates it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when the user gives explicit guidance...' and advises 'Prefer feedback.save over notes.save for anything instructive rather than informational.' It also explains how to structure the rule and scope options, providing clear when-to-use and alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_get_base64A
Read-onlyIdempotent
Inspect

Download one or more files server-side and return their content as base64-encoded strings. Use this to inspect images, PDFs, or any binary file attached to messages when you cannot access presigned S3 URLs directly. Supports up to 5 files per call, max 15 MB each. For large files, batch in groups of 1-2 to avoid oversized responses.

Parameters (JSON Schema)

  • file_ids (required): List of file IDs to fetch as base64 (max 5). Get IDs from files.info or message attachment_file_ids.
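
A sketch of the size-aware batching the description recommends (hypothetical call_tool helper from the first sketch; the IDs, the 5 MB threshold, and the response shape — a list of {file_id, byte_size, ...} records — are assumptions):

    infos = call_tool("files_info", {"file_ids": [101, 102, 103]})   # placeholder IDs
    large = [f["file_id"] for f in infos if f["byte_size"] > 5_000_000]
    small = [f["file_id"] for f in infos if f["byte_size"] <= 5_000_000]
    for i in range(0, len(large), 2):                 # 1-2 large files per call
        call_tool("files_get_base64", {"file_ids": large[i:i + 2]})
    if small:
        call_tool("files_get_base64", {"file_ids": small[:5]})   # never more than 5 per call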
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds context about server-side download, base64 encoding, and size limits, which are not in annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences that are front-loaded: purpose, use case, constraints, advice. Every sentence adds essential information with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool with one parameter, comprehensive annotations, and detailed description covering when, how, and limits, the description is fully adequate for correct agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with description for file_ids. The description adds value by explaining where to get the IDs (files.info or message attachment_file_ids), which goes beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'download', resource 'files', and return format 'base64-encoded strings'. It also distinguishes from siblings by referencing presigned S3 URLs, which differentiates it from other file retrieval tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use case ('when you cannot access presigned S3 URLs directly') and constraints (5 files, 15 MB each, batching advice). However, it does not explicitly state when not to use or name alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_infoA
Read-onlyIdempotent
Inspect

Get metadata and download URLs for files by their IDs.

When to use:

  • After messages_read_history returns attachment_file_ids

  • To get a presigned download URL to read a received file

Returns: filename, mime_type, byte_size, download_url (1-hour presigned URL).

Parameters (JSON Schema)

  • file_ids (required): List of file IDs (max 20)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds behavioral details beyond annotations: it specifies the return value includes filename, mime_type, byte_size, and a 1-hour presigned download URL, which is crucial for agent understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, using four short lines to convey purpose, usage, and return values. Every sentence adds value with no redundancy or wordiness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the presence of comprehensive annotations, and full schema coverage, the description is complete. It explains when to use it and what is returned, leaving no gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the single parameter 'file_ids'. The description does not add additional semantics beyond 'List of file IDs (max 20)', so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get metadata and download URLs for files by their IDs', providing a specific verb and resource. It does not explicitly differentiate from sibling tools like files_read or files_get_base64, but the purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit usage guidance: 'After messages_read_history returns attachment_file_ids' and 'To get a presigned download URL'. It lacks mention of when not to use or alternatives, but the provided context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_ingestA
Inspect

Save and index a file into the knowledge base. Use this when the user asks to save, store, or remember a document. The file will be processed (OCR if needed) and indexed for future search.

ParametersJSON Schema
NameRequiredDescriptionDefault
tagsNoOptional list of tags for categorization (e.g., ['presentation', 'dextrade']).
titleNoHuman-readable title for the file (e.g., 'Project Presentation', 'Q1 Report'). If not provided, uses original filename.
file_idYesID of the file to ingest (from attachment_file_ids in context).
thread_idNoOptional thread ID to associate the file with. If not provided, uses context thread.
descriptionNoOptional description of the file contents.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false) already indicate mutability, but the description adds valuable context: the file is processed, OCR may be applied, and it is indexed for future search. This goes beyond annotations in disclosing side effects like indexing and processing behaviors.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each serving a distinct purpose: stating the core action, providing usage guidance, and explaining the processing. No filler words or redundancy, making it front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the purpose, usage context, and processing steps. However, it does not mention prerequisites (e.g., that the file must already exist via file_id) or specify the return value (absence of output schema). It is nearly complete for a straightforward ingest tool but could hint at expected outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all 5 parameters, providing clear definitions. The tool description does not add additional parameter insights beyond the schema, so it meets the baseline of 3 as per the high coverage guideline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Save and index a file into the knowledge base,' specifying the action and resource. It distinguishes from sibling tools like files_upload (which does not imply indexing) and files_read by focusing on saving and remembering. The verb 'ingest' with indexing is unique among file-related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when the user asks to save, store, or remember a document,' providing clear context for invocation. However, it does not explicitly state when not to use it or mention alternative tools for other file operations, missing a chance to guide the agent away from inappropriate uses.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_readA
Read-onlyIdempotent
Inspect

Read text content of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use files.get_base64; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run files.ingest first, THEN files.read. Calling it on a binary MIME type returns an error, so reading this routing hint first saves you a turn.

Parameters (JSON Schema)

  • file_id (required): ID of the file to read (from attachment_file_ids in context).
  • encoding (optional, default utf-8): Text encoding to use.
  • max_chars (optional): Maximum characters to return (default: 10000). Use smaller values for large files.
  • summarize (optional): If true, generate AI summary instead of returning raw content. Use for 'summary', 'summarize', or equivalent requests in other languages (e.g. Russian 'краткое содержание'). OMIT to return raw content (the default).
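
The routing rules above, condensed into a sketch (hypothetical call_tool helper from the first sketch; the file ID is a placeholder and the list-shaped files_info response is an assumption):

    info = call_tool("files_info", {"file_ids": [321]})[0]   # placeholder ID
    mime = info["mime_type"]
    if mime.startswith("image/"):
        call_tool("files_get_base64", {"file_ids": [321]})   # images: base64 route
    elif mime == "application/pdf":
        call_tool("files_ingest", {"file_id": 321})          # extract text first
        call_tool("files_read", {"file_id": 321, "max_chars": 5000})
    elif mime.startswith("text/") or mime == "application/json":
        call_tool("files_read", {"file_id": 321})            # plain text/code: read directly
    # audio/video cannot be transcribed via files_read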
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and non-destructive. The description adds context: PDFs require prior ingestion, binary files cause errors, and the summarize option exists. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose and provides essential guidance in a structured manner. While slightly verbose, every sentence adds value, balancing completeness with brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (file type restrictions, error handling, multiple parameters, no output schema), the description is fully complete. It covers supported types, prerequisites, alternatives, error behavior, and parameter usage, leaving no ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value beyond parameter descriptions: it explains the summary parameter triggers on specific requests, and advises using smaller max_chars for large files. This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads text content of attached files and lists supported types (.txt, .md, .json, code, PDFs). It explicitly distinguishes from sibling tools for binary files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use and when-not-to-use guidance: for images use files.get_base64, for audio/video it cannot transcribe, and for non-PDF documents run files.ingest first. It also warns that calling on binary returns an error.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_uploadA
Inspect

Upload a file to DialogBrain and get a file_id for use in messages_send.

When to use:

  • User wants to send a file/image to a contact

  • Before calling messages_send with an attachment

Returns: file_id (integer) to pass to messages_send attachments parameter.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleNoOptional display title
contentNoBase64-encoded file bytes. Either content OR source_url is required.
filenameNoFilename with extension (e.g. 'photo.png')upload
mime_typeNoMIME type (e.g. 'image/png', 'application/pdf')application/octet-stream
source_urlNoPublic URL to fetch file from. Either content OR source_url is required.
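
The upload-then-send flow in sketch form (hypothetical call_tool helper from the first sketch; the URL is a placeholder, and file_id / attachments are as documented above):

    uploaded = call_tool("files_upload", {
        "source_url": "https://example.com/photo.png",   # either content OR source_url
        "filename": "photo.png",
        "mime_type": "image/png",
    })
    call_tool("messages_send", {
        "attachments": [uploaded["file_id"]],   # documented return value
        "text": "Here is the photo.",
    })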
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are present: readOnlyHint=false (write operation), destructiveHint=false, idempotentHint=false. The description adds context by revealing the return value (file_id integer) and its purpose, which complements the annotations. No contradictions, and the description provides useful behavioral insight beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with three distinct sections: overall action, when to use, and return value. No redundant information. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-parameter tool with no output schema, the description adequately explains the workflow and return value. It does not cover possible errors or limitations, but it provides sufficient context for the primary use case (upload before sending). The schema compensates for parameter details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers all 5 parameters with descriptions (100% coverage). The description does not add additional meaning to the parameters beyond what the schema provides. The return value is mentioned but not the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'upload a file', the resource 'DialogBrain', and the purpose 'get a file_id for use in messages_send'. It distinguishes this tool from siblings like files_get_base64 or files_read by specifying its role as a prerequisite for sending messages with attachments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('User wants to send a file/image to a contact', 'Before calling messages_send with an attachment') and provides a clear usage pattern. However, it does not mention when not to use it or compare to alternatives like files_ingest.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_createA
Inspect

📁 Create a new inbox folder to organize threads.

When to use:

  • User wants to create a folder to group related conversations

  • User wants to organize threads by topic, project, or contact type

After creating a folder, use threads.update with folder_id to move threads into it.

ParametersJSON Schema
NameRequiredDescriptionDefault
iconNoEmoji icon for the folder (max 10 chars, optional)
nameYesFolder name (max 100 chars)
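
A sketch of the create-then-file flow the description points to (hypothetical call_tool helper from the first sketch; the thread ID is a placeholder, and since the return value is undocumented, the folder_id response field is an assumption):

    folder = call_tool("folders_create", {"name": "Project Phoenix", "icon": "🚀"})
    call_tool("threads_update", {
        "thread_id": "thread-7",             # placeholder
        "folder_id": folder["folder_id"],    # assumed response field
    })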
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-readonly and non-destructive. The description adds value by explaining the folder is for inbox threads and that after creation, threads must be moved via threads.update – behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by concise usage guidelines and a helpful next-step hint. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple creation tool with no output schema, the description covers usage and follow-up actions but omits return value (e.g., folder ID) and error conditions. Adequate but incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add meaning beyond the schema's parameter descriptions (name max 100 chars, optional icon with emoji and max 10 chars).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a new inbox folder to organize threads. It uses a specific verb ('Create') and resource ('folder'), and distinguishes it from sibling tools like folders_delete by focusing on creation and providing follow-up steps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use (group conversations, organize threads) and suggests a next step (threads.update). It does not explicitly exclude alternatives or provide a when-not-to-use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_deleteA
Inspect

🗑️ Delete an inbox folder. Threads inside become unfiled (not deleted).

When to use:

  • User wants to remove a folder they no longer need

  • User wants to clean up their inbox organization

Threads inside the folder are NOT deleted — they simply move back to the inbox.

Parameters (JSON Schema)

  • folder_id (required): ID of the folder to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that threads become unfiled, adding value beyond annotations which only indicate non-destructive. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short, uses an emoji, and bullet points for clarity. The 'When to use' section is helpful but slightly verbose; overall well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input schema, lack of output schema, and straightforward behavior, the description provides all necessary context including side effects (unfiling threads). Complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter folder_id. The description does not add additional meaning beyond the schema, justifying a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool deletes an inbox folder and specifies that threads inside become unfiled, not deleted. This is a specific verb+resource and distinguishes it from related tools like folders_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases: user wants to remove a folder or clean up inbox organization. It also clarifies that threads are not deleted, but does not mention when not to use or compare to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_add (A)

Add a specific group to your discovery list by @username or invite link (t.me/...).

When to use:

  • You already know the group's @username or invite link

  • Adding a known group without searching

Returns: group metadata including id, title, member_count.

Parameters (JSON Schema)

  • link (required): The group's @username or invite link (e.g. '@phuket' or 't.me/...')

  • channel (required): Channel the group is on (e.g. 'telegram')
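
To make the call shape concrete, here is a minimal TypeScript sketch of the argument object; the field names come from the schema above, and the values are illustrative (the '@phuket' handle is the schema's own example):

  // Hypothetical arguments for group_discovery_add; values are made up.
  const addArgs: { link: string; channel: string } = {
    link: "@phuket",      // or an invite link of the t.me/... form
    channel: "telegram",  // channel the group lives on
  };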
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructiveHint=false and readOnlyHint=false, which are consistent with a write operation. The description adds context about the return value (group metadata). No contradictions, but it could disclose potential duplication behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at 4 sentences, front-loaded with the main action. Every sentence adds value, and the structure is clear and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers the return value adequately. Parameters are fully described in the schema. However, it does not address what happens if the group already exists in the list.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents both parameters. The description adds minimal extra context (e.g., 'by @username or invite link' for the link parameter), which is helpful but not substantial beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Add', the resource 'group to your discovery list', and the method 'by @username or invite link'. It distinguishes from sibling tools like group_discovery_search by emphasizing adding a known group without searching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' section with specific conditions (already know the link, adding without searching). It implies when not to use (if searching is needed) but does not explicitly name the alternative sibling tool, slightly reducing clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_join (A)

Join a group and start syncing its messages to your inbox. The group must be in your discovery list (use group_discovery.search or group_discovery.add first).

What this does:

  • Joins the group on Telegram (or other channel)

  • Creates a thread in your inbox for syncing messages

  • Optionally enables AI auto-reply drafts

Returns: success, thread_id, auto_reply_enabled.

Parameters (JSON Schema)

  • group_id (required): ID of the discovered group (from group_discovery.search or group_discovery.list)

  • enable_auto_reply (optional): Enable AI auto-reply drafts for messages in this group. Drafts can be reviewed and sent manually. Default: true.
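
A minimal sketch of how the documented default interacts with an explicit override, assuming a plain argument object for the call; the group_id value is invented:

  // Hypothetical arguments for group_discovery_join; group_id would come
  // from a prior group_discovery.search or group_discovery.list result.
  const joinArgs: { group_id: string; enable_auto_reply?: boolean } = {
    group_id: "grp_123",      // invented ID for illustration
    enable_auto_reply: false, // opt out of the documented default of true
  };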
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Details what the tool does: joins the group, creates a thread, optionally enables auto-reply drafts. Annotations (readOnlyHint=false) agree that it is a write operation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main action, uses bullet points for clarity, and includes return values. Every sentence is necessary and no superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explicitly lists return values (success, thread_id, auto_reply_enabled). Prerequisite is clear. Tool is a specific action with defined inputs and outputs; description is complete for accurate invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions. The description adds context by specifying the source for group_id and explaining that auto-reply drafts are for manual review (not in schema). Adds value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool joins a group and syncs messages. It distinguishes itself from sibling group_discovery_* tools (search, list, add) which operate on the discovery list, whereas this tool acts on a discovered group to join it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states the prerequisite that the group must be in the discovery list, referencing alternative tools (group_discovery.search or group_discovery.add) for that step. Provides clear context on when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_list (A)
Annotations: Read-only, Idempotent

List groups you've found and joined in this workspace.

Lifecycle values:

  • discovered: found but not yet evaluated

  • bookmarked: saved for later

  • monitored: joined and actively syncing messages

  • dismissed: hidden

By default, dismissed groups are excluded. Returns: id, title, member_count, lifecycle, scan_status, overall_score.

Parameters (JSON Schema)

  • limit (optional): Maximum number of results (1-100, default 20)

  • offset (optional): Pagination offset. OMIT to start at row 0 (default).

  • channel (optional): Filter by channel (e.g. 'telegram').

  • lifecycle (optional): Filter by state: discovered, bookmarked, monitored (=joined/syncing), dismissed. OMIT to include all states (dismissed excluded by default elsewhere).

  • min_score (optional): Minimum overall score (0.0-1.0).
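
As an illustration of how the filters compose, a hypothetical argument object requesting the second page of high-scoring, actively synced Telegram groups (all values invented):

  // Hypothetical arguments for group_discovery_list combining the filters.
  const listArgs: {
    limit?: number;
    offset?: number;
    channel?: string;
    lifecycle?: string;
    min_score?: number;
  } = {
    limit: 20,              // the documented default page size
    offset: 20,             // skip the first page (offset 0 is the default)
    channel: "telegram",
    lifecycle: "monitored", // joined and actively syncing
    min_score: 0.7,         // only well-scored groups
  };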
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already signal read-only, idempotent, non-destructive. Description adds default filtering, lifecycle meanings, and return fields, providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with purpose, then structured breakdown of lifecycle, defaults, and output. Every sentence is essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and rich annotations, the description sufficiently covers tool behavior, filtering, and return fields, making it complete for a list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters. Description adds contextual value by explaining lifecycle semantics and default exclusion of dismissed, which clarifies parameter behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb (list) and resource (groups you've found and joined in workspace), with explicit lifecycle values and default exclusion. Distinguishes from siblings like group_discovery_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides default behavior (dismissed excluded) but no explicit when-to-use or when-not-to-use guidance relative to sibling tools like group_discovery_search or group_discovery_scan.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_preview_messages (A)
Annotations: Read-only, Idempotent

Read recent public messages from a group without joining it. Only works for groups where can_preview_history=true.

Use this to manually evaluate message quality before deciding to join. For an automated quality score, use group_discovery.scan instead.

Returns: list of recent messages with sender, text, date, is_reply.

Parameters (JSON Schema)

  • limit (optional): Number of recent messages to fetch (1-100, default 20)

  • group_id (required): ID of the discovered group (from group_discovery.search or group_discovery.list)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, destructiveHint=false, idempotentHint=true) already declare safe, idempotent read behavior. The description adds a precondition ('Only works for groups where can_preview_history=true') and describes the output format, providing value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each serving a distinct purpose: action, condition, usage guidance, return description. No redundancy, front-loaded with the core purpose. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with no output schema, the description covers all necessary aspects: purpose, precondition, when to use vs alternative, and return format. It is complete given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions (group_id, limit with default and range). The description does not add new semantic information beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read recent public messages from a group without joining it,' specifying the verb 'read' and resource 'public messages from a group.' It also distinguishes from sibling tool group_discovery_scan, which provides an automated quality score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Use this to manually evaluate message quality before deciding to join. For an automated quality score, use group_discovery.scan instead.' Clearly states when to use and when not, with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_scan (A)

Scan a group to evaluate its quality before joining. Fetches recent messages, analyzes activity, spam, and engagement, then returns a quality score and plain-English verdict.

When to use:

  • After finding groups with group_discovery.search

  • Before deciding which groups to join

Returns: overall_score (0-1), is_disqualified, disqualify_reasons, individual scores, and a verdict string.

Parameters (JSON Schema)

  • group_id (required): ID of the discovered group (from group_discovery.search or group_discovery.list)
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description describes a read-only operation (fetch, analyze, return scores), but annotations set readOnlyHint=false, indicating potential side effects not disclosed. This is a direct contradiction. The description does not explain what state changes occur, failing to provide behavioral transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: a clear opening sentence, then a list of what it does, a 'When to use' section, and a 'Returns' list. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no output schema, the description adequately explains the return values (overall_score, disqualify reasons, etc.). It provides enough context for usage, though it could elaborate on error conditions or prerequisites (e.g., whether the user must already have the group ID from search). Overall, it's nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter description coverage. The description adds value by specifying that the group_id comes from group_discovery.search or group_discovery.list, providing useful context beyond the schema's own description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: scanning a group to evaluate its quality before joining. It specifies the actions (fetches messages, analyzes activity/spam/engagement) and the output (quality score, verdict). It distinguishes from sibling tools like group_discovery_search (finding groups) and group_discovery_join (joining).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' section that explicitly states this tool is for after finding groups and before deciding to join. It provides clear context and references the preceding tool (group_discovery.search). However, it does not mention alternative tools for similar tasks (e.g., group_discovery_preview_messages) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

images_generate (A)

Generates a PNG image from a text prompt using Gemini 2.5 Flash Image. Returns a file_id consumable by messages.send(attachments=[...]) and other file-aware tools. Supports up to 12 reference image file_ids for subject-consistent edits and composition (use file IDs from the [ATTACHMENTS] block, files.search, or workspace.search). Latency: ~8-10s per image. Output: 1024×1024 PNG.

Parameters (JSON Schema)

  • prompt (required): Text description of the image to generate (3-4000 chars).

  • aspect_ratio (optional, default: 1:1): Output aspect ratio.

  • reference_file_ids (optional): List of up to 3 file_ids whose images should be used as visual references (for edits, subject consistency, or composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif).
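
A hedged sketch of a call that respects the schema rather than the prose (see the 12-versus-3 discrepancy flagged under Parameters below); the prompt and file IDs are invented:

  // Hypothetical arguments for images_generate. The schema caps
  // reference_file_ids at 3, so a conservative caller stays within that.
  const imageArgs: {
    prompt: string;
    aspect_ratio?: string;
    reference_file_ids?: string[];
  } = {
    prompt: "A lighthouse at dusk, watercolor style", // 3-4000 chars allowed
    aspect_ratio: "1:1",                              // the documented default
    reference_file_ids: ["file_abc", "file_def"],     // invented IDs, max 3
  };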
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide limited behavioral cues (all false). The description adds latency (~8-10s), output dimensions (1024×1024 PNG), and reference file support. These are valuable beyond annotations and help set expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences efficiently convey purpose, output usage, reference support, latency, and output size. No fluff; front-loaded with main action. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a generation tool without output schema, the description covers what it does, output format, integration points, reference usage, latency, and resolution. It is sufficiently complete for an AI agent to decide and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3, but the description states 'Supports up to 12 reference image file_ids' while the schema explicitly says 'up to 3 file_ids'. This contradiction reduces reliability. The description adds some context on how to obtain file IDs, but the discrepancy hurts the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool generates a PNG image from a text prompt using Gemini 2.5 Flash Image. It clearly distinguishes from siblings like images_search (searching) and videos_generate (video generation) by specifying output type and usage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that the output file_id can be consumed by messages.send and other file-aware tools, and provides context for using reference file IDs with sources. It lacks explicit when-not-to-use or alternatives, but the guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_complete (A)

Mark the job as completed. This sanitizes PII from the context and records a completion summary. Use when all tasks in the job are done.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to complete

  • summary (optional): Brief summary of what was accomplished
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-destructive behavior (destructiveHint=false). The description adds value by mentioning PII sanitization and recording a summary. However, it does not disclose potential side effects like irreversibility or whether the job can be modified after completion, limiting transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: three short sentences that front-load the main action, then add behavioral details and usage guidance. No unnecessary words or redundancy. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no output schema, two optional params), the description covers the essential aspects: purpose, key behaviors, and usage trigger. It could be improved by clarifying whether job_id is required or what happens if called on an already-completed job, but overall it is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for both parameters (job_id, summary). The description references the summary parameter ('records a completion summary') but does not add additional context beyond the schema. Hence, it meets the baseline but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's primary function ('Mark the job as completed') and highlights additional behaviors (sanitizes PII, records summary). While it doesn't explicitly distinguish from sibling tools like job_escalate, the action is specific enough for an agent to understand its purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool: 'Use when all tasks in the job are done.' This is a clear usage condition. However, it lacks information about when not to use or alternatives, which prevents a higher score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_escalate (A)

Escalate the job to a human. Use when you cannot resolve an issue, someone is not responding, or a situation requires human judgment.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to escalate

  • reason (required): Why escalation is needed
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-readonly and non-destructive, but the description adds no additional behavioral context (e.g., what escalation entails, if irreversible, or notification details). The description carries minimal extra value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no fluff. The key information is front-loaded and every word is necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple escalation action with no output schema, the description is sufficient to guide usage. A minor gap could be explaining what happens after escalation (e.g., status change), but it's not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so both parameters already have clear descriptions. The tool description does not add further semantic meaning over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Escalate the job to a human' with specific verb and resource. Distinct from sibling tools like agent_handoff or agents_ask, which are different actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists three scenarios for use (cannot resolve, no response, human judgment). However, it does not provide negative guidance on when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_read_context (A)

Read the current job context. Returns the full state of your active job including assignments, escalations, and any data you previously stored.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to read
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains that the tool returns the full state including assignments, escalations, and previously stored data. This adds value beyond the annotations, which only set destructiveHint and idempotentHint to false. However, readOnlyHint is also false even though the description says 'Read', which is a contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that front-loads the purpose and includes key details. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the return value (full state, assignments, escalations, stored data) which is helpful since there is no output schema. It is sufficient for understanding what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% documentation coverage for the single optional parameter job_id. The description does not add any additional meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read the current job context' with a specific verb and resource, and lists what is returned (assignments, escalations, stored data). This distinguishes it from sibling tools like job_update_context and job_complete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reading context but does not explicitly state when to use this tool versus alternatives, nor does it provide any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_update_context (A)

Update the job context by merging new data. Existing keys are preserved unless explicitly overwritten. Use this to record progress, update assignment statuses, or store intermediate results.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to update

  • updates (required): Key-value pairs to merge into job context
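
To make the merge semantics concrete, a hypothetical call body (the job ID and context keys are invented); keys not listed under updates are left untouched:

  // Hypothetical arguments for job_update_context illustrating the merge:
  // listed keys are written, existing unlisted keys are preserved.
  const updateArgs: { job_id?: string; updates: Record<string, unknown> } = {
    job_id: "job_42",           // invented ID
    updates: {
      stage: "outreach_sent",   // overwrites 'stage' if it already exists
      contacts_reached: 12,     // adds a new key alongside existing ones
    },
  };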
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a mutation (readOnlyHint=false) but provide no destructive or idempotent hints. The description adds merge semantics (existing keys preserved unless overwritten), which is useful beyond the annotations, but it does not cover error states or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences efficiently convey function, merge behavior, and usage advice without wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and lack of output schema, the description adequately explains the update operation and merge behavior. Minor omission: no mention of return value or error handling, but not critical for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions. The description adds context about merge behavior and key preservation, enhancing understanding beyond the schema itself.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates job context via merging, and provides example use cases. However, it does not explicitly differentiate from sibling tools like job_read_context or job_complete, which would strengthen purpose clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises when to use the tool (recording progress, updating statuses, storing results) but lacks explicit guidance on when not to use it or mention of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_find_entity (A)
Annotations: Read-only, Idempotent

Find an entity by name in the Knowledge Graph.

USE WHEN user mentions a person, project, company by name and you need:

  • To resolve a name to entity_id for subsequent queries

  • 'Кто работает над X?' ('Who is working on X?') → find X first

  • 'Расскажи про Y' ('Tell me about Y') → find Y first

RETURNS entity_id for use in kg.get_relationships or kg.explore. ALWAYS use this as the FIRST step in KG query chains.

Parameters (JSON Schema)

  • name (required): Entity name to search for. Can be in any language (Russian, English, etc.); transliteration is automatic.

  • limit (optional): Maximum results to return (1-10). Default: 5

  • entity_type (optional): Filter by entity type: 'person' = people, contacts; 'project' = projects, tasks; 'organization' = companies, teams; 'event' = meetings, deadlines; 'topic' = discussion topics; 'workspace' = user's own facts (my/our company). OMIT to include all entity types.
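
A minimal sketch of the first step in a KG query chain, assuming a plain argument object; the name is invented and the filter is optional:

  // Hypothetical arguments for kg_find_entity: resolve a name to an
  // entity_id before calling kg.get_relationships or kg.explore.
  const findArgs: { name: string; limit?: number; entity_type?: string } = {
    name: "Anna",          // any language; transliteration is automatic
    entity_type: "person", // omit to search all entity types
    limit: 5,              // the documented default
  };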
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already convey readOnly, idempotent, non-destructive behavior. The description adds context that it returns entity_id and is a prerequisite for subsequent KG queries, enriching transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five concise sentences with front-loaded purpose, clear examples, and workflow guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a lookup tool with well-documented schema and annotations, the description provides sufficient context: purpose, usage triggers, return value, and integration with sibling tools. Minor gaps like error handling are acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover all 3 parameters (100% coverage) with clear explanations. The tool description does not add new parameter info but reinforces usage context. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Find an entity by name in the Knowledge Graph' and provides specific usage examples like 'resolve a name to entity_id' and Russian queries, distinguishing it from sibling tools that require entity_id.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'USE WHEN' with concrete scenarios, advises 'ALWAYS use this as the FIRST step', and specifies that the output entity_id should be used with kg.get_relationships or kg.explore, leaving no ambiguity about when to invoke.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_get_relationships (A)
Annotations: Read-only, Idempotent

Get relationships for a specific entity from Knowledge Graph.

USE WHEN:

  • 'Кто работает над X?' ('Who is working on X?') - filter by works_on

  • 'С кем общался Y?' ('Who has Y been talking to?') - filter by discussed_with

  • 'Кто из компании Z?' ('Who is from company Z?') - filter by member_of

  • 'Что связано с W?' ('What is connected to W?') - no filter, get all

REQUIRES: entity_id from previous kg.find_entity step. Use: {{step_N.entity_id}} where N is the find_entity step number.

Parameters (JSON Schema)

  • limit (optional): Maximum relationships to return (1-50). Default: 20

  • direction (optional, default: both): Relationship direction: 'outgoing' = Entity → Others; 'incoming' = Others → Entity; 'both' = all relationships (default)

  • entity_id (required): Entity ID from kg.find_entity step. Use {{step_N.entity_id}} reference.

  • relation_types (optional): Filter by relationship types. People: works_on, works_for, member_of, manages, knows, client_of, provides_service. Communication: discussed_with, participated_in, mentioned_in. Org/Project: developed_by, funded_by, partnered_with, integrates_with, depends_on, part_of. Document: issued_by, issued_to, signed_by, authored_by. Other: uses, located_in, about, follows, owns, related_to.
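
Continuing the chain sketched under kg_find_entity, a hypothetical second-step argument object; the {{step_N.entity_id}} placeholder follows the schema's own convention:

  // Hypothetical arguments for kg_get_relationships, answering a
  // 'who is working on X?' style question.
  const relArgs: {
    entity_id: string;
    relation_types?: string[];
    direction?: string;
    limit?: number;
  } = {
    entity_id: "{{step_1.entity_id}}", // reference to the find_entity step
    relation_types: ["works_on"],      // filter to work relationships
    direction: "both",                 // the documented default
    limit: 20,
  };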
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, so the description correctly adds dependency context (requires entity_id from find_entity) without contradiction. It does not need to repeat the read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is succinct, well-structured with bullet points and sections, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers usage context, dependencies, and filter options well. It lacks explicit return format, but given schema coverage and annotations, it is mostly complete. A minor gap is not describing the output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description ties parameters to use cases (e.g., works_on filter) but does not add new semantic information beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get relationships for a specific entity from Knowledge Graph' and provides concrete usage examples with filters, distinguishing it from the prerequisite tool kg_find_entity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use with specific filters and requires entity_id from a previous step, offering clear context. However, it does not explicitly state when not to use or list alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_query (A)
Annotations: Read-only, Idempotent

Answer questions using knowledge base (uploaded documents, handbooks, files).

Use for QUESTIONS that need an answer synthesized from documents or messages. Returns an evidence pack with source citations, KG entities, and extracted numbers.

Modes:

  • 'auto' (default): Smart routing — works for most questions

  • 'rag': Semantic search across documents & messages

  • 'entity': Entity-centric queries (e.g., 'Tell me about [entity]')

  • 'relationship': Two-entity queries (e.g., 'How is [entity A] related to [entity B]?')

Examples:

  • 'What did we discuss about the budget?' → knowledge.query

  • 'Tell me about [entity]' → knowledge.query mode=entity

  • 'How is [A] related to [B]?' → knowledge.query mode=relationship

NOT for finding/listing files, threads, or links — use workspace.search for that.

Parameters (JSON Schema)

  • mode (optional, default: auto): Query mode: 'auto' (default) = smart routing based on question; 'rag' = pure semantic search with KG boost; 'entity' = GraphRAG for entity queries; 'relationship' = two-entity relationship query; 'graph' = direct KG traversal only

  • style (optional, default: concise): Answer style: concise, detailed, or bullet

  • date_to (optional): Filter messages until this date (ISO format: YYYY-MM-DD).

  • file_ids (optional): Specific file IDs to search within (for pinned files)

  • question (required): The question to answer from user's knowledge base. Required even for entity queries.

  • date_from (optional): Filter messages from this date (ISO format: YYYY-MM-DD). Use for time-based queries like 'this week', 'last month'.

  • thread_id (optional): Limit search to a specific thread/chat

  • query_type (optional): Query classification hint. Skips internal AI analysis when provided.

  • entity_name (optional): Entity name for entity/graph modes (optional for auto mode)

  • max_sources (optional): Maximum number of sources to consider (1-10)

  • entity_names (optional): Both entity names for relationship queries.

  • person_names (optional): Person names mentioned in the query. Used for keyword fallback when RAG misses.

  • search_keywords (optional): Key search terms to enhance RAG matching.

  • needs_aggregation (optional): True if query asks for totals/sums/counts.

  • include_relationships (optional): Include KG relationships in answer (default: true for entity mode)
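
As a sketch of how the pieces fit together, a hypothetical time-bounded query; only question is required, and every other field here just narrows the search (dates and values invented):

  // Hypothetical arguments for knowledge_query: a month-scoped RAG question.
  const queryArgs: {
    question: string;
    mode?: string;
    style?: string;
    date_from?: string;
    date_to?: string;
    max_sources?: number;
  } = {
    question: "What did we discuss about the budget?",
    mode: "rag",             // bypass auto-routing and force semantic search
    style: "bullet",         // answer style: concise, detailed, or bullet
    date_from: "2025-01-01", // ISO dates per the schema
    date_to: "2025-01-31",
    max_sources: 5,          // consider at most 5 sources
  };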
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the safety profile is clear. Description adds that it returns an evidence pack with source citations, KG entities, and extracted numbers, which is valuable behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-organized with clear sections (purpose, when to use, modes, examples, boundaries). Every sentence adds value without redundancy. Front-loaded with core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a query tool with 15 parameters and no output schema. Covers modes, examples, and exclusions. Could mention pagination or token limits, but the max_sources parameter is documented in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for all 15 parameters. The description additionally explains mode semantics and provides examples that go beyond the enum values, adding meaning for parameter selection.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Answer questions using knowledge base' and provides examples distinguishing from workspace.search. The verb 'answer' and resource 'knowledge base' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says NOT for finding/listing files, threads, or links - use workspace.search. Provides mode selection guidance with examples, helping the agent decide when to use different modes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_add_comment (A)

Add a comment to a LinkedIn post. Use post_id from search results or thread data.

Parameters (JSON Schema)

  • text (required): Comment text to post

  • post_id (required): LinkedIn post/activity ID (from search results or thread metadata)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a write operation (readOnlyHint=false). Description adds minor context on post_id source but does not detail further behavioral aspects like authorization or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundant information, front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple parameter set and full schema coverage, the description is mostly complete. It could mention that the comment is posted as the authenticated user, but that is not essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Add a comment') and the target resource ('to a LinkedIn post'), with specific guidance on sourcing the post ID. It distinguishes from YouTube comment tools by platform.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on where to obtain the post_id ('from search results or thread data'). No exclusions or alternatives needed as there is no other LinkedIn comment tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_company (A)
Annotations: Read-only, Idempotent

Get a LinkedIn company profile by company ID or vanity name. Returns company name, description, industry, size, and other details.

Parameters (JSON Schema)

  • identifier (required): Company ID or vanity name
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds specific returned fields (name, description, industry, size) but does not disclose behavioral traits like authentication requirements, rate limits, or error behavior. This modest addition warrants a 3.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences only: first states purpose and identification method, second lists returns. No unnecessary words, front-loaded, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with one parameter, strong annotations, and no output schema, the description is sufficient. It covers what it does, how to identify the company, and what is returned. Missing edge case handling or return format info, but not critical for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with the parameter 'identifier' described as 'Company ID or vanity name'. The description's mention of 'by company ID or vanity name' adds no new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'LinkedIn company profile', and specifies identification by company ID or vanity name. It also lists return fields, distinguishing it from the sibling 'linkedin_get_profile' which targets user profiles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides the identification method (by ID or vanity name) but does not explicitly state when not to use this tool or suggest alternatives like 'linkedin_search' for finding company URIs. The context implies usage, but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_profile (A)
Annotations: Read-only, Idempotent

Get a LinkedIn user profile by ID, public identifier (vanity name), or profile URL. Returns name, headline, location, and other profile information.

Parameters (JSON Schema)

  • identifier (required): LinkedIn member ID, public identifier (vanity name), or full profile URL
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the agent knows it's a safe, idempotent read. The description adds value by specifying the returned fields (name, headline, location, other info), enhancing behavioral understanding beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, front-loaded with the action, and every word is functional. No unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 param, no output schema), the description adequately explains the input and output. It could be more complete by specifying the expected return format (e.g., JSON object) or handling of not-found cases, but it is sufficient for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (the single parameter has a detailed description). The tool description repeats the same identifier options, adding no new semantic meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'Get' and the resource 'LinkedIn user profile', and lists the three identifier formats (ID, vanity name, URL). It distinguishes itself from sibling tools like linkedin_get_company and linkedin_search by clearly focusing on a single profile retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly states when to use the tool (to retrieve a LinkedIn profile). However, it does not mention when not to use it or provide alternative sibling tools like linkedin_search for finding profiles by query.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_invite (A)

Send a connection invitation to a LinkedIn user. Optionally include a personalized message (max 300 characters). Rate limited: LinkedIn allows 80-100 invitations per day, max 200 per week.

Parameters:
  message (optional): Optional personalized invitation message (max 300 characters)
  provider_id (required): LinkedIn provider ID of the person to invite (from search results or profile)
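
Example call (hypothetical; the provider ID is invented and would normally come from search results or a profile):

  • linkedin_invite(provider_id='ACoAAB12AB34', message='Enjoyed your talk at DevConf, would love to connect.')
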
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations by specifying rate limits (80-100 per day, max 200 per week) and the maximum message length. It does not contradict annotations (readOnlyHint false is consistent with a write operation). However, it does not disclose success/failure responses.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loading the primary action and then adding key constraints. Every sentence is valuable and there is no extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with no output schema, the description covers purpose, optional parameter details, and rate limiting. It could be improved by mentioning potential errors or response behavior, but it is sufficient for basic understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides 100% coverage with descriptions for both parameters. The description restates the optional message and its max length but adds no new semantic information beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Send a connection invitation to a LinkedIn user.' This clearly identifies the action and resource, distinguishing it from sibling LinkedIn tools like linkedin_list_invitations_sent or linkedin_add_comment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides general usage context (optional message, rate limits) but does not explicitly guide when to use this tool versus alternatives such as linkedin_list_invitations_sent or linkedin_add_comment. No when-not conditions or alternative recommendations are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_connections (A)
Read-only, Idempotent

List your LinkedIn connections, sorted by most recently added.

Parameters:
  limit (optional): Maximum connections to return
  cursor (optional): Pagination cursor from previous response
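
Example pagination sequence (hypothetical; the cursor value is invented and would in practice come from the previous response):

  • linkedin_list_connections(limit=50)
  • linkedin_list_connections(limit=50, cursor='cur_abc123')
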
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description adds little extra beyond noting the sorting order.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with essential information; no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with pagination, the description is adequate; however, it lacks details on response format or pagination behavior beyond parameter descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions, and the description adds the valuable detail that results are sorted by most recently added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List your LinkedIn connections' with ordering, distinguishing it from sibling LinkedIn tools like search or profile retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives; no mention of use cases or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_invitations_sent (A)
Read-only, Idempotent

List your pending sent connection invitations on LinkedIn.

Parameters:
  limit (optional): Maximum invitations to return
  cursor (optional): Pagination cursor from previous response
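
Example calls (hypothetical; the cursor value is invented and would come from a previous response):

  • linkedin_list_invitations_sent(limit=20)
  • linkedin_list_invitations_sent(limit=20, cursor='cur_xyz789')
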
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description adds limited behavioral context. It specifies 'pending sent' which adds a status filter, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no unnecessary words. It is front-loaded with the action and resource, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with strong annotations, the description is adequate. It lacks details on return format, but given no output schema, the agent can infer an array of invitations. It sufficiently addresses the tool's purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage, fully describing both parameters (limit with constraints, cursor as pagination token). The description does not add any additional meaning beyond what the schema provides, so score is at baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List', the resource 'pending sent connection invitations', and the platform 'LinkedIn'. It is distinct from siblings like 'linkedin_list_connections' (which lists existing connections) and 'linkedin_invite' (which sends invitations).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing pending sent invitations but does not explicitly state when to use it over alternatives, nor does it mention any exclusions or prerequisites. The context is implied but not elaborated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_reactions (C)
Read-only, Idempotent

List all reactions (likes, celebrates, etc.) on a specific LinkedIn post.

Parameters:
  limit (optional): Maximum reactions to return
  post_id (required): LinkedIn post/activity ID
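
Example call (hypothetical; the post ID value is invented, as the schema does not specify the ID format):

  • linkedin_list_reactions(post_id='7123456789', limit=25)
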
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is clear. However, the description adds no behavioral details beyond listing reactions, and it contradicts the limit parameter by claiming 'all reactions'. It omits pagination or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise at 12 words and front-loaded. However, the word 'all' may mislead agents about the tool's actual behavior, slightly reducing effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with rich annotations, the description is adequate but lacks details on pagination, error handling, and return format. Because there is no output schema to fall back on, the description misses the chance to describe what is returned, leaving it slightly below complete for a two-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add meaningful extra information beyond identifying the post_id. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list) and resource (reactions on a LinkedIn post) with examples (likes, celebrates). However, it says 'list all reactions' while the schema includes a limit parameter, implying not all reactions may be returned, which slightly undermines clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., other LinkedIn tools). The description simply states what it does without context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_raw_request (A)
Read-only, Idempotent

Send an arbitrary LinkedIn API request via Unipile's magic route. Only GET and POST methods are allowed. WARNING: This bypasses structured rate limiting and can perform destructive actions. Use this only when no other LinkedIn tool covers the needed functionality.

Parameters:
  body (optional): Request body (for POST requests)
  method (optional, default: GET): HTTP method (only GET and POST allowed)
  request_url (required): Target LinkedIn API endpoint URL
  query_params (optional): URL query parameters
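
Example call (hypothetical; '[linkedin-api-endpoint-url]' is a placeholder, not a verified Unipile route, and the query parameter name is invented):

  • linkedin_raw_request(method='GET', request_url='[linkedin-api-endpoint-url]', query_params={'count': '10'})
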
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims 'can perform destructive actions' and bypasses rate limiting, but annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. This is a direct contradiction, severely undermining transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no unnecessary words, front-loaded with purpose then guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks detail on return values, error handling, and the expected response format. The annotation contradiction further reduces completeness for a raw-request tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the parameters. The description adds only the method restriction and the note that body applies to POST requests, neither of which goes beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool sends arbitrary LinkedIn API requests via Unipile's magic route, specifies allowed methods (GET and POST), and implies a fallback purpose when no specific tool exists.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly states 'Use this only when no other LinkedIn tool covers the needed functionality' and warns about bypassing rate limiting and potential destructive actions, providing clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_search_filters (A)
Read-only, Idempotent

Get LinkedIn search filter parameter IDs. LinkedIn uses internal IDs instead of text for search filters (location, industry, etc.). Call this before linkedin.search to resolve filter keywords to their LinkedIn parameter IDs.

Parameters:
  type (required): Filter category to resolve (e.g. LOCATION, INDUSTRY, SKILL)
  limit (optional): Max results per filter category
  keywords (required): Keywords to resolve to parameter IDs (e.g. 'Thailand' for LOCATION)
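
Example call (values reuse the schema's own examples), resolving a location keyword before running a search:

  • linkedin_search_filters(type='LOCATION', keywords='Thailand', limit=5)
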
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and non-destructive behavior. The description adds the key behavioral context that the tool resolves keywords to internal IDs, which is beyond the annotation. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the tool's core purpose, and contains no extraneous information. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with no output schema, the description covers the essential aspects: what it does, why it's needed, and how to use it (before search). Complete given the complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents each parameter. The description briefly explains the overall conversion process but does not add new meaning beyond 'resolve filter keywords to LinkedIn parameter IDs'. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to get LinkedIn search filter parameter IDs. It explains that LinkedIn uses internal IDs and that this tool should be called before linkedin.search to resolve keywords. This distinguishes it from the sibling linkedin_search tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises calling this before linkedin.search, providing clear usage context. It does not mention when to avoid the tool, but the guidance is sufficient for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_update_profile (A)

Update the authenticated user's own LinkedIn profile. Supports adding/editing experience entries (role, company, skills, dates). Also supports updating location. Headline, summary, education are NOT supported by the API.

Parameters:
  location (optional): Location to set on profile (requires LinkedIn location ID)
  experience (optional): Add or edit a professional experience entry
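
Example call (hypothetical; the experience keys mirror the fields named in the description (role, company, dates), but the exact key names are assumptions, since the schema shown here does not enumerate them):

  • linkedin_update_profile(experience={'role': 'Backend Engineer', 'company': 'Acme Corp', 'start_date': '2023-01'})
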
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false, consistent with the update operation. The description adds value by specifying limitations (headline, summary, education are not supported) and clarifying that it operates on the authenticated user's own profile. However, it does not disclose rate limits, error behavior, or whether updates are additive or overwrite existing data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the purpose and then listing constraints. Every sentence provides essential information with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should hint at return values to set expectations, but it does not mention the response format. While the annotations and schema cover safety and parameters, the lack of output guidance is a gap; the clearly stated scope (supported vs. unsupported fields) partially compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for properties. The description adds meaningful context: location requires a LinkedIn location ID from linkedin.search_filters, and omitting the experience ID adds a new entry while including it edits an existing one. This clarifies usage beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool updates the authenticated user's LinkedIn profile, enumerates supported operations (experience, location), and lists unsupported fields (headline, summary, education). This provides a specific verb+resource and clearly distinguishes it from siblings like linkedin_get_profile or linkedin_raw_request.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists what is supported and unsupported, giving clear context on scope. However, it does not provide explicit when-to-use or when-not-to-use guidance, nor does it mention alternatives such as linkedin_raw_request for more advanced updates. The usage is implied but lacks proactive decision support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_delete (A)
Destructive, Idempotent

Delete a message from a thread. Supports Telegram, WhatsApp, and other connected channels. Note: Some channels have time limits on message deletion.

Parameters:
  thread_id (required): Thread/channel ID containing the message
  message_id (required): ID of the message to delete
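
Example call (hypothetical IDs; the thread format follows the 'telegram:571' style used elsewhere in this server's schemas):

  • messages_delete(thread_id='telegram:571', message_id='10234')
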
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint and idempotentHint. The description adds channel-specific constraints and time limits, which is useful context, but it does not elaborate on idempotency or other behavioral traits beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no extraneous information. Front-loaded with the core action. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter delete tool, the description covers purpose, supported channels, and a constraint (time limits). There is no output schema, so detailing return behavior is not required. It could mention idempotency or the success response, but it is complete overall.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. Description adds no additional parameter-level meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Delete a message from a thread' with a specific verb and resource. It mentions supported channels and time limits, distinguishing it from sibling message tools (send, forward, read history).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on supported channels (Telegram, WhatsApp, etc.) and a caveat about time limits, guiding appropriate use. However, no explicit comparison to alternatives or when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_forward (A)

Forward a message from one thread to another. Supports native Telegram forwarding (preserves original sender attribution) and text-based forwarding for cross-channel scenarios.

Parameters:
  dest_thread_id (optional): Destination thread to forward into. Provide at least one of dest_thread_id or recipient_name. To forward into the active conversation, pass the current thread_id. (If both are provided, dest_thread_id wins and recipient_name is ignored.)
  recipient_name (optional): Name of person to forward to (channel auto-resolved). Provide at least one of dest_thread_id or recipient_name. Use only when forwarding to a different contact than the current conversation.
  source_thread_id (required): Thread containing the message to forward (e.g., 'telegram:123456' or numeric DB ID)
  source_message_id (required): ID of the message to forward
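
Example calls (hypothetical IDs and names), showing the two targeting options:

  • messages_forward(source_thread_id='telegram:123456', source_message_id='42', dest_thread_id='telegram:789')
  • messages_forward(source_thread_id='telegram:123456', source_message_id='42', recipient_name='Jane Smith')
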
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and openWorldHint=true. The description adds value by detailing that forwarding can be native (preserving attribution) or text-based, and implies state modification. It does not contradict annotations and provides meaningful behavioral context beyond the structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, directly stating purpose and then adding essential detail about modes. No extraneous information, perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has two optional parameters with detailed schema descriptions and no output schema, the description adequately covers the core functionality and forwarding modes. It could mention that forwarding creates a new message (modifying state), but the annotations already cover the non-read-only behavior. A mention of the return value or success indicators is missing, though this is not critical given the tool's simplicity and annotation coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add significant parameter-level meaning beyond what the schema's parameter descriptions already provide (e.g., the dest_thread_id/recipient_name logic is already explained in the schema). The description's mention of two forwarding modes only indirectly relates to parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Forward a message from one thread to another,' which is a specific verb-resource pair. It distinguishes the tool from siblings like messages_send by specifying forwarding behavior, and further differentiates two modes (native Telegram and text-based), making its purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides useful context on when to use each mode ('preserves original sender attribution' for native, 'cross-channel scenarios' for text-based), but does not explicitly contrast with alternatives like messages_send or state when not to use it. It gives good guidance but misses explicit exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_read_history (A)
Read-only, Idempotent

Read messages from a conversation thread. Use text_contains to find specific messages by content. Returns the most recent messages, including sender info and timestamps.

Voice calls: each row carries a meta object with allowlisted keys (event_type ∈ 'call_started'|'call_ended'|null, source ∈ 'voice_transcript'|null, call_id, speaker_display_name, duration_seconds, outcome, direction) plus per-message channel. To find calls without scanning every row, use calls.list_history instead.

Usage:

  1. Get thread_id from threads.list first, OR

  2. Use contact_name to auto-resolve thread_id

Examples:

  • User: 'show me messages from chat with [contact]' → read_history(contact_name='[contact]', limit=10)

  • User: 'last 5 messages from thread 571' → read_history(thread_id=571, limit=5)

Parameters:
  limit (optional): Maximum number of messages to return (default: 10, max: 100)
  offset (optional): Number of messages to skip (for pagination, default: 0)
  thread_id (optional): Thread ID to read messages from (e.g., '571' or 'telegram:571'). Optional if contact_name provided.
  contact_name (optional): Contact/thread name to search for (optional if thread_id provided). Example: 'Jane Smith', 'John Doe'
  text_contains (optional): Filter: only return messages containing this text (case-insensitive substring match)
  include_outgoing (optional): Include messages sent by you (default: true)
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and non-destructive. Description adds details on return ordering (most recent), voice call meta object structure, and example usage, going beyond the structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured: starts with summary, then voice call specifics, usage steps, and examples. Each sentence adds value without waste; front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description details return content (most recent messages, sender info, timestamps, voice call meta). Covers all 6 parameters and provides usage flow, making it fully complete for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description explains the difference between thread_id and contact_name, emphasizes text_contains as a filter, and provides contextual examples, adding meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it reads messages from a conversation thread, explains the text_contains filter for content search, and differentiates the tool from calls.list_history for call-related data. It specifies that the return content includes sender info and timestamps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance (reading messages), when-not-to-use (calls: use calls.list_history), prerequisites (get thread_id or use contact_name), and includes examples for common use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_send (A)

Send a message to a thread, channel, or contact. Supports Telegram, Email, LinkedIn, and other connected channels. For LinkedIn posts (comment_thread kind), this posts a comment on the post. Can automatically resolve recipients and channels when not specified. Can send files/images/documents as attachments — pass attachments=[file_id, ...] with integer file IDs obtained from collections.list_files, workspace.search, or files.search. text is optional when attachments are provided.

Parameters:
  cc (optional): Email addresses to CC (carbon copy). Only for email channel.
  bcc (optional): Email addresses to BCC (blind carbon copy). Only for email channel.
  text (optional): Message text to send. Optional if attachments provided.
  format (optional, default: text): Message format
  silent (optional): Send without notification
  channel (optional): Channel hint (e.g. 'telegram'). Required when using recipient_username. Only 'telegram' is currently accepted for handle-based routing.
  subject (optional): Email subject line. Required for new emails, optional for replies (auto-generates 'Re: ...').
  thread_id (optional): Target thread. OMIT to reply in the same chat you received the triggering message from — the backend defaults to the current thread. Pass an explicit value ONLY to reply in a DIFFERENT thread, and only use: (a) a numeric DB thread id from threads.list / workspace.search, or (b) a channel_ref like 'telegram:-12345'. NEVER use a chat-type word (dm, group, channel, livechat) — those are category labels from the SITUATION block, not ids.
  attachments (optional): Array of integer file IDs to send as attachments (images, documents, any files). Get file IDs from collections.list_files (field `file_id`), workspace.search with scope=['files'] (field `file_id`), or files.search. Example: [302237]. The file must already exist in the workspace (status=ready) — no separate upload step needed. When attachments are provided, `text` becomes optional (a caption can be included alongside).
  recipient_name (optional): Name of person to send to (e.g., 'Jane', 'John'). Tool will auto-resolve channel. Optional if thread_id provided.
  recipient_email (optional): Email address to send to (e.g., 'john@example.com'). Creates new email thread. Only for email channel.
  recipient_username (optional): Telegram @handle (e.g. '@smartdeveloper' or 'smartdeveloper'). Resolves or opens a DM without needing a DB thread_id. Requires channel='telegram'. Only Telegram supported in this release.
  reply_to_message_id (optional): ID of message to reply to
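
Example calls covering three routing modes (message text is invented; the file ID, email address, and @handle reuse the schema's own examples):

  • messages_send(recipient_name='Jane', text='Running 10 minutes late')
  • messages_send(recipient_email='john@example.com', subject='Q3 report', text='Draft attached.', attachments=[302237])
  • messages_send(recipient_username='@smartdeveloper', channel='telegram', text='Hi!')
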
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses automatic resolution of recipients and channels, attachment sending capabilities, and nuances of the thread_id parameter. Annotations are present (readOnlyHint=false, etc.) and consistent, with no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph but well-organized, front-loading the main action. Could be slightly more concise but remains informative and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, none required, no output schema), the description covers channels, attachments, recipient resolution, thread_id rules, and supported platforms comprehensively. No obvious gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant context beyond schema, especially for thread_id (OMIT for same chat, explicit usage rules) and attachments (file ID sources). Enhances understanding of parameter behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states the tool sends messages to threads, channels, or contacts across multiple platforms (Telegram, Email, LinkedIn). It specifies posting comments on LinkedIn posts, clearly differentiating from sibling tools like messages_delete or messages_forward.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context for when to use, such as automatic recipient resolution and attachment handling. However, it does not explicitly state when not to use this tool versus alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_delete (A)
Destructive, Idempotent

Delete a note by ID from the target notebook. Same identity rules as notes.save — agents can only delete from their own notebook.

Parameters:
  note_id (required): ID of the note to delete
  target_agent_id (optional): Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks.
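
Example call (hypothetical note ID; target_agent_id is shown because it is required from MCP, and '57' reuses the schema's own example agent ID):

  • notes_delete(note_id='918', target_agent_id='57')
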
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true and idempotentHint=true. The description adds the identity restriction (only own notebook), which provides useful behavioral context beyond the annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with the key action and identity rule. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple deletion tool with two parameters and no output schema, the description covers the essential purpose and identity constraints. It could mention permanence, but that is not required given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds little beyond the schema. It mentions 'target notebook' but the schema already documents target_agent_id with a description. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (delete) and resource (note by ID). It distinguishes from sibling tools like notes_save and notes_search by specifying 'from the target notebook' and referencing identity rules.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: agents can only delete from their own notebook, referencing notes.save rules. It explicitly states who can use it and under what conditions, though it doesn't name alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_recall (A)
Read-only, Idempotent

Recall notes from your notebook. By default returns only your own notes (all scopes, newest first). Pass filter_agent_id= to read another agent's notebook, or filter_agent_id="all" (or "*") to read across every agent in the workspace. Pass scope to narrow to global/thread/person. Each result includes agent_id and agent_name of the author.

Parameters:
  key (optional): Recall a specific note by key
  limit (optional): Max notes (default 20, max 50). Newest first.
  scope (optional): Optional filter: global | thread | person. Omit for all scopes.
  scope_ref_id (optional): Filter by specific thread_id or person_id
  filter_agent_id (optional): Omit to read only your own notes. Pass a numeric agent_id as a string (e.g. "57") to read another agent's notebook (read-only). Pass "all" or "*" to read across all agents in the workspace.
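
Example calls illustrating the filtering options (the thread ID is invented; '57' reuses the schema's own example agent ID):

  • notes_recall(limit=10)
  • notes_recall(filter_agent_id='57', scope='thread', scope_ref_id='571')
  • notes_recall(filter_agent_id='all')
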
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds value by detailing default return scope (own notes, all scopes, newest first) and how filtering affects results, including that each result includes agent_id and agent_name. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences long, well-structured, and front-loaded with the main purpose. Every sentence adds value without redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description mentions that results include agent_id and agent_name. It covers key behaviors and filtering, but does not detail other note fields (e.g., content, timestamp) or pagination behavior beyond the limit parameter. Slight gap but still fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with all 5 parameters described. Description adds meaning by explaining default behavior when parameters are omitted and special handling for filter_agent_id='all' or '*'. It clarifies that agent_id is passed as a string, which the schema does not specify.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Recall notes from your notebook' and specifies default behavior (own notes, all scopes, newest first). It distinguishes this from sibling tools like notes_search and notes_delete/save by focusing on retrieval with filtering options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool (recalling notes) and how to filter by agent_id and scope. It explains defaults and special values like 'all' for cross-agent reading. It does not explicitly mention alternatives or when-not-to-use conditions, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_save (A)

Save a fact or note into the agent's memory. Use scope to choose visibility: 'workspace' = visible to every agent in this workspace (use for shared facts, project conventions); 'agent' = private to this agent (use for personal working notes); 'thread' = scoped to one conversation (use for thread-specific reminders); 'person' = scoped to one contact (use for per-contact context). If a note with the same key+scope exists it will be updated. Do NOT use this tool for behavioral rules or corrections — use feedback.save for those.

Parameters:
  key (required): Short identifier for this note (must not start with '__' — reserved)
  scope (required): Scope of the note. 'workspace' = shared across all agents; 'agent' = private to this agent (was 'global' pre-PR1); 'thread' = per-conversation; 'person' = per-contact. 'global' is accepted as a deprecation alias for 'agent'.
  value (required): The note content
  pinned (optional): Pin this note so it's always loaded first. Default false.
  scope_ref_id (optional): Reference ID — thread_id (for scope=thread) or person_id (for scope=person). Required for thread/person scope. In MCP mode (no thread context), must be passed explicitly.
  target_agent_id (optional): Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks. Ignored when scope='workspace' (workspace memory is shared).
  expires_in_hours (optional): Auto-delete after N hours. Omit for permanent notes.
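
Example call (hypothetical key, value, and person_id), saving a per-contact note that expires after 48 hours:

  • notes_save(key='preferred_greeting', scope='person', scope_ref_id='1042', value='Prefers to be addressed as Dr. Lee', expires_in_hours=48)
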
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the upsert behavior (updates if key+scope exists) and auto-delete via expires_in_hours. Annotations are generic (readOnlyHint=false, etc.), but the description adds valuable behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long, but every sentence provides necessary information, and it front-loads the core action and scope selection. It is very thorough but could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters and no output schema, the description covers all essential aspects: scope explanations, upsert, expiration, MCP mode, and directs to feedback.save for behavioral rules. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value: explains scope meanings, deprecation alias 'global', requirements for scope_ref_id, MCP mode notes for target_agent_id, and applicability of expires_in_hours. This goes well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool saves a fact or note into the agent's memory. It distinguishes from siblings like feedback_save by explicitly saying not to use it for behavioral rules. The verb and resource are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use (saving facts/notes) and when-not-to-use (behavioral rules, refer to feedback.save). It also details scope choices with examples, helping the agent decide which scope to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_get (A)
Read-only, Idempotent

Get full content of a prompt template: system instructions (prompt_text) and auto-reply rules.

Run prompts.list first to find the prompt_id.

Parameters:
  prompt_id (required): ID of the prompt template to fetch
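
Example flow (the prompt_id of 12 is invented; in practice it comes from prompts.list):

  • prompts_list() → pick the matching prompt_id
  • prompts_get(prompt_id=12)
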
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. Description adds specificity about returned content (system instructions and auto-reply rules), enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-loading purpose and usage guidance, with no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple getter tool with one parameter and robust annotations, the description provides sufficient context. There is no output schema, but the description specifies the returned content sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers the single parameter with an adequate description. The description adds value by linking to the prerequisite (prompts.list) but does not elaborate on parameter formatting or validation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (get full content) and resource (prompt template), listing specific fields (prompt_text, auto-reply rules). Distinguishes from siblings like prompts_list and prompts_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly advises running prompts.list first to obtain the prompt_id, providing clear context for when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_list (A)
Read-only, Idempotent

List all prompt templates in this workspace.

Returns id + name + description + category so you know which prompt_id to use in prompts.get or prompts.update.

Parameters: none

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (readOnlyHint, destructiveHint, idempotentHint) and adds value by specifying the exact fields returned, which is beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the primary action, and includes necessary detail without any superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters, no output schema, and no nested objects, the description adequately covers the tool's function and output. Mentioning ordering or filtering would be a minor improvement but is not essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so the description does not need to add parameter information. The baseline score of 4 is appropriate for a parameterless tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists all prompt templates in the workspace and specifies the returned fields (id, name, description, category), directly linking to the use case of identifying prompt_id for prompts.get or prompts.update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use the tool: to obtain prompt_id for subsequent calls to prompts.get or prompts.update. Although it does not explicitly mention when not to use it, the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_history (A)
Read-only, Idempotent

List past versions of a prompt template's prompt_text. Every edit is snapshotted to an append-only table — use this to browse history and find a version_number for prompts.prompt_restore.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax versions to return (1-200, default 50)
prompt_idYesID of the prompt template
before_versionNoCursor: return versions strictly below this version_number
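To make the cursor mechanics concrete, here is a minimal sketch of the tools/call payload an MCP client would send to page through history. The JSON-RPC envelope follows the MCP specification; the prompt ID and cursor value are hypothetical.

```python
import json

# Page through a prompt's version history. before_version is an exclusive
# cursor: pass the smallest version_number from the previous page to fetch
# older entries.
history_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "prompts_prompt_history",
        "arguments": {
            "prompt_id": "prm_123",  # hypothetical ID, resolved via prompts_list
            "limit": 50,             # 1-200, default 50
            # "before_version": 42,  # uncomment to page past version 42
        },
    },
}
print(json.dumps(history_call, indent=2))
```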
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, but the description adds valuable context: 'Every edit is snapshotted to an append-only table', explaining the immutable and historical nature of the data, which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, with the first sentence immediately stating the purpose and the second adding context and linking to a sibling tool. Every sentence is necessary and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a list-history tool with full schema coverage and annotations, the description is fairly complete. It explains the append-only storage and the link to restore, but could mention ordering (e.g., descending by version) for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add significant new meaning beyond the schema descriptions (e.g., 'before_version' is already described as a cursor).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'List' and the resource 'past versions of a prompt template's prompt_text', distinguishing it from sibling tools like prompts_get (current prompt) and prompts_list (all prompts). It also references the related restore tool, clarifying its role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly indicates when to use the tool ('to browse history and find a version_number for prompts.prompt_restore'), providing context for its use case. However, it does not explicitly state when not to use it or list alternative tools, though the sibling context implies alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_restoreA
Inspect

Restore a past version of a prompt template by version_number. Creates a new version pointing at the restored content — history is preserved. Fans out to every agent using this template without a per-agent override; the response includes affected_agents as a receipt of the fan-out.

ParametersJSON Schema
NameRequiredDescriptionDefault
reasonNoOptional: why this restore is happening (shows up in history UI)
prompt_idYesID of the prompt template
version_numberYesThe version_number to restore (get it from prompts.prompt_history)
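Continuing that hypothetical flow, a sketch of the matching restore call. The version_number would come from the history listing above; reason is optional but surfaces in the history UI.

```python
import json

restore_call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "prompts_prompt_restore",
        "arguments": {
            "prompt_id": "prm_123",  # hypothetical, same template as above
            "version_number": 42,    # found via prompts_prompt_history
            "reason": "Roll back regression in tone rules",  # shown in history UI
        },
    },
}
print(json.dumps(restore_call, indent=2))
# Per the description, the response includes affected_agents: a receipt of the
# fan-out to every agent using this template without a per-agent override.
```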
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it creates a new version (preserving history), fans out to all agents without per-agent override, and includes affected_agents in the response. Annotations only indicate false hints for readOnly, openWorld, idempotent, and destructive—so the description fills crucial gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no redundancy. First sentence states the action; second explains the non-destructive behavior; third covers the fan-out and response. All content earns its place, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description adequately explains the return value (affected_agents). It covers the main behavior, side effects, and constraints. Minor gap: no mention of error conditions or permissions, but overall sufficient for a restore operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reinforces the version_number usage (e.g., 'get it from prompts.prompt_history') and notes the response includes affected_agents. While helpful, it does not substantially surpass the schema's explanations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool restores a past version of a prompt template by version_number, creating a new version. This distinguishes it from sibling tools like prompts_update (which modifies current version) and prompts_prompt_history (which lists history). The title 'Restore Prompt Template' reinforces the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the primary use case—restoring a past version. It notes the fan-out behavior and the affected_agents receipt, which helps agents understand impact. However, it does not explicitly state when not to use it (e.g., for per-agent overrides) or direct alternatives, so it misses full exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_updateA
Inspect

Update a prompt template's name, system instructions, or auto-reply rules.

Changes affect every agent using this template, unless the agent has its own override (set via agents.update → prompt_text).

All parameters except prompt_id are optional — only provided fields are updated.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoNew name for the prompt template
prompt_idYesID of the prompt template to update
descriptionNoNew description for the prompt template
prompt_textNoThe AI system prompt: persona, tone, rules, behavior.
auto_reply_rulesNoPre-classifier rules that run BEFORE the main AI. Format: bullet list of conditions → actions (SKIP / SIMPLE_REPLY / SEARCH / CALENDAR). Pass null to clear.
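A sketch of a partial update, illustrating the two behaviors the description calls out: only supplied fields change, and passing null clears auto_reply_rules. The prompt ID and field values are hypothetical.

```python
import json

update_call = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "prompts_update",
        "arguments": {
            "prompt_id": "prm_123",     # required; everything else is optional
            "name": "Support Desk v2",  # only this field is renamed...
            "auto_reply_rules": None,   # ...and the pre-classifier rules are
        },                              # cleared (null clears, per the schema)
    },
}
print(json.dumps(update_call, indent=2))  # None serializes as JSON null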
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral context beyond annotations: explains that updates propagate to agents unless overridden, and that auto_reply_rules can be cleared by passing null. Annotations indicate mutation but not destruction; description aligns with that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, no redundant wording. Each sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and full parameter descriptions, the description covers all necessary contextual aspects: effect scope, optionality, and special parameter behavior. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds value by explaining that only provided fields are updated and clarifies the special handling of auto_reply_rules (null to clear). This provides useful context beyond what parameter descriptions offer.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates a prompt template's name, system instructions, or auto-reply rules, specifying the resource ('prompt template') and action ('update'), distinguishing it from sibling tools like prompts_get or prompts_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context that changes affect all agents using the template unless overridden at agent level (via agents.update). Also notes all parameters except prompt_id are optional, guiding usage. Does not explicitly exclude alternative tools but offers sufficient context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_cancelB
Inspect

Cancel an active reminder by its trigger ID.

ParametersJSON Schema
NameRequiredDescriptionDefault
agent_idNoAgent ID (required when calling from MCP; ignored in agentic mode).
trigger_idYesID of the reminder to cancel
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description is minimal, not adding behavioral context beyond annotations. Annotations indicate it is not read-only, not idempotent, and not destructive (though cancelling may be considered destructive). The description does not clarify side effects or permissions needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no wasted words, and the action is front-loaded. Its extreme brevity, however, leaves no room for even minimal context, which keeps it just short of perfect.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema, the description omits critical information such as return values, failure conditions, or constraints (e.g., whether it can cancel reminders already triggered). This makes it incomplete for a cancellation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description does not add additional meaning beyond what the schema already provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Cancel') and the target resource ('active reminder'), and specifies the required identifier ('trigger ID'). It distinctly differentiates from sibling tools like reminder_set and reminder_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, such as when a reminder should be cancelled versus modified. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_listA
Read-onlyIdempotent
Inspect

List your active reminders (both one-time and recurring).

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax results (default 20)
agent_idNoAgent ID (required when calling from MCP; ignored in agentic mode).
thread_idNoFilter by thread
include_firedNoInclude already-fired one-time reminders (default false)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. Description adds scope (active, one-time+recurring) but does not disclose other behavioral traits like pagination or ordering. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, perfectly concise, front-loaded with the action and resource. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple list tool with clear annotations and full schema descriptions. Description explains what is listed (active, both types), which is sufficient. Could mention include_fired parameter but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds no additional parameter information beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists active reminders, both one-time and recurring. Distinguishes from sibling tools reminder_set and reminder_cancel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Does not mention exclusions or context for choosing between reminder_list and reminder_cancel/reminder_set.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_setB
Inspect

Schedule a reminder. One-time reminders fire at a specific datetime. Recurring reminders fire on a schedule (daily, weekly, every N days, or every N minutes). Optionally scope to a thread or target another agent.

ParametersJSON Schema
NameRequiredDescriptionDefault
timeNoTime of day HH:MM for daily/weekly/every_n_days (e.g. '09:00'). Required for daily/weekly/every_n_days.
reasonYesWhat this reminder is for (you'll see this when it fires)
agent_idNoAgent ID (required when calling from MCP; ignored in agentic mode).
datetimeNoISO datetime for one_time (e.g. '2026-04-01T09:00:00+03:00'). Required for one_time.
timezoneNoIANA timezone (e.g. 'Europe/Moscow'). Defaults to UTC.
thread_idNoOptional thread ID to scope the reminder to. Omit for workspace-level reminders.
days_of_weekNoDays for weekly: 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun. Required for weekly.
interval_daysNoFor every_n_days: fire every N days (min 2).
schedule_typeYesone_time = fires once at datetime. daily = fires daily at time. weekly = fires on specific days_of_week at time. every_n_days = fires every N days at time. interval = fires every N minutes.
interval_minutesNoFor interval: fire every N minutes (5-1440).
target_agent_slugNoOptional: activate a different staff member instead of yourself when the reminder fires.
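Because schedule_type determines which other fields are required, a worked example helps. A minimal sketch of a weekly reminder, with hypothetical values:

```python
import json

weekly_call = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "reminder_set",
        "arguments": {
            "reason": "Prepare Monday pipeline review",
            "schedule_type": "weekly",
            "days_of_week": [0],          # 0 = Monday, per the schema
            "time": "09:00",              # HH:MM, required for weekly
            "timezone": "Europe/Moscow",  # IANA name; defaults to UTC
        },
    },
}
print(json.dumps(weekly_call, indent=2))
```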
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is a write operation (readOnlyHint=false) and not destructive (destructiveHint=false). The description adds scheduling details but doesn't disclose edge cases like overriding existing reminders or behavior on conflict. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with the main action. No unnecessary words. Efficient structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters and no output schema, the description lacks information about return values (e.g., reminder ID) and potential limitations (e.g., maximum reminders per user). Could be more complete for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions, so the description adds little beyond mentioning optional scoping (thread_id, target_agent_slug). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Schedule a reminder' and explains one-time vs recurring types. However, it does not explicitly differentiate from sibling tools like reminder_cancel or reminder_list, so it misses a chance to distinguish its role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives (e.g., when to choose one_time vs daily). No prerequisites or context about required permissions or effect on existing reminders.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

system_sleepA
Read-onlyIdempotent
Inspect

Pause execution for a given number of seconds (max 30). Use when you need to wait for an external process to complete before retrying — e.g. message sync, backfill, or API propagation. Total sleep per run is capped at 60 seconds.

ParametersJSON Schema
NameRequiredDescriptionDefault
reasonNoWhy you are waiting (logged for debugging)
secondsYesNumber of seconds to sleep (1-30)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description doesn't need to restate those. It adds valuable behavioral details: max 30 seconds per sleep, total cap of 60 seconds per run, which go beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The action and purpose are front-loaded. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple pause tool with no output schema, the description covers the essential: what it does, why use it, and the limits. No missing behavioral context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with descriptions for both parameters. The description adds minimal extra meaning—mentions 'max 30' which is already in schema. Baseline 3 is appropriate as the description doesn't significantly enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Pause execution for a given number of seconds (max 30)' and specifies the purpose 'wait for an external process to complete before retrying'. This is a specific verb-resource pair with clear scope, and it distinguishes itself from sibling tools like agent_handoff or web_fetch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit use case given: 'Use when you need to wait for an external process to complete before retrying — e.g. message sync, backfill, or API propagation.' It also mentions the total sleep cap. No explicit when-not, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_createB
Inspect

Create a new task in your to-do list.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleYesTask title
due_atNoISO datetime when task is due (e.g. '2026-03-31T15:00:00')
agent_idNoAgent ID whose tasks to access. Required when calling from MCP.
due_dateNoDate when task is due (e.g. '2026-03-31'). Use with due_time or alone.
due_timeNoTime when task is due (e.g. '15:00'). Used with due_date.
priorityNoTask priority (default: medium)
thread_idNoRelated thread ID
descriptionNoDetailed description
assigned_to_contact_idNoContact ID if assigned to someone
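The schema offers two ways to express a due time. A minimal sketch, with hypothetical values, showing the split due_date/due_time form alongside the equivalent single-field due_at (commented out):

```python
import json

create_call = {
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
        "name": "tasks_create",
        "arguments": {
            "title": "Send follow-up to Acme",  # the only required field
            "priority": "high",                 # default is medium
            "due_date": "2026-03-31",           # date component...
            "due_time": "15:00",                # ...paired with due_date
            # "due_at": "2026-03-31T15:00:00",  # equivalent single-field form
        },
    },
}
print(json.dumps(create_call, indent=2))
```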
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is a write operation (readOnlyHint=false). The description merely restates 'create' without adding details like side effects, return behavior, or required permissions. No additional behavioral context is provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no wasted words. However, it is so minimal that it could be expanded slightly to include return value or usage context without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description should hint at what is returned (e.g., task ID or object). It does not. Also, with many sibling tools, a brief note on use cases would enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all 9 parameters, so the schema itself documents parameter meaning. The tool description adds no extra semantics beyond 'create a task', which is already implied. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the action 'Create' and the resource 'task', clearly indicating that this tool adds a new item to the to-do list. It naturally distinguishes itself from siblings like tasks_update and tasks_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as tasks_update for modifying existing tasks. It lacks any contextual hints about prerequisites or preferred scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_deleteB
Inspect

Delete a task from your to-do list by its ID.

ParametersJSON Schema
NameRequiredDescriptionDefault
task_idYesID of the task to delete
agent_idNoAgent ID whose task to delete. Required when calling from MCP.
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims a destructive action ('Delete'), but the annotation 'destructiveHint' is false, creating a direct contradiction. According to scoring rules, this warrants a score of 1 with an annotation contradiction flag.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence of 10 words. It contains no extraneous information and efficiently communicates the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with full schema and no output schema, the description covers the basic purpose. However, it omits mention of irreversibility or side effects, and the annotation contradiction diminishes completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with complete parameter descriptions. The tool description does not add additional meaning beyond what the schema already provides, so the baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete'), the resource ('a task from your to-do list'), and the identifier ('by its ID'). This is specific and distinguishes it from sibling tools like tasks_update or tasks_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites or exclusions. It only states what the tool does without context of when it is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_listA
Read-onlyIdempotent
Inspect

List your tasks, or another agent's tasks (read-only) using from_agent_id. Use filters to narrow results.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax results (default 20)
statusNo
overdueNoIf true, only return tasks past due_at that are not done
agent_idNoAgent ID whose tasks to list. Required when calling from MCP.
thread_idNoFilter by related thread
from_agent_idNoList tasks of another agent (read-only). Omit to list your own.
assigned_to_contact_idNoFilter by assigned contact
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only; description adds clarity on read-only scope for other agents' tasks. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, efficient and front-loaded. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main purpose, key parameter, and filtering. Lacks return format or pagination details, but adequate for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 86% (>80%), so baseline 3. Description adds context on from_agent_id but doesn't detail other parameters beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists tasks, distinguishes own tasks from another agent's tasks (read-only), and mentions filters. Differentiates from create/update/delete siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (list tasks, including for another agent) and that it's read-only. Lacks explicit alternatives but sibling names imply them.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_updateA
Inspect

Update an existing task. Set status='done' to complete it, 'cancelled' to cancel. Use summary for completion notes.

ParametersJSON Schema
NameRequiredDescriptionDefault
due_atNoISO datetime
statusNo
summaryNoCompletion note (stored when marking done)
task_idYesID of the task to update
agent_idNoAgent ID whose task to update. Required when calling from MCP.
priorityNo
descriptionNo
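A sketch of the completion path the description highlights, with a hypothetical task ID: status='done' plus a summary note.

```python
import json

complete_call = {
    "jsonrpc": "2.0",
    "id": 6,
    "method": "tools/call",
    "params": {
        "name": "tasks_update",
        "arguments": {
            "task_id": "tsk_789",  # hypothetical
            "status": "done",      # completes the task
            "summary": "Sent recap email; client confirmed next steps.",
        },
    },
}
print(json.dumps(complete_call, indent=2))
```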
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false. Description adds that updating status to 'done' completes the task and summary is stored, which is useful context but does not cover all behavioral traits like permissions or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with primary action, followed by specific usage details. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers key behaviors for an update tool with 7 parameters, including status handling. Lacks explanation of all parameters (e.g., description, priority), but schema partially fills gaps. No output schema, but return values are implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 57%, the description adds meaning by explaining how to use status and summary in context, beyond what the schema provides. It clarifies the purpose of these parameters in the tool's workflow.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Update an existing task' with specific verb and resource, and provides concrete examples of status values and completion notes. It clearly distinguishes this from sibling tools like tasks_create and tasks_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on using status='done' or 'cancelled' and on supplying summary for completion notes. However, it does not contrast with sibling tools or cover exclusion scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threads_listA
Read-onlyIdempotent
Inspect

List conversation threads with previews and metadata. Use before messages.read_history to resolve thread_id. Returns: id, title, last message, timestamp, unread count.

ParametersJSON Schema
NameRequiredDescriptionDefault
kindNoThread type. OMIT to include DMs, groups, and channels.
limitNoMaximum threads to return.
orderNoSort order (default: desc)
channelNoFilter by channel. OMIT to search across ALL channels — restricting to one channel is a common cause of zero-result mistakes.
order_byNoSort field (default: last_message_at)
only_unreadNoOnly threads with unread messages. OMIT to include read threads too.
include_archivedNoInclude archived threads. OMIT to hide archived (the safe default).
participant_nameNoFilter threads by participant name. OMIT to list all threads regardless of participant.
max_inactive_daysNoUser sent a message within the last N days (recently-active filter). OMIT to include threads regardless of recent user activity.
min_inactive_daysNoUser's last outgoing message older than ≥N days (dormant filter). OMIT to include threads regardless of last outgoing activity.
user_sent_messageNoTRUE = only threads where the user has sent ≥1 outgoing message. FALSE = only threads where the user has NEVER sent. OMIT to include both.
min_last_message_daysNoLast message (from anyone) older than ≥N days. OMIT to skip this filter.
participant_contact_idNoentity_id from contacts.find — returns all threads where this contact is an active participant. OMIT to skip this filter.
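A sketch of a dormant-thread query that heeds the schema's own warning: channel is omitted so the search spans all channels. Values are hypothetical.

```python
import json

dormant_call = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "threads_list",
        "arguments": {
            # channel is deliberately omitted: the search spans ALL channels
            "min_inactive_days": 14,    # last outgoing message >= 14 days old
            "user_sent_message": True,  # only threads the user has written in
            "limit": 20,
        },
    },
}
print(json.dumps(dormant_call, indent=2))
```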
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. Description adds that it returns id, title, last message, timestamp, unread count. No contradictions; adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states purpose, second gives usage guidance and return fields. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While there is no output schema, the description lists expected return fields. For a tool with 13 well-documented parameters and clear annotations, this is sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage with detailed explanations for all 13 parameters. The tool description does not add parameter-specific details beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List conversation threads with previews and metadata' and links to a downstream tool (messages.read_history). Purpose is well-defined, but not explicitly differentiated from all sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use before messages.read_history to resolve thread_id', providing a concrete usage context. Does not state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threads_updateA
Inspect

✏️ Update a conversation thread: rename it, add notes/description, or move to a folder.

When to use:

  • User wants to rename a chat or group

  • User wants to add notes/context about a conversation

  • User wants to organize threads into folders

For DM threads, renaming also updates the linked contact's display name by default. Requires thread_id from threads.list.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleNoNew title for the thread (max 255 chars)
folder_idNoMove thread to this folder (null removes from folder)
thread_idYesThread ID from threads.list
descriptionNoAI context / notes for this thread. Empty string clears description.
update_contactNoFor DM threads, also rename the linked contact (default: true)
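A sketch of the rename case where the DM side effect is not wanted: update_contact defaults to true, so it must be disabled explicitly. IDs and titles are hypothetical.

```python
import json

rename_call = {
    "jsonrpc": "2.0",
    "id": 8,
    "method": "tools/call",
    "params": {
        "name": "threads_update",
        "arguments": {
            "thread_id": "thr_456",   # hypothetical, from threads_list
            "title": "Acme renewal",
            "update_contact": False,  # keep the linked contact's display name
        },
    },
}
print(json.dumps(rename_call, indent=2))
```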
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds key behavioral context: DM thread renaming also updates linked contact display name by default. No contradictions with annotations. Could mention side effects like folder removal via null folder_id.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with main action and clear bullet list for usage. Efficient but slightly redundant with schema info. Not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main use cases and special DM behavior. No output schema, but update tools often return simple confirmation. Missing details on error handling or folder removal, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 5 parameters with descriptions. Description reinforces requirement of thread_id and default behavior for update_contact but adds limited new meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'update' and resource 'conversation thread' with specific actions (rename, add notes/description, move). Distinguishes from sibling threads_list and other similar tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' list covering three common scenarios. Mentions prerequisite (thread_id from threads.list). Lacks explicit 'when not to use' but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

videos_generateA
Inspect

Generate a short video (5-10s) from a text prompt using BytePlus Seedance. Optionally accepts up to 12 image file IDs from the user's attached files (visible in the [ATTACHMENTS] block) as reference_file_ids for style and composition. Returns immediately with a job_id; the video is delivered back via continuation when the job completes (~30-90s for fast model, ~2-5min for pro). Reference images are temporarily re-hosted on a third-party CDN (imgbb) for the duration of generation and deleted on completion — don't submit confidential references. Gated behind a workspace opt-in flag.

ParametersJSON Schema
NameRequiredDescriptionDefault
modelNoSeedance model variant. 'seedance-2-fast' (~30-90s, lower cost) or 'seedance-2-pro' (~2-5min, cinematic quality, native audio). Default: 'seedance-2-fast'.
promptYesText description of the video to generate (3-4000 chars).
durationNoOutput video duration in seconds. Must be 5 or 10. 10s costs 2x the 5s price.
aspect_ratioNoOutput aspect ratio (default: 16:9)
generate_audioNoWhether the model should produce native audio (Pro only — Fast ignores the flag).
reference_file_idsNoOptional list of up to 12 image file_ids to use as visual references (style, composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif). Get IDs from the [ATTACHMENTS] block, files.search, or workspace.search.
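A sketch of a pro-model generation with a reference image, using hypothetical IDs; per the description the call returns a job_id immediately and the video arrives later via continuation.

```python
import json

video_call = {
    "jsonrpc": "2.0",
    "id": 9,
    "method": "tools/call",
    "params": {
        "name": "videos_generate",
        "arguments": {
            "prompt": "Slow dolly shot across a rain-soaked neon street",
            "model": "seedance-2-pro",           # ~2-5 min, audio-capable
            "duration": 10,                      # 5 or 10; 10s costs 2x
            "generate_audio": True,              # Pro only; Fast ignores it
            "reference_file_ids": ["file_abc"],  # hypothetical attachment ID
        },
    },
}
print(json.dumps(video_call, indent=2))
```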
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no informative annotations, the description compensates well by disclosing: async job_id return, estimated times for fast/pro models, CDN re-hosting and deletion of reference images, and workspace opt-in requirement. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of five sentences, front-loading the core purpose and then adding key details. Every sentence is necessary and well-placed, with no extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description covers essential aspects: async delivery, timing, reference image handling, and gating. It might lack error handling details but is sufficient for a generation tool of moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds value beyond the 100% schema coverage by explaining reference_file_ids usage (up to 12, from attachments, for style/composition) and the temporary CDN handling of reference images. This enriches the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'generate', the resource 'short video', and the input 'from a text prompt using BytePlus Seedance'. It distinguishes from sibling tools like images_generate by focusing on video generation and mentions optional reference image IDs from attachments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides practical guidance: specifies optional reference files from attachments, mentions the async nature and delivery method, warns against confidential references, and notes workspace opt-in gating. It does not explicitly contrast with alternatives but is sufficient for the context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vision_queryA
Read-onlyIdempotent
Inspect

Look at the screen currently being shared in a meeting and answer a question about it. Returns a natural-language answer based on the visual content. Use ONLY when the user explicitly asks about the screen/slide/document being shown.

ParametersJSON Schema
NameRequiredDescriptionDefault
questionYesQuestion about the shared screen.
image_b64NoBase64-encoded JPEG image of the screen-share frame.
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, indicating safe, read-only behavior. The description adds context about returning a natural-language answer but does not disclose potential failure modes (e.g., no screen shared, ambiguous question) or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states function, second provides usage restriction. No filler, front-loaded, and every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the primary use case (visual QA on shared screen) and return type (natural-language answer). However, it does not clarify that image_b64 is optional or how the tool behaves when no image is provided (e.g., auto-captures vs. requires input). Given no output schema and simple parameters, it is minimally adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions ('Question about the shared screen.' and 'Base64-encoded JPEG image of the screen-share frame.'). The description adds no additional parameter info, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('look at' / analyze), resource ('screen being shared in a meeting'), and action ('answer a question'). It distinguishes itself from sibling tools like images_generate or images_search by explicitly focusing on shared screen content in meetings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: when the user explicitly asks about the screen/slide/document being shown. This implies when not to use (e.g., other queries, or no screen share). Could be more explicit about avoiding use for non-screen visual queries, but still clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_fetchA
Inspect

Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL.

Modes (extract):

  • 'auto' (default): picks the right mode based on response content type.

  • 'markdown': for HTML pages; returns cleaned markdown plus the page title.

  • 'text': for JSON/XML/plaintext APIs; returns the raw decoded body.

  • 'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read.

Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn.

Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesURL to fetch (http or https). Must be publicly reachable.
extractNoHow to handle the response: 'auto' (default), 'markdown' (HTML → markdown), 'text' (raw body), or 'file' (ingest as binary, return file_id).
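A sketch of the 'file' mode the description recommends for same-turn chaining, with a hypothetical URL; the returned file_id could then be passed to messages.send or agents.add_file.

```python
import json

fetch_call = {
    "jsonrpc": "2.0",
    "id": 10,
    "method": "tools/call",
    "params": {
        "name": "web_fetch",
        "arguments": {
            "url": "https://example.com/brochure.pdf",  # hypothetical URL
            "extract": "file",  # ingest the bytes, get a file_id back this turn
        },
    },
}
print(json.dumps(fetch_call, indent=2))
```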
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are present and description adds significant behavioral context: explains each mode's output (markdown with title, text raw body, file returns file_id and ingests bytes), and notes that files.upload is async. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with front-loaded purpose, then modes, then comparisons. Slightly verbose in listing all modes but each sentence adds unique value. Could be slightly more concise but still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description fully covers return values for each mode. Addresses when to use vs alternatives, async considerations, and file storage behavior. Complete for a moderately complex fetch tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage. Description adds meaning by clarifying that url must be publicly reachable, and expanding on each extract mode's purpose and return value, especially the file mode which returns a file_id for immediate use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches a single URL and returns content, with specific use cases like reading a link from web.search or a user-pasted URL. It explicitly distinguishes from sibling tools web_search and files.upload, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use the tool (specific URL known) and when not to (use web.search for finding URLs, use files.upload for async file uploads). Also details mode selection based on content type, leaving no ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_createA
Inspect

Create a new livechat widget for your website.

The widget will be created with default settings. You can customize theme, auto-reply mode, and more.

Use this when user wants to add a chat widget to their site.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameYesName for the widget (e.g., 'Website Chat', 'Support Widget')
positionNoWidget position on screen (default: bottom-right)
display_modeNoVisual mode of the widget. Pick exactly one: - 'chat' (default): full chat panel + voice mic — use for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly — pick only when the user explicitly asks for a voice-only widget (e.g. 'just a voice button', 'no chat, just call'). - 'headless': no UI; customer drives via window.DialogBrain JS API — pick only when the user explicitly says 'embed in our own design' / 'no widget chrome'.
header_titleNoTitle shown in chat header (default: 'Chat with us')
primary_colorNoPrimary color for widget theme (hex, e.g., '#2563eb'; default: '#2563eb')
auto_reply_modeNoAuto-reply mode: 'draft' (review before sending) or 'auto' (send immediately). Default: draft.
voice_button_labelNoLocalized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if omitted.
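
To ground the schema above, here is a minimal calling sketch using the TypeScript MCP SDK. It is illustrative, not taken from the server's docs: the endpoint URL, client identity, and argument values are assumptions, and only `name` is actually required.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Placeholder endpoint: substitute the real Streamable HTTP URL for this server.
const transport = new StreamableHTTPClientTransport(new URL("https://example.com/mcp"));
const client = new Client({ name: "widget-demo", version: "1.0.0" });
await client.connect(transport);

// Only 'name' is required; omitted fields fall back to the documented
// defaults (position=bottom-right, display_mode=chat, auto_reply_mode=draft).
const created = await client.callTool({
  name: "widgets_create",
  arguments: {
    name: "Website Chat",
    primary_color: "#2563eb",
    auto_reply_mode: "draft", // review replies before they go out
  },
});
console.log(created.content);
```

The later sketches on this page reuse this `client` instance rather than repeating the transport setup.
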
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false (mutation) and destructiveHint=false (non-destructive). The description adds that the widget is created with default settings and customizable options, but does not disclose additional behavioral traits like response format or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, front-loading the action verb 'Create.' Every sentence adds value, avoiding redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 7-parameter complexity and absence of an output schema, the description covers the main purpose and customization options. However, it lacks information about what is returned after creation, which could be helpful for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all parameters. The description mentions 'customize theme, auto-reply mode, and more,' which summarizes some parameters but does not add new information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Create a new livechat widget for your website,' which is a specific verb+resource action. Among sibling tools like widgets_delete and widgets_update, it clearly distinguishes itself as the creation tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Use this when user wants to add a chat widget to their site,' providing clear context for when to use the tool. It does not explicitly state when not to use it or mention alternatives, but the purpose is direct and unambiguous.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_deleteA
DestructiveIdempotent
Inspect

Delete a livechat widget permanently.

This will remove the widget and its embed code will stop working. Existing chat history will be preserved.

Use this when user wants to remove a chat widget.

ParametersJSON Schema
NameRequiredDescriptionDefault
widget_idYesID of the widget to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true and idempotentHint=true. Description adds valuable context: permanent removal, embed stop, chat history preserved. This exceeds what annotations alone provide and does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three short sentences, front-loaded with the primary action. Every sentence adds necessary information without redundancy. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one required parameter, no output schema, and clear annotations, the description covers purpose, effect on other systems, and usage guidance comprehensively. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with widget_id described. Description does not add extra semantic details beyond the schema. Baseline 3 applies as schema already provides sufficient meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'Delete a livechat widget permanently', identifying the verb (delete) and the resource (livechat widget). It distinguishes from siblings like widgets_create, widgets_get, etc., which have different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear usage context: 'Use this when user wants to remove a chat widget.' It also implies when not to use (if the user wants to keep the widget or needs a reversible action). However, it does not explicitly mention alternative tools like widgets_update or disabling.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_getA
Read-onlyIdempotent
Inspect

Get full configuration of a single livechat widget.

Returns all settings including theme, identification, actions, and more.

Use this when user wants to see or verify a specific widget's settings.

ParametersJSON Schema
NameRequiredDescriptionDefault
widget_idYesID of the widget to retrieve
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the tool is safe. The description adds value by specifying what is returned (theme, identification, actions, etc.), which aligns with annotations and provides extra context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, each adding value: tool purpose, return details, usage guidance. No wasted words, clearly structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get tool with one parameter, no output schema, and comprehensive annotations, the description is complete. It tells what it does, what it returns, and when to use it, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with a single required parameter widget_id. The description does not add meaning beyond the schema's own description of 'ID of the widget to retrieve.' Baseline 3 is appropriate as schema does the work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'get' and the resource 'widget', and specifies it retrieves full configuration of a single livechat widget. It distinguishes from siblings like widgets_list (which lists all widgets) by focusing on a single widget.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit context: 'Use this when user wants to see or verify a specific widget's settings.' It does not explicitly mention when not to use or alternatives, but the context is clear enough to differentiate from other widget tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_get_embed_codeA
Read-onlyIdempotent
Inspect

Get the embed code snippet for a livechat widget.

Returns HTML/JavaScript code to add to your website. The code should be placed before the closing </body> tag.

Use this when user wants to install the chat widget on their site.

ParametersJSON Schema
NameRequiredDescriptionDefault
widget_idYesID of the widget to get embed code for
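
A sketch of fetching the snippet, reusing the `client` from the widgets_create sketch. The widget_id and the assumption that the snippet arrives as a text content block are illustrative.

```typescript
const res = await client.callTool({
  name: "widgets_get_embed_code",
  arguments: { widget_id: "wgt_123" }, // illustrative id
});

// Assumption: the snippet comes back as the first text content block.
const block = res.content?.[0];
const snippet = block?.type === "text" ? block.text : "";

// Per the description, paste the snippet just before the closing </body> tag:
//   <script src="..."></script>
//   </body>
console.log(snippet);
```
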
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description adds that it returns HTML/JavaScript code and specifies placement before closing </body> tag, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, front-loaded with the main purpose. Every sentence adds value, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, but the description explains the return value (HTML/JavaScript code) and provides placement instructions. This is complete for a simple retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (widget_id) with clear description. Schema coverage is 100%, so the description adds no new parameter details. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get the embed code snippet for a livechat widget', specifying the exact action and resource. It differentiates from sibling tools like widgets_get or widgets_create by focusing on embed code retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit usage guidance: 'Use this when user wants to install the chat widget on their site.' While it does not contrast with alternative tools, the context is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_listA
Read-onlyIdempotent
Inspect

List all livechat widgets.

Returns widgets with their configuration, embed code, and status.

Use this when user wants to see their widgets or chat widgets.

ParametersJSON Schema
NameRequiredDescriptionDefault
active_onlyNoOnly return active widgets. OMIT to include inactive widgets too.
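
The omit semantics of `active_only` are worth showing directly; a hypothetical pair of calls, reusing the `client` from earlier:

```typescript
// All widgets, active and inactive: leave the active_only key out entirely.
const all = await client.callTool({ name: "widgets_list", arguments: {} });

// Active widgets only.
const active = await client.callTool({
  name: "widgets_list",
  arguments: { active_only: true },
});
```
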
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, so the safety profile is covered. The description adds value by specifying return fields (configuration, embed code, status), but this is not critical for behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Remarkably concise: three sentences covering purpose, return values, and usage context. No extraneous words, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool, the description is complete. It explains what is returned despite no output schema, and the parameter is fully documented. No additional context needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage for the single parameter (active_only). The description does not add additional meaning beyond the schema's own description, so it meets the baseline without exceeding it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('list') and resource ('all livechat widgets'), and specifies the return contents (configuration, embed code, status). It effectively distinguishes from sibling tools like widgets_get which focus on single widgets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage context: 'Use this when user wants to see their widgets or chat widgets.' While it does not mention when not to use it, the context is clear and implicitly differentiates from related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_updateAInspect

Update an existing livechat widget configuration.

You can change name, theme, auto-reply mode, and other settings. Only provided fields will be updated.

Use this when user wants to modify their chat widget settings.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoNew name for the widget
positionNoWidget position on screen. OMIT to leave the position unchanged.
is_activeNoEnable or disable the widget. OMIT to leave the active flag unchanged.
widget_idYesID of the widget to update
website_urlNoWebsite URL for product/site search integration
calendly_urlNoBooking URL for calendar action (e.g., 'https://calendly.com/yourname')
color_schemeNoWidget color scheme. 'auto' follows the visitor's OS dark/light mode preference. OMIT to leave the color scheme unchanged.
display_modeNoVisual mode of the widget. Pick exactly one: - 'chat': full chat panel + voice mic — default for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly — pick only when the user explicitly asks for a voice-only widget. - 'headless': no UI; customer drives via window.DialogBrain JS API — pick only when the user explicitly says 'embed in our own design'. OMIT to leave the display mode unchanged.
header_titleNoTitle shown in chat header
greeting_textNoCustom greeting message shown when visitor opens the chat (e.g., 'Hello! How can I help you today?')
primary_colorNoPrimary color for widget theme (hex, e.g., '#2563eb')
voice_greetingNoSpoken opening line when a visitor starts a voice call through this widget. Played via TTS before the AI model runs. Empty string disables the greeting.
allowed_domainsNoList of allowed domains for the widget
auto_reply_modeNoAuto-reply mode: 'draft' or 'auto'. OMIT to leave the auto-reply mode unchanged.
header_subtitleNoSubtitle shown in chat header
greeting_enabledNoEnable or disable the proactive greeting. OMIT to leave this flag unchanged.
greeting_behaviorNonotification = show badge after delay; auto_open = open widget automatically after delay; on_open = greet only when visitor manually opens. OMIT to leave the greeting behavior unchanged.
enable_form_actionNoEnable or disable the contact form action button. OMIT to leave this flag unchanged.
voice_button_labelNoLocalized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if not set.
contact_form_fieldsNoFields to collect in contact form (e.g., ['name', 'email', 'phone'])
enable_search_actionNoEnable or disable the search action button. OMIT to leave this flag unchanged.
show_visitor_historyNoShow full chat history to returning visitors. OMIT to leave this flag unchanged.
identification_fieldsNoFields to require for visitor identification (e.g., ['name', 'email'])
enable_calendar_actionNoEnable or disable the calendar booking action button. OMIT to leave this flag unchanged.
greeting_delay_secondsNoDelay in seconds before the proactive greeting appears (0–300). 0 = send immediately on page load. Default: 30.
require_identificationNoRequire visitor to identify before chatting. OMIT to leave the identification policy unchanged.
returning_greeting_textNoGreeting for returning visitors who already have chat history (e.g., 'Welcome back! How can I help you today?'). Falls back to greeting_text if not set.
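
Because only provided fields are updated, a sparse payload is the idiomatic call shape. A hypothetical sketch (widget_id and values are illustrative):

```typescript
// Sparse update: every omitted field keeps its current value.
await client.callTool({
  name: "widgets_update",
  arguments: {
    widget_id: "wgt_123",
    header_title: "Support",
    greeting_enabled: true,
    greeting_delay_seconds: 10, // seconds, 0-300
  },
});
```
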
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a mutation (readOnlyHint=false). The description adds that only provided fields are updated, which is useful. No mention of auth or rate limits, but annotations don't either.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with the main purpose upfront, no unnecessary words. Efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with many parameters and no output schema, the description provides a reasonable overview but lacks detail about the response or side effects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for 27 parameters, so the description doesn't need to detail each. The description gives a high-level summary, which is adequate but adds no extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates an existing livechat widget configuration, with specific fields like name, theme, auto-reply mode. It distinguishes from sibling tools like widgets_create and widgets_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use this when user wants to modify their chat widget settings.' This is clear but could explicitly exclude other operations like creating or deleting.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_currentA
Read-onlyIdempotent
Inspect

Return the workspace this MCP API key is currently routed to, with the caller's role inside it. Use this to confirm context before/after workspace.switch.

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint as safe. Description adds minor context about the return value (workspace + role) but does not disclose additional behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information, no unnecessary words. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, no output schema, and strong annotations, the description provides sufficient purpose and usage context for a simple read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100%. Description does not need to add parameter info; baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Return' and clearly identifies the resource: the current workspace with the caller's role. It distinguishes itself from sibling tools like workspace_switch (which changes workspace) and workspace_list (which lists all workspaces).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use this to confirm context before/after workspace.switch', providing a clear scenario for when to employ this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_listA
Read-onlyIdempotent
Inspect

List every workspace the caller is a member of, with is_current marking the workspace this MCP key is currently routed to. Pair with workspace.switch to change the active workspace without reconnecting.

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. Description adds value by specifying the `is_current` marking of the workspace the MCP key is routed to, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with the core action, no superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter, no-output-schema tool, the description fully covers purpose, usage, and output details. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters in schema, so the description's role is reduced. It clearly explains the output structure (list with `is_current` field), effectively covering what would otherwise be parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List every workspace the caller is a member of' with specific verb and resource, and mentions the `is_current` field. It distinguishes from sibling tools like `workspace.switch` and `workspace.current`.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises pairing with `workspace.switch` to change active workspace, providing clear context for when to use this tool and how it relates to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_switchAInspect

Re-point the active MCP API key to a different workspace. Pass exactly one of workspace_id or slug (find them via workspace.list). Takes effect on the very next tool call — no MCP reconnect, no new API key. Sequential checkpoint: do not parallelize tool calls across a switch — calls already in flight when the switch commits will run against the previous workspace.

ParametersJSON Schema
NameRequiredDescriptionDefault
slugNoWorkspace slug to switch to. Resolved within the caller's memberships, so cross-tenant slug collisions are not possible. Mutually exclusive with `workspace_id`.
workspace_idNoNumeric workspace id to switch to. Mutually exclusive with `slug`.
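
A sketch of the discover-switch-confirm flow the descriptions suggest, with an illustrative slug. The key point is the sequential checkpoint: the switch must be awaited before any call that should run in the new workspace.

```typescript
// 1. Discover workspaces (is_current marks the one this key is routed to).
const workspaces = await client.callTool({ name: "workspace_list", arguments: {} });

// 2. Switch: pass exactly one of workspace_id or slug, never both.
await client.callTool({
  name: "workspace_switch",
  arguments: { slug: "acme-support" }, // illustrative slug from step 1
});

// 3. Confirm. Never Promise.all() across a switch: calls already in flight
// when it commits still run against the previous workspace.
const current = await client.callTool({ name: "workspace_current", arguments: {} });
```
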
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the switch takes effect on the next tool call, requires no reconnect or new API key, and that in-flight calls run on the previous workspace—adding context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, no redundant information. Every sentence serves a clear function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, behavioral notes, and constraints. No output schema needed; the description is complete for this simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already covers parameters fully (100% coverage). Description adds value by reiterating mutual exclusivity and how to obtain values via workspace.list, but does not deepen semantic meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Re-point the active MCP API key to a different workspace') and the resource, distinguishing it from sibling tools like workspace_current or workspace_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to pass exactly one of workspace_id or slug, referencing workspace.list for discovery, and warns about sequential checkpointing (no parallelization across switch).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_commentA
DestructiveIdempotent
Inspect

Permanently delete a YouTube comment by id (or 'youtube:comment:<id>'). Cannot be undone. Costs 50 quota units.

ParametersJSON Schema
NameRequiredDescriptionDefault
comment_idYesBare commentId OR 'youtube:comment:<id>'.
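
A hypothetical call showing both accepted identifier forms; youtube_delete_video below follows the same shape with video_id.

```typescript
// Irreversible and costs 50 quota units, so call only once the id is confirmed.
await client.callTool({
  name: "youtube_delete_comment",
  arguments: { comment_id: "UgzXyz123" }, // or "youtube:comment:UgzXyz123"
});
```
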
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description reinforces the destructiveHint annotation with 'permanently delete' and 'Cannot be undone', and adds quota cost information beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the core action, no filler. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with one parameter and no output schema, the description covers purpose, identifier format, permanence, and quota cost—sufficient for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter with 100% schema coverage. The description restates the identifier format provided in the schema, adding no new semantic meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (permanently delete), resource (YouTube comment), and identifier format (id or youtube:comment:<id>). It distinguishes from sibling tools like youtube_moderate_comment and youtube_post_comment_reply.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies irreversible deletion and mentions quota cost, guiding when to use (only if certain). However, it does not explicitly state when not to use or reference alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_videoA
DestructiveIdempotent
Inspect

Permanently delete a YouTube video by id (or 'youtube:video:<id>'). Cannot be undone. Costs 50 quota units. Caller must own the channel.

ParametersJSON Schema
NameRequiredDescriptionDefault
video_idYesBare videoId OR 'youtube:video:<id>'.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses permanent deletion, quota cost, and ownership requirement. Annotations already indicate destructive=true, and the description adds valuable context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences with key information front-loaded. No redundant words; every sentence adds essential detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter destructive action with no output schema, the description covers purpose, constraints, cost, and prerequisites completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes the parameter with 100% coverage. The description merely restates the accepted formats, adding no new semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'permanently delete a YouTube video by id', providing a specific verb and resource. It clearly distinguishes from sibling tools like upload or list videos.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use: when the caller owns the channel and intends permanent deletion. It also warns of irreversibility and quota cost, guiding appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_commentsA
Read-onlyIdempotent
Inspect

List comment threads on a YouTube video. Pass video_id (e.g. 'dQw4w9WgXcQ') or channel_ref ('youtube:video:<id>'). Returns top-level comments with inline replies.

ParametersJSON Schema
NameRequiredDescriptionDefault
video_idYesYouTube videoId — bare 11-char form OR full 'youtube:video:<id>'.
page_tokenNoPagination cursor from a previous call's `next_page_token`.
max_resultsNoPage size, 1-100. Default 25.
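
Pagination works the same way here as on youtube_list_videos. A sketch of walking every page, assuming (illustratively) that each result's text content is JSON carrying a `next_page_token`:

```typescript
let pageToken: string | undefined;
do {
  const page = await client.callTool({
    name: "youtube_list_comments",
    arguments: {
      video_id: "dQw4w9WgXcQ",
      max_results: 100,
      ...(pageToken ? { page_token: pageToken } : {}), // omit on the first page
    },
  });
  const block = page.content?.[0];
  const body = block?.type === "text" ? JSON.parse(block.text) : {};
  pageToken = body.next_page_token;
} while (pageToken);
```
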
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, destructiveHint, idempotentHint. Description adds return format (top-level comments with inline replies), which complements annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no waste, action and purpose front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool with three params, good annotations, and no output schema, the description covers input format and output scope completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description adds alternative video_id format (youtube:video:<id>) and clarifies page_token usage implicitly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists comment threads on a YouTube video, using specific verb and resource. It distinguishes from siblings by targeting comments specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear instructions on how to pass video_id or channel_ref, but does not explicitly exclude when not to use or compare to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_videosA
Read-onlyIdempotent
Inspect

List videos on the connected YouTube channel. Returns id, title, published_at, view_count. Paginate via page_token.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_tokenNoPagination cursor returned in a previous call's `next_page_token`. Omit for the first page.
max_resultsNoPage size, 1-50. Default 25.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and idempotentHint=true, so the safety profile is clear. The description adds value by detailing return fields (id, title, published_at, view_count) and pagination, which are beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the main action and then providing return fields and pagination. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with two parameters and no output schema, the description is complete: it specifies behavior, return fields, and pagination. Annotations cover safety, so no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds the pagination context for page_token and mentions return fields, but does not elaborate on max_results beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List videos on the connected YouTube channel' with a specific verb and resource. It distinguishes from sibling tools like youtube_video_query by scoping to the connected channel, and specifies return fields.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing own channel videos but lacks explicit when-to-use or when-not-to-use guidance. It does not compare with alternative tools like youtube_video_query, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_moderate_commentAInspect

Apply a moderation status to a YouTube comment. Allowed status values: heldForReview, published, rejected, spam. Costs 50 quota units.

ParametersJSON Schema
NameRequiredDescriptionDefault
statusYesOne of: heldForReview, published, rejected, spam.
comment_idYesBare commentId OR 'youtube:comment:<id>'.
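
Since the allowed statuses form a closed set, a literal union keeps calls honest at compile time. A hypothetical sketch:

```typescript
type ModerationStatus = "heldForReview" | "published" | "rejected" | "spam";

const status: ModerationStatus = "rejected";
await client.callTool({
  name: "youtube_moderate_comment",
  arguments: { comment_id: "youtube:comment:UgzXyz123", status }, // 50 quota units
});
```
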
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate it's a non-read-only, non-destructive write operation. The description adds valuable behavioral info: quota cost of 50 units. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loads the key action and allowed values, and includes quota cost. Every word contributes value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a moderate-complexity tool with two parameters and no output schema, the description covers purpose, allowed values, and quota cost. It does not explain return values or error handling but is adequate for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats allowed status values but does not add new meaning beyond the schema. No additional examples or constraints provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Apply a moderation status to a YouTube comment'), specifies allowed status values, and mentions quota cost. It effectively distinguishes from sibling tools like youtube_delete_comment and youtube_list_comments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for changing moderation status but does not explicitly compare to alternatives or provide when-not-to-use guidance. The context of sibling tools suggests differentiation, but the description lacks explicit usage guidelines.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_post_comment_replyAInspect

Post a comment on a YouTube video, or reply to an existing comment. Pass video_id for a top-level comment, OR parent_comment_id to reply. AI-disclosure suffix appended automatically when configured.

ParametersJSON Schema
NameRequiredDescriptionDefault
textYesComment body. 1-10000 chars. AI-disclosure suffix may be auto-appended.
video_idNoBare videoId or 'youtube:video:<id>' — for a top-level comment.
parent_comment_idNoBare commentId or 'youtube:comment:<id>' — for a reply.
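
The two target parameters are mutually exclusive, which a pair of hypothetical calls makes plain:

```typescript
// Top-level comment: pass video_id only.
await client.callTool({
  name: "youtube_post_comment_reply",
  arguments: { video_id: "dQw4w9WgXcQ", text: "Great walkthrough!" },
});

// Reply to an existing comment: pass parent_comment_id only, never both.
await client.callTool({
  name: "youtube_post_comment_reply",
  arguments: { parent_comment_id: "UgzXyz123", text: "Answered above." },
});
```
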
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that an AI-disclosure suffix may be auto-appended, which is behavioral info beyond annotations. However, it does not mention other important aspects like required authentication, rate limits, or whether the comment becomes immediately visible.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise, front-loaded sentences with no wasted words. Each sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a relatively simple tool (3 parameters, no output schema), the description covers the core usage logic. It does not explain return value or error handling, but given the low complexity, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already has 100% description coverage. The description adds the crucial OR relationship between video_id and parent_comment_id, which is not explicit in the schema. Also adds nuance about auto-appended suffix for the text parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool posts a comment or reply on YouTube. Distinguishes top-level comment vs reply, which differentiates from sibling tools like youtube_moderate_comment or youtube_delete_comment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly explains when to use video_id (for top-level comment) versus parent_comment_id (for reply). Provides clear usage context but does not mention when not to use this tool (e.g., use youtube_moderate_comment for moderation).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_upload_videoAInspect

Upload a workspace-owned video file (file_id) to the connected YouTube channel. Returns video_id + thread_id. Costs 1600 quota units. Default privacy is 'private' — pass privacy='public' to publish.

ParametersJSON Schema
NameRequiredDescriptionDefault
tagsNoOptional list of tag strings (max ~500 chars total).
titleYesVideo title (max 100 chars).
file_idYesWorkspace `files.id` of the video to upload. Must be a video/* MIME type and `status='ready'`. Get IDs from the [ATTACHMENTS] block, files.search, or workspace.search.
privacyNoPrivacy status. 'private' (default), 'unlisted', or 'public'.private
category_idNoYouTube category ID (default '22' = People & Blogs). See https://developers.google.com/youtube/v3/docs/videoCategories/list.22
descriptionNoVideo description (max 5000 chars). OMIT to upload without a description.
made_for_kidsNoCOPPA flag. OMIT for the standard (non-kids) default.
channel_account_idYesThe connected YouTube channel_account.id (workspace-scoped).
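
A hypothetical upload call; the file and channel ids are placeholders. The one behavioral trap is the privacy default, shown in the comment.

```typescript
const uploaded = await client.callTool({
  name: "youtube_upload_video",
  arguments: {
    file_id: "file_789",           // must be video/* MIME and status='ready'
    channel_account_id: "chan_42", // workspace-scoped channel_account.id
    title: "Release walkthrough",  // max 100 chars
    description: "What changed in v2 and how to migrate.",
    privacy: "public",             // omit to keep the 'private' default
  },
});
// Per the description: returns video_id + thread_id, costs 1600 quota units.
```
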
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a write operation, and the description adds costs (1600 quota units), default privacy, and return values (video_id, thread_id). This goes beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences with no unnecessary words. The first introduces the core action, the second the returns, and the rest add quota cost and privacy behavior. Perfectly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the upload action, prerequisites (file_id), returns, quota, and privacy. Given there is no output schema, it provides sufficient return info. Minor omissions like file size limits are acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and each parameter has a description. The tool description adds clarity on the privacy default and the file_id requirement, enhancing the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (Upload), the resource (workspace-owned video file to connected YouTube channel), and the returns. It distinguishes from sibling YouTube tools like delete or list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies that the file must be workspace-owned and indicates the default privacy behavior. It does not explicitly state when to use this tool vs alternatives, but the action is unique among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_video_queryA
Read-onlyIdempotent
Inspect

Ask Gemini about a YouTube video. Pass a video URL and any prompt — verbatim transcript with timestamps, summary, targeted Q&A about content or visuals, translation, etc. Works on any public/unlisted video.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesYouTube video URL. Supported forms: youtube.com/watch?v=…, youtu.be/…, youtube.com/shorts/…, m.youtube.com/watch?v=…. Pass-through to Gemini verbatim.
promptYesWhat to ask Gemini about the video. Examples: 'Provide a verbatim transcript with [HH:MM:SS] timestamps.' / 'What is the main claim made in the first 30 seconds?' / 'Describe what's shown on screen at 0:30.' / 'Translate the spoken Spanish to English.'
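
One URL plus one free-form prompt is the whole interface; a hypothetical call:

```typescript
const answer = await client.callTool({
  name: "youtube_video_query",
  arguments: {
    url: "https://youtu.be/dQw4w9WgXcQ",
    prompt: "Provide a verbatim transcript with [HH:MM:SS] timestamps.",
  },
});
```
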
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds context about working on public/unlisted videos and the range of prompts, but this is supplementary. No contradiction is present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with purpose, then usage details and examples. Every sentence adds value with no redundant information. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two well-described parameters and comprehensive annotations, the description covers all needed context: what it does, how to use it, constraints (public/unlisted videos), and example prompts. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with both parameters described. The description reinforces the usage ('Pass a video URL and any prompt') and gives examples for the prompt parameter, but the schema already captures the meaning effectively. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Ask Gemini about a YouTube video', which is a specific verb-resource combination. It distinguishes from sibling tools like youtube_delete_video or youtube_upload_video by focusing on querying and analysis. Examples of prompts further clarify the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'Pass a video URL and any prompt' and lists use cases like transcript, summary, Q&A, translation. It also notes 'Works on any public/unlisted video.' However, it does not explicitly exclude alternatives or state when not to use, keeping it from a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
