
dialogbrain

Server Details

Unified inbox MCP for WhatsApp, Telegram, Email, voice — read/send messages, search, AI agents.

Status: Healthy
Transport: Streamable HTTP
Repository: saloprj/dialogbrain-mcp
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions (Grade: A)

Average 4.2/5 across 165 of 165 tools scored. Lowest: 2.9/5.

Server Coherence (Grade: A)
Disambiguation: 4/5

Most tools have clearly distinct purposes within their categories, aided by detailed descriptions. However, some overlapping functionalities (e.g., `web_search` vs. `web__local_search`, `agents_ask` vs. `agents_simulate_inbound`) could cause minor confusion for an agent, preventing a perfect score.

Naming Consistency: 4/5

The naming convention is predominantly snake_case with a consistent `noun_verb` pattern. However, the double underscore in `web__local_search` (vs. `web_search`) is an outlier that breaks the pattern; otherwise vague verbs such as `process` or `run` are absent and the overall pattern holds.

Tool Count: 2/5

With 165 tools, the server is oversized for a typical MCP server. While the tools are well-organized into domains, the sheer volume overwhelms an agent's ability to efficiently select the right tool, exceeding the recommended 3-15 range by a wide margin.

Completeness: 4/5

The tool surface covers a vast range of functionalities including agents, messaging, calls, browser automation, knowledge management, and integrations with LinkedIn, YouTube, and calendars. Minor gaps exist (e.g., no tool for managing YouTube channel settings or advanced web scraping), but the core workflows are well-supported.

Available Tools

165 tools
agent_handoff (Grade: A)
Read-only · Idempotent

Delegate a multi-step task (research, composing messages, booking, scheduling) to the full agentic planner. Use when a user ask needs more than a direct answer. Returns final_answer for you to narrate in one short sentence. Do NOT re-trigger the same handoff if the tool_result has status timeout or error — acknowledge and offer to retry.

Parameters

  • model (optional): Override the escalation model. Omit (recommended) to use the calling agent's configured model from settings; falls back to claude-sonnet-4-6 when no agent context. Ignored when `agent_id` is set — the target agent uses its own stored model.

  • agent_id (optional): Optional ID of another agent in the same workspace to delegate the task to. When set, the target agent runs with ITS OWN prompt, tools, and model; `task_description` becomes its user query. Spawns a new trace linked back to this trace via parent_trace_id (visible in the admin lineage card). Omit to run a sub-loop on the calling agent (default behaviour).

  • task_description (required): Plain-language description of what the planner should accomplish. Include everything the planner needs: the user's goal, constraints, and any context already gathered in this voice call.
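
For illustration, a hypothetical arguments payload in Python, assuming the tool is called through a generic MCP client; the agent ID and task text below are invented:

  # Hypothetical agent_handoff arguments; all values are illustrative.
  handoff_args = {
      # Everything the planner needs, stated in plain language.
      "task_description": (
          "Guest asked to move their booking to Friday evening; check availability, "
          "then draft a confirmation message for approval."
      ),
      # Optional: delegate to another workspace agent. Omit to run a
      # sub-loop on the calling agent (the default behaviour).
      "agent_id": "agt_booking_01",
  }
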
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, idempotentHint. Description adds key behaviors: returns final_answer for narration, handling of timeout/error, and trace spawning when agent_id is set. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, followed by usage and error handling. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, error handling, and delegation behavior. No output schema exists, but final_answer is mentioned. Minor gap: final_answer structure could be detailed, but sufficient for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions. The description adds only minimal context (e.g., task_description should include everything). Baseline 3 is appropriate; no significant extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it delegates multi-step tasks to a planner, contrasting with direct-answer tools. It specifies the verb 'delegate' and the resource 'full agentic planner', distinguishing it from sibling tools like agents_ask.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly says 'use when a user ask needs more than a direct answer' and provides error-handling guidance ('Do NOT re-trigger... acknowledge and offer to retry'). While no alternative tool is named, the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_add_file (Grade: A)

Attach a file to this agent's private knowledge (agent-specific files, not shared with other agents).

Workflow:

  1. Upload the file with files_upload (pass source_url for remote files)

  2. Index it with files_ingest (pass the file_id)

  3. Call this tool with agent_id + file_id

Returns chunk_count — shows 0 while still processing. Call agents.list_files later to see the final chunk count once indexing completes.

Parameters

  • file_id (required): file_id returned by files_upload or files_ingest

  • agent_id (required): ID of the agent to attach the file to
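
To make the three-step workflow concrete, a minimal sketch in Python, assuming an MCP client session object `session` that exposes the standard call_tool method; the URL and file_id below are invented:

  # Sketch of the upload -> ingest -> attach workflow.
  async def attach_remote_file(session, agent_id: str) -> None:
      # 1. Upload the file (pass source_url for remote files).
      await session.call_tool("files_upload", {"source_url": "https://example.com/faq.pdf"})
      file_id = "file_123"  # illustrative; read the real file_id from the upload result
      # 2. Index the uploaded file.
      await session.call_tool("files_ingest", {"file_id": file_id})
      # 3. Attach it to the agent's private knowledge.
      await session.call_tool("agents_add_file", {"agent_id": agent_id, "file_id": file_id})
      # chunk_count may be 0 at first; check agents_list_files once indexing completes.
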
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool returns chunk_count which may be 0 while indexing, and that the file is agent-specific. Annotations minimal but description adds useful behavioral context about processing delay.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five sentences, front-loaded with purpose, then workflow, then return value explanation. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Explains return value and workflow, but lacks error handling or additional notes on idempotency. Still sufficiently complete for a tool with good annotations and schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage. Description mentions parameters but adds no significant new meaning beyond schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool attaches a file to an agent's private knowledge, specifying it is agent-specific and not shared. Distinguishes from siblings like agents_remove_file and agents_list_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a three-step workflow (upload, ingest, attach) and explains that chunk_count of 0 means processing, with suggestion to check later. Does not explicitly state when not to use, but workflow guidance is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_approve_draft (Grade: A)

Approve a pending agent draft and send the message.

The draft will be sent to the conversation it was generated for. You can optionally edit the text before sending.

Use this when user says:

  • 'Approve this draft'

  • 'Send this reply'

  • 'Approve and send'

  • 'Looks good, send it'

IMPORTANT: This will send a message to a real person.

Parameters

  • draft_id (required): ID of the draft to approve

  • edited_text (optional): Optional edited response text (if user wants to modify before sending)
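
For illustration, a hypothetical payload that edits the text before approving; the draft ID and message are invented:

  # Hypothetical approval payload; approving sends a real message to a real person.
  approve_args = {
      "draft_id": "draft_42",  # obtained from agents_list_drafts
      # Optional: replace the generated text before it is sent.
      "edited_text": "Thanks for reaching out! We can fit you in at 3 pm tomorrow.",
  }
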
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds meaningful behavioral context beyond annotations: mentions sending to a real person, optional editing, and warning about real-world impact. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with main action, includes examples and a warning. Slightly repetitive but efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete description for a simple tool: explains action, outcome, optional edit, and provides usage examples. No output schema needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions fully cover both parameters (100% coverage), so baseline is 3. Description reinforces optional edit but doesn't add new details beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action: approve a pending agent draft and send the message. Distinguishes from sibling tools like agents_reject_draft and agents_list_drafts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit user phrases for when to use (e.g., 'Approve this draft'), offers context for usage, but lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_ask (Grade: A)

Send a message to an AI agent and get its response.

The agent runs with its configured prompt, tools, and knowledge. Use this to test agents or have them process a task.

Returns: {status: 'replied'|'silent', response_text, messages[], full_reply, model_used, tokens_*, send_mode, execution_mode}. messages[] carries each messages.send invocation the agent made (text, subject, reply_to_message_id, timestamp, message_id, attachments=[{file_id,name,mime}]). full_reply concatenates text only — attachment-only sends show up in messages but not full_reply. status='silent' iff both response_text is empty AND messages is empty.

Execution may take 10-60s depending on agent complexity.

Parameters

  • message (required): Message/goal to send to the agent

  • agent_id (required): ID of the AI agent to ask

  • send_mode (optional): Send mode for the agent run: 'draft' = create drafts, 'auto' = send directly. Defaults to the agent's configured default_send_mode. Does NOT change execution_mode — that is fixed by the agent's config.
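
A sketch of a call and the documented return shape; the agent ID, message, and token numbers are invented:

  # Hypothetical agents_ask payload.
  ask_args = {
      "agent_id": "agt_support",
      "message": "Summarize today's unread messages and flag anything urgent.",
      "send_mode": "draft",  # create drafts rather than sending directly
  }
  # Per the description, the result resembles:
  # {"status": "replied", "response_text": "...", "messages": [...],
  #  "full_reply": "...", "model_used": "...", "send_mode": "draft",
  #  "execution_mode": "ai_assisted", "tokens_input": 812, "tokens_output": 145}
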
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the basic annotations, the description discloses execution time (10-60s), return structure including status conditions ('replied'/'silent'), detailed attributes of messages (attachments, IDs), and how response_text and messages relate. This comprehensive behavioral disclosure adds significant value beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the purpose, followed by detailed return information. While every sentence adds value, the detailed enumeration of return fields could be slightly more concise. Overall, it is well-structured and informative without being excessively long.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description provides a thorough explanation of the return format, including edge cases (status='silent'), token usage, and execution time. It also covers the behavior of send_mode relative to agent config. This makes the tool's behavior fully predictable for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter (message, agent_id, send_mode) is already well-documented in the input schema. The description does not add new information about parameter usage or constraints beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Send a message to an AI agent and get its response.' It further clarifies use cases: 'Use this to test agents or have them process a task.' This distinguishes it from sibling tools like agents_create (creating agents) or agent_handoff (handoff actions), providing a specific verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context by stating the tool is for testing agents or processing tasks. It does not explicitly exclude scenarios or name alternatives, but the purpose is well-defined, and the context is sufficient for an AI agent to decide when to use this tool over other agent-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_create (Grade: A)

Create a new AI agent in the workspace.

Execution modes:

  • ai_assisted (default, recommended): Two-phase AI — fast pre-classifier (Haiku) for keyword filtering and simple replies, then full AI with tools for complex messages. Best for: auto-replies, group monitoring, keyword-based filtering.

  • agentic: Autonomous multi-step agent with planning and tool execution. Best for: complex scheduled tasks, multi-step automation.

  • rule_based: Simple pattern matching without AI.

For keyword filtering: use ai_assisted mode + set keywords in trigger conditions (free, deterministic) and/or auto_reply_rules (smart, LLM-based) via agents.update.

Parameters

  • name (required): Name of the AI agent (1-100 characters)

  • prompt_id (optional): ID of the prompt to assign to this agent

  • send_mode (optional): Default send mode: 'auto' or 'draft'. OMIT to use 'draft' (the default).

  • description (optional): Optional description of what this agent does

  • execution_mode (optional): Execution mode: 'rule_based', 'ai_assisted' (default), 'agentic', 'claude_channels', or 'voice'. OMIT to use 'ai_assisted'.
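
A hypothetical payload for the keyword-filtering setup recommended above; the name and description are invented, and keywords would be configured afterwards via agents.update:

  # Hypothetical agents_create payload for an ai_assisted auto-replier.
  create_args = {
      "name": "Support Auto-Reply",
      "description": "Answers common support questions in the main group",
      "execution_mode": "ai_assisted",  # two-phase AI, the recommended default
      "send_mode": "draft",             # drafts require human approval before sending
  }
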
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate it is a write operation (readOnlyHint=false). The description adds context about execution modes and defaults (send_mode), but does not cover return values or side effects. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the main purpose and organized with bullet points. A few minor verbose parts but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Missing information about return values (e.g., created agent ID) and required permissions. With no output schema, the description should cover what the tool returns. Otherwise adequate for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining execution modes beyond enum labels, and provides usage context for parameters like send_mode and execution_mode.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new AI agent in the workspace' with a specific verb and resource. It distinguishes from sibling tools like agents_update and agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use each execution mode (ai_assisted for keyword filtering, agentic for complex tasks, rule_based for simple). Does not explicitly state when not to use the tool, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_delete (Grade: B)

Permanently delete an AI agent.

WARNING: This cannot be undone. The agent and all its triggers will be removed.

Parameters

  • agent_id (required): ID of the agent to delete
Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that deletion is permanent and removes triggers, but the annotation destructiveHint=false contradicts this, indicating a serious inconsistency. The low score reflects this contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: the first states the purpose, the second provides a critical warning. No unnecessary words, front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple destructive action with no output schema, the description covers the action, irreversibility, and what is removed. It could mention the return value or confirmation, but it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There is one parameter (agent_id) with a schema description, and the tool description adds no additional meaning beyond that. Schema coverage is 100%, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Permanently delete an AI agent' with a specific verb and resource, and it distinguishes from sibling tools like agents_create and agents_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The warning 'This cannot be undone' hints at use when irreversible deletion is intended, but it lacks explicit guidance on when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_get (Grade: A)
Read-only · Idempotent

Get detailed information about a specific AI agent.

Returns full agent config including:

  • Execution configuration

  • Tool configuration

  • Knowledge configuration

  • Escalation configuration

  • Triggers list

  • Knowledge collections

  • Custom AI instructions (prompt_text)

  • Auto-reply rules override (auto_reply_rules)

Parameters

  • agent_id (required): ID of the AI agent to fetch
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, but description adds value by enumerating exact return fields (execution config, tool config, etc.). No contradictions. Provides behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is appropriately sized; first sentence clearly states purpose, followed by a bulleted list of return fields. No fluff. Every sentence adds value. Could be slightly more compact but is well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only one parameter and strong annotations, the description fully covers what the tool does and what it returns. No output schema, so the detailed list compensates. Complete enough for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Single parameter agent_id with schema description 'ID of the AI agent to fetch'. Schema coverage is 100%, so description doesn't need to add more. Description does not elaborate on the parameter, but baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description starts with clear verb 'Get' and resource 'detailed information about a specific AI agent'. Lists specific return fields, distinguishing it from siblings like agents_list (for listing) and agents_create (for creation). This provides exact, unambiguous purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage context is implied by purpose ('specific' vs list), but no explicit mention of when to use this tool vs alternatives (e.g., agents_list). No guidance on prerequisites or when not to use it. Adequate but minimal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_silence (Grade: A)
Read-only · Idempotent

End this turn without sending any message. Use when the thread is owned by a human operator after job.escalate, when the guest is self-resolving, when the message is a duplicate, or for observation-only turns. Calling this tool is the ONLY correct way to stay silent — narrated silence text (e.g. '(Staying silent…)', 'Internal:…') would be delivered to the guest verbatim.

Parameters

  • reason (required): Free-form explanation for admin audit. Stored in trace_tool_executions.tool_params (ClickHouse String; reason filters are scan-only).
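
For illustration, the shape of a correct silent turn; the reason text is invented:

  # Call the tool to stay silent; never narrate silence as message text.
  silence_args = {
      "reason": "Thread owned by a human operator after job.escalate; observation-only turn."
  }
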
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, destructiveHint=false, and idempotentHint=true, which cover safety. The description adds valuable context: it ends the turn, and that narrated silence would be delivered verbatim to the guest. This goes beyond annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each carrying weight: core action, use cases, warning about alternative, and emphasis on correctness. No wasted words, and the key point is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, clear annotations), the description is fully adequate. It covers purpose, usage, and a critical behavioral nuance, leaving no gaps for a typical use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the 'reason' parameter described in the schema. The description does not add extra meaning or context for the parameter beyond what the schema provides, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool ends the turn without sending a message, and explicitly distinguishes it from alternative actions like narrated silence. The verb 'end' and resource 'turn' are specific, and the purpose is unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists explicit scenarios for use (after job.escalate, self-resolving, duplicate, observation-only) and warns against using narrated silence as an alternative. This provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list (Grade: A)
Read-only · Idempotent

List all AI agents configured in the workspace.

Returns agents with their basic info, trigger count, and knowledge collection count.

Use this to:

  • See all configured AI agents

  • Filter by enabled/disabled status

  • Get agent IDs for further operations

Parameters

  • enabled (optional): Filter by enabled status (true = enabled only, false = disabled only, omit = all)
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that it returns basic info, trigger count, and knowledge collection count, beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three paragraphs with front-loaded purpose. The bullet list slightly repeats the first sentence but overall efficient and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool without output schema, the description explains return fields (basic info, counts) and filter option. Covers main aspects adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the 'enabled' parameter. The description restates filtering capability but does not add new semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all AI agents configured in the workspace' with a specific verb and resource. It differentiates from sibling tools like agents_create or agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases (see all agents, filter, get IDs) but does not mention when not to use or alternatives. Still clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_drafts (Grade: A)
Read-only · Idempotent

List pending agent drafts awaiting approval.

Shows drafts that have been generated by AI agents but not yet sent. Each draft includes:

  • Thread/conversation info

  • Trigger message (what prompted the reply)

  • Generated response text

  • Creation time and expiration

Use this when user asks:

  • 'Show pending agent drafts'

  • 'What messages are waiting for approval?'

  • 'List drafts to approve'

Parameters

  • limit (optional): Maximum number of drafts to return

  • thread_id (optional): Filter by specific thread ID
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds context about draft content (thread info, trigger message, generated response, creation/expiration) beyond what annotations provide, enhancing transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a concise introductory sentence followed by bullet points for included fields. It is not overly verbose, and each element serves a purpose, though the bullet list could be slightly more compact.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, input semantics via schema, and output content. It lacks explicit details about pagination or error handling, but for a read-only list tool with good annotations, it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% as both parameters have descriptions. The description does not add additional parameter-level detail beyond what is in the schema, so it meets the baseline without extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List pending agent drafts awaiting approval' with a specific verb and resource. It lists included fields and provides example user queries, distinguishing it from sibling tools like agents_approve_draft and agents_reject_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit example queries for when to use the tool, offering clear context. It does not explicitly state when not to use it or mention alternatives, but the context makes it clear that it's for listing drafts only.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_files (Grade: A)
Read-only · Idempotent

List files directly attached to this agent (agent-specific files, not shared collections).

Returns file_id, title, status, and chunk_count for each file. chunk_count shows how many indexed chunks were created — 0 means the file is still processing.

Use agents.add_file to attach a new file, or agents.remove_file to detach one.

Parameters

  • agent_id (required): ID of the agent whose files to list
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, indicating a safe read. The description adds value by explaining the return fields and that chunk_count=0 means processing. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: purpose, return fields, and related tool usage. No fluff, each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, return values, and a special case (chunk_count=0). It does not mention pagination or limits, but for a simple list tool this is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for agent_id. The description does not add further parameter meaning beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and resource 'files directly attached to this agent', with explicit differentiation from shared collections. It also specifies the return fields, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description tells when to use this tool (list agent-specific files) and provides alternatives: 'Use agents.add_file to attach a new file, or agents.remove_file to detach one.' It implicitly distinguishes from sibling list tools by stating 'not shared collections'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_history (Grade: A)
Read-only · Idempotent

List past versions of an agent's prompt_text. Every edit to the agent's prompt is snapshotted to an append-only table — use this tool to browse history, find a prior known-good version, and copy it into agents.prompt_restore.

Parameters

  • limit (optional): Max versions to return (1-200, default 50)

  • agent_id (required): ID of the agent

  • before_version (optional): Cursor: return versions strictly below this version_number
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and destructiveHint. The description adds that every edit is snapshotted to an append-only table, providing behavioral context beyond annotations. It could mention ordering or pagination, but overall sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. The purpose is front-loaded, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides a clear use case and hints at return content (versions with prompt_text). No output schema exists, but the tool is straightforward. Minor gap: ordering of results is not specified, but cursor parameter implies descending.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters (agent_id, limit, before_version). The description does not add further meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists past versions of an agent's prompt_text, using specific verbs and resource. It distinguishes itself from siblings like agents_prompt_restore by specifying its role in browsing history and finding prior versions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use the tool ('browse history, find a prior known-good version') and how to use it ('copy it into agents.prompt_restore'), providing clear alternatives and highlighting its read-only nature.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_restore (Grade: A)

Restore a past version of an agent's prompt_text by version_number. Creates a new version pointing at the restored content — history is preserved. Use agents.prompt_history first to find the version_number you want.

Parameters

  • reason (optional): Optional: why this restore is happening (shows up in history UI)

  • agent_id (required): ID of the agent

  • version_number (required): The version_number to restore (get it from agents.prompt_history)
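
A sketch of the browse-then-restore flow across the two prompt tools, assuming an MCP client session object `session` with the standard call_tool method; the IDs, version number, and reason are invented:

  # Browse history, then restore a known-good prompt version.
  async def roll_back_prompt(session, agent_id: str) -> None:
      # List recent versions to find a known-good version_number.
      await session.call_tool("agents_prompt_history", {"agent_id": agent_id, "limit": 10})
      # Restore appends a new version pointing at the old content; history is preserved.
      await session.call_tool("agents_prompt_restore", {
          "agent_id": agent_id,
          "version_number": 7,  # illustrative; taken from the history listing
          "reason": "Rolling back after the newer prompt started over-escalating",
      })
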
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds that it creates a new version pointing to restored content and that history is preserved, which goes beyond annotations (destructiveHint: false, readOnlyHint: false) by explaining the non-destructive behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no filler. First sentence states purpose, second gives usage guidance. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a restore operation: explains what it does, how to use it, and that history is preserved. Lacks return value info but no output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions, so the baseline is 3. The description adds minimal extra meaning (e.g., where to get version_number) but mostly repeats schema info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it restores a past version of an agent's prompt_text by version_number, preserving history. It distinguishes itself from siblings like agents_prompt_history (which lists versions) and prompts_prompt_restore (for prompts).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to use agents.prompt_history first to find the version_number. No explicit when-not-to-use, but the context is clear and the sibling tool prompts_prompt_restore provides an alternative for prompts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_reject_draft (Grade: A)

Reject a pending agent draft without sending.

The draft will be marked as rejected and won't be sent. Use this when the generated response isn't appropriate.

Use this when user says:

  • 'Reject this draft'

  • 'Don't send this'

  • 'Cancel this reply'

  • 'Delete this draft'

  • 'This response is wrong'

Parameters

  • reason (optional): Optional reason for rejection (for logging/feedback)

  • draft_id (required): ID of the draft to reject
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses that 'The draft will be marked as rejected and won't be sent', which is behavioral context beyond annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded, with a list of example phrases. No unnecessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple action with good schema, description adequately explains behavior. No output schema needed. Could mention logging but not essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description adds no extra meaning beyond what schema already provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Title 'Reject Agent Draft' and description 'Reject a pending agent draft without sending' clearly state the verb and resource. It distinguishes from sibling 'agents_approve_draft'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'when the generated response isn't appropriate' and provides specific user phrases. Implies not to use when approval is needed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_remove_file (Grade: A)

Remove a file from this agent's private knowledge.

The file itself is not deleted — it's just detached from this agent. Use agents.list_files to find the file_id to remove.

Parameters

  • file_id (required): ID of the file to detach (from agents.list_files)

  • agent_id (required): ID of the agent to remove the file from
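
A minimal sketch pairing this tool with agents.list_files, as the description suggests, assuming an MCP client session object `session`; the IDs are invented:

  # Detach a file from an agent; the file itself stays in the workspace.
  async def detach_file(session, agent_id: str, file_id: str) -> None:
      # Look up attached files first if the file_id is unknown.
      await session.call_tool("agents_list_files", {"agent_id": agent_id})
      await session.call_tool("agents_remove_file", {"agent_id": agent_id, "file_id": file_id})
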
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the file is not deleted but only detached, which is important behavioral context. Annotations are non-destructive, so description adds clarity beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no wasted words. Front-loads the main action and clarifies side effects immediately.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal tool with no output schema, the description fully covers behavior, prerequisite knowledge (agents.list_files), and consequences (detachment). Complete and self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description adds context for file_id (from agents.list_files), which provides additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Specifically describes the action 'Remove a file from this agent's private knowledge' and distinguishes it from deleting by noting the file is 'just detached'. Clear verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Clearly states the tool's purpose and references agents.list_files for finding file_id, but does not explicitly mention when not to use or compare to siblings like agents_add_file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_simulate_inbound (Grade: A)
Read-only · Idempotent

Replay an inbound message on a thread through the real trigger pipeline and return what would have happened. The router auto-picks the winning enabled agent + trigger by priority/specificity (same logic as production). By default send_mode='draft' so no real message is sent; pass send_mode='auto' on a test account to let the matched agent actually deliver (drafts get overwritten by the next draft, so 'auto' is the only way to verify Telegram/email delivery end-to-end).

Use to verify routing for a thread: which agent answers, which trigger wins, or — when nothing matches — the structured skip reason. Pass blockchain_tx_data instead of message_text to simulate a blockchain:transfer event on the thread.

Returns: {matched: true, matched_agent: {id, name, execution_mode}, matched_trigger: {id, trigger_type, conditions, specificity_score}, routing_reason, response_text, messages[], execution_mode, send_mode, model_used, tokens_input, tokens_output, latency_ms, rag_queries_made, rag_results_used} on a hit, or {matched: false, skip_reason, simulator_warnings} on a miss.

Parameters

  • send_mode (optional, default: 'draft'): How the matched agent should deliver its reply. 'draft' (default, safe) creates a draft only — no real send, no idempotency key. 'auto' lets the agent deliver through the channel adapter exactly as it would in production — use this on a test account to verify Telegram/email delivery end-to-end. Drafts get overwritten by the next draft on the thread, so 'auto' is required when you want to see the message persisted.

  • thread_id (required): Thread ID to route the simulated event from. Must belong to the API key's workspace.

  • message_text (optional): Inbound message body to simulate. Defaults to '[MCP simulation test]' when omitted.

  • system_message (optional): Tag the simulated inbound as a system/service-message row (missed call, group join, pinned message, etc.) so the `excluded_system_message_kinds` trigger filter can be exercised end-to-end. Shape: {"category": <one of call_event | membership_change | contact_signup | pinned_message | chat_metadata_change | voice_chat_event | other_service>, "native_kind": <free-form upstream event class name, e.g. 'MessageActionPhoneCall'>}. The category is written into `message.meta.system_message` (mirroring the real Telegram ingest path) AND surfaced on the synthetic IncomingEvent so the trigger evaluator honors the block-list. Omit for a normal text-message simulation.

  • blockchain_tx_data (optional): When set, simulate a blockchain:transfer event instead of a channel:message:new event. Expected keys: chain, to_address / from_address, tx_hash.

  • attachment_file_ids (optional): Optional list of workspace file IDs to attach to the simulated inbound message — same shape as a real Telegram message with image/document attachments. Use this to test agent behavior on incoming messages that carry images (e.g. logos for invoices) or documents the agent must reference. File IDs must belong to the API key's workspace.
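
A hypothetical simulation call using the safe defaults described above; the thread ID and message are invented:

  # Simulate an inbound message without delivering anything real.
  simulate_args = {
      "thread_id": "thr_555",
      "message_text": "Hi, is the apartment available this weekend?",
      "send_mode": "draft",  # safe: no real message is delivered
  }
  # Per the description, a hit returns matched_agent, matched_trigger, and routing_reason;
  # a miss returns {"matched": false, "skip_reason": ..., "simulator_warnings": ...}.
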
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses behavioral traits: drafts overwritten, auto mode sends real messages, router uses production logic. However, annotations (readOnlyHint=true) contradict the ability to send real messages with 'auto' mode, which is a significant inconsistency. Despite this, description adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is thorough but somewhat lengthy (3 paragraphs). It is well-structured, front-loading the main action and then detailing parameters. Could be slightly more concise, but all sentences are informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters with nested objects and no output schema, the description provides a complete picture: it explains the return structure for both matched and unmatched cases, covers edge cases (system_message, blockchain), and addresses behavior differences between draft and auto modes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the description still adds meaning: explains send_mode trade-offs, system_message shape and purpose, blockchain_tx_data usage, and attachment_file_ids for testing image-aware agents. This goes beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool replays an inbound message through the trigger pipeline and returns what would have happened. It specifies the verb (simulate), resource (inbound message routing), and distinguishes from siblings like agents_ask by focusing on verification of routing logic rather than actual message sending.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'verify routing for a thread' and provides alternatives like blockchain_tx_data for simulating blockchain events. Also gives guidance on send_mode: 'draft' for safe simulation, 'auto' for end-to-end testing on test accounts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_task_complete (A)

Report that a Claude Code agent task has been completed. Call this when you finish processing an agent_task from DialogBrain.

Parameters (JSON Schema)

- `success` (required): Whether the task completed successfully
- `summary` (optional): Brief summary of what was done
- `trace_id` (required): Trace ID from the agent task event
Behavior 4/5

Annotations indicate non-destructive, non-readOnly behavior. The description adds context that the tool reports completion, which implies a state change. It does not contradict annotations and provides adequate behavioral insight for an agent.

Conciseness 5/5

The description is two sentences long, front-loads the purpose, and contains no extraneous information. Every sentence serves a clear function.

Completeness 5/5

For a simple notification tool with no output schema and 100% parameter coverage, the description fully informs the agent about when and why to use this tool. It is complete given the tool's complexity.

Parameters 3/5

Schema coverage is 100% for all 3 parameters. The description does not add any additional meaning beyond what the schema already provides, so the baseline score of 3 applies.

Purpose 5/5

The description clearly states the tool reports agent task completion, with a specific verb ('Report') and resource ('Claude Code agent task'). It distinguishes from sibling agent tools by specifying the exact event (finishing an agent_task from DialogBrain).

Usage Guidelines 4/5

The description explicitly says 'Call this when you finish processing an agent_task from DialogBrain,' providing clear context for when to use it. It does not mention alternatives or when-not-to-use, but the context is sufficient given the tool's narrow scope.

agents_trace_get (A)
Read-only · Idempotent

Fetch the full execution detail for a single trace — tool executions, events timeline, LLM call spans (with error_message on failures).

Use after agents.traces_list identifies a specific trace of interest (failed run, slow run, unexpected outcome).

By default LLM system_prompt and prompt_messages are stripped — set include_llm_bodies=true to fetch them when diagnosing prompt engineering issues (emits a WARNING audit log). Set full=true to disable all field truncation. completion_text on failed LLM calls is always returned (capped at 8 KB).

Parameters (JSON Schema)

- `full` (optional): Disable all field truncation. Escape hatch for a human operator. OMIT for the standard truncated view.
- `agent_id` (required): Expected agent_id — used for scope validation. Mismatch returns not_found.
- `trace_id` (required): Trace identifier returned by agents.traces_list.
- `include_llm_bodies` (optional): Include system_prompt and prompt_messages in LLM spans. Audited at WARNING level. OMIT to keep them stripped (the default).
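
One hedged sketch of the diagnostic flow described above. No output schema is published for this tool, so the `llm_calls` and `error_message` result fields below are assumptions, not documented names; `call_tool` is the same hypothetical helper used earlier.

```python
# IDs are illustrative; the llm_calls / error_message result keys are
# assumptions, since no output schema is published for this tool.
trace = call_tool("agents_trace_get", {
    "agent_id": 7,
    "trace_id": "tr_abc123",
    "include_llm_bodies": True,   # fetches prompts; audited at WARNING level
})
for span in trace.get("llm_calls", []):
    if span.get("error_message"):
        print(span["error_message"])
```
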
Behavior 5/5

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds beyond that: default stripping of LLM bodies, WARNING audit log when include_llm_bodies is set, truncation behavior, and that completion_text on failed LLM calls is always returned (capped at 8KB). No contradiction.

Conciseness 5/5

Description is extremely concise: one paragraph with clear front-loading of main purpose, then usage guidelines, then parameter details. No wasted words; every sentence adds value.

Completeness 5/5

Despite no output schema, the description lists what the trace contains (tool executions, events, LLM call spans with error_message) and mentions key behavioral details. This is sufficient for an agent to understand the return value and when to use the tool, making it complete.

Parameters 4/5

Schema coverage is 100%, so baseline is 3. Description adds meaningful context: trace_id comes from agents_traces_list, agent_id for scope validation, include_llm_bodies triggers audit log, full is for human operator escape hatch. This elevates beyond schema alone, justifying a 4.

Purpose 5/5

The description clearly states 'Fetch the full execution detail for a single trace — tool executions, events timeline, LLM call spans', with a specific verb and resource. It distinguishes itself from sibling tools like agents_traces_list by specifying that this is for a single trace after identification.

Usage Guidelines 5/5

Explicitly states 'Use after agents.traces_list identifies a specific trace of interest (failed run, slow run, unexpected outcome)'. Provides context for when to set include_llm_bodies and full, offering clear usage guidance and alternatives.

agents_traces_list (A)
Read-only · Idempotent

List recent execution traces for an agent — the same data as /admin/requests, scoped to one agent and readable by an LLM.

Use this when an agent call timed out, drafted the wrong response, or you want to know which tool/LLM call burned the latency. Pair with agents.trace_get for full detail on a specific trace.

Filters: status, success, source (single value or comma-separated: agent,voice), date_from/date_to (ISO-8601), pagination via limit/offset.

Returns returned_count, dropped_on_page (should be 0 — positive means the backend agent_id predicate let something through), and has_more. Edge case: a raw page of all-dedup-dropped rows yields returned_count=0, has_more=true; re-call with offset += limit.
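
A minimal pagination sketch of that edge case, using the same hypothetical `call_tool` helper; the `traces` result key is an assumption, since no output schema is published.

```python
# Keep paging while has_more is true. Per the edge case above, a page can
# come back with returned_count == 0 and has_more == True, so the loop must
# advance the offset instead of stopping on an empty page.
offset, limit = 0, 50
while True:
    page = call_tool("agents_traces_list", {
        "agent_id": 7, "limit": limit, "offset": offset,
    })
    for row in page.get("traces", []):   # result key assumed
        print(row)
    if not page["has_more"]:
        break
    offset += limit                      # advance even on an empty page
```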

Parameters (JSON Schema)

- `limit` (optional): Max rows per page (1–100).
- `offset` (optional): Rows to skip for pagination. OMIT to start at row 0 (default).
- `source` (optional): Filter by trace source. Single value or comma-separated, e.g. 'agent,voice'. Values: agent / auto_reply / agentic / outreach / voice. Note: source='agent' also matches voice traces today (known upstream bug).
- `status` (optional): Filter by status. OMIT to include all statuses.
- `date_to` (optional): ISO-8601 upper bound on created_at.
- `success` (optional): Filter to succeeded (true) or failed (false) runs only. OMIT to include both.
- `agent_id` (required): Agent ID to pull traces for (must belong to your workspace).
- `date_from` (optional): ISO-8601 lower bound on created_at, e.g. '2026-04-10T00:00:00Z'.
Behavior 4/5

Discloses edge case of dropped_on_page (should be 0, positive means backend bug) and pagination behavior (offset increment when has_more true). Annotations already declare readOnlyHint and idempotentHint, and description adds important behavioral nuances beyond those.

Conciseness 5/5

Concise at 6 sentences, front-loaded with purpose and usage, then filters and return fields. Every sentence earns its place with no redundancy.

Completeness 5/5

Despite lack of output schema, description explains return fields and edge case. For a list tool with 8 parameters, it covers purpose, usage, filters, return format, and a specific edge condition. No obvious gaps.

Parameters 3/5

Schema coverage is 100%, so baseline is 3. Description adds minor value (source comma-separated values, known bug with source='agent' matching voice, ISO-8601 format for dates). Not transformative but helpful.

Purpose 5/5

The description clearly states it lists execution traces for an agent, distinguishing from sibling tools like agents_trace_get (for full detail) and agents_traces_stats (presumably stats). It specifies the data source (/admin/requests) and that it's scoped to one agent.

Usage Guidelines 4/5

Provides explicit when-to-use scenarios (timeout, wrong draft, latency analysis) and pairs with agents_trace_get for further detail. Does not explicitly state when not to use, but the use cases are clear and context is strong.

agents_traces_stats (A)
Read-only · Idempotent

Aggregated trace statistics for one agent over the last N days — total runs, success rate, avg duration, error breakdown, top tools used, runs-per-day histogram.

Use this when you want a bird's-eye view of an agent's health before diving into individual traces with agents.traces_list / agents.trace_get. Scoped to the target agent (exact match, no substring bleed). days is capped at 30 — matches the ClickHouse request_traces TTL.

Parameters (JSON Schema)

- `days` (optional): Rolling window in days (1–30).
- `agent_id` (required): Agent ID to compute stats for (must belong to your workspace).
Behavior 4/5

Description adds context beyond annotations (e.g., scoped to exact agent match, days capped at 30 due to TTL), but annotations already cover readOnly and idempotent hints.

Conciseness 5/5

Two sentences, no wasted words; front-loaded with purpose then usage guidance.

Completeness 4/5

With 100% schema coverage and no output schema, the description lists returned metrics adequately, though output format details are omitted.

Parameters 4/5

Schema covers both parameters with descriptions; description adds extra context like the TTL cap and workspace ownership, adding value beyond schema.

Purpose 5/5

The description uses specific verbs ('aggregated trace statistics') and lists concrete metrics (total runs, success rate, etc.), clearly distinguishing it from sibling tools like agents.traces_list and agents.trace_get.

Usage Guidelines 5/5

Explicitly tells when to use this tool ('bird's-eye view before diving into individual traces') and names alternatives (agents.traces_list / agents.trace_get).

agents_trigger_create (A)

Create a new trigger for an AI agent.

Triggers determine when the agent activates.

Trigger types:

  • incoming_message: Activates on new incoming messages

  • schedule: Activates on a schedule

  • webhook: Activates on webhook events

  • event: Activates on system events

Parameters (JSON Schema)

- `enabled` (optional): Whether the trigger is enabled. OMIT to use the default (true).
- `agent_id` (required): ID of the agent to create a trigger for
- `priority` (optional): Trigger priority — lower numbers run first (default: 100)
- `send_mode` (optional): Send mode override for this trigger. OMIT to inherit from the agent.
- `conditions` (optional): Trigger conditions (JSON); see the example payload after this list. Supported fields for incoming_message:
  - keywords: ["pricing","demo"] — message must contain keyword(s) (free, no LLM cost)
  - keyword_match: "any" (default, OR) or "all" (AND)
  - channel_types: ["telegram","whatsapp","livechat_voice","twilio_voice","telegram_voice","voice",...] — filter by channel. For voice, use EITHER the three per-channel keys (scoped) OR "voice" alone (wildcard matching all three) — mixing them is redundant. Per-channel keys: "livechat_voice" (web widget), "twilio_voice" (PSTN inbound), "telegram_voice" (Telegram p2p calls)
  - context_types: ["dm","group","channel","livechat"] — filter by chat type
  - group_mode: "mentions_only" or "questions" — for group chats
  - channel_account_ids: ["123"] — restrict to specific accounts
  - folder_ids: [5,10] — restrict to threads in folders
  - ai_tag_ids: [1,2] — restrict to threads with AI tags
  - ai_filter_ids: [1,2] — semantic intent filters (message matched via embedding similarity, works in noisy groups)
  - ai_filter_mode: "any" (default, OR) or "all" (AND) — how multiple AI filters combine
  - ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API). If a filter with the same name already exists, it is reused by id. Prefer referencing existing filters by id when available. Use ai_filters.create + ai_filters.test for fine-tuning before assigning.
  - contact_states: ["active"] — filter by contact state
  - cooldown_seconds: 30 — min gap between runs per thread
  - max_runs_per_thread_per_hour: 5 — rate limit

  Supported fields for job_completed (proactive callback when a delegated job finishes):
  - source_agent_id: <int> — fire only when this agent's job completed
  - source_agent_slug: <str> — alternate to source_agent_id
  - job_type: "agentic_session" — match a specific job type (default: any)
  - outcome: ["completed"] | ["escalated"] | ["completed","escalated"] — default ["completed"]
  - min_duration_seconds: <int> — skip very-short jobs (noise filter)
  - thread_filter: {thread_ids: [<int>...]} — restrict to specific threads
- `thread_ids` (optional): Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, the trigger only fires for messages in these threads. Maps to conditions.thread_filter.thread_ids.
- `trigger_type` (required): Type of trigger: 'incoming_message', 'incoming_call', 'voice_transcript', 'schedule', 'webhook', 'event', 'blockchain_event', or 'job_completed'
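
Referenced from the conditions item above: a sketch of one plausible incoming_message trigger payload, composed only from fields documented in this list. IDs and keywords are made up, and `call_tool` remains the hypothetical helper from earlier sketches.

```python
# Illustrative trigger: fire on Telegram DMs that mention pricing or demos,
# with a per-thread cooldown and an hourly rate limit.
call_tool("agents_trigger_create", {
    "agent_id": 7,
    "trigger_type": "incoming_message",
    "priority": 50,                        # lower number runs first
    "conditions": {
        "keywords": ["pricing", "demo"],
        "keyword_match": "any",
        "channel_types": ["telegram"],
        "context_types": ["dm"],
        "cooldown_seconds": 30,
        "max_runs_per_thread_per_hour": 5,
    },
})
```
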
Behavior 4/5

The description adds context beyond annotations, detailing trigger types and complex condition fields. Annotations only provide readOnlyHint=false and destructiveHint=false, so the description carries the burden and does so well.

Conciseness 3/5

The description is front-loaded with the purpose, but the conditions section is extensive and verbose. While structured, it could be more concise by referencing external documentation for detailed conditions.

Completeness 4/5

Given the complexity of triggers and nested conditions, the description provides substantial detail without needing an output schema. It explains trigger types and condition fields comprehensively.

Parameters 4/5

Schema coverage is 100%, but the description adds significant value with detailed explanations for the 'conditions' parameter, covering supported fields and their behavior. For other parameters, it repeats schema info but adds clarity.

Purpose 5/5

The description clearly states 'Create a new trigger for an AI agent' and explains trigger types and their activation conditions. It distinguishes from sibling tools like agents_trigger_update and agents_trigger_delete.

Usage Guidelines 3/5

The description lists trigger types and conditions but does not explicitly state when to use this tool vs alternatives like agents_trigger_update. No direct guidance on when not to use it.

agents_trigger_delete (A)

Delete a trigger from an AI agent.

WARNING: This cannot be undone.

Parameters (JSON Schema)

- `agent_id` (required): ID of the agent that owns this trigger
- `trigger_id` (required): ID of the trigger to delete
Behavior 4/5

The description includes a strong warning: 'WARNING: This cannot be undone,' which adds critical behavioral transparency about irreversibility beyond the annotations. However, the annotations set destructiveHint=false, which contradicts the implied destructiveness, slightly reducing clarity.

Conciseness 5/5

The description is extremely concise: two sentences, the first clearly stating the action and the second adding a crucial warning. No extraneous words or redundant information.

Completeness 4/5

For a simple delete operation, the description covers the essential purpose and the irreversible nature. It lacks details about success/failure conditions or side effects, but given the straightforward nature and full schema coverage, it is largely complete.

Parameters 3/5

Parameter schema coverage is 100%, and both parameters (agent_id, trigger_id) have clear descriptions in the schema. The tool description does not add any additional semantic information beyond what the schema already provides, so a baseline score of 3 is appropriate.

Purpose 5/5

The description clearly states the specific action: 'Delete a trigger from an AI agent.' The verb 'delete' and resource 'trigger' are unambiguous, and it distinguishes itself from sibling tools like agents_trigger_create and agents_trigger_update.

Usage Guidelines 2/5

The description provides no guidance on when to use this tool versus alternatives (e.g., other delete tools or update operations). It does not mention prerequisites, such as ownership or permissions, nor does it indicate when deletion is appropriate.

agents_trigger_update (A)

Update an existing AI agent trigger.

All parameters are optional — only provided fields will be updated.

Parameters (JSON Schema)

- `enabled` (optional): Enable or disable this trigger. OMIT to leave the enabled flag unchanged.
- `agent_id` (required): ID of the agent that owns this trigger
- `priority` (optional): Trigger priority — lower numbers run first
- `send_mode` (optional): New send mode override. OMIT to leave the send-mode unchanged.
- `conditions` (optional): New trigger conditions (replaces existing). Same fields as trigger_create: keywords, keyword_match, channel_types, context_types, group_mode, channel_account_ids, folder_ids, ai_tag_ids, ai_filter_ids, ai_filter_mode, ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API); if a filter with the same name already exists, it is reused by id — plus contact_states, cooldown_seconds, max_runs_per_thread_per_hour.
- `thread_ids` (optional): Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, merged into conditions.thread_filter.thread_ids. If conditions is also provided, thread_ids is merged into it.
- `trigger_id` (required): ID of the trigger to update
- `trigger_type` (optional): New trigger type. OMIT to keep the existing type unchanged.
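
One hedged sketch of the thread_ids merge described above; IDs are made up and `call_tool` is the hypothetical helper used throughout.

```python
# Narrow an existing trigger to two threads. Per the thread_ids note, the
# values are merged into conditions.thread_filter.thread_ids server-side;
# all omitted fields stay unchanged.
call_tool("agents_trigger_update", {
    "agent_id": 7,
    "trigger_id": 42,
    "thread_ids": [101, 102],
})
```
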
Behavior 3/5

Annotations indicate readOnlyHint=false, destructiveHint=false, consistent with an update operation. The description adds that parameters are optional, providing some behavioral context, but does not disclose side effects, authorization needs, or other traits beyond what annotations already convey.

Conciseness 5/5

The description is extremely concise with two short sentences. The first sentence states the purpose, and the second provides a critical usage nuance. Every word is purposeful, with no redundancy.

Completeness 4/5

Given the high schema coverage (100%) and presence of annotations, the description is largely sufficient. It covers the partial update behavior, which is key. However, it could briefly mention that the 'conditions' parameter replaces existing conditions (though detailed in schema) for completeness.

Parameters 3/5

The input schema has 100% description coverage, so the baseline is 3. The description adds that all parameters are optional, which is not explicitly stated in each schema description but is implied. It does not add significant new meaning beyond the schema.

Purpose 5/5

The description clearly states the tool updates an existing AI agent trigger, which is a specific verb-resource combination. It distinguishes itself from sibling tools like agents_trigger_create (create new trigger) and agents_trigger_delete (delete trigger) by using 'update' and 'existing'.

Usage Guidelines 3/5

The description mentions that all parameters are optional and only provided fields are updated, giving useful context for partial updates. However, it does not explicitly state when to use this tool versus alternatives (create, delete) or provide any exclusions.

agents_update (A)

Update an existing AI agent's configuration.

All parameters are optional — only provided fields will be updated.

Use this to:

  • Enable or disable an agent

  • Change agent name or description

  • Assign or detach a prompt

  • Change default send mode

  • Replace knowledge collections

  • Update agent status

  • Change agent priority for trigger matching (lower number = higher priority)

  • Override which tools the agent can/can't call on triggered runs

  • Override which context sections (situation, communication style, job state, conversation history, thread summary) the agent receives

  • Opt into boilerplate prompt sections (safety guidelines, data confidentiality, factual accuracy) — all default OFF

Parameters (JSON Schema)

- `name` (optional): New name for the agent
- `model` (optional): Canonical source for which LLM the agent runs on. To switch models pass JUST this — do NOT also rewrite prompt_text (any 'duty model' section in the prompt is stale doc, not the config). OMIT to leave the model unchanged.
- `status` (optional): Agent status: 'active', 'paused', or 'archived'. OMIT to leave the status unchanged.
- `enabled` (optional): Enable or disable the agent. OMIT to leave the enabled flag unchanged.
- `agent_id` (required): ID of the agent to update
- `priority` (optional): Agent priority for trigger matching. LOWER number = HIGHER priority (wins tiebreaks). Typical range 1-100. Fallback auto-reply agents use 10; specialised/topical agents use 100. When two agents match the same incoming message, the one with the lower priority number fires.
- `prompt_id` (optional): Prompt ID to assign (null to detach)
- `send_mode` (optional): Default send mode: 'auto' or 'draft'. OMIT to leave the send-mode unchanged.
- `fast_model` (optional): Model for the fast-path responder (voice, text auto-reply, agent executor). Defaults to claude-haiku-4-5-20251001 when unset. Non-Anthropic models (deepseek-chat, gpt-4.1-nano, kimi-k2.6) do NOT use BYOK today — they use the system API key + credits. Pass null to revert to default.
- `api_surface` (optional): OpenAI HTTPS endpoint for this agent's LLM calls (Phase 3a). 'chat_completions' (default, also when null) routes to /v1/chat/completions. 'responses' routes to /v1/responses — required for OpenAI native server tools (web_search, code_interpreter, image_generation, input_file PDFs). Capability still wins: agents whose tool list triggers the server_tool_responses_api substitution always route to Responses regardless of this setting. Ignored on non-OpenAI models (Anthropic, DeepSeek, Moonshot). OMIT to leave the api_surface unchanged.
- `description` (optional): New description for the agent
- `prompt_text` (optional): DESTRUCTIVE — REPLACES the entire system prompt. Pass ONLY when the user explicitly asks to edit/rewrite the prompt. To READ the prompt use prompts.get. When updating other fields (model, name, …) OMIT this. To append, prompts.get first then concatenate. Pass null to revert to the linked template.
- `voice_tools` (optional): Allow-list of tool IDs usable in voice mode (e.g. ['calls.end']). Empty list [] = explicit no-tools allow-list. Omit leaves unchanged. MCP cannot null-clear — use REST to revert to inherit from agent allowed_tools.
- `denied_tools` (optional): Block-list of tool IDs the agent must not call on triggered runs. Applied after allowed_tools and default visibility. Empty list [] = clear the block-list.
- `allowed_tools` (optional): Explicit allow-list of tool IDs this agent can call on triggered runs (e.g. ['messages.send', 'agents.handoff']). Empty list [] = clear the allow-list and fall back to system defaults. When set, only these tools (minus denied_tools) are exposed to the agent. Does NOT affect the My AI dropdown path.
- `execution_mode` (optional): Execution mode: 'agentic', 'ai_assisted', 'rule_based', 'claude_channels', or 'voice'. OMIT to leave the execution mode unchanged.
- `vision_enabled` (optional): Per-agent opt-in for vision content. When true, the executor splices recent image attachments from the active thread into the LLM call (Phase 3a continuous vision for Meet bot screen-share, plus any future channel that uploads images). Requires the agent's model to support vision (model_has_vision check). Default false; new calls pay zero token cost until the operator opts in. OMIT to leave the vision flag unchanged.
- `voice_greeting` (optional): Opening line the agent speaks when the call connects. Pass an empty string "" to clear. Omit or null leaves unchanged.
- `voice_stt_model` (optional): Speech-to-text model: 'flux' (LLM-powered end-of-turn) or 'nova-3' (silence-based). Flux is more responsive; nova-3 is the fallback when your Deepgram plan lacks Flux. OMIT to leave the STT model unchanged.
- `voice_tts_speed` (optional): TTS playback speed multiplier (0.5-2.0, default 1.0). Yandex/OpenAI/Cartesia only — ignored for Deepgram.
- `voice_tts_voice` (optional): TTS voice id — provider-specific (e.g. 'aura-2-thalia-en' for Deepgram, 'alloy' for OpenAI, 'alena' for Yandex, Cartesia voice UUID). Pass null to revert to provider default.
- `auto_reply_rules` (optional): Plain-English rules injected into the fast model's system prompt as a `## Rules` block. No reserved keywords — the fast model reads them as guidance and decides per turn whether to reply directly or escalate to the main model for tools. Example: '- If the user greets, reply "Hi! How can I help?"\n- If the user asks what you can do, reply with a 1-sentence summary\n- If the question needs live data (prices, stock, booking), escalate'. Engagement filtering (SKIP) belongs in trigger `conditions` (keywords, ai_filters, channel_types, cooldown), NOT here — if a message should be ignored the trigger shouldn't have fired. Pass null to clear.
- `voice_max_tokens` (optional): Max TTS tokens per voice reply (40-200, default 100). Lower = snappier, higher = more detail.
- `include_job_state` (optional): Include current job state (active job context, tasks, notes) in the agent's prompt. OMIT to leave this flag unchanged.
- `include_situation` (optional): Include situation context (channel, sender info, trigger type) in the agent's prompt. OMIT to leave this flag unchanged.
- `voice_stt_keyterms` (optional): Domain-vocab bias for STT — names, product SKUs, etc. Passed verbatim as repeated `&keyterm=<w>` query params. Works on both Nova-3 and Flux. Prefer short phrases over full sentences. Empty list [] = no bias. Omit leaves unchanged.
- `voice_stt_language` (optional): STT language hint. 'multi' (default) enables code-switching; singletons like 'en', 'ru', 'es' give higher accuracy when the caller language is known. Use 'multi' for bilingual callers. OMIT to leave the STT language unchanged.
- `voice_tts_language` (optional): TTS language code, BCP-47 lite e.g. 'en', 'es', 'pt-BR' (Cartesia only, default 'en').
- `voice_tts_provider` (optional): Text-to-speech provider: 'deepgram' (default, Aura-2 EN-only), 'openai' (multilingual), 'yandex' (best Russian), or 'cartesia' (Sonic-3 ultra-low TTFB). OMIT to leave the TTS provider unchanged.
- `voice_primary_model` (optional): Primary LLM for voice turns (e.g. 'gpt-4.1-mini', 'claude-haiku-4-5-20251001'). gpt-4.1-nano is too weak for reliable turn tracking; mini is the recommended floor. Pass null to revert to default.
- `fast_prompt_override` (optional): Full fast-path prompt override. Placeholders substituted via .replace(): {message}, {history}, {rules}, {tools}, {output_contract}. agent.prompt_text is NOT injected into fast_prompt_override — include it yourself if you want it. Pass null to clear.
- `voice_filler_enabled` (optional): Emit 'thinking' filler audio while tools run so the caller hears life on the line (default true). OMIT to leave this flag unchanged.
- `voice_max_tool_calls` (optional): Max tool calls per voice turn (1-10, default 3). OMIT to leave unchanged.
- `voice_thinking_texts` (optional): Pool of phrases spoken while the agent sets up the turn before calling the LLM (e.g. ['Hmm', 'So', 'One sec']). Pre-rendered to PCM at call start; one is picked at random per turn so the agent doesn't repeat the same word. Pass [] to clear. Omit or null leaves unchanged.
- `include_learned_style` (optional): Include learned communication style (per-contact tone, dormancy state) in the agent's prompt. OMIT to leave this flag unchanged.
- `include_thread_summary` (optional): Include condensed summary of older thread messages in the agent's prompt. OMIT to leave this flag unchanged.
- `include_factual_accuracy` (optional): Inject the Factual Accuracy block (~100 tokens, generic anti-hallucination rules) into the system prompt. Default OFF — skip if you write domain-specific accuracy rules in Instructions. Agentic mode only. OMIT to leave this flag unchanged.
- `knowledge_collection_ids` (optional): Replace all knowledge collections with these IDs (empty list = clear all)
- `include_safety_guidelines` (optional): Inject the generic Safety Guidelines block (~80 tokens) into the system prompt. Default OFF — enable only if you don't already write safety rules in your Instructions. Agentic mode only. OMIT to leave this flag unchanged.
- `include_tool_call_history` (optional): Include the agent's own tool calls and results from the last 3 runs on this thread, compacted to IDs + top hits (~200-1000 tokens). Lets the agent recall file IDs, search hits, and decisions it already made across turns. Default ON. Agentic mode only. OMIT to leave this flag unchanged.
- `voice_endpointing_min_delay` (optional): Silence after end-of-utterance before agent replies (0.1-2.0s, default 0.3). Higher = fewer false interrupts; lower = snappier.
- `voice_preemptive_generation` (optional): Speculatively start the LLM on STT partials so the agent begins responding before end-of-utterance. Matches LiveKit stock template. Default true. OMIT to leave this flag unchanged.
- `include_conversation_history` (optional): Include recent messages from this thread (up to 20) in the agent's prompt. OMIT to leave this flag unchanged.
- `include_data_confidentiality` (optional): Inject the Data Confidentiality block (~250 tokens, cross-contact PII isolation + prompt-injection defense) into the system prompt. Recommended for multi-tenant workspaces. Default OFF. Agentic mode only. OMIT to leave this flag unchanged.
- `voice_interruption_min_duration` (optional): Min caller speech duration to interrupt the agent (0.1-1.5s, default 0.25). Higher = ignore short fillers like 'uh-huh'.
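
Given the model parameter's warning above, a minimal partial-update sketch (agent ID illustrative, `call_tool` hypothetical as before):

```python
# Switch only the model: per the note above, pass JUST `model` and do not
# rewrite prompt_text. All omitted fields stay unchanged.
call_tool("agents_update", {
    "agent_id": 7,
    "model": "claude-sonnet-4-6",
})
```
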
Behavior 5/5

Description goes beyond annotations (readOnlyHint=false, destructiveHint=false) by noting 'All parameters are optional' and flagging prompt_text as 'DESTRUCTIVE', plus detailed parameter-specific behavior like model switching nuances.

Conciseness 4/5

Description is front-loaded with purpose, then optionality note, then bulleted use cases. It is slightly lengthy due to complexity but well-organized; could trim some redundancy.

Completeness 4/5

For a tool with 45 parameters, description covers major use cases and relationships (e.g., model/prompt_text interplay). Missing output schema info but annotations provide no output schema, so this is acceptable.

Parameters 4/5

With 100% schema description coverage, the tool description adds value by framing optionality and grouping use cases, though it does not detail each parameter. The bullet list helps agents understand which parameters to use together.

Purpose 5/5

Description clearly states 'Update an existing AI agent's configuration' and lists specific use cases (enable/disable, change name, etc.), distinguishing it from sibling tools like agents_create and agents_get.

Usage Guidelines 4/5

Description explicitly lists 'Use this to:' with bullet points, covering key scenarios. It notes all parameters are optional but does not explicitly exclude when not to use, though context implies for updates only.

ai_filters_create (A)

Create a new AI filter for semantic intent-based message matching.

AI filters use vector embeddings (via Voyage AI) to detect whether an incoming message matches a specific intent or topic. The filter's description is embedded as a reference vector at creation time. When a message arrives, its embedding is compared against this reference using cosine similarity.

The description field is the most important part — it becomes the reference embedding that all incoming messages are compared against. Write it as a clear statement of what kind of messages should match:

  • 'Customer asking about pricing, subscription plans, or billing'

  • 'User reporting a bug, crash, or unexpected behavior in the product'

  • 'Inbound sales lead expressing interest in purchasing or trialing'

The threshold controls sensitivity: 0.5 is a balanced default, lower values (0.3) cast a wider net, higher values (0.8) require closer matches.

Note: This tool calls the Voyage AI embedding API to generate the reference vector.
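
The matching rule is plain cosine similarity against the stored reference vector. A self-contained sketch, with toy 3-d vectors standing in for real Voyage embeddings (which are much wider):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

ref_vec = [0.2, 0.6, 0.1]   # toy embedding of the filter description
msg_vec = [0.1, 0.7, 0.2]   # toy embedding of an incoming message
matched = cosine_similarity(msg_vec, ref_vec) >= 0.5   # threshold check
print(matched)
```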

Parameters (JSON Schema)

- `name` (required): Filter name — a short, human-readable label (max 100 chars)
- `threshold` (optional; default: 0.50): Cosine similarity threshold for a message to be considered a match. Range 0.1–1.0. Lower values (e.g. 0.3) are more permissive and catch more messages. Higher values (e.g. 0.8) require closer semantic similarity.
- `description` (required): Reference text that defines what messages should match this filter. This text is embedded as a vector and used for cosine similarity comparison against all incoming messages. Be specific and descriptive — the quality of this text directly determines filter accuracy. E.g. 'Customer asking about pricing, subscription costs, or billing issues'. Max 500 chars.
Behavior 4/5

The description goes beyond annotations by explaining the use of Voyage AI embedding API, vector embeddings, cosine similarity, and how the description becomes the reference vector. Annotations only indicate it's not read-only and has side effects, but the description adds rich behavioral context.

Conciseness 4/5

The description is multi-paragraph but well-structured. It starts with the purpose, then explains the mechanism, gives examples, explains threshold, and notes the external API call. Each section adds value, though slightly verbose.

Completeness 4/5

Given the complexity of AI filters and no output schema, the description covers the creation process, parameter usage, and external API call. It is sufficient for an agent to understand when and how to use the tool effectively.

Parameters 4/5

Schema coverage is 100% with good descriptions, but the tool description adds significant value: it provides examples for the description field, explains threshold sensitivity with concrete numbers, and emphasizes the critical role of the description in filter accuracy.

Purpose 5/5

The description clearly states 'Create a new AI filter for semantic intent-based message matching' with a specific verb and resource. It distinguishes from sibling tools like ai_filters_delete, ai_filters_update, etc., by focusing on creation.

Usage Guidelines 4/5

The description explicitly explains when to use this tool (to create an AI filter) and provides guidance on how to write the description and set the threshold. It does not directly mention when not to use it or alternatives, but the context makes it clear.

ai_filters_delete (A)
Destructive · Idempotent

Permanently delete an AI filter.

When to use:

  • User wants to remove a filter they no longer need

This action cannot be undone. Any triggers that reference this filter by ID will no longer match it — review and update those triggers after deletion.

Parameters (JSON Schema)

- `filter_id` (required): ID of the filter to delete
Behavior 5/5

Discloses permanence ('cannot be undone') and side effect on triggers ('triggers that reference this filter by ID will no longer match it'). The annotations (destructiveHint=true) align and are supplemented by detailed behavioral context.

Conciseness 5/5

Extremely concise: three short paragraphs with no wasted words. The action is front-loaded, and each sentence adds value.

Completeness 5/5

For a simple delete operation with one parameter and no output schema, the description fully covers purpose, usage, irreversibility, and downstream effects. No gaps remain.

Parameters 3/5

Schema coverage is 100% with one parameter filter_id described as 'ID of the filter to delete.' The description adds no further semantics beyond the schema, meeting the baseline for high coverage.

Purpose 5/5

The description clearly states 'Permanently delete an AI filter,' specifying the action and resource. It distinguishes from sibling tools like ai_filters_create and ai_filters_list.

Usage Guidelines 4/5

Provides a clear 'When to use' section: 'User wants to remove a filter they no longer need.' Lacks explicit when-not-to-use or alternatives, but the context is sufficient.

ai_filters_list (A)
Read-only · Idempotent

List all AI filters for the current workspace.

AI filters are semantic intent-based message filters that use embeddings (vector representations) to detect whether an incoming message matches a specific intent or topic. Unlike keyword filters, they understand meaning: 'I need help with my order' and 'my package hasn't arrived' both match a 'shipping support' filter even without shared keywords.

Each filter stores a reference embedding of its description. When a message arrives, its embedding is compared via cosine similarity against the filter's reference vector. If the similarity exceeds the threshold, the filter matches.

When to use:

  • Check which semantic filters already exist before creating a new one

  • Get filter IDs for use in trigger conditions

  • Review thresholds and active status of existing filters

Returns all filters with id, name, description, threshold, and is_active.

Parameters (JSON Schema)

No parameters

Behavior 5/5

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable context: it explains that AI filters are semantic intent-based using embeddings and cosine similarity, and that the operation returns all filters. This goes beyond the annotations and provides full transparency.

Conciseness 5/5

The description is well-structured: a purpose statement, an explanation of AI filters, usage scenarios, and return details. Every sentence adds value, and it is appropriately concise.

Completeness 5/5

Given no parameters and no output schema, the description provides complete information: what it does, what it returns (fields), and contextual understanding of AI filters. It is fully sufficient for an agent to use the tool correctly.

Parameters 4/5

Input schema has zero parameters (100% coverage), so baseline is 4. The description adds meaning by explaining the nature of AI filters and the output fields, which is sufficient.

Purpose 5/5

The description clearly states 'List all AI filters for the current workspace' and specifies the returned fields (id, name, description, threshold, is_active). This is specific and distinguishes it from sibling tools like ai_filters_create, ai_filters_delete, etc.

Usage Guidelines 4/5

The description includes a 'When to use' section with explicit scenarios: check existing filters before creating, get filter IDs for triggers, review thresholds and active status. While it provides clear usage context, it does not explicitly mention when not to use or compare with alternatives.

ai_filters_test (A)
Read-only · Idempotent

Test a message against an AI filter to check whether it would match.

This tool embeds the provided message using Voyage AI and computes the cosine similarity between the message vector and the filter's stored reference vector. It returns the similarity score, whether the message would match (similarity >= threshold), and the filter's threshold value.

Use this to:

  • Verify a filter works as intended before using it in a trigger

  • Tune the threshold by testing borderline messages

  • Debug why a message did or did not match a filter in production

Returns: {similarity: float, matched: bool, threshold: float}

Note: This tool calls the Voyage AI embedding API to embed the test message.
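
A tuning sketch built on the documented return shape {similarity, matched, threshold}; the filter ID and probe messages are made up, and `call_tool` is the hypothetical helper used in earlier sketches.

```python
# Probe a pricing filter with one expected hit and one expected miss, then
# read the documented similarity/matched/threshold fields to judge fit.
probes = [
    "how much does the pro plan cost?",   # expected to match a pricing filter
    "my package hasn't arrived yet",      # expected to miss it
]
for text in probes:
    r = call_tool("ai_filters_test", {"filter_id": 12, "message": text})
    print(f"{r['similarity']:.2f}  matched={r['matched']}  threshold={r['threshold']}")
```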

Parameters (JSON Schema)

- `message` (required): The message text to test. This is embedded and compared against the filter's reference vector via cosine similarity.
- `filter_id` (required): ID of the filter to test against
Behavior 5/5

The description goes beyond annotations by revealing that the tool calls the Voyage AI embedding API and computes cosine similarity. It confirms the tool is idempotent and safe, consistent with annotations, and provides detailed behavioral context.

Conciseness 5/5

The description is efficient and well-structured: purpose, mechanism, use cases, output format, and a note on API call. No redundant sentences, and every part earns its place.

Completeness 5/5

Despite lacking an output schema, the description explicitly states the return format {similarity, matched, threshold}. It covers behavioral traits, usage guidelines, and parameter roles, making it fully complete for a test tool.

Parameters 3/5

The input schema has 100% coverage with descriptions for both parameters. The description adds marginal value by explaining the role of message in embedding and threshold comparison, but the schema already handles the semantics adequately.

Purpose 5/5

The description clearly states the tool tests a message against an AI filter to check for matches, explaining the embedding and cosine similarity computation. It distinguishes itself from sibling tools like ai_filters_create, ai_filters_delete, etc.

Usage Guidelines 4/5

The description explicitly lists three use cases: verify filter before use, tune threshold, debug production issues. It provides clear context for when to use the tool, though it doesn't explicitly state when not to use it or mention alternatives.

ai_filters_updateAInspect

Update an existing AI filter's name, description, threshold, or active state.

When to use:

  • User wants to rename a filter

  • User wants to refine the filter description to improve match accuracy

  • User wants to adjust the similarity threshold (higher = stricter matching)

  • User wants to enable or disable a filter without deleting it

Provide only the fields you want to change. At least one field is required.

Note: If the description is changed, this tool calls the Voyage AI embedding API to re-generate the reference vector with the new description text.
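
A minimal sketch of a partial update; `call_tool` is a hypothetical stand-in for a real MCP tools/call round-trip, and the filter ID is illustrative:

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Send only the fields being changed. Updating `description` would
# trigger a Voyage AI re-embedding server-side; this call does not.
call_tool("ai_filters_update", {
    "filter_id": 42,      # illustrative ID
    "threshold": 0.8,     # stricter matching; valid range 0.1-1.0
    "is_active": True,
})
```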

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoNew filter name (max 100 chars, optional)
filter_idYesID of the filter to update
is_activeNoEnable (true) or disable (false) the filter. OMIT to leave the active flag unchanged.
thresholdNoNew cosine similarity threshold. Range 0.1–1.0. Optional.
descriptionNoNew reference description text. If changed, the Voyage AI embedding API is called to re-generate the reference vector. Max 500 chars. Optional.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false, but the description adds critical behavior: calling the Voyage AI embedding API to re-generate the reference vector when description changes. This goes beyond annotations and helps the agent anticipate side effects. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three paragraphs: main action, when-to-use list, and a note about re-embedding. It front-loads the purpose and avoids unnecessary detail. Could be slightly more compact, but every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 5 parameters, the description covers the key aspects: what fields can be updated, the effect of changing description (API call), and the threshold range implied by min/max in schema. It provides enough context for an agent to invoke correctly, though it lacks clarification on whether partial updates are always allowed when some fields are omitted.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so baseline is 3. The description adds value by explaining threshold meaning ('higher = stricter matching') and clarifying that changing description triggers an embedding API call. It also reinforces that at least one field must be changed (though schema only marks filter_id as required). This enriches parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates an AI filter's name, description, threshold, or active state. It distinguishes itself from sibling tools like ai_filters_create and ai_filters_delete by focusing on modification. The verb 'Update' and resource 'AI filter' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use the tool (rename, refine description, adjust threshold, enable/disable). It also advises to provide only fields to change and that at least one field is required. However, it does not explicitly mention when not to use it or compare to alternatives like ai_filters_test, but the provided guidance is clear and helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_add_to_threadAInspect

Apply one or more AI tags to a thread (manually).

When to use:

  • User wants to label a conversation with one or more tags

  • User asks to categorize or tag a thread

Provide the thread_id (integer) and an array of tag_ids to apply. If a tag is already applied, it will be updated to is_manual=true.
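
A sketch of a call, again with a hypothetical `call_tool` helper and illustrative IDs:

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Apply two tags to one thread. Tags already on the thread are
# flipped to is_manual=true rather than duplicated.
call_tool("ai_tags_add_to_thread", {
    "thread_id": 1017,
    "tag_ids": [3, 8],  # 1-20 IDs per call
})
```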

ParametersJSON Schema
NameRequiredDescriptionDefault
tag_idsYesArray of tag IDs to apply (1–20 IDs)
thread_idYesID of the thread to tag
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutation (readOnlyHint=false). Description adds key detail: 'If a tag is already applied it will be updated to is_manual=true', which is not obvious from annotations. This enriches behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: one-line title, brief 'When to use' section, parameter instruction, and a single behavioral note. Every sentence adds value; no superfluous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 2-parameter mutation tool with no output schema, the description covers purpose, usage, parameter constraints, and idempotency behavior. Minor omission: no mention of prerequisite (e.g., tags must exist), but overall sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters are covered 100% in the input schema. The description restates them without adding new semantic meaning beyond the schema descriptions. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'apply', the resource 'AI tags to a thread', and adds 'manually' to distinguish from automatic tagging. The purpose is specific and distinct from siblings like 'ai_tags_remove_from_thread'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' scenarios: user wants to label a conversation or categorize/tag a thread. Lacks direct exclusions or alternatives, but the context is clear and useful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_createAInspect

Create a new AI tag (automatic message filter).

AI tags are lightweight classifiers that run on every incoming message. When a message matches the tag's description/criteria, the thread is automatically labelled — so AI agents can cheaply pre-filter threads instead of running full LLM analysis on everything. Good descriptions are the key: they tell the classifier exactly when to apply this tag.

When to use:

  • User wants to auto-classify incoming messages (e.g. bug reports, sales leads, support requests)

  • User wants to reduce AI agent costs by pre-filtering threads by topic or intent

Tips for the description field:

  • Be specific: 'Messages reporting errors, crashes, or unexpected behavior in the product'

  • Include examples of what qualifies and what doesn't

Limit: 20 active personal tags / 50 active team tags.
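
A sketch of a create call whose description follows the tips above (`call_tool` remains a hypothetical helper):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# The description doubles as the classifier prompt, so it states both
# what qualifies and what does not.
call_tool("ai_tags_create", {
    "name": "Bug report",
    "icon": "🐞",
    "color": "red",
    "description": ("Messages reporting errors, crashes, or unexpected "
                    "behavior in the product. Not feature requests or "
                    "general how-to questions."),
})
```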

ParametersJSON Schema
NameRequiredDescriptionDefault
iconNoEmoji icon for the tag (max 10 chars, optional)
nameYesTag name (max 100 chars)
colorNoTailwind color key for the tag badge. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to use the default color.
descriptionNoClassifier prompt: describe exactly when this tag should be applied to a thread. The more specific, the better the auto-classification accuracy. E.g. 'Messages reporting software errors, crashes, or unexpected behavior'. Max 500 chars.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate this is not read-only or destructive; description adds that tags run on every incoming message and auto-label threads when criteria match. It also mentions limits (20/50 active tags), providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a summary, conceptual explanation, usage guidance, tips, and limits. It is front-loaded with the core purpose, and each section adds value. While slightly long, it remains efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity and zero output schema, the description covers purpose, usage, key behavioral details, and parameter guidance. It provides enough context for an agent to decide when to use it and how to fill in the description field correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining that the 'description' parameter is a classifier prompt, with tips for specificity and examples. This enhances understanding beyond the schema's brief descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a clear action verb and resource ('Create a new AI tag'), immediately distinguishing it as a creation tool. It explains the concept of AI tags as lightweight classifiers that auto-label threads, distinguishing it from sibling tools like ai_filters_create which may have different behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use:' section with two explicit scenarios: auto-classifying incoming messages and reducing costs by pre-filtering. While it does not explicitly state when not to use it compared to alternatives, the context is very clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_deleteA
DestructiveIdempotent
Inspect

Delete a personal AI tag. All thread associations are removed automatically.

When to use:

  • User wants to permanently remove a tag they no longer need

This cannot be undone. Threads are NOT deleted — they just lose this tag.

ParametersJSON Schema
NameRequiredDescriptionDefault
tag_idYesID of the tag to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint: true, so the description adds value by explaining that thread associations are removed automatically and threads are not deleted. It also states the action cannot be undone. This provides behavioral context beyond what annotations offer, without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the main action, followed by a clear usage guideline and consequence statements. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple nature of the tool (one parameter, no output schema) and high schema coverage, the description fully explains the operation, when to use it, and its effects. It covers all relevant aspects for an AI agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the parameter tag_id is well-described in the schema. The description does not add extra semantics beyond what the schema already provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it deletes a personal AI tag and automatically removes thread associations. It uses the specific verb 'delete' and resource 'personal AI tag'. It distinguishes from siblings like ai_tags_remove_from_thread which only removes tag from a thread without deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a 'When to use' section specifying the scenario for permanent removal. It clarifies threads are not deleted and the action is irreversible. However, it could explicitly mention alternatives like ai_tags_remove_from_thread for when the goal is to just detach the tag.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_listA
Read-onlyIdempotent
Inspect

List all personal AI tags.

AI tags are automatic message filters: the system runs a lightweight classifier on every incoming message and applies matching tags to threads. This lets AI agents skip expensive full analysis on most messages — they only act on threads that match relevant tags, dramatically cutting LLM costs.

When to use:

  • Check which auto-classification filters exist before creating one

  • Get tag IDs for add_to_thread / remove_from_thread

  • See how many threads each tag currently matches

Returns all tags with thread counts (non-archived, included threads only).
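
A sketch of the chain this enables: list tags to find an ID, then apply it. The result shape (a `tags` list with `id`/`name` fields) is an assumption; the description promises only tags with thread counts.

```python
def call_tool(name: str, arguments: dict) -> dict:
    """Hypothetical MCP client helper; returns canned data so the sketch runs."""
    return {"tags": [{"id": 3, "name": "Bug report", "thread_count": 12}]}

tags = call_tool("ai_tags_list", {})["tags"]
bug_tag = next(t for t in tags if t["name"] == "Bug report")
call_tool("ai_tags_add_to_thread", {"thread_id": 1017,
                                    "tag_ids": [bug_tag["id"]]})
```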

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. The description adds valuable context: returns all tags with thread counts for non-archived included threads only, and explains the underlying auto-classification mechanism. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized: a one-line summary, a paragraph explaining AI tags, a bulleted 'When to use' section, and a note on return format. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no parameters and no output schema. The description fully covers what the tool does, when to use it, and what data it returns (tags with thread counts, non-archived, included threads only). No gaps remain for an agent to understand its behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist in the schema (schema coverage 100%). The description explains what the tool returns (tags with thread counts), which is the relevant semantic information for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'List all personal AI tags' with precise verb and resource, and includes a clear explanation of what AI tags are. Siblings like create/delete/update/add/remove are clearly distinct, making purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage scenarios: checking existing filters before creating, getting IDs for add/remove, and viewing thread counts. This guides the agent on when to use this tool versus related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_remove_from_threadA
DestructiveIdempotent
Inspect

Remove a specific AI tag from a thread.

When to use:

  • User wants to un-label or remove a specific tag from a conversation

  • User wants to correct an incorrectly applied tag

Provide both thread_id and tag_id.

ParametersJSON Schema
NameRequiredDescriptionDefault
tag_idYesID of the tag to remove
thread_idYesID of the thread to remove the tag from
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states the removal action, which aligns with annotations (destructiveHint: true). However, it does not add behavioral context beyond what annotations already provide (e.g., no mention of side effects or permissions).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence plus a bulleted list. No unnecessary words, and the key information is front-loaded. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has two required parameters and no output schema. The description covers purpose and usage context. It is complete enough for a simple removal operation, though it could mention the lack of return value or error cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions already present. The description adds no additional meaning beyond instructing to provide both IDs. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Remove') and resource ('a specific AI tag from a thread'). However, it does not explicitly distinguish from the sibling tool 'ai_tags_add_to_thread', though the name itself provides differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' examples (un-labeling, correcting tags) and instructs to provide both thread_id and tag_id. It does not mention when not to use or alternatives, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_updateAInspect

Update an existing personal AI tag's name, description, icon, color, or active state.

When to use:

  • User wants to rename a tag

  • User wants to change a tag's icon, color, or description

  • User wants to enable or disable a tag

Provide only the fields you want to change. At least one field is required.

ParametersJSON Schema
NameRequiredDescriptionDefault
iconNoNew emoji icon (max 10 chars, optional)
nameNoNew tag name (max 100 chars, optional)
colorNoNew color key. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to leave the color unchanged.
tag_idYesID of the tag to update
is_activeNoEnable (true) or disable (false) the tag. OMIT to leave the active flag unchanged.
descriptionNoNew LLM hint (max 500 chars; empty string clears it, optional)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=false and destructiveHint=false. The description adds context: updates apply only to existing tags, and at least one field is required. No contradictions. It does not detail side effects, but the mutation is clear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences plus a bullet list. It front-loads the purpose, lists when-to-use scenarios, and closes with a usage note. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters (1 required), no output schema, and clear sibling tools, the description covers the essential usage. It lacks details on the return value, but the update context is sufficiently explained for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description adds value by stating 'Provide only the fields you want to change' and 'At least one field is required,' which guides partial updates. Baseline for high coverage is 3, but the added guidance lifts it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Update an existing personal AI tag's name, description, icon, color, or active state.' This clearly identifies the verb (update) and resource (personal AI tag), and distinguishes it from sibling tools like ai_tags_create or ai_tags_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' bullet points (rename, change icon/color/description, enable/disable). It notes partial updates are allowed. It does not explicitly exclude cases or name alternatives, but the sibling tool names imply creation/deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_attach_identityA
Read-onlyIdempotent
Inspect

Switch the page's identity by loading saved cookies + storage. Use only when switching identity mid-page; for first navigation, pass identity_name to browser.open instead.
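
A sketch of the two paths; `call_tool` is a hypothetical helper, and the `url` parameter to browser.open plus the page ID value are assumptions:

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# First navigation: bake the identity in at open time, as advised.
call_tool("browser.open", {"url": "https://example.com",    # url param assumed
                           "identity_name": "work-account"})

# Mid-page switch only: load another identity's cookies + storage
# into a page that is already open.
call_tool("browser_attach_identity", {"page_id": "p1",      # from browser.open
                                      "identity_name": "personal-account"})
```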

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
identity_nameYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations by explaining that it loads saved cookies and storage. However, the readOnlyHint annotation arguably contradicts the act of modifying the page's identity, though the conflict is minor. The description is clear about the non-destructive nature of loading saved state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and followed by usage guideline. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with 2 required params and no output schema. The description covers purpose and usage constraints well, but lacks details on return behavior or potential side effects beyond loading cookies. Still, it is largely complete for its context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0% description coverage, and the tool description does not explain the parameters (page_id, identity_name). identity_name is mentioned in the usage guidance, but no detail is given on its format or purpose. The description fails to compensate for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Switch the page's identity') and the mechanism ('loading saved cookies + storage'). It distinguishes from the sibling tool 'browser_open' by specifying when to use which, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance: 'Use only when switching identity mid-page; for first navigation, pass identity_name to browser.open instead.' This clearly states when to use and when not, with a direct pointer to an alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_clickC
Read-onlyIdempotent
Inspect

Click an element. ref is a CSS selector (e.g., 'button.submit').

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description says 'Click an element' implying a mutating action, but annotations set readOnlyHint=true, which contradicts the tool's purpose. No additional behavioral context is provided beyond this contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with one sentence and an example, containing no unnecessary words. It is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite a simple tool and existing annotations, the description is incomplete: it omits return behavior and side effects (e.g., triggered navigation) and fails to resolve the annotation contradiction, compromising first-attempt use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, placing high burden on description. The description only explains `ref` as a CSS selector, leaving `page_id` undocumented and adding marginal value for one of two parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Click an element' and specifies that `ref` is a CSS selector, which is a specific verb-resource combination. It effectively distinguishes from sibling browser tools like browser_fill or browser_hover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., browser_fill, browser_hover). The description lacks context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_closeA
Read-onlyIdempotent
Inspect

Close a page opened by browser.open.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true, covering safety and idempotency. The description adds that the tool only applies to pages opened by browser.open, but does not elaborate on side effects or error behavior, making additional transparency limited.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that conveys the essential purpose without wasted words. Every word is necessary to specify both the action and the constraint on the page's origin.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, annotations covering behavior), the description is largely adequate. It explains what the tool does and when it's applicable. However, it could briefly mention that closing an invalid page may produce an error or no effect, but completeness is high for this simple action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage for the sole parameter 'page_id'. The description does not explain what page_id represents or how to obtain it (e.g., from the browser.open return value). With no parameter info, the description adds little over the schema's type and required flag, leaving the agent to infer from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool closes a page, specifying it must be one opened by browser.open. This verb+resource combination is specific and distinguishes it from sibling browser tools like browser_navigate_back or browser_tabs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage after browser.open, but provides no explicit guidance on when not to use it (e.g., for pages not opened by browser.open) or alternatives. It lacks exclusions or context about prerequisites like verifying the page exists.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_console_messagesA
Read-onlyIdempotent
Inspect

Return console.log/warn/error events captured since the last drain. Filter by level ('log'|'info'|'warning'|'error'|'debug') and/or pattern (regex). Buffer caps at 500 entries; oldest are dropped first. Set clear=false to peek without draining.
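
A sketch of peeking versus draining (`call_tool` hypothetical, page ID assumed):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Peek at recent errors without emptying the buffer...
call_tool("browser_console_messages", {
    "page_id": "p1",
    "level": "error",
    "pattern": r"Timeout|ECONN",  # regex filter
    "clear": False,               # peek only; entries stay buffered
})

# ...then drain everything once debugging is done (clear presumably
# defaults to draining, per "since the last drain").
call_tool("browser_console_messages", {"page_id": "p1"})
```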

ParametersJSON Schema
NameRequiredDescriptionDefault
clearNo
levelNo
page_idYes
patternNo
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds valuable context: buffer cap of 500 entries, oldest dropped first, and the effect of the clear parameter. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences. The first sentence states the purpose, the second lists filters, and the third adds behavioral details (buffer limit, clear peeking). No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, filtering, and buffer behavior. It does not specify the return format (e.g., structure of events), but for a simple read-only logging tool this is acceptable. Annotations and description together provide sufficient context for correct use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description explains three of four parameters: level (with example values), pattern (regex), and clear (peek vs drain). The page_id parameter is not explicitly described but its purpose is implied from the tool name and context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns console messages (log/warn/error) and mentions filtering. It distinguishes from sibling browser tools like browser_click or browser_snapshot by specifying the console event resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: it returns events since last drain, and advises using clear=false to peek without draining. However, it does not explicitly compare to alternatives, though none exist among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_dragA
Read-onlyIdempotent
Inspect

Drag one element onto another. source_ref is the element to grab; target_ref is where to drop. Both are CSS selectors. Used for slider captchas, kanban, drag-and-drop uploads.
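
A sketch of a kanban-style drag (`call_tool` hypothetical, selectors illustrative):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Drop a card onto the "Done" column; both refs are CSS selectors.
call_tool("browser_drag", {
    "page_id": "p1",
    "source_ref": "#card-42",
    "target_ref": "#column-done",
})
```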

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
source_refYes
target_refYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims a write operation ('Drag one element onto another'), but annotations declare readOnlyHint: true, a direct contradiction. It also omits behavioral details such as side effects or requirements beyond parameter types.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no extraneous words, front-loaded with action and immediately clarifies parameter purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explanation of return behavior since no output schema is provided. Does not cover page_id or potential failure modes. Adequate for a simple drag operation but incomplete given the annotation contradiction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description explains source_ref and target_ref as CSS selectors, adding context beyond the schema's property names. However, it does not explain the page_id parameter, which is required, and schema coverage is 0%.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action 'Drag one element onto another', explains the two CSS selector parameters, and gives explicit use cases like slider captchas, kanban, and drag-and-drop uploads. This distinguishes it from sibling tools like browser_click or browser_hover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear examples of when to use (slider captchas, kanban, drag-and-drop uploads), but does not explicitly mention when not to use or list alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_evaluateA
Read-onlyIdempotent
Inspect

Run JavaScript in the page context and return the result. Use for state not in the a11y tree, captcha iframe inspection, DOM events. Expression can be a value (e.g., 'document.title') or an arrow function ((arg) => ...) — pass arg via the arg parameter. Result is JSON-serialized; non-serializable values become strings. 256KB cap on output.
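
A sketch of both expression forms (`call_tool` hypothetical):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Plain value expression:
call_tool("browser_evaluate", {"page_id": "p1",
                               "expression": "document.title"})

# Arrow function; `arg` is passed as its single argument:
call_tool("browser_evaluate", {
    "page_id": "p1",
    "expression": "(sel) => document.querySelector(sel)?.textContent",
    "arg": "h1",
})
```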

ParametersJSON Schema
NameRequiredDescriptionDefault
argNo
page_idYes
expressionYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds beyond that: expression can be a value or arrow function, result is JSON-serialized with non-serializable becoming strings, and a 256KB output cap. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences that are well-structured: first sentence states purpose and use cases, second adds syntax details and constraints. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters and no output schema, the description covers syntax, serialization behavior, output size limit, and use cases. It provides sufficient context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% (no descriptions in schema). The description explains that 'expression' can be a value or arrow function, and the 'arg' parameter is used to pass arguments to the arrow function. This adds meaning beyond the bare schema types.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Run JavaScript in the page context and return the result.' It specifies use cases (state not in a11y tree, captcha iframe inspection, DOM events), distinguishing it from sibling browser tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides specific scenarios for use ('state not in the a11y tree, captcha iframe inspection, DOM events'), implying when to use. It does not explicitly mention when not to use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_file_uploadA
Read-onlyIdempotent
Inspect

Attach files to an <input type=file> element. Pass either local_paths (absolute host paths) or data (a list of {name, mime, base64} blobs written to /tmp). 25MB cap per file.
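
A sketch of both input modes (`call_tool` hypothetical, paths and IDs illustrative):

```python
import base64

def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Mode 1: absolute paths on the host.
call_tool("browser_file_upload", {
    "page_id": "p1",
    "ref": "input[type=file]",
    "local_paths": ["/home/user/report.pdf"],
})

# Mode 2: inline blobs, written to /tmp server-side (25MB cap per file).
call_tool("browser_file_upload", {
    "page_id": "p1",
    "ref": "input[type=file]",
    "data": [{"name": "note.txt", "mime": "text/plain",
              "base64": base64.b64encode(b"hello").decode()}],
})
```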

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
dataNo
page_idYes
local_pathsNo
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states 'Attach files', implying a mutation, but annotations declare `readOnlyHint=true`. This is an annotation contradiction per the rubric, requiring a score of 1.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no extraneous information. The main action is front-loaded. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the two input modes, temp file handling, and size limit. Missing details on the required `page_id` and `ref` parameters, but otherwise fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It explains `local_paths` and `data` but omits `page_id` and `ref`, which are required. Partial compensation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool attaches files to an <input type=file> and distinguishes the two methods of providing file content. This is specific and separates it from other browser interaction tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use `local_paths` vs `data` but does not explicitly contrast with other browser tools like `browser_fill` or `browser_click`. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fillC
Read-onlyIdempotent
Inspect

Fill an input or textarea with the given value. ref is a CSS selector (e.g., 'input[name=email]').

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
valueYes
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims the tool modifies the page by filling a field, but annotations set readOnlyHint=true, indicating no modifications. This is a contradiction. The description does not disclose any behavioral traits beyond the action itself.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that gets to the point, but it omits important details, making it adequate rather than well-rounded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 required parameters and no output schema, the description is incomplete. It misses details about the 'page_id' parameter and the effect of filling (e.g., whether it triggers events).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains that 'ref' is a CSS selector, adding value beyond the schema. However, it does not explain 'page_id' or 'value' beyond 'given value'. With 0% schema description coverage, the description should compensate more.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Fill') and the target resource ('input or textarea'), and distinguishes from sibling tools like browser_fill_form and browser_type by specifying a single element via CSS selector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (filling a single input/textarea) but does not explicitly state when not to use or compare with alternatives like browser_type or browser_fill_form.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fill_formA
Read-onlyIdempotent
Inspect

Fill multiple form fields in one call. fields is a list of {ref, value} dicts. ref is a CSS selector; value is a string (text) or boolean (checkbox). Saves N round-trips vs calling browser.fill repeatedly.
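
A sketch of a batched fill (`call_tool` hypothetical):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# One round-trip instead of three browser.fill calls; booleans drive checkboxes.
call_tool("browser_fill_form", {
    "page_id": "p1",
    "fields": [
        {"ref": "input[name=email]", "value": "user@example.com"},
        {"ref": "textarea[name=message]", "value": "Hello!"},
        {"ref": "input[name=subscribe]", "value": True},  # checkbox
    ],
})
```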

ParametersJSON Schema
NameRequiredDescriptionDefault
fieldsYes
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description says 'Fill multiple form fields' which is a write operation, but the annotation readOnlyHint=true indicates the tool is read-only. This is a serious contradiction and fails to disclose the behavioral conflict. No additional behavioral details are provided beyond the contradictory annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loads the purpose, and includes essential parameter and benefit information without any unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers input and benefit but does not mention expected return values or error handling. Without an output schema, this omission is notable, though the tool's simplicity partially mitigates it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the structure of the `fields` parameter in detail, specifying that each item is a dict with `ref` (CSS selector) and `value` (string or boolean). This adds significant meaning beyond the schema, which has no descriptions and 0% coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fills multiple form fields in one call, using a specific verb and resource. It distinguishes itself from the sibling browser_fill by highlighting the round-trip savings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage context by comparing to browser.fill, but does not explicitly state when not to use it or list other alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_handle_dialogA
Read-onlyIdempotent
Inspect

Respond to a pending JS dialog (alert/confirm/prompt). Pass accept=true for OK or false for Cancel. For prompt() dialogs also pass prompt_text. Dialogs are queued at page-open time; returns {pending: false} if none is waiting.
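
A sketch of handling a confirm() and a prompt() (`call_tool` hypothetical):

```python
def call_tool(name: str, arguments: dict): ...  # hypothetical MCP client helper

# Accept a pending confirm() with OK:
call_tool("browser_handle_dialog", {"page_id": "p1", "accept": True})

# Answer a prompt() dialog with text:
call_tool("browser_handle_dialog", {"page_id": "p1",
                                    "accept": True,
                                    "prompt_text": "Jane Doe"})
# Returns {pending: false} if no dialog is queued.
```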

ParametersJSON Schema
NameRequiredDescriptionDefault
acceptYes
page_idYes
prompt_textNo
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations, such as dialog queuing and the return value when no dialog is waiting. However, it contradicts the readOnlyHint annotation, as responding to a dialog is a mutation. This contradiction reduces transparency. The description does not disclose potential side effects like page state changes after dismissing the dialog.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each serving a clear purpose: defining the tool, explaining core parameters, and adding behavioral notes. There is no redundancy or filler, making it highly efficient for the agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 parameters, no output schema), the description covers the main behavior and return value. It mentions the queuing mechanism and the case of no waiting dialog. However, it could be improved by mentioning error states or prerequisites (e.g., ensuring a dialog is actually pending). Overall, it is near complete for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, so the description must compensate. It explains two parameters (accept and prompt_text) well, providing clear usage. However, it fails to explain the required page_id parameter, which is crucial for specifying which page's dialog to handle. This omission significantly reduces parameter clarity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: responding to pending JS dialogs. It specifies the dialog types (alert/confirm/prompt), which distinguishes it from other browser tools like clicking or navigating. The verb 'respond' and resource 'pending JS dialog' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives usage hints like 'Pass accept=true for OK or false for Cancel' and explains the prompt_text parameter, but it does not explicitly state when to use this tool versus alternatives (e.g., when a dialog appears). The mention that dialogs are queued at page-open time provides some context, but there is no guidance on when not to use it or how to detect a pending dialog.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_hoverA
Read-onlyIdempotent
Inspect

Hover the mouse over an element (reveals tooltips + hover menus). ref is a CSS selector.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
page_idYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. The description adds that hovering reveals UI elements, which is behavioral context beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, concise and front-loaded. Every word adds value without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the purpose and one parameter, but omits explanation of 'page_id'. For a simple tool, this is adequate but incomplete given the 0% schema coverage. No output schema, so return values are not addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters. It explains 'ref' as a CSS selector but does not explain 'page_id' at all, leaving a gap in understanding the second required parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Hover' and resource 'element', and explains the effect: reveals tooltips and hover menus. This distinguishes it from sibling tools like browser_click or browser_fill.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to reveal tooltips/hover menus) but does not explicitly state when not to use or provide alternatives. It gives clear context but lacks exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_navigate_backA
Read-onlyIdempotent
Inspect

Navigate back in the page's history (browser back button). Returns the new URL + title.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that navigation moves within the page's history and changes the current URL, a behavioral trait not fully captured by annotations. However, it does not disclose edge cases like what happens when there is no history to go back to, so transparency is adequate but not exceptional.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, front-loading the action and the return value. Every word is necessary, with no extraneous details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the primary action and return value. It does not mention prerequisites (e.g., a page must be open with history) or error states (e.g., no history to navigate). For a straightforward tool, this is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (page_id) with 0% description coverage, meaning no textual documentation in the schema or description. The tool description does not elaborate on the parameter's purpose or valid values, leaving the agent to infer from the parameter name alone. This is insufficient for confident usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool navigates back in page history, mimicking the browser back button, and specifies it returns the new URL and title. This is a specific verb+resource pairing that distinguishes it from sibling tools like browser_open or browser_click.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives (e.g., browser_open with a specific URL). However, the purpose is straightforward enough that usage context is implied, earning a baseline score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_network_requestsA
Read-onlyIdempotent
Inspect

List HTTP requests the page made since open or last drain. Optional filters: method (GET/POST/...), url_pattern (regex), status_min (e.g. 400 for errors). Captures up to 200 most recent requests per page.

ParametersJSON Schema
NameRequiredDescriptionDefault
clearNo
methodNo
page_idYes
status_minNo
url_patternNo
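
A hedged sketch of a filtered call; the values are illustrative, and the 'clear' semantics are only presumed from the phrase 'last drain':

    # List failed POST requests to API endpoints on a page.
    args = {
        "page_id": "pg_1",           # hypothetical page ID
        "method": "POST",            # HTTP method filter
        "url_pattern": r"/api/.*",   # regex, per the description
        "status_min": 400,           # errors only
        # "clear": True,             # undocumented; presumably drains the capture buffer
    }
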
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a read-only, idempotent operation. The description adds behavioral details: capacity limit of 200 requests, data recency ('since open or last drain'), and optional filters. It does not explain the 'drain' mechanism but overall improves transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the core purpose, and includes necessary details without extraneous words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity (5 params), the description covers the tool's purpose, filters, and capacity. It lacks explanation of the 'clear' parameter and drain behavior, but overall provides a working understanding for a read-only list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description partially compensates by explaining filters 'method', 'url_pattern', and 'status_min' with examples. However, it omits the 'clear' parameter and does not fully describe all five parameters. The page_id is implied but not detailed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists HTTP requests made by a page, with specific verb 'List' and resource 'HTTP requests'. It distinguishes from sibling browser tools that perform actions like clicking or opening, and adds scope 'since open or last drain'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing network requests but does not explicitly state when to use this tool versus alternatives like browser_console_messages. No exclusions or comparisons are provided, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_openA
Read-onlyIdempotent
Inspect

Open a URL in a remote browser. Optional identity_name attaches the workspace's saved login cookies first.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYes
workspace_idYes
identity_nameNo
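
A minimal sketch; the workspace_id format is not documented, so the values here are purely illustrative:

    # Open a page with the workspace's saved login cookies attached.
    args = {
        "url": "https://example.com/dashboard",
        "workspace_id": "ws_1",        # required but unexplained by the description
        "identity_name": "acme-login", # optional saved-cookie identity
    }
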
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating a safe, non-destructive operation. The description adds behavioral context about attaching login cookies via identity_name, which is not covered by annotations. However, it does not describe other behaviors like whether the browser tab is focused or if a new window is opened.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no irrelevant details. Information is front-loaded: the primary action in the first sentence, optional feature in the second. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 3 parameters with 0% schema coverage, the description leaves workspace_id unexplained. Annotations partially compensate for safety, but the tool's return behavior (e.g., whether it returns a success status or tab info) is absent. For a simple open action, it is minimally adequate but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description should compensate. It mentions 'URL' and 'identity_name' but omits 'workspace_id', which is required. The description adds minimal meaning beyond field names; it does not specify URL format, workspace_id purpose, or identity_name constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Open a URL in a remote browser', specifying the exact action (open) and resource (URL in remote browser). It is distinct from sibling tools like browser_click or browser_fill, which involve interactions within the browser after navigation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description hints at a specific use case with 'optional identity_name attaches the workspace's saved login cookies first', but does not explicitly state when to use this tool versus other browser navigation tools like browser_navigate_back or browser_tabs. No exclusions or alternative tool references are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_press_keyA
Read-onlyIdempotent
Inspect

Press a keyboard key (e.g., 'Enter', 'Tab', 'Escape', 'ArrowDown') or a single character. Optional ref selector focuses an element first.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyYes
refNo
page_idYes
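
A short sketch based on the description's own examples; the selector and IDs are illustrative:

    # Focus a search box, then press Enter.
    args = {
        "page_id": "pg_1",
        "key": "Enter",                 # or 'Tab', 'Escape', 'ArrowDown', or a single character
        "ref": "input[type='search']",  # optional: focuses this element first
    }
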
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds that it presses keys or single characters and optionally focuses an element. No contradiction; additional context is useful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with action and examples. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with no output schema. Description covers core behavior, examples, and optional ref. Annotations cover safety. Complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%. Description explains `key` with examples and `ref` as focusing an element, but `page_id` is not mentioned. Compensates partially, but missing explanation for required page_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (press a keyboard key), gives specific examples ('Enter', 'Tab', etc.), and mentions optional focusing. It distinguishes from siblings like browser_type (which types strings) and browser_click.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly suggests using `ref` to focus an element before pressing, and examples show typical keys. Lacks explicit 'when to use vs alternatives', but context with sibling names helps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_resizeA
Read-onlyIdempotent
Inspect

Resize the page viewport. Useful when a site serves different HTML based on viewport width (mobile vs desktop) or when an anti-bot scores risk by viewport dimensions.

ParametersJSON Schema
NameRequiredDescriptionDefault
widthYes
heightYes
page_idYes
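
A sketch assuming the width and height units are CSS pixels, which the description does not actually state:

    # Emulate a phone-sized viewport to get the mobile HTML variant.
    args = {"page_id": "pg_1", "width": 390, "height": 844}
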
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and destructiveHint=false, implying no persistent data modification. The description adds value by explaining the effect on viewport and its relevance to page rendering, but it does not disclose potential side effects like triggering resize events or affecting page state beyond viewport dimensions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no wasted words. It front-loads the core action and immediately follows with practical use cases.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with three required parameters and no output schema, the description is adequate but incomplete. It lacks parameter details and does not describe the return value (or lack thereof). The use cases add context but do not fully compensate for missing parameter semantics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides no information about the parameters (page_id, width, height) such as units, valid ranges, or defaults. The agent cannot infer input details from the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Resize the page viewport.' It also provides specific use cases (mobile vs desktop HTML, anti-bot detection) that distinguish it from other browser tools. The tool name reinforces the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear context on when to use the tool (responsive design testing, anti-bot evasion) but does not explicitly mention when not to use it or list alternatives. However, the context is sufficient for an agent to infer appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_select_optionA
Read-onlyIdempotent
Inspect

Pick option(s) in a native dropdown. Pass value (matches the option's value attr) OR label (matches its visible text). Lists allowed for multi-select.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
labelNo
valueNo
page_idYes
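
Two illustrative payloads, one per addressing mode the description names (value attribute vs visible label):

    # Single selection by value attribute.
    by_value = {"page_id": "pg_1", "ref": "select#country", "value": "DE"}
    # Multi-select by visible text; a list is allowed per the description.
    by_label = {"page_id": "pg_1", "ref": "select#tags", "label": ["Sales", "Support"]}
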
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds that the tool targets native select dropdowns and supports multi-select via lists, giving useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Pithy three-sentence description with no wasted words. The action is front-loaded ('Pick option(s)') and critical usage details are provided efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is present, and the description does not mention return values, error handling, or prerequisites (e.g., element must be a select). While annotations cover safety, the description lacks complete contextual guidance for a tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, so the description must add meaning. It explains that 'value' matches the option's value attribute and 'label' matches visible text, and that arrays are allowed for multi-select. However, it does not describe 'ref' or 'page_id', leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it picks options in a native <select> dropdown, distinguishing it from other browser interaction tools like browser_click or browser_fill. The description explicitly mentions using 'value' or 'label' to specify options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explains when to use (for select dropdowns) and how to use (value vs label, lists for multi-select). Does not explicitly state when not to use or mention alternatives, but provides sufficient context for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_snapshotA
Read-onlyIdempotent
Inspect

Return a YAML aria_snapshot of the page DOM. Truncated at 32KB. Use the snapshot to find element refs for browser.click / browser.fill.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
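
A sketch of the documented workflow; the YAML parsing step in the middle is implied by the description rather than specified:

    # 1. Take the snapshot (YAML aria_snapshot, truncated at 32KB).
    args = {"page_id": "pg_1"}
    # 2. Read an element ref out of the returned YAML.
    # 3. Pass that ref to browser.click or browser.fill.
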
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that the snapshot is truncated at 32KB, which is important behavioral context. No contradictions; the description complements annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, each serving a purpose: the first defines the output, the second states the size cap, and the third gives usage guidance. No redundancy or filler. Very efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the essential aspects: output format (YAML aria_snapshot), size limit (32KB), and purpose (finding element refs). It does not detail the snapshot structure, but the term 'aria_snapshot' is self-explanatory. Slightly more detail on return structure could improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the schema provides no parameter descriptions. The description does not explain the 'page_id' parameter, assuming prior knowledge. This is a gap; the tool description should at least clarify that page_id identifies the browser page from which to take the snapshot.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a YAML aria_snapshot of the page DOM, truncated at 32KB, and explicitly mentions its use for finding element refs for browser.click and browser.fill. This provides a specific verb and resource, distinguishing it from sibling browser tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: 'Use the snapshot to find element refs for browser.click / browser.fill.' This provides clear context for usage. However, it does not mention when not to use it or provide alternatives, which could improve clarity further.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_tabsA
Read-onlyIdempotent
Inspect

Manage tabs within the same BrowserContext as page_id. action ∈ {list, switch, close, new}. For list, returns all open tab metadata; for new, returns the new tab's page_id.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNo
actionYes
tab_idNo
page_idYes
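
Sketches for the four documented actions; which optional parameter pairs with which action is inferred from the names, not stated:

    list_tabs  = {"page_id": "pg_1", "action": "list"}
    new_tab    = {"page_id": "pg_1", "action": "new", "url": "https://example.com"}  # url: assumed
    switch_tab = {"page_id": "pg_1", "action": "switch", "tab_id": "tab_2"}          # tab_id: assumed
    close_tab  = {"page_id": "pg_1", "action": "close", "tab_id": "tab_2"}           # tab_id: assumed
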
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. The description adds context about BrowserContext association and return behavior for list and new actions, providing useful supplementary information.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences that front-load the purpose and action set. No wasted words, though slightly more structure (e.g., bullet points) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the core functionality and key actions but omits details on return values for switch/close and the role of optional parameters. Lacks an output schema, leaving some gaps for a 4-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description partially explains action and page_id but fails to clarify optional parameters like url (likely for new action) and tab_id (for switch/close). This leaves ambiguity for an AI agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it manages tabs within a BrowserContext, listing specific actions (list, switch, close, new) and their outcomes, which distinguishes it from sibling browser tools like browser_open or browser_close.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use this tool (for tab management actions), anchored to a specific page_id. It doesn't explicitly exclude alternatives, but the action enumeration provides clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_take_screenshotA
Read-onlyIdempotent
Inspect

Capture a PNG screenshot of the page or a specific element. Returns base64-encoded image bytes. Use sparingly — favor browser.snapshot for structured DOM understanding.

ParametersJSON Schema
NameRequiredDescriptionDefault
refNo
page_idYes
full_pageNo
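
Illustrative payloads; the full_page semantics are inferred from the parameter name alone:

    element_shot = {"page_id": "pg_1", "ref": "#pricing-table"}  # PNG of a single element
    page_shot    = {"page_id": "pg_1", "full_page": True}        # presumably the whole scrollable page
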
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. Description adds output format (base64) and usage caution, which is helpful but does not detail potential side effects or permissions beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three succinct sentences: the first captures the action, the second the output format, and the third gives guidance. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description explains return format. Covers purpose and usage guidelines well, but lacks parameter details and error/limitation information. Mostly complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 3 parameters with 0% description coverage. Description only indirectly hints at 'page or specific element' (page_id and ref), but does not explain full_page or ref's format/meaning. Insufficient compensation for missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it captures a screenshot (PNG) of a page or specific element and returns base64-encoded image bytes. Distinguishes from sibling browser.snapshot by mentioning preference for structured DOM understanding.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to use sparingly and favor browser.snapshot for structured DOM understanding, providing clear usage context and alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_typeA
Read-onlyIdempotent
Inspect

Type text into an element with per-keystroke delay (organic). Each character dispatches keydown/keypress/keyup, unlike browser.fill which replaces .value instantly. Use when the page listens to keystroke events or for typing-speed fingerprint checks. delay_ms defaults to 50.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
textYes
page_idYes
delay_msNo
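
A sketch using the one documented default (delay_ms = 50); the other values are illustrative:

    # Type organically, slower than the default 50 ms per keystroke.
    args = {
        "page_id": "pg_1",
        "ref": "input#username",  # target element (CSS selector assumed, as in sibling tools)
        "text": "jane.doe",
        "delay_ms": 120,
    }
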
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the annotation 'readOnlyHint: true' by indicating the tool mutates the element (types text, dispatches events). Per the rubric, such a contradiction caps the score at 1.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four tight sentences, front-loaded with purpose, no wasted words. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and few annotations, the description covers purpose, usage, and key behavior. Lacks return value info but is largely complete for a typing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, requiring compensation. The description adds meaning only for delay_ms (default 50), but does not explain ref, text, or page_id. Partial compensation, thus score 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Type text into an element with per-keystroke delay (organic).' It distinguishes itself from sibling tool browser.fill by contrasting the event dispatch mechanism.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'Use when the page listens to keystroke events or for typing-speed fingerprint checks.' It also contrasts with browser.fill, providing clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_wait_forA
Read-onlyIdempotent
Inspect

Wait for a selector to appear OR a navigation URL to match a glob pattern. Provide ref (selector) OR url_pattern (glob).

ParametersJSON Schema
NameRequiredDescriptionDefault
refNo
page_idYes
timeout_msNo
url_patternNo
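
Two illustrative payloads, one per documented wait condition (provide ref OR url_pattern, not both):

    wait_for_element = {"page_id": "pg_1", "ref": ".results-row", "timeout_ms": 10000}
    wait_for_url     = {"page_id": "pg_1", "url_pattern": "https://example.com/checkout/*"}  # glob
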
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only and nondestructive behavior. Description adds the specific conditions waited for, but does not disclose failure behavior, timeout handling, or return value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, no unnecessary details. Front-loaded with purpose and clear parameter instruction.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and simple parameters, description explains the core functionality well but omits what the tool returns (e.g., boolean, timeout error). Adequate but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. Description explains the mutual exclusivity of 'ref' and 'url_pattern' and implies timeout, but does not elaborate on 'page_id' or the exact format of 'url_pattern' (glob). Adds moderate value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool waits for a selector to appear or a URL pattern match. Distinguishes from sibling browser actions by specifying the wait condition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies that either 'ref' or 'url_pattern' should be provided, but does not give guidance on when to use this tool versus alternatives like browser_click or browser_snapshot. Lacks explicit when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_check_availabilityA
Read-onlyIdempotent
Inspect

Check when you have free time in Google Calendar. Shows busy periods and free slots in a given time range. Useful for finding meeting times or checking schedule conflicts.

ParametersJSON Schema
NameRequiredDescriptionDefault
end_timeNoEnd date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to end of start_time day, or 7 days from now.
start_timeNoStart date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to start of today.
calendar_idNoCalendar ID to check. Defaults to primary calendar. (Default: primary)
working_hours_onlyNoIf true, only show free slots during working hours (9 AM - 6 PM). OMIT to show all free time (the default).
min_duration_minutesNoMinimum duration in minutes for free slots. Filters out short gaps. Default: 30 minutes.
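
A sketch built from the schema's documented defaults; the date is illustrative:

    # Find free slots of 45+ minutes during working hours on one day.
    args = {
        "start_time": "2025-01-06",   # YYYY-MM-DD or ISO 8601
        "working_hours_only": True,   # restrict to 9 AM - 6 PM
        "min_duration_minutes": 45,   # filter out shorter gaps (default 30)
    }
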
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description's safety profile is covered. The description adds limited behavioral context beyond the annotations, such as showing 'busy periods and free slots,' but does not disclose pagination or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise, front-loaded sentences with no wasted words. Every sentence serves the purpose of explaining the tool's functionality and usefulness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool, comprehensive schema with defaults, and read-only annotations, the description adequately covers what the agent needs to know. No output schema is present, but the return value is implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with detailed parameter descriptions, setting a baseline of 3. The tool description does not add additional semantic value beyond the schema, merely referencing 'time range' which is already defined.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it checks free time in Google Calendar, showing busy periods and free slots. It distinguishes from sibling tools like calendar_list_events by focusing on availability rather than listing events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly says 'Useful for finding meeting times or checking schedule conflicts,' providing clear context for when to use. However, it does not mention when not to use or alternatives, missing explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_create_eventA
Inspect

Create a new event in Google Calendar. Specify the title, start time, end time, and optionally invite attendees. Use ISO 8601 format for dates (e.g., 2024-12-15T14:00:00).

ParametersJSON Schema
NameRequiredDescriptionDefault
endNoEvent end time in ISO 8601 format. If not provided, defaults to 1 hour after start. Also accepts 'end_time' as alias.
startNoEvent start time in ISO 8601 format (e.g., 2024-12-15T14:00:00). Also accepts 'start_time' as alias.
titleNoAlias for summary - event title.
summaryNoEvent title/summary. Required. Also accepts 'title' as alias.
end_timeNoAlias for end - event end time.
locationNoEvent location (physical address or virtual meeting link).
timezoneNoTimezone for the event (e.g., 'America/New_York', 'UTC').
attendeesNoList of attendee email addresses to invite.
start_timeNoAlias for start - event start time in ISO 8601 format.
calendar_idNoCalendar ID to create event in. Defaults to primary calendar. (Default: primary)
descriptionNoEvent description/notes.
add_google_meetNoIf true, automatically creates a Google Meet link for the event. OMIT to skip Meet link.
conference_dataNoConference data for Google Meet. Alternative to add_google_meet flag.
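
A sketch using the documented fields and defaults; the names and times are illustrative:

    # 'end' omitted: defaults to one hour after start, per the schema.
    args = {
        "summary": "Design review",        # 'title' is an accepted alias
        "start": "2024-12-15T14:00:00",    # ISO 8601
        "timezone": "America/New_York",
        "attendees": ["ana@example.com"],
        "add_google_meet": True,           # auto-create a Meet link
    }
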
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-readonly and non-destructive behavior, so the description's 'Create' statement is consistent. Default behavior (a 1-hour duration when end is not provided) and alias handling ('title' for 'summary') surface in the schema rather than the description, which does not disclose other traits like permission requirements or response format. With annotations present, this is adequate but not exceptional.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with just three sentences. The first states the core action, the second the key fields, and the third provides the important date-format detail. No wasted words, and the structure is front-loaded with the most critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, aliases, nested objects), the description covers the essential points but omits details like the calendar_id default, conference_data usage, and alias explanations. However, the schema fully documents all parameters, so the description effectively complements it. It is nearly complete for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds meaning by specifying ISO 8601 format for dates and mentioning optional attendee invitations. However, the schema already documents each parameter's description and aliases, so the description does not significantly enhance understanding beyond highlighting key parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new event in Google Calendar' with a specific verb and resource. It distinguishes from sibling tools like calendar_update_event and calendar_delete_event by focusing on creation. The mention of key parameters (title, start, end, attendees) further clarifies the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool (creating events) and includes format guidelines (ISO 8601). However, it does not explicitly state when not to use it or mention alternatives like calendar_update_event for modifications. The sibling tools are distinct enough to avoid confusion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_delete_eventA
DestructiveIdempotent
Inspect

Delete an event from Google Calendar. This action cannot be undone. Use with caution.

ParametersJSON Schema
NameRequiredDescriptionDefault
event_idYesID of the event to delete. Required.
calendar_idNoCalendar ID containing the event. Defaults to primary. (Default: primary)
send_notificationsNoWhether to send cancellation notifications to attendees.
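
A sketch; the event_id value is hypothetical:

    # Irreversible delete; notify attendees of the cancellation.
    args = {"event_id": "evt_123", "send_notifications": True}
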
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true and readOnlyHint=false. The description adds that the action 'cannot be undone', which provides additional context beyond the annotations. However, it does not detail other behavioral traits like authorization needs or rate limits, keeping the score moderate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with three short sentences: one stating the purpose and two adding warnings. Every sentence adds value with no unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete operation with no output schema, the description effectively conveys the essential behavior (deletion, irreversibility) and caution. It is nearly complete, though it could briefly mention the send_notifications parameter from the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100%, so the schema already documents all parameters. The description does not add any additional meaning beyond what the schema provides, resulting in the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and the resource ('an event from Google Calendar'), distinguishing it from sibling tools like calendar_create_event or calendar_update_event.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use with caution' but does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention prerequisites or exclusion criteria. It lacks the when-to-use and when-not-to-use information expected for a high score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_list_eventsA
Read-onlyIdempotent
Inspect

List events from Google Calendar. Shows upcoming events by default. Can filter by date range and search query.

ParametersJSON Schema
NameRequiredDescriptionDefault
queryNoFree text search query to filter events.
date_toNoEnd date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to 7 days from now. Alias: time_max.
date_fromNoStart date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to now. Alias: time_min.
calendar_idNoCalendar ID to list events from. Defaults to primary calendar. (Default: primary)
max_resultsNoMaximum number of events to return.
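
A sketch using the documented filters; the dates are illustrative, and the aliases time_min/time_max would also work per the schema:

    args = {
        "query": "standup",          # free-text filter
        "date_from": "2025-01-06",   # defaults to now when omitted
        "date_to": "2025-01-10",     # defaults to 7 days from now when omitted
        "max_results": 20,
    }
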
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds beyond that by specifying default timeline (upcoming), date range filtering, and search query support. This provides useful behavioral context without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-loading the purpose, default behavior, and key capabilities. No filler or redundant information. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with no output schema, the description covers default behavior and filters. Missing explicit mention of pagination (though max_results param exists) or ordering, but still fairly complete. No major gaps given annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all 5 parameters. The description mentions date range and search query, aligning with params, but does not add new meaning beyond the schema. Baseline 3 is appropriate since schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List', the resource 'events from Google Calendar', and the scope: shows upcoming by default, with filtering by date range and search query. This distinguishes it from sibling tools like calendar_create_event and calendar_delete_event.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on default behavior and filtering options but does not explicitly state when not to use this tool or mention alternatives like calendar_check_availability for availability checks. Some implied usage, but no explicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_update_eventA
Inspect

Update an existing event in Google Calendar. Can modify title, time, location, description, and attendees. Only specified fields will be updated.

ParametersJSON Schema
NameRequiredDescriptionDefault
endNoNew end time in ISO 8601 format. Optional.
startNoNew start time in ISO 8601 format. Optional.
summaryNoNew event title/summary. Optional.
event_idYesID of the event to update. Required.
locationNoNew event location. Optional.
attendeesNoNew list of attendee emails. Replaces existing attendees.
calendar_idNoCalendar ID containing the event. Defaults to primary. (Default: primary)
descriptionNoNew event description. Optional.
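
A sketch of a partial update; note the schema's warning that attendees replaces the existing list:

    args = {
        "event_id": "evt_123",                              # hypothetical ID
        "location": "Room 4B",                              # only specified fields change
        "attendees": ["ana@example.com", "bo@example.com"], # replaces, not appends
    }
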
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description notes modifiable fields and partial update, consistent with annotations (readOnlyHint=false). Does not disclose side effects like attendee replacement (covered in schema) or return format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with action and resource, no wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description covers core functionality but lacks details on error handling, permissions, or usage of time parameters. Schema fills gaps for parameter descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%. Description maps 'title, time, location, description, attendees' to schema fields but adds no meaning beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool updates an existing Google Calendar event and lists modifiable fields (title, time, location, description, attendees). Distinguishes from sibling tools like create, delete, and list events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States 'Only specified fields will be updated,' implying partial update behavior. Does not explicitly compare to alternatives, but context makes it clear this is for modifications.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_get_transcript (A)
Read-only · Idempotent

Get the structured transcript and final state of a voice call by call_id. Returns per-turn rows in chronological order, call status (active/completed/failed/abandoned), duration, and an outcome field telling whether the recipient picked up (answered/no_answer/busy/declined/failed/unknown). answered_at is non-null once the recipient picked up. Returns active turns if the call is still in progress.

Parameters (JSON Schema)
- call_id (required): Call ID returned by calls.make in _meta.call_id.
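Since there is no output schema, a client has to rely on the fields the description names (status, outcome, answered_at, per-turn rows). A sketch of branching on the outcome; the exact response shape and the call_tool helper are assumptions, not part of the server contract:

```python
# Stub standing in for a real MCP client; the canned response mirrors
# the fields the description lists (the shape is an assumption).
def call_tool(name: str, arguments: dict) -> dict:
    return {
        "status": "completed",
        "outcome": "answered",
        "answered_at": "2025-03-14T10:01:05Z",
        "duration": 184,
        "turns": [
            {"speaker": "agent", "text": "Hi, do you have a moment?"},
            {"speaker": "callee", "text": "Sure, go ahead."},
        ],
    }

result = call_tool("calls_get_transcript", {"call_id": "call_123"})
if result["outcome"] == "answered":
    for turn in result["turns"]:      # per-turn rows, chronological order
        print(f"{turn['speaker']}: {turn['text']}")
else:
    print("No conversation:", result["outcome"])
```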
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as read-only, idempotent, and non-destructive. The description adds valuable behavior: returns structured rows in chronological order, specific fields like outcome and answered_at, and handles active calls. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no waste. The first sentence front-loads the purpose, and subsequent sentences add concise details. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no output schema, the description covers the return structure thoroughly: rows, status, duration, outcome, answered_at, and handling of active calls. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema fully describes the single parameter call_id with source info. The description does not add additional parameter semantics beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the purpose: retrieving the structured transcript and final state of a voice call by call_id. It specifies the returned fields (per-turn rows, call status, duration, outcome, answered_at), which distinguishes it from sibling tools like calls_list_active or calls_list_history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for getting transcript/state given a call_id, and notes it returns active turns for ongoing calls. However, it does not explicitly state when to use this tool versus alternatives or provide exclusion criteria. The context is clear but lacks explicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_hangup (A)
Read-only · Idempotent

Hang up an active voice call by call_id. Use after calls.make when the agent decides to terminate before the callee does, or to abort a stuck call. Idempotent: returns success if the call is already terminal.

Parameters (JSON Schema)
- call_id (required): Call ID returned by calls.make in _meta.call_id.
- reason (optional): Short internal reason for ending the call (e.g. 'campaign timeout'). Stored on voice_sessions.metadata.
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description contradicts annotations: readOnlyHint=true but tool mutates state; destructiveHint=false but hanging up is destructive. This is a serious inconsistency, warranting a score of 1 per evaluation rules.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-load the core action and context. Every sentence is meaningful with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Describes usage and idempotency, but lacks return value details (e.g., success response format). Given no output schema, the description could do more to specify what 'returns success' means.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions (100% coverage). Description adds no extra parameter meaning beyond schema, meeting baseline but not exceeding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Hang up an active voice call by call_id.' Verb and resource are specific. Distinct from sibling tools like calls_make and calls_wait.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly recommends use after calls.make for agent-initiated termination or aborting stuck calls. Also notes idempotency, implying safe reuse. Does not explicitly state when to avoid, but scenarios are clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_active (A)
Read-only · Idempotent

List active voice calls in this workspace. Use before calls.make on a Telegram account (only one MTProto call per account at a time) to check whether the line is free.

Parameters (JSON Schema)
- channel (optional): Filter by voice channel. OMIT to include both telegram and twilio.
- channel_account_id (optional): Filter by channel_account.id (the calling Telegram account or Twilio number). Combine with channel for a per-line busy check.
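The per-line busy check the description recommends can be scripted directly: filter by channel and account, and only place the call when nothing is active. call_tool and the account ID below are illustrative stand-ins:

```python
# Stub MCP client returning a canned empty list (no active calls).
def call_tool(name: str, arguments: dict) -> list:
    return []

active = call_tool("calls_list_active", {
    "channel": "telegram",
    "channel_account_id": "acct_42",   # hypothetical Telegram account
})
if active:
    print("Line busy: one MTProto call per account at a time.")
else:
    print("Line free: safe to invoke calls_make.")
```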
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description's behavioral detail about the MTProto call limit adds value beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The primary purpose is front-loaded, and the second sentence provides a critical usage hint. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with good annotations and schema, the description is complete: it explains what it does, when to use it, and how parameters relate to the use case. No output schema is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with clear descriptions. The tool description enhances this by explaining the practical use of parameters (e.g., 'Combine with channel for a per-line busy check'), adding context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List active voice calls in this workspace' with a specific verb (list) and resource (active calls). It distinguishes from siblings like calls_list_history and calls_make by providing a usage context: checking if a Telegram line is free before making a call.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends using this tool before calls_make to check line availability, citing the constraint of one MTProto call per account. It provides clear context but does not explicitly exclude other scenarios or name alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_history (A)
Read-only · Idempotent

Search historical voice calls in this workspace by participant name, contact_id, thread, channel, source, and/or date range. Returns one row per call (NOT per turn) with call_id, duration_seconds, outcome, direction, started_at, source, channel_label, and parent_thread_id (the originating chat thread for Telegram-group / Twilio-outbound / Meet calls). Pair with calls.get_transcript(call_id) for the full per-turn transcript. Use this instead of messages.read_history for cross-thread call queries — group calls and Meet sessions live on per-call sub-threads, not on the parent chat thread.

Parameters (JSON Schema)
- limit (optional): Maximum calls to return (default 20, max 100).
- since (optional): ISO date or datetime lower bound (inclusive). Default: 90 days ago. Naive timestamps are interpreted as UTC.
- until (optional): ISO date or datetime upper bound (inclusive). Default: now.
- source (optional): Filter by voice_sessions.source: 'telegram' (1:1 + group), 'twilio' (PSTN), 'meet' (Google Meet bot), 'livechat' (in-app voice). OMIT to include all sources.
- channel (optional): Filter by message-level channel of the call thread: 'telegram' (1:1 voice or group call sub-thread), 'twilio_voice', 'meet_voice', 'livechat_voice'. OMIT to include all voice channels.
- thread_id (optional): Restrict to calls on this thread OR with this thread as their originating parent (Telegram group → call sub-thread back-link, Twilio outbound source_thread_id back-link).
- contact_id (optional): Filter by exact entity_id (from contacts.find). Mutually exclusive with participant_name when both target the same person.
- participant_name (optional): Filter to calls whose parent thread has a participant matching this name (substring match against entity.title). Resolves group calls via the parent group's roster, not the per-call thread's speaker list.
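A sketch of the pairing the description suggests: filter the call history, then fetch the per-turn transcript for one hit. The row fields follow the description; call_tool and its canned data are illustrative:

```python
# Stub MCP client with canned rows; a real client issues tools/call.
def call_tool(name: str, arguments: dict):
    if name == "calls_list_history":
        return [{"call_id": "call_9", "outcome": "answered",
                 "duration_seconds": 212, "direction": "outbound"}]
    return {"status": "completed", "turns": []}

calls = call_tool("calls_list_history", {
    "source": "twilio",
    "participant_name": "Alice",   # substring match on entity.title
    "since": "2025-03-07",         # naive timestamps are read as UTC
    "limit": 20,
})
answered = [c for c in calls if c["outcome"] == "answered"]
if answered:
    transcript = call_tool("calls_get_transcript",
                           {"call_id": answered[0]["call_id"]})
    print(transcript["status"])
```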
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. Description adds that it returns one row per call (not per turn) and lists returned fields, which enhances transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four focused sentences with no filler. Front-loaded with action and filters. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the description lists the returned fields. For a list tool with 8 parameters, it provides sufficient context. Could mention the limit default, but the schema covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% parameter description coverage, so baseline is 3. The description adds value by explaining return structure and clarifying special parameters like thread_id and participant_name, earning a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches historical voice calls with specific filter parameters, and distinguishes from sibling tools like messages.read_history and calls.get_transcript.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises when to use this tool over alternatives: 'Use this instead of messages.read_history for cross-thread call queries' and pairs it with calls.get_transcript for per-turn transcripts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_make (A)

Place an outbound AUDIO/VOICE phone call via Twilio (PSTN) or Telegram (MTProto 1:1 call). Use this any time the user asks to 'call', 'ring', 'phone', 'dial', or have a spoken conversation. Do NOT use messages.send when the user asks to call someone — a call is real-time voice, not a text message. You conduct the conversation as the voice agent using the provided greeting and instructions.

Parameters (JSON Schema)
- greeting (required): The first sentence the agent speaks immediately when the call connects. ALWAYS provide a greeting — without it the caller hears silence. Keep it short and natural. Example: 'Hi, this is Diana calling from DialogBrain. Do you have a moment to chat?'
- channel (optional, default: twilio): Voice transport: 'twilio' (phone via PSTN — requires phone_number in E.164) or 'telegram' (MTProto 1:1 call — requires telegram_user_id, NOT a phone number or thread_id).
- phone_number (optional): Destination phone number in E.164 format (e.g., '+15551234567', '+66812345678'). Required when channel='twilio'.
- telegram_user_id (optional): Destination Telegram user ID (decimal int64 as string, e.g. '123456789'). Required when channel='telegram'. The caller account must have had prior interaction with this user — a cold contact cannot be reached via voice.
- instructions (optional): What to do during the call — objective, questions, tone. The AI generates a natural opening and guides the conversation. Example: 'Call about invoice #1234. Ask if they received it and when payment is expected. Be friendly and professional.'
- report_back (optional, default: on_answer): When to re-invoke you after the call ends. 'on_answer' = only if the call was answered, 'always' = even on missed/failed calls, 'never' = fire and forget. Transcript is always stored regardless of this setting.
- voice_agent_id (optional): Override: specific voice agent to conduct the call. If omitted, uses the workspace's default voice agent. Must be an agent with execution_mode='voice'.
- channel_account_id (optional): Specific calling channel_account ID. For channel='twilio' this is the Twilio number; for channel='telegram' this is the connected Telegram account. If omitted, auto-selects the first active account of the matching channel.
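The channel parameter pairs with different destination fields, which is easy to get wrong. Two hypothetical argument sets, one per transport (numbers and IDs are made up beyond the description's own examples):

```python
import json

# channel='twilio' pairs with phone_number (E.164);
# channel='telegram' pairs with telegram_user_id instead.
pstn_args = {
    "channel": "twilio",
    "phone_number": "+15551234567",
    "greeting": "Hi, this is Diana calling from DialogBrain. "
                "Do you have a moment to chat?",
    "instructions": "Call about invoice #1234. Ask if they received it "
                    "and when payment is expected.",
    "report_back": "always",   # re-invoke even on missed/failed calls
}
telegram_args = {
    "channel": "telegram",
    "telegram_user_id": "123456789",   # decimal int64 as a string
    "greeting": "Hi! Quick voice check-in, is now a good time?",
}

for args in (pstn_args, telegram_args):
    print(json.dumps({"name": "calls_make", "arguments": args}, indent=2))
```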
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false, destructiveHint=false) indicate mutation but no destruction. Description adds that the call is outbound, uses specific transports, and the agent conducts a real-time conversation. It doesn't fully disclose potential costs or rate limits, but it provides sufficient behavioral context beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no wasted words. Front-loaded with the core purpose, immediately followed by usage guidance. Each sentence serves a distinct function: defining the tool, when to use it, when not to, and how the agent conducts the call. Ideal conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 parameters, no output schema, but 100% schema coverage, the description fully covers purpose, usage, and parameter semantics. It addresses ambiguity points (channel differences, greeting necessity) and provides situational context (e.g., 'cold contact cannot be reached via voice'). Complete for agent decision-making and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. However, the description adds significant meaning: explains the channel enum (twilio/telegram), emphasizes greeting is mandatory, details report_back options, instructions purpose, phone_number format (E.164), telegram_user_id constraints, voice_agent_id override, and channel_account_id selection. Every parameter is elaborated well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Place an outbound AUDIO/VOICE phone call via Twilio or Telegram' and specifies the triggers ('call', 'ring', 'phone', 'dial', or a spoken conversation). It distinguishes itself from the sibling tool 'messages.send' by emphasizing real-time voice vs. text. This strongly differentiates it from other communication tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (user asks to call, ring, etc.) and when not to use ('Do NOT use messages.send when the user asks to call someone'). Provides clear context that it is for real-time voice conversation, not text messaging. No ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_meet (A)
Read-only · Idempotent

Dispatch a workspace AI agent into an active Google Meet call. The agent joins as a participant — it can hear the conversation, respond via TTS, see the shared screen (when vision is enabled on the agent), and answer questions about what's on screen. Use when the operator wants to delegate live meeting attendance to an agent (notes, Q&A, summarization, real-time support). The Meet URL must be in canonical 3-4-3 form, e.g. https://meet.google.com/abc-defg-hij. Lookup-redirect URLs are not supported — operator must use the share-link form.

Parameters (JSON Schema)
- agent_id (required): ID of a voice agent (execution_mode=voice, enabled) in this workspace. Get it from agents.list.
- meet_url (required): Canonical Google Meet URL — must match https://meet.google.com/<3 letters>-<4 letters>-<3 letters>, e.g. https://meet.google.com/abc-defg-hij. lookup/ redirects are NOT supported.
- vision_mode (optional, default: off): Screen-share capture mode. 'off' = no vision, 'on_demand' = the agent can call the vision_query tool for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and the executor splices the latest scene-change into each agent turn as ambient low-detail context.
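A client can pre-validate the canonical 3-4-3 form before dispatching, since lookup-redirect URLs are rejected. A sketch assuming lowercase letters, as in the description's example:

```python
import re

# Canonical share-link form: three, four, then three letters.
MEET_RE = re.compile(r"^https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}$")

for url in ("https://meet.google.com/abc-defg-hij",
            "https://meet.google.com/lookup/abc123"):
    verdict = "ok" if MEET_RE.match(url) else "not a share link"
    print(url, "->", verdict)
```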
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description richly details agent behavior (hearing, TTS, screen sharing, answering questions) and vision_mode capabilities, adding significant context beyond the annotations. Annotations already mark the tool as readOnly and idempotent, but the description clarifies the behavioral impact on the meeting, which is valuable for agent decision-making.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action, then follows with capabilities, usage context, and a specific URL constraint, all in five tight sentences without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description could briefly mention what the tool returns (e.g., join status or participant ID). However, it covers the essential aspects: action, when to use, URL constraints, and agent capabilities, leaving minimal gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful constraints for meet_url (exact format and unsupported redirects) that are not fully captured in the schema's description. For agent_id and vision_mode, the description mostly echoes schema info, but the URL clarification justifies a score above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb-resource pair 'Dispatch a workspace AI agent' and clearly distinguishes this tool from siblings like calls_make or calls_send_to_telegram_call by focusing on joining a live Google Meet with agent capabilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('when the operator wants to delegate live meeting attendance') and provides critical URL format constraints ('must be in canonical 3-4-3 form'). However, it does not explicitly state when not to use it or name alternatives, though the sibling list provides implicit differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_telegram_call (A)
Read-only · Idempotent

Dispatch a workspace AI agent into an active Telegram group call (t.me/call/ link). The agent joins as a participant via the workspace's Telegram account — it can hear the conversation, respond via TTS, see shared screens (when vision is enabled), and answer questions about what's on screen. Use when the operator wants to delegate live group-call attendance to an agent (notes, Q&A, summarization, real-time support). Pass either the full https://t.me/call/ URL or the bare slug token.

Parameters (JSON Schema)
- agent_id (required): ID of a voice agent (execution_mode=voice, enabled) in this workspace. Get it from agents.list.
- telegram_call_url (required): Telegram group-call invite — either the full URL (https://t.me/call/<slug>) or just the slug token. Slug is 12-64 chars from [A-Za-z0-9_-].
- vision_mode (optional, default: off): Screen-share capture mode. 'off' = no vision, 'on_demand' = the agent can call vision_query for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and splices the latest scene-change into each agent turn.
- channel_account_id (optional): Workspace Telegram channel account ID that joins as the bot. When the workspace has exactly one Telegram account, it's used by default. Required when multiple Telegram accounts exist.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes agent behavior (joins as participant, hears, responds via TTS, sees screens, answers questions). Annotations (readOnlyHint=true, etc.) are consistent with the description, which adds context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with main action, no redundant information. Could be slightly more structured but is efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description explains agent actions but omits what the tool returns (e.g., success status), error conditions (e.g., if call not active), or timeouts. Additional details would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage; description adds general context (e.g., agent joins via workspace account) but does not significantly enhance parameter meanings beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool dispatches a workspace AI agent into an active Telegram group call, specifying capabilities (hear, TTS, screen sharing). It distinguishes from sibling call tools by targeting Telegram group calls specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('when the operator wants to delegate live group-call attendance'), but does not mention when not to use or provide alternative tools for other call platforms.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_wait (A)
Read-only · Idempotent

Block until a voice call ends (status changes from 'active') or timeout elapses. Returns ended=true with final state when the call has ended; ended=false on timeout (re-issue to keep waiting). The returned state includes outcome so callers can branch on pickup vs. no-answer (answered/no_answer/busy/declined/failed/unknown). Default timeout 90s; cap 110s — bounded by nginx proxy_read_timeout 120s on /mcp.

Parameters (JSON Schema)
- call_id (required): Call ID returned by calls.make in _meta.call_id.
- timeout_seconds (optional, default: 90): Max seconds to wait. Cap 110 (bounded below nginx 120s proxy_read_timeout). On expiry returns ended=false with status='active' so the caller can re-issue to keep waiting.
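The ended=false-on-timeout contract makes the re-issue loop mechanical: keep calling until ended is true, with each wait capped below the proxy limit. call_tool and its canned responses are stand-ins for a real MCP client:

```python
# Stub MCP client: first wait times out, second returns the final state.
_responses = iter([
    {"ended": False, "status": "active"},
    {"ended": True, "status": "completed", "outcome": "answered"},
])

def call_tool(name: str, arguments: dict) -> dict:
    return next(_responses)

state = {"ended": False}
while not state["ended"]:   # re-issue to keep waiting, per the description
    state = call_tool("calls_wait",
                      {"call_id": "call_123", "timeout_seconds": 110})
print("Final outcome:", state.get("outcome"))
```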
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds rich behavioral details beyond annotations: blocking nature, return values (ended, outcome), timeout cap due to nginx proxy. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no fluff, front-loaded with purpose. Every sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers return values (ended, outcome) and timeout behavior despite no output schema. Complete for a wait function.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds value by linking call_id to calls.make and explaining timeout cap and default. Enhances parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it blocks until a voice call ends or timeout, specifying the resource and action. Differentiates from siblings like calls_hangup and calls_list_active.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explains when to use: wait for call to end, and how to handle timeout (re-issue to keep waiting). No explicit alternatives mentioned, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_add_file (A)

Add a file to a knowledge collection.

The file must be uploaded and indexed first (files_upload + files_ingest). If the file was previously removed, it is re-enabled.

Parameters (JSON Schema)
- file_id (required): ID of the file to add (from files_upload).
- collection_id (required): ID of the collection.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint false, destructiveHint false), description reveals re-enabling behavior and prerequisite steps. Adds meaningful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, under 30 words, front-loaded with purpose. No redundant information. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers prerequisites, behavior, and basic operation. No output schema but tool is simple. Could mention success response, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with clear descriptions. Description adds little parameter meaning but aligns well. Baseline score appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Add a file to a knowledge collection', distinguishing it from siblings like collections_remove_file. The resource and action are unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisite (file must be uploaded and indexed) and notes re-enabling behavior. Lacks explicit 'when not to use' but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_assign_agent (A)

Assign a knowledge collection to an AI agent.

Once assigned, the agent's knowledge.query will automatically scope RAG search to files in its assigned collections.

Parameters (JSON Schema)
- agent_id (required): ID of the AI agent.
- collection_id (required): ID of the collection to assign.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutation (readOnlyHint=false). The description adds value by detailing the specific behavioral effect: scoping the agent's knowledge query to the collection's files. It does not mention potential side effects like overriding existing assignments.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, with two sentences: the first stating the action and the second explaining the consequence. No superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple assignment tool, the description adequately explains the purpose and effect. It does not mention prerequisites like existence of collection/agent, but these are implied. The return value is not described but likely void.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with basic descriptions for both parameters. The description does not add extra meaning beyond what the schema already provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'assign a knowledge collection to an AI agent' and distinguishes from siblings like unassign. It also explains the effect on knowledge.query scoping.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to link a collection for RAG scoping) but does not explicitly state exclusions or alternatives, such as what happens if the agent already has collections assigned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_create (A)

Create a named knowledge collection.

Collections group files for RAG search. After creating, add files with collections.add_file and assign to agents with collections.assign_agent.

Parameters (JSON Schema)
- name (required): Collection name (must be unique per user).
- description (optional): Optional description of the collection.
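The create, add, assign workflow the description points to can be chained off the created collection's ID. A sketch with hypothetical IDs and a stubbed client; the response shape is an assumption:

```python
# Stub MCP client returning a canned resource ID for every call.
def call_tool(name: str, arguments: dict) -> dict:
    return {"id": 7}

collection = call_tool("collections_create", {
    "name": "support-docs",
    "description": "Product manuals for the support agent",
})
call_tool("collections_add_file", {
    "collection_id": collection["id"],
    "file_id": 42,          # from files_upload, after files_ingest
})
call_tool("collections_assign_agent", {
    "collection_id": collection["id"],
    "agent_id": "agent_1",  # its knowledge.query now scopes to this collection
})
```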
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false. The description adds workflow context but no additional behavioral traits like rate limits or side effects. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences, front-loaded with the main action, and no redundant information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool with only 2 parameters and no output schema, the description sufficiently explains the creation step and hints at subsequent actions, providing complete context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and both parameters have clear descriptions in the schema. The tool description does not add further meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and the resource 'named knowledge collection'. It distinguishes itself from sibling tools like collections_add_file and collections_assign_agent by placing them as subsequent steps, establishing a clear purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for usage by indicating the workflow after creation ('add files... assign to agents'), but does not explicitly state when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_delete (A)
Destructive · Idempotent

Delete a knowledge collection.

If the collection is assigned to agents, prompts, or channels, pass force=true to delete anyway. CASCADE removes all assignments automatically.

Parameters (JSON Schema)
- collection_id (required): ID of the collection to delete.
- force (optional): Force delete even if collection is in use. OMIT for the safe default (refuse to delete in-use collections).
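The safe default suggests a two-step pattern: attempt without force, surface the refusal, and escalate only on explicit confirmation. The error shape here is an assumption; call_tool is a stand-in:

```python
# Stub MCP client: refuses without force, deletes (with CASCADE) with it.
def call_tool(name: str, arguments: dict) -> dict:
    if not arguments.get("force"):
        return {"error": "collection in use by 2 agents"}
    return {"deleted": True}

first = call_tool("collections_delete", {"collection_id": 7})
if "error" in first:
    print("Refused:", first["error"])
    # ...confirm with the operator, then force; CASCADE drops assignments.
    print(call_tool("collections_delete", {"collection_id": 7, "force": True}))
```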
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds beyond annotations by explaining behavior when the collection is in use and the effects of force and cascade. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, highly efficient, with critical information (force/CASCADE) presented immediately after the purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and the simplicity of a delete operation, the description is sufficient. It addresses error conditions (in-use) and options meaningfully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the schema fully describes parameters, the description adds valuable context for 'force' (safe default vs. force delete) and introduces 'CASCADE' as a concept, enhancing understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete a knowledge collection' with a specific verb and resource. It differentiates from sibling tools like collections_create, collections_list, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use force=true or CASCADE, providing context for common scenarios. However, it does not explicitly compare to alternatives like collections_unassign_agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list (A)
Read-only · Idempotent

List all knowledge collections in the workspace.

Collections are named groups of files used for RAG search. Auto-created collections (per-agent, per-prompt) are hidden by default.

Parameters (JSON Schema)
- include_inactive (optional): Include inactive collections. OMIT to list only active collections (the default).
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint, so the tool is safe. The description adds value by explaining what collections are (RAG-related) and that auto-created ones are hidden by default, which clarifies the default behavior beyond the schema. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three sentences that front-load the action and provide essential context. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, the nature of collections, and default behavior. Since there is no output schema, mentioning the return format would improve completeness, but the tool name and context make it reasonably clear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage for the single parameter include_inactive, with a clear description. The tool description adds no additional parameter semantics, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all knowledge collections in the workspace', specifying the verb (list) and resource (knowledge collections). It distinguishes from sibling tools like collections_create or collections_delete by focusing on listing. The additional context about collections being named groups for RAG search and auto-created collections hidden by default further clarifies scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to list collections) but does not explicitly state when not to use or suggest alternatives like collections_list_files. The note about hidden auto-created collections gives some context, but no direct guidance on filtering or comparing to other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list_files (A)
Read-only · Idempotent

List all files in a knowledge collection with their indexing status and chunk counts. Each returned file has a file_id (integer) that can be passed to messages.send as attachments=[file_id] to send the file to a contact, or to files.read to read its text content.

Parameters (JSON Schema)
- collection_id (required): ID of the collection.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and not destructive. The description adds context about returned fields (indexing status, chunk counts) and the file_id's role, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences, front-loaded with core purpose, and no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple schema and no output schema, the description sufficiently covers purpose, output details, and file_id utility, making it complete for an agent to select and invoke.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100% with a clear description for collection_id. The description adds no new parameter semantics, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists files with indexing status and chunk counts, using a specific verb and resource. It differentiates from sibling tools like collections_add_file and mentions a key output field (file_id) for downstream use.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly guide when to use this tool versus alternatives. It mentions downstream uses but lacks exclusions or comparisons with other list tools like agents_list_files.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_remove_file (A)

Remove a file from a knowledge collection.

The file itself is not deleted — only the collection membership is removed.

Parameters (JSON Schema)
- file_id (required): ID of the file to remove.
- collection_id (required): ID of the collection.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide destructiveHint=false, and the description adds behavioral context by explicitly stating the file itself is not deleted. This adds value beyond annotations, though it doesn't detail other traits like permissions or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The key information is front-loaded and easily digestible.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal operation without an output schema, the description fully covers the behavior (non-destructive removal) and all necessary context. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with clear descriptions for both required parameters. The description does not add extra semantic information beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Remove a file from a knowledge collection' and clarifies that only membership is removed, not the file itself. However, it does not explicitly distinguish from sibling tools like 'agents_remove_file', though context from the name helps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description implies the use case (removing a file from a collection without deleting it), but there is no explicit when-to-use or when-not-to-use information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_unassign_agent (A)

Remove a knowledge collection from an AI agent.

The collection and its files are not deleted — only the agent assignment is removed.

Parameters (JSON Schema)
- agent_id (required): ID of the AI agent.
- collection_id (required): ID of the collection to unassign.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=false, but the description adds valuable context by specifying that the collection and its files are not deleted, only the assignment is removed. This clarifies the non-destructive behavior beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences: the first states the primary purpose, and the second clarifies what is not affected. Every word earns its place, and it is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, and the description does not mention return values, success indicators, or error cases (e.g., unassigning an unassigned collection). Given the low complexity, it covers the main action but misses some behavioral details that could be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters (agent_id and collection_id). The description does not add further meaning to the parameters beyond what the schema already provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (remove a knowledge collection from an AI agent) and distinguishes it from deletion by noting that the collection and files are not deleted, only the assignment. This differentiates it from siblings like collections_delete and collections_assign_agent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for unassigning rather than deleting, but it does not explicitly state when to use this tool versus alternatives like collections_delete or collections_assign_agent. No guidance on prerequisites or context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_add_channelAInspect

🔗 Link a new channel identity (email, phone, LinkedIn, etc.) to an existing contact.

When to use:

  • User learns a contact's email or phone and wants to save it

  • User wants to link a LinkedIn/Instagram profile to an existing contact

  • Adding a second channel for an existing person

Requires contact_id (entity_id) from contacts.find.

Parameters (JSON Schema)

  • value (required): Email address, phone number, or username for this channel
  • channel (required): Channel type to add
  • contact_id (required): entity_id from contacts.find
  • display_name (optional): Optional display label for this identity
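
A sketch of the find-then-link flow described above, reusing the hypothetical call_tool helper from the first sketch; the query, address, and response shape (a list of matches) are placeholders and assumptions:

    results = call_tool("contacts_find", {"query": "Jane", "limit": 1})
    contact = results[0]                      # assuming a list of matches is returned
    call_tool("contacts_add_channel", {
        "contact_id": contact["entity_id"],   # entity_id from contacts.find
        "channel": "email",                   # channel type to add
        "value": "jane@example.com",          # placeholder address
        "display_name": "Work email",
    })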
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description states it links a new channel, implying mutation. Annotations show readOnlyHint=false, destructiveHint=false, so non-destructive mutation. Description doesn't add detail on side effects or permissions but complements annotations well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three clear sentences plus bullet points for usage scenarios. Front-loaded with purpose, then when-to-use, then prerequisite. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description covers when to use, prerequisites, and parameter hints. Could mention return value (e.g., success or updated contact), but for a simple mutation it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description explains the purpose of contact_id (from contacts.find) and gives examples for value (email, phone, username). Adds context beyond enum and property descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it links a new channel identity to an existing contact. Verb 'Link' and resource 'contact' are specific. Distinguishes from sibling tools like contacts_find and contacts_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists scenarios when to use (learning email/phone, linking social profiles, adding second channel) and requires contact_id from contacts.find, providing clear prerequisites and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_discoverA
Read-onlyIdempotent
Inspect

Search for a contact on a live channel (Telegram, WhatsApp, etc.) before adding them. Use this to look up a person by username or phone number before calling contacts.sync.

Parameters (JSON Schema)

  • query (required): Username, phone, or name to search for
  • channel (required): Channel name: telegram, whatsapp, etc.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, openWorldHint, idempotentHint, destructiveHint=false. The description adds context about searching live channels and pre-sync usage, which aligns with and enriches the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The first sentence defines the core action and scope, the second provides usage guidance. Perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given rich annotations and full schema description, the description covers purpose and usage well. Missing explicit indication of return format, but the search nature and annotations fill the gap sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters have full descriptions in the schema (query: 'Username, phone, or name to search for', channel: 'Channel name: telegram, whatsapp, etc.'). The description does not add additional meaning beyond these descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Search for a contact on a live channel') and distinguishes it from sibling tools like contacts.sync by specifying the use case ('before adding them').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use this tool ('before calling contacts.sync') and provides context for lookup by username or phone. It lacks explicit exclusion criteria but is clear enough for decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_findA
Read-onlyIdempotent
Inspect

👤 Search for contacts in your address book by name or username.

When to use:

  • User asks 'find contact X' or 'who is Y?'

  • User wants to know someone's username or ID

  • Before sending a message to verify contact exists

  • To get contact's channel reference for messaging

Examples: ❓ User: 'find contact named [name]' → contacts_search(query='[name]', limit=5)

❓ User: 'who is [full name]?' → contacts_search(query='[full name]', limit=1)

❓ User: 'search for @username' → contacts_search(query='username', limit=10)

Returns: name, username, channel, channel_ref, similarity_score, match_type. Plus:

  • entity_id: local DB key — pass to contacts.profile. Null for live-discovered contacts (skip contacts.profile for those).

  • telegram_user_id (when channel='telegram'): the Telegram user ID — pass to calls.make / messages.send. NOT entity_id.

Parameters (JSON Schema)

  • query (required): Name or username to search for (supports partial matches)
  • limit (optional): Maximum number of results to return
  • channel (optional): Filter by channel. OMIT to search across all channels.
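
The entity_id / telegram_user_id split is the easiest thing to get wrong here, so a hedged sketch (hypothetical call_tool helper from the first sketch; the response is assumed to be a list of matches carrying the fields named above):

    results = call_tool("contacts_find", {"query": "@username", "channel": "telegram"})
    c = results[0]  # assuming a list of matches
    # entity_id is a local DB key: pass it to contacts_profile, and only when non-null.
    if c.get("entity_id") is not None:
        call_tool("contacts_profile", {"contact_id": c["entity_id"]})
    # For calls.make / messages.send on Telegram, use c["telegram_user_id"], NOT entity_id.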
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds significant behavioral context beyond annotations: explains return fields (entity_id and telegram_user_id) with specific guidance on using them with other tools. No contradiction with readOnlyHint, openWorldHint, etc.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with emoji summary, clear sections, and examples. No wasted sentences; each part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a search tool with no output schema. Covers when to use, return fields and their semantics (including behavior of entity_id for different match types), and examples.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds value by showing parameter usage in examples (e.g., query='[name]', limit=5), enhancing understanding of how to invoke the tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Search for contacts in your address book by name or username' with specific verb and resource. It distinguishes from sibling contact tools like contacts_profile or contacts_add_channel through examples and usage context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' scenarios (e.g., user asks for a contact, before sending a message) and examples. Does not mention alternatives, but usage context is clear enough for correct selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_profileA
Read-onlyIdempotent
Inspect

👤 Get full profile for a contact: all channel identities, notes, role, capabilities, birthday.

When to use:

  • After contacts.find to get complete info about a specific person

  • To see all channels a contact is reachable on

  • To read notes, role, or capabilities for a contact

Requires contact_id (entity_id) from contacts.find.

Parameters (JSON Schema)

  • contact_id (required): entity_id from contacts.find
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. The description adds value by detailing the returned data fields. It does not contradict annotations. No mention of error cases, but acceptable for a simple read tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, using an emoji, bullet points, and front-loading key info. Every sentence adds value with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers purpose, usage, and expected return fields. It lacks explicit return structure but is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the only parameter. The description reinforces that contact_id comes from contacts.find, adding minimal extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a full profile for a contact, listing specific types of information (channels, notes, role, etc.). It distinguishes itself from sibling tools like contacts.find by indicating it is used after that tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit when-to-use scenarios (after contacts.find, to see channels, etc.) and prerequisites (requires contact_id from contacts.find). However, it does not explicitly state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_syncA
Inspect

Add a discovered contact and open a conversation thread. Returns thread_id for the new conversation. Call contacts.discover first to verify the contact exists.

Parameters (JSON Schema)

  • channel (required): Channel name: telegram, whatsapp, etc.
  • identifier (required): Username or phone number to add
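
The discover-then-sync handshake in sketch form (hypothetical call_tool helper from the first sketch; the identifier is a placeholder and the truthiness check is an assumption about the discover result):

    # Verify the contact exists on the live channel before adding them.
    found = call_tool("contacts_discover", {"query": "@newperson", "channel": "telegram"})
    if found:  # assuming a truthy result signals a match
        created = call_tool("contacts_sync", {"channel": "telegram", "identifier": "@newperson"})
        thread_id = created["thread_id"]  # thread_id is the documented return value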
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-read-only and non-destructive behavior. Description adds that it opens a conversation and returns a thread_id, but doesn't detail side effects (e.g., duplicate handling). Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with main purpose, zero waste. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers key aspects: purpose, return value, prerequisite. Lacks edge cases (e.g., existing contact), but given simplicity and good schema/annotations, it's sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for both parameters. Description adds no extra meaning beyond schema. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (add discovered contact and open conversation), resource (contact/thread), and return value (thread_id). Distinguishes from siblings by referencing prerequisite contacts.discover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisite (call contacts.discover first) and the action. Lacks explicit when-not-to-use or alternatives, but context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_updateA
Inspect

✏️ Update a contact's profile: name, notes, role, capabilities, birthday, preferred channel.

When to use:

  • User wants to add notes about a contact

  • User wants to set/update role or capabilities for a contact

  • User wants to rename a contact or update birthday

Requires contact_id (entity_id) from contacts.find. At least one optional field must be provided.

ParametersJSON Schema
NameRequiredDescriptionDefault
roleNoContact role (e.g. developer, client, partner). Empty string clears role.
notesNoFree-text notes/context about this contact. Empty string clears notes.
contact_idYesentity_id from contacts.find
birthday_dayNoBirth day 1-31 (must be set together with birthday_month)
capabilitiesNoList of capabilities (e.g. ['backend', 'design'])
display_nameNoNew display name (max 255 chars)
birthday_yearNoBirth year 1900-2100 (optional, standalone)
birthday_monthNoBirth month 1-12 (must be set together with birthday_day)
preferred_channelNoPreferred channel for contacting this person. OMIT to leave the preferred channel unchanged.
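
Because birthday_day and birthday_month must travel together, a sketch of a valid update (hypothetical call_tool helper from the first sketch; the ID and field values are placeholders):

    call_tool("contacts_update", {
        "contact_id": "entity-789",          # entity_id from contacts.find
        "notes": "Met at the Lisbon meetup",
        "birthday_day": 14,                  # must be set together with birthday_month
        "birthday_month": 3,
        "role": "",                          # empty string clears the role
    })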
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate mutation (readOnlyHint=false, destructiveHint=false). The description adds context that empty strings clear role/notes, and that at least one optional field must be provided. This goes beyond the annotations by clarifying update behavior and constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: a single introductory line followed by a bulleted 'When to use' section and a sentence about requirements. No redundant information; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 9 parameters (only 1 required) and no output schema, the description covers usage intent, prerequisites, and the constraint that at least one optional field must be provided. It is sufficiently complete for an update tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the 'at least one optional field' constraint not in the schema, and clarifies that omitting preferred_channel leaves it unchanged. This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Update a contact's profile' and lists specific updatable fields, clearly indicating what the tool does. It distinguishes from sibling tools like contacts_find and contacts_profile by focusing on updating existing contacts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' scenarios (adding notes, setting role/capabilities, renaming, updating birthday) and states a prerequisite (contact_id from contacts.find). It lacks explicit when-not-to-use guidance but gives clear context for typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

documents_createA
Inspect

Generate a document (PDF / PPTX / DOCX / HTML) from markdown content authored by you.

REQUIRED parameters:

  • title: Short human-readable title.

  • content_markdown: The body. Slides separated by --- on its own line at the top level (Marp rule). Tables, code, lists, footnotes, definition lists, and {.section-header} class attrs all parse.

  • format: "document" (single flowing body) or "presentation" (slides).

  • output_type: "pdf", "pptx", "docx", or "html".

Optional:

  • theme: "default" | "corporate" | "minimal" | "pitch" | "invoice" | "contract" | "cinema" | "editorial" (default "default"). cinema/editorial are presentation-only (engine=marp).

  • language: BCP-47 tag (default "en"). Drives font fallback for Cyrillic/CJK/Arabic content.

  • engine: "marp" | "weasyprint". For format=presentation PDF/HTML only. Default "marp" (designer-grade Chromium renderer with full CSS3, web layout, and {.cover}/{.hero}/{.split}/{.stats}/{.dark} layout classes). Pass "weasyprint" for the legacy print-CSS path. Rejected for format=document or output_type=pptx.

DELIVERY CONTRACT (CRITICAL): After this tool returns a file_id, deliver the file by calling messages.send(attachments=[file_id], text="<short caption>"). Do NOT embed the file_id in a markdown link, a sandbox: URL, or /api/files/<id>/download text — those render as plain text on the recipient's channel, not as a file attachment. The attachments parameter is the ONLY way the file actually attaches.
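
To make that contract concrete, a minimal sketch of the create-then-deliver flow, reusing the hypothetical call_tool helper from the first sketch; the title and markdown body are placeholders, while attachments and text are the parameters documented above:

    doc = call_tool("documents_create", {
        "title": "Q1 Report",
        "format": "document",
        "output_type": "pdf",
        "content_markdown": "# Q1 Report\n\nRevenue grew 12% quarter over quarter.",
    })
    # Deliver via the attachments parameter: the only way the file actually attaches.
    call_tool("messages_send", {
        "attachments": [doc["file_id"]],   # file_id is the documented return value
        "text": "Q1 report attached.",
    })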

CONVENTIONS:

  • Two-column slide: wrap with ::: cols\n::: col\n…\n:::\n::: col\n…\n:::\n:::.

  • Speaker notes (presentations only): ::: notes\n…\n::: at the end of a slide block. NOT <!-- ... --> (comments are escaped, not captured).

  • Section header slide: {.section-header} on its own line directly above the heading. Block-attr form, not inline.

  • Images: only ![](file:NNN) (workspace file_id), data:image/... URIs, or hosts in DOCUMENTS_MEDIA_URL_ALLOWLIST. Other URLs are dropped with [image removed].

LAYOUT CLASSES (engine=marp only — ignored under engine=weasyprint):

  • {.cover} — title-slide layout (centered headings, gradient background).

  • {.hero image="file:NNN"} — full-bleed background image with dark overlay and white headline.

  • {.split image="file:NNN"} — 50/50 image left, content (heading/bullets) right.

  • {.stats} — 3-up KPI cards: each card is ### big-number followed by a one-line label paragraph.

  • {.dark} / {.invert} — per-slide dark mode override. Both image="file:NNN" and image=file:NNN are accepted (quoted or unquoted). Place the class line on its own row directly above the slide content.

Format × output_type rules:

  • document + pptx is rejected — set format=presentation or pick pdf/docx/html.

  • theme=invoice/contract + output_type=pptx silently uses the default PPTX master.

For theme="invoice", every invoice MUST include a "Total" row whose value equals sum(line items) + tax (within ±0.01). The renderer fails closed on missing or mismatched totals.
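Since the renderer fails closed on a missing or mismatched total, it can be worth verifying the arithmetic before rendering. A minimal self-contained check, using the exemplar's placeholder figures:

    # Placeholder line items: (qty, unit_price). Values mirror the exemplar below.
    line_items = [(1, 1500.00), (2, 500.00)]
    subtotal = sum(qty * price for qty, price in line_items)   # 2500.00
    tax = round(subtotal * 0.20, 2)                            # 500.00
    stated_total = 3000.00                                     # the "Total" row you render
    assert abs(stated_total - (subtotal + tax)) <= 0.01        # renderer's ±0.01 tolerance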

EXEMPLAR — invoice (English):

Invoice INV-{YYYYMMDD-HHMMSS}

From: {Issuer Legal Name}, {Address}, {Tax ID}
To: {Customer Name}, {Customer Address}, {Customer Tax ID}
Issue date: {YYYY-MM-DD}
Due date: {YYYY-MM-DD}

| Description | Qty | Unit price | Total |
| --- | ---: | ---: | ---: |
| {Service 1} | 1 | 1500.00 | 1500.00 |
| {Service 2} | 2 | 500.00 | 1000.00 |

Subtotal: USD 2500.00
Tax (20%): USD 500.00
Total: USD 3000.00

Payment: {bank details OR crypto wallet — never both}

EXEMPLAR — invoice (Russian):

Счёт-фактура № INV-{YYYYMMDD-HHMMSS}

От: {Юридическое название организации}, {Адрес}, ИНН {Tax ID}
Кому: {Название клиента}, {Адрес клиента}, ИНН {Tax ID}
Дата: {YYYY-MM-DD}
Срок оплаты: {YYYY-MM-DD}

| Описание | Кол-во | Цена | Сумма |
| --- | ---: | ---: | ---: |
| {Услуга 1} | 1 | 1500.00 | 1500.00 |
| {Услуга 2} | 2 | 500.00 | 1000.00 |

Подытог: USD 2500.00
НДС (20%): USD 500.00
Итого: USD 3000.00

Реквизиты: {банковские реквизиты ИЛИ криптокошелёк — не оба сразу}

EXEMPLAR — contract (English):

Service Agreement

Between: {Provider Legal Name}, {Address} ("Provider")
And: {Client Legal Name}, {Address} ("Client")
Effective date: {YYYY-MM-DD}

1. Scope of services

{Concise description of what Provider agrees to deliver.}

2. Term

This Agreement begins on the Effective date and continues until {termination condition or end date}.

3. Compensation

Client pays Provider {amount and currency} according to {payment schedule}.

4. Confidentiality

Both parties agree to keep proprietary information of the other party confidential during and after the term of this Agreement.

5. Termination

Either party may terminate with {N} days' written notice.

6. Governing law

{Jurisdiction}.


Provider: ____________________
{Provider signatory name}

Client: ____________________
{Client signatory name}

EXEMPLAR — contract (Russian):

Договор оказания услуг

Между: {Юридическое название Исполнителя}, {Адрес} ("Исполнитель")
И: {Юридическое название Заказчика}, {Адрес} ("Заказчик")
Дата вступления в силу: {YYYY-MM-DD}

1. Предмет договора

{Краткое описание услуг, которые Исполнитель обязуется оказать.}

2. Срок действия

Договор вступает в силу с указанной даты и действует до {условие прекращения или дата окончания}.

3. Стоимость и порядок оплаты

Заказчик оплачивает услуги Исполнителя в размере {сумма и валюта} в порядке {график платежей}.

4. Конфиденциальность

Стороны обязуются сохранять конфиденциальность сведений, полученных в ходе исполнения настоящего Договора, в течение срока его действия и после его прекращения.

5. Расторжение

Любая из сторон вправе расторгнуть Договор, направив письменное уведомление не менее чем за {N} дней.

6. Применимое право

{Юрисдикция}.


Исполнитель: ____________________
{ФИО подписанта Исполнителя}

Заказчик: ____________________
{ФИО подписанта Заказчика}

Parameters (JSON Schema)

  • title (required): Short human-readable title for the document.
  • format (required): 'document' for a single flowing body, 'presentation' for slides.
  • output_type (required): Renderer target: 'pdf' | 'pptx' | 'docx' | 'html'.
  • content_markdown (required): Markdown body authored by the agent. Slides separated by '---' on its own top-level line.
  • theme (optional, default "default"): Visual theme. invoice/contract trigger the corresponding exemplar styling.
  • language (optional, default "en"): BCP-47 language tag (e.g. 'en', 'ru', 'zh', 'ja'). Drives font fallback for non-Latin scripts.
  • engine (optional): PDF/HTML engine for presentations. 'marp' (default for format=presentation) renders via headless Chromium with full CSS3, web fonts, and layout classes (.cover, .hero, .split, .stats, .dark). 'weasyprint' is the legacy renderer. Rejected for output_type=pptx (PPTX always uses python-pptx for editable text; use output_type=pdf or html, or omit the engine parameter). Rejected for format=document (always weasyprint). OMIT to use the per-format default engine.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behavioral traits such as that it returns a file_id requiring attachment via messages_send, rejection scenarios, and invoice validation. Annotations already indicate non-readonly and non-destructive, and the description adds depth without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (purpose, required, optional, delivery contract, conventions, examples). Every sentence adds value for a complex tool. Appropriate length given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and complex interactions, the description is extremely complete. Covers all parameters, edge cases, delivery instructions, and provides templates for invoices and contracts. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 100% schema coverage, the description adds significant meaning: parameter interactions (engine+format rules), theme effects, language font fallback, and content_markdown syntax. Goes well beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates documents in multiple formats from markdown content, specifying the verb 'Generate' and the resource 'document'. It distinguishes from sibling tools by focusing on document creation, which is unique among the listed siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use each parameter, required vs optional, format/output_type rules, and even delivery contract. Includes examples and discusses rejected combinations like document+pptx, offering clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

feedback_saveA
Inspect

Save a behavioral rule, preference, or correction that should guide future agent behavior. Use this when the user gives explicit guidance like 'always reply in Russian', 'don't suggest meetings before 11am', or 'invoice link goes via email, not chat'. Structure the rule as: the rule itself, why it matters (if stated), and how to apply it. Scope: 'workspace' for org-wide rules, 'agent' for per-agent overrides, 'person' for per-contact preferences. Prefer feedback.save over notes.save for anything that's instructive rather than informational.

Parameters (JSON Schema)

  • key (required): Short identifier for this rule (e.g. 'reply_language', 'meeting_hours'). Must not start with '__' (reserved).
  • rule (required): The rule itself, in imperative form.
  • scope (required): Scope of the rule. 'workspace' for org-wide rules; 'agent' for per-agent overrides; 'thread' for conversation-specific guidance; 'person' for per-contact preferences. 'global' is accepted as a deprecated alias for 'agent'.
  • why (optional): Why this rule matters (recommended for the distiller).
  • how_to_apply (optional): When/how to apply the rule. Helpful for conditional rules like 'apply when speaking to Russian-speaking customers'.
  • scope_ref_id (optional): Required for scope='thread' (thread_id) and scope='person' (person_id).
  • target_agent_id (optional): Target agent. In agent mode optional (defaults to self); required from MCP. Ignored when scope='workspace'.
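
A sketch of saving a person-scoped rule (hypothetical call_tool helper from the first sketch; the IDs and rule text are placeholders):

    call_tool("feedback_save", {
        "key": "reply_language",
        "rule": "Always reply to this contact in Russian.",
        "why": "Contact explicitly asked for Russian replies.",
        "how_to_apply": "Apply to every outbound message in this person's threads.",
        "scope": "person",
        "scope_ref_id": "person-42",   # required for scope='person'
    })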
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are neutral (readOnlyHint=false, destructiveHint=false), and the description explains the tool creates behavioral guidance for future agent actions. It adds context about how the rule will be used (by distiller) and what types of input are expected. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a brief scope list cover the essential points: purpose, usage, structure, and sibling differentiation. Every sentence adds value; no fluff. Key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, usage, structure, scopes, and sibling differentiation. However, it does not mention the 'target_agent_id' parameter, though it is described in the schema. Still, for a save tool with good schema coverage, the description is nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description adds significant value by explaining the intention behind 'key', 'why', 'how_to_apply', and examples for 'scope' values. It elaborates on the rule structure and usage beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it saves behavioral rules/preferences/corrections for future agent behavior, provides concrete examples, and explicitly distinguishes from notes.save as 'instructive rather than informational'. This differentiates it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when the user gives explicit guidance...' and advises 'Prefer feedback.save over notes.save for anything instructive rather than informational.' It also explains how to structure the rule and scope options, providing clear when-to-use and alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_get_base64A
Read-onlyIdempotent
Inspect

Download one or more files server-side and return their content as base64-encoded strings. Use this to inspect images, PDFs, or any binary file attached to messages when you cannot access presigned S3 URLs directly. Supports up to 5 files per call, max 15 MB each. For large files, batch in groups of 1-2 to avoid oversized responses.

Parameters (JSON Schema)

  • file_ids (required): List of file IDs to fetch as base64 (max 5). Get IDs from files.info or message attachment_file_ids.
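
A sketch of the size-aware batching the description recommends (hypothetical call_tool helper from the first sketch; the IDs, the 5 MB threshold, and the response shape — a list of {file_id, byte_size, ...} records — are assumptions):

    infos = call_tool("files_info", {"file_ids": [101, 102, 103]})   # placeholder IDs
    large = [f["file_id"] for f in infos if f["byte_size"] > 5_000_000]
    small = [f["file_id"] for f in infos if f["byte_size"] <= 5_000_000]
    for i in range(0, len(large), 2):                 # 1-2 large files per call
        call_tool("files_get_base64", {"file_ids": large[i:i + 2]})
    if small:
        call_tool("files_get_base64", {"file_ids": small[:5]})   # never more than 5 per call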
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds context about server-side download, base64 encoding, and size limits, which are not in annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences that are front-loaded: purpose, use case, constraints, advice. Every sentence adds essential information with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool with one parameter, comprehensive annotations, and detailed description covering when, how, and limits, the description is fully adequate for correct agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with description for file_ids. The description adds value by explaining where to get the IDs (files.info or message attachment_file_ids), which goes beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'download', resource 'files', and return format 'base64-encoded strings'. It also distinguishes from siblings by referencing presigned S3 URLs, which differentiates it from other file retrieval tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use case ('when you cannot access presigned S3 URLs directly') and constraints (5 files, 15 MB each, batching advice). However, it does not explicitly state when not to use or name alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_infoA
Read-onlyIdempotent
Inspect

Get metadata and download URLs for files by their IDs.

When to use:

  • After messages_read_history returns attachment_file_ids

  • To get a presigned download URL to read a received file

Returns: filename, mime_type, byte_size, download_url (1-hour presigned URL).

Parameters (JSON Schema)

  • file_ids (required): List of file IDs (max 20)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds behavioral details beyond annotations: it specifies the return value includes filename, mime_type, byte_size, and a 1-hour presigned download URL, which is crucial for agent understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, using four short lines to convey purpose, usage, and return values. Every sentence adds value with no redundancy or wordiness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the presence of comprehensive annotations, and full schema coverage, the description is complete. It explains when to use it and what is returned, leaving no gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the single parameter 'file_ids'. The description does not add additional semantics beyond 'List of file IDs (max 20)', so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get metadata and download URLs for files by their IDs', providing a specific verb and resource. It does not explicitly differentiate from sibling tools like files_read or files_get_base64, but the purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit usage guidance: 'After messages_read_history returns attachment_file_ids' and 'To get a presigned download URL'. It lacks mention of when not to use or alternatives, but the provided context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_ingestA
Inspect

Save and index a file into the knowledge base. Use this when the user asks to save, store, or remember a document. The file will be processed (OCR if needed) and indexed for future search.

ParametersJSON Schema
NameRequiredDescriptionDefault
tagsNoOptional list of tags for categorization (e.g., ['presentation', 'dextrade']).
titleNoHuman-readable title for the file (e.g., 'Project Presentation', 'Q1 Report'). If not provided, uses original filename.
file_idYesID of the file to ingest (from attachment_file_ids in context).
thread_idNoOptional thread ID to associate the file with. If not provided, uses context thread.
descriptionNoOptional description of the file contents.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false) already indicate mutability, but the description adds valuable context: the file is processed, OCR may be applied, and it is indexed for future search. This goes beyond annotations in disclosing side effects like indexing and processing behaviors.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each serving a distinct purpose: stating the core action, providing usage guidance, and explaining the processing. No filler words or redundancy, making it front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the purpose, usage context, and processing steps. However, it does not mention prerequisites (e.g., that the file must already exist via file_id) or specify the return value (absence of output schema). It is nearly complete for a straightforward ingest tool but could hint at expected outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all 5 parameters, providing clear definitions. The tool description does not add additional parameter insights beyond the schema, so it meets the baseline of 3 as per the high coverage guideline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Save and index a file into the knowledge base,' specifying the action and resource. It distinguishes from sibling tools like files_upload (which does not imply indexing) and files_read by focusing on saving and remembering. The verb 'ingest' with indexing is unique among file-related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when the user asks to save, store, or remember a document,' providing clear context for invocation. However, it does not explicitly state when not to use it or mention alternative tools for other file operations, missing a chance to guide the agent away from inappropriate uses.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_readA
Read-onlyIdempotent
Inspect

Read text content of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use files.get_base64; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run files.ingest first, THEN files.read. Calling it on a binary MIME type returns an error, so reading this routing hint first saves you a turn.

Parameters (JSON Schema)

  • file_id (required): ID of the file to read (from attachment_file_ids in context).
  • encoding (optional, default utf-8): Text encoding to use.
  • max_chars (optional): Maximum characters to return (default: 10000). Use smaller values for large files.
  • summarize (optional): If true, generate AI summary instead of returning raw content. Use for 'summary', 'summarize', or equivalent requests in other languages (e.g. Russian 'краткое содержание'). OMIT to return raw content (the default).
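
The routing rules above, condensed into a sketch (hypothetical call_tool helper from the first sketch; the file ID is a placeholder and the list-shaped files_info response is an assumption):

    info = call_tool("files_info", {"file_ids": [321]})[0]   # placeholder ID
    mime = info["mime_type"]
    if mime.startswith("image/"):
        call_tool("files_get_base64", {"file_ids": [321]})   # images: base64 route
    elif mime == "application/pdf":
        call_tool("files_ingest", {"file_id": 321})          # extract text first
        call_tool("files_read", {"file_id": 321, "max_chars": 5000})
    elif mime.startswith("text/") or mime == "application/json":
        call_tool("files_read", {"file_id": 321})            # plain text/code: read directly
    # audio/video cannot be transcribed via files_read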
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and non-destructive. The description adds context: PDFs require prior ingestion, binary files cause errors, and the summarize option exists. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose and provides essential guidance in a structured manner. While slightly verbose, every sentence adds value, balancing completeness with brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (file type restrictions, error handling, multiple parameters, no output schema), the description is fully complete. It covers supported types, prerequisites, alternatives, error behavior, and parameter usage, leaving no ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value beyond parameter descriptions: it explains the summary parameter triggers on specific requests, and advises using smaller max_chars for large files. This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads text content of attached files and lists supported types (.txt, .md, .json, code, PDFs). It explicitly distinguishes from sibling tools for binary files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use and when-not-to-use guidance: for images use files.get_base64, for audio/video it cannot transcribe, and for non-PDF documents run files.ingest first. It also warns that calling on binary returns an error.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_uploadA
Inspect

Upload a file to DialogBrain and get a file_id for use in messages_send.

When to use:

  • User wants to send a file/image to a contact

  • Before calling messages_send with an attachment

Returns: file_id (integer) to pass to messages_send attachments parameter.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleNoOptional display title
contentNoBase64-encoded file bytes. Either content OR source_url is required.
filenameNoFilename with extension (e.g. 'photo.png')upload
mime_typeNoMIME type (e.g. 'image/png', 'application/pdf')application/octet-stream
source_urlNoPublic URL to fetch file from. Either content OR source_url is required.
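
The upload-then-send flow in sketch form (hypothetical call_tool helper from the first sketch; the URL is a placeholder, and file_id / attachments are as documented above):

    uploaded = call_tool("files_upload", {
        "source_url": "https://example.com/photo.png",   # either content OR source_url
        "filename": "photo.png",
        "mime_type": "image/png",
    })
    call_tool("messages_send", {
        "attachments": [uploaded["file_id"]],   # documented return value
        "text": "Here is the photo.",
    })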
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are present: readOnlyHint=false (write operation), destructiveHint=false, idempotentHint=false. The description adds context by revealing the return value (file_id integer) and its purpose, which complements the annotations. No contradictions, and the description provides useful behavioral insight beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with three distinct sections: overall action, when to use, and return value. No redundant information. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-parameter tool with no output schema, the description adequately explains the workflow and return value. It does not cover possible errors or limitations, but it provides sufficient context for the primary use case (upload before sending). The schema compensates for parameter details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers all 5 parameters with descriptions (100% coverage). The description does not add additional meaning to the parameters beyond what the schema provides. The return value is mentioned but not the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'upload a file', the resource 'DialogBrain', and the purpose 'get a file_id for use in messages_send'. It distinguishes this tool from siblings like files_get_base64 or files_read by specifying its role as a prerequisite for sending messages with attachments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('User wants to send a file/image to a contact', 'Before calling messages_send with an attachment') and provides a clear usage pattern. However, it does not mention when not to use it or compare to alternatives like files_ingest.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_createA
Inspect

📁 Create a new inbox folder to organize threads.

When to use:

  • User wants to create a folder to group related conversations

  • User wants to organize threads by topic, project, or contact type

After creating a folder, use threads.update with folder_id to move threads into it.

ParametersJSON Schema
NameRequiredDescriptionDefault
iconNoEmoji icon for the folder (max 10 chars, optional)
nameYesFolder name (max 100 chars)
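
A sketch of the create-then-file flow the description points to (hypothetical call_tool helper from the first sketch; the thread ID is a placeholder, and since the return value is undocumented, the folder_id response field is an assumption):

    folder = call_tool("folders_create", {"name": "Project Phoenix", "icon": "🚀"})
    call_tool("threads_update", {
        "thread_id": "thread-7",             # placeholder
        "folder_id": folder["folder_id"],    # assumed response field
    })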
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-readonly and non-destructive. The description adds value by explaining the folder is for inbox threads and that after creation, threads must be moved via threads.update – behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by concise usage guidelines and a helpful next-step hint. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple creation tool with no output schema, the description covers usage and follow-up actions but omits return value (e.g., folder ID) and error conditions. Adequate but incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add meaning beyond the schema's parameter descriptions (name max 100 chars, optional icon with emoji and max 10 chars).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a new inbox folder to organize threads. It uses a specific verb ('Create') and resource ('folder'), and distinguishes it from sibling tools like folders_delete by focusing on creation and providing follow-up steps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use (group conversations, organize threads) and suggests a next step (threads.update). It does not explicitly exclude alternatives or provide a when-not-to-use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_deleteA
Inspect

🗑️ Delete an inbox folder. Threads inside become unfiled (not deleted).

When to use:

  • User wants to remove a folder they no longer need

  • User wants to clean up their inbox organization

Threads inside the folder are NOT deleted — they simply move back to the inbox.

Parameters (JSON Schema)

  • folder_id (required): ID of the folder to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that threads become unfiled, adding value beyond annotations which only indicate non-destructive. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short, uses an emoji, and bullet points for clarity. The 'When to use' section is helpful but slightly verbose; overall well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input schema, lack of output schema, and straightforward behavior, the description provides all necessary context including side effects (unfiling threads). Complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter folder_id. The description does not add additional meaning beyond the schema, justifying a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool deletes an inbox folder and specifies that threads inside become unfiled, not deleted. This is a specific verb+resource and distinguishes it from related tools like folders_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases: user wants to remove a folder or clean up inbox organization. It also clarifies that threads are not deleted, but does not mention when not to use or compare to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_add (A)

Add a specific group to your discovery list by @username or invite link (t.me/...).

When to use:

  • You already know the group's @username or invite link

  • Adding a known group without searching

Returns: group metadata including id, title, member_count.

Parameters (JSON Schema)

  • link (required): The group's @username or invite link (e.g. '@phuket' or 't.me/...')

  • channel (required): Channel the group is on (e.g. 'telegram')
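
To make the call shape concrete, here is a minimal TypeScript sketch of the argument object; the field names come from the schema above, and the values are illustrative (the '@phuket' handle is the schema's own example):

  // Hypothetical arguments for group_discovery_add; values are made up.
  const addArgs: { link: string; channel: string } = {
    link: "@phuket",      // or an invite link of the t.me/... form
    channel: "telegram",  // channel the group lives on
  };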
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructiveHint=false and readOnlyHint=false, which are consistent with a write operation. The description adds context about the return value (group metadata). No contradictions, but it could disclose potential duplication behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at 4 sentences, front-loaded with the main action. Every sentence adds value, and the structure is clear and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers the return value adequately. Parameters are fully described in the schema. However, it does not address what happens if the group already exists in the list.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents both parameters. The description adds minimal extra context (e.g., 'by @username or invite link' for the link parameter), which is helpful but not substantial beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Add', the resource 'group to your discovery list', and the method 'by @username or invite link'. It distinguishes from sibling tools like group_discovery_search by emphasizing adding a known group without searching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' section with specific conditions (already know the link, adding without searching). It implies when not to use (if searching is needed) but does not explicitly name the alternative sibling tool, slightly reducing clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_join (A)

Join a group and start syncing its messages to your inbox. The group must be in your discovery list (use group_discovery.search or group_discovery.add first).

What this does:

  • Joins the group on Telegram (or other channel)

  • Creates a thread in your inbox for syncing messages

  • Optionally enables AI auto-reply drafts

Returns: success, thread_id, auto_reply_enabled.

Parameters (JSON Schema)

  • group_id (required): ID of the discovered group (from group_discovery.search or group_discovery.list)

  • enable_auto_reply (optional): Enable AI auto-reply drafts for messages in this group. Drafts can be reviewed and sent manually. Default: true.
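
A minimal sketch of how the documented default interacts with an explicit override, assuming a plain argument object for the call; the group_id value is invented:

  // Hypothetical arguments for group_discovery_join; group_id would come
  // from a prior group_discovery.search or group_discovery.list result.
  const joinArgs: { group_id: string; enable_auto_reply?: boolean } = {
    group_id: "grp_123",      // invented ID for illustration
    enable_auto_reply: false, // opt out of the documented default of true
  };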
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Details what the tool does: joins the group, creates a thread, optionally enables auto-reply drafts. Annotations (readOnlyHint=false) agree that it is a write operation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main action, uses bullet points for clarity, and includes return values. Every sentence is necessary and no superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explicitly lists return values (success, thread_id, auto_reply_enabled). Prerequisite is clear. Tool is a specific action with defined inputs and outputs; description is complete for accurate invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions. The description adds context by specifying the source for group_id and explaining that auto-reply drafts are for manual review (not in schema). Adds value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool joins a group and syncs messages. It distinguishes itself from sibling group_discovery_* tools (search, list, add) which operate on the discovery list, whereas this tool acts on a discovered group to join it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states the prerequisite that the group must be in the discovery list, referencing alternative tools (group_discovery.search or group_discovery.add) for that step. Provides clear context on when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_list (A)
Annotations: Read-only, Idempotent

List groups you've found and joined in this workspace.

Lifecycle values:

  • discovered: found but not yet evaluated

  • bookmarked: saved for later

  • monitored: joined and actively syncing messages

  • dismissed: hidden

By default, dismissed groups are excluded. Returns: id, title, member_count, lifecycle, scan_status, overall_score.

Parameters (JSON Schema)

  • limit (optional): Maximum number of results (1-100, default 20)

  • offset (optional): Pagination offset. OMIT to start at row 0 (default).

  • channel (optional): Filter by channel (e.g. 'telegram').

  • lifecycle (optional): Filter by state: discovered, bookmarked, monitored (=joined/syncing), dismissed. OMIT to include all states (dismissed excluded by default elsewhere).

  • min_score (optional): Minimum overall score (0.0-1.0).
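
As an illustration of how the filters compose, a hypothetical argument object requesting the second page of high-scoring, actively synced Telegram groups (all values invented):

  // Hypothetical arguments for group_discovery_list combining the filters.
  const listArgs: {
    limit?: number;
    offset?: number;
    channel?: string;
    lifecycle?: string;
    min_score?: number;
  } = {
    limit: 20,              // the documented default page size
    offset: 20,             // skip the first page (offset 0 is the default)
    channel: "telegram",
    lifecycle: "monitored", // joined and actively syncing
    min_score: 0.7,         // only well-scored groups
  };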
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already signal read-only, idempotent, non-destructive. Description adds default filtering, lifecycle meanings, and return fields, providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with purpose, then structured breakdown of lifecycle, defaults, and output. Every sentence is essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and rich annotations, the description sufficiently covers tool behavior, filtering, and return fields, making it complete for a list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters. Description adds contextual value by explaining lifecycle semantics and default exclusion of dismissed, which clarifies parameter behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb (list) and resource (groups you've found and joined in workspace), with explicit lifecycle values and default exclusion. Distinguishes from siblings like group_discovery_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides default behavior (dismissed excluded) but no explicit when-to-use or when-not-to-use guidance relative to sibling tools like group_discovery_search or group_discovery_scan.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_preview_messages (A)
Annotations: Read-only, Idempotent

Read recent public messages from a group without joining it. Only works for groups where can_preview_history=true.

Use this to manually evaluate message quality before deciding to join. For an automated quality score, use group_discovery.scan instead.

Returns: list of recent messages with sender, text, date, is_reply.

Parameters (JSON Schema)

  • limit (optional): Number of recent messages to fetch (1-100, default 20)

  • group_id (required): ID of the discovered group (from group_discovery.search or group_discovery.list)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, destructiveHint=false, idempotentHint=true) already declare safe, idempotent read behavior. The description adds a precondition ('Only works for groups where can_preview_history=true') and describes the output format, providing value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each serving a distinct purpose: action, condition, usage guidance, return description. No redundancy, front-loaded with the core purpose. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with no output schema, the description covers all necessary aspects: purpose, precondition, when to use vs alternative, and return format. It is complete given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions (group_id, limit with default and range). The description does not add new semantic information beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read recent public messages from a group without joining it,' specifying the verb 'read' and resource 'public messages from a group.' It also distinguishes from sibling tool group_discovery_scan, which provides an automated quality score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Use this to manually evaluate message quality before deciding to join. For an automated quality score, use group_discovery.scan instead.' Clearly states when to use and when not, with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_scan (A)

Scan a group to evaluate its quality before joining. Fetches recent messages, analyzes activity, spam, and engagement, then returns a quality score and plain-English verdict.

When to use:

  • After finding groups with group_discovery.search

  • Before deciding which groups to join

Returns: overall_score (0-1), is_disqualified, disqualify_reasons, individual scores, and a verdict string.

Parameters (JSON Schema)

  • group_id (required): ID of the discovered group (from group_discovery.search or group_discovery.list)
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description describes a read-only operation (fetch, analyze, return scores), but annotations set readOnlyHint=false, indicating potential side effects not disclosed. This is a direct contradiction. The description does not explain what state changes occur, failing to provide behavioral transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: a clear opening sentence, then a list of what it does, a 'When to use' section, and a 'Returns' list. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no output schema, the description adequately explains the return values (overall_score, disqualify reasons, etc.). It provides enough context for usage, though it could elaborate on error conditions or prerequisites (e.g., whether the user must already have the group ID from search). Overall, it's nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter description coverage. The description adds value by specifying that the group_id comes from group_discovery.search or group_discovery.list, providing useful context beyond the schema's own description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: scanning a group to evaluate its quality before joining. It specifies the actions (fetches messages, analyzes activity/spam/engagement) and the output (quality score, verdict). It distinguishes from sibling tools like group_discovery_search (finding groups) and group_discovery_join (joining).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' section that explicitly states this tool is for after finding groups and before deciding to join. It provides clear context and references the preceding tool (group_discovery.search). However, it does not mention alternative tools for similar tasks (e.g., group_discovery_preview_messages) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

images_generate (A)

Generates a PNG image from a text prompt using Gemini 2.5 Flash Image. Returns a file_id consumable by messages.send(attachments=[...]) and other file-aware tools. Supports up to 12 reference image file_ids for subject-consistent edits and composition (use file IDs from the [ATTACHMENTS] block, files.search, or workspace.search). Latency: ~8-10s per image. Output: 1024×1024 PNG.

Parameters (JSON Schema)

  • prompt (required): Text description of the image to generate (3-4000 chars).

  • aspect_ratio (optional, default: 1:1): Output aspect ratio.

  • reference_file_ids (optional): List of up to 3 file_ids whose images should be used as visual references (for edits, subject consistency, or composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif).
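
A hedged sketch of a call that respects the schema rather than the prose (see the 12-versus-3 discrepancy flagged under Parameters below); the prompt and file IDs are invented:

  // Hypothetical arguments for images_generate. The schema caps
  // reference_file_ids at 3, so a conservative caller stays within that.
  const imageArgs: {
    prompt: string;
    aspect_ratio?: string;
    reference_file_ids?: string[];
  } = {
    prompt: "A lighthouse at dusk, watercolor style", // 3-4000 chars allowed
    aspect_ratio: "1:1",                              // the documented default
    reference_file_ids: ["file_abc", "file_def"],     // invented IDs, max 3
  };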
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide limited behavioral cues (all false). The description adds latency (~8-10s), output dimensions (1024×1024 PNG), and reference file support. These are valuable beyond annotations and help set expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences efficiently convey purpose, output usage, reference support, latency, and output size. No fluff; front-loaded with main action. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a generation tool without output schema, the description covers what it does, output format, integration points, reference usage, latency, and resolution. It is sufficiently complete for an AI agent to decide and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3, but the description states 'Supports up to 12 reference image file_ids' while the schema explicitly says 'up to 3 file_ids'. This contradiction reduces reliability. The description adds some context on how to obtain file IDs, but the discrepancy hurts the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool generates a PNG image from a text prompt using Gemini 2.5 Flash Image. It clearly distinguishes from siblings like images_search (searching) and videos_generate (video generation) by specifying output type and usage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that the output file_id can be consumed by messages.send and other file-aware tools, and provides context for using reference file IDs with sources. It lacks explicit when-not-to-use or alternatives, but the guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_complete (A)

Mark the job as completed. This sanitizes PII from the context and records a completion summary. Use when all tasks in the job are done.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to complete

  • summary (optional): Brief summary of what was accomplished
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-destructive behavior (destructiveHint=false). The description adds value by mentioning PII sanitization and recording a summary. However, it does not disclose potential side effects like irreversibility or whether the job can be modified after completion, limiting transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: three short sentences that front-load the main action, then add behavioral details and usage guidance. No unnecessary words or redundancy. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no output schema, two optional params), the description covers the essential aspects: purpose, key behaviors, and usage trigger. It could be improved by clarifying whether job_id is required or what happens if called on an already-completed job, but overall it is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for both parameters (job_id, summary). The description references the summary parameter ('records a completion summary') but does not add additional context beyond the schema. Hence, it meets the baseline but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's primary function ('Mark the job as completed') and highlights additional behaviors (sanitizes PII, records summary). While it doesn't explicitly distinguish from sibling tools like job_escalate, the action is specific enough for an agent to understand its purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool: 'Use when all tasks in the job are done.' This is a clear usage condition. However, it lacks information about when not to use or alternatives, which prevents a higher score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_escalate (A)

Escalate the job to a human. Use when you cannot resolve an issue, someone is not responding, or a situation requires human judgment.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to escalate

  • reason (required): Why escalation is needed
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-readonly and non-destructive, but the description adds no additional behavioral context (e.g., what escalation entails, if irreversible, or notification details). The description carries minimal extra value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no fluff. The key information is front-loaded and every word is necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple escalation action with no output schema, the description is sufficient to guide usage. A minor gap could be explaining what happens after escalation (e.g., status change), but it's not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so both parameters already have clear descriptions. The tool description does not add further semantic meaning over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Escalate the job to a human' with specific verb and resource. Distinct from sibling tools like agent_handoff or agents_ask, which are different actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists three scenarios for use (cannot resolve, no response, human judgment). However, it does not provide negative guidance on when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_read_context (A)

Read the current job context. Returns the full state of your active job including assignments, escalations, and any data you previously stored.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to read
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains that the tool returns the full state including assignments, escalations, and previously stored data. This adds value beyond the annotations, which only set destructiveHint and idempotentHint to false. However, readOnlyHint is also false even though the description says 'Read', which is a contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that front-loads the purpose and includes key details. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the return value (full state, assignments, escalations, stored data) which is helpful since there is no output schema. It is sufficient for understanding what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% documentation coverage for the single optional parameter job_id. The description does not add any additional meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read the current job context' with a specific verb and resource, and lists what is returned (assignments, escalations, stored data). This distinguishes it from sibling tools like job_update_context and job_complete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reading context but does not explicitly state when to use this tool versus alternatives, nor does it provide any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_update_context (A)

Update the job context by merging new data. Existing keys are preserved unless explicitly overwritten. Use this to record progress, update assignment statuses, or store intermediate results.

Parameters (JSON Schema)

  • job_id (optional): The ID of the job to update

  • updates (required): Key-value pairs to merge into job context
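
To make the merge semantics concrete, a hypothetical call body (the job ID and context keys are invented); keys not listed under updates are left untouched:

  // Hypothetical arguments for job_update_context illustrating the merge:
  // listed keys are written, existing unlisted keys are preserved.
  const updateArgs: { job_id?: string; updates: Record<string, unknown> } = {
    job_id: "job_42",           // invented ID
    updates: {
      stage: "outreach_sent",   // overwrites 'stage' if it already exists
      contacts_reached: 12,     // adds a new key alongside existing ones
    },
  };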
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a mutation (readOnlyHint=false) but provide no destructive or idempotent hints. The description adds merge semantics (existing keys preserved unless overwritten), which is useful beyond the annotations, but it does not cover error states or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences efficiently convey function, merge behavior, and usage advice without wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and lack of output schema, the description adequately explains the update operation and merge behavior. Minor omission: no mention of return value or error handling, but not critical for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions. The description adds context about merge behavior and key preservation, enhancing understanding beyond the schema itself.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates job context via merging, and provides example use cases. However, it does not explicitly differentiate from sibling tools like job_read_context or job_complete, which would strengthen purpose clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises when to use the tool (recording progress, updating statuses, storing results) but lacks explicit guidance on when not to use it or mention of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_find_entity (A)
Annotations: Read-only, Idempotent

Find an entity by name in the Knowledge Graph.

USE WHEN user mentions a person, project, company by name and you need:

  • To resolve a name to entity_id for subsequent queries

  • 'Кто работает над X?' ('Who is working on X?') → find X first

  • 'Расскажи про Y' ('Tell me about Y') → find Y first

RETURNS entity_id for use in kg.get_relationships or kg.explore. ALWAYS use this as the FIRST step in KG query chains.

Parameters (JSON Schema)

  • name (required): Entity name to search for. Can be in any language (Russian, English, etc.); transliteration is automatic.

  • limit (optional): Maximum results to return (1-10). Default: 5

  • entity_type (optional): Filter by entity type: 'person' = people, contacts; 'project' = projects, tasks; 'organization' = companies, teams; 'event' = meetings, deadlines; 'topic' = discussion topics; 'workspace' = user's own facts (my/our company). OMIT to include all entity types.
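
A minimal sketch of the first step in a KG query chain, assuming a plain argument object; the name is invented and the filter is optional:

  // Hypothetical arguments for kg_find_entity: resolve a name to an
  // entity_id before calling kg.get_relationships or kg.explore.
  const findArgs: { name: string; limit?: number; entity_type?: string } = {
    name: "Anna",          // any language; transliteration is automatic
    entity_type: "person", // omit to search all entity types
    limit: 5,              // the documented default
  };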
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already convey readOnly, idempotent, non-destructive behavior. The description adds context that it returns entity_id and is a prerequisite for subsequent KG queries, enriching transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five concise sentences with front-loaded purpose, clear examples, and workflow guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a lookup tool with well-documented schema and annotations, the description provides sufficient context: purpose, usage triggers, return value, and integration with sibling tools. Minor gaps like error handling are acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover all 3 parameters (100% coverage) with clear explanations. The tool description does not add new parameter info but reinforces usage context. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Find an entity by name in the Knowledge Graph' and provides specific usage examples like 'resolve a name to entity_id' and Russian queries, distinguishing it from sibling tools that require entity_id.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'USE WHEN' with concrete scenarios, advises 'ALWAYS use this as the FIRST step', and specifies that the output entity_id should be used with kg.get_relationships or kg.explore, leaving no ambiguity about when to invoke.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_get_relationships (A)
Annotations: Read-only, Idempotent

Get relationships for a specific entity from Knowledge Graph.

USE WHEN:

  • 'Кто работает над X?' ('Who is working on X?') - filter by works_on

  • 'С кем общался Y?' ('Who has Y been talking to?') - filter by discussed_with

  • 'Кто из компании Z?' ('Who is from company Z?') - filter by member_of

  • 'Что связано с W?' ('What is connected to W?') - no filter, get all

REQUIRES: entity_id from previous kg.find_entity step. Use: {{step_N.entity_id}} where N is the find_entity step number.

Parameters (JSON Schema)

  • limit (optional): Maximum relationships to return (1-50). Default: 20

  • direction (optional, default: both): Relationship direction: 'outgoing' = Entity → Others; 'incoming' = Others → Entity; 'both' = all relationships (default)

  • entity_id (required): Entity ID from kg.find_entity step. Use {{step_N.entity_id}} reference.

  • relation_types (optional): Filter by relationship types. People: works_on, works_for, member_of, manages, knows, client_of, provides_service. Communication: discussed_with, participated_in, mentioned_in. Org/Project: developed_by, funded_by, partnered_with, integrates_with, depends_on, part_of. Document: issued_by, issued_to, signed_by, authored_by. Other: uses, located_in, about, follows, owns, related_to.
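
Continuing the chain sketched under kg_find_entity, a hypothetical second-step argument object; the {{step_N.entity_id}} placeholder follows the schema's own convention:

  // Hypothetical arguments for kg_get_relationships, answering a
  // 'who is working on X?' style question.
  const relArgs: {
    entity_id: string;
    relation_types?: string[];
    direction?: string;
    limit?: number;
  } = {
    entity_id: "{{step_1.entity_id}}", // reference to the find_entity step
    relation_types: ["works_on"],      // filter to work relationships
    direction: "both",                 // the documented default
    limit: 20,
  };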
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, so the description correctly adds dependency context (requires entity_id from find_entity) without contradiction. It does not need to repeat the read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is succinct, well-structured with bullet points and sections, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers usage context, dependencies, and filter options well. It lacks explicit return format, but given schema coverage and annotations, it is mostly complete. A minor gap is not describing the output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description ties parameters to use cases (e.g., works_on filter) but does not add new semantic information beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get relationships for a specific entity from Knowledge Graph' and provides concrete usage examples with filters, distinguishing it from the prerequisite tool kg_find_entity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use with specific filters and requires entity_id from a previous step, offering clear context. However, it does not explicitly state when not to use or list alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_query (A)
Annotations: Read-only, Idempotent

Answer questions using knowledge base (uploaded documents, handbooks, files).

Use for QUESTIONS that need an answer synthesized from documents or messages. Returns an evidence pack with source citations, KG entities, and extracted numbers.

Modes:

  • 'auto' (default): Smart routing — works for most questions

  • 'rag': Semantic search across documents & messages

  • 'entity': Entity-centric queries (e.g., 'Tell me about [entity]')

  • 'relationship': Two-entity queries (e.g., 'How is [entity A] related to [entity B]?')

Examples:

  • 'What did we discuss about the budget?' → knowledge.query

  • 'Tell me about [entity]' → knowledge.query mode=entity

  • 'How is [A] related to [B]?' → knowledge.query mode=relationship

NOT for finding/listing files, threads, or links — use workspace.search for that.

Parameters (JSON Schema)

  • mode (optional, default: auto): Query mode: 'auto' (default) = smart routing based on question; 'rag' = pure semantic search with KG boost; 'entity' = GraphRAG for entity queries; 'relationship' = two-entity relationship query; 'graph' = direct KG traversal only

  • style (optional, default: concise): Answer style: concise, detailed, or bullet

  • date_to (optional): Filter messages until this date (ISO format: YYYY-MM-DD).

  • file_ids (optional): Specific file IDs to search within (for pinned files)

  • question (required): The question to answer from user's knowledge base. Required even for entity queries.

  • date_from (optional): Filter messages from this date (ISO format: YYYY-MM-DD). Use for time-based queries like 'this week', 'last month'.

  • thread_id (optional): Limit search to a specific thread/chat

  • query_type (optional): Query classification hint. Skips internal AI analysis when provided.

  • entity_name (optional): Entity name for entity/graph modes (optional for auto mode)

  • max_sources (optional): Maximum number of sources to consider (1-10)

  • entity_names (optional): Both entity names for relationship queries.

  • person_names (optional): Person names mentioned in the query. Used for keyword fallback when RAG misses.

  • search_keywords (optional): Key search terms to enhance RAG matching.

  • needs_aggregation (optional): True if query asks for totals/sums/counts.

  • include_relationships (optional): Include KG relationships in answer (default: true for entity mode)
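
As a sketch of how the pieces fit together, a hypothetical time-bounded query; only question is required, and every other field here just narrows the search (dates and values invented):

  // Hypothetical arguments for knowledge_query: a month-scoped RAG question.
  const queryArgs: {
    question: string;
    mode?: string;
    style?: string;
    date_from?: string;
    date_to?: string;
    max_sources?: number;
  } = {
    question: "What did we discuss about the budget?",
    mode: "rag",             // bypass auto-routing and force semantic search
    style: "bullet",         // answer style: concise, detailed, or bullet
    date_from: "2025-01-01", // ISO dates per the schema
    date_to: "2025-01-31",
    max_sources: 5,          // consider at most 5 sources
  };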
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the safety profile is clear. Description adds that it returns an evidence pack with source citations, KG entities, and extracted numbers, which is valuable behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-organized with clear sections (purpose, when to use, modes, examples, boundaries). Every sentence adds value without redundancy. Front-loaded with core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a query tool with 15 parameters and no output schema. Covers modes, examples, and exclusions. Could mention pagination or token limits, but the max_sources parameter is documented in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for all 15 parameters. The description additionally explains mode semantics and provides examples that go beyond the enum values, adding meaning for parameter selection.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Answer questions using knowledge base' and provides examples distinguishing from workspace.search. The verb 'answer' and resource 'knowledge base' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says NOT for finding/listing files, threads, or links - use workspace.search. Provides mode selection guidance with examples, helping the agent decide when to use different modes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_add_comment (A)

Add a comment to a LinkedIn post. Use post_id from search results or thread data.

Parameters (JSON Schema)

  • text (required): Comment text to post

  • post_id (required): LinkedIn post/activity ID (from search results or thread metadata)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a write operation (readOnlyHint=false). Description adds minor context on post_id source but does not detail further behavioral aspects like authorization or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundant information, front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple parameter set and full schema coverage, the description is mostly complete. It could mention that the comment is posted as the authenticated user, but that is not essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Add a comment') and the target resource ('to a LinkedIn post'), with specific guidance on sourcing the post ID. It distinguishes from YouTube comment tools by platform.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on where to obtain the post_id ('from search results or thread data'). No exclusions or alternatives needed as there is no other LinkedIn comment tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_company (A)
Annotations: Read-only, Idempotent

Get a LinkedIn company profile by company ID or vanity name. Returns company name, description, industry, size, and other details.

Parameters (JSON Schema)

  • identifier (required): Company ID or vanity name
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds specific returned fields (name, description, industry, size) but does not disclose behavioral traits like authentication requirements, rate limits, or error behavior. This modest addition warrants a 3.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences only: first states purpose and identification method, second lists returns. No unnecessary words, front-loaded, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with one parameter, strong annotations, and no output schema, the description is sufficient. It covers what it does, how to identify the company, and what is returned. Missing edge case handling or return format info, but not critical for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with the parameter 'identifier' described as 'Company ID or vanity name'. The description's mention of 'by company ID or vanity name' adds no new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'LinkedIn company profile', and specifies identification by company ID or vanity name. It also lists return fields, distinguishing it from the sibling 'linkedin_get_profile' which targets user profiles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides the identification method (by ID or vanity name) but does not explicitly state when not to use this tool or suggest alternatives like 'linkedin_search' for finding company URIs. The context implies usage, but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_profile (A)
Annotations: Read-only, Idempotent

Get a LinkedIn user profile by ID, public identifier (vanity name), or profile URL. Returns name, headline, location, and other profile information.

Parameters (JSON Schema)

  • identifier (required): LinkedIn member ID, public identifier (vanity name), or full profile URL
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the agent knows it's a safe, idempotent read. The description adds value by specifying the returned fields (name, headline, location, other info), enhancing behavioral understanding beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, front-loaded with the action, and every word is functional. No unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 param, no output schema), the description adequately explains the input and output. It could be more complete by specifying the expected return format (e.g., JSON object) or handling of not-found cases, but it is sufficient for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (the single parameter has a detailed description). The tool description repeats the same identifier options, adding no new semantic meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'Get' and the resource 'LinkedIn user profile', and lists the three identifier formats (ID, vanity name, URL). It distinguishes itself from sibling tools like linkedin_get_company and linkedin_search by clearly focusing on a single profile retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly states when to use the tool (to retrieve a LinkedIn profile). However, it does not mention when not to use it or provide alternative sibling tools like linkedin_search for finding profiles by query.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_invite (A)

Send a connection invitation to a LinkedIn user. Optionally include a personalized message (max 300 characters). Rate limited: LinkedIn allows 80-100 invitations per day, max 200 per week.

Parameters:
  message (optional): Optional personalized invitation message (max 300 characters)
  provider_id (required): LinkedIn provider ID of the person to invite (from search results or profile)
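
Example call (hypothetical; the provider ID is invented and would normally come from search results or a profile):

  • linkedin_invite(provider_id='ACoAAB12AB34', message='Enjoyed your talk at DevConf, would love to connect.')
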
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations by specifying rate limits (80-100 per day, max 200 per week) and the maximum message length. It does not contradict annotations (readOnlyHint false is consistent with a write operation). However, it does not disclose success/failure responses.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loading the primary action and then adding key constraints. Every sentence is valuable and there is no extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with no output schema, the description covers purpose, optional parameter details, and rate limiting. It could be improved by mentioning potential errors or response behavior, but it is sufficient for basic understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides 100% coverage with descriptions for both parameters. The description restates the optional message and its max length but adds no new semantic information beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Send a connection invitation to a LinkedIn user.' This clearly identifies the action and resource, distinguishing it from sibling LinkedIn tools like linkedin_list_invitations_sent or linkedin_add_comment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides general usage context (optional message, rate limits) but does not explicitly guide when to use this tool versus alternatives such as linkedin_list_invitations_sent or linkedin_add_comment. No when-not conditions or alternative recommendations are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_connections (A)
Read-only, Idempotent

List your LinkedIn connections, sorted by most recently added.

Parameters:
  limit (optional): Maximum connections to return
  cursor (optional): Pagination cursor from previous response
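
Example pagination sequence (hypothetical; the cursor value is invented and would in practice come from the previous response):

  • linkedin_list_connections(limit=50)
  • linkedin_list_connections(limit=50, cursor='cur_abc123')
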
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description adds little extra beyond noting the sorting order.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with essential information; no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with pagination, the description is adequate; however, it lacks details on response format or pagination behavior beyond parameter descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions, and the description adds the valuable detail that results are sorted by most recently added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List your LinkedIn connections' with ordering, distinguishing it from sibling LinkedIn tools like search or profile retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives; no mention of use cases or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_invitations_sent (A)
Read-only, Idempotent

List your pending sent connection invitations on LinkedIn.

Parameters:
  limit (optional): Maximum invitations to return
  cursor (optional): Pagination cursor from previous response
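
Example calls (hypothetical; the cursor value is invented and would come from a previous response):

  • linkedin_list_invitations_sent(limit=20)
  • linkedin_list_invitations_sent(limit=20, cursor='cur_xyz789')
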
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description adds limited behavioral context. It specifies 'pending sent' which adds a status filter, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no unnecessary words. It is front-loaded with the action and resource, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with strong annotations, the description is adequate. It lacks details on return format, but given no output schema, the agent can infer an array of invitations. It sufficiently addresses the tool's purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage, fully describing both parameters (limit with constraints, cursor as pagination token). The description does not add any additional meaning beyond what the schema provides, so score is at baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List', the resource 'pending sent connection invitations', and the platform 'LinkedIn'. It is distinct from siblings like 'linkedin_list_connections' (which lists existing connections) and 'linkedin_invite' (which sends invitations).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing pending sent invitations but does not explicitly state when to use it over alternatives, nor does it mention any exclusions or prerequisites. The context is implied but not elaborated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_reactions (C)
Read-only, Idempotent

List all reactions (likes, celebrates, etc.) on a specific LinkedIn post.

Parameters:
  limit (optional): Maximum reactions to return
  post_id (required): LinkedIn post/activity ID
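
Example call (hypothetical; the post ID value is invented, as the schema does not specify the ID format):

  • linkedin_list_reactions(post_id='7123456789', limit=25)
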
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is clear. However, the description adds no behavioral details beyond listing reactions, and it contradicts the limit parameter by claiming 'all reactions'. It omits pagination or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise at 12 words and front-loaded. However, the word 'all' may mislead agents about the tool's actual behavior, slightly reducing effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with rich annotations, the description is adequate but lacks details on pagination, error handling, and return format. Because there is no output schema to fall back on, the description misses the chance to describe what is returned, leaving it slightly below complete for a two-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add meaningful extra information beyond identifying the post_id. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list) and resource (reactions on a LinkedIn post) with examples (likes, celebrates). However, it says 'list all reactions' while the schema includes a limit parameter, implying not all reactions may be returned, which slightly undermines clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., other LinkedIn tools). The description simply states what it does without context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_raw_request (A)
Read-only, Idempotent

Send an arbitrary LinkedIn API request via Unipile's magic route. Only GET and POST methods are allowed. WARNING: This bypasses structured rate limiting and can perform destructive actions. Use this only when no other LinkedIn tool covers the needed functionality.

Parameters:
  body (optional): Request body (for POST requests)
  method (optional, default: GET): HTTP method (only GET and POST allowed)
  request_url (required): Target LinkedIn API endpoint URL
  query_params (optional): URL query parameters
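
Example call (hypothetical; '[linkedin-api-endpoint-url]' is a placeholder, not a verified Unipile route, and the query parameter name is invented):

  • linkedin_raw_request(method='GET', request_url='[linkedin-api-endpoint-url]', query_params={'count': '10'})
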
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims 'can perform destructive actions' and bypasses rate limiting, but annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. This is a direct contradiction, severely undermining transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no unnecessary words, front-loaded with purpose then guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks detail on return values, error handling, and the expected response format. The annotation contradiction further reduces completeness for a raw-request tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the parameters. The description adds only the method restriction and the note that body applies to POST requests, neither of which goes beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool sends arbitrary LinkedIn API requests via Unipile's magic route, specifies allowed methods (GET and POST), and implies a fallback purpose when no specific tool exists.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly states 'Use this only when no other LinkedIn tool covers the needed functionality' and warns about bypassing rate limiting and potential destructive actions, providing clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_search_filters (A)
Read-only, Idempotent

Get LinkedIn search filter parameter IDs. LinkedIn uses internal IDs instead of text for search filters (location, industry, etc.). Call this before linkedin.search to resolve filter keywords to their LinkedIn parameter IDs.

Parameters:
  type (required): Filter category to resolve (e.g. LOCATION, INDUSTRY, SKILL)
  limit (optional): Max results per filter category
  keywords (required): Keywords to resolve to parameter IDs (e.g. 'Thailand' for LOCATION)
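
Example call (values reuse the schema's own examples), resolving a location keyword before running a search:

  • linkedin_search_filters(type='LOCATION', keywords='Thailand', limit=5)
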
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and non-destructive behavior. The description adds the key behavioral context that the tool resolves keywords to internal IDs, which is beyond the annotation. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the tool's core purpose, and contains no extraneous information. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with no output schema, the description covers the essential aspects: what it does, why it's needed, and how to use it (before search). Complete given the complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents each parameter. The description briefly explains the overall conversion process but does not add new meaning beyond 'resolve filter keywords to LinkedIn parameter IDs'. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to get LinkedIn search filter parameter IDs. It explains that LinkedIn uses internal IDs and that this tool should be called before linkedin.search to resolve keywords. This distinguishes it from the sibling linkedin_search tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises calling this before linkedin.search, providing clear usage context. It does not mention when to avoid the tool, but the guidance is sufficient for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_update_profile (A)

Update the authenticated user's own LinkedIn profile. Supports adding/editing experience entries (role, company, skills, dates). Also supports updating location. Headline, summary, education are NOT supported by the API.

Parameters:
  location (optional): Location to set on profile (requires LinkedIn location ID)
  experience (optional): Add or edit a professional experience entry
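
Example call (hypothetical; the experience keys mirror the fields named in the description (role, company, dates), but the exact key names are assumptions, since the schema shown here does not enumerate them):

  • linkedin_update_profile(experience={'role': 'Backend Engineer', 'company': 'Acme Corp', 'start_date': '2023-01'})
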
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false, consistent with the update operation. The description adds value by specifying limitations (headline, summary, education are not supported) and clarifying that it operates on the authenticated user's own profile. However, it does not disclose rate limits, error behavior, or whether updates are additive or overwrite existing data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the purpose and then listing constraints. Every sentence provides essential information with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should hint at return values to set expectations, but it does not mention the response format. While the annotations and schema cover safety and parameters, the lack of output guidance is a gap; the clearly stated scope (supported vs. unsupported fields) partially compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for properties. The description adds meaningful context: location requires a LinkedIn location ID from linkedin.search_filters, and omitting the experience ID adds a new entry while including it edits an existing one. This clarifies usage beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool updates the authenticated user's LinkedIn profile, enumerates supported operations (experience, location), and lists unsupported fields (headline, summary, education). This provides a specific verb+resource and clearly distinguishes it from siblings like linkedin_get_profile or linkedin_raw_request.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists what is supported and unsupported, giving clear context on scope. However, it does not provide explicit when-to-use or when-not-to-use guidance, nor does it mention alternatives such as linkedin_raw_request for more advanced updates. The usage is implied but lacks proactive decision support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_delete (A)
Destructive, Idempotent

Delete a message from a thread. Supports Telegram, WhatsApp, and other connected channels. Note: Some channels have time limits on message deletion.

Parameters:
  thread_id (required): Thread/channel ID containing the message
  message_id (required): ID of the message to delete
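
Example call (hypothetical IDs; the thread format follows the 'telegram:571' style used elsewhere in this server's schemas):

  • messages_delete(thread_id='telegram:571', message_id='10234')
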
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint and idempotentHint. The description adds channel-specific constraints and time limits, which is useful context, but it does not elaborate on idempotency or other behavioral traits beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no extraneous information. Front-loaded with the core action. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter delete tool, the description covers purpose, supported channels, and a constraint (time limits). There is no output schema, so detailing return behavior is not required. It could mention idempotency or the success response, but it is complete overall.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. Description adds no additional parameter-level meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Delete a message from a thread' with a specific verb and resource. It mentions supported channels and time limits, distinguishing it from sibling message tools (send, forward, read history).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on supported channels (Telegram, WhatsApp, etc.) and a caveat about time limits, guiding appropriate use. However, no explicit comparison to alternatives or when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_forward (A)

Forward a message from one thread to another. Supports native Telegram forwarding (preserves original sender attribution) and text-based forwarding for cross-channel scenarios.

Parameters:
  dest_thread_id (optional): Destination thread to forward into. Provide at least one of dest_thread_id or recipient_name. To forward into the active conversation, pass the current thread_id. (If both are provided, dest_thread_id wins and recipient_name is ignored.)
  recipient_name (optional): Name of person to forward to (channel auto-resolved). Provide at least one of dest_thread_id or recipient_name. Use only when forwarding to a different contact than the current conversation.
  source_thread_id (required): Thread containing the message to forward (e.g., 'telegram:123456' or numeric DB ID)
  source_message_id (required): ID of the message to forward
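
Example calls (hypothetical IDs and names), showing the two targeting options:

  • messages_forward(source_thread_id='telegram:123456', source_message_id='42', dest_thread_id='telegram:789')
  • messages_forward(source_thread_id='telegram:123456', source_message_id='42', recipient_name='Jane Smith')
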
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and openWorldHint=true. The description adds value by detailing that forwarding can be native (preserving attribution) or text-based, and implies state modification. It does not contradict annotations and provides meaningful behavioral context beyond the structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, directly stating purpose and then adding essential detail about modes. No extraneous information, perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has two optional parameters with detailed schema descriptions and no output schema, the description adequately covers the core functionality and forwarding modes. It could mention that forwarding creates a new message (modifying state), but the annotations already cover the non-read-only behavior. A mention of the return value or success indicators is missing, though this is not critical given the tool's simplicity and annotation coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add significant parameter-level meaning beyond what the schema's parameter descriptions already provide (e.g., the dest_thread_id/recipient_name logic is already explained in the schema). The description's mention of two forwarding modes only indirectly relates to parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Forward a message from one thread to another,' which is a specific verb-resource pair. It distinguishes the tool from siblings like messages_send by specifying forwarding behavior, and further differentiates two modes (native Telegram and text-based), making its purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides useful context on when to use each mode ('preserves original sender attribution' for native, 'cross-channel scenarios' for text-based), but does not explicitly contrast with alternatives like messages_send or state when not to use it. It gives good guidance but misses explicit exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_read_history (A)
Read-only, Idempotent

Read messages from a conversation thread. Use text_contains to find specific messages by content. Returns the most recent messages, including sender info and timestamps.

Voice calls: each row carries a meta object with allowlisted keys (event_type ∈ 'call_started'|'call_ended'|null, source ∈ 'voice_transcript'|null, call_id, speaker_display_name, duration_seconds, outcome, direction) plus per-message channel. To find calls without scanning every row, use calls.list_history instead.

Usage:

  1. Get thread_id from threads.list first, OR

  2. Use contact_name to auto-resolve thread_id

Examples:

  • User: 'show me messages from chat with [contact]' → read_history(contact_name='[contact]', limit=10)

  • User: 'last 5 messages from thread 571' → read_history(thread_id=571, limit=5)

Parameters:
  limit (optional): Maximum number of messages to return (default: 10, max: 100)
  offset (optional): Number of messages to skip (for pagination, default: 0)
  thread_id (optional): Thread ID to read messages from (e.g., '571' or 'telegram:571'). Optional if contact_name provided.
  contact_name (optional): Contact/thread name to search for (optional if thread_id provided). Example: 'Jane Smith', 'John Doe'
  text_contains (optional): Filter: only return messages containing this text (case-insensitive substring match)
  include_outgoing (optional): Include messages sent by you (default: true)
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and non-destructive. Description adds details on return ordering (most recent), voice call meta object structure, and example usage, going beyond the structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured: starts with summary, then voice call specifics, usage steps, and examples. Each sentence adds value without waste; front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description details return content (most recent messages, sender info, timestamps, voice call meta). Covers all 6 parameters and provides usage flow, making it fully complete for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description explains the difference between thread_id and contact_name, emphasizes text_contains as a filter, and provides contextual examples, adding meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it reads messages from a conversation thread, explains the text_contains filter for content search, and differentiates the tool from calls.list_history for call-related data. It specifies that the return content includes sender info and timestamps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance (reading messages), when-not-to-use (calls: use calls.list_history), prerequisites (get thread_id or use contact_name), and includes examples for common use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_send (A)

Send a message to a thread, channel, or contact. Supports Telegram, Email, LinkedIn, and other connected channels. For LinkedIn posts (comment_thread kind), this posts a comment on the post. Can automatically resolve recipients and channels when not specified. Can send files/images/documents as attachments — pass attachments=[file_id, ...] with integer file IDs obtained from collections.list_files, workspace.search, or files.search. text is optional when attachments are provided.

Parameters:
  cc (optional): Email addresses to CC (carbon copy). Only for email channel.
  bcc (optional): Email addresses to BCC (blind carbon copy). Only for email channel.
  text (optional): Message text to send. Optional if attachments provided.
  format (optional, default: text): Message format
  silent (optional): Send without notification
  channel (optional): Channel hint (e.g. 'telegram'). Required when using recipient_username. Only 'telegram' is currently accepted for handle-based routing.
  subject (optional): Email subject line. Required for new emails, optional for replies (auto-generates 'Re: ...').
  thread_id (optional): Target thread. OMIT to reply in the same chat you received the triggering message from — the backend defaults to the current thread. Pass an explicit value ONLY to reply in a DIFFERENT thread, and only use: (a) a numeric DB thread id from threads.list / workspace.search, or (b) a channel_ref like 'telegram:-12345'. NEVER use a chat-type word (dm, group, channel, livechat) — those are category labels from the SITUATION block, not ids.
  attachments (optional): Array of integer file IDs to send as attachments (images, documents, any files). Get file IDs from collections.list_files (field `file_id`), workspace.search with scope=['files'] (field `file_id`), or files.search. Example: [302237]. The file must already exist in the workspace (status=ready) — no separate upload step needed. When attachments are provided, `text` becomes optional (a caption can be included alongside).
  recipient_name (optional): Name of person to send to (e.g., 'Jane', 'John'). Tool will auto-resolve channel. Optional if thread_id provided.
  recipient_email (optional): Email address to send to (e.g., 'john@example.com'). Creates new email thread. Only for email channel.
  recipient_username (optional): Telegram @handle (e.g. '@smartdeveloper' or 'smartdeveloper'). Resolves or opens a DM without needing a DB thread_id. Requires channel='telegram'. Only Telegram supported in this release.
  reply_to_message_id (optional): ID of message to reply to
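
Example calls covering three routing modes (message text is invented; the file ID, email address, and @handle reuse the schema's own examples):

  • messages_send(recipient_name='Jane', text='Running 10 minutes late')
  • messages_send(recipient_email='john@example.com', subject='Q3 report', text='Draft attached.', attachments=[302237])
  • messages_send(recipient_username='@smartdeveloper', channel='telegram', text='Hi!')
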
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses automatic resolution of recipients and channels, attachment sending capabilities, and nuances of the thread_id parameter. Annotations are present (readOnlyHint=false, etc.) and consistent, with no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph but well-organized, front-loading the main action. Could be slightly more concise but remains informative and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, none required, no output schema), the description covers channels, attachments, recipient resolution, thread_id rules, and supported platforms comprehensively. No obvious gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant context beyond schema, especially for thread_id (OMIT for same chat, explicit usage rules) and attachments (file ID sources). Enhances understanding of parameter behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states the tool sends messages to threads, channels, or contacts across multiple platforms (Telegram, Email, LinkedIn). It specifies posting comments on LinkedIn posts, clearly differentiating from sibling tools like messages_delete or messages_forward.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context for when to use, such as automatic recipient resolution and attachment handling. However, it does not explicitly state when not to use this tool versus alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_delete (A)
Destructive, Idempotent

Delete a note by ID from the target notebook. Same identity rules as notes.save — agents can only delete from their own notebook.

Parameters:
  note_id (required): ID of the note to delete
  target_agent_id (optional): Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks.
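
Example call (hypothetical note ID; target_agent_id is shown because it is required from MCP, and '57' reuses the schema's own example agent ID):

  • notes_delete(note_id='918', target_agent_id='57')
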
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true and idempotentHint=true. The description adds the identity restriction (only own notebook), which provides useful behavioral context beyond the annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with the key action and identity rule. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple deletion tool with two parameters and no output schema, the description covers the essential purpose and identity constraints. It could mention permanence, but that is not required given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds little beyond the schema. It mentions 'target notebook' but the schema already documents target_agent_id with a description. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (delete) and resource (note by ID). It distinguishes from sibling tools like notes_save and notes_search by specifying 'from the target notebook' and referencing identity rules.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: agents can only delete from their own notebook, referencing notes.save rules. It explicitly states who can use it and under what conditions, though it doesn't name alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_recall (A)
Read-only, Idempotent

Recall notes from your notebook. By default returns only your own notes (all scopes, newest first). Pass filter_agent_id= to read another agent's notebook, or filter_agent_id="all" (or "*") to read across every agent in the workspace. Pass scope to narrow to global/thread/person. Each result includes agent_id and agent_name of the author.

Parameters:
  key (optional): Recall a specific note by key
  limit (optional): Max notes (default 20, max 50). Newest first.
  scope (optional): Optional filter: global | thread | person. Omit for all scopes.
  scope_ref_id (optional): Filter by specific thread_id or person_id
  filter_agent_id (optional): Omit to read only your own notes. Pass a numeric agent_id as a string (e.g. "57") to read another agent's notebook (read-only). Pass "all" or "*" to read across all agents in the workspace.
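
Example calls illustrating the filtering options (the thread ID is invented; '57' reuses the schema's own example agent ID):

  • notes_recall(limit=10)
  • notes_recall(filter_agent_id='57', scope='thread', scope_ref_id='571')
  • notes_recall(filter_agent_id='all')
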
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds value by detailing default return scope (own notes, all scopes, newest first) and how filtering affects results, including that each result includes agent_id and agent_name. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences long, well-structured, and front-loaded with the main purpose. Every sentence adds value without redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description mentions that results include agent_id and agent_name. It covers key behaviors and filtering, but does not detail other note fields (e.g., content, timestamp) or pagination behavior beyond the limit parameter. Slight gap but still fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with all 5 parameters described. Description adds meaning by explaining default behavior when parameters are omitted and special handling for filter_agent_id='all' or '*'. It clarifies that agent_id is passed as a string, which the schema does not specify.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Recall notes from your notebook' and specifies default behavior (own notes, all scopes, newest first). It distinguishes this from sibling tools like notes_search and notes_delete/save by focusing on retrieval with filtering options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool (recalling notes) and how to filter by agent_id and scope. It explains defaults and special values like 'all' for cross-agent reading. It does not explicitly mention alternatives or when-not-to-use conditions, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_save (A)

Save a fact or note into the agent's memory. Use scope to choose visibility: 'workspace' = visible to every agent in this workspace (use for shared facts, project conventions); 'agent' = private to this agent (use for personal working notes); 'thread' = scoped to one conversation (use for thread-specific reminders); 'person' = scoped to one contact (use for per-contact context). If a note with the same key+scope exists it will be updated. Do NOT use this tool for behavioral rules or corrections — use feedback.save for those.

Parameters:
  key (required): Short identifier for this note (must not start with '__' — reserved)
  scope (required): Scope of the note. 'workspace' = shared across all agents; 'agent' = private to this agent (was 'global' pre-PR1); 'thread' = per-conversation; 'person' = per-contact. 'global' is accepted as a deprecation alias for 'agent'.
  value (required): The note content
  pinned (optional): Pin this note so it's always loaded first. Default false.
  scope_ref_id (optional): Reference ID — thread_id (for scope=thread) or person_id (for scope=person). Required for thread/person scope. In MCP mode (no thread context), must be passed explicitly.
  target_agent_id (optional): Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks. Ignored when scope='workspace' (workspace memory is shared).
  expires_in_hours (optional): Auto-delete after N hours. Omit for permanent notes.
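
Example call (hypothetical key, value, and person_id), saving a per-contact note that expires after 48 hours:

  • notes_save(key='preferred_greeting', scope='person', scope_ref_id='1042', value='Prefers to be addressed as Dr. Lee', expires_in_hours=48)
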
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the upsert behavior (updates if key+scope exists) and auto-delete via expires_in_hours. Annotations are generic (readOnlyHint=false, etc.), but the description adds valuable behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long, but every sentence provides necessary information, and it front-loads the core action and scope selection. It is very thorough but could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters and no output schema, the description covers all essential aspects: scope explanations, upsert, expiration, MCP mode, and directs to feedback.save for behavioral rules. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value: explains scope meanings, deprecation alias 'global', requirements for scope_ref_id, MCP mode notes for target_agent_id, and applicability of expires_in_hours. This goes well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool saves a fact or note into the agent's memory. It distinguishes from siblings like feedback_save by explicitly saying not to use it for behavioral rules. The verb and resource are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use (saving facts/notes) and when-not-to-use (behavioral rules, refer to feedback.save). It also details scope choices with examples, helping the agent decide which scope to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_get (A)
Read-only, Idempotent

Get full content of a prompt template: system instructions (prompt_text) and auto-reply rules.

Run prompts.list first to find the prompt_id.

Parameters:
  prompt_id (required): ID of the prompt template to fetch
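
Example flow (the prompt_id of 12 is invented; in practice it comes from prompts.list):

  • prompts_list() → pick the matching prompt_id
  • prompts_get(prompt_id=12)
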
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. Description adds specificity about returned content (system instructions and auto-reply rules), enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-loading purpose and usage guidance, with no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple getter tool with one parameter and robust annotations, the description provides sufficient context. There is no output schema, but the description specifies the returned content sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers the single parameter with an adequate description. The description adds value by linking to the prerequisite (prompts.list) but does not elaborate on parameter formatting or validation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (get full content) and resource (prompt template), listing specific fields (prompt_text, auto-reply rules). Distinguishes from siblings like prompts_list and prompts_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly advises running prompts.list first to obtain the prompt_id, providing clear context for when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_list (A)
Read-only, Idempotent

List all prompt templates in this workspace.

Returns id + name + description + category so you know which prompt_id to use in prompts.get or prompts.update.

Parameters: none

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (readOnlyHint, destructiveHint, idempotentHint) and adds value by specifying the exact fields returned, which is beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the primary action, and includes necessary detail without any superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters, no output schema, and no nested objects, the description adequately covers the tool's function and output. Mentioning ordering or filtering would be a minor improvement but is not essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so the description does not need to add parameter information. The baseline score of 4 is appropriate for a parameterless tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists all prompt templates in the workspace and specifies the returned fields (id, name, description, category), directly linking to the use case of identifying prompt_id for prompts.get or prompts.update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use the tool: to obtain prompt_id for subsequent calls to prompts.get or prompts.update. Although it does not explicitly mention when not to use it, the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_history (A)
Read-only, Idempotent

List past versions of a prompt template's prompt_text. Every edit is snapshotted to an append-only table — use this to browse history and find a version_number for prompts.prompt_restore.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax versions to return (1-200, default 50)
prompt_idYesID of the prompt template
before_versionNoCursor: return versions strictly below this version_number
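To make the cursor mechanics concrete, here is a minimal sketch of the tools/call payload an MCP client would send to page through history. The JSON-RPC envelope follows the MCP specification; the prompt ID and cursor value are hypothetical.

```python
import json

# Page through a prompt's version history. before_version is an exclusive
# cursor: pass the smallest version_number from the previous page to fetch
# older entries.
history_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "prompts_prompt_history",
        "arguments": {
            "prompt_id": "prm_123",  # hypothetical ID, resolved via prompts_list
            "limit": 50,             # 1-200, default 50
            # "before_version": 42,  # uncomment to page past version 42
        },
    },
}
print(json.dumps(history_call, indent=2))
```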
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, but the description adds valuable context: 'Every edit is snapshotted to an append-only table', explaining the immutable and historical nature of the data, which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, with the first sentence immediately stating the purpose and the second adding context and linking to a sibling tool. Every sentence is necessary and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a list-history tool with full schema coverage and annotations, the description is fairly complete. It explains the append-only storage and the link to restore, but could mention ordering (e.g., descending by version) for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add significant new meaning beyond the schema descriptions (e.g., 'before_version' is already described as a cursor).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'List' and the resource 'past versions of a prompt template's prompt_text', distinguishing it from sibling tools like prompts_get (current prompt) and prompts_list (all prompts). It also references the related restore tool, clarifying its role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly indicates when to use the tool ('to browse history and find a version_number for prompts.prompt_restore'), providing context for its use case. However, it does not explicitly state when not to use it or list alternative tools, though the sibling context implies alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_restoreA
Inspect

Restore a past version of a prompt template by version_number. Creates a new version pointing at the restored content — history is preserved. Fans out to every agent using this template without a per-agent override; the response includes affected_agents as a receipt of the fan-out.

ParametersJSON Schema
NameRequiredDescriptionDefault
reasonNoOptional: why this restore is happening (shows up in history UI)
prompt_idYesID of the prompt template
version_numberYesThe version_number to restore (get it from prompts.prompt_history)
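Continuing that hypothetical flow, a sketch of the matching restore call. The version_number would come from the history listing above; reason is optional but surfaces in the history UI.

```python
import json

restore_call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "prompts_prompt_restore",
        "arguments": {
            "prompt_id": "prm_123",  # hypothetical, same template as above
            "version_number": 42,    # found via prompts_prompt_history
            "reason": "Roll back regression in tone rules",  # shown in history UI
        },
    },
}
print(json.dumps(restore_call, indent=2))
# Per the description, the response includes affected_agents: a receipt of the
# fan-out to every agent using this template without a per-agent override.
```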
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it creates a new version (preserving history), fans out to all agents without per-agent override, and includes affected_agents in the response. Annotations only indicate false hints for readOnly, openWorld, idempotent, and destructive—so the description fills crucial gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no redundancy. First sentence states the action; second explains the non-destructive behavior; third covers the fan-out and response. All content earns its place, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description adequately explains the return value (affected_agents). It covers the main behavior, side effects, and constraints. Minor gap: no mention of error conditions or permissions, but overall sufficient for a restore operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reinforces the version_number usage (e.g., 'get it from prompts.prompt_history') and notes the response includes affected_agents. While helpful, it does not substantially surpass the schema's explanations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool restores a past version of a prompt template by version_number, creating a new version. This distinguishes it from sibling tools like prompts_update (which modifies current version) and prompts_prompt_history (which lists history). The title 'Restore Prompt Template' reinforces the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the primary use case—restoring a past version. It notes the fan-out behavior and the affected_agents receipt, which helps agents understand impact. However, it does not explicitly state when not to use it (e.g., for per-agent overrides) or direct alternatives, so it misses full exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_updateA
Inspect

Update a prompt template's name, system instructions, or auto-reply rules.

Changes affect every agent using this template, unless the agent has its own override (set via agents.update → prompt_text).

All parameters except prompt_id are optional — only provided fields are updated.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoNew name for the prompt template
prompt_idYesID of the prompt template to update
descriptionNoNew description for the prompt template
prompt_textNoThe AI system prompt: persona, tone, rules, behavior.
auto_reply_rulesNoPre-classifier rules that run BEFORE the main AI. Format: bullet list of conditions → actions (SKIP / SIMPLE_REPLY / SEARCH / CALENDAR). Pass null to clear.
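A sketch of a partial update, illustrating the two behaviors the description calls out: only supplied fields change, and passing null clears auto_reply_rules. The prompt ID and field values are hypothetical.

```python
import json

update_call = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "prompts_update",
        "arguments": {
            "prompt_id": "prm_123",     # required; everything else is optional
            "name": "Support Desk v2",  # only this field is renamed...
            "auto_reply_rules": None,   # ...and the pre-classifier rules are
        },                              # cleared (null clears, per the schema)
    },
}
print(json.dumps(update_call, indent=2))  # None serializes as JSON null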
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral context beyond annotations: explains that updates propagate to agents unless overridden, and that auto_reply_rules can be cleared by passing null. Annotations indicate mutation but not destruction; description aligns with that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, no redundant wording. Each sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and full parameter descriptions, the description covers all necessary contextual aspects: effect scope, optionality, and special parameter behavior. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds value by explaining that only provided fields are updated and clarifies the special handling of auto_reply_rules (null to clear). This provides useful context beyond what parameter descriptions offer.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates a prompt template's name, system instructions, or auto-reply rules, specifying the resource ('prompt template') and action ('update'), distinguishing it from sibling tools like prompts_get or prompts_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context that changes affect all agents using the template unless overridden at agent level (via agents.update). Also notes all parameters except prompt_id are optional, guiding usage. Does not explicitly exclude alternative tools but offers sufficient context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_cancelB
Inspect

Cancel an active reminder by its trigger ID.

ParametersJSON Schema
NameRequiredDescriptionDefault
agent_idNoAgent ID (required when calling from MCP; ignored in agentic mode).
trigger_idYesID of the reminder to cancel
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description is minimal, not adding behavioral context beyond annotations. Annotations indicate it is not read-only, not idempotent, and not destructive (though cancelling may be considered destructive). The description does not clarify side effects or permissions needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no wasted words, and the action is front-loaded. Its extreme brevity, however, leaves no room for even minimal context, which keeps it just short of perfect.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema, the description omits critical information such as return values, failure conditions, or constraints (e.g., whether it can cancel reminders already triggered). This makes it incomplete for a cancellation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description does not add additional meaning beyond what the schema already provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Cancel') and the target resource ('active reminder'), and specifies the required identifier ('trigger ID'). It distinctly differentiates from sibling tools like reminder_set and reminder_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, such as when a reminder should be cancelled versus modified. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_listA
Read-onlyIdempotent
Inspect

List your active reminders (both one-time and recurring).

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax results (default 20)
agent_idNoAgent ID (required when calling from MCP; ignored in agentic mode).
thread_idNoFilter by thread
include_firedNoInclude already-fired one-time reminders (default false)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. Description adds scope (active, one-time+recurring) but does not disclose other behavioral traits like pagination or ordering. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, perfectly concise, front-loaded with the action and resource. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple list tool with clear annotations and full schema descriptions. Description explains what is listed (active, both types), which is sufficient. Could mention include_fired parameter but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds no additional parameter information beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists active reminders, both one-time and recurring. Distinguishes from sibling tools reminder_set and reminder_cancel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Does not mention exclusions or context for choosing between reminder_list and reminder_cancel/reminder_set.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_setB
Inspect

Schedule a reminder. One-time reminders fire at a specific datetime. Recurring reminders fire on a schedule (daily, weekly, every N days, or every N minutes). Optionally scope to a thread or target another agent.

ParametersJSON Schema
NameRequiredDescriptionDefault
timeNoTime of day HH:MM for daily/weekly/every_n_days (e.g. '09:00'). Required for daily/weekly/every_n_days.
reasonYesWhat this reminder is for (you'll see this when it fires)
agent_idNoAgent ID (required when calling from MCP; ignored in agentic mode).
datetimeNoISO datetime for one_time (e.g. '2026-04-01T09:00:00+03:00'). Required for one_time.
timezoneNoIANA timezone (e.g. 'Europe/Moscow'). Defaults to UTC.
thread_idNoOptional thread ID to scope the reminder to. Omit for workspace-level reminders.
days_of_weekNoDays for weekly: 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun. Required for weekly.
interval_daysNoFor every_n_days: fire every N days (min 2).
schedule_typeYesone_time = fires once at datetime. daily = fires daily at time. weekly = fires on specific days_of_week at time. every_n_days = fires every N days at time. interval = fires every N minutes.
interval_minutesNoFor interval: fire every N minutes (5-1440).
target_agent_slugNoOptional: activate a different staff member instead of yourself when the reminder fires.
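Because schedule_type determines which other fields are required, a worked example helps. A minimal sketch of a weekly reminder, with hypothetical values:

```python
import json

weekly_call = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "reminder_set",
        "arguments": {
            "reason": "Prepare Monday pipeline review",
            "schedule_type": "weekly",
            "days_of_week": [0],          # 0 = Monday, per the schema
            "time": "09:00",              # HH:MM, required for weekly
            "timezone": "Europe/Moscow",  # IANA name; defaults to UTC
        },
    },
}
print(json.dumps(weekly_call, indent=2))
```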
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is a write operation (readOnlyHint=false) and not destructive (destructiveHint=false). The description adds scheduling details but doesn't disclose edge cases like overriding existing reminders or behavior on conflict. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with the main action. No unnecessary words. Efficient structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters and no output schema, the description lacks information about return values (e.g., reminder ID) and potential limitations (e.g., maximum reminders per user). Could be more complete for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions, so the description adds little beyond mentioning optional scoping (thread_id, target_agent_slug). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Schedule a reminder' and explains one-time vs recurring types. However, it does not explicitly differentiate from sibling tools like reminder_cancel or reminder_list, so it misses a chance to distinguish its role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives (e.g., when to choose one_time vs daily). No prerequisites or context about required permissions or effect on existing reminders.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

system_sleepA
Read-onlyIdempotent
Inspect

Pause execution for a given number of seconds (max 30). Use when you need to wait for an external process to complete before retrying — e.g. message sync, backfill, or API propagation. Total sleep per run is capped at 60 seconds.

ParametersJSON Schema
NameRequiredDescriptionDefault
reasonNoWhy you are waiting (logged for debugging)
secondsYesNumber of seconds to sleep (1-30)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description doesn't need to restate those. It adds valuable behavioral details: max 30 seconds per sleep, total cap of 60 seconds per run, which go beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The action and purpose are front-loaded. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple pause tool with no output schema, the description covers the essential: what it does, why use it, and the limits. No missing behavioral context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with descriptions for both parameters. The description adds minimal extra meaning—mentions 'max 30' which is already in schema. Baseline 3 is appropriate as the description doesn't significantly enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Pause execution for a given number of seconds (max 30)' and specifies the purpose 'wait for an external process to complete before retrying'. This is a specific verb-resource pair with clear scope, and it distinguishes itself from sibling tools like agent_handoff or web_fetch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit use case given: 'Use when you need to wait for an external process to complete before retrying — e.g. message sync, backfill, or API propagation.' It also mentions the total sleep cap. No explicit when-not, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_createB
Inspect

Create a new task in your to-do list.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleYesTask title
due_atNoISO datetime when task is due (e.g. '2026-03-31T15:00:00')
agent_idNoAgent ID whose tasks to access. Required when calling from MCP.
due_dateNoDate when task is due (e.g. '2026-03-31'). Use with due_time or alone.
due_timeNoTime when task is due (e.g. '15:00'). Used with due_date.
priorityNoTask priority (default: medium)
thread_idNoRelated thread ID
descriptionNoDetailed description
assigned_to_contact_idNoContact ID if assigned to someone
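The schema offers two ways to express a due time. A minimal sketch, with hypothetical values, showing the split due_date/due_time form alongside the equivalent single-field due_at (commented out):

```python
import json

create_call = {
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
        "name": "tasks_create",
        "arguments": {
            "title": "Send follow-up to Acme",  # the only required field
            "priority": "high",                 # default is medium
            "due_date": "2026-03-31",           # date component...
            "due_time": "15:00",                # ...paired with due_date
            # "due_at": "2026-03-31T15:00:00",  # equivalent single-field form
        },
    },
}
print(json.dumps(create_call, indent=2))
```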
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is a write operation (readOnlyHint=false). The description merely restates 'create' without adding details like side effects, return behavior, or required permissions. No additional behavioral context is provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no wasted words. However, it is so minimal that it could be expanded slightly to include return value or usage context without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description should hint at what is returned (e.g., task ID or object). It does not. Also, with many sibling tools, a brief note on use cases would enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all 9 parameters, so the schema itself documents parameter meaning. The tool description adds no extra semantics beyond 'create a task', which is already implied. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the action 'Create' and the resource 'task', clearly indicating that this tool adds a new item to the to-do list. It naturally distinguishes itself from siblings like tasks_update and tasks_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as tasks_update for modifying existing tasks. It lacks any contextual hints about prerequisites or preferred scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_deleteB
Inspect

Delete a task from your to-do list by its ID.

ParametersJSON Schema
NameRequiredDescriptionDefault
task_idYesID of the task to delete
agent_idNoAgent ID whose task to delete. Required when calling from MCP.
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims a destructive action ('Delete'), but the annotation 'destructiveHint' is false, creating a direct contradiction. According to scoring rules, this warrants a score of 1 with an annotation contradiction flag.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence of 10 words. It contains no extraneous information and efficiently communicates the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with full schema and no output schema, the description covers the basic purpose. However, it omits mention of irreversibility or side effects, and the annotation contradiction diminishes completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with complete parameter descriptions. The tool description does not add additional meaning beyond what the schema already provides, so the baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete'), the resource ('a task from your to-do list'), and the identifier ('by its ID'). This is specific and distinguishes it from sibling tools like tasks_update or tasks_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites or exclusions. It only states what the tool does without context of when it is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_listA
Read-onlyIdempotent
Inspect

List your tasks, or another agent's tasks (read-only) using from_agent_id. Use filters to narrow results.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax results (default 20)
statusNo
overdueNoIf true, only return tasks past due_at that are not done
agent_idNoAgent ID whose tasks to list. Required when calling from MCP.
thread_idNoFilter by related thread
from_agent_idNoList tasks of another agent (read-only). Omit to list your own.
assigned_to_contact_idNoFilter by assigned contact
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only; description adds clarity on read-only scope for other agents' tasks. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, efficient and front-loaded. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main purpose, key parameter, and filtering. Lacks return format or pagination details, but adequate for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 86% (>80%), so baseline 3. Description adds context on from_agent_id but doesn't detail other parameters beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists tasks, distinguishes own tasks from another agent's tasks (read-only), and mentions filters. Differentiates from create/update/delete siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (list tasks, including for another agent) and that it's read-only. Lacks explicit alternatives but sibling names imply them.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_updateA
Inspect

Update an existing task. Set status='done' to complete it, 'cancelled' to cancel. Use summary for completion notes.

ParametersJSON Schema
NameRequiredDescriptionDefault
due_atNoISO datetime
statusNo
summaryNoCompletion note (stored when marking done)
task_idYesID of the task to update
agent_idNoAgent ID whose task to update. Required when calling from MCP.
priorityNo
descriptionNo
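A sketch of the completion path the description highlights, with a hypothetical task ID: status='done' plus a summary note.

```python
import json

complete_call = {
    "jsonrpc": "2.0",
    "id": 6,
    "method": "tools/call",
    "params": {
        "name": "tasks_update",
        "arguments": {
            "task_id": "tsk_789",  # hypothetical
            "status": "done",      # completes the task
            "summary": "Sent recap email; client confirmed next steps.",
        },
    },
}
print(json.dumps(complete_call, indent=2))
```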
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false. Description adds that updating status to 'done' completes the task and summary is stored, which is useful context but does not cover all behavioral traits like permissions or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with primary action, followed by specific usage details. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers key behaviors for an update tool with 7 parameters, including status handling. Lacks explanation of all parameters (e.g., description, priority), but schema partially fills gaps. No output schema, but return values are implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 57%, the description adds meaning by explaining how to use status and summary in context, beyond what the schema provides. It clarifies the purpose of these parameters in the tool's workflow.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Update an existing task' with specific verb and resource, and provides concrete examples of status values and completion notes. It clearly distinguishes this from sibling tools like tasks_create and tasks_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on using status='done' or 'cancelled' and on supplying summary for completion notes. However, it does not contrast with sibling tools or cover exclusion scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threads_listA
Read-onlyIdempotent
Inspect

List conversation threads with previews and metadata. Use before messages.read_history to resolve thread_id. Returns: id, title, last message, timestamp, unread count.

ParametersJSON Schema
NameRequiredDescriptionDefault
kindNoThread type. OMIT to include DMs, groups, and channels.
limitNoMaximum threads to return.
orderNoSort order (default: desc)
channelNoFilter by channel. OMIT to search across ALL channels — restricting to one channel is a common cause of zero-result mistakes.
order_byNoSort field (default: last_message_at)
only_unreadNoOnly threads with unread messages. OMIT to include read threads too.
include_archivedNoInclude archived threads. OMIT to hide archived (the safe default).
participant_nameNoFilter threads by participant name. OMIT to list all threads regardless of participant.
max_inactive_daysNoUser sent a message within the last N days (recently-active filter). OMIT to include threads regardless of recent user activity.
min_inactive_daysNoUser's last outgoing message older than ≥N days (dormant filter). OMIT to include threads regardless of last outgoing activity.
user_sent_messageNoTRUE = only threads where the user has sent ≥1 outgoing message. FALSE = only threads where the user has NEVER sent. OMIT to include both.
min_last_message_daysNoLast message (from anyone) older than ≥N days. OMIT to skip this filter.
participant_contact_idNoentity_id from contacts.find — returns all threads where this contact is an active participant. OMIT to skip this filter.
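A sketch of a dormant-thread query that heeds the schema's own warning: channel is omitted so the search spans all channels. Values are hypothetical.

```python
import json

dormant_call = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "threads_list",
        "arguments": {
            # channel is deliberately omitted: the search spans ALL channels
            "min_inactive_days": 14,    # last outgoing message >= 14 days old
            "user_sent_message": True,  # only threads the user has written in
            "limit": 20,
        },
    },
}
print(json.dumps(dormant_call, indent=2))
```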
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. Description adds that it returns id, title, last message, timestamp, unread count. No contradictions; adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states purpose, second gives usage guidance and return fields. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While there is no output schema, the description lists expected return fields. For a tool with 13 well-documented parameters and clear annotations, this is sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage with detailed explanations for all 13 parameters. The tool description does not add parameter-specific details beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List conversation threads with previews and metadata' and links to a downstream tool (messages.read_history). Purpose is well-defined, but not explicitly differentiated from all sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use before messages.read_history to resolve thread_id', providing a concrete usage context. Does not state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threads_updateA
Inspect

✏️ Update a conversation thread: rename it, add notes/description, or move to a folder.

When to use:

  • User wants to rename a chat or group

  • User wants to add notes/context about a conversation

  • User wants to organize threads into folders

For DM threads, renaming also updates the linked contact's display name by default. Requires thread_id from threads.list.

ParametersJSON Schema
NameRequiredDescriptionDefault
titleNoNew title for the thread (max 255 chars)
folder_idNoMove thread to this folder (null removes from folder)
thread_idYesThread ID from threads.list
descriptionNoAI context / notes for this thread. Empty string clears description.
update_contactNoFor DM threads, also rename the linked contact (default: true)
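A sketch of the rename case where the DM side effect is not wanted: update_contact defaults to true, so it must be disabled explicitly. IDs and titles are hypothetical.

```python
import json

rename_call = {
    "jsonrpc": "2.0",
    "id": 8,
    "method": "tools/call",
    "params": {
        "name": "threads_update",
        "arguments": {
            "thread_id": "thr_456",   # hypothetical, from threads_list
            "title": "Acme renewal",
            "update_contact": False,  # keep the linked contact's display name
        },
    },
}
print(json.dumps(rename_call, indent=2))
```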
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds key behavioral context: DM thread renaming also updates linked contact display name by default. No contradictions with annotations. Could mention side effects like folder removal via null folder_id.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with main action and clear bullet list for usage. Efficient but slightly redundant with schema info. Not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main use cases and special DM behavior. No output schema, but update tools often return simple confirmation. Missing details on error handling or folder removal, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 5 parameters with descriptions. Description reinforces requirement of thread_id and default behavior for update_contact but adds limited new meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'update' and resource 'conversation thread' with specific actions (rename, add notes/description, move). Distinguishes from sibling threads_list and other similar tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' list covering three common scenarios. Mentions prerequisite (thread_id from threads.list). Lacks explicit 'when not to use' but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

videos_generateA
Inspect

Generate a short video (5-10s) from a text prompt using BytePlus Seedance. Optionally accepts up to 12 image file IDs from the user's attached files (visible in the [ATTACHMENTS] block) as reference_file_ids for style and composition. Returns immediately with a job_id; the video is delivered back via continuation when the job completes (~30-90s for fast model, ~2-5min for pro). Reference images are temporarily re-hosted on a third-party CDN (imgbb) for the duration of generation and deleted on completion — don't submit confidential references. Gated behind a workspace opt-in flag.

ParametersJSON Schema
NameRequiredDescriptionDefault
modelNoSeedance model variant. 'seedance-2-fast' (~30-90s, lower cost) or 'seedance-2-pro' (~2-5min, cinematic quality, native audio). Default: 'seedance-2-fast'.
promptYesText description of the video to generate (3-4000 chars).
durationNoOutput video duration in seconds. Must be 5 or 10. 10s costs 2x the 5s price.
aspect_ratioNoOutput aspect ratio (default: 16:9)
generate_audioNoWhether the model should produce native audio (Pro only — Fast ignores the flag).
reference_file_idsNoOptional list of up to 12 image file_ids to use as visual references (style, composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif). Get IDs from the [ATTACHMENTS] block, files.search, or workspace.search.
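A sketch of a pro-model generation with a reference image, using hypothetical IDs; per the description the call returns a job_id immediately and the video arrives later via continuation.

```python
import json

video_call = {
    "jsonrpc": "2.0",
    "id": 9,
    "method": "tools/call",
    "params": {
        "name": "videos_generate",
        "arguments": {
            "prompt": "Slow dolly shot across a rain-soaked neon street",
            "model": "seedance-2-pro",           # ~2-5 min, audio-capable
            "duration": 10,                      # 5 or 10; 10s costs 2x
            "generate_audio": True,              # Pro only; Fast ignores it
            "reference_file_ids": ["file_abc"],  # hypothetical attachment ID
        },
    },
}
print(json.dumps(video_call, indent=2))
```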
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no informative annotations, the description compensates well by disclosing: async job_id return, estimated times for fast/pro models, CDN re-hosting and deletion of reference images, and workspace opt-in requirement. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of five sentences, front-loading the core purpose and then adding key details. Every sentence is necessary and well-placed, with no extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description covers essential aspects: async delivery, timing, reference image handling, and gating. It might lack error handling details but is sufficient for a generation tool of moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds value beyond the 100% schema coverage by explaining reference_file_ids usage (up to 12, from attachments, for style/composition) and the temporary CDN handling of reference images. This enriches the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'generate', the resource 'short video', and the input 'from a text prompt using BytePlus Seedance'. It distinguishes from sibling tools like images_generate by focusing on video generation and mentions optional reference image IDs from attachments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides practical guidance: specifies optional reference files from attachments, mentions the async nature and delivery method, warns against confidential references, and notes workspace opt-in gating. It does not explicitly contrast with alternatives but is sufficient for the context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vision_queryA
Read-onlyIdempotent
Inspect

Look at the screen currently being shared in a meeting and answer a question about it. Returns a natural-language answer based on the visual content. Use ONLY when the user explicitly asks about the screen/slide/document being shown.

ParametersJSON Schema
NameRequiredDescriptionDefault
questionYesQuestion about the shared screen.
image_b64NoBase64-encoded JPEG image of the screen-share frame.
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, indicating safe, read-only behavior. The description adds context about returning a natural-language answer but does not disclose potential failure modes (e.g., no screen shared, ambiguous question) or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states function, second provides usage restriction. No filler, front-loaded, and every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the primary use case (visual QA on shared screen) and return type (natural-language answer). However, it does not clarify that image_b64 is optional or how the tool behaves when no image is provided (e.g., auto-captures vs. requires input). Given no output schema and simple parameters, it is minimally adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions ('Question about the shared screen.' and 'Base64-encoded JPEG image of the screen-share frame.'). The description adds no additional parameter info, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('look at' / analyze), resource ('screen being shared in a meeting'), and action ('answer a question'). It distinguishes itself from sibling tools like images_generate or images_search by explicitly focusing on shared screen content in meetings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: when the user explicitly asks about the screen/slide/document being shown. This implies when not to use (e.g., other queries, or no screen share). Could be more explicit about avoiding use for non-screen visual queries, but still clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_fetchA
Inspect

Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL.

Modes (extract):

  • 'auto' (default): picks the right mode based on response content type.

  • 'markdown': for HTML pages; returns cleaned markdown plus the page title.

  • 'text': for JSON/XML/plaintext APIs; returns the raw decoded body.

  • 'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read.

Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn.

Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesURL to fetch (http or https). Must be publicly reachable.
extractNoHow to handle the response: 'auto' (default), 'markdown' (HTML → markdown), 'text' (raw body), or 'file' (ingest as binary, return file_id).
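A sketch of the 'file' mode the description recommends for same-turn chaining, with a hypothetical URL; the returned file_id could then be passed to messages.send or agents.add_file.

```python
import json

fetch_call = {
    "jsonrpc": "2.0",
    "id": 10,
    "method": "tools/call",
    "params": {
        "name": "web_fetch",
        "arguments": {
            "url": "https://example.com/brochure.pdf",  # hypothetical URL
            "extract": "file",  # ingest the bytes, get a file_id back this turn
        },
    },
}
print(json.dumps(fetch_call, indent=2))
```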
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are present and description adds significant behavioral context: explains each mode's output (markdown with title, text raw body, file returns file_id and ingests bytes), and notes that files.upload is async. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with front-loaded purpose, then modes, then comparisons. Slightly verbose in listing all modes but each sentence adds unique value. Could be slightly more concise but still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description fully covers return values for each mode. Addresses when to use vs alternatives, async considerations, and file storage behavior. Complete for a moderately complex fetch tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage. Description adds meaning by clarifying that url must be publicly reachable, and expanding on each extract mode's purpose and return value, especially the file mode which returns a file_id for immediate use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches a single URL and returns content, with specific use cases like reading a link from web.search or a user-pasted URL. It explicitly distinguishes from sibling tools web_search and files.upload, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use the tool (specific URL known) and when not to (use web.search for finding URLs, use files.upload for async file uploads). Also details mode selection based on content type, leaving no ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_createA
Inspect

Create a new livechat widget for your website.

The widget will be created with default settings. You can customize theme, auto-reply mode, and more.

Use this when user wants to add a chat widget to their site.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameYesName for the widget (e.g., 'Website Chat', 'Support Widget')
positionNoWidget position on screen (default: bottom-right)
display_modeNoVisual mode of the widget. Pick exactly one: - 'chat' (default): full chat panel + voice mic — use for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly — pick only when the user explicitly asks for a voice-only widget (e.g. 'just a voice button', 'no chat, just call'). - 'headless': no UI; customer drives via window.DialogBrain JS API — pick only when the user explicitly says 'embed in our own design' / 'no widget chrome'.
header_titleNoTitle shown in chat header (default: 'Chat with us')
primary_colorNoPrimary color for widget theme (hex, e.g., '#2563eb'; default: '#2563eb')
auto_reply_modeNoAuto-reply mode: 'draft' (review before sending) or 'auto' (send immediately). Default: draft.
voice_button_labelNoLocalized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if omitted.
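
To ground the schema above, here is a minimal calling sketch using the TypeScript MCP SDK. It is illustrative, not taken from the server's docs: the endpoint URL, client identity, and argument values are assumptions, and only `name` is actually required.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Placeholder endpoint: substitute the real Streamable HTTP URL for this server.
const transport = new StreamableHTTPClientTransport(new URL("https://example.com/mcp"));
const client = new Client({ name: "widget-demo", version: "1.0.0" });
await client.connect(transport);

// Only 'name' is required; omitted fields fall back to the documented
// defaults (position=bottom-right, display_mode=chat, auto_reply_mode=draft).
const created = await client.callTool({
  name: "widgets_create",
  arguments: {
    name: "Website Chat",
    primary_color: "#2563eb",
    auto_reply_mode: "draft", // review replies before they go out
  },
});
console.log(created.content);
```

The later sketches on this page reuse this `client` instance rather than repeating the transport setup.
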
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false (mutation) and destructiveHint=false (non-destructive). The description adds that the widget is created with default settings and customizable options, but does not disclose additional behavioral traits like response format or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, front-loading the action verb 'Create.' Every sentence adds value, avoiding redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 7-parameter complexity and absence of an output schema, the description covers the main purpose and customization options. However, it lacks information about what is returned after creation, which could be helpful for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all parameters. The description mentions 'customize theme, auto-reply mode, and more,' which summarizes some parameters but does not add new information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Create a new livechat widget for your website,' which is a specific verb+resource action. Among sibling tools like widgets_delete and widgets_update, it clearly distinguishes itself as the creation tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Use this when user wants to add a chat widget to their site,' providing clear context for when to use the tool. It does not explicitly state when not to use it or mention alternatives, but the purpose is direct and unambiguous.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_deleteA
DestructiveIdempotent
Inspect

Delete a livechat widget permanently.

This will remove the widget and its embed code will stop working. Existing chat history will be preserved.

Use this when user wants to remove a chat widget.

ParametersJSON Schema
NameRequiredDescriptionDefault
widget_idYesID of the widget to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true and idempotentHint=true. Description adds valuable context: permanent removal, embed stop, chat history preserved. This exceeds what annotations alone provide and does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three short sentences, front-loaded with the primary action. Every sentence adds necessary information without redundancy. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one required parameter, no output schema, and clear annotations, the description covers purpose, effect on other systems, and usage guidance comprehensively. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with widget_id described. Description does not add extra semantic details beyond the schema. Baseline 3 applies as schema already provides sufficient meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'Delete a livechat widget permanently', identifying the verb (delete) and the resource (livechat widget). It distinguishes from siblings like widgets_create, widgets_get, etc., which have different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear usage context: 'Use this when user wants to remove a chat widget.' It also implies when not to use (if the user wants to keep the widget or needs a reversible action). However, it does not explicitly mention alternative tools like widgets_update or disabling.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_getA
Read-onlyIdempotent
Inspect

Get full configuration of a single livechat widget.

Returns all settings including theme, identification, actions, and more.

Use this when user wants to see or verify a specific widget's settings.

ParametersJSON Schema
NameRequiredDescriptionDefault
widget_idYesID of the widget to retrieve
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the tool is safe. The description adds value by specifying what is returned (theme, identification, actions, etc.), which aligns with annotations and provides extra context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, each adding value: tool purpose, return details, usage guidance. No wasted words, clearly structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get tool with one parameter, no output schema, and comprehensive annotations, the description is complete. It tells what it does, what it returns, and when to use it, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with a single required parameter widget_id. The description does not add meaning beyond the schema's own description of 'ID of the widget to retrieve.' Baseline 3 is appropriate as schema does the work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'get' and the resource 'widget', and specifies it retrieves full configuration of a single livechat widget. It distinguishes from siblings like widgets_list (which lists all widgets) by focusing on a single widget.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit context: 'Use this when user wants to see or verify a specific widget's settings.' It does not explicitly mention when not to use or alternatives, but the context is clear enough to differentiate from other widget tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_get_embed_codeA
Read-onlyIdempotent
Inspect

Get the embed code snippet for a livechat widget.

Returns HTML/JavaScript code to add to your website. The code should be placed before the closing </body> tag.

Use this when user wants to install the chat widget on their site.

ParametersJSON Schema
NameRequiredDescriptionDefault
widget_idYesID of the widget to get embed code for
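
A sketch of fetching the snippet, reusing the `client` from the widgets_create sketch. The widget_id and the assumption that the snippet arrives as a text content block are illustrative.

```typescript
const res = await client.callTool({
  name: "widgets_get_embed_code",
  arguments: { widget_id: "wgt_123" }, // illustrative id
});

// Assumption: the snippet comes back as the first text content block.
const block = res.content?.[0];
const snippet = block?.type === "text" ? block.text : "";

// Per the description, paste the snippet just before the closing </body> tag:
//   <script src="..."></script>
//   </body>
console.log(snippet);
```
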
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description adds that it returns HTML/JavaScript code and specifies placement before closing </body> tag, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, front-loaded with the main purpose. Every sentence adds value, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, but the description explains the return value (HTML/JavaScript code) and provides placement instructions. This is complete for a simple retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (widget_id) with clear description. Schema coverage is 100%, so the description adds no new parameter details. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get the embed code snippet for a livechat widget', specifying the exact action and resource. It differentiates from sibling tools like widgets_get or widgets_create by focusing on embed code retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit usage guidance: 'Use this when user wants to install the chat widget on their site.' While it does not contrast with alternative tools, the context is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_listA
Read-onlyIdempotent
Inspect

List all livechat widgets.

Returns widgets with their configuration, embed code, and status.

Use this when user wants to see their widgets or chat widgets.

ParametersJSON Schema
NameRequiredDescriptionDefault
active_onlyNoOnly return active widgets. OMIT to include inactive widgets too.
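
The omit semantics of `active_only` are worth showing directly; a hypothetical pair of calls, reusing the `client` from earlier:

```typescript
// All widgets, active and inactive: leave the active_only key out entirely.
const all = await client.callTool({ name: "widgets_list", arguments: {} });

// Active widgets only.
const active = await client.callTool({
  name: "widgets_list",
  arguments: { active_only: true },
});
```
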
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, so the safety profile is covered. The description adds value by specifying return fields (configuration, embed code, status), but this is not critical for behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Remarkably concise: three sentences covering purpose, return values, and usage context. No extraneous words, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool, the description is complete. It explains what is returned despite no output schema, and the parameter is fully documented. No additional context needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage for the single parameter (active_only). The description does not add additional meaning beyond the schema's own description, so it meets the baseline without exceeding it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('list') and resource ('all livechat widgets'), and specifies the return contents (configuration, embed code, status). It effectively distinguishes from sibling tools like widgets_get which focus on single widgets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage context: 'Use this when user wants to see their widgets or chat widgets.' While it does not mention when not to use it, the context is clear and implicitly differentiates from related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_updateAInspect

Update an existing livechat widget configuration.

You can change name, theme, auto-reply mode, and other settings. Only provided fields will be updated.

Use this when user wants to modify their chat widget settings.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoNew name for the widget
positionNoWidget position on screen. OMIT to leave the position unchanged.
is_activeNoEnable or disable the widget. OMIT to leave the active flag unchanged.
widget_idYesID of the widget to update
website_urlNoWebsite URL for product/site search integration
calendly_urlNoBooking URL for calendar action (e.g., 'https://calendly.com/yourname')
color_schemeNoWidget color scheme. 'auto' follows the visitor's OS dark/light mode preference. OMIT to leave the color scheme unchanged.
display_modeNoVisual mode of the widget. Pick exactly one: - 'chat': full chat panel + voice mic — default for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly — pick only when the user explicitly asks for a voice-only widget. - 'headless': no UI; customer drives via window.DialogBrain JS API — pick only when the user explicitly says 'embed in our own design'. OMIT to leave the display mode unchanged.
header_titleNoTitle shown in chat header
greeting_textNoCustom greeting message shown when visitor opens the chat (e.g., 'Hello! How can I help you today?')
primary_colorNoPrimary color for widget theme (hex, e.g., '#2563eb')
voice_greetingNoSpoken opening line when a visitor starts a voice call through this widget. Played via TTS before the AI model runs. Empty string disables the greeting.
allowed_domainsNoList of allowed domains for the widget
auto_reply_modeNoAuto-reply mode: 'draft' or 'auto'. OMIT to leave the auto-reply mode unchanged.
header_subtitleNoSubtitle shown in chat header
greeting_enabledNoEnable or disable the proactive greeting. OMIT to leave this flag unchanged.
greeting_behaviorNonotification = show badge after delay; auto_open = open widget automatically after delay; on_open = greet only when visitor manually opens. OMIT to leave the greeting behavior unchanged.
enable_form_actionNoEnable or disable the contact form action button. OMIT to leave this flag unchanged.
voice_button_labelNoLocalized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if not set.
contact_form_fieldsNoFields to collect in contact form (e.g., ['name', 'email', 'phone'])
enable_search_actionNoEnable or disable the search action button. OMIT to leave this flag unchanged.
show_visitor_historyNoShow full chat history to returning visitors. OMIT to leave this flag unchanged.
identification_fieldsNoFields to require for visitor identification (e.g., ['name', 'email'])
enable_calendar_actionNoEnable or disable the calendar booking action button. OMIT to leave this flag unchanged.
greeting_delay_secondsNoDelay in seconds before the proactive greeting appears (0–300). 0 = send immediately on page load. Default: 30.
require_identificationNoRequire visitor to identify before chatting. OMIT to leave the identification policy unchanged.
returning_greeting_textNoGreeting for returning visitors who already have chat history (e.g., 'Welcome back! How can I help you today?'). Falls back to greeting_text if not set.
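
Because only provided fields are updated, a sparse payload is the idiomatic call shape. A hypothetical sketch (widget_id and values are illustrative):

```typescript
// Sparse update: every omitted field keeps its current value.
await client.callTool({
  name: "widgets_update",
  arguments: {
    widget_id: "wgt_123",
    header_title: "Support",
    greeting_enabled: true,
    greeting_delay_seconds: 10, // seconds, 0-300
  },
});
```
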
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a mutation (readOnlyHint=false). The description adds that only provided fields are updated, which is useful. No mention of auth or rate limits, but annotations don't either.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with the main purpose upfront, no unnecessary words. Efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with many parameters and no output schema, the description provides a reasonable overview but lacks detail about the response or side effects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for 27 parameters, so the description doesn't need to detail each. The description gives a high-level summary, which is adequate but adds no extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates an existing livechat widget configuration, with specific fields like name, theme, auto-reply mode. It distinguishes from sibling tools like widgets_create and widgets_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use this when user wants to modify their chat widget settings.' This is clear but could explicitly exclude other operations like creating or deleting.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_currentA
Read-onlyIdempotent
Inspect

Return the workspace this MCP API key is currently routed to, with the caller's role inside it. Use this to confirm context before/after workspace.switch.

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint as safe. Description adds minor context about the return value (workspace + role) but does not disclose additional behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information, no unnecessary words. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, no output schema, and strong annotations, the description provides sufficient purpose and usage context for a simple read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100%. Description does not need to add parameter info; baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Return' and clearly identifies the resource: the current workspace with the caller's role. It distinguishes itself from sibling tools like workspace_switch (which changes workspace) and workspace_list (which lists all workspaces).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use this to confirm context before/after workspace.switch', providing a clear scenario for when to employ this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_listA
Read-onlyIdempotent
Inspect

List every workspace the caller is a member of, with is_current marking the workspace this MCP key is currently routed to. Pair with workspace.switch to change the active workspace without reconnecting.

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. Description adds value by specifying the `is_current` marking of the workspace the MCP key is routed to, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with the core action, no superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter, no-output-schema tool, the description fully covers purpose, usage, and output details. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters in schema, so the description's role is reduced. It clearly explains the output structure (list with `is_current` field), effectively covering what would otherwise be parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List every workspace the caller is a member of' with specific verb and resource, and mentions the `is_current` field. It distinguishes from sibling tools like `workspace.switch` and `workspace.current`.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises pairing with `workspace.switch` to change active workspace, providing clear context for when to use this tool and how it relates to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_switchAInspect

Re-point the active MCP API key to a different workspace. Pass exactly one of workspace_id or slug (find them via workspace.list). Takes effect on the very next tool call — no MCP reconnect, no new API key. Sequential checkpoint: do not parallelize tool calls across a switch — calls already in flight when the switch commits will run against the previous workspace.

ParametersJSON Schema
NameRequiredDescriptionDefault
slugNoWorkspace slug to switch to. Resolved within the caller's memberships, so cross-tenant slug collisions are not possible. Mutually exclusive with `workspace_id`.
workspace_idNoNumeric workspace id to switch to. Mutually exclusive with `slug`.
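
A sketch of the discover-switch-confirm flow the descriptions suggest, with an illustrative slug. The key point is the sequential checkpoint: the switch must be awaited before any call that should run in the new workspace.

```typescript
// 1. Discover workspaces (is_current marks the one this key is routed to).
const workspaces = await client.callTool({ name: "workspace_list", arguments: {} });

// 2. Switch: pass exactly one of workspace_id or slug, never both.
await client.callTool({
  name: "workspace_switch",
  arguments: { slug: "acme-support" }, // illustrative slug from step 1
});

// 3. Confirm. Never Promise.all() across a switch: calls already in flight
// when it commits still run against the previous workspace.
const current = await client.callTool({ name: "workspace_current", arguments: {} });
```
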
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the switch takes effect on the next tool call, requires no reconnect or new API key, and that in-flight calls run on the previous workspace—adding context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, no redundant information. Every sentence serves a clear function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, behavioral notes, and constraints. No output schema needed; the description is complete for this simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already covers parameters fully (100% coverage). Description adds value by reiterating mutual exclusivity and how to obtain values via workspace.list, but does not deepen semantic meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Re-point the active MCP API key to a different workspace') and the resource, distinguishing it from sibling tools like workspace_current or workspace_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to pass exactly one of workspace_id or slug, referencing workspace.list for discovery, and warns about sequential checkpointing (no parallelization across switch).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_commentA
DestructiveIdempotent
Inspect

Permanently delete a YouTube comment by id (or 'youtube:comment:<id>'). Cannot be undone. Costs 50 quota units.

ParametersJSON Schema
NameRequiredDescriptionDefault
comment_idYesBare commentId OR 'youtube:comment:<id>'.
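
A hypothetical call showing both accepted identifier forms; youtube_delete_video below follows the same shape with video_id.

```typescript
// Irreversible and costs 50 quota units, so call only once the id is confirmed.
await client.callTool({
  name: "youtube_delete_comment",
  arguments: { comment_id: "UgzXyz123" }, // or "youtube:comment:UgzXyz123"
});
```
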
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description reinforces the destructiveHint annotation with 'permanently delete' and 'Cannot be undone', and adds quota cost information beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the core action, no filler. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with one parameter and no output schema, the description covers purpose, identifier format, permanence, and quota cost—sufficient for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter with 100% schema coverage. The description restates the identifier format provided in the schema, adding no new semantic meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (permanently delete), resource (YouTube comment), and identifier format (id or youtube:comment:<id>). It distinguishes from sibling tools like youtube_moderate_comment and youtube_post_comment_reply.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies irreversible deletion and mentions quota cost, guiding when to use (only if certain). However, it does not explicitly state when not to use or reference alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_videoA
DestructiveIdempotent
Inspect

Permanently delete a YouTube video by id (or 'youtube:video:<id>'). Cannot be undone. Costs 50 quota units. Caller must own the channel.

ParametersJSON Schema
NameRequiredDescriptionDefault
video_idYesBare videoId OR 'youtube:video:<id>'.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses permanent deletion, quota cost, and ownership requirement. Annotations already indicate destructive=true, and the description adds valuable context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences with key information front-loaded. No redundant words; every sentence adds essential detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter destructive action with no output schema, the description covers purpose, constraints, cost, and prerequisites completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes the parameter with 100% coverage. The description merely restates the accepted formats, adding no new semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'permanently delete a YouTube video by id', providing a specific verb and resource. It clearly distinguishes from sibling tools like upload or list videos.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use: when the caller owns the channel and intends permanent deletion. It also warns of irreversibility and quota cost, guiding appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_commentsA
Read-onlyIdempotent
Inspect

List comment threads on a YouTube video. Pass video_id (e.g. 'dQw4w9WgXcQ') or channel_ref ('youtube:video:<id>'). Returns top-level comments with inline replies.

ParametersJSON Schema
NameRequiredDescriptionDefault
video_idYesYouTube videoId — bare 11-char form OR full 'youtube:video:<id>'.
page_tokenNoPagination cursor from a previous call's `next_page_token`.
max_resultsNoPage size, 1-100. Default 25.
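
Pagination works the same way here as on youtube_list_videos. A sketch of walking every page, assuming (illustratively) that each result's text content is JSON carrying a `next_page_token`:

```typescript
let pageToken: string | undefined;
do {
  const page = await client.callTool({
    name: "youtube_list_comments",
    arguments: {
      video_id: "dQw4w9WgXcQ",
      max_results: 100,
      ...(pageToken ? { page_token: pageToken } : {}), // omit on the first page
    },
  });
  const block = page.content?.[0];
  const body = block?.type === "text" ? JSON.parse(block.text) : {};
  pageToken = body.next_page_token;
} while (pageToken);
```
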
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, destructiveHint, idempotentHint. Description adds return format (top-level comments with inline replies), which complements annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no waste, action and purpose front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool with three params, good annotations, and no output schema, the description covers input format and output scope completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description adds alternative video_id format (youtube:video:<id>) and clarifies page_token usage implicitly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists comment threads on a YouTube video, using specific verb and resource. It distinguishes from siblings by targeting comments specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear instructions on how to pass video_id or channel_ref, but does not explicitly exclude when not to use or compare to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_videosA
Read-onlyIdempotent
Inspect

List videos on the connected YouTube channel. Returns id, title, published_at, view_count. Paginate via page_token.

ParametersJSON Schema
NameRequiredDescriptionDefault
page_tokenNoPagination cursor returned in a previous call's `next_page_token`. Omit for the first page.
max_resultsNoPage size, 1-50. Default 25.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and idempotentHint=true, so the safety profile is clear. The description adds value by detailing return fields (id, title, published_at, view_count) and pagination, which are beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the main action and then providing return fields and pagination. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with two parameters and no output schema, the description is complete: it specifies behavior, return fields, and pagination. Annotations cover safety, so no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds the pagination context for page_token and mentions return fields, but does not elaborate on max_results beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List videos on the connected YouTube channel' with a specific verb and resource. It distinguishes from sibling tools like youtube_video_query by scoping to the connected channel, and specifies return fields.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing own channel videos but lacks explicit when-to-use or when-not-to-use guidance. It does not compare with alternative tools like youtube_video_query, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_moderate_commentAInspect

Apply a moderation status to a YouTube comment. Allowed status values: heldForReview, published, rejected, spam. Costs 50 quota units.

ParametersJSON Schema
NameRequiredDescriptionDefault
statusYesOne of: heldForReview, published, rejected, spam.
comment_idYesBare commentId OR 'youtube:comment:<id>'.
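
Since the allowed statuses form a closed set, a literal union keeps calls honest at compile time. A hypothetical sketch:

```typescript
type ModerationStatus = "heldForReview" | "published" | "rejected" | "spam";

const status: ModerationStatus = "rejected";
await client.callTool({
  name: "youtube_moderate_comment",
  arguments: { comment_id: "youtube:comment:UgzXyz123", status }, // 50 quota units
});
```
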
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate it's a non-read-only, non-destructive write operation. The description adds valuable behavioral info: quota cost of 50 units. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loads the key action and allowed values, and includes quota cost. Every word contributes value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a moderate-complexity tool with two parameters and no output schema, the description covers purpose, allowed values, and quota cost. It does not explain return values or error handling but is adequate for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats allowed status values but does not add new meaning beyond the schema. No additional examples or constraints provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Apply a moderation status to a YouTube comment'), specifies allowed status values, and mentions quota cost. It effectively distinguishes from sibling tools like youtube_delete_comment and youtube_list_comments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for changing moderation status but does not explicitly compare to alternatives or provide when-not-to-use guidance. The context of sibling tools suggests differentiation, but the description lacks explicit usage guidelines.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_post_comment_replyAInspect

Post a comment on a YouTube video, or reply to an existing comment. Pass video_id for a top-level comment, OR parent_comment_id to reply. AI-disclosure suffix appended automatically when configured.

ParametersJSON Schema
NameRequiredDescriptionDefault
textYesComment body. 1-10000 chars. AI-disclosure suffix may be auto-appended.
video_idNoBare videoId or 'youtube:video:<id>' — for a top-level comment.
parent_comment_idNoBare commentId or 'youtube:comment:<id>' — for a reply.
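
The two target parameters are mutually exclusive, which a pair of hypothetical calls makes plain:

```typescript
// Top-level comment: pass video_id only.
await client.callTool({
  name: "youtube_post_comment_reply",
  arguments: { video_id: "dQw4w9WgXcQ", text: "Great walkthrough!" },
});

// Reply to an existing comment: pass parent_comment_id only, never both.
await client.callTool({
  name: "youtube_post_comment_reply",
  arguments: { parent_comment_id: "UgzXyz123", text: "Answered above." },
});
```
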
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that an AI-disclosure suffix may be auto-appended, which is behavioral info beyond annotations. However, it does not mention other important aspects like required authentication, rate limits, or whether the comment becomes immediately visible.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise, front-loaded sentences with no wasted words. Each sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a relatively simple tool (3 parameters, no output schema), the description covers the core usage logic. It does not explain return value or error handling, but given the low complexity, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already has 100% description coverage. The description adds the crucial OR relationship between video_id and parent_comment_id, which is not explicit in the schema. Also adds nuance about auto-appended suffix for the text parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool posts a comment or reply on YouTube. Distinguishes top-level comment vs reply, which differentiates from sibling tools like youtube_moderate_comment or youtube_delete_comment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly explains when to use video_id (for top-level comment) versus parent_comment_id (for reply). Provides clear usage context but does not mention when not to use this tool (e.g., use youtube_moderate_comment for moderation).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_upload_videoAInspect

Upload a workspace-owned video file (file_id) to the connected YouTube channel. Returns video_id + thread_id. Costs 1600 quota units. Default privacy is 'private' — pass privacy='public' to publish.

ParametersJSON Schema
NameRequiredDescriptionDefault
tagsNoOptional list of tag strings (max ~500 chars total).
titleYesVideo title (max 100 chars).
file_idYesWorkspace `files.id` of the video to upload. Must be a video/* MIME type and `status='ready'`. Get IDs from the [ATTACHMENTS] block, files.search, or workspace.search.
privacyNoPrivacy status. 'private' (default), 'unlisted', or 'public'.private
category_idNoYouTube category ID (default '22' = People & Blogs). See https://developers.google.com/youtube/v3/docs/videoCategories/list.22
descriptionNoVideo description (max 5000 chars). OMIT to upload without a description.
made_for_kidsNoCOPPA flag. OMIT for the standard (non-kids) default.
channel_account_idYesThe connected YouTube channel_account.id (workspace-scoped).
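
A hypothetical upload call; the file and channel ids are placeholders. The one behavioral trap is the privacy default, shown in the comment.

```typescript
const uploaded = await client.callTool({
  name: "youtube_upload_video",
  arguments: {
    file_id: "file_789",           // must be video/* MIME and status='ready'
    channel_account_id: "chan_42", // workspace-scoped channel_account.id
    title: "Release walkthrough",  // max 100 chars
    description: "What changed in v2 and how to migrate.",
    privacy: "public",             // omit to keep the 'private' default
  },
});
// Per the description: returns video_id + thread_id, costs 1600 quota units.
```
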
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a write operation, and the description adds costs (1600 quota units), default privacy, and return values (video_id, thread_id). This goes beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences with no unnecessary words. The first introduces the core action, the second the returns, and the rest add quota cost and privacy behavior. Perfectly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the upload action, prerequisites (file_id), returns, quota, and privacy. Given there is no output schema, it provides sufficient return info. Minor omissions like file size limits are acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and each parameter has a description. The tool description adds clarity on the privacy default and the file_id requirement, enhancing the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (Upload), the resource (workspace-owned video file to connected YouTube channel), and the returns. It distinguishes from sibling YouTube tools like delete or list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies that the file must be workspace-owned and indicates the default privacy behavior. It does not explicitly state when to use this tool vs alternatives, but the action is unique among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_video_queryA
Read-onlyIdempotent
Inspect

Ask Gemini about a YouTube video. Pass a video URL and any prompt — verbatim transcript with timestamps, summary, targeted Q&A about content or visuals, translation, etc. Works on any public/unlisted video.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesYouTube video URL. Supported forms: youtube.com/watch?v=…, youtu.be/…, youtube.com/shorts/…, m.youtube.com/watch?v=…. Pass-through to Gemini verbatim.
promptYesWhat to ask Gemini about the video. Examples: 'Provide a verbatim transcript with [HH:MM:SS] timestamps.' / 'What is the main claim made in the first 30 seconds?' / 'Describe what's shown on screen at 0:30.' / 'Translate the spoken Spanish to English.'
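
One URL plus one free-form prompt is the whole interface; a hypothetical call:

```typescript
const answer = await client.callTool({
  name: "youtube_video_query",
  arguments: {
    url: "https://youtu.be/dQw4w9WgXcQ",
    prompt: "Provide a verbatim transcript with [HH:MM:SS] timestamps.",
  },
});
```
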
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds context about working on public/unlisted videos and the range of prompts, but this is supplementary. No contradiction is present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with purpose, then usage details and examples. Every sentence adds value with no redundant information. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two well-described parameters and comprehensive annotations, the description covers all needed context: what it does, how to use it, constraints (public/unlisted videos), and example prompts. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with both parameters described. The description reinforces the usage ('Pass a video URL and any prompt') and gives examples for the prompt parameter, but the schema already captures the meaning effectively. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Ask Gemini about a YouTube video', which is a specific verb-resource combination. It distinguishes from sibling tools like youtube_delete_video or youtube_upload_video by focusing on querying and analysis. Examples of prompts further clarify the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'Pass a video URL and any prompt' and lists use cases like transcript, summary, Q&A, translation. It also notes 'Works on any public/unlisted video.' However, it does not explicitly exclude alternatives or state when not to use, keeping it from a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
