DialogBrain

Server Details

AI-powered unified inbox with MCP tools for managing conversations, contacts, and knowledge across WhatsApp, Telegram, Instagram, Email, and LinkedIn.

Status: Healthy
Last Tested: 2026-05-28 14:33
Transport: Streamable HTTP

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

Grade: A (3.6/5.0)

Tool Descriptions: A

Average 4.1/5 across 99 of 119 tools scored. Lowest: 2.9/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have clearly distinct purposes and detailed descriptions. The large tool count (119) still leaves some room for confusion, notably between agent_handoff and agents_ask, although the two serve different roles. Overall, an agent can reliably distinguish the tools.

Naming Consistency: 5/5

All tools follow a consistent snake_case pattern with a noun_verb structure (e.g., agents_create, messages_send, files_upload). No mixed conventions or unpredictable naming; the pattern is maintained across all 119 tools.

Tool Count: 1/5

119 tools is excessive for any server, even for a comprehensive platform like DialogBrain, and it overwhelms both agents and users. Well-scoped servers typically expose 3-15 tools; a count this far beyond that range makes navigation and tool selection impractical.

Completeness: 4/5

The tool surface covers a vast range of functionalities (agents, knowledge, messaging, contacts, LinkedIn, etc.) with few obvious gaps. Missing features like contact deletion or thread archiving are minor; overall, the domain appears well-covered.

Available Tools

173 tools

agent_handoff (Grade: A)

Read-only, Idempotent

Delegate a multi-step task (research, composing messages, booking, scheduling) to the full agentic planner. Use when a user ask needs more than a direct answer. The specialist runs synchronously — its response is already shown to the user in real-time. Summarize the OUTCOME in past tense (e.g. 'The Media Creator generated your video' or 'The Document Composer failed because...'). Do NOT say 'I will delegate' — the delegation already happened. If status is timeout or error, explain what went wrong and offer to retry.

Parameters

Name	Required	Description	Default
`mode`	No	Execution mode: 'sync' (wait for result, default) or 'async' (fire and forget, child runs in background). Async is only available in background/trigger context.	sync
`agent_id`	No	Optional ID of another agent in the same workspace to delegate the task to. When set, this becomes cross-agent delegation; the target agent runs with ITS OWN prompt, tools, and model. Use this for specialty tasks (see agents.list to discover specialists). Prefer the in-loop variant (no `agent_id`) for one-off escalations. Spawns a new trace linked back to this trace via parent_trace_id (visible in the admin lineage card).
`task_description`	Yes	Plain-language description of what the planner should accomplish. Include everything the planner needs: the user's goal, constraints, and any context already gathered in this voice call.
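
As a rough illustration of the parameter table above, the sketch below assembles a JSON-RPC 2.0 `tools/call` request for agent_handoff, the request shape MCP clients use to invoke a tool. The helper name `build_handoff_request`, the example task text, and the assumption that `agent_id` is passed as a string are illustrative and not taken from the listing.

```python
import json
from typing import Any, Dict, Optional


def build_handoff_request(
    task_description: str,
    agent_id: Optional[str] = None,
    mode: str = "sync",
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for agent_handoff."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    arguments: Dict[str, Any] = {"task_description": task_description, "mode": mode}
    # Only include agent_id for cross-agent delegation; omitting it keeps
    # the in-loop variant described in the table.
    if agent_id is not None:
        arguments["agent_id"] = agent_id  # assumed to be a string ID
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "agent_handoff", "arguments": arguments},
    }


request = build_handoff_request(
    "Find three open slots next week and draft an invitation for the guest."
)
print(json.dumps(request, indent=2))
```

Keeping `agent_id` out of the arguments unless it is explicitly supplied mirrors the table's advice to prefer the in-loop variant for one-off escalations.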

Tool Definition Quality

Grade: A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description fully compensates: discloses return value (`final_answer`), error handling (do not re-trigger), delegation behavior with `agent_id` (spawns new trace), and model override fallback logic.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, then usage notes and error handling. No redundant information, every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step delegation, return format, error scenarios), the description covers all necessary aspects. No output schema but the return value is explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds valuable context: explains model override behavior when `agent_id` is set, describes trace linking for `agent_id`, and clarifies task_description should include all context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool delegates multi-step tasks to a planner and is for complex asks. It distinguishes from sibling tools by its unique purpose but does not explicitly differentiate from all siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (when more than a direct answer is needed) and provides a clear don't-re-trigger condition for timeout/error results. Does not list alternative tools but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_add_file (Grade: A)

Attach a file to this agent's private knowledge (agent-specific files, not shared with other agents).

Workflow:

1. Upload the file with files_upload (pass source_url for remote files).
2. Index it with files_ingest (pass the file_id).
3. Call this tool with agent_id + file_id.

Returns chunk_count — shows 0 while still processing. Call agents.list_files later to see the final chunk count once indexing completes.
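
The three-step workflow above can be sketched as a small helper. This is a minimal sketch under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, the result dictionaries (`file_id`, `chunk_count` as top-level keys) are assumed shapes, and `fake_call_tool` is an offline stand-in with canned values so the example runs without a server.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def attach_remote_file(call_tool: ToolCaller, agent_id: str, source_url: str) -> Dict[str, Any]:
    """Run the upload -> ingest -> attach workflow for one remote file."""
    # Step 1: upload the file; the "file_id" result key is an assumed shape.
    uploaded = call_tool("files_upload", {"source_url": source_url})
    file_id = uploaded["file_id"]

    # Step 2: index the uploaded file.
    call_tool("files_ingest", {"file_id": file_id})

    # Step 3: attach the file to the agent's private knowledge.
    attached = call_tool("agents_add_file", {"agent_id": agent_id, "file_id": file_id})
    chunk_count = attached.get("chunk_count", 0)

    # chunk_count is 0 while indexing is still running, so 0 is not a failure.
    return {
        "file_id": file_id,
        "chunk_count": chunk_count,
        "check_again_later": chunk_count == 0,
    }


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Offline stand-in for a real MCP client, used only for this demo."""
    canned = {
        "files_upload": {"file_id": "file-123"},
        "files_ingest": {"file_id": "file-123"},
        "agents_add_file": {"chunk_count": 0},
    }
    return canned[name]


result = attach_remote_file(fake_call_tool, "agent-1", "https://example.com/handbook.pdf")
print(result)
```

When `check_again_later` is true, the description's advice applies: list the agent's files later to read the final chunk count rather than retrying the attach step.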

Parameters

Name	Required	Description	Default
`file_id`	Yes	file_id returned by files_upload or files_ingest
`agent_id`	Yes	ID of the agent to attach the file to

Tool Definition Quality

Grade: A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses that files are agent-specific and not shared, and explains the async behavior where chunk_count returns 0 until indexing completes. This adds meaningful behavioral context beyond the schema, though it could be more comprehensive (e.g., permissions, idempotency).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a numbered workflow and clear sections. It is concise (5 sentences) with no filler, front-loads the purpose, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with only 2 parameters and no output schema, the description is remarkably complete. It covers prerequisites, workflow, return value behavior (chunk_count 0 while processing), and follow-up actions, leaving no significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description adds workflow context (e.g., file_id from upload/ingest) but does not significantly enhance the parameter meanings beyond the schema. Per rubric, baseline is 3 for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Attach a file to this agent's private knowledge (agent-specific files, not shared with other agents).' This uses a specific verb ('attach') and resource ('file to agent's knowledge'), and distinguishes it from siblings like agents_remove_file and agents_list_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a detailed 3-step workflow (upload, index, attach) and notes that chunk_count shows 0 while processing, with a follow-up suggestion to use agents.list_files later. It implicitly tells when not to use (must have uploaded and ingested first) and gives explicit steps, fully guiding the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_approve_draft (Grade: A)

Approve a pending agent draft and send the message.

The draft will be sent to the conversation it was generated for. You can optionally edit the text before sending.

Use this when user says:

- 'Approve this draft'
- 'Send this reply'
- 'Approve and send'
- 'Looks good, send it'

IMPORTANT: This will send a message to a real person.

Parameters

Name	Required	Description	Default
`draft_id`	Yes	ID of the draft to approve
`edited_text`	No	Optional edited response text (if user wants to modify before sending)

Tool Definition Quality

Grade: A (4.5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full responsibility. It discloses that the action sends a message to a real person and that text can be optionally edited. This is a critical behavioral trait for an irreversible action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with only 4 sentences plus a list of examples. It is well-structured, front-loading the main purpose, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters and no output schema, the description covers purpose, usage triggers, and a critical warning, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema provides full coverage with descriptions for both parameters. The description adds context by mentioning optional editing, but does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Approve a pending agent draft and send the message.' It uses specific verbs and resources, and the list of example user phrases helps distinguish this from siblings like agents_reject_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists example user phrases that trigger use, and it warns about sending to a real person. However, it does not explicitly state when not to use the tool or mention alternatives beyond the implied rejection sibling.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_ask (Grade: A)

Send a message to an AI agent and get its response.

The agent runs with its configured prompt, tools, and knowledge. Use this to test agents or have them process a task.

Returns: {status: 'replied'|'silent', response_text, messages[], full_reply, model_used, tokens_*, send_mode, execution_mode}. messages[] carries each messages.send invocation the agent made (text, subject, reply_to_message_id, timestamp, message_id, attachments=[{file_id,name,mime}]). full_reply concatenates text only — attachment-only sends show up in messages but not full_reply. status='silent' iff both response_text is empty AND messages is empty.

Execution may take 10-60s depending on agent complexity.
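
The return-value rules above (when the status is 'silent', and why attachment-only sends are missing from full_reply) can be checked on the client side. The sketch below is illustrative only: the helper names and the sample values are made up, and the sample is trimmed to the fields the helpers read.

```python
from typing import Any, Dict, List


def expected_status(response_text: str, messages: List[Dict[str, Any]]) -> str:
    """'silent' only when there is no response text and no sent messages."""
    return "silent" if not response_text and not messages else "replied"


def attachment_only_sends(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Messages that carry attachments but no text (absent from full_reply)."""
    return [m for m in messages if not m.get("text") and m.get("attachments")]


# Illustrative result with made-up values, trimmed to the fields used here.
sample = {
    "status": "replied",
    "response_text": "",
    "messages": [
        {"text": "Here is the brochure.", "attachments": []},
        {
            "text": "",
            "attachments": [
                {"file_id": "f1", "name": "brochure.pdf", "mime": "application/pdf"}
            ],
        },
    ],
}

# The reported status should agree with the rule stated in the description.
assert expected_status(sample["response_text"], sample["messages"]) == sample["status"]
print(len(attachment_only_sends(sample["messages"])))
```

A caller that only reads full_reply would miss the second message in this sample, which is why iterating over messages[] is the safer way to see everything the agent sent.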

Parameters

Name	Required	Description
`message`	Yes	Message/goal to send to the agent
`agent_id`	Yes	ID of the AI agent to ask
`send_mode`	No	Send mode for the agent run: 'draft' = create drafts, 'auto' = send directly. Defaults to the agent's configured default_send_mode. Does NOT change execution_mode — that is fixed by the agent's config.

Tool Definition Quality

Grade: A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses execution time (10-60s), explains the return value in detail, and describes the agent's behavior (runs with configured prompt, tools, knowledge). It lacks explicit safety/permission info but is otherwise transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat lengthy but well-structured: purpose first, then return format details. Every sentence adds value, though some output details could be condensed. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description provides a complete return format, including edge cases like 'silent' status and attachment-only sends. It covers timing, parameters' effect on output, and execution mode. Thorough and sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already described. The description adds value by clarifying send_mode's default behavior and its independence from execution_mode, going beyond the schema. No additional semantics needed for message and agent_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sends a message to an AI agent and gets its response, and distinguishes it from sibling tools like agent_handoff by specifying 'test agents or have them process a task'. It uses specific verbs and resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this to test agents or have them process a task', providing clear usage context. It does not explicitly mention when not to use or list alternatives, but the sibling names imply distinction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_create (Grade: A)

Create a new AI agent in the workspace.

Execution modes:

- ai_assisted (default, recommended): Two-phase AI: a fast pre-classifier (Haiku) for keyword filtering and simple replies, then full AI with tools for complex messages. Best for: auto-replies, group monitoring, keyword-based filtering.
- agentic: Autonomous multi-step agent with planning and tool execution. Best for: complex scheduled tasks, multi-step automation.
- rule_based: Simple pattern matching without AI.

For keyword filtering: use ai_assisted mode + set keywords in trigger conditions (free, deterministic) and/or auto_reply_rules (smart, LLM-based) via agents.update.

Parameters

Name	Required	Description
`name`	Yes	Name of the AI agent (1-100 characters)
`prompt_id`	No	ID of the prompt to assign to this agent
`send_mode`	No	Default send mode: 'auto' or 'draft'. OMIT to use 'draft' (the default).
`description`	No	Optional description of what this agent does
`text_engine`	No	Text-execution engine: 'rule_based', 'ai_assisted', 'agentic' (default), or 'claude_channels'. Voice is derived from triggers, not engine. OMIT to use the default ('agentic').
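
The description lists ai_assisted as the default mode while the `text_engine` row lists agentic, so a caller may prefer to set the engine explicitly instead of relying on either default. The sketch below builds and validates an arguments dictionary using only the values named in the table; the helper name `build_agent_create_args`, the example agent, and the assumption that `prompt_id` is a string are illustrative.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in the parameter table.
TEXT_ENGINES = ("rule_based", "ai_assisted", "agentic", "claude_channels")
SEND_MODES = ("auto", "draft")


def build_agent_create_args(
    name: str,
    text_engine: str,
    send_mode: str = "draft",
    description: Optional[str] = None,
    prompt_id: Optional[str] = None,  # assumed to be a string ID
) -> Dict[str, Any]:
    """Validate inputs and build the arguments for agents_create."""
    if not 1 <= len(name) <= 100:
        raise ValueError("name must be 1-100 characters")
    if text_engine not in TEXT_ENGINES:
        raise ValueError(f"text_engine must be one of {TEXT_ENGINES}")
    if send_mode not in SEND_MODES:
        raise ValueError(f"send_mode must be one of {SEND_MODES}")
    args: Dict[str, Any] = {
        "name": name,
        "text_engine": text_engine,  # always explicit, never left to a default
        "send_mode": send_mode,
    }
    if description is not None:
        args["description"] = description
    if prompt_id is not None:
        args["prompt_id"] = prompt_id
    return args


args = build_agent_create_args(
    "Keyword monitor",
    "ai_assisted",
    description="Watches group chats for pricing questions",
)
print(args)
```

Defaulting `send_mode` to 'draft' matches the table and keeps a newly created agent from sending messages directly until that is chosen on purpose.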

Tool Definition Quality

Grade: A (4.2/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits. It explains the behavior of execution modes (two-phase AI, autonomous planning, simple pattern matching). However, it omits authorization needs, rate limits, side effects, and return value. Since no output schema exists, return format is unclear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly concise and front-loads the main purpose. The bullet-like list of execution modes uses clear formatting. Minor redundancy exists (e.g., mentioning default in both description and schema) but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, 2 enums, no output schema), the description covers the main purpose, execution modes, and parameter context well. Missing return value and error handling, but the richness of mode explanations compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds substantial value beyond the schema. It elaborates on execution modes with detailed explanations and best-use scenarios, which are not present in the schema's enum descriptions. This helps the agent select the correct mode.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new AI agent in the workspace.' It specifies the action (create), resource (AI agent), and scope (workspace). This distinguishes it from sibling tools like agents_update (update) and agents_delete (delete).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit best-use context for each execution mode (e.g., 'Best for: auto-replies, group monitoring, keyword-based filtering' for ai_assisted). It also gives specific advice for keyword filtering to use ai_assisted mode. However, it does not explicitly state when not to use this tool or compare to alternatives like agents_add_file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_delete (Grade: A)

Permanently delete an AI agent.

WARNING: This cannot be undone. The agent and all its triggers will be removed.

Parameters

Name	Required	Description	Default
`agent_id`	Yes	ID of the agent to delete

Tool Definition Quality

Grade: A (3.8/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description bears full responsibility for behavioral disclosure. It explicitly warns that deletion is permanent and cannot be undone, and that all triggers will be removed, which are critical side effects. Minor omissions like return format or permissions are acceptable for a straightforward delete tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences with no filler. It front-loads the action and follows with a clear warning, making it easy for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete operation with a single parameter, the description covers purpose, irreversibility, and cascading deletion of triggers. It is adequate but could briefly mention the return value (e.g., success confirmation) for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the single parameter is already well-documented. The description does not add extra meaning beyond the schema's 'ID of the agent to delete,' so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Permanently delete an AI agent,' using a specific verb and resource. It distinguishes this tool from siblings like agents_create, agents_get, agents_update, and agents_trigger_delete, which handle different operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks guidance on when to use this tool versus alternatives. It does not mention prerequisites, confirmation steps, or conditions under which deletion is appropriate, leaving the agent without clear decision support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_get (Grade: A)

Read-only, Idempotent

Get detailed information about a specific AI agent.

Returns full agent config including:

- Execution configuration
- Tool configuration
- Knowledge configuration
- Escalation configuration
- Triggers list
- Knowledge collections
- Custom AI instructions (prompt_text)
- Auto-reply rules override (auto_reply_rules)

Parameters

Name	Required	Description	Default
`agent_id`	Yes	ID of the AI agent to fetch

Tool Definition Quality

Grade: A (3.5/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry full burden for behavioral disclosure. It describes the return data but does not disclose behavioral traits such as permissions required, rate limits, or whether the operation is safe (non-destructive). The agent is left unaware of any side effects or access constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with a brief introductory sentence followed by a structured bullet list. Every sentence adds value, though the list could be slightly more compact. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description adequately explains the return value with a detailed list. However, it lacks any information about error conditions or access control, which would improve completeness. Overall, it is sufficient for a basic fetch operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the single parameter 'agent_id' is described as 'ID of the AI agent to fetch'. The description does not add additional meaning beyond the schema, so it meets the baseline without enhancement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get detailed information about a specific AI agent' with a specific verb and resource. The bullet list of returned fields further clarifies the scope. Among siblings, this tool is clearly distinct as a read operation, unlike agents_create, agents_update, or agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for fetching full agent details by ID but does not explicitly state when to use it versus alternatives like agents_list or agents_trace_get. There is no guidance on prerequisites or exclusions, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_silence (Grade: A)

Read-only, Idempotent

End this turn without sending any message. Use when the thread is owned by a human operator after job.escalate, when the guest is self-resolving, when the message is a duplicate, or for observation-only turns. Calling this tool is the ONLY correct way to stay silent — narrated silence text (e.g. '(Staying silent…)', 'Internal:…') would be delivered to the guest verbatim.

Parameters

Name	Required	Description	Default
`reason`	Yes	Free-form explanation for admin audit. Stored in trace_tool_executions.tool_params (ClickHouse String; reason filters are scan-only).
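
One way to follow the rule above is to make the end-of-turn decision return a tool call rather than free text, so that a decision to stay silent can never leak out as a message. The sketch below is a hypothetical client-side helper: `choose_turn_action` is not part of the server, and the `text` argument shown for messages_send is an assumed shape, since that tool's parameters are not documented in this section.

```python
from typing import Any, Dict, Optional


def choose_turn_action(
    reply_text: Optional[str], silence_reason: Optional[str]
) -> Dict[str, Any]:
    """Return the single tool call that should end the current turn."""
    if silence_reason:
        # Staying silent is expressed as a tool call, never as narrated text.
        return {"name": "agent_silence", "arguments": {"reason": silence_reason}}
    if not reply_text or not reply_text.strip():
        raise ValueError("provide reply_text or a silence_reason")
    # Assumed argument shape for messages_send; check the tool's own schema.
    return {"name": "messages_send", "arguments": {"text": reply_text}}


action = choose_turn_action(None, "Thread is owned by a human operator after escalation")
print(action)
```

Raising an error when neither a reply nor a reason is supplied forces the caller to pick one of the two outcomes explicitly instead of emitting an empty or placeholder message.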

Tool Definition Quality

Grade: A (4.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description covers the key behavioral trait: ending the turn silently. It warns that narrated silence would be sent verbatim. It does not mention any side effects or required permissions, but for a simple silent action, the transparency is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy. The first sentence states the primary action, and the second provides use cases and a critical warning. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, no nested objects), the description is complete. It covers purpose, usage conditions, and parameter behavior. No missing details are apparent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'reason' has a description that adds value beyond the schema: it explains the field is free-form, stored for audit in trace_tool_executions.tool_params, and that reason filters are scan-only. Schema coverage is 100%, but the extra context justifies a higher score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'End this turn without sending any message.' It specifies the verb (end) and resource (turn), and the phrase 'ONLY correct way to stay silent' distinguishes it from potentially confusing alternatives like narrated silence text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance is provided: 'Use when the thread is owned by a human operator after job.escalate, when the guest is self-resolving, when the message is a duplicate, or for observation-only turns.' It also warns against using narrated silence text, making the conditions for use very clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_listA

Read-onlyIdempotent

Inspect

List all AI agents configured in the workspace.

Returns agents with their basic info, trigger count, and knowledge collection count.

Each agent's description field tells you when that agent is useful. If you're a router-style agent deciding whether to delegate via agent.handoff, read descriptions and pick the best fit.

Use this to:

See all configured AI agents
Filter by status (active/paused/archived)
Get agent IDs for further operations

ParametersJSON Schema

Name	Required	Description	Default
`status`	No	Filter by status ('active' / 'paused' / 'archived'). Omit for all.
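
As a rough illustration of the optional `status` filter, the sketch below builds the JSON-RPC `tools/call` request an MCP client would send for this tool. The tool name `agents_list` is taken from this listing; the helper name, the request id, and the client-side validation of the three status values are assumptions made for the example, not part of DialogBrain's documented behavior.

```python
import json

# Status values listed in the parameter description above.
VALID_STATUSES = {"active", "paused", "archived"}


def build_agents_list_request(status=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for agents_list.

    Hypothetical helper: omitting `status` leaves the arguments empty,
    which the parameter description says returns all agents.
    """
    arguments = {}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
        arguments["status"] = status
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "agents_list", "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(build_agents_list_request("active"), indent=2))
```

Validating the value before sending is a design choice of this sketch; the server may well perform its own validation.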

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It implies a read-only list operation but does not explicitly state no side effects. It describes the output structure, which adds some transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences plus a bullet list, front-loaded with the main purpose, concise, and free of fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers essential aspects: what it does, what it returns, how to filter. Lacks mention of pagination or limits, but for a simple list tool it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%. The description mentions filtering by status (active/paused/archived) but does not add significant meaning beyond what the schema already provides. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists all AI agents in the workspace, specifying the returned data (basic info, trigger count, knowledge count). It distinguishes from siblings like agents_get (single agent) and agents_list_drafts (drafts).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases: see all agents, filter by status (active/paused/archived), get IDs for further operations. It does not explicitly state when not to use it, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_draftsA

Read-onlyIdempotent

Inspect

List pending agent drafts awaiting approval.

Shows drafts that have been generated by AI agents but not yet sent. Each draft includes:

Thread/conversation info
Trigger message (what prompted the reply)
Generated response text
Creation time and expiration

Use this when user asks:

'Show pending agent drafts'
'What messages are waiting for approval?'
'List drafts to approve'

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum number of drafts to return
`thread_id`	No	Filter by specific thread ID (optional)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses that drafts have expiration, creation time, and are unsent, but does not mention any safety or authorization needs. Given the read-only nature of listing, it is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: a clear purpose line, bullet points of included fields, and example queries. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the return fields (thread info, trigger, response, creation time, expiration) well, compensating for the lack of output schema. It does not cover pagination or ordering, but is otherwise complete for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for both parameters (limit and thread_id). The description adds no new meaning beyond what is already in the schema, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List pending agent drafts awaiting approval' and provides example user queries. It unambiguously identifies the tool's function and distinguishes it from siblings like agents_approve_draft and agents_reject_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user asks...' and lists specific queries. It implies context for use, though it does not explicitly state when not to use it or name alternatives beyond the general sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_filesA

Read-onlyIdempotent

Inspect

List files directly attached to this agent (agent-specific files, not shared collections).

Returns file_id, title, status, and chunk_count for each file. chunk_count shows how many indexed chunks were created — 0 means the file is still processing.

Use agents.add_file to attach a new file, or agents.remove_file to detach one.

ParametersJSON Schema

Name	Required	Description	Default
`agent_id`	Yes	ID of the agent whose files to list
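
The `chunk_count` rule above (0 means the file is still processing) can be applied on the client side roughly as follows. The field names come from the description; the response being a plain list of dicts, and the sample IDs, titles, and status strings, are assumptions for illustration only.

```python
def split_by_processing_state(files):
    """Partition file records into (ready, still_processing) by chunk_count."""
    ready, processing = [], []
    for record in files:
        # Per the description, chunk_count == 0 means indexing has not finished.
        if record["chunk_count"] == 0:
            processing.append(record)
        else:
            ready.append(record)
    return ready, processing


# Hypothetical sample response; IDs, titles, and status strings are invented.
sample = [
    {"file_id": "f1", "title": "Pricing FAQ", "status": "ready", "chunk_count": 12},
    {"file_id": "f2", "title": "Refund policy", "status": "processing", "chunk_count": 0},
]

ready, processing = split_by_processing_state(sample)
print([record["file_id"] for record in processing])  # prints ['f2']
```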

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes return fields (file_id, title, status, chunk_count) and explains that chunk_count=0 means still processing. No annotations provided, so this is valuable behavioral disclosure. Does not explicitly state read-only, but it's implied by 'list'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences plus return field explanation and usage note. Front-loaded with primary function, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one param, no output schema), the description covers purpose, scope, return fields, and related tools. It is complete for an agent to select and use this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter agent_id, so baseline score applies. The description does not add additional meaning or context beyond the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it lists files attached to a specific agent, distinguishing from shared collections. The verb 'list' and resource 'files attached to this agent' are explicit and differentiate from sibling tools like collections_list_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context that it lists agent-specific files (not shared collections) and mentions related tools for add/remove (agents.add_file, agents.remove_file). Lacks explicit when-not or alternative comparisons, but the distinction is clear enough for typical use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_historyA

Read-onlyIdempotent

Inspect

List past versions of an agent's prompt_text. Every edit to the agent's prompt is snapshotted to an append-only table — use this tool to browse history, find a prior known-good version, and copy it into agents.prompt_restore.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max versions to return (1-200, default 50)
`agent_id`	Yes	ID of the agent
`before_version`	No	Cursor: return versions strictly below this version_number
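
The `before_version` cursor suggests a simple pagination loop, sketched below. The return format is not documented here, so this assumes each page is a list of dicts carrying a `version_number` field and that pages arrive newest-first; `call_tool` is a hypothetical stand-in for whatever function your MCP client uses to invoke a tool.

```python
def iter_prompt_versions(call_tool, agent_id, page_size=50):
    """Yield every prompt version for an agent, one page at a time.

    Assumes newest-first pages of dicts with a `version_number` key.
    """
    if not 1 <= page_size <= 200:
        raise ValueError("limit must be between 1 and 200")
    before = None
    while True:
        args = {"agent_id": agent_id, "limit": page_size}
        if before is not None:
            # Cursor: only versions strictly below this number are returned.
            args["before_version"] = before
        page = call_tool("agents_prompt_history", args)
        if not page:
            return
        yield from page
        before = min(item["version_number"] for item in page)
        if len(page) < page_size:
            return  # a short page means there is nothing older left


def fake_call_tool(name, args):
    """In-memory stand-in for a real MCP client, holding five versions."""
    versions = [{"version_number": n, "prompt_text": f"v{n}"} for n in range(5, 0, -1)]
    before = args.get("before_version")
    eligible = [v for v in versions if before is None or v["version_number"] < before]
    return eligible[: args["limit"]]


numbers = [v["version_number"] for v in iter_prompt_versions(fake_call_tool, "agent-1", page_size=2)]
print(numbers)  # prints [5, 4, 3, 2, 1]
```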

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states that 'every edit to the agent's prompt is snapshotted to an append-only table,' disclosing the immutable, append-only nature of the history. It implies read-only behavior, which is sufficient given no annotations. No mention of auth needs or rate limits, but acceptable for a browsing tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, followed by usage guidance. Every word is meaningful; no redundancy. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not specify the return format (e.g., fields returned like version_number, prompt_text, timestamp). It mentions 'list past versions' but lacks details on what agents can expect, which is important for an API tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions (agent_id, limit, before_version). The description adds minimal extra semantics beyond contextualizing the parameters as part of a browsing history. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List past versions of an agent's `prompt_text`.' It uses a specific verb (list) and resource (past versions of prompt_text), and differentiates from the sibling tool `agents.prompt_restore` by mentioning the intended workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear use case: 'use this tool to browse history, find a prior known-good version, and copy it into `agents.prompt_restore`.' It gives context for when to use it but does not explicitly state when not to use it or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_restoreA

Inspect

Restore a past version of an agent's prompt_text by version_number. Creates a new version pointing at the restored content — history is preserved. Use agents.prompt_history first to find the version_number you want.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Optional: why this restore is happening (shows up in history UI)
`agent_id`	Yes	ID of the agent
`version_number`	Yes	The version_number to restore (get it from agents.prompt_history)
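
To make the "restore creates a new version" semantics concrete, here is a small in-memory model of an append-only history. It is only a sketch of the behavior described above, not DialogBrain's implementation: the class, its method names, and the choice to copy the restored text (rather than point at it) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PromptHistory:
    """Toy append-only history: entries are added, never changed or removed."""

    versions: list = field(default_factory=list)  # (version_number, text, reason)

    def edit(self, text, reason=None):
        number = len(self.versions) + 1
        self.versions.append((number, text, reason))
        return number

    def restore(self, version_number, reason=None):
        # Restoring appends a NEW version carrying the old text,
        # so every earlier entry stays in the history.
        for number, text, _ in self.versions:
            if number == version_number:
                return self.edit(text, reason)
        raise KeyError(f"unknown version_number: {version_number}")

    @property
    def current(self):
        return self.versions[-1][1]


history = PromptHistory()
history.edit("Be concise.")
history.edit("Be verbose.")
new_number = history.restore(1, reason="rollback to known-good prompt")
print(new_number, history.current, len(history.versions))  # prints 3 Be concise. 3
```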

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that restore creates a new version and preserves history, indicating non-destructive behavior, which is important since no annotations are provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three well-structured, front-loaded sentences without any extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, outcome, and prerequisite adequately for this simple restore tool, with no missing elements given the lack of output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds context beyond the schema: explains reason parameter's purpose ('shows up in history UI') and that version_number comes from prompt_history.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it restores a past version of an agent's prompt_text by version_number, distinguishing it from sibling tools like agents_prompt_history and prompts_prompt_restore.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to first use agents.prompt_history to obtain the version_number, providing clear guidance on prerequisite steps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_reject_draftA

Inspect

Reject a pending agent draft without sending.

The draft will be marked as rejected and won't be sent. Use this when the generated response isn't appropriate.

Use this when user says:

'Reject this draft'
'Don't send this'
'Cancel this reply'
'Delete this draft'
'This response is wrong'

ParametersJSON Schema

Name	Required	Description	Default
`reason`	No	Optional reason for rejection (for logging/feedback)
`draft_id`	Yes	ID of the draft to reject

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the draft will be marked as rejected and won't be sent, which is basic behavioral information. However, without annotations, it lacks details on permanence, permissions, or side effects. The description is adequate for a simple action but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is extremely concise: two short sentences for the main function and a bullet list of use cases. No redundant information. Front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and the simple nature of the tool (reject a draft), the description covers the purpose and usage adequately. Could mention what the tool returns or any confirmation, but this is not critical for a basic rejection action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for both parameters. The description does not add additional meaning or clarify usage beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (reject) and resource (pending agent draft). Distinguishes itself from sibling tools like agents_approve_draft by its focus on rejection. The verb and resource are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit example user phrases that trigger use, such as 'Reject this draft' and 'Don't send this'. It also gives a general condition ('when the generated response isn't appropriate'). Does not mention alternatives like agents_approve_draft, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_remove_fileA

Inspect

Remove a file from this agent's private knowledge.

The file itself is not deleted — it's just detached from this agent. Use agents.list_files to find the file_id to remove.

ParametersJSON Schema

Name	Required	Description	Default
`file_id`	Yes	ID of the file to detach (from agents.list_files)
`agent_id`	Yes	ID of the agent to remove the file from

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description clarifies a key behavioral trait: the file is detached, not deleted. However, with no annotations provided, it lacks details on permissions, side effects, or immediate impact on the agent's knowledge, which would make it more transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loads the purpose, and every sentence adds value. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple detach operation, the description covers purpose, behavioral nuance, and a prerequisite. It lacks error handling or post-condition details, but these are not critical for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already has 100% description coverage for both parameters. The description adds marginal value by referencing agents.list_files for file_id, but does not provide additional semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool removes a file from an agent's private knowledge. It distinguishes itself from siblings like agents_add_file by specifying detachment rather than deletion, and references agents.list_files for finding the file_id.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides a direct usage hint to use agents.list_files to obtain the file_id. While it does not explicitly state when not to use this tool, the context of removing a file is straightforward and the sibling tools (add, list) make the usage clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_simulate_inboundA

Read-onlyIdempotent

Inspect

Replay an inbound message on a thread through the real trigger pipeline and return what would have happened. The router auto-picks the winning enabled agent + trigger by priority/specificity (same logic as production). By default send_mode='draft' so no real message is sent; pass send_mode='auto' on a test account to let the matched agent actually deliver (drafts get overwritten by the next draft, so 'auto' is the only way to verify Telegram/email delivery end-to-end).

Use to verify routing for a thread: which agent answers, which trigger wins, or — when nothing matches — the structured skip reason. Pass blockchain_tx_data instead of message_text to simulate a blockchain:transfer event on the thread.

Returns: {matched: true, matched_agent: {id, name, execution_mode}, matched_trigger: {id, trigger_type, conditions, specificity_score}, routing_reason, response_text, messages[], execution_mode, send_mode, model_used, tokens_input, tokens_output, latency_ms, rag_queries_made, rag_results_used} on a hit, or {matched: false, skip_reason, simulator_warnings} on a miss.

ParametersJSON Schema

Name	Required	Description	Default
`send_mode`	No	How the matched agent should deliver its reply. 'draft' (default, safe) creates a draft only — no real send, no idempotency key. 'auto' lets the agent deliver through the channel adapter exactly as it would in production — use this on a test account to verify Telegram/email delivery end-to-end. Drafts get overwritten by the next draft on the thread, so 'auto' is required when you want to see the message persisted.	draft
`thread_id`	Yes	Thread ID to route the simulated event from. Must belong to the API key's workspace.
`message_text`	No	Inbound message body to simulate. Defaults to '[MCP simulation test]' when omitted.
`system_message`	No	Tag the simulated inbound as a system/service-message row (missed call, group join, pinned message, etc.) so the `excluded_system_message_kinds` trigger filter can be exercised end-to-end. Shape: {"category": <one of call_event \| membership_change \| contact_signup \| pinned_message \| chat_metadata_change \| voice_chat_event \| other_service>, "native_kind": <free-form upstream event class name, e.g. 'MessageActionPhoneCall'>}. The category is written into `message.meta.system_message` (mirroring the real Telegram ingest path) AND surfaced on the synthetic IncomingEvent so the trigger evaluator honors the block-list. Omit for a normal text-message simulation.
`blockchain_tx_data`	No	When set, simulate a blockchain:transfer event instead of a channel:message:new event. Expected keys: chain, to_address / from_address, tx_hash.
`attachment_file_ids`	No	Optional list of workspace file IDs to attach to the simulated inbound message — same shape as a real Telegram message with image/document attachments. Use this to test agent behavior on incoming messages that carry images (e.g. logos for invoices) or documents the agent must reference. File IDs must belong to the API key's workspace.
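
A minimal sketch of how a caller might assemble the arguments and interpret the two documented result shapes. The parameter and result field names come from the description above; the helper names are hypothetical, and treating `message_text` and `blockchain_tx_data` as mutually exclusive is an assumption drawn from the phrase "instead of message_text".

```python
def build_simulation_args(thread_id, message_text=None, blockchain_tx_data=None,
                          send_mode="draft"):
    """Assemble arguments for agents_simulate_inbound (hypothetical helper)."""
    if send_mode not in ("draft", "auto"):
        raise ValueError("send_mode must be 'draft' or 'auto'")
    if message_text is not None and blockchain_tx_data is not None:
        # Assumption: the two event types are alternatives, not combinable.
        raise ValueError("pass message_text or blockchain_tx_data, not both")
    args = {"thread_id": thread_id, "send_mode": send_mode}
    if message_text is not None:
        args["message_text"] = message_text
    if blockchain_tx_data is not None:
        args["blockchain_tx_data"] = blockchain_tx_data
    return args


def summarize_result(result):
    """Turn a hit or miss result into a one-line summary."""
    if result["matched"]:
        agent = result["matched_agent"]
        trigger = result["matched_trigger"]
        return (f"matched agent {agent['name']!r} via trigger {trigger['id']!r} "
                f"({result['routing_reason']})")
    return f"no match: {result['skip_reason']}"


# Invented sample results that follow the documented hit/miss shapes.
hit = {
    "matched": True,
    "matched_agent": {"id": "a1", "name": "Support", "execution_mode": "draft"},
    "matched_trigger": {"id": "t1", "trigger_type": "channel:message:new"},
    "routing_reason": "highest priority",
}
miss = {"matched": False, "skip_reason": "no enabled trigger", "simulator_warnings": []}

print(summarize_result(hit))
print(summarize_result(miss))
```

Keeping `send_mode` at its 'draft' default in the helper mirrors the documented safe default; 'auto' has to be requested explicitly.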

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries the full burden. It thoroughly explains that send_mode defaults to 'draft' (no real message is sent and no idempotency key is written), that 'auto' delivers through the channel adapter as in production, and that routing uses production logic. It also details both hit and miss return structures, including skip_reason and simulator_warnings.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with a front-loaded purpose paragraph, followed by usage guidance and return details. While it contains many details, each sentence serves a purpose. Slightly lengthy but justified by complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description provides full return structures for both hit and miss. It covers edge cases (skip reason, warnings) and multiple simulation types. For a six-parameter simulation tool, this is exceptionally complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds value by explaining thread_id must belong to workspace, message_text default, blockchain_tx_data expected keys, and attachment_file_ids shape/usage (e.g., for logos or documents). This meaningfully extends the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool replays an inbound message through the real trigger pipeline to test routing. It distinguishes itself by defaulting to 'draft' mode (no real send) and mentions two simulation types (message_text or blockchain_tx_data). No sibling tool duplicates this functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use to verify routing for a thread' and notes that a miss returns a structured skip reason. It does not explicitly state when not to use the tool, but the context (simulation vs real send) is implied. Could be improved by naming alternatives like agents_ask or actual send tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_task_completeA

Inspect

Report that a Claude Code agent task has been completed. Call this when you finish processing an agent_task from DialogBrain.

ParametersJSON Schema

Name	Required	Description
`success`	Yes	Whether the task completed successfully
`summary`	No	Brief summary of what was done
`trace_id`	Yes	Trace ID from the agent task event
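
One way to make sure completion is always reported, whether the task succeeds or fails, is a small wrapper like the sketch below. The three argument names come from the table above; `call_tool` is a hypothetical stand-in for your MCP client's invocation function, and treating `success` as a boolean is an assumption.

```python
def run_and_report(task, trace_id, call_tool):
    """Run `task()` and report the outcome via agents_task_complete."""
    try:
        summary = task()
        args = {"trace_id": trace_id, "success": True, "summary": summary}
    except Exception as exc:
        # Report failures too, so the task is not left without a completion report.
        args = {"trace_id": trace_id, "success": False, "summary": f"failed: {exc}"}
    call_tool("agents_task_complete", args)
    return args


# Demo with a recording stub in place of a real MCP client.
calls = []


def record_call(name, args):
    calls.append((name, args))


def failing_task():
    raise RuntimeError("disk full")


ok = run_and_report(lambda: "updated 3 files", "trace-123", record_call)
bad = run_and_report(failing_task, "trace-456", record_call)
print(ok["success"], bad["success"], bad["summary"])  # prints True False failed: disk full
```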

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must carry burden. It states it 'reports completion' but doesn't disclose side effects, idempotency, or return behavior. Minimal transparency, but acceptable for a simple reporting tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no redundant information. Front-loaded with action and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 3 params, no output schema, and clear purpose, the description is adequate. Could mention if it is a one-way notification, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description adds no extra context beyond what the schema already provides for each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Report that a Claude Code agent task has been completed' – clear verb and resource. It distinguishes from sibling tools like agents_create or agents_get by focusing on completion reporting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Call this when you finish processing an agent_task from DialogBrain' – clear when to use. Does not mention when not to use or alternatives, but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trace_getA

Read-only, Idempotent

Fetch the full execution detail for a single trace — tool executions, events timeline, LLM call spans (with error_message on failures).

Use after agents.traces_list identifies a specific trace of interest (failed run, slow run, unexpected outcome).

By default LLM system_prompt and prompt_messages are stripped — set include_llm_bodies=true to fetch them when diagnosing prompt engineering issues (emits a WARNING audit log). Set full=true to disable all field truncation. completion_text on failed LLM calls is always returned (capped at 8 KB).

Parameters

Name	Required	Description
`full`	No	Disable all field truncation. Escape hatch for a human operator. OMIT for the standard truncated view.
`agent_id`	Yes	Expected agent_id — used for scope validation. Mismatch returns not_found.
`trace_id`	Yes	Trace identifier returned by agents.traces_list.
`include_llm_bodies`	No	Include system_prompt and prompt_messages in LLM spans. Audited at WARNING level. OMIT to keep them stripped (the default).
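
As a rough illustration of the parameters above, the sketch below builds a standard MCP `tools/call` JSON-RPC request for agents_trace_get. The helper name `build_trace_get_call` and the sample `agent_id` and `trace_id` values are hypothetical. Optional flags are left out unless enabled, which matches the "OMIT" guidance in the table.

```python
import json


def build_trace_get_call(agent_id, trace_id, include_llm_bodies=False,
                         full=False, request_id=1):
    """Build a JSON-RPC `tools/call` request for agents_trace_get."""
    arguments = {"agent_id": agent_id, "trace_id": trace_id}
    # Optional flags are omitted unless enabled, so the default
    # stripped and truncated view is what the server returns.
    if include_llm_bodies:
        arguments["include_llm_bodies"] = True  # audited at WARNING level
    if full:
        arguments["full"] = True  # disables all field truncation
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "agents_trace_get", "arguments": arguments},
    }


# Hypothetical identifiers, for illustration only.
request = build_trace_get_call(agent_id=42, trace_id="trace-abc",
                               include_llm_bodies=True)
print(json.dumps(request, indent=2))
```

Leaving `full` unset keeps the response small; the table describes it as an escape hatch for a human operator rather than a default.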

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description carries full burden and excels: discloses default stripping of LLM bodies, that include_llm_bodies=true emits WARNING audit log, full=true disables truncation, and failed LLM completion_text is always returned (capped). Also mentions agent_id mismatch returns not_found.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five sentences, no redundancy. The first defines the purpose, the second gives usage, and the last three explain defaults and flags with their side effects. Efficient and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description doesn't fully explain return structure beyond mentioning error_message and completion_text. But it covers key behavioral aspects and parameter details. Lacks full return format, which would aid agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (baseline 3). Description adds value: for include_llm_bodies, adds 'when diagnosing prompt engineering issues'; for full, adds 'Escape hatch for a human operator'; for trace_id, specifies it comes from agents_traces_list; for agent_id, notes 'scope validation'. Adds behavioral context beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'Fetch' and resource 'full execution detail for a single trace', enumerating contents like tool executions, events timeline, LLM call spans with error_message. Distinguishes from sibling agents_traces_list by specifying it gets one trace's details.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to use after agents_traces_list identifies a trace of interest (failed, slow, unexpected). Provides context for when to use include_llm_bodies and full flags. No explicit exclusions or alternatives, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_traces_listA

Read-only, Idempotent

List recent execution traces for an agent — the same data as /admin/requests, scoped to one agent and readable by an LLM.

Use this when an agent call timed out, drafted the wrong response, or you want to know which tool/LLM call burned the latency. Pair with agents.trace_get for full detail on a specific trace.

Filters: status, success, source (single value or comma-separated: agent,voice), date_from/date_to (ISO-8601), pagination via limit/offset.

Returns returned_count, dropped_on_page (should be 0 — positive means the backend agent_id predicate let something through), and has_more. Edge case: a raw page of all-dedup-dropped rows yields returned_count=0, has_more=true; re-call with offset += limit.

Parameters

Name	Required	Description
`limit`	No	Max rows per page (1–100).
`offset`	No	Rows to skip for pagination. OMIT to start at row 0 (default).
`source`	No	Filter by trace source. Single value or comma-separated, e.g. 'agent,voice'. Values: agent / auto_reply / agentic / outreach / voice. Note: source='agent' also matches voice traces today (known upstream bug).
`status`	No	Filter by status. OMIT to include all statuses.
`date_to`	No	ISO-8601 upper bound on created_at.
`success`	No	Filter to succeeded (true) or failed (false) runs only. OMIT to include both.
`agent_id`	Yes	Agent ID to pull traces for (must belong to your workspace).
`date_from`	No	ISO-8601 lower bound on created_at, e.g. '2026-04-10T00:00:00Z'.
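
The pagination edge case in the description (an empty page that still reports has_more=true) is easy to get wrong, so here is a minimal sketch of a paging loop. The `call_tool` callable, the `iter_traces` helper, and the `traces` result key are assumptions for illustration; the description only names returned_count, dropped_on_page, and has_more.

```python
from typing import Any, Callable, Dict, Iterator

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_traces(call_tool: ToolCaller, agent_id: Any, limit: int = 50,
                max_pages: int = 20, **filters: Any) -> Iterator[Dict[str, Any]]:
    """Yield trace rows page by page until has_more is false."""
    offset = 0
    for _ in range(max_pages):  # hard stop so a stuck has_more cannot loop forever
        args = {"agent_id": agent_id, "limit": limit, "offset": offset, **filters}
        page = call_tool("agents_traces_list", args)
        # "traces" is an assumed key name for the list of rows.
        yield from page.get("traces", [])
        # An empty page with has_more=true is not the end: keep paging.
        if not page.get("has_more"):
            break
        offset += limit


# Stub standing in for a real MCP client, for illustration only.
seen_offsets = []


def fake_call_tool(name, args):
    seen_offsets.append(args["offset"])
    if args["offset"] == 0:
        # Every raw row on this page was dropped by deduplication.
        return {"traces": [], "returned_count": 0,
                "dropped_on_page": 2, "has_more": True}
    return {"traces": [{"trace_id": "t1"}], "returned_count": 1,
            "dropped_on_page": 0, "has_more": False}


rows = list(iter_traces(fake_call_tool, agent_id=42, limit=2, success=False))
```

The loop stops on has_more rather than on an empty page, which is the behaviour the description asks for; `max_pages` is an added safety bound, not something the tool defines.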

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavior: pagination edge case (returned_count=0, has_more=true when rows deduplicate), known upstream bug (source='agent' also matches voice), and filter details (comma-separated values, ISO-8601 dates).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized: first sentence states purpose and scope, then usage scenarios, then filter details, then return fields and edge cases. Every sentence adds value; no fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description comprehensively covers filters, pagination behavior, edge cases, and known bugs. It explains return fields adequately for the tool's purpose without overloading.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds extra context: explains filter usage (e.g., comma-separated source values, ISO-8601 format for dates), pagination parameters (limit/offset), and the meaning of return fields (returned_count, dropped_on_page, has_more). This goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists recent execution traces for an agent, scoped to one agent, and explains it is the same data as /admin/requests, readable by an LLM. It distinguishes from siblings like agents_trace_get and agents_traces_stats by specifying use cases and pairing recommendations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly states when to use the tool (e.g., agent call timed out, wrong draft, latency debugging) and recommends pairing with agents_trace_get. While it does not explicitly list when not to use it, the context is clear and practical.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_traces_statsA

Read-only, Idempotent

Aggregated trace statistics for one agent over the last N days — total runs, success rate, avg duration, error breakdown, top tools used, runs-per-day histogram.

Use this when you want a bird's-eye view of an agent's health before diving into individual traces with agents.traces_list / agents.trace_get. Scoped to the target agent (exact match, no substring bleed). days is capped at 30 — matches the ClickHouse request_traces TTL.

Parameters

Name	Required	Description
`days`	No	Rolling window in days (1–30).
`agent_id`	Yes	Agent ID to compute stats for (must belong to your workspace).
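
Since days is limited to the 1–30 range, a caller may want to clamp the value before sending it. The sketch below does that on the client side; the helper name `build_stats_args` is hypothetical, and how the server itself treats out-of-range values is not stated here.

```python
def build_stats_args(agent_id, days=None):
    """Build arguments for agents_traces_stats, clamping days to 1..30."""
    args = {"agent_id": agent_id}
    if days is not None:
        # Client-side clamp to the documented window; omitted when not given
        # so the server default applies.
        args["days"] = max(1, min(30, int(days)))
    return args


# Hypothetical agent ID, for illustration only.
week = build_stats_args(42, days=7)
too_long = build_stats_args(42, days=90)
default_window = build_stats_args(42)
```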

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but the description adds behavioral context: days capped at 30 due to ClickHouse TTL, exact match on agent_id. It implies read-only statistics but doesn't explicitly state permissions or side-effects, which is a minor gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: the first lists all statistics offered, and the rest provide usage guidance, scoping, and the days cap. No redundant words, clearly structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description lists the statistics but doesn't detail the exact structure (e.g., histogram format, error breakdown keys). However, it is reasonably complete for a stats tool, especially with the TTL and scoping details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters. The description adds value beyond schema: 'exact match, no substring bleed' for agent_id and 'capped at 30 — matches ClickHouse request_traces TTL' for days.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides aggregated trace statistics (total runs, success rate, avg duration, error breakdown, top tools, histogram) for one agent, distinguishing it from sibling trace tools by framing it as a bird's-eye view.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use this for a bird's-eye view before diving into individual traces with agents_traces_list/agents_trace_get, and notes the exact match scoping and 30-day cap due to TTL.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trigger_createA

Create a new trigger for an AI agent.

Triggers determine when the agent activates.

Trigger types:

incoming_message: Activates on new incoming messages
schedule: Activates on a schedule
webhook: Activates on webhook events
event: Activates on system events

Parameters

Name	Required	Description
`enabled`	No	Whether the trigger is enabled. OMIT to use the default (true).
`agent_id`	Yes	ID of the agent to create a trigger for
`priority`	No	Trigger priority — lower numbers run first (default: 100)
`send_mode`	No	Send mode override for this trigger. OMIT to inherit from the agent.
`conditions`	No	Trigger conditions (JSON). Supported fields for incoming_message: - keywords: ["pricing","demo"] (message must contain keyword(s); free, no LLM cost) - keyword_match: "any" (default, OR) or "all" (AND) - channel_types: ["telegram","whatsapp","livechat_voice","twilio_voice","telegram_voice","voice",...] (filter by channel). For voice, use EITHER the three per-channel keys (scoped) OR "voice" alone (wildcard matching all three); mixing them is redundant. Per-channel keys: "livechat_voice" (web widget), "twilio_voice" (PSTN inbound), "telegram_voice" (Telegram p2p calls) - context_types: ["dm","group","channel","livechat"] (filter by chat type) - group_mode: "mentions_only" or "questions" (for group chats) - channel_account_ids: ["123"] (restrict to specific accounts) - folder_ids: [5,10] (restrict to threads in folders) - ai_tag_ids: [1,2] (restrict to threads with AI tags) - ai_filter_ids: [1,2] (semantic intent filters; message matched via embedding similarity, works in noisy groups) - ai_filter_mode: "any" (default, OR) or "all" (AND); controls how multiple AI filters combine - ai_filters: [{id: 1}, {name: "...", description: "..."}] (shorthand: reference existing by id or create inline, which calls the Voyage embedding API). If a filter with the same name already exists, it is reused by id. Prefer referencing existing filters by id when available. Use ai_filters.create + ai_filters.test for fine-tuning before assigning. - contact_states: ["active"] (filter by contact state) - cooldown_seconds: 30 (min gap between runs per thread) - max_runs_per_thread_per_hour: 5 (rate limit). Supported fields for job_completed (proactive callback when a delegated job finishes): - source_agent_id: <int> (fire only when this agent's job completed) - source_agent_slug: <str> (alternate to source_agent_id) - job_type: "agentic_session" (match a specific job type; default: any) - outcome: ["completed"] | ["escalated"] | ["completed","escalated"] (default ["completed"]) - min_duration_seconds: <int> (skip very-short jobs; noise filter) - thread_filter: {thread_ids: [<int>...]} (restrict to specific threads)
`thread_ids`	No	Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, the trigger only fires for messages in these threads. Maps to conditions.thread_filter.thread_ids.
`trigger_type`	Yes	Type of trigger: 'incoming_message', 'incoming_call', 'voice_transcript', 'schedule', 'webhook', 'event', 'blockchain_event', or 'job_completed'
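
To make the conditions object more concrete, here is a sketch of an agents_trigger_create argument payload for an incoming_message trigger. It also applies the voice rule from the table: when the "voice" wildcard is present, the per-channel voice keys are redundant and are dropped. The helper `normalize_channel_types` and the agent ID are hypothetical, and the chosen condition values are only examples.

```python
VOICE_CHANNEL_KEYS = {"livechat_voice", "twilio_voice", "telegram_voice"}


def normalize_channel_types(channel_types):
    """Drop per-channel voice keys when the "voice" wildcard is present."""
    if "voice" in channel_types:
        return [c for c in channel_types if c not in VOICE_CHANNEL_KEYS]
    return list(channel_types)


arguments = {
    "agent_id": 42,  # hypothetical agent ID
    "trigger_type": "incoming_message",
    "priority": 100,  # lower numbers run first; 100 is the documented default
    "conditions": {
        "keywords": ["pricing", "demo"],
        "keyword_match": "any",  # OR semantics across keywords
        "channel_types": normalize_channel_types(
            ["telegram", "voice", "twilio_voice"]
        ),
        "context_types": ["dm"],
        "cooldown_seconds": 30,  # min gap between runs per thread
        "max_runs_per_thread_per_hour": 5,
    },
}
```

Keyword conditions are described as free of LLM cost, so they are a reasonable first filter before adding ai_filters.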

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full behavioral disclosure burden. It explains trigger types and condition semantics in detail, including how fields like keyword_match and channel_types work. However, it does not mention side effects (e.g., immediate activation, dependencies on agent state) or the response structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear header, bullet points for trigger types, and a detailed conditions section. It is front-loaded with the main purpose. However, the conditions section is lengthy and could be more compact.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested conditions), the description covers the main functionality and condition details. However, it omits the return value (e.g., created trigger details) and does not mention prerequisites beyond the schema's required fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value by explaining the conditions object in depth, including supported fields, defaults, and usage guidelines (e.g., for voice channels and AI filters). This far exceeds the schema's brief description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Create a new trigger for an AI agent' and lists trigger types and their purposes. The verb 'create' is specific, and the resource 'trigger' is well-defined, distinguishing it from sibling tools like agents_trigger_update or agents_trigger_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on what the tool does and the types of triggers, but does not explicitly state when to use it over alternatives (e.g., updating an existing trigger). No 'when not to use' or alternative recommendations are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trigger_deleteA

Delete a trigger from an AI agent.

WARNING: This cannot be undone.

Parameters

Name	Required	Description
`agent_id`	Yes	ID of the agent that owns this trigger
`trigger_id`	Yes	ID of the trigger to delete

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It correctly warns that deletion is irreversible, which is critical for a destructive operation. However, it does not disclose prerequisites, side effects, or return behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences. The action is front-loaded, and the warning follows directly. No unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with two parameters and no output schema, the description covers the essential purpose and a key behavioral aspect (irreversibility). It could mention the return value or confirmation, but overall it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides clear descriptions for both parameters (agent_id and trigger_id). The description adds no additional meaning beyond the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Delete a trigger from an AI agent.' Uses a specific verb and resource, and distinguishes from sibling tools like agents_trigger_create and agents_trigger_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives. A warning about irreversibility is given, but no mention of when to prefer this over other actions (e.g., disabling a trigger).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trigger_updateA

Update an existing AI agent trigger.

All parameters are optional — only provided fields will be updated.

Parameters

Name	Required	Description
`enabled`	No	Enable or disable this trigger. OMIT to leave the enabled flag unchanged.
`agent_id`	Yes	ID of the agent that owns this trigger
`priority`	No	Trigger priority — lower numbers run first
`send_mode`	No	New send mode override. OMIT to leave the send-mode unchanged.
`conditions`	No	New trigger conditions (replaces existing). Same fields as trigger_create: keywords, keyword_match, channel_types, context_types, group_mode, channel_account_ids, folder_ids, ai_tag_ids, ai_filter_ids, ai_filter_mode, ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API). If a filter with the same name already exists, it is reused by id. contact_states, cooldown_seconds, max_runs_per_thread_per_hour
`thread_ids`	No	Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, merged into conditions.thread_filter.thread_ids. If conditions is also provided, thread_ids is merged into it.
`trigger_id`	Yes	ID of the trigger to update
`trigger_type`	No	New trigger type. OMIT to keep the existing type unchanged.
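
The thread_ids parameter is described as being merged into conditions.thread_filter.thread_ids. The sketch below shows one way a client could preview the resulting conditions object. It assumes that the supplied thread_ids overwrite any existing thread_filter.thread_ids list; the actual server-side merge rule is not spelled out, and `merge_thread_ids` is a hypothetical helper.

```python
import copy


def merge_thread_ids(conditions, thread_ids):
    """Return a copy of conditions with thread_ids placed under thread_filter."""
    merged = copy.deepcopy(conditions) if conditions else {}
    if thread_ids is not None:
        # Assumption: the new list replaces any previous thread_ids value.
        merged.setdefault("thread_filter", {})["thread_ids"] = list(thread_ids)
    return merged


original = {"keywords": ["demo"]}
preview = merge_thread_ids(original, [11, 12])
```

Because a new conditions value replaces the existing one, it is worth building the full object first and sending it in a single update.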

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry full burden. It only states update and partial update behavior, omitting details on idempotency, side effects (e.g., conditions replacement), authorization needs, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with clear purpose and key usage note. No fluff, front-loaded, and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 8 parameters (including nested objects) and no output schema, the description is adequate but incomplete. It does not specify return values or behavior on failure, which would aid an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. The description adds value by clarifying that only provided fields are updated, which goes beyond the schema's individual parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing AI agent trigger,' specifying the action and resource. It is distinct from sibling tools like agents_trigger_create and agents_trigger_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes that all parameters are optional, implying partial updates, but does not explicitly contrast with create or delete, nor does it provide when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_updateA

Update an existing AI agent's configuration.

All parameters are optional — only provided fields will be updated.

Use this to:

Enable or disable an agent
Change agent name or description
Assign or detach a prompt
Change default send mode
Replace knowledge collections
Update agent status
Change agent priority for trigger matching (lower number = higher priority)
Override which tools the agent can/can't call on triggered runs
Override which context sections (situation, communication style, job state, conversation history, thread summary) the agent receives
Opt into boilerplate prompt sections (safety guidelines, data confidentiality, factual accuracy) — all default OFF
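
Because only provided fields are updated, and this tool's parameter descriptions treat an omitted field (leave unchanged) differently from an explicit null (detach or revert), a client needs to keep those two cases apart. The sketch below uses a sentinel for that; `OMIT` and `build_update_args` are hypothetical names, and the agent ID and model value are only examples.

```python
import json


class _Omit:
    """Sentinel meaning 'do not send this field at all'."""

    def __repr__(self):
        return "OMIT"


OMIT = _Omit()


def build_update_args(agent_id, **fields):
    """Keep None (sent as JSON null) but drop fields marked OMIT."""
    args = {"agent_id": agent_id}
    for key, value in fields.items():
        if value is not OMIT:
            args[key] = value
    return args


# Switch the model, detach the prompt, and leave prompt_text untouched.
args = build_update_args(
    42,  # hypothetical agent ID
    model="claude-haiku-4-5-20251001",  # example model ID
    prompt_id=None,  # explicit null: detach the prompt
    prompt_text=OMIT,  # never sent, so the system prompt is not replaced
)
payload = json.dumps(args)
```

Keeping prompt_text out of the payload matters here, since sending it replaces the entire system prompt.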

Parameters

Name	Required	Description
`name`	No	New name for the agent
`model`	No	Canonical source for which LLM the agent runs on. To switch models pass JUST this — do NOT also rewrite prompt_text (any 'duty model' section in the prompt is stale doc, not the config). OMIT to leave the model unchanged.
`status`	No	Agent status: 'active', 'paused', or 'archived'. OMIT to leave the status unchanged.
`agent_id`	Yes	ID of the agent to update
`priority`	No	Agent priority for trigger matching. LOWER number = HIGHER priority (wins tiebreaks). Typical range 1-100. Fallback auto-reply agents use 10; specialised/topical agents use 100. When two agents match the same incoming message, the one with the lower priority number fires.
`prompt_id`	No	Prompt ID to assign (null to detach)
`send_mode`	No	Default send mode: 'auto' or 'draft'. OMIT to leave the send-mode unchanged.
`fast_model`	No	Model for the fast-path responder (voice, text auto-reply, agent executor). Defaults to claude-haiku-4-5-20251001 when unset. Non-Anthropic models (deepseek-chat, gpt-4.1-nano, kimi-k2.6) do NOT use BYOK today — they use the system API key + credits. Pass null to revert to default.
`api_surface`	No	OpenAI HTTPS endpoint for this agent's LLM calls (Phase 3a). 'chat_completions' (default, also when null) routes to /v1/chat/completions. 'responses' routes to /v1/responses — required for OpenAI native server tools (web_search, code_interpreter, image_generation, input_file PDFs). Capability still wins: agents whose tool list triggers the server_tool_responses_api substitution always route to Responses regardless of this setting. Ignored on non-OpenAI models (Anthropic, DeepSeek, Moonshot). OMIT to leave the api_surface unchanged.
`description`	No	New description for the agent
`prompt_text`	No	DESTRUCTIVE — REPLACES the entire system prompt. Pass ONLY when the user explicitly asks to edit/rewrite the prompt. To READ the prompt use prompts.get. When updating other fields (model, name, …) OMIT this. To append, prompts.get first then concatenate. Pass null to revert to the linked template.
`text_engine`	No	Text-execution engine: 'agentic', 'ai_assisted', 'rule_based', or 'claude_channels'. Replaces the legacy execution_mode field (20260523_002). Voice is now derived from triggers, not engine. OMIT to leave unchanged.
`denied_tools`	No	Block-list of tool IDs the agent must not call on triggered runs. Applied after allowed_tools and default visibility. Empty list [] = clear the block-list.
`allowed_tools`	No	Explicit allow-list of tool IDs this agent can call on triggered runs (e.g. ['messages.send', 'agent.handoff']). Empty list [] = clear the allow-list and fall back to system defaults. When set, only these tools (minus denied_tools) are exposed to the agent. Does NOT affect the My AI dropdown path.
`vision_enabled`	No	Per-agent opt-in for vision content. When true, the executor splices recent image attachments from the active thread into the LLM call (Phase 3a continuous vision for Meet bot screen-share, plus any future channel that uploads images). Requires the agent's model to support vision (model_has_vision check). Default false; new calls pay zero token cost until the operator opts in. OMIT to leave the vision flag unchanged.
`voice_greeting`	No	Opening line the agent speaks when the call connects. Pass an empty string "" to clear. Omit or null leaves unchanged.
`voice_stt_model`	No	Speech-to-text model: 'flux' (alias for flux-general-en), 'flux-general-en' (English Flux, LLM-powered end-of-turn), 'flux-general-multi' (multilingual Flux), or 'nova-3' (silence-based fallback). Flux variants are more responsive; nova-3 is the fallback when your Deepgram plan lacks Flux. OMIT to leave the STT model unchanged.
`voice_tts_speed`	No	TTS playback speed multiplier (0.5-2.0, default 1.0). Yandex/OpenAI/Cartesia only — ignored for Deepgram.
`voice_tts_voice`	No	TTS voice id — provider-specific (e.g. 'aura-2-thalia-en' for Deepgram, 'alloy' for OpenAI, 'alena' for Yandex, Cartesia voice UUID). Pass null to revert to provider default.
`auto_reply_rules`	No	Plain-English rules injected into the fast model's system prompt as a `## Rules` block. No reserved keywords — the fast model reads them as guidance and decides per turn whether to reply directly or escalate to the main model for tools. Example: '- If the user greets, reply "Hi! How can I help?"\n- If the user asks what you can do, reply with a 1-sentence summary\n- If the question needs live data (prices, stock, booking), escalate' Engagement filtering (SKIP) belongs in trigger `conditions` (keywords, ai_filters, channel_types, cooldown), NOT here — if a message should be ignored the trigger shouldn't have fired. Pass null to clear.
`voice_max_tokens`	No	Max TTS tokens per voice reply (40-200, default 100). Lower = snappier, higher = more detail.
`include_job_state`	No	Include current job state (active job context, tasks, notes) in the agent's prompt. OMIT to leave this flag unchanged.
`include_situation`	No	Include situation context (channel, sender info, trigger type) in the agent's prompt. OMIT to leave this flag unchanged.
`voice_stt_keyterms`	No	Domain-vocab bias for STT — names, product SKUs, etc. Passed verbatim as repeated `&keyterm=<w>` query params. Works on both Nova-3 and Flux. Prefer short phrases over full sentences. Empty list [] = no bias. Omit leaves unchanged.
`voice_stt_language`	No	STT language hint. 'multi' (default) enables code-switching; singletons like 'en', 'ru', 'es' give higher accuracy when the caller language is known. Use 'multi' for bilingual callers. OMIT to leave the STT language unchanged.
`voice_tts_language`	No	TTS language code in short BCP-47 form, e.g. 'en', 'es', 'pt-BR' (Cartesia only, default 'en').
`voice_tts_provider`	No	Text-to-speech provider: 'deepgram' (default, Aura-2 EN-only), 'openai' (multilingual), 'yandex' (best Russian), or 'cartesia' (Sonic-3 ultra-low TTFB). OMIT to leave the TTS provider unchanged.
`include_specialists`	No	Inject a [SPECIALISTS] block (~50–200 tokens) listing the workspace's delegation-capable agents so a router-style agent can pick a handoff target without first calling agents.list. Default OFF for new agents; the Router template ships with this ON. Agentic mode only. OMIT to leave this flag unchanged.
`voice_primary_model`	No	Primary LLM for voice turns (e.g. 'gpt-4.1-mini', 'claude-haiku-4-5-20251001'). gpt-4.1-nano is too weak for reliable turn tracking; mini is the recommended floor. Pass null to revert to default.
`fast_prompt_override`	No	Full fast-path prompt override. Placeholders substituted via .replace(): {message}, {history}, {rules}, {tools}, {output_contract}. agent.prompt_text is NOT injected into fast_prompt_override — include it yourself if you want it. Pass null to clear.
`voice_filler_enabled`	No	Emit 'thinking' filler audio while tools run so the caller hears life on the line (default true). OMIT to leave this flag unchanged.
`voice_max_tool_calls`	No	Max tool calls per voice turn (1-10, default 3). OMIT to leave unchanged.
`voice_thinking_texts`	No	Pool of phrases spoken while the agent sets up the turn before calling the LLM (e.g. ['Hmm', 'So', 'One sec']). Pre-rendered to PCM at call start; one is picked at random per turn so the agent doesn't repeat the same word. Pass [] to clear. Omit or null leaves unchanged.
`include_learned_style`	No	Include learned communication style (per-contact tone, dormancy state) in the agent's prompt. OMIT to leave this flag unchanged.
`include_thread_summary`	No	Include condensed summary of older thread messages in the agent's prompt. OMIT to leave this flag unchanged.
`include_factual_accuracy`	No	Inject the Factual Accuracy block (~100 tokens, generic anti-hallucination rules) into the system prompt. Default OFF — skip if you write domain-specific accuracy rules in Instructions. Agentic mode only. OMIT to leave this flag unchanged.
`knowledge_collection_ids`	No	Replace all knowledge collections with these IDs (empty list = clear all)
`include_safety_guidelines`	No	Inject the generic Safety Guidelines block (~80 tokens) into the system prompt. Default OFF — enable only if you don't already write safety rules in your Instructions. Agentic mode only. OMIT to leave this flag unchanged.
`include_tool_call_history`	No	Include the agent's own tool calls and results from the last 3 runs on this thread, compacted to IDs + top hits (~200-1000 tokens). Lets the agent recall file IDs, search hits, and decisions it already made across turns. Default ON. Agentic mode only. OMIT to leave this flag unchanged.
`voice_endpointing_min_delay`	No	Silence after end-of-utterance before agent replies (0.1-2.0s, default 0.3). Higher = fewer false interrupts; lower = snappier.
`voice_preemptive_generation`	No	Speculatively start the LLM on STT partials so the agent begins responding before end-of-utterance. Matches LiveKit stock template. Default true. OMIT to leave this flag unchanged.
`include_conversation_history`	No	Include recent messages from this thread (up to 20) in the agent's prompt. OMIT to leave this flag unchanged.
`include_data_confidentiality`	No	Inject the Data Confidentiality block (~250 tokens, cross-contact PII isolation + prompt-injection defense) into the system prompt. Recommended for multi-tenant workspaces. Default OFF. Agentic mode only. OMIT to leave this flag unchanged.
`voice_greeting_interruptible`	No	Allow the caller to barge in during the opener TTS. Default true (trial-friendly — long greetings can be interrupted). Set false on outbound-call agents whose configured opener would otherwise get preempted by the caller's 'Hello?' triggering an off-script auto-turn. OMIT to leave this flag unchanged.
`voice_interruption_min_duration`	No	Min caller speech duration to interrupt the agent (0.1-1.5s, default 0.25). Higher = ignore short fillers like 'uh-huh'.
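
The `fast_prompt_override` row above says placeholders are substituted via `.replace()`. A minimal sketch of what that could look like, assuming plain Python string replacement: only the placeholder names come from the parameter description, while `render_fast_prompt` and the sample values are hypothetical and not DialogBrain's actual code.

```python
# Hypothetical sketch of placeholder substitution for fast_prompt_override.
# Only the placeholder names are taken from the parameter description;
# the function and its arguments are illustrative assumptions.

PLACEHOLDERS = ("message", "history", "rules", "tools", "output_contract")


def render_fast_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each known {placeholder} with its value using str.replace()."""
    rendered = template
    for name in PLACEHOLDERS:
        # In this sketch, a placeholder with no supplied value becomes "".
        rendered = rendered.replace("{" + name + "}", values.get(name, ""))
    return rendered


# agent.prompt_text is not injected into the override, so the template
# restates the agent's role itself.
template = (
    "You are a booking assistant.\n"
    "## Rules\n{rules}\n"
    "History:\n{history}\n"
    "User: {message}"
)

prompt = render_fast_prompt(
    template,
    {
        "rules": "- If the user greets, reply briefly",
        "history": "(none)",
        "message": "Hi there",
    },
)
print(prompt)
```

Because `str.replace()` is literal, other braces in the template (for example a JSON snippet) are left untouched, which would not be true of `str.format()`.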

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries full burden. It is highly transparent, noting that all parameters are optional and only provided fields are updated, and includes warnings for destructive actions like prompt_text replacement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear opening sentence and bulleted list of use cases. It is appropriately sized for a complex tool with 43 parameters, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and lack of output schema, the description is highly complete, covering all update scenarios and parameter behaviors. It effectively compensates for missing annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by grouping use cases and explaining parameter intent (e.g., priority explanation), going beyond the schema's per-parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing AI agent's configuration.' It provides a specific verb ('Update') and resource ('AI agent'), distinguishing it from siblings like agents_create or agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists many use cases with bullet points, providing clear context on when to use the tool. However, it does not explicitly mention when not to use it or compare with alternatives, though the sibling list offers implicit differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_filters_create A

Create a new AI filter for semantic intent-based message matching.

AI filters use vector embeddings (via Voyage AI) to detect whether an incoming message matches a specific intent or topic. The filter's description is embedded as a reference vector at creation time. When a message arrives, its embedding is compared against this reference using cosine similarity.

The description field is the most important part — it becomes the reference embedding that all incoming messages are compared against. Write it as a clear statement of what kind of messages should match:

- 'Customer asking about pricing, subscription plans, or billing'
- 'User reporting a bug, crash, or unexpected behavior in the product'
- 'Inbound sales lead expressing interest in purchasing or trialing'

The threshold controls sensitivity: 0.5 is a balanced default, lower values (0.3) cast a wider net, higher values (0.8) require closer matches.

Note: This tool calls the Voyage AI embedding API to generate the reference vector.
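
The matching rule described above can be sketched in a few lines. This is a minimal illustration that assumes tiny hand-written vectors in place of real Voyage AI embeddings; `cosine_similarity` and `filter_matches` are hypothetical helper names, and the `>=` comparison follows the `ai_filters_test` description.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # this sketch treats a zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def filter_matches(message_vec: list[float], reference_vec: list[float],
                   threshold: float = 0.5) -> bool:
    """A message matches when its similarity reaches the threshold."""
    return cosine_similarity(message_vec, reference_vec) >= threshold


# Stand-in vectors (assumptions): real embeddings have far more dimensions.
reference = [0.9, 0.1, 0.3]    # embedded filter description
on_topic = [0.8, 0.2, 0.4]     # points almost the same way as the reference
borderline = [0.3, 0.9, 0.1]   # only partly aligned
off_topic = [0.0, 1.0, 0.0]    # nearly orthogonal

for label, vec in [("on_topic", on_topic), ("borderline", borderline),
                   ("off_topic", off_topic)]:
    print(label, filter_matches(vec, reference, 0.3),
          filter_matches(vec, reference, 0.5))
```

With these stand-in vectors, the borderline message matches at a threshold of 0.3 but not at 0.5, which is the "wider net" effect the description refers to.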

Parameters

Name	Required	Description
`name`	Yes	Filter name — a short, human-readable label (max 100 chars)
`threshold`	No	Cosine similarity threshold for a message to be considered a match. Range 0.1–1.0. Default 0.50. Lower values (e.g. 0.3) are more permissive and catch more messages. Higher values (e.g. 0.8) require closer semantic similarity.
`description`	Yes	Reference text that defines what messages should match this filter. This text is embedded as a vector and used for cosine similarity comparison against all incoming messages. Be specific and descriptive — the quality of this text directly determines filter accuracy. E.g. 'Customer asking about pricing, subscription costs, or billing issues'. Max 500 chars.

Tool Definition Quality

A 4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries the full burden. It discloses that the tool calls the Voyage AI embedding API to generate the reference vector, which is valuable. However, it does not mention potential side effects (e.g., cost, rate limits, authentication needs, or what happens on failure). The threshold behavior is well explained, but details about the creation process (e.g., immediate activation) are missing.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: a concise summary, then technology explanation, usage tips, threshold guidance, and a note about the external API call. It is informative without being overly verbose. A slight trimming of some sentences could improve conciseness, but overall it is efficient for the tool's complexity.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has three parameters (all well-documented in schema) and no output schema, the description covers purpose, mechanism, parameter importance, and external API dependency. However, it does not explain the return value (likely a filter identifier) or state whether creation is asynchronous or synchronous. Prerequisites (e.g., API key) are not mentioned. For a creation tool, it is fairly complete but leaves some gaps.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes all three parameters (name, threshold, description) with 100% coverage. The description adds significant value by stressing the importance of the description field, providing concrete examples, and explaining threshold values and their impact. This goes beyond the schema and helps the agent craft proper inputs.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: creating an AI filter for semantic intent-based message matching. It explains the underlying technology (vector embeddings via Voyage AI) and how the filter works (embedding description as reference vector, cosine similarity comparison). This distinguishes it from sibling tools like ai_filters_list, ai_filters_update, which have different operations.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides good usage guidance: it explains when to create a filter, emphasizes the importance of the description field with examples, and advises on threshold tuning (0.5 default, lower for wider net, higher for precision). However, it does not explicitly state when not to use it or compare to alternatives like ai_filters_test or ai_filters_update, which would improve clarity.

ai_filters_delete A

Destructive, Idempotent

Permanently delete an AI filter.

When to use:

- User wants to remove a filter they no longer need

This action cannot be undone. Any triggers that reference this filter by ID will no longer match it — review and update those triggers after deletion.

Parameters

Name	Required	Description	Default
`filter_id`	Yes	ID of the filter to delete

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It clearly states the action is permanent ('cannot be undone') and warns about triggers that reference the filter by ID no longer matching, disclosing important side effects.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with two short paragraphs, front-loading the purpose and then providing necessary warnings. Every sentence adds value with no redundancy.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete operation with one required parameter and no output schema, the description is complete. It covers the action, irreversibility, and impact on triggers, which is sufficient context.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already has a description for 'filter_id' as 'ID of the filter to delete,' achieving 100% coverage. The description adds no further parameter details, so a baseline score of 3 is appropriate.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Permanently delete an AI filter,' which uses a specific verb and resource, clearly distinguishing it from sibling tools like create, list, or update.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'When to use: - User wants to remove a filter they no longer need,' giving clear context for usage. It does not provide explicit when-not-to-use or alternative tools, but the context is sufficient.

ai_filters_list A

Read-only, Idempotent

List all AI filters for the current workspace.

AI filters are semantic intent-based message filters that use embeddings (vector representations) to detect whether an incoming message matches a specific intent or topic. Unlike keyword filters, they understand meaning: 'I need help with my order' and 'my package hasn't arrived' both match a 'shipping support' filter even without shared keywords.

Each filter stores a reference embedding of its description. When a message arrives, its embedding is compared via cosine similarity against the filter's reference vector. If the similarity meets or exceeds the threshold, the filter matches.

When to use:

- Check which semantic filters already exist before creating a new one
- Get filter IDs for use in trigger conditions
- Review thresholds and active status of existing filters

Returns all filters with id, name, description, threshold, and is_active.
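
The first two use cases above can be sketched against the listed return fields. The sample records are invented for illustration, `find_filter_ids` is a hypothetical helper rather than part of the DialogBrain API, and integer IDs are an assumption (the listing does not state the ID type).

```python
def find_filter_ids(filters: list[dict], name_contains: str,
                    active_only: bool = True) -> list:
    """Return IDs of filters whose name contains the given text."""
    needle = name_contains.lower()
    return [
        f["id"]
        for f in filters
        if needle in f["name"].lower() and (f["is_active"] or not active_only)
    ]


# Invented sample data using the documented fields:
# id, name, description, threshold, is_active.
filters = [
    {
        "id": 1,
        "name": "Pricing questions",
        "description": "Customer asking about pricing, subscription plans, or billing",
        "threshold": 0.5,
        "is_active": True,
    },
    {
        "id": 2,
        "name": "Bug reports",
        "description": "User reporting a bug, crash, or unexpected behavior in the product",
        "threshold": 0.6,
        "is_active": False,
    },
]

# Check for an existing filter before creating a duplicate.
print(find_filter_ids(filters, "pricing"))
# Include disabled filters when reviewing what already exists.
print(find_filter_ids(filters, "bug", active_only=False))
```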

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A 4.7/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries full burden. It correctly implies a read-only operation ('list all'), explains the embedding-based matching mechanics, and specifies the return fields. It could mention potential limitations like pagination but is transparent overall.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear purpose, technical explanation, usage guidance, and return value specification. Every sentence is informative and non-redundant. It is concise yet comprehensive.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully explains the return fields (id, name, description, threshold, is_active). It provides sufficient context about the tool's role within the AI filtering system, making it complete for an agent to understand and use.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters, so schema coverage is trivially 100%. Per the guidelines, the baseline score for a tool with no parameters is 4. The description adds context about what the tool returns and how filters work, and no parameter documentation is needed.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The first sentence clearly states the tool lists AI filters for the workspace. It then explains what AI filters are (semantic/intent-based) and how they differ from keyword filters, making the tool's purpose specific and distinct.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'When to use' section explicitly lists three concrete scenarios: checking existing filters before creation, getting IDs for triggers, and reviewing thresholds/active status. This provides strong guidance for when this tool is appropriate versus sibling tools like ai_filters_create.

ai_filters_test A

Read-only, Idempotent

Test a message against an AI filter to check whether it would match.

This tool embeds the provided message using Voyage AI and computes the cosine similarity between the message vector and the filter's stored reference vector. It returns the similarity score, whether the message would match (similarity >= threshold), and the filter's threshold value.

Use this to:

- Verify a filter works as intended before using it in a trigger
- Tune the threshold by testing borderline messages
- Debug why a message did or did not match a filter in production

Returns: {similarity: float, matched: bool, threshold: float}

Note: This tool calls the Voyage AI embedding API to embed the test message.
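
Threshold tuning with the documented return shape might look like the sketch below. The similarity scores are invented, not real tool output, and `suggest_threshold` is a hypothetical helper: it picks the midpoint between the lowest score among messages that should match and the highest score among messages that should not.

```python
from typing import Optional, TypedDict


class FilterTestResult(TypedDict):
    """Shape of the documented return value."""
    similarity: float
    matched: bool
    threshold: float


def suggest_threshold(should_match: list[FilterTestResult],
                      should_not_match: list[FilterTestResult]) -> Optional[float]:
    """Midpoint between the two groups, or None if their scores overlap."""
    lowest_wanted = min(r["similarity"] for r in should_match)
    highest_unwanted = max(r["similarity"] for r in should_not_match)
    if highest_unwanted >= lowest_wanted:
        return None  # no single threshold separates the groups
    return (lowest_wanted + highest_unwanted) / 2


# Invented results, as if each message had been tested against a filter
# whose threshold is currently 0.5.
wanted: list[FilterTestResult] = [
    {"similarity": 0.62, "matched": True, "threshold": 0.5},
    {"similarity": 0.46, "matched": False, "threshold": 0.5},  # missed at 0.5
]
unwanted: list[FilterTestResult] = [
    {"similarity": 0.31, "matched": False, "threshold": 0.5},
    {"similarity": 0.22, "matched": False, "threshold": 0.5},
]

suggestion = suggest_threshold(wanted, unwanted)
print(suggestion)
```

If the helper returns `None`, the description text is probably the thing to refine, since no threshold can separate the two groups.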

Parameters

Name	Required	Description	Default
`message`	Yes	The message text to test. This is embedded and compared against the filter's reference vector via cosine similarity.
`filter_id`	Yes	ID of the filter to test against

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully bears the responsibility of disclosing behavior. It reveals that the tool calls the Voyage AI embedding API, computes cosine similarity, and returns similarity score, match boolean, and threshold. It does not mention rate limits, authentication needs, or potential side effects, but as a read-only test tool, the disclosed information is sufficient.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four sentences, a bulleted list, and a return format specification. It is front-loaded with the core purpose and efficiently organized, with no extraneous information.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description provides the return format and explains the entire testing process. It covers the tool's scope adequately. Minor omissions like error handling or prerequisite (filter must exist) are not critical for a well-defined test tool.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds technical context for the 'message' parameter (embedding and comparison), but the schema already describes both parameters adequately. The added value is marginal, not enough to raise the score above baseline.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tests a message against an AI filter to check for a match, explains the embedding and cosine similarity process, and uses specific verb+resource ('test message against filter'). It distinguishes itself from sibling tools like ai_filters_create by focusing on testing rather than CRUD operations.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides three specific use cases: verify filter, tune threshold, debug. However, it lacks explicit guidance on when not to use this tool (e.g., for creating or updating filters) and does not compare it to sibling testing-like tools such as agents_simulate_inbound. Nevertheless, the listed use cases offer clear context for invocation.

ai_filters_update A

Update an existing AI filter's name, description, threshold, or active state.

When to use:

- User wants to rename a filter
- User wants to refine the filter description to improve match accuracy
- User wants to adjust the similarity threshold (higher = stricter matching)
- User wants to enable or disable a filter without deleting it

Provide only the fields you want to change. At least one field is required.

Note: If the description is changed, this tool calls the Voyage AI embedding API to re-generate the reference vector with the new description text.
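
The partial-update contract above (send only changed fields, at least one required) can be sketched as a small argument builder. `build_filter_update` and `triggers_reembedding` are hypothetical client-side helpers, the string filter ID is an assumption, and the 0.1 to 1.0 range is taken from the parameter table.

```python
from typing import Any

# Sentinel that distinguishes "omitted" from an explicit value such as False.
_UNSET: Any = object()


def build_filter_update(filter_id, *, name=_UNSET, description=_UNSET,
                        threshold=_UNSET, is_active=_UNSET) -> dict:
    """Build tool arguments containing only the fields being changed."""
    candidates = {
        "name": name,
        "description": description,
        "threshold": threshold,
        "is_active": is_active,
    }
    changes = {k: v for k, v in candidates.items() if v is not _UNSET}
    if not changes:
        raise ValueError("at least one field to change is required")
    if "threshold" in changes and not 0.1 <= changes["threshold"] <= 1.0:
        raise ValueError("threshold must be within 0.1-1.0")
    return {"filter_id": filter_id, **changes}


def triggers_reembedding(args: dict) -> bool:
    """Per the note above, a changed description is re-embedded.

    This sketch only checks that the field is present in the arguments.
    """
    return "description" in args


args = build_filter_update("filter-123", is_active=False)
print(args, triggers_reembedding(args))
```

The sentinel matters for `is_active`: an explicit `False` disables the filter, while omitting the field leaves the flag unchanged.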

Parameters

Name	Required	Description
`name`	No	New filter name (max 100 chars, optional)
`filter_id`	Yes	ID of the filter to update
`is_active`	No	Enable (true) or disable (false) the filter. OMIT to leave the active flag unchanged.
`threshold`	No	New cosine similarity threshold. Range 0.1–1.0. Optional.
`description`	No	New reference description text. If changed, the Voyage AI embedding API is called to re-generate the reference vector. Max 500 chars. Optional.

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses a key behavioral trait: changing the description calls the Voyage AI embedding API to re-generate the reference vector. With no annotations provided, this adds important context. It does not mention other potential side effects or prerequisites, but the mutation is clearly stated.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and well-structured: a single opening sentence followed by bullet-point use cases and a note. Every sentence adds value with no filler.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a partial-update tool with no output schema and no annotations, the description covers the purpose, usage scenarios, and a key side effect (API call). It is sufficiently complete for an agent to understand when and how to use the tool.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all parameters. The description summarizes the fields but does not add meaning beyond what the schema already provides (e.g., ranges, optionality). Hence, it meets the baseline of 3.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Update' and the resource 'AI filter', and lists the specific attributes (name, description, threshold, active state). It distinguishes from sibling tools like create, delete, list, and test.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'When to use' section provides explicit scenarios (rename, refine description, adjust threshold, enable/disable). It also notes that at least one field must be provided. However, it does not mention when not to use or suggest alternatives, which keeps it from a perfect score.

ai_tags_add_to_thread A

Apply one or more AI tags to a thread (manually).

When to use:

- User wants to label a conversation with one or more tags
- User asks to categorize or tag a thread

Provide the thread_id (integer) and an array of tag_ids to apply. If a tag is already applied, it is updated to is_manual=true.

Parameters

Name	Required	Description	Default
`tag_ids`	Yes	Array of tag IDs to apply (1–20 IDs)
`thread_id`	Yes	ID of the thread to tag
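
The behavior described above can be modeled in memory as follows. This is a sketch under stated assumptions: `apply_manual_tags` is a hypothetical function, a thread's tags are represented as a mapping from tag ID to its `is_manual` flag, and integer tag IDs are assumed because the listing does not give their type.

```python
def apply_manual_tags(existing: dict[int, bool], tag_ids: list[int]) -> dict[int, bool]:
    """Return the thread's tags after a manual apply.

    `existing` maps tag ID -> is_manual. New tags are added as manual, and
    tags that were already applied are upgraded to is_manual=True.
    """
    if not 1 <= len(tag_ids) <= 20:
        raise ValueError("tag_ids must contain between 1 and 20 IDs")
    updated = dict(existing)  # copy, so the caller's mapping is not mutated
    for tag_id in tag_ids:
        updated[tag_id] = True
    return updated


# Tag 7 was applied automatically earlier; tag 9 is new to this thread.
before = {7: False}
after = apply_manual_tags(before, [7, 9])
print(after)
```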

Tool Definition Quality

A 3.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that already-applied tags are updated to is_manual=true, but it lacks details on error handling, permissions, and other side effects.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at 5 lines, with a front-loaded action and a clear structure. It contains no superfluous content.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, but the tool is simple, with two required parameters. The description covers the action and a key behavioral detail. This is adequate for its complexity, though potential error scenarios are missing.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal value. It repeats parameter names and types, and the 1–20 ID constraint is already in the schema description.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Apply one or more AI tags to a thread (manually)' with a specific verb and resource. It distinguishes the tool from siblings like ai_tags_remove_from_thread.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides an explicit 'When to use:' section with two scenarios. It does not state when not to use the tool or mention alternatives, but it effectively guides usage.

ai_tags_create A

Create a new AI tag (automatic message filter).

AI tags are lightweight classifiers that run on every incoming message. When a message matches the tag's description/criteria, the thread is automatically labelled — so AI agents can cheaply pre-filter threads instead of running full LLM analysis on everything. Good descriptions are the key: they tell the classifier exactly when to apply this tag.

When to use:

User wants to auto-classify incoming messages (e.g. bug reports, sales leads, support requests)
User wants to reduce AI agent costs by pre-filtering threads by topic or intent

Tips for the description field:

Be specific: 'Messages reporting errors, crashes, or unexpected behavior in the product'
Include examples of what qualifies and what doesn't

Limit: 20 active personal tags / 50 active team tags.

Parameters

Name	Required	Description
`icon`	No	Emoji icon for the tag (max 10 chars, optional)
`name`	Yes	Tag name (max 100 chars)
`color`	No	Tailwind color key for the tag badge. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to use the default color.
`description`	No	Classifier prompt: describe exactly when this tag should be applied to a thread. The more specific, the better the auto-classification accuracy. E.g. 'Messages reporting software errors, crashes, or unexpected behavior'. Max 500 chars.
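
As a rough illustration of the constraints listed above, the sketch below builds an arguments dictionary for ai_tags_create and rejects values outside the documented limits. The helper names (build_ai_tag_create_args, has_tag_capacity) and the "personal"/"team" scope keys are hypothetical and not part of DialogBrain. Length checks count Python characters, which may differ from how the server counts emoji.

```python
# Documented limits for ai_tags_create; helper names are illustrative only.
ALLOWED_COLORS = {"amber", "blue", "green", "red", "purple", "yellow", "slate"}
TAG_LIMITS = {"personal": 20, "team": 50}  # active tags per scope (scope keys assumed)


def has_tag_capacity(active_count, scope):
    """Return True if one more active tag fits under the documented limit."""
    return active_count < TAG_LIMITS[scope]


def build_ai_tag_create_args(name, description=None, icon=None, color=None):
    """Build an arguments dict, omitting optional fields that were not given."""
    if not name or len(name) > 100:
        raise ValueError("name is required and must be at most 100 chars")
    args = {"name": name}
    if description is not None:
        if len(description) > 500:
            raise ValueError("description must be at most 500 chars")
        args["description"] = description
    if icon is not None:
        if len(icon) > 10:
            raise ValueError("icon must be at most 10 chars")
        args["icon"] = icon
    if color is not None:
        # Omitting color entirely selects the default badge color.
        if color not in ALLOWED_COLORS:
            raise ValueError(f"color must be one of {sorted(ALLOWED_COLORS)}")
        args["color"] = color
    return args
```

Leaving color out of the dictionary, rather than sending an empty value, follows the table's instruction to omit the field for the default color.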

Tool Definition Quality

A 4.4/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries the full burden. It discloses that AI tags are lightweight classifiers that run on every incoming message, that labels are applied automatically, and mentions limits (20 personal/50 team). Does not mention immediate activation or side effects, but the creation action is straightforward.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: starts with a one-liner, then explains the concept, provides usage guidance, tips, and limits. It is front-loaded with the main action and every sentence adds unique value without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (4 parameters, no output schema), the description adequately covers purpose, usage, and parameter tips. It lacks explicit return value information, but for a creation tool, the behavior is clear enough for an AI agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 4 parameters. The description adds value by explaining the 'description' parameter as a classifier prompt with tips and examples, going beyond the schema. Icon and color parameters are not elaborated beyond schema, but the overall parameter information is clear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Create a new AI tag (automatic message filter)', clearly stating the verb and resource. It explains what AI tags are and distinguishes from sibling tools like ai_tags_delete, ai_tags_list, and ai_filters_create by emphasizing auto-classification of incoming messages to pre-filter threads.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a 'When to use' section explicitly stating two use cases: auto-classifying incoming messages and reducing AI agent costs. Provides tips for the description field but lacks explicit when-not-to-use or alternative tools, though the context implies this is for lightweight classification rather than complex filtering.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_delete A

Destructive, Idempotent

Delete a personal AI tag. All thread associations are removed automatically.

When to use:

User wants to permanently remove a tag they no longer need

This cannot be undone. Threads are NOT deleted — they just lose this tag.

Parameters

Name	Required	Description	Default
`tag_id`	Yes	ID of the tag to delete

Tool Definition Quality

A 4.4/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description fully discloses key behaviors: automatic removal of thread associations, irreversibility, and that threads are not deleted. This meets the full burden for transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: three short sentences effectively convey purpose, usage context, and behavioral notes. No filler, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool (1 param, no output schema) and full schema coverage, the description adequately covers purpose and side effects. Slight lack of return value info, but not critical for a delete operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (tag_id described as 'ID of the tag to delete'). The description adds no additional context about the parameter, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete a personal AI tag') and resource (tag). It distinguishes itself from sibling tools like ai_tags_add_to_thread and ai_tags_create by specifying deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a 'When to use' context ('user wants to permanently remove a tag'), but lacks explicit alternatives or when-not-to-use guidance. The behavioral notes (automatic thread association removal, irreversibility) help, but no direct comparison with sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_list A

Read-only, Idempotent

List all personal AI tags.

AI tags are automatic message filters: the system runs a lightweight classifier on every incoming message and applies matching tags to threads. This lets AI agents skip expensive full analysis on most messages — they only act on threads that match relevant tags, dramatically cutting LLM costs.

When to use:

Check which auto-classification filters exist before creating one
Get tag IDs for add_to_thread / remove_from_thread
See how many threads each tag currently matches

Returns all tags with thread counts (non-archived, included threads only).

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A 4.6/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description fully explains behavior: lightweight classifier runs on every incoming message, tags are applied automatically, and it returns thread counts. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with separate sections, front-loaded with the main action. A few sentences could be trimmed, but overall efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains return values (tags with thread counts) and provides sufficient context about the AI tags feature, making it complete for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has zero parameters, so schema coverage is 100%. Description adds value by explaining the return data (tags with thread counts) and the broader context of the system.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List all personal AI tags' and explains the purpose of AI tags as automatic message filters, distinguishing it from sibling tools like ai_tags_create or ai_tags_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use' bullets: check filters before creating, get tag IDs for add/remove, and see thread counts. Lacks explicit when-not-to-use but is clear in context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_remove_from_thread A

Destructive, Idempotent

Remove a specific AI tag from a thread.

When to use:

User wants to un-label or remove a specific tag from a conversation
User wants to correct an incorrectly applied tag

Provide both thread_id and tag_id.

Parameters

Name	Required	Description	Default
`tag_id`	Yes	ID of the tag to remove
`thread_id`	Yes	ID of the thread to remove the tag from

Tool Definition Quality

A 4.0/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must bear the burden. It states the core behavior but does not disclose side effects, permissions, or error conditions. It is adequate for a simple removal but lacks detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, with a clear structure: main action followed by a bulleted 'When to use' list. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal tool, the description covers purpose, usage context, and inputs. Lacks mention of return behavior or error cases, but overall sufficient given the simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds no extra meaning beyond the schema, only reinforcing the need for both IDs. Baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove') and the resource ('a specific AI tag from a thread'). It distinguishes from sibling tools like ai_tags_add_to_thread (opposite) and ai_tags_delete (deletes tag definition, not association).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'When to use' section provides explicit contexts: un-labeling or correcting tags. It requires both IDs. However, it does not mention when not to use or direct to alternatives like ai_tags_delete for deleting the tag entirely.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_update A

Update an existing personal AI tag's name, description, icon, color, or active state.

When to use:

User wants to rename a tag
User wants to change a tag's icon, color, or description
User wants to enable or disable a tag

Provide only the fields you want to change. At least one field is required.

Parameters

Name	Required	Description
`icon`	No	New emoji icon (max 10 chars, optional)
`name`	No	New tag name (max 100 chars, optional)
`color`	No	New color key. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to leave the color unchanged.
`tag_id`	Yes	ID of the tag to update
`is_active`	No	Enable (true) or disable (false) the tag. OMIT to leave the active flag unchanged.
`description`	No	New LLM hint (max 500 chars; empty string clears it, optional)
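
The partial-update rule ("provide only the fields you want to change", with at least one required) can be sketched as follows. This is a minimal illustration, not DialogBrain's implementation: the helper name build_ai_tag_update_args and the _UNSET sentinel are hypothetical, and the string tag ID in the usage line is a placeholder since the listing does not state the ID type.

```python
# Sentinel that distinguishes "field not supplied" from meaningful values
# such as False (disable the tag) or "" (clear the description).
_UNSET = object()


def build_ai_tag_update_args(tag_id, *, name=_UNSET, description=_UNSET,
                             icon=_UNSET, color=_UNSET, is_active=_UNSET):
    """Return an arguments dict containing tag_id plus only the changed fields."""
    candidates = {
        "name": name,
        "description": description,
        "icon": icon,
        "color": color,
        "is_active": is_active,
    }
    changes = {key: value for key, value in candidates.items() if value is not _UNSET}
    if not changes:
        raise ValueError("at least one field to change is required")
    return {"tag_id": tag_id, **changes}


# Disable a tag without touching its other fields (placeholder ID).
disable_args = build_ai_tag_update_args("tag-123", is_active=False)
```

A sentinel is used instead of None so that an empty description string, which the table says clears the field, is still sent rather than silently dropped.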

Tool Definition Quality

A 4.2/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool updates tags and lists fields, implying partial update. But it lacks details on side effects, error conditions, permissions, or what happens if tag_id is invalid. Basic behavior is clear but incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two clear paragraphs. The first states the action and updatable fields, the second provides usage scenarios. No superfluous words; structure is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately covers purpose and usage. It explains partial updates and required fields. However, it could mention that tag_id must be obtained from ai_tags_list or similar, but overall it is fairly complete for an update tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover all parameters (100% coverage). The description adds value by grouping fields and emphasizing partial update ('Provide only the fields you want to change'), which reinforces the optional nature beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool updates a personal AI tag's name, description, icon, color, or active state. It lists specific use cases, clearly distinguishing it from siblings like ai_tags_create (create) and ai_tags_delete (delete).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear 'When to use' list covering rename, change icon/color/description, and enable/disable. It notes that only fields to change should be provided and at least one field is required. However, it does not explicitly exclude scenarios or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_attach_identity A

Read-only, Idempotent

Switch the page's identity by loading saved cookies + storage. Use only when switching identity mid-page; for first navigation, pass identity_name to browser.open instead.

Parameters

Name	Required	Description	Default
`page_id`	Yes
`identity_name`	Yes

Tool Definition Quality

A 3.8/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It describes the core behavior (loading saved cookies and storage), but does not disclose potential side effects (e.g., whether the page reloads, if current state is lost, or any restrictions). More detail would improve transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence for purpose, followed by a usage guideline sentence. No wasted words, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with only two parameters and no output schema or annotations, the description could be more complete. It covers purpose and usage but omits parameter descriptions. Considering its simplicity, the description is adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, meaning no parameter descriptions. The tool's description does not elaborate on the two required parameters (page_id and identity_name). With no schema descriptions and no added parameter semantics, the agent has insufficient guidance for parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Switch the page's identity by loading saved cookies + storage.' It uses a specific verb and resource, and distinguishes itself from the sibling tool browser.open by noting that for first navigation, identity_name should be passed to browser.open instead.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('only when switching identity mid-page') and when not to ('for first navigation, pass `identity_name` to browser.open instead'), providing a clear alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_click B

Read-only, Idempotent

Click an element. ref is either an aria-ref token from browser.snapshot ('e7') OR a CSS selector ('button.submit'). Prefer the aria-ref token.

Parameters

Name	Required	Description	Default
`ref`	Yes
`page_id`	Yes
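
The stated preference for the aria-ref token over a CSS selector could be encoded in a small helper like the one below. This is only a sketch: build_click_args is a hypothetical name, and the page ID value in the usage line is a placeholder, since the listing does not document the format of page_id.

```python
def build_click_args(page_id, aria_ref=None, css_selector=None):
    """Build browser_click arguments, preferring the aria-ref token when both are known."""
    ref = aria_ref or css_selector  # aria-ref wins; the selector is the fallback
    if not ref:
        raise ValueError("either aria_ref or css_selector is required")
    return {"page_id": page_id, "ref": ref}


# Both references are available, so the aria-ref token 'e7' is used.
click_args = build_click_args("page-1", aria_ref="e7", css_selector="button.submit")
```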

Tool Definition Quality

B 3.3/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full responsibility for behavior. It only states 'Click an element,' omitting details like whether it waits for the element, error handling, or side effects. The minimal description does not adequately disclose behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, consisting of one informative sentence and a brief clarification. No extraneous information is included, making it efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple click action, the description covers the core function but lacks details on prerequisites (e.g., page_id context) and behavior. It is minimally sufficient but not fully complete given the lack of output schema or annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning for 'ref' by defining it as either an aria-ref token from browser.snapshot or a CSS selector, with an example of each, but it does not explain 'page_id.' Schema coverage is 0%, so the description partially compensates but leaves one parameter undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool clicks an element, using a verb and resource. It distinguishes from sibling tools like browser_fill or browser_hover by specifying the click action. The explanation of 'ref' as an aria-ref token or CSS selector adds precision.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, such as browser_fill for text fields or browser_hover for hovering. The simple description leaves the agent without explicit context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_close B

Read-only, Idempotent

Close a page opened by browser.open.

Parameters

Name	Required	Description	Default
`page_id`	Yes

Tool Definition Quality

B 3.4/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description should disclose behavioral traits. It only states the action without mentioning side effects, resource cleanup, or error handling (e.g., what if page_id is invalid).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no unnecessary words, perfectly concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple single-parameter tool, the description is adequate but lacks context about prerequisites (page must have been opened) and behavior after closing. No output schema exists, so return value is unexplained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description does not explain the 'page_id' parameter beyond its name, missing the opportunity to clarify it comes from browser.open.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly says 'Close a page opened by browser.open,' clearly stating the verb and resource, and distinguishes it from sibling tools like browser_open which opens pages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage after browser.open but provides no explicit when-to-use or alternative guidance. Siblings exist for other browser actions, but no exclusions are stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_console_messages A

Read-only, Idempotent

Return console.log/warn/error events captured since the last drain. Filter by level ('log'|'info'|'warning'|'error'|'debug') and/or pattern (regex). Buffer caps at 500 entries; oldest are dropped first. Set clear=false to peek without draining.

Parameters

Name	Required	Description	Default
`clear`	No
`level`	No
`page_id`	Yes
`pattern`	No
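
The drain, peek, and cap behavior described above can be modeled locally to make the semantics concrete. The class below is a sketch under stated assumptions, not the server's implementation: ConsoleBuffer and its method names are hypothetical, pattern matching uses an unanchored regex search, and a draining read is assumed to clear the whole buffer rather than only the entries that matched the filters (the listing does not say which).

```python
import re
from collections import deque


class ConsoleBuffer:
    """Local model of a capped console buffer with drain-or-peek reads."""

    def __init__(self, cap=500):
        # deque(maxlen=...) silently drops the oldest entry once the cap is hit.
        self._entries = deque(maxlen=cap)

    def record(self, level, text):
        self._entries.append({"level": level, "text": text})

    def read(self, level=None, pattern=None, clear=True):
        regex = re.compile(pattern) if pattern else None
        matched = [
            entry for entry in self._entries
            if (level is None or entry["level"] == level)
            and (regex is None or regex.search(entry["text"]))
        ]
        if clear:
            # Assumption: draining empties the entire buffer, not just matches.
            self._entries.clear()
        return matched
```

Calling read(clear=False) corresponds to peeking: the entries stay available for a later draining read.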

Tool Definition Quality

A 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description discloses key behaviors: buffer limit of 500 entries, FIFO dropping, drain behavior controlled by the clear flag. It explains filtering semantics but does not cover page_id requirements or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short sentences, with the first delivering the core purpose and the rest covering filtering, the buffer cap, and the clear flag. It is front-loaded and free of extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers input parameters and behavioral notes, but it omits the return value format and does not specify what the output looks like. Given no output schema, this is a gap. The overall complexity is moderate, so more detail on output would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description explains three of four parameters: level (lists enum values), pattern (regex), and clear (peek behavior). The required page_id parameter is not mentioned, but its role is inferable from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns console events ('console.log/warn/error') captured since last drain, with filtering options. This is a specific verb-resource combination that distinguishes it from sibling browser tools like browser_click or browser_snapshot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides concrete usage tips: filter by level and pattern, a buffer cap note, and the clear flag for peeking. However, it does not explicitly compare to other browser diagnostic tools or specify when not to use it, leaving some room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_drag A

Read-only, Idempotent

Drag one element onto another. source_ref is the element to grab; target_ref is where to drop. Both are CSS selectors. Used for slider captchas, kanban, drag-and-drop uploads.

Parameters

Name	Required	Description	Default
`page_id`	Yes
`source_ref`	Yes
`target_ref`	Yes

Tool Definition Quality

A 4.2/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the core drag action but lacks details on prerequisites (e.g., element must be draggable), side effects, or error states. No annotations provided, so description carries burden; basic transparency achieved.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: action, parameter explanation, and use cases. No redundant information, well front-loaded, and easily scannable by an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, no annotations, and only 3 simple parameters, the description adequately covers the tool's behavior and parameter meanings. Minor gap: no mention of prerequisites or limitations, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 3 parameters with no descriptions (0% coverage). Description explains source_ref and target_ref as CSS selectors and their roles, adding significant meaning beyond schema. Does not explain page_id, but covers majority.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Drag one element onto another' and specifies source_ref and target_ref as CSS selectors, with concrete use cases like slider captchas and kanban, distinguishing it from sibling browser actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly mentions use cases (slider captchas, kanban, drag-and-drop uploads), providing context for when to use. Does not discuss when not to use or alternatives, but sufficient for purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_evaluateA

Read-onlyIdempotent

Run JavaScript in the page context and return the result. Use for state not in the a11y tree, captcha iframe inspection, DOM events. Expression can be a value (e.g., 'document.title') or an arrow function ((arg) => ...) — pass arg via the arg parameter. Result is JSON-serialized; non-serializable values become strings. 256KB cap on output.

ParametersJSON Schema

Name	Required	Description	Default
`arg`	No
`page_id`	Yes
`expression`	Yes
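
The two expression forms described above might be passed as follows. This is a minimal sketch of the argument payload only; `evaluate_args` is a hypothetical helper and the `page_id` value is a placeholder.

```python
def evaluate_args(page_id, expression, arg=None):
    """Build browser_evaluate arguments; arg is only sent when given."""
    args = {"page_id": page_id, "expression": expression}
    if arg is not None:
        args["arg"] = arg
    return args

# Form 1: a plain value expression.
title_call = evaluate_args("page-123", "document.title")

# Form 2: an arrow function that receives the value of `arg`.
count_call = evaluate_args(
    "page-123",
    "(sel) => document.querySelectorAll(sel).length",
    arg="iframe",
)
```

Because the result is JSON-serialized, an expression that returns a number, string, or plain object is the safest choice; anything else would come back as a string according to the description.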

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description discloses important behaviors: result JSON-serialization, non-serializable values become strings, and 256KB output cap. However, it omits potential side effects of running arbitrary JavaScript and does not clarify the execution context (page vs isolated environment).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: five short sentences. It is front-loaded with the primary purpose, then provides examples and output behavior. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema, the description covers return value format and size cap. It lacks details on error handling, security, or what happens with invalid expressions, but overall it is sufficient for many use cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description adds meaning by explaining that 'expression' can be a value or arrow function and that 'arg' passes an argument. It does not explain 'page_id' (presumably a browser tab identifier) or that it comes from browser_open, leaving ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs JavaScript in the page context and returns the result. It distinguishes itself from sibling browser tools by naming use cases (state not in the a11y tree, captcha iframe inspection, and DOM events) that click/fill/snapshot tools do not cover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (state not in a11y tree, captcha, DOM events). It implicitly differentiates from alternatives but does not explicitly state when not to use it or mention alternatives like browser_snapshot.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_file_uploadA

Read-onlyIdempotent

Attach files to an <input type=file>. Pass either local_paths (absolute host paths) or data (list of {name, mime, base64} blobs written to /tmp). 25MB cap per file.

ParametersJSON Schema

Name	Required	Description	Default
`ref`	Yes
`data`	No
`page_id`	Yes
`local_paths`	No
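
A sketch of how the `data` form could be assembled from raw bytes. Two points are assumptions rather than documented behavior: the 25MB cap is read here as 25 MiB, and supplying both `local_paths` and `data` is rejected because the description does not say what the tool does in that case. `make_blob` and `upload_args` are hypothetical helpers.

```python
import base64

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB

def make_blob(name, mime, content):
    """Encode raw bytes as one {name, mime, base64} entry for `data`."""
    if len(content) > MAX_BYTES:
        raise ValueError(f"{name} exceeds the per-file cap")
    return {
        "name": name,
        "mime": mime,
        "base64": base64.b64encode(content).decode("ascii"),
    }

def upload_args(page_id, ref, local_paths=None, data=None):
    """Build browser_file_upload arguments with exactly one file source."""
    if (local_paths is None) == (data is None):
        raise ValueError("pass exactly one of local_paths or data")
    args = {"page_id": page_id, "ref": ref}
    if local_paths is not None:
        args["local_paths"] = list(local_paths)
    else:
        args["data"] = list(data)
    return args

args = upload_args(
    "page-123",            # placeholder page id
    "input[type=file]",    # placeholder selector for the file input
    data=[make_blob("note.txt", "text/plain", b"hello")],
)
```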

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the 25MB cap and the blob format written to /tmp, but omits error handling, prerequisites (page must have file input), and behavior when both parameters are supplied. This is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose. Every sentence adds essential information: purpose, usage methods, and a constraint. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (two upload methods, size limit, required parameters), the description covers core functionality but lacks details on error handling, element selection (ref), and prerequisites (e.g., page must be open with a file input). It is partially complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description explains local_paths (absolute host paths) and data (list of {name, mime, base64} blobs), but does not describe page_id or ref. It adds value for two of four parameters, partially compensating for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool attaches files to an <input type=file>, which is a specific verb and resource. It distinguishes itself from sibling browser tools (e.g., browser_fill, browser_click) by focusing on file upload.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: it is for attaching files to a file input, and specifies two methods (local_paths or data). It does not explicitly exclude alternatives or provide when-not-to-use guidance, but the context is sufficient for most agents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fillB

Read-onlyIdempotent

Fill an input or textarea with the given value. ref is either an aria-ref token from browser.snapshot ('e7') OR a CSS selector ('input[name=email]'). Prefer the aria-ref token — it's stable and matches exactly what snapshot returned.

ParametersJSON Schema

Name	Required	Description	Default
`ref`	Yes
`value`	Yes
`page_id`	Yes
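
Since `ref` accepts two kinds of value, a caller may want to tell them apart before sending. The sketch below assumes aria-ref tokens look like the single example given ('e7'), that is, the letter e followed by digits; the real token format is not documented here, so treat the pattern as a guess.

```python
import re

# Assumption: generalized from the one example token 'e7'.
ARIA_REF = re.compile(r"e\d+")

def ref_kind(ref):
    """Classify a ref as an aria-ref token or a CSS selector."""
    return "aria-ref" if ARIA_REF.fullmatch(ref) else "css"

def fill_args(page_id, ref, value):
    """Build browser_fill arguments (page_id is a placeholder value)."""
    return {"page_id": page_id, "ref": ref, "value": value}

preferred = fill_args("page-123", "e7", "ada@example.com")
fallback = fill_args("page-123", "input[name=email]", "ada@example.com")
```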

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits. It does not mention whether the field is cleared first, if change events are triggered, error handling, or synchronization behavior. This is insufficient for an agent to understand side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three short sentences with illustrative examples. Every word is necessary and the purpose is front-loaded. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and lack of output schema, the description could still be more complete. It omits whether the value replaces existing content and whether it works on all input types, and it does not say how the tool differs from browser_type. Adequate but with clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. The 'ref' parameter is explained as either an aria-ref token from browser.snapshot or a CSS selector, with an example of each. However, 'page_id' and 'value' are not described beyond their names, leaving the role of 'page_id' ambiguous.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool fills an input/textarea with a value and explains the 'ref' parameter as an aria-ref token or a CSS selector. However, it does not differentiate from sibling tools like browser_type or browser_select_option, so it's not fully distinctive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., browser_type, browser_select_option). The description lacks context on typical use cases or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fill_formA

Read-onlyIdempotent

Fill multiple form fields in one call. fields is a list of {ref, value} dicts. ref is a CSS selector; value is a string (text) or boolean (checkbox). Saves N round-trips vs calling browser.fill repeatedly.

ParametersJSON Schema

Name	Required	Description	Default
`fields`	Yes
`page_id`	Yes
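
One way to assemble the `fields` list, sketched under the assumption that values must be exactly a string or a boolean as the description states. `form_fields` is a hypothetical helper and the selectors are placeholders.

```python
def form_fields(values):
    """Turn a {selector: value} mapping into a list of {ref, value} dicts."""
    fields = []
    for ref, value in values.items():
        if not isinstance(value, (str, bool)):
            raise TypeError(f"{ref}: value must be str or bool")
        fields.append({"ref": ref, "value": value})
    return fields

args = {
    "page_id": "page-123",  # placeholder page id
    "fields": form_fields(
        {
            "input[name=email]": "ada@example.com",  # text field
            "input[name=terms]": True,               # checkbox
        }
    ),
}
```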

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the burden. It explains that 'fields' is a list of {ref, value} dicts with ref as CSS selector and value as string/boolean. It does not detail error handling or side effects, but for a form fill operation it is sufficiently transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of three concise sentences. The first sentence states the purpose, the second explains input format, and the third provides a benefit. It is front-loaded and wastes no words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, but the description adequately covers input and motivation. Given the complexity (filling form fields) and the presence of sibling tools like 'browser_fill', the description is complete enough for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, so the description compensates. It explains the structure of 'fields' (list of {ref, value} dicts) and the types (CSS selector for ref, string/boolean for value). The 'page_id' parameter is not elaborated but is likely clear from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Fill multiple form fields in one call', clearly indicating the verb (fill) and the resource (form fields). It implicitly distinguishes itself from the sibling tool 'browser_fill' which fills a single field.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it 'Saves N round-trips vs calling browser.fill repeatedly', offering clear guidance to use this tool when filling multiple fields. However, it does not explicitly state when not to use it or mention alternatives beyond 'browser.fill'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_handle_dialogA

Read-onlyIdempotent

Respond to a pending JS dialog (alert/confirm/prompt). Pass accept=true for OK or false for Cancel. For prompt() dialogs also pass prompt_text. Dialogs are queued at page-open time; returns {pending: false} if none is waiting.

ParametersJSON Schema

Name	Required	Description	Default
`accept`	Yes
`page_id`	Yes
`prompt_text`	No
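
A minimal sketch of building the arguments and reading the one documented response. Only `{pending: false}` is described above; the shape of a successful response is not, so `had_dialog` simply treats anything else as "a dialog was handled", which is an assumption.

```python
def dialog_args(page_id, accept, prompt_text=None):
    """Build browser_handle_dialog arguments; prompt_text only for prompt()."""
    args = {"page_id": page_id, "accept": accept}
    if prompt_text is not None:
        args["prompt_text"] = prompt_text
    return args

def had_dialog(result):
    """False only when the tool reported {pending: false}."""
    return result.get("pending") is not False

confirm_ok = dialog_args("page-123", True)                  # click OK
prompt_ok = dialog_args("page-123", True, "Ada Lovelace")   # answer a prompt()
dismiss = dialog_args("page-123", False)                    # click Cancel
```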

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that dialogs are queued at page-open, that prompt_text is only for prompt dialogs, and that the tool returns {pending: false} if no dialog waits. This covers key behavioral traits, though it could mention that it dismisses the dialog.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four short sentences that front-load the main purpose and add essential behavioral details. Every sentence earns its place without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and three parameters, the description covers purpose, parameters, and key behaviors (queuing, return value for no dialog). It lacks details on error cases or exact return on success, but is adequate for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning for two of three parameters: accept means true for OK/false for Cancel, and prompt_text is only for prompt dialogs. page_id is not explained, but it's a common identifier in sibling browser tools. Given 0% schema coverage, the description provides substantial value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool responds to pending JS dialogs (alert/confirm/prompt), explains the accept parameter for OK/Cancel, and mentions prompt_text for prompt dialogs. It distinguishes from sibling browser tools that handle other actions like clicking or typing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use this tool (when a dialog is pending) and provides context about dialogs being queued at page-open time. It does not explicitly list alternatives or exclusions, but the context is clear enough for an agent to determine appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_hoverB

Read-onlyIdempotent

Hover the mouse over an element (reveals tooltips + hover menus). ref is a CSS selector.

ParametersJSON Schema

Name	Required	Description	Default
`ref`	Yes
`page_id`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Minimal disclosure: mentions tooltips/hover menus but no info on return value, safety, visibility requirements, or side effects. No annotations to supplement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: one sentence plus a brief note. Front-loaded with main action, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks return type, error handling, visibility requirements, and any behavioral details beyond basic purpose. For a tool with no annotations or output schema, this is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only clarifies 'ref' is a CSS selector; 'page_id' is unexplained. With 0% schema coverage, description should add more detail, but it does not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action 'hover the mouse over an element' and its effect 'reveals tooltips + hover menus'. Distinguishes from sibling browser tools like click, fill, etc. Mentions CSS selector as identifier.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use hover vs click, fill, or other interactions. No prerequisites or context for proper usage provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_navigate_backC

Read-onlyIdempotent

Navigate back in the page's history (browser back button). Returns the new URL + title.

ParametersJSON Schema

Name	Required	Description	Default
`page_id`	Yes

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavior. It only states the basic action and return values, but omits details like what happens if history is empty, if navigation is synchronous, or if multiple steps back are possible.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded, but it sacrifices necessary details for brevity. Every sentence adds some value, yet it remains incomplete.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite a simple tool with one parameter and no output schema, the description lacks crucial context about the page_id parameter and does not explain how navigation works within the browser context (e.g., per-tab history).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The required 'page_id' parameter is not explained in the description. Given 0% schema coverage, the description fails to clarify what page_id refers to (e.g., which tab or page), leaving the agent with no guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('navigate back') and resource ('page's history'), and mentions the return value (new URL + title). It effectively distinguishes from sibling browser tools like 'browser_open' and 'browser_click'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., browser_open, browser_wait_for). No mention of prerequisites, when not to use it, or edge cases like empty history.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_network_requestsB

Read-onlyIdempotent

List HTTP requests the page made since open or last drain. Optional filters: method (GET/POST/...), url_pattern (regex), status_min (e.g. 400 for errors). Captures up to 200 most recent requests per page.

ParametersJSON Schema

Name	Required	Description	Default
`clear`	No
`method`	No
`page_id`	Yes
`status_min`	No
`url_pattern`	No
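
To make the filter semantics concrete, here is a small local model of a capped request buffer with the three documented filters. It is not the server's implementation: the entry fields, the use of `re.search` for `url_pattern`, and exact-match comparison for `method` are all assumptions. The `clear` parameter is left out because its behavior is not described.

```python
import re
from collections import deque

class RequestLog:
    """Local stand-in for a per-page buffer of the most recent requests."""

    def __init__(self, cap=200):
        self._entries = deque(maxlen=cap)  # oldest entries drop first

    def record(self, method, url, status):
        self._entries.append({"method": method, "url": url, "status": status})

    def query(self, method=None, url_pattern=None, status_min=None):
        """Return entries that pass every filter that was supplied."""
        pattern = re.compile(url_pattern) if url_pattern else None
        return [
            e for e in self._entries
            if (method is None or e["method"] == method)
            and (pattern is None or pattern.search(e["url"]))
            and (status_min is None or e["status"] >= status_min)
        ]

log = RequestLog()
log.record("GET", "https://example.com/", 200)
log.record("POST", "https://example.com/api/login", 401)
errors = log.query(status_min=400)  # only the failed login request
```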

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Lacks important behavioral details: no explanation of the 'clear' parameter's effect (drains requests?), no disclosure of side effects or state changes. With no annotations, the description should provide more transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and key details. Every part is essential and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Missing return format, no description of output fields, and unclear behavior of 'clear'. Given 5 parameters and no output schema, the description is incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning for 3 of 5 parameters (method, url_pattern, status_min) with examples, but does not explain 'page_id' or 'clear'. Schema coverage is 0%, so description partially compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists HTTP requests made by a page, specifying the scope 'since open or last drain'. It names the resource and action precisely, and the optional filters further clarify the function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like browser_console_messages. The description does not mention prerequisites or conditions for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_openA

Read-onlyIdempotent

Open a URL in a remote browser. Saved login cookies are auto-attached when the URL domain matches a claimed browser identity. Pass identity_name to override auto-matching or force a specific identity.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes
`meet_cdp_url`	No
`workspace_id`	Yes
`identity_name`	No
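
The two identity modes could be expressed as below. This sketch covers only `url`, `workspace_id`, and `identity_name`; `meet_cdp_url` is omitted because the description does not explain it. The workspace id, URL, and identity name are placeholders.

```python
def open_args(workspace_id, url, identity_name=None):
    """Build browser_open arguments; omit identity_name to rely on auto-matching."""
    args = {"workspace_id": workspace_id, "url": url}
    if identity_name is not None:
        args["identity_name"] = identity_name
    return args

# Let the server match saved cookies by URL domain.
auto = open_args("ws-1", "https://example.com/inbox")

# Force a specific saved identity instead.
forced = open_args("ws-1", "https://example.com/inbox", identity_name="support-account")
```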

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It discloses the core behavior (open URL, optional identity attachment) but omits details like page loading behavior, timeout expectations, or whether multiple opens create new tabs. It is minimally adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the core action. No redundant or extraneous information. Every word serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, yet the description does not specify return values (e.g., a tab ID or success status). As an entry-point tool among many browser siblings, it lacks context for subsequent actions (e.g., 'After opening, use browser_click to interact'). Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It adds meaning to 'identity_name' (overrides automatic cookie matching or forces a specific identity) but does not explain 'url', 'workspace_id', or 'meet_cdp_url' beyond what the schema provides. Some value added, but incomplete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Open a URL in a remote browser.' It also specifies a unique feature (attaching saved login cookies via identity_name), distinguishing it from sibling tools like browser_click or browser_navigate_back.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives, nor does it provide exclusions or guidance on prerequisites. The usage is implied (start browsing by opening a URL), but lacks direct comparison to other browser tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_press_keyA

Read-onlyIdempotent

Press a keyboard key (e.g., 'Enter', 'Tab', 'Escape', 'ArrowDown') or a single character. Optional ref focuses an element first — aria-ref token from browser.snapshot ('e7') or a CSS selector.

ParametersJSON Schema

Name	Required	Description	Default
`key`	Yes
`ref`	No
`page_id`	Yes
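
A short keyboard-navigation sequence might be planned as one call per key, with `ref` supplied only on the first press. That relies on focus persisting between calls, which the description does not confirm, so it is an assumption; `press_sequence` is a hypothetical helper.

```python
def press_sequence(page_id, keys, focus_ref=None):
    """Return one argument dict per key press; only the first carries ref."""
    calls = []
    for index, key in enumerate(keys):
        args = {"page_id": page_id, "key": key}
        if index == 0 and focus_ref is not None:
            args["ref"] = focus_ref
        calls.append(args)
    return calls

# Focus a (placeholder) combobox, move down two options, then confirm.
calls = press_sequence("page-123", ["ArrowDown", "ArrowDown", "Enter"], focus_ref="e7")
```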

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. The description states the action but does not disclose behavioral details like simulation mechanics (keydown/keyup), scope (global vs focused element), or side effects. Adequate for a simple key press but lacks depth.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences without extraneous information. Front-loaded with action and examples. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple action tool with no output schema and sparse schema, the description covers the main behavior and optional ref. It lacks details on return values, error handling, or page scope, but is adequate for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds value for key (examples of allowed values) and ref (explains it focuses element first). However, page_id is not described; its purpose is assumed from common context. With 0% schema coverage, this is a good effort but not complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: pressing a keyboard key or single character, with examples. It implicitly distinguishes from sibling tools like browser_type (typing) and browser_click (clicking) by specifying keyboard key press.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions optional ref to focus an element first, giving context for usage. However, it does not explicitly compare to browser_type or other alternatives, leaving the agent to infer when to use press_key vs type.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_resizeB

Read-only, Idempotent

Resize the page viewport. Useful when a site serves different HTML based on viewport width (mobile vs desktop) or when an anti-bot scores risk by viewport dimensions.

Parameters

Name	Required	Description	Default
`width`	Yes
`height`	Yes
`page_id`	Yes

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. Description does not disclose side effects, return values, or limitations beyond resizing the viewport. With no annotations, more behavioral context is needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Very concise two-sentence structure. First sentence states action, second adds context. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While purpose is clear, lack of parameter explanations and behavioral details makes it incomplete for a tool with no annotations or schema descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and description does not explain any of the three required parameters (page_id, width, height). This provides no added value for parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action 'Resize the page viewport' and provides specific use cases like mobile vs desktop and anti-bot avoidance. It distinguishes this tool from other browser actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes two concrete scenarios where resizing is useful, guiding when to use. Does not explicitly exclude alternatives, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_select_optionA

Read-only, Idempotent

Pick option(s) in a native dropdown. Pass value (matches the option's value attr) OR label (matches its visible text). Lists allowed for multi-select.

Parameters

Name	Required	Description	Default
`ref`	Yes
`label`	No
`value`	No
`page_id`	Yes
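
The value-or-label rule can be checked on the client before the call is made. The sketch below is a hypothetical helper, not part of DialogBrain: the function name, the string type of page_id, and the decision to reject calls that pass both or neither of value and label are all assumptions, since the description does not say how the server handles those cases.

```python
from typing import Any, Optional, Sequence, Union

Choice = Union[str, Sequence[str]]


def build_select_option_args(
    page_id: str,
    ref: str,
    value: Optional[Choice] = None,
    label: Optional[Choice] = None,
) -> dict[str, Any]:
    """Build an arguments dict for browser_select_option (hypothetical helper)."""
    # Assumption: exactly one of value/label is expected by the tool.
    if (value is None) == (label is None):
        raise ValueError("pass exactly one of value or label")
    args: dict[str, Any] = {"page_id": page_id, "ref": ref}
    key, choice = ("value", value) if value is not None else ("label", label)
    # A list selects several options in a multi-select; a string selects one.
    args[key] = choice if isinstance(choice, str) else list(choice)
    return args
```

Rejecting ambiguous input locally avoids depending on the undocumented no-match and conflict behavior noted in the review below.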

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must carry full burden. It explains value/label parameters and multi-select support, but omits behavior on no match, conflict, or clearing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, no redundancy, front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple tool with 4 params, no output schema, and no annotations, but missing details on error handling and return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage; description adds meaning for 'value' and 'label' (value attr vs visible text) but not for 'ref' or 'page_id', which are common across sibling tools.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it picks option(s) in a native select dropdown, distinguishing it from other browser interaction tools like browser_click or browser_fill.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use for select elements but lacks explicit when-to-use vs alternatives like browser_fill or browser_click, and no exclusion conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_snapshotA

Read-only, Idempotent

Return a YAML aria_snapshot of the page DOM. Each interactive node is tagged with [ref=eN] (e.g. [ref=e7]). Pass that exact token as the ref arg to browser.click / browser.fill / browser.type / browser.press_key. Do NOT pass the role name ('combobox', 'button') as ref — only the eN token. Truncated at 32KB.

Parameters

Name	Required	Description	Default
`page_id`	Yes
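
The ref convention described above can be illustrated with a small parser. This is a minimal sketch: the sample snapshot text is made up, and the helper names are hypothetical. It only relies on the [ref=eN] tag format stated in the description.

```python
import re

# Matches the [ref=eN] tags the description says are attached to interactive nodes.
REF_TAG = re.compile(r"\[ref=(e\d+)\]")


def extract_refs(snapshot_yaml: str) -> list[str]:
    """Return every eN token found in the snapshot text, in document order."""
    return REF_TAG.findall(snapshot_yaml)


def is_aria_ref_token(ref: str) -> bool:
    """True for 'e7'-style tokens; False for role names such as 'button'."""
    return re.fullmatch(r"e\d+", ref) is not None


# Made-up snapshot fragment for illustration only.
sample = """\
- heading "Sign in" [level=1]
- textbox "Email" [ref=e3]
- button "Continue" [ref=e7]
"""
print(extract_refs(sample))  # ['e3', 'e7']
```

Some sibling tools also accept a CSS selector as ref, so is_aria_ref_token is a check for the snapshot token form only, not a general validator.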

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries full burden. It discloses truncation at 32KB, a key behavioral trait. However, it omits other behaviors like requiring an open page, performance implications, or whether the snapshot is of the full DOM or just visible portion.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: five short sentences with no waste. It front-loads the main action and immediately provides critical usage information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the return format, truncation limit, and practical use. It doesn't explain 'aria_snapshot' but that is likely understood. Slightly more context about prerequisites (e.g., page must be open) would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must fully explain parameters. It mentions page_id only implicitly via context, but doesn't explain what a page_id is or how to obtain it. The agent is left guessing the parameter's meaning and source.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a YAML aria_snapshot of the page DOM and explicitly mentions its use for finding element refs for click/fill actions. It distinguishes itself among sibling browser tools by specifying the snapshot format and purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: use the snapshot to find element refs for browser.click/browser.fill. However, it does not list any exclusions or alternatives (e.g., when to use browser_take_screenshot instead), missing some guidance on when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_tabsA

Read-only, Idempotent

Manage tabs within the same BrowserContext as page_id. action ∈ {list, switch, close, new}. For list, returns all open tab metadata; for new, returns the new tab's page_id.

Parameters

Name	Required	Description	Default
`url`	No
`action`	Yes
`tab_id`	No
`page_id`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses core behavior (actions, returns) but lacks details on side effects (e.g., closing tabs may destroy data, permissions needed, rate limits). The destructive 'close' action is not highlighted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no fluff. Information is front-loaded with action enumeration and return types.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no schema descriptions, no output schema, the description leaves significant gaps. It does not explain tab_id, url, or the format of returned metadata. Missing error conditions or edge cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and the description only explains page_id and action. It does not describe tab_id or url parameters. The description adds some meaning for action (list, new returns) but fails to cover how tab_id is used for switch/close or url for new.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool manages tabs within a BrowserContext, specifying actions (list, switch, close, new) and their returns. It distinguishes from siblings like browser_open by referencing the same BrowserContext as page_id.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage guidelines are implied through action descriptions but no explicit when-to-use or alternatives are provided. The description does not mention when to choose this over browser_open or other browser tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_take_screenshotA

Read-only, Idempotent

Capture a PNG screenshot of the page or a specific element. Returns base64-encoded image bytes AND a file_id (persisted in DialogBrain files storage). Pass file_id straight to messages.send(attachment_file_ids=[file_id]) — do NOT call files.upload again. Use sparingly — favor browser.snapshot for structured DOM understanding.

Parameters

Name	Required	Description	Default
`ref`	No
`page_id`	Yes
`full_page`	No
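
The hand-off from a screenshot to an outgoing message can be sketched as follows. This is an illustration under assumptions: the result is assumed to be a dict exposing the id under the key "file_id", and build_send_args is a hypothetical helper. Only the attachment_file_ids argument comes from the description; the other messages.send parameters are not documented in this section and are passed through untouched.

```python
from typing import Any


def build_send_args(screenshot_result: dict[str, Any], **message_fields: Any) -> dict[str, Any]:
    """Reuse the screenshot's file_id as a message attachment (no re-upload)."""
    # Assumption: the screenshot result exposes the id under the key "file_id".
    file_id = screenshot_result["file_id"]
    # message_fields stands in for whatever else messages.send requires;
    # those parameters are not documented here.
    return {**message_fields, "attachment_file_ids": [file_id]}
```

Because the file is already persisted in DialogBrain files storage, the base64 bytes in the result can be ignored when the only goal is to attach the image.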

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description covers the output format (base64-encoded image bytes plus a persisted file_id) and advises reusing that file_id instead of uploading the file again. It could say more about limitations or side effects, but it is adequate for this tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four efficient sentences, front-loaded with the core purpose, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and minimal param description in schema, the description covers main aspects but misses parameter details like how to target specific elements. Generally complete for a common tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description must compensate. It mentions 'page or a specific element' but doesn't clarify how element is specified (via 'ref') or the 'full_page' parameter. Adds some meaning but not complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it captures a PNG screenshot of the page or a specific element and returns base64-encoded bytes along with a file_id. Differentiates from sibling browser_snapshot by mentioning 'structured DOM understanding'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to use sparingly and favor browser.snapshot for structured understanding, providing clear when-to-use vs. when-not.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_typeA

Read-only, Idempotent

Type text into an element with per-keystroke delay (organic). Each character dispatches keydown/keypress/keyup, unlike browser.fill which replaces .value instantly. Use when the page listens to keystroke events or for typing-speed fingerprint checks. ref is an aria-ref token from browser.snapshot ('e7') or a CSS selector. delay_ms defaults to 50.

Parameters

Name	Required	Description	Default
`ref`	Yes
`text`	Yes
`page_id`	Yes
`delay_ms`	No
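
The difference from an instant fill can be made concrete by listing the events a per-keystroke approach produces. The sketch below is a local simulation, not the server's implementation: it assumes one delay_ms gap per character and the keydown/keypress/keyup order named in the description. Both function names are hypothetical.

```python
from typing import Iterator


def keystroke_events(text: str, delay_ms: int = 50) -> Iterator[tuple[int, str, str]]:
    """Yield (time_offset_ms, event_name, character) for each simulated event."""
    for index, char in enumerate(text):
        # Assumption: each character starts delay_ms after the previous one.
        offset = index * delay_ms
        for event in ("keydown", "keypress", "keyup"):
            yield offset, event, char


def typing_duration_ms(text: str, delay_ms: int = 50) -> int:
    """Rough estimate of total typing time: one delay per character."""
    return len(text) * delay_ms


print(typing_duration_ms("hello"))  # 250
```

At the default delay a 200-character message takes roughly ten seconds, which is one reason to prefer browser.fill when the page does not listen for keystroke events.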

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It clearly explains the organic keystroke behavior and default delay. However, it doesn't mention prerequisites like element focus or visibility, which would be helpful but not critical.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (five short sentences) and front-loaded with the core behavioral distinction. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, no annotations, and 0% schema coverage, the description explains ref and delay_ms but leaves page_id undescribed and text only implied. The behavioral context is good, but the remaining parameter ambiguity keeps completeness at a minimally acceptable level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate. It explains ref (an aria-ref token from browser.snapshot or a CSS selector) and delay_ms (default 50), and text is implied by 'Type text into an element'. page_id is not described at all, leaving its meaning and source ambiguous.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb ('Type text into an element') and resource, and explicitly distinguishes from sibling 'browser.fill' by detailing event differences (per-keystroke dispatch vs instant .value replacement).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly states when to use: 'when the page listens to keystroke events or for typing-speed fingerprint checks', and contrasts with 'browser.fill', providing clear context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_wait_forA

Read-only, Idempotent

Wait for a selector to appear OR a navigation URL to match a glob pattern. Provide ref (selector) OR url_pattern (glob).

Parameters

Name	Required	Description	Default
`ref`	No
`page_id`	Yes
`timeout_ms`	No
`url_pattern`	No
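
The ref-or-url_pattern choice and the glob idea can be sketched locally. This is an approximation under assumptions: Python's fnmatch is used as a stand-in for whatever glob dialect the server applies (the two may differ, for example in how ? and [ are treated), and both helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Optional


def build_wait_for_args(
    page_id: str,
    ref: Optional[str] = None,
    url_pattern: Optional[str] = None,
    timeout_ms: Optional[int] = None,
) -> dict[str, Any]:
    """Build arguments for browser_wait_for with exactly one wait condition."""
    # Assumption: passing both or neither condition is an error.
    if (ref is None) == (url_pattern is None):
        raise ValueError("provide exactly one of ref or url_pattern")
    args: dict[str, Any] = {"page_id": page_id}
    if ref is not None:
        args["ref"] = ref
    else:
        args["url_pattern"] = url_pattern
    if timeout_ms is not None:
        args["timeout_ms"] = timeout_ms
    return args


def url_matches(url: str, pattern: str) -> bool:
    """Approximate glob check; fnmatch's '*' also matches '/' characters."""
    return fnmatchcase(url, pattern)


print(url_matches("https://shop.example/checkout/success", "*/checkout/*"))  # True
```

Testing a pattern locally against a few expected URLs is a cheap way to catch a glob that is too narrow before spending a full timeout on it.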

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full burden. It explains the waiting condition but does not disclose timeout behavior, error handling, or what happens if both parameters are provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with front-loaded action word; no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is adequate for a simple wait tool but lacks details on return values, error conditions, and timeout behavior, especially given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description must explain all parameters. It covers 'ref' and 'url_pattern' but omits 'page_id' and 'timeout_ms', leaving half of the parameters unexplained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Wait for' and the resources: a selector appearance or URL glob pattern match, distinguishing it from sibling browser tools that perform other actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly instructs to provide either 'ref' or 'url_pattern', indicating mutual exclusivity, but does not mention when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_check_availabilityA

Read-only, Idempotent

Check when you have free time in Google Calendar. Shows busy periods and free slots in a given time range. Useful for finding meeting times or checking schedule conflicts.

Parameters

Name	Required	Description	Default
`end_time`	No	End date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to end of start_time day, or 7 days from now.
`start_time`	No	Start date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to start of today.
`calendar_id`	No	Calendar ID to check. Defaults to primary calendar.	primary
`working_hours_only`	No	If true, only show free slots during working hours (9 AM - 6 PM). OMIT to show all free time (the default).
`min_duration_minutes`	No	Minimum duration in minutes for free slots. Filters out short gaps. Default: 30 minutes.
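
How busy periods turn into free slots, and how min_duration_minutes filters short gaps, can be sketched as below. This is an illustrative computation, not the tool's implementation: the function name, the tuple representation of intervals, and the use of naive datetimes are all assumptions.

```python
from datetime import datetime, timedelta

Interval = tuple[datetime, datetime]


def free_slots(
    busy: list[Interval],
    start: datetime,
    end: datetime,
    min_duration_minutes: int = 30,
) -> list[Interval]:
    """Return gaps between busy periods inside [start, end] of at least the minimum length."""
    min_gap = timedelta(minutes=min_duration_minutes)
    slots: list[Interval] = []
    cursor = start
    for busy_start, busy_end in sorted(busy):
        # Clip each busy period to the window being checked.
        busy_start, busy_end = max(busy_start, start), min(busy_end, end)
        if busy_start >= busy_end:
            continue  # period lies entirely outside the window
        if busy_start - cursor >= min_gap:
            slots.append((cursor, busy_start))
        cursor = max(cursor, busy_end)
    if end - cursor >= min_gap:
        slots.append((cursor, end))
    return slots


day = datetime(2024, 12, 16)
busy = [
    (day.replace(hour=10), day.replace(hour=11)),
    (day.replace(hour=11, minute=15), day.replace(hour=12)),
]
for slot_start, slot_end in free_slots(busy, day.replace(hour=9), day.replace(hour=18)):
    print(slot_start.time(), "-", slot_end.time())
```

With the window set to 9:00 through 18:00, as working_hours_only would imply, the 15-minute gap between the two meetings is dropped and the result is 09:00 to 10:00 and 12:00 to 18:00.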

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries burden. Describes basic behavior (shows busy/free) and hints at filtering via parameters, but lacks details on timezone handling, error states, or output structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the purpose. No unnecessary words. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not specify the format of returned data (e.g., list of time blocks, objects with start/end). This is a significant gap for a tool with 5 optional parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters have descriptions in schema, so description adds no additional meaning. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it checks free time in Google Calendar, showing busy periods and free slots. Distinguishes from sibling calendar tools like calendar_list_events by focusing on availability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Useful for finding meeting times or checking schedule conflicts', providing clear use context. However, no explicit alternative suggestions or when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_create_eventB

Create a new event in Google Calendar. Specify the title, start time, end time, and optionally invite attendees. Use ISO 8601 format for dates (e.g., 2024-12-15T14:00:00).

Parameters

Name	Required	Description	Default
`end`	No	Event end time in ISO 8601 format. If not provided, defaults to 1 hour after start. Also accepts 'end_time' as alias.
`start`	No	Event start time in ISO 8601 format (e.g., 2024-12-15T14:00:00). Also accepts 'start_time' as alias.
`title`	No	Alias for summary - event title.
`summary`	No	Event title/summary. Required. Also accepts 'title' as alias.
`end_time`	No	Alias for end - event end time.
`location`	No	Event location (physical address or virtual meeting link).
`timezone`	No	Timezone for the event (e.g., 'America/New_York', 'UTC').
`attendees`	No	List of attendee email addresses to invite.
`start_time`	No	Alias for start - event start time in ISO 8601 format.
`calendar_id`	No	Calendar ID to create event in. Defaults to primary calendar.	primary
`description`	No	Event description/notes.
`add_google_meet`	No	If true, automatically creates a Google Meet link for the event. OMIT to skip Meet link.
`conference_data`	No	Conference data for Google Meet. Alternative to add_google_meet flag.
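
The alias pairs and the one-hour default for end in the table above can be sketched as a client-side normalization step. This is a hypothetical helper under stated assumptions: the canonical name is assumed to win when both forms are passed, and start is treated as required because the default end is derived from it. The server may resolve these cases differently.

```python
from datetime import datetime, timedelta
from typing import Any

# Alias -> canonical parameter name, as listed in the schema descriptions.
ALIASES = {"title": "summary", "start_time": "start", "end_time": "end"}


def normalize_event_args(raw: dict[str, Any]) -> dict[str, Any]:
    """Map alias names to canonical ones and fill in the default end time."""
    args: dict[str, Any] = {}
    for name, value in raw.items():
        canonical = ALIASES.get(name, name)
        # Assumption: when both forms are given, the canonical name wins.
        if name != canonical and canonical in args:
            continue
        args[canonical] = value
    # Assumption: summary and start are both needed for a usable event.
    if "summary" not in args or "start" not in args:
        raise ValueError("summary (or title) and start (or start_time) are needed")
    if "end" not in args:
        # Schema text: end defaults to 1 hour after start.
        start = datetime.fromisoformat(args["start"])
        args["end"] = (start + timedelta(hours=1)).isoformat()
    return args


print(normalize_event_args({"title": "Sync", "start_time": "2024-12-15T14:00:00"}))
```

Normalizing before the call sidesteps the mismatch between the description, which reads as if title and start were mandatory, and the schema, which marks every field as optional.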

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses that events are created and provides date format requirements, but does not mention side effects (e.g., whether attendees are notified), required permissions, error behavior, or what happens on success (return value). This leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences, front-loaded with purpose. It is concise with no fluff, but could be slightly more structured (e.g., grouping core vs optional parameters). Reasonably efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 13 parameters and no output schema or annotations, the description is overly minimal. It lacks details on many parameters, does not mention default behaviors (e.g., calendar_id defaults to primary), and omits information about the return value. There is also a contradiction: the description implies that title, start, and end are required, but the schema marks no fields as required.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats schema content for core parameters (title, start, end) but adds no meaning beyond what the schema already provides. It does not elaborate on many other parameters (location, timezone, conference_data, etc.).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new event in Google Calendar', which is a specific verb+resource. It distinguishes this tool from sibling calendar tools like calendar_update_event or calendar_list_events, as creation is explicitly mentioned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for creating events but does not provide explicit guidance on when to use this tool versus alternatives like calendar_update_event or calendar_check_availability. No exclusion criteria or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_delete_eventB

Destructive, Idempotent

Delete an event from Google Calendar. This action cannot be undone. Use with caution.

Parameters

Name	Required	Description	Default
`event_id`	Yes	ID of the event to delete. Required.
`calendar_id`	No	Calendar ID containing the event. Defaults to primary.	primary
`send_notifications`	No	Whether to send cancellation notifications to attendees.

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description must carry the full burden. It only states 'This action cannot be undone', which is critical, but omits other behavioral details such as required permissions, side effects on attendees (partially covered by schema parameter), or response behavior. This is insufficient for a destructive operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences (14 words) that front-load the primary action and critical warning. Every word earns its place without unnecessary fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and three parameters, the description does not explain return values, error behavior, or confirmation. For a delete tool, a simple success indication would be useful, and the description lacks this completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds no additional meaning beyond what is already in the schema parameters, resulting in no extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete an event') and the resource ('Google Calendar'), distinguishing it from sibling tools like calendar_create_event and calendar_update_event.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or alternatives are provided. The warning 'Use with caution' signals risk but does not guide the agent on when to prefer this tool over others. Since delete operations are straightforward, this is adequate but not exemplary.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_list_eventsA

Read-onlyIdempotent

Inspect

List events from Google Calendar. Shows upcoming events by default. Can filter by date range and search query.

ParametersJSON Schema

Name	Required	Description	Default
`query`	No	Free text search query to filter events.
`date_to`	No	End date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to 7 days from now. Alias: time_max.
`date_from`	No	Start date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to now. Alias: time_min.
`calendar_id`	No	Calendar ID to list events from. Defaults to primary calendar.	primary
`max_results`	No	Maximum number of events to return.
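
The parameter table above states that `date_from` defaults to now and `date_to` to seven days from now, and that both accept YYYY-MM-DD or ISO 8601 values. A minimal sketch of building an explicit window follows. The helper name `build_list_events_args` is hypothetical and not part of the server, and treating naive datetimes as UTC is an assumption made here, since the page does not document timezone handling for this tool.

```python
from datetime import datetime, timedelta, timezone


def build_list_events_args(start, days=7, query=None, max_results=None):
    """Build an argument dict for calendar_list_events (hypothetical helper).

    Only keys documented in the parameter table are emitted. Optional
    values left as None are omitted so the server-side defaults apply.
    """
    if start.tzinfo is None:
        # Assumption: treat naive datetimes as UTC to avoid ambiguity.
        start = start.replace(tzinfo=timezone.utc)
    end = start + timedelta(days=days)
    args = {"date_from": start.isoformat(), "date_to": end.isoformat()}
    if query is not None:
        args["query"] = query
    if max_results is not None:
        args["max_results"] = max_results
    return args


window = build_list_events_args(datetime(2026, 5, 28, 9, 0, tzinfo=timezone.utc))
print(window)
```

Omitting `calendar_id` leaves the documented `primary` default in effect.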

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description partially discloses behavior (default upcoming events, filtering capabilities). However, it omits details like pagination, rate limits, or error handling. It is adequate but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences with no redundancy. It front-loads the core action and then adds key default and filter information efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and no annotations, the description covers the main purpose and defaults. It lacks details on return format, timezone handling, or edge cases, but is sufficient for a straightforward list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the descriptions already document each parameter's purpose. The tool description adds minimal extra value beyond confirming defaults and filtering, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'list' and resource 'events from Google Calendar', clearly distinguishing it from sibling tools like create, delete, update, and check availability. It also notes default behavior and filtering options.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing events with optional date range and search filters. It does not explicitly state when not to use or provide alternatives, but the context is clear enough for an agent to infer appropriate use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_update_eventA

Inspect

Update an existing event in Google Calendar. Can modify title, time, location, description, and attendees. Only specified fields will be updated.

ParametersJSON Schema

Name	Required	Description	Default
`end`	No	New end time in ISO 8601 format. Optional.
`start`	No	New start time in ISO 8601 format. Optional.
`summary`	No	New event title/summary. Optional.
`event_id`	Yes	ID of the event to update. Required.
`location`	No	New event location. Optional.
`attendees`	No	New list of attendee emails. Replaces existing attendees.
`calendar_id`	No	Calendar ID containing the event. Defaults to primary.	primary
`description`	No	New event description. Optional.
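
Two details in the table above interact: only the fields that are passed get updated, but `attendees` replaces the whole existing list. A hedged sketch of how a caller might respect both follows. The helper names `build_update_event_args` and `merged_attendees` are hypothetical, and case-insensitive de-duplication of emails is an assumption of this sketch, not documented server behavior.

```python
UPDATABLE_FIELDS = {
    "start", "end", "summary", "location",
    "attendees", "calendar_id", "description",
}


def build_update_event_args(event_id, **fields):
    """Build arguments for calendar_update_event (hypothetical helper).

    Fields set to None are dropped so they are left untouched by the
    partial update; unknown field names are rejected early.
    """
    unknown = set(fields) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    args = {"event_id": event_id}
    args.update({k: v for k, v in fields.items() if v is not None})
    return args


def merged_attendees(existing, to_add):
    """Return existing + new attendees, order-preserving and de-duplicated.

    Needed because the attendees parameter replaces the existing list
    rather than appending to it.
    """
    seen, merged = set(), []
    for email in [*existing, *to_add]:
        key = email.lower()
        if key not in seen:
            seen.add(key)
            merged.append(email)
    return merged


current = ["ana@example.com"]
args = build_update_event_args(
    "evt_1",
    summary="Weekly sync",
    location=None,
    attendees=merged_attendees(current, ["Ana@example.com", "bo@example.com"]),
)
print(args)
```

Adding one attendee without first reading the current list would silently drop everyone else, which is why the merge happens before the call.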

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses partial update behavior but lacks details on permissions required, error handling (e.g., event not found), or side effects beyond updating fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with purpose and supported by a list of modifiable fields. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is minimally adequate for a simple CRUD tool but lacks details on required parameters (event_id implicit), default calendar_id, and return behavior. No output schema exists to compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter is documented. The description adds minimal value by categorizing fields, but the partial update hint is useful. Baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (update an existing event) and the resource (Google Calendar event). It lists modifiable fields, distinguishing it from sibling tools like create and delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies partial update semantics with 'Only specified fields will be updated,' but does not explicitly state when to use this tool versus alternatives, nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_get_transcriptA

Read-onlyIdempotent

Inspect

Get the structured transcript and final state of a voice call by call_id. Returns per-turn rows in chronological order, call status (active/completed/failed/abandoned), duration, and an outcome field telling whether the recipient picked up (answered/no_answer/busy/declined/failed/unknown). answered_at is non-null once the recipient picked up. Returns active turns if the call is still in progress.

ParametersJSON Schema

Name	Required	Description	Default
`call_id`	Yes	Call ID returned by calls.make in _meta.call_id.
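
The description lists the status values (active/completed/failed/abandoned), the outcome values, and the rule that `answered_at` is non-null once the recipient picked up. The sketch below shows how a caller might interpret those fields. No output schema is published, so the response shape used here (a dict with `status`, `outcome`, and `answered_at` keys) is an assumption, and `summarize_call` is a hypothetical helper.

```python
TERMINAL_STATUSES = {"completed", "failed", "abandoned"}


def summarize_call(result):
    """Classify a calls_get_transcript result (assumed dict shape)."""
    status = result.get("status")
    if status == "active":
        # The tool returns the turns so far while the call is in progress.
        return "in progress"
    if status not in TERMINAL_STATUSES:
        return "unknown status"
    if result.get("answered_at") is not None:
        return "answered"
    return f"not answered ({result.get('outcome', 'unknown')})"


print(summarize_call({"status": "completed", "outcome": "busy", "answered_at": None}))
```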

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully covers behavior: it returns transcript rows, call status, duration, outcome, and notes that answered_at is null until pickup. It also mentions behavior for active calls.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with the purpose stated in the first sentence. There is no extraneous information; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one parameter and no output schema, the description adequately explains the return structure and special conditions (answered_at, active calls). It is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for call_id. The tool description does not add extra semantic detail beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves structured transcript and final state of a voice call by call_id, listing specific return fields (per-turn rows, status, duration, outcome). This is distinct from sibling tools like calls_make or calls_list_active.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the call_id comes from calls.make, providing context on when to use this tool. Though it doesn't explicitly state when not to use it, the purpose is clear enough for differentiation among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_hangupA

Read-onlyIdempotent

Inspect

Hang up an active voice call by call_id. Use after calls.make when the agent decides to terminate before the callee does, or to abort a stuck call. Idempotent: returns success if the call is already terminal.

ParametersJSON Schema

Name	Required	Description	Default
`reason`	No	Short internal reason for ending the call (e.g. 'campaign timeout'). Stored on voice_sessions.metadata.
`call_id`	Yes	Call ID returned by calls.make in _meta.call_id.

Tool Definition Quality

A4.0/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the burden. It discloses idempotency and implies a write operation. However, it does not mention permissions, side effects on the call session, or success/error details beyond idempotency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences covering purpose, usage, and idempotency. No wasted words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple hangup tool with no output schema and minimal parameters, the description covers the core context: when to use it, idempotency, and parameter sources. Lacks error behavior details but is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema already describes both parameters well (call_id source, reason metadata storage). The description adds no substantial new meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Hang up'), the resource ('active voice call'), and the identifier ('by call_id'). It distinguishes from siblings like calls_make and calls_list_active by its specific termination function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'after calls.make when the agent decides to terminate before the callee does, or to abort a stuck call.' Also notes idempotent behavior. No exclusionary guidance, but sufficient for this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_activeA

Read-onlyIdempotent

Inspect

List active voice calls in this workspace. Use before calls.make on a Telegram account (only one MTProto call per account at a time) to check whether the line is free.

ParametersJSON Schema

Name	Required	Description	Default
`channel`	No	Filter by voice channel. OMIT to include both telegram and twilio.
`channel_account_id`	No	Filter by channel_account.id (the calling Telegram account or Twilio number). Combine with channel for a per-line busy check.

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It indicates a read operation (listing active calls) and hints at the purpose (line check), but does not disclose details like return format, side effects, or permissions. It provides little beyond the purpose.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first sentence states purpose, second provides usage context. No redundant information, efficiently packed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema and only 2 optional parameters. Description covers when to use but fails to explain the return format or structure of the list. Given low complexity, it is adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with both parameters described. The description adds no additional meaning beyond the schema, only hinting at Telegram context. Baseline 3 is appropriate as schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists active voice calls in the workspace, using a specific verb and resource. It distinguishes from sibling tools like calls_make or calls_hangup by focusing on listing, and the context confirms no direct sibling exists for generic call listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use: before calls.make on a Telegram account to check line availability. This provides clear context for usage and implies a prerequisite (one call per account), giving strong guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_historyA

Read-onlyIdempotent

Inspect

Search historical voice calls in this workspace by participant name, contact_id, thread, channel, source, and/or date range. Returns one row per call (NOT per turn) with call_id, duration_seconds, outcome, direction, started_at, source, channel_label, and parent_thread_id (the originating chat thread for Telegram-group / Twilio-outbound / Meet calls). Pair with calls.get_transcript(call_id) for the full per-turn transcript. Use this instead of messages.read_history for cross-thread call queries — group calls and Meet sessions live on per-call sub-threads, not on the parent chat thread.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum calls to return (default 20, max 100).
`since`	No	ISO date or datetime lower bound (inclusive). Default: 90 days ago. Naive timestamps are interpreted as UTC.
`until`	No	ISO date or datetime upper bound (inclusive). Default: now.
`source`	No	Filter by voice_sessions.source: 'telegram' (1:1 + group), 'twilio' (PSTN), 'meet' (Google Meet bot), 'livechat' (in-app voice). OMIT to include all sources.
`channel`	No	Filter by message-level channel of the call thread: 'telegram' (1:1 voice or group call sub-thread), 'twilio_voice', 'meet_voice', 'livechat_voice'. OMIT to include all voice channels.
`thread_id`	No	Restrict to calls on this thread OR with this thread as their originating parent (Telegram group → call sub-thread back-link, Twilio outbound source_thread_id back-link).
`contact_id`	No	Filter by exact entity_id (from contacts.find). Mutually exclusive with participant_name when both target the same person.
`participant_name`	No	Filter to calls whose parent thread has a participant matching this name (substring match against entity.title). Resolves group calls via the parent group's roster, not the per-call thread's speaker list.
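
The table above documents a limit range (default 20, max 100), the rule that naive timestamps are read as UTC, and a caveat about combining `contact_id` with `participant_name`. A minimal sketch of client-side checks for those rules follows. The helper names are hypothetical, and rejecting any call that passes both `contact_id` and `participant_name` is a deliberately conservative assumption, since the table only calls them mutually exclusive when they target the same person.

```python
from datetime import datetime, timedelta, timezone

MAX_LIMIT = 100


def to_utc_iso(value):
    """Render a datetime as an explicit UTC ISO 8601 string.

    Naive values are treated as UTC, matching the documented server rule.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).isoformat()


def build_list_history_args(limit=20, since=None, until=None,
                            contact_id=None, participant_name=None):
    """Build arguments for calls_list_history (hypothetical helper)."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if contact_id is not None and participant_name is not None:
        # Conservative assumption: pick one identifier per query.
        raise ValueError("pass contact_id or participant_name, not both")
    args = {"limit": limit}
    if since is not None:
        args["since"] = to_utc_iso(since)
    if until is not None:
        args["until"] = to_utc_iso(until)
    if contact_id is not None:
        args["contact_id"] = contact_id
    if participant_name is not None:
        args["participant_name"] = participant_name
    return args


local = datetime(2026, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(build_list_history_args(limit=50, since=local, participant_name="Diana"))
```

Sending explicit UTC strings avoids depending on how the server interprets a bare date.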

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that the tool returns one row per call (not per turn), the specific fields returned, and behavioral details for several parameters (e.g., thread_id back-links, participant_name substring match via parent group roster, mutual exclusivity of contact_id and participant_name). It does not mention pagination or default limit behavior, but covers many key aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly long but every sentence adds substantial value. It is front-loaded with the main purpose and then provides structured details. There is no redundancy or fluff. Could be slightly tighter, but overall effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description includes the return fields. It also references pairing with another tool for transcripts. It covers many edge cases through parameter descriptions. However, it does not address pagination, sorting, or error conditions, which would make it even more complete for a tool with 8 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant value beyond schema: it explains default values for `since` and `until`, timezone handling for naive timestamps, the distinction between `source` and `channel`, the back-link behavior for `thread_id`, and the mutual exclusivity and resolution logic for `contact_id` and `participant_name`. This goes well beyond the basic schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches historical voice calls by participant name, contact_id, thread, channel, source, and date range. It specifies the return format (one row per call with specific fields) and distinguishes itself from sibling tools like `messages_read_history`.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance: use this instead of `messages_read_history` for cross-thread call queries, and pair with `calls_get_transcript` for full transcripts. It explains that group calls and Meet sessions live on per-call sub-threads, implying appropriate use cases. However, it does not explicitly state when NOT to use (e.g., for active calls).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_makeA

Inspect

Place an outbound AUDIO/VOICE phone call via Twilio (PSTN) or Telegram (MTProto 1:1 call). Use this any time the user asks to 'call', 'ring', 'phone', 'dial', or have a spoken conversation. Do NOT use messages.send when the user asks to call someone — a call is real-time voice, not a text message. You conduct the conversation as the voice agent using the provided greeting and instructions.

ParametersJSON Schema

Name	Required	Description
`channel`	No	Voice transport: 'twilio' (phone via PSTN — requires phone_number in E.164) or 'telegram' (MTProto 1:1 call — requires telegram_user_id, NOT a phone number or thread_id). OMIT to use 'twilio' (the default).
`greeting`	Yes	The first sentence the agent speaks immediately when the call connects. ALWAYS provide a greeting — without it the caller hears silence. Keep it short and natural. Example: 'Hi, this is Diana calling from DialogBrain. Do you have a moment to chat?'
`report_back`	No	When to re-invoke you after the call ends. 'on_answer' (default) = only if the call was answered, 'always' = even on missed/failed calls, 'never' = fire and forget. Transcript is always stored regardless of this setting.
`instructions`	No	What to do during the call — objective, questions, tone. The AI generates a natural opening and guides the conversation. Example: 'Call about invoice #1234. Ask if they received it and when payment is expected. Be friendly and professional.'
`phone_number`	No	Destination phone number in E.164 format (e.g., '+15551234567', '+66812345678'). Required when channel='twilio'.
`telegram_user_id`	No	Destination Telegram user ID (decimal int64 as string, e.g. '123456789'). Required when channel='telegram'. The caller account must have had prior interaction with this user — a cold contact cannot be reached via voice.
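
The channel-specific requirements in the table above (E.164 `phone_number` for Twilio, a decimal `telegram_user_id` for Telegram, and an always-present `greeting`) can be checked before the tool is invoked. The following is a sketch under stated assumptions: `build_calls_make_args` is a hypothetical helper, and the E.164 pattern used is the common "plus sign, then up to 15 digits with a non-zero first digit" form rather than anything the server publishes.

```python
import re

E164_RE = re.compile(r"\+[1-9][0-9]{1,14}")
TELEGRAM_ID_RE = re.compile(r"[0-9]+")
REPORT_BACK_VALUES = {"on_answer", "always", "never"}


def build_calls_make_args(greeting, channel="twilio", phone_number=None,
                          telegram_user_id=None, instructions=None,
                          report_back=None):
    """Validate and build arguments for calls_make (hypothetical helper)."""
    if not greeting or not greeting.strip():
        # Without a greeting the callee hears silence, per the table.
        raise ValueError("greeting is required")
    args = {"channel": channel, "greeting": greeting}
    if channel == "twilio":
        if phone_number is None or not E164_RE.fullmatch(phone_number):
            raise ValueError("twilio calls need phone_number in E.164")
        args["phone_number"] = phone_number
    elif channel == "telegram":
        if telegram_user_id is None or not TELEGRAM_ID_RE.fullmatch(telegram_user_id):
            raise ValueError("telegram calls need a decimal telegram_user_id")
        args["telegram_user_id"] = telegram_user_id
    else:
        raise ValueError(f"unknown channel: {channel!r}")
    if report_back is not None:
        if report_back not in REPORT_BACK_VALUES:
            raise ValueError(f"unknown report_back: {report_back!r}")
        args["report_back"] = report_back
    if instructions is not None:
        args["instructions"] = instructions
    return args


print(build_calls_make_args(
    "Hi, this is Diana calling from DialogBrain. Do you have a moment to chat?",
    phone_number="+15551234567",
))
```

Checks like these only catch malformed input. They cannot confirm that a Telegram user has had prior interaction with the calling account, which the table lists as a separate requirement.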

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It explains that transcript is always stored, the re-invoke behavior via report_back, and that prior interaction is needed for Telegram calls. However, it does not mention cost implications or explicit side effects like call recording, leaving some behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose and usage guidelines. It is concise relative to the complexity of the tool but could be slightly shorter by reducing redundancy with schema descriptions. Overall, it is well-structured and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, two channels, no output schema), the description covers the main points: when to use, channel requirements, post-call behavior, and default settings. It could mention error handling or call duration, but it is largely complete for the agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. The tool description adds context beyond the schema by explaining the overall flow, default channel, and the requirement for prior Telegram interaction. This adds semantic value, justifying a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it places outbound audio/voice calls via Twilio or Telegram. It specifies the verb 'Place', the resource 'outbound AUDIO/VOICE phone call', and distinguishes from messages.send. It also notes that the calling agent conducts the conversation as the voice agent, providing a specific and clear purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use the tool (the user asks to call, ring, phone, dial, or have a spoken conversation) and which tool to avoid in that case (messages.send, which sends text rather than placing a call). It also gives context about defaults and requirements, such as channel-specific parameters, making it easy for the agent to decide when this tool is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_meetA

Read-onlyIdempotent

Inspect

Dispatch a workspace AI agent into an active Google Meet call. The agent joins as a participant — it can hear the conversation, respond via TTS, see the shared screen (when vision is enabled on the agent), and answer questions about what's on screen. Use when the operator wants to delegate live meeting attendance to an agent (notes, Q&A, summarization, real-time support). The Meet URL must be in canonical 3-4-3 form, e.g. https://meet.google.com/abc-defg-hij. Lookup-redirect URLs are not supported — operator must use the share-link form.

ParametersJSON Schema

Name	Required	Description
`agent_id`	Yes	ID of an active voice agent in this workspace (has at least one incoming_call trigger on a voice channel_type). Get it from agents.list.
`meet_url`	Yes	Canonical Google Meet URL — must match https://meet.google.com/<3 letters>-<4 letters>-<3 letters>, e.g. https://meet.google.com/abc-defg-hij. lookup/ redirects are NOT supported.
`vision_mode`	No	Screen-share capture mode. 'off' = no vision (default), 'on_demand' = the agent can call the vision_query tool for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and the executor splices the latest scene-change into each agent turn as ambient low-detail context. OMIT to use 'off' (the default).
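
The canonical 3-4-3 Meet URL form and the three `vision_mode` values described above lend themselves to a pre-flight check. A minimal sketch, assuming that "letters" means lowercase ASCII letters (the table does not say so explicitly) and using hypothetical helper names:

```python
import re

# Assumption: the 3-4-3 code consists of lowercase ASCII letters only.
MEET_URL_RE = re.compile(r"https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}")
VISION_MODES = {"off", "on_demand", "continuous_0_3fps"}


def is_canonical_meet_url(url):
    """Return True when url matches the documented 3-4-3 share-link form."""
    return MEET_URL_RE.fullmatch(url) is not None


def build_send_to_meet_args(agent_id, meet_url, vision_mode=None):
    """Validate and build arguments for calls_send_to_meet (hypothetical)."""
    if not is_canonical_meet_url(meet_url):
        raise ValueError("meet_url must look like https://meet.google.com/abc-defg-hij")
    args = {"agent_id": agent_id, "meet_url": meet_url}
    if vision_mode is not None:
        if vision_mode not in VISION_MODES:
            raise ValueError(f"unknown vision_mode: {vision_mode!r}")
        args["vision_mode"] = vision_mode
    return args


print(is_canonical_meet_url("https://meet.google.com/abc-defg-hij"))
print(is_canonical_meet_url("https://meet.google.com/lookup/abc123"))
```

Leaving `vision_mode` unset omits the key entirely, so the documented `off` default applies.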

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since annotations are absent, the description must fully disclose behavior. It explains the agent joins, hears, responds via TTS, sees screen with vision, and answers questions. However, it doesn't mention call termination, agent leaving, or permission requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with the primary action, then details. Every sentence adds information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description covers tool purpose, usage, parameter constraints, and behavioral details. Lacks only a few details like call termination or return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds significant value by explaining agent_id constraints, meet_url canonical form, and detailed vision_mode options beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool dispatches an AI agent into an active Google Meet call. It distinguishes from siblings like calls_make by specifying delegation to an agent for live meeting attendance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use case: delegate live meeting attendance for notes, Q&A, summarization. Also clarifies the required URL format and unsupported lookup-redirect URLs. Lacks explicit 'when not to use' or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_telegram_callA

Read-only, Idempotent

Dispatch a workspace AI agent into an active Telegram group call (t.me/call/ link). The agent joins as a participant via the workspace's Telegram account — it can hear the conversation, respond via TTS, see shared screens (when vision is enabled), and answer questions about what's on screen. Use when the operator wants to delegate live group-call attendance to an agent (notes, Q&A, summarization, real-time support). Pass either the full https://t.me/call/ URL or the bare slug token.

ParametersJSON Schema

Name	Required	Description
`agent_id`	Yes	ID of an active voice agent in this workspace (has at least one incoming_call trigger on a voice channel_type). Get it from agents.list.
`vision_mode`	No	Screen-share capture mode. 'off' = no vision (default), 'on_demand' = the agent can call vision_query for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and splices the latest scene-change into each agent turn. OMIT to use 'off' (the default).
`telegram_call_url`	Yes	Telegram group-call invite — either the full URL (https://t.me/call/<slug>) or just the slug token. Slug is 12-64 chars from [A-Za-z0-9_-].
`channel_account_id`	No	Workspace Telegram channel account ID that joins as the bot. Optional — when the workspace has exactly one Telegram account, it's used by default. Required when multiple Telegram accounts exist.
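
The tool accepts either the full URL or the bare slug, so no normalization is required. As a pre-flight check, though, the slug rule from the table (12-64 characters from `[A-Za-z0-9_-]`) can be applied locally. This is a sketch under that reading; `extract_call_slug` is a hypothetical helper name.

```python
import re

# Slug rule from the parameter table: 12-64 chars from [A-Za-z0-9_-].
SLUG_RE = re.compile(r"[A-Za-z0-9_-]{12,64}")
URL_PREFIX = "https://t.me/call/"


def extract_call_slug(telegram_call_url):
    """Return the slug from a full t.me/call URL or a bare slug.

    Hypothetical helper: raises ValueError when the slug breaks the rule.
    """
    value = telegram_call_url.strip()
    # Strip the documented URL prefix when present; otherwise treat as a slug.
    slug = value[len(URL_PREFIX):] if value.startswith(URL_PREFIX) else value
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid Telegram call slug: {slug!r}")
    return slug
```

Both `extract_call_slug("https://t.me/call/AbCdEf123456_-")` and `extract_call_slug("AbCdEf123456_-")` return the same slug, so either form can be validated the same way.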

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It adequately describes the agent's behavior: joins as a participant, hears conversation, responds via TTS, sees screens if vision enabled, answers questions. It does not disclose potential side effects (e.g., if the agent is already in another call), but covers the primary behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences: the first two explain the action and the agent's capabilities, the third gives usage guidance, and the fourth specifies the input format. It is concise, front-loaded, and contains no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description explains what happens (agent joins and can perform actions) but does not specify return value. This is a minor gap; otherwise, the description is complete for the tool's purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, so baseline is 3. The description adds value by clarifying the telegram_call_url format (full URL or bare slug), and explaining when channel_account_id is optional or required. This goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Dispatch'), the resource ('workspace AI agent into an active Telegram group call'), and provides specific details about agent capabilities. This distinguishes it from sibling tools like calls_send_to_meet or calls_make, as it is specific to Telegram group calls.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'Use when the operator wants to delegate live group-call attendance to an agent.' It also explains what the agent can do (hear, respond via TTS, see screens, answer questions). However, it does not mention when not to use or provide alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_waitA

Read-only, Idempotent

Block until a voice call ends (status changes from 'active') or timeout elapses. Returns ended=true with final state when the call has ended; ended=false on timeout (re-issue to keep waiting). The returned state includes outcome so callers can branch on pickup vs. no-answer (answered/no_answer/busy/declined/failed/unknown). Default timeout 90s; cap 110s — bounded by nginx proxy_read_timeout 120s on /mcp.

ParametersJSON Schema

Name	Required	Description	Default
`call_id`	Yes	Call ID returned by calls.make in _meta.call_id.
`timeout_seconds`	No	Max seconds to wait. Default 90, cap 110 (bounded below nginx 120s proxy_read_timeout). On expiry returns ended=False with status='active' so the caller can re-issue to keep waiting.
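
The re-issue pattern described above can be sketched as a small loop. This is an illustration only: `call_tool` is a hypothetical callable `(tool_name, arguments) -> dict` supplied by whatever MCP client is in use, and treating the result as a dict with an `ended` key is an assumption based on the description, since no output schema is published.

```python
MAX_TIMEOUT_SECONDS = 110  # cap stated in the tool description


def wait_until_call_ends(call_tool, call_id, timeout_seconds=90, max_attempts=10):
    """Re-issue calls_wait until the call has ended or attempts run out.

    call_tool is a hypothetical client callable, not part of DialogBrain.
    Returns the final result dict, or None if the call is still active
    after max_attempts waits.
    """
    # Clamp to the documented cap so the request stays under the proxy limit.
    timeout = min(timeout_seconds, MAX_TIMEOUT_SECONDS)
    for _ in range(max_attempts):
        result = call_tool(
            "calls_wait",
            {"call_id": call_id, "timeout_seconds": timeout},
        )
        # Assumed result shape: a dict whose 'ended' flag is true once done.
        if result.get("ended"):
            return result
    return None
```

Bounding the loop with `max_attempts` is a design choice of this sketch, so a call that never ends cannot block the caller indefinitely.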

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses blocking behavior, timeout handling, return of ended flag and outcome, and the underlying nginx limit. No hidden behaviors.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the core action, no extraneous words. Every sentence adds critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the description adequately explains return values (ended flag, outcome, final state) and polling semantics. Covers timeout handling and system constraints. Complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value: specifies call_id source (_meta.call_id), default timeout 90, cap 110, and behavior on expiry (ended=False with status='active'). Deepens understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool blocks until a voice call ends or timeout elapses, with specific verb 'block' and resource 'voice call'. It distinguishes from siblings like calls_hangup or calls_get_transcript by focusing on waiting/polling behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use (after a call is made to wait for completion), what to do on timeout (re-issue), and mentions default/cap. It lacks explicit exclusions or alternatives, but the polling pattern is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_add_fileA

Add a file to a knowledge collection.

The file must be uploaded and indexed first (files_upload + files_ingest). If the file was previously removed, it is re-enabled.

ParametersJSON Schema

Name	Required	Description	Default
`file_id`	Yes	ID of the file to add (from files_upload)
`collection_id`	Yes	ID of the collection

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Discloses the re-enable behavior and implies mutation. Lacks details on permissions or side effects but is sufficient for a simple add.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no fluff. Front-loaded with the main action, followed by the prerequisite and a nuance. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and description does not mention return value. For a simple add tool, it is adequate but could be more complete by indicating what the response will be.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions. The description does not add additional meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Add a file to a knowledge collection.' It provides a specific verb (add) and resource (knowledge collection), and differentiates from siblings like collections_remove_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly mentions prerequisites (file must be uploaded and indexed) and a behavioral nuance (re-enables removed files). While it doesn't explicitly state when not to use, the context is clear and helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_assign_agentA

Assign a knowledge collection to an AI agent.

Once assigned, the agent's knowledge.query will automatically scope RAG search to files in its assigned collections.

ParametersJSON Schema

Name	Required	Description	Default
`agent_id`	Yes	ID of the AI agent
`collection_id`	Yes	ID of the collection to assign

Tool Definition Quality

A3.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, and the description fails to disclose whether the assignment is additive or overwriting, any prerequisites (e.g., agent must exist), or potential errors. The burden is on the description, but it only states the positive effect.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: one for purpose, one for consequence. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple assignment tool with two required ID parameters and no output schema, the description covers the main outcome and effect on agent behavior. Lacks mention of reversibility, but sibling tool exists for that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds no extra meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Assign a knowledge collection to an AI agent' with a specific verb and resource, and distinguishes from its sibling 'collections_unassign_agent'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on the effect (scopes RAG search) and implies use for setting agent knowledge, but does not explicitly state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_createA

Create a named knowledge collection.

Collections group files for RAG search. After creating, add files with collections.add_file and assign to agents with collections.assign_agent.

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	Collection name (must be unique per user)
`description`	No	Optional description of the collection
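
The follow-up steps named in the description (add files, then assign to an agent) can be laid out as an ordered list of tool calls. This is a sketch only: `plan_collection_setup` is a hypothetical helper, it assumes the collection ID is already known (the response shape of `collections_create` is not documented here), and the tool names use the underscore form shown in this listing.

```python
def plan_collection_setup(collection_id, file_ids, agent_id):
    """Return (tool_name, arguments) pairs in the order the docs suggest.

    Hypothetical helper: it only builds the plan and does not call anything.
    Files are assumed to be uploaded and indexed already
    (files_upload + files_ingest), as collections_add_file requires.
    """
    steps = [
        ("collections_add_file", {"collection_id": collection_id, "file_id": fid})
        for fid in file_ids
    ]
    # Assign last, so the agent's scoped search sees a populated collection.
    steps.append(
        ("collections_assign_agent",
         {"collection_id": collection_id, "agent_id": agent_id})
    )
    return steps
```

Assigning the agent after the files are added is a choice made in this sketch; the description does not state that the order matters.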

Tool Definition Quality

A3.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It fails to disclose behavioral traits such as idempotency, error handling (e.g., duplicate name), or required permissions. The description only states the action without any behavioral context beyond the basic operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at three sentences, with the core action in the first sentence and context and next steps in the remaining two. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters and no output schema, the description provides enough context: what a collection is, its purpose (RAG search), and the workflow after creation. However, it lacks any mention of potential errors or behavior under failure, which would make it more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (both parameters have descriptions). The description does not add any additional meaning beyond what is already in the schema. The constraint 'must be unique per user' for name is already in the schema's description. Baseline 3 is appropriate as the schema fully documents the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a named knowledge collection' with a specific verb and resource. It explains the purpose of collections (for RAG search) and distinguishes from sibling tools like collections_add_file and collections_assign_agent by mentioning them explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear context: create a collection, then use collections.add_file and collections.assign_agent for subsequent steps. It implicitly guides when to use this tool (when a new collection is needed) but does not explicitly state when not to use it or provide alternative tools for other scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_deleteA

Destructive, Idempotent

Delete a knowledge collection.

If the collection is assigned to agents, prompts, or channels, pass force=true to delete anyway. CASCADE removes all assignments automatically.

ParametersJSON Schema

Name	Required	Description	Default
`force`	No	Force delete even if collection is in use. OMIT for the safe default (refuse to delete in-use collections).
`collection_id`	Yes	ID of the collection to delete
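
Because the safe default is obtained by omitting `force` rather than by passing a false value, a caller may want to build the arguments so the key appears only when a forced delete is explicitly requested. A minimal sketch, with `build_delete_args` as a hypothetical helper name:

```python
def build_delete_args(collection_id, force=False):
    """Return an arguments dict for collections_delete.

    Hypothetical helper: 'force' is included only when explicitly requested,
    so the default call keeps the documented safe behavior (refuse to delete
    a collection that is in use).
    """
    args = {"collection_id": collection_id}
    if force:
        args["force"] = True
    return args
```

Keeping `force` out of the default path means a forced delete, which also removes all assignments, has to be a deliberate decision at the call site.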

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that deleting a collection may be blocked if it is assigned, and that force=true overrides this. No annotations are provided, so the description carries the burden; it could also mention irreversibility or required permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three efficient sentences. The first states the purpose, the second adds a critical usage condition, and the third notes the cascade effect. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a delete tool but lacks details on return values, idempotency, or permissions. Without annotations or output schema, more completeness would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds context for the force parameter but introduces an ambiguous 'CASCADE' term that is not in the schema. This is slightly confusing, though the description mostly clarifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Delete a knowledge collection,' using a specific verb and resource. Distinguishes from sibling tools like collections_create and collections_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on when to use force=true (when assigned). Implicitly indicates default behavior (cannot delete if in use). Lacks explicit mention of alternatives like unassigning first.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_listA

Read-only, Idempotent

List all knowledge collections in the workspace.

Collections are named groups of files used for RAG search. Auto-created collections (per-agent, per-prompt) are hidden by default.

ParametersJSON Schema

Name	Required	Description	Default
`include_inactive`	No	Include inactive collections. OMIT to list only active collections (the default).

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description adds value by disclosing that auto-created collections are hidden by default. It lacks permissions or side-effect details but is sufficient for a read-only list tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with three sentences, front-loading the main action and avoiding any unnecessary words or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool (one boolean parameter, no output schema), the description fully covers the necessary context: what the tool does, the default behavior, and what collections are.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds 'auto-created collections hidden by default' context but does not significantly enhance parameter meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly and specifically states the tool lists knowledge collections, defines what collections are, and notes that auto-created ones are hidden by default, distinguishing it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by explaining the purpose and default behavior, but does not explicitly mention when to use this tool versus alternatives like collections_list_files.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list_filesA

Read-only, Idempotent

List all files in a knowledge collection with their indexing status and chunk counts. Each returned file has a file_id (integer) that can be passed to messages.send as attachments=[file_id] to send the file to a contact, or to files.read to read its text content.

ParametersJSON Schema

Name	Required	Description	Default
`collection_id`	Yes	ID of the collection

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It mentions the output includes indexing status and chunk counts, but does not state whether the tool is read-only, requires specific permissions, has pagination, or any rate limits. The lack of such details is a significant gap for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loaded with the main purpose, and contains no unnecessary words. Every sentence provides value: the first states the core function, the second adds practical usage information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there is no output schema, the description should elaborate on the return structure. It only mentions 'indexing status and chunk counts' and hints at the file_id usage. This is adequate but lacks details about the shape of the response (e.g., array of objects with specific fields), leaving some ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, collection_id, is already described in the schema as 'ID of the collection'. The description adds no additional meaning beyond that. Since schema coverage is 100%, baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists files in a knowledge collection with indexing status and chunk counts, and distinguishes from siblings like collections_add_file or collections_list. It also mentions the file_id field and how it can be used with messages.send or files.read, adding specific value.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (listing files in a collection) but provides no explicit guidance on when not to use it or alternatives. Sibling tools like collections_list or files_read are not mentioned as alternatives, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_remove_fileA

Remove a file from a knowledge collection.

The file itself is not deleted — only the collection membership is removed.

ParametersJSON Schema

Name	Required	Description	Default
`file_id`	Yes	ID of the file to remove
`collection_id`	Yes	ID of the collection

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the file is not deleted, only membership is removed. However, without annotations, it could provide more, such as permissions or side effects (e.g., what happens if file is in no collection).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the action. Every word earns its place; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal tool with two required params and no output schema, the description is largely complete. It could mention return value but is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description adds no further semantics beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove a file from a knowledge collection') and the resource, and distinguishes itself from deletion of the file itself. Among siblings, 'collections_add_file' is the counterpart.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives (e.g., files_delete). It does not mention when not to use or provide context for choosing this tool over other file-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_unassign_agentA

Remove a knowledge collection from an AI agent.

The collection and its files are not deleted — only the agent assignment is removed.

ParametersJSON Schema

Name	Required	Description	Default
`agent_id`	Yes	ID of the AI agent
`collection_id`	Yes	ID of the collection to unassign

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description bears the burden. It discloses non-destructive behavior: collection and files remain. No mention of permissions or side effects, which is acceptable for a simple unassignment.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the action. Every word adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with no output schema. Description covers what it does and what it does not do. Missing return value or result type, but contextually adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions. The description does not add extra meaning beyond the schema; baseline applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Remove a knowledge collection from an AI agent.' It distinguishes from deletion by noting collection and files are not deleted. The sibling tool 'collections_assign_agent' implies the inverse. Specific verb+resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use. The description clarifies that only the assignment is removed, which helps avoid misuse, but no alternatives or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_add_channelA

🔗 Link a new channel identity (email, phone, LinkedIn, etc.) to an existing contact.

When to use:

User learns a contact's email or phone and wants to save it
User wants to link a LinkedIn/Instagram profile to an existing contact
Adding a second channel for an existing person

Requires contact_id (entity_id) from contacts.find.

Parameters

Name	Required	Description
`value`	Yes	Email address, phone number, or username for this channel
`channel`	Yes	Channel type to add
`contact_id`	Yes	entity_id from contacts.find
`display_name`	No	Optional display label for this identity
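
As a rough illustration of the parameters above, the sketch below wraps them in the JSON-RPC `tools/call` request that MCP clients send. The contact ID, the email address, and the `"email"` channel string are placeholders: this listing does not enumerate the valid `channel` values or the type of `contact_id`.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in an MCP JSON-RPC 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "contacts_add_channel",
    {
        "contact_id": "ENTITY_ID_FROM_CONTACTS_FIND",  # placeholder, not a real ID
        "channel": "email",  # assumed value; valid channel names are not listed here
        "value": "person@example.com",
        "display_name": "Work email",  # optional
    },
)
print(json.dumps(request, indent=2))
```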

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description must carry the burden. It describes the action but does not disclose behavior on duplicate channels (e.g., overwrite vs. error) or authentication requirements, leaving gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short paragraphs with clear headings and bullet points. Every sentence adds value, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple add-channel tool, the description covers purpose, usage, and prerequisite. Missing behavior on duplicates is a minor gap, but overall sufficient given schema coverage and tool simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already describes all parameters. The description adds no additional meaning beyond the prerequisite for contact_id, resulting in a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Link a new channel identity ... to an existing contact.' It uses a specific verb and resource, and distinguishes itself from sibling tools like contacts_find and contacts_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists three 'When to use' scenarios and a prerequisite ('Requires contact_id from contacts.find'), providing clear context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_discoverA

Read-only, Idempotent

Search for a contact on a live channel (Telegram, WhatsApp, etc.) before adding them. Use this to look up a person by username or phone number before calling contacts.sync.

Parameters

Name	Required	Description	Default
`query`	Yes	Username, phone, or name to search for
`channel`	Yes	Channel name: telegram, whatsapp, etc.

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries burden. Indicates it's a search (read) operation, but does not disclose details like error handling or auth. Adequate for a simple search.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences perfectly front-loaded and concise: purpose first, usage guidance second. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given low complexity (2 params, no output schema), description covers purpose, usage, and parameter hints adequately. Lacks output format but still sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds example usage ('username or phone number') but not critical new info beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'Search for' and resource 'contact on a live channel', and distinguishes from sibling contacts_sync by saying 'before calling contacts.sync'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use: 'before calling contacts.sync' and how: 'by username or phone number'. Lacks explicit exclusions but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_findA

Read-only, Idempotent

👤 Search for contacts in your address book by name or username.

When to use:

User asks 'find contact X' or 'who is Y?'
User wants to know someone's username or ID
Before sending a message to verify contact exists
To get contact's channel reference for messaging

Examples:

❓ User: 'find contact named [name]' → contacts_search(query='[name]', limit=5)

❓ User: 'who is [full name]?' → contacts_search(query='[full name]', limit=1)

❓ User: 'search for @username' → contacts_search(query='username', limit=10)

Returns: name, username, channel, channel_ref, similarity_score, match_type. Plus:

entity_id: local DB key — pass to contacts.profile. Null for live-discovered contacts (skip contacts.profile for those).
telegram_user_id (when channel='telegram'): the Telegram user ID — pass to calls.make / messages.send. NOT entity_id.
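
The ID rules above are easy to mix up, so here is a minimal sketch of how a client might pick the right identifier from one result. The dictionaries are invented sample data, and falling back to `channel_ref` for non-Telegram channels is an assumption rather than something the description states.

```python
from typing import Optional


def id_for_profile(contact: dict) -> Optional[object]:
    """Return entity_id for contacts.profile; None means a live-discovered contact."""
    return contact.get("entity_id")


def id_for_messaging(contact: dict) -> Optional[object]:
    """For Telegram results use telegram_user_id, never entity_id."""
    if contact.get("channel") == "telegram":
        return contact.get("telegram_user_id")
    # Assumption: other channels are addressed through channel_ref.
    return contact.get("channel_ref")


# Invented sample results, shaped after the documented return fields.
saved = {"entity_id": 42, "channel": "telegram",
         "telegram_user_id": 1001, "channel_ref": "@alice"}
live = {"entity_id": None, "channel": "telegram",
        "telegram_user_id": 2002, "channel_ref": "@bob"}

print(id_for_profile(saved), id_for_messaging(saved))
print(id_for_profile(live), id_for_messaging(live))
```

A `None` from `id_for_profile` is the signal to skip contacts.profile for that result.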

Parameters

Name	Required	Description
`limit`	No	Maximum number of results to return
`query`	Yes	Name or username to search for (supports partial matches)
`channel`	No	Filter by channel. OMIT to search across all channels.

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, but the description explains return fields, special cases like entity_id null for live-discovered contacts, and telegram_user_id usage. It sufficiently covers behavioral traits for a read-only search tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections and front-loaded purpose, but somewhat verbose with redundancy (e.g., return fields repeated from schema). Could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully explains return values and special fields. It covers all necessary context for usage, including entity_id handling and channel specifics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameter descriptions, and the description adds value by showing concrete usage examples (e.g., different query formats, limit usage) beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches for contacts by name or username, with examples differentiating it from sibling tools like contacts_profile or contacts_sync.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use scenarios and examples, but lacks explicit when-not-to-use guidance. Still, the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_profileA

Read-only, Idempotent

👤 Get full profile for a contact: all channel identities, notes, role, capabilities, birthday.

When to use:

After contacts.find to get complete info about a specific person
To see all channels a contact is reachable on
To read notes, role, or capabilities for a contact

Requires contact_id (entity_id) from contacts.find.

Parameters

Name	Required	Description	Default
`contact_id`	Yes	entity_id from contacts.find

Tool Definition Quality

A4.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It describes the tool as a read operation ('Get full profile') and lists returned data. Lacks mention of edge cases or error handling, but is transparent about the operation and outputs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Very concise: first line states purpose, then bullet points for usage, then requirement. No extraneous information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains what is returned. It also mentions prerequisite (contacts.find). The tool is simple and the context is complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, contact_id, has a schema description ('entity_id from contacts.find'), which the description reinforces by requiring that the ID come from contacts.find. Schema coverage is 100%, and the description adds value by clarifying the source of the ID.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (Get full profile for a contact) and lists the data it returns (all channel identities, notes, role, capabilities, birthday). Distinguishes itself from sibling contacts.find by specifying it provides 'complete info' about a specific person.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use the tool: after contacts.find, to see all channels, to read notes/role/capabilities. Also provides a requirement: contact_id from contacts.find. No alternatives mentioned but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_syncA

Add a discovered contact and open a conversation thread. Returns thread_id for the new conversation. Call contacts.discover first to verify the contact exists.

Parameters

Name	Required	Description	Default
`channel`	Yes	Channel name: telegram, whatsapp, etc.
`identifier`	Yes	Username or phone number to add
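
The discover-then-sync order described above might be sketched as follows. `call_tool` stands in for whatever function your MCP client exposes, and `fake_call_tool` exists only so the example runs. The `matches` key and the `thread-1` value are assumptions, since neither tool documents its exact result shape here.

```python
from typing import Callable, Optional

ToolCaller = Callable[[str, dict], dict]


def discover_then_sync(call_tool: ToolCaller, channel: str, identifier: str) -> Optional[str]:
    """Verify the contact exists on the live channel, then add it and open a thread."""
    found = call_tool("contacts_discover", {"channel": channel, "query": identifier})
    if not found.get("matches"):  # assumed key for discovery results
        return None
    synced = call_tool("contacts_sync", {"channel": channel, "identifier": identifier})
    return synced.get("thread_id")


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Hypothetical stand-in for a real MCP client, for demonstration only."""
    if name == "contacts_discover":
        known = arguments["query"] == "@alice"
        return {"matches": [{"username": "@alice"}] if known else []}
    return {"thread_id": "thread-1"}


print(discover_then_sync(fake_call_tool, "telegram", "@alice"))
print(discover_then_sync(fake_call_tool, "telegram", "@nobody"))
```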

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the return (thread_id) and the prerequisite, but with no annotations, it lacks detail on side effects (e.g., whether it can create duplicate threads, if it modifies existing contacts, or permissions needed). Behavior is partially transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each adding unique value: purpose, return value, and prerequisite. No unnecessary words, tightly structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with only two parameters and no output schema, the description covers purpose, return, and prerequisite. However, it could be more complete by clarifying behavior for existing contacts (e.g., does it open an existing thread or always create new?). Slight gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for both parameters, so the description adds no extra meaning beyond restating the schema fields. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('add a discovered contact and open a conversation thread') and specifies the return value ('Returns thread_id for the new conversation'). It distinguishes from siblings like contacts_discover by noting it as a prerequisite.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells the agent to call contacts.discover first, providing a clear sequential guideline. However, it doesn't specify when not to use this tool or mention alternative tools for related operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_updateA

✏️ Update a contact's profile: name, notes, role, capabilities, birthday, preferred channel.

When to use:

User wants to add notes about a contact
User wants to set/update role or capabilities for a contact
User wants to rename a contact or update birthday

Requires contact_id (entity_id) from contacts.find. At least one optional field must be provided.

Parameters

Name	Required	Description
`role`	No	Contact role (e.g. developer, client, partner). Empty string clears role.
`notes`	No	Free-text notes/context about this contact. Empty string clears notes.
`contact_id`	Yes	entity_id from contacts.find
`birthday_day`	No	Birth day 1-31 (must be set together with birthday_month)
`capabilities`	No	List of capabilities (e.g. ['backend', 'design'])
`display_name`	No	New display name (max 255 chars)
`birthday_year`	No	Birth year 1900-2100 (optional, standalone)
`birthday_month`	No	Birth month 1-12 (must be set together with birthday_day)
`preferred_channel`	No	Preferred channel for contacting this person. OMIT to leave the preferred channel unchanged.
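
The constraints in this table (at least one optional field, paired birthday day and month, value ranges, name length) can be checked client-side before calling the tool. The function below is a sketch of such a pre-flight check, not the server's own validation, whose error messages and exact behavior are not documented here.

```python
from typing import List


def validate_update_args(args: dict) -> List[str]:
    """Return a list of problems found in contacts_update arguments."""
    errors = []
    if "contact_id" not in args:
        errors.append("contact_id is required")
    if not (set(args) - {"contact_id"}):
        errors.append("at least one optional field must be provided")
    # birthday_day and birthday_month must be set together; birthday_year is standalone.
    if ("birthday_day" in args) != ("birthday_month" in args):
        errors.append("birthday_day and birthday_month must be set together")
    ranges = {
        "birthday_day": (1, 31),
        "birthday_month": (1, 12),
        "birthday_year": (1900, 2100),
    }
    for field, (low, high) in ranges.items():
        if field in args and not low <= args[field] <= high:
            errors.append(f"{field} must be between {low} and {high}")
    if len(args.get("display_name", "")) > 255:
        errors.append("display_name must be at most 255 characters")
    return errors


print(validate_update_args({"contact_id": 7, "notes": "Met at the conference"}))
print(validate_update_args({"contact_id": 7, "birthday_day": 5}))
```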

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It indicates mutation ('update') and a prerequisite, but fails to mention side effects such as whether missing optional fields are left unchanged or overwritten, whether the update is partial or full, or what the response looks like. The schema describes clearing behavior for some fields (empty strings), but this is not echoed in the description, leaving behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: a single-line summary with an emoji, followed by bullet-point use cases and a prerequisite note. Every sentence serves a purpose with no redundancy, and the structure is front-loaded with the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, no output schema, no annotations), the description covers core aspects (what, when, prerequisite, constraint) but omits return value, error conditions, and detailed field behavior. It is adequate for basic use but leaves information gaps that an agent may need, preventing a higher score.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter already has a detailed description. The tool description adds a high-level list of updatable fields and the prerequisite for contact_id, but does not clarify constraints like the mutual requirement of birthday_day and birthday_month, or the clearing behavior of empty strings for role/notes. This adds limited value beyond the schema, resulting in a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update a contact's profile' and lists specific fields (name, notes, role, capabilities, birthday, preferred channel). It distinguishes itself from sibling tools like contacts_find (find), contacts_profile (view), and contacts_sync (sync) as the dedicated update tool, leaving no ambiguity about its purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' scenarios (add notes, set/update role/capabilities, rename, update birthday) and important prerequisites (requires contact_id from contacts.find, at least one optional field). It does not list alternatives or exclusions, but the guidance is clear and sufficient for typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

documents_createA

Render a document (PDF / HTML / PPTX / DOCX) and save it to the workspace.

This tool has two input pipelines — pass exactly one of content_html or content_markdown.

Pipeline A — `content_html` (canonical for decks, proposals, designed pages)

You author full HTML+CSS. A baked-in design-system preamble ships first (<style> with Inter/Manrope as data-URI fonts, CSS-variable palette tokens, 8px spacing scale, and pre-styled layout helpers); your markup and any of your own <style> blocks land after the preamble so you can override anything. Chromium renders the assembled document into a static PDF — JavaScript is disabled and DNS is blackholed, so external font / image / script fetches will fail by configuration.

Required when this pipeline is used:

title — human-readable, used for PDF metadata and the saved filename.
content_html — the <body> and any custom <style> blocks. The renderer wraps this in <html>…</html> and injects the preamble + a canonical <meta charset> + <title>. Do NOT emit <script>, <iframe>, <object>, <embed>, <meta>, <link>, <base>, <form>, or event handlers — the sanitizer strips them.
output_type — "pdf" or "html". ("pptx" and "docx" require content_markdown since they need structured markdown intermediates.)

Optional:

page_preset — "slide_16_9" (default for any deck), "a4" (default for flowing documents — used if omitted), "letter", or "none" (you declare your own @page rule).
design_tokens — flat dict overriding the preamble's CSS variables. Whitelisted keys: brand_primary, accent, surface_dark (hex color), font_display, font_body (font name from ['Inter', 'Manrope', 'monospace', 'sans-serif', 'serif', 'system-ui', 'ui-monospace', 'ui-sans-serif', 'ui-serif']).
language — BCP-47 tag (default "en"). Drives <html lang>.

Slide structure (`page_preset="slide_16_9"`)

Each slide is <section class="slide …">…</section>. The base .slide class is what sizes it to the viewport and forces the page break — do not drop it. Composable variants (apply alongside .slide):

.slide-cover — gradient hero, big display title.
.slide-split — two equal columns, image + narrative.
.slide-stats — three-up KPI cards (use <div class="stat"> with .stat-value + .stat-label inside).
.slide-quote — centered pull quote + <cite> attribution.

Layout helpers (work in any preset): .grid-2, .grid-3, .split, .stack, .cluster, .callout, .muted, .kbd.

Speaker notes

<aside class="notes">…text…</aside> inside a <section class="slide">. The sanitizer strips them from the rendered PDF and returns them as slide_notes[] (parallel to slide order). Orphan notes outside any slide are dropped with a warning.
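
To make the slide and notes conventions concrete, here is a small sketch that assembles a two-slide `content_html` string. The `slide` helper and the slide text are made up for illustration; only the class names and the `<aside class="notes">` element come from the description above.

```python
from html import escape


def slide(body_html: str, variant: str = "", notes: str = "") -> str:
    """Build one slide; the base 'slide' class is always kept alongside any variant."""
    classes = "slide" + (f" {variant}" if variant else "")
    notes_html = f'<aside class="notes">{escape(notes)}</aside>' if notes else ""
    return f'<section class="{classes}">{body_html}{notes_html}</section>'


content_html = "\n".join([
    slide("<h1>Quarterly review</h1>", "slide-cover",
          notes="Open with the headline result."),
    slide('<div class="stat">'
          '<div class="stat-value">42%</div>'
          '<div class="stat-label">Growth</div>'
          "</div>", "slide-stats"),
])
print(content_html)
```

Keeping the notes inside their slide's `<section>` matters, because orphan notes are dropped.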

Images

Only these src schemes resolve:

file:NNN — workspace file_id.
data:image/...;base64,... — inline.
https://<host> where <host> ∈ DOCUMENTS_MEDIA_URL_ALLOWLIST. Other URLs are dropped and replaced with an HTML comment placeholder.
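
A client could pre-check image sources against these three rules before rendering. In the sketch below, `ALLOWED_HOSTS` is a hypothetical stand-in for DOCUMENTS_MEDIA_URL_ALLOWLIST (its real contents are not published in this listing), and treating `NNN` as a run of digits is an assumption.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for DOCUMENTS_MEDIA_URL_ALLOWLIST.
ALLOWED_HOSTS = {"cdn.example.com"}


def src_resolves(src: str) -> bool:
    """Return True if an <img src> matches one of the three documented schemes."""
    if re.fullmatch(r"file:\d+", src):  # workspace file_id, assumed numeric
        return True
    if src.startswith("data:image/") and ";base64," in src:  # inline image
        return True
    parsed = urlparse(src)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


for candidate in ("file:123", "https://cdn.example.com/logo.png",
                  "http://cdn.example.com/logo.png", "https://other.example.org/a.png"):
    print(candidate, src_resolves(candidate))
```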

Pipeline B — `content_markdown` (invoice / contract only)

Required:

title, content_markdown, output_type.

Optional:

theme — "invoice" or "contract". Triggers the corresponding exemplar styling and (for invoices) the arithmetic validator that fail-closes on missing or mismatched totals.
language — BCP-47 (default "en").
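
The validator's exact rules are not spelled out in this listing. As a sketch of the arithmetic it presumably expects, the snippet below recomputes the totals used in the English invoice exemplar (two line items, 20% tax) with `Decimal` to avoid float rounding. The `invoice_totals` helper is hypothetical.

```python
from decimal import Decimal


def invoice_totals(lines, tax_rate: str):
    """Compute (subtotal, tax, total) from (qty, unit_price) pairs."""
    subtotal = sum((Decimal(qty) * Decimal(price) for qty, price in lines), Decimal("0"))
    tax = (subtotal * Decimal(tax_rate)).quantize(Decimal("0.01"))
    return subtotal, tax, subtotal + tax


# Line items from the English exemplar: 1 x 1500.00 and 2 x 500.00, taxed at 20%.
subtotal, tax, total = invoice_totals([(1, "1500.00"), (2, "500.00")], "0.20")
print(subtotal, tax, total)
```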

Delivery contract (CRITICAL)

After this tool returns file_id, deliver the file with messages.send(attachments=[file_id], text="<short caption>"). Embedding the file_id in a markdown link, sandbox: URL, or /api/files/<id>/download text will render as plain text on the recipient's channel — the attachments parameter is the only way the file actually attaches.
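
A minimal sketch of the delivery step, assuming the file ID returned by this tool is `123`. Only `attachments` and `text` are shown because they are the only messages.send parameters named in this section. A real call will also need whatever addressing parameters that tool requires.

```python
LINK_MARKERS = ("sandbox:", "/api/files/")


def delivery_arguments(file_id, caption: str) -> dict:
    """Build the attachment-related arguments for messages.send."""
    # Links to the file in the text render as plain text, so reject them early.
    if any(marker in caption for marker in LINK_MARKERS):
        raise ValueError("attach the file via 'attachments', not a link in the text")
    return {"attachments": [file_id], "text": caption}


args = delivery_arguments(file_id=123, caption="Proposal attached.")
print(args)
```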

Exemplars

INVOICE (English):

Invoice INV-{YYYYMMDD-HHMMSS}

From: {Issuer Legal Name}, {Address}, {Tax ID}
To: {Customer Name}, {Customer Address}, {Customer Tax ID}
Issue date: {YYYY-MM-DD}
Due date: {YYYY-MM-DD}

Description	Qty	Unit price	Total
{Service 1}	1	1500.00	1500.00
{Service 2}	2	500.00	1000.00

Subtotal: USD 2500.00
Tax (20%): USD 500.00
Total: USD 3000.00

Payment: {bank details OR crypto wallet — never both}

INVOICE (Russian):

Счёт-фактура № INV-{YYYYMMDD-HHMMSS}

От: {Юридическое название организации}, {Адрес}, ИНН {Tax ID}
Кому: {Название клиента}, {Адрес клиента}, ИНН {Tax ID}
Дата: {YYYY-MM-DD}
Срок оплаты: {YYYY-MM-DD}

Описание	Кол-во	Цена	Сумма
{Услуга 1}	1	1500.00	1500.00
{Услуга 2}	2	500.00	1000.00

Подытог: USD 2500.00
НДС (20%): USD 500.00
Итого: USD 3000.00

Реквизиты: {банковские реквизиты ИЛИ криптокошелёк — не оба сразу}

CONTRACT (English):

Service Agreement

Between: {Provider Legal Name}, {Address} ("Provider")
And: {Client Legal Name}, {Address} ("Client")
Effective date: {YYYY-MM-DD}

1. Scope of services

{Concise description of what Provider agrees to deliver.}

2. Term

This Agreement begins on the Effective date and continues until {termination condition or end date}.

3. Compensation

Client pays Provider {amount and currency} according to {payment schedule}.

4. Confidentiality

Both parties agree to keep proprietary information of the other party confidential during and after the term of this Agreement.

5. Termination

Either party may terminate with {N} days' written notice.

6. Governing law

{Jurisdiction}.

Provider: ____________________ Client: ____________________ {Provider signatory name} {Client signatory name}

CONTRACT (Russian):

Договор оказания услуг

Между: {Юридическое название Исполнителя}, {Адрес} ("Исполнитель")
И: {Юридическое название Заказчика}, {Адрес} ("Заказчик")
Дата вступления в силу: {YYYY-MM-DD}

1. Предмет договора

{Краткое описание услуг, которые Исполнитель обязуется оказать.}

2. Срок действия

Договор вступает в силу с указанной даты и действует до {условие прекращения или дата окончания}.

3. Стоимость и порядок оплаты

Заказчик оплачивает услуги Исполнителя в размере {сумма и валюта} в порядке {график платежей}.

4. Конфиденциальность

Стороны обязуются сохранять конфиденциальность сведений, полученных в ходе исполнения настоящего Договора, в течение срока его действия и после его прекращения.

5. Расторжение

Любая из сторон вправе расторгнуть Договор, направив письменное уведомление не менее чем за {N} дней.

6. Применимое право

{Юрисдикция}.

Исполнитель: ____________________ Заказчик: ____________________ {ФИО подписанта Исполнителя} {ФИО подписанта Заказчика}

Parameters

Name	Required	Description	Default
`theme`	No	Invoice or contract styling for content_markdown. Rejected with content_html (use design_tokens + your own CSS instead). OMIT for default (unthemed) styling.
`title`	Yes	Short human-readable title for the document.
`language`	No	BCP-47 language tag (e.g. 'en', 'ru', 'zh', 'ja'). Drives <html lang> and (markdown path) font fallback for non-Latin scripts.	en
`output_type`	Yes	Renderer target: 'pdf' \| 'pptx' \| 'docx' \| 'html'. PPTX/DOCX require content_markdown.
`page_preset`	No	Page geometry for content_html. 'slide_16_9' = 1280x720 deck, 'a4'/'letter' = flowing document, 'none' = LLM declares its own @page. Defaults to 'a4' inside the html branch when omitted. Rejected with content_markdown.
`content_html`	No	Full HTML body (with optional <style> blocks) for the canonical Chromium pipeline. Mutually exclusive with content_markdown.
`design_tokens`	No	Flat dict of CSS-variable overrides for content_html. Whitelisted keys: brand_primary, accent, surface_dark (hex color), font_display, font_body (Inter\|Manrope\|system-ui\|ui-sans-serif\|ui-serif\|ui-monospace\|sans-serif\|serif\|monospace). Unknown keys / invalid values are dropped with a warning. Rejected with content_markdown.
`content_markdown`	No	Markdown body for the invoice/contract pipeline. Mutually exclusive with content_html.
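
The cross-parameter rules in this table (exactly one content field, PPTX/DOCX only via markdown, `theme` only with markdown, `page_preset` and `design_tokens` only with HTML) can be checked before the call. The function below is a sketch of such a client-side check. The server's own validation and error wording may differ.

```python
from typing import List

OUTPUT_TYPES = {"pdf", "html", "pptx", "docx"}
MARKDOWN_ONLY_OUTPUTS = {"pptx", "docx"}


def check_document_args(args: dict) -> List[str]:
    """Return a list of rule violations in documents_create arguments."""
    problems = []
    has_html = "content_html" in args
    has_md = "content_markdown" in args
    if has_html == has_md:  # both present or both missing
        problems.append("pass exactly one of content_html or content_markdown")
    for key in ("title", "output_type"):
        if key not in args:
            problems.append(f"{key} is required")
    output_type = args.get("output_type")
    if output_type is not None and output_type not in OUTPUT_TYPES:
        problems.append(f"unknown output_type: {output_type}")
    if has_html and output_type in MARKDOWN_ONLY_OUTPUTS:
        problems.append(f"{output_type} requires content_markdown")
    if has_html and "theme" in args:
        problems.append("theme is rejected with content_html")
    if has_md:
        for key in ("page_preset", "design_tokens"):
            if key in args:
                problems.append(f"{key} is rejected with content_markdown")
    return problems


deck = {"title": "Deck", "output_type": "pdf", "page_preset": "slide_16_9",
        "content_html": '<section class="slide"><h1>Hello</h1></section>'}
print(check_document_args(deck))
print(check_document_args({"title": "Deck", "output_type": "pptx",
                           "content_html": "<p>x</p>", "theme": "invoice"}))
```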

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully carries the burden. It discloses critical behaviors: image restrictions, delivery contract, invoice total validation, formatting conventions for slides, and rules for themes. This is exceptionally transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (REQUIRED, Optional, DELIVERY CONTRACT, CONVENTIONS, etc.) but is quite lengthy due to extensive exemplars. While every section adds value, the length slightly impacts conciseness, though it remains well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, no output schema, many edge cases), the description is remarkably complete. It covers the return contract, formatting conventions, image restrictions, error cases (document+pptx), and provides exemplars for invoices/contracts. No missing critical information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema coverage is 100%, the description adds significant meaning beyond the schema: it explains theme triggers for invoice/contract styling, language for font fallback, content_markdown slide separation using '---', format/presentation distinctions, and output_type restrictions. This greatly aids correct parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Generate a document (PDF / PPTX / DOCX / HTML) from markdown content authored by you.' It identifies the specific verb (generate), resource (document), and supported formats, distinguishing it clearly from sibling tools (no other document creation tool present).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit delivery instructions (DELIVERY CONTRACT) and format/output_type rules (e.g., document+pptx rejected). It provides context on when to use different formats and themes but does not explicitly mention when not to use this tool or alternative tools (though no direct alternatives exist among siblings).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

feedback_save A

Save a behavioral rule, preference, or correction that should guide future agent behavior. Use this when the user gives explicit guidance like 'always reply in Russian', 'don't suggest meetings before 11am', or 'invoice link goes via email, not chat'. Structure the rule as: the rule itself, why it matters (if stated), and how to apply it. Scope: 'workspace' for org-wide rules, 'agent' for per-agent overrides, 'person' for per-contact preferences. Prefer feedback.save over notes.save for anything that's instructive rather than informational.

Parameters

Name	Required	Description
`key`	Yes	Short identifier for this rule (e.g. 'reply_language', 'meeting_hours'). Must not start with '__' (reserved).
`why`	No	Why this rule matters (optional but recommended for the distiller).
`rule`	Yes	The rule itself, in imperative form. Required.
`scope`	Yes	Scope of the rule. 'workspace' for org-wide rules; 'agent' for per-agent overrides; 'thread' for conversation-specific guidance; 'person' for per-contact preferences. 'global' is accepted as a deprecated alias for 'agent'.
`how_to_apply`	No	When/how to apply the rule (optional). Helpful for conditional rules like 'apply when speaking to Russian-speaking customers'.
`scope_ref_id`	No	Required for scope='thread' (thread_id) and scope='person' (person_id).
`target_agent_id`	No	Target agent. Optional in agent mode (defaults to self); required when called from MCP. Ignored when scope='workspace'.
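
The scope rules above interact: `scope_ref_id` is only required for some scopes, and 'global' maps to 'agent'. A minimal sketch of assembling the arguments under those rules follows. `build_feedback_args` is a hypothetical helper, not part of DialogBrain, and the ID types are left open because the table does not state them.

```python
from typing import Any

VALID_SCOPES = {"workspace", "agent", "thread", "person"}
# Scopes for which the table says scope_ref_id is required.
SCOPES_NEEDING_REF = {"thread", "person"}


def build_feedback_args(
    key: str,
    rule: str,
    scope: str,
    why: Any = None,
    how_to_apply: Any = None,
    scope_ref_id: Any = None,
    target_agent_id: Any = None,
) -> dict[str, Any]:
    """Assemble feedback_save arguments (hypothetical client-side helper)."""
    if key.startswith("__"):
        raise ValueError("key must not start with '__' (reserved)")
    if scope == "global":
        scope = "agent"  # deprecated alias, per the scope description
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if scope in SCOPES_NEEDING_REF and scope_ref_id is None:
        raise ValueError(f"scope_ref_id is required for scope={scope!r}")

    args: dict[str, Any] = {"key": key, "rule": rule, "scope": scope}
    optional = {
        "why": why,
        "how_to_apply": how_to_apply,
        "scope_ref_id": scope_ref_id,
        # Required when calling from MCP unless scope='workspace'; not enforced here.
        "target_agent_id": target_agent_id,
    }
    # Omit unset optional fields instead of sending nulls.
    args.update({name: value for name, value in optional.items() if value is not None})
    return args
```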

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It explains the structure of the rule and scope but does not disclose side effects (e.g., overwrite behavior, permissions, or reversibility). This is adequate but leaves some behavioral assumptions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with front-loaded purpose, followed by examples, structure guidance, scope explanation, and sibling comparison. Every sentence adds value and no information is redundant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, no output schema), the description covers purpose, usage, parameter relationships, and sibling distinction. It lacks details on return values or error handling, but is otherwise fairly complete for a save operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds meaningful context by explaining the structure ('the rule itself, why it matters, how to apply it') and scope meanings, which goes beyond the schema descriptions. This adds value for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Save a behavioral rule, preference, or correction that should guide future agent behavior.' It provides concrete examples and explicitly distinguishes from notes.save, making it easy to understand what the tool does and how it differs from a sibling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit usage context with examples ('Use this when the user gives explicit guidance like...') and scope guidelines. It also advises preferring feedback.save over notes.save for instructive content. However, it does not explicitly state when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_get_base64 A

Read-only, Idempotent

Download one or more files server-side and return their content as base64-encoded strings. Use this to inspect images, PDFs, or any binary file attached to messages when you cannot access presigned S3 URLs directly. Supports up to 5 files per call, max 15 MB each. For large files, batch in groups of 1-2 to avoid oversized responses.

Parameters

Name	Required	Description	Default
`file_ids`	Yes	List of file IDs to fetch as base64 (max 5). Get IDs from files.info or message attachment_file_ids.
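
Base64 encoding grows a payload by roughly one third, which is why small batches are advised for large files. The sketch below shows one way to split a list of IDs into calls that respect the 5-file cap and to decode a returned string. `batch_file_ids` and `decode_payload` are hypothetical helpers, and the shape of the tool's response is not assumed.

```python
import base64
from typing import Iterator

MAX_FILES_PER_CALL = 5  # limit stated in the tool description


def batch_file_ids(file_ids: list[int], batch_size: int = MAX_FILES_PER_CALL) -> Iterator[list[int]]:
    """Yield consecutive groups of file IDs, each small enough for one call."""
    if not 1 <= batch_size <= MAX_FILES_PER_CALL:
        raise ValueError(f"batch_size must be between 1 and {MAX_FILES_PER_CALL}")
    for start in range(0, len(file_ids), batch_size):
        yield file_ids[start:start + batch_size]


def decode_payload(b64_text: str) -> bytes:
    """Turn one base64 string from the response back into raw bytes."""
    return base64.b64decode(b64_text)


# Large attachments: use groups of 2, as the description suggests.
for group in batch_file_ids([101, 102, 103, 104, 105], batch_size=2):
    print(group)
```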

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses server-side operation, base64 output, file limits, and batching guidance. No contradictions with missing annotations. Could mention if read-only or destructive, but overall good transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three succinct sentences: purpose, use case, constraints. No redundant information; each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema or annotations, the description covers purpose, usage context, parameter source, and limitations. Sufficient for correct invocation without additional details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already describes file_ids thoroughly. Description adds source for IDs (files.info or attachment_file_ids) and reinforces max limit, providing value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool downloads files and returns base64-encoded content, with specific use cases (inspecting images, PDFs) and distinguishes from siblings like files_read or files_info by mentioning server-side download and inability to access presigned URLs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides when to use (when cannot access S3 URLs) and constraints (max 5 files, 15 MB each, batching advice). Lacks explicit when not to use or alternative tools, but context implies alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_info A

Read-only, Idempotent

Get metadata and download URLs for files by their IDs.

When to use:

After messages_read_history returns attachment_file_ids
To get a presigned download URL to read a received file

Returns: filename, mime_type, byte_size, download_url (1-hour presigned URL).

Parameters

Name	Required	Description	Default
`file_ids`	Yes	List of file IDs (max 20)

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries full burden. It discloses return fields: filename, mime_type, byte_size, download_url with a 1-hour expiry, which is good transparency for a metadata tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-loaded with purpose, followed by usage context and return description. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given low complexity, complete schema coverage, and no output schema, the description covers purpose, usage, parameters, and return value sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (file_ids) with 100% schema coverage; description adds no new meaning beyond what is in the schema (list of integer IDs, max 20). Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Get metadata and download URLs for files by their IDs.' The verb 'Get' and resource 'files' are specific, and the tool is distinct from sibling tools like files_get_base64 (base64 content) and files_read (content).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: after messages_read_history returns attachment_file_ids and to get a presigned download URL. No alternatives or exclusions are mentioned, but for a simple retrieval tool, this guidance is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_ingest A

Save and index a file into the knowledge base. Use this when the user asks to save, store, or remember a document. The file will be processed (OCR if needed) and indexed for future search.

Parameters

Name	Required	Description
`tags`	No	Optional list of tags for categorization (e.g., ['presentation', 'dextrade']).
`title`	No	Human-readable title for the file (e.g., 'Project Presentation', 'Q1 Report'). If not provided, uses original filename.
`file_id`	Yes	ID of the file to ingest (from attachment_file_ids in context).
`thread_id`	No	Optional thread ID to associate the file with. If not provided, uses context thread.
`description`	No	Optional description of the file contents.

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description partially discloses behaviors (OCR processing, indexing) but omits prerequisites, side effects, or limits like file size or required upload step.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences front-loading purpose and usage, with no redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks return value details, error conditions, and explicit prerequisites (e.g., file must be uploaded first). Adequate but not fully complete for a 5-parameter tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds no per-parameter meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs 'save and index' and clearly identifies the resource as 'file into the knowledge base', distinguishing it from siblings like files_upload or files_read.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('when the user asks to save, store, or remember a document'), but does not mention when not to use or provide alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_read A

Read-only, Idempotent

Read text content of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files: for IMAGES use files.get_base64; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run files.ingest first, THEN files.read. Calling on a binary MIME type returns an error, so reading this routing hint before deciding saves you a turn.

Parameters

Name	Required	Description	Default
`file_id`	Yes	ID of the file to read (from attachment_file_ids in context).
`encoding`	No	Text encoding to use (default: utf-8).	utf-8
`max_chars`	No	Maximum characters to return (default: 10000). Use smaller values for large files.
`summarize`	No	If true, generate an AI summary instead of returning raw content. Use for 'summary', 'summarize', 'краткое содержание' (Russian for 'summary') requests. OMIT to return raw content (the default).
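
The routing hint in the files_read description can be expressed as a small lookup. This is a sketch under assumptions: `route_file` is a hypothetical helper, and the mapping from MIME type to category (for example, treating `text/*` and `application/json` as readable text) is an illustration, not the server's actual classification.

```python
def route_file(mime_type: str, ingested: bool = False) -> list[str]:
    """Return the ordered tool calls suggested for a file of this MIME type.

    An empty list means none of the file tools described here applies.
    """
    # Assumed text-like types: plain text, markdown, JSON, source code.
    if mime_type.startswith("text/") or mime_type == "application/json":
        return ["files.read"]
    if mime_type.startswith("image/"):
        return ["files.get_base64"]
    if mime_type.startswith(("audio/", "video/")):
        return []  # cannot be transcribed via files.read
    # PDFs and other documents need files.ingest to extract text first.
    if ingested:
        return ["files.read"]
    return ["files.ingest", "files.read"]


print(route_file("application/pdf"))
```

Checking the MIME type first (it is available from files.info) avoids the error turn that a direct files.read call on a binary file would cost.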

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses supported file types, that PDFs return OCR text after ingest, and that images are not handled. Does not explicitly state read-only nature, but it's implied. Good coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words. Core purpose, usage, and file type info front-loaded. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read tool with 4 params and no output schema, the description covers file types, prerequisites, and alternatives. Could mention return format (raw text or summary), but not critical. Overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed param descriptions. Description adds value by linking the 'summarize' parameter to user requests for summaries, and implies encoding and max_chars usage. Good complement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool reads file contents, specifies supported file types, and distinguishes from sibling tools like files.get_base64 for images. It includes example user requests as usage cues.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use (user asks 'what is in this file?') and when not to (for images, use files.get_base64). Also notes prerequisite for PDFs (needs files.ingest first).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_upload A

Upload a file to DialogBrain and get a file_id for use in messages_send.

When to use:

User wants to send a file/image to a contact
Before calling messages_send with an attachment

Returns: file_id (integer) to pass to messages_send attachments parameter.

Parameters

Name	Required	Description	Default
`title`	No	Optional display title
`content`	No	Base64-encoded file bytes. Either content OR source_url is required.
`filename`	No	Filename with extension (e.g. 'photo.png')	upload
`mime_type`	No	MIME type (e.g. 'image/png', 'application/pdf')	application/octet-stream
`source_url`	No	Public URL to fetch file from. Either content OR source_url is required.
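
Since either `content` or `source_url` must be supplied, a caller has to pick one path and, for `content`, base64-encode the bytes. A minimal sketch follows. `build_upload_args` is a hypothetical helper, and rejecting a call that supplies both sources is a conservative assumption: the table only says that one of them is required.

```python
import base64
from typing import Any, Optional


def build_upload_args(
    data: Optional[bytes] = None,
    source_url: Optional[str] = None,
    filename: str = "upload",                      # default from the table
    mime_type: str = "application/octet-stream",   # default from the table
    title: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble files_upload arguments from raw bytes or a public URL."""
    # Assumption: exactly one source; the server may tolerate both.
    if (data is None) == (source_url is None):
        raise ValueError("provide exactly one of data or source_url")

    args: dict[str, Any] = {"filename": filename, "mime_type": mime_type}
    if data is not None:
        args["content"] = base64.b64encode(data).decode("ascii")
    else:
        args["source_url"] = source_url
    if title is not None:
        args["title"] = title
    return args


args = build_upload_args(data=b"hi", filename="note.txt", mime_type="text/plain")
print(args["content"])  # prints "aGk="
```

The returned `file_id` would then go into the `attachments` parameter of messages_send, as the description states.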

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries behavioral burden. It mentions upload and return of file_id but does not disclose limitations (e.g., file size limits, authentication needs, side effects). Adds some value but lacks depth for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Short, front-loaded description with bullet points. Every sentence adds value. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the description states that file_id is returned. Lacks info on error handling, supported MIME types, or size limits. Adequate for basic use but incomplete for edge cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 5 parameters with descriptions (100% coverage). Description does not add beyond schema; it only mentions the return file_id. Baseline 3 is appropriate since schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Upload a file to DialogBrain and get a file_id for use in messages_send.' Clearly specifies verb (upload), resource (file), and outcome (file_id). Distinguishes from siblings like files_get_base64 and files_read by focusing on upload and subsequent messaging use.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use: 'User wants to send a file/image to a contact' and 'Before calling messages_send with an attachment.' Provides clear context, though it could mention when not to use (e.g., for other file operations).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_create A

📁 Create a new inbox folder to organize threads.

When to use:

User wants to create a folder to group related conversations
User wants to organize threads by topic, project, or contact type

After creating a folder, use threads.update with folder_id to move threads into it.

Parameters

Name	Required	Description	Default
`icon`	No	Emoji icon for the folder (max 10 chars, optional)
`name`	Yes	Folder name (max 100 chars)
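
Creating a folder and filing threads into it is a two-step flow: folders_create first, then threads.update with the new `folder_id`. The sketch below builds the arguments for both steps. The helper names are hypothetical, the `thread_id` argument name for threads.update is an assumption (only `folder_id` is mentioned here), and the length checks count Unicode code points, which may differ from how the server counts emoji.

```python
from typing import Any, Optional


def build_folder_args(name: str, icon: Optional[str] = None) -> dict[str, Any]:
    """Arguments for folders_create, with the length limits from the table."""
    if not name or len(name) > 100:
        raise ValueError("name must be 1-100 characters")
    if icon is not None and len(icon) > 10:
        raise ValueError("icon must be at most 10 characters")
    args: dict[str, Any] = {"name": name}
    if icon is not None:
        args["icon"] = icon
    return args


def build_move_args(thread_id: int, folder_id: int) -> dict[str, Any]:
    """Arguments for the follow-up threads.update call (thread_id name assumed)."""
    return {"thread_id": thread_id, "folder_id": folder_id}


create_args = build_folder_args("Leads", icon="📁")
# folder_id would come from the folders_create result; 3 is a placeholder.
move_args = build_move_args(thread_id=7, folder_id=3)
```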

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It indicates a mutation (creates a resource) and implicitly mentions folder_id is returned. However, it does not disclose potential side effects, error conditions, or permissions needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: 5 lines with an emoji, verb, bullet points, and a forward-reference. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explicit return value description but implies folder_id via post-creation hint. References a related tool (threads.update). For a simple create tool, this is reasonably complete given the context of sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers both parameters with descriptions (100% coverage). The description adds no additional semantics beyond the schema, such as examples or format constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a new inbox folder to organize threads, using the verb 'create' and resource 'inbox folder'. It is distinct from siblings like folders_delete (delete) and threads_update (move threads).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear 'when to use' scenarios (user wants to create a folder, organize threads) and a post-creation hint to use threads.update. However, it does not explicitly state when not to use it or list alternatives beyond the sibling reference.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_delete A

🗑️ Delete an inbox folder. Threads inside become unfiled (not deleted).

When to use:

User wants to remove a folder they no longer need
User wants to clean up their inbox organization

Threads inside the folder are NOT deleted — they simply move back to the inbox.

Parameters

Name	Required	Description	Default
`folder_id`	Yes	ID of the folder to delete

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses key behavioral trait: threads become unfiled but are not deleted. This is essential for understanding the tool's impact.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Short, uses emoji for visual cue, two clear sentences plus bullet list. Every sentence adds value; no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 1 parameter, no output schema, and no annotations, description fully covers the tool's behavior including side effects on threads. Complete for an agent to understand usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has one parameter with description. Schema coverage is 100%, so baseline 3. Description adds no extra meaning beyond the schema; it does not specify format or validation rules.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states deletion of an inbox folder and explains consequence for threads (unfiled, not deleted). Distinguishes from siblings like folders_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides two when-to-use scenarios: removing unneeded folders and cleaning up inbox organization. Also clarifies that threads inside are not deleted.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_add A

Add a specific group to your discovery list by @username or invite link (t.me/...).

When to use:

You already know the group's @username or invite link
Adding a known group without searching

Returns: group metadata including id, title, member_count.

Parameters

Name	Required	Description	Default
`link`	Yes	The group's @username or invite link (e.g. '@phuket' or 't.me/...')
`channel`	Yes	Channel the group is on (e.g. 'telegram')

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description carries full burden. It discloses return value (group metadata including id, title, member_count) which is helpful. However, it does not specify failure modes or idempotency behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise: action, when-to-use, and returns. Every sentence adds value, and it is front-loaded with the core purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple add tool with two parameters and no output schema, the description fully covers purpose, usage context, parameter semantics, and return value. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds meaning by clarifying 'link' accepts @username or t.me/... links and 'channel' is the platform like 'telegram'. This goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action: add a specific group to your discovery list via @username or invite link. The verb 'add' and resource 'discovery list' are specific, and it distinguishes from sibling tool 'group_discovery_search' which is for searching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a 'When to use' section specifying preconditions: you already know the group's @username or invite link, and you are adding a known group without searching. This provides clear context but does not explicitly mention when not to use or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_join (A)

Join a group and start syncing its messages to your inbox. The group must be in your discovery list (use group_discovery.search or group_discovery.add first).

What this does:

Joins the group on Telegram (or other channel)
Creates a thread in your inbox for syncing messages
Optionally enables AI auto-reply drafts

Returns: success, thread_id, auto_reply_enabled.

Parameters

Name	Required	Description	Default
`group_id`	Yes	ID of the discovered group (from group_discovery.search or group_discovery.list)
`enable_auto_reply`	No	Enable AI auto-reply drafts for messages in this group. Drafts can be reviewed and sent manually. Default: true.
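As a rough illustration of how a client might invoke this tool, the sketch below wraps the two parameters in an MCP `tools/call` JSON-RPC request. The helper names (`build_tool_call`, `build_join_request`) are hypothetical, the tool name is taken from the heading above, and treating `group_id` as a string is an assumption, since the listing does not state its type.

```python
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a tool name and its arguments in an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_join_request(
    group_id: str, enable_auto_reply: Optional[bool] = None
) -> Dict[str, Any]:
    """Build a join request; omit enable_auto_reply to keep the server default."""
    if not group_id:
        raise ValueError("group_id is required")
    arguments: Dict[str, Any] = {"group_id": group_id}
    if enable_auto_reply is not None:
        # Assumption: the server accepts a plain JSON boolean here.
        arguments["enable_auto_reply"] = enable_auto_reply
    return build_tool_call("group_discovery_join", arguments)
```

Because the table lists the default as true, a caller that wants to join without AI drafts would pass `enable_auto_reply=False` explicitly rather than omitting the field.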

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full burden. It discloses key behaviors: joining the group, creating a thread, optional auto-reply. It also lists return values. However, it does not mention potential side effects like notification noise or irreversibility, which would elevate transparency further.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a prerequisite sentence and a bullet list of actions. It is concise with no unnecessary words; the only blemish is minor formatting in the bullet list (spaces after dashes). Overall, it efficiently conveys all necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 2 parameters and no output schema, the description covers inputs, actions, and return values. It explains the prerequisite and optional behavior. It could mention error scenarios (e.g., group not found or already joined) to be fully complete, but it is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so schema already describes both parameters. The description adds value by linking group_id to discovery results, explaining the effect of enable_auto_reply, and indicating return fields (thread_id, auto_reply_enabled). This goes beyond the schema's static descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'join' and resource 'group', and details the actions: joins group, creates thread, optionally enables auto-reply. It distinguishes from sibling tools by specifying the prerequisite that the group must be in the discovery list, contrasting with search/add/list tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit prerequisite guidance: 'The group must be in your discovery list (use group_discovery.search or group_discovery.add first).' This tells agents when to use this tool, though it does not explicitly state when not to use it among alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_list (A)

Read-only, Idempotent

List groups you've found and joined in this workspace.

Lifecycle values:

discovered: found but not yet evaluated
bookmarked: saved for later
monitored: joined and actively syncing messages
dismissed: hidden

By default, dismissed groups are excluded. Returns: id, title, member_count, lifecycle, scan_status, overall_score.

Parameters

Name	Required	Description
`limit`	No	Maximum number of results (1-100, default 20)
`offset`	No	Pagination offset. OMIT to start at row 0 (default).
`channel`	No	Filter by channel (e.g. 'telegram'). Optional.
`lifecycle`	No	Filter by state: discovered, bookmarked, monitored (=joined/syncing), dismissed. OMIT to include all states except dismissed, which is excluded by default.
`min_score`	No	Minimum overall score (0.0-1.0). Optional.
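The `limit` and `offset` parameters suggest ordinary offset pagination. A minimal sketch of walking every page follows, assuming a hypothetical `fetch_page` callable that sends the arguments to the tool and returns a plain list of group records (the actual response envelope is not documented in this listing).

```python
from typing import Any, Callable, Dict, Iterator, List

FetchPage = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def iter_groups(
    fetch_page: FetchPage, page_size: int = 20, **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every group by advancing `offset` until a short page is returned."""
    if not 1 <= page_size <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        arguments: Dict[str, Any] = {"limit": page_size, **filters}
        if offset:
            # The table says to omit `offset` when starting at row 0.
            arguments["offset"] = offset
        page = fetch_page(arguments)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += len(page)
```

Filters such as `lifecycle="monitored"` or `min_score=0.5` can be passed as keyword arguments and are forwarded unchanged on every page.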

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden. It explains the lifecycle states and that dismissed groups are excluded by default, and lists return fields. This gives sufficient behavioral context for a list tool, though it does not mention auth needs or rate limits, which are less critical here.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, starting with the main action and then providing lifecycle details and defaults. Each sentence adds necessary information without redundancy. It is efficiently structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool, the description covers all important aspects: what it lists, return fields, lifecycle filters, default exclusion, and pagination parameters are implied by schema. No gaps are evident given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the meaning of lifecycle states (e.g., 'discovered: found but not yet evaluated') and the default exclusion of dismissed, which goes beyond the schema's enum list. This enriches parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists groups the user has found and joined in the workspace. It specifies the verb 'list' and the resource, and contrasts with sibling tools like group_discovery_search by implying a broad listing vs filtering. The returned fields are listed, providing clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus sibling tools like group_discovery_search or group_discovery_scan. The description only states what it does, but does not exclude cases where other tools might be more appropriate, leaving the agent without decision support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_preview_messages (A)

Read-only, Idempotent

Read recent public messages from a group without joining it. Only works for groups where can_preview_history=true.

Use this to manually evaluate message quality before deciding to join. For an automated quality score, use group_discovery.scan instead.

Returns: list of recent messages with sender, text, date, is_reply.

Parameters

Name	Required	Description	Default
`limit`	No	Number of recent messages to fetch (1-100, default 20)
`group_id`	Yes	ID of the discovered group (from group_discovery.search or group_discovery.list)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses the precondition (can_preview_history=true) and the return fields. However, it does not specify behavior when the condition fails (e.g., error response), which is a minor gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three terse sentences: action, usage guidance with alternative, return format. No redundant text, all sentences earn their place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a two-parameter read tool with no output schema, the description covers purpose, precondition, return format, and sibling differentiation. It lacks explicit mention of error handling or message ordering, but remains largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for both parameters. The description adds no new semantic information beyond what the schema already provides, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Read recent public messages from a group without joining it' with specific verb and resource. It distinguishes itself from the sibling tool group_discovery.scan by specifying manual evaluation vs automated scoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use ('manually evaluate message quality before deciding to join') and provides an alternative ('For an automated quality score, use group_discovery.scan instead'). This gives clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_scan (A)

Scan a group to evaluate its quality before joining. Fetches recent messages, analyzes activity, spam, and engagement, then returns a quality score and plain-English verdict.

When to use:

After finding groups with group_discovery.search
Before deciding which groups to join

Returns: overall_score (0-1), is_disqualified, disqualify_reasons, individual scores, and a verdict string.

Parameters

Name	Required	Description	Default
`group_id`	Yes	ID of the discovered group (from group_discovery.search or group_discovery.list)
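One way an agent might act on the returned fields is sketched below. The field names come from the description above; the 0.6 threshold, the assumption that `disqualify_reasons` is a list of strings, and the `should_join` helper itself are illustrative choices, not part of the tool.

```python
from typing import Any, Dict, Tuple


def should_join(scan: Dict[str, Any], min_score: float = 0.6) -> Tuple[bool, str]:
    """Turn a scan result into a (decision, explanation) pair."""
    if scan.get("is_disqualified"):
        reasons = scan.get("disqualify_reasons") or ["no reason given"]
        return False, "disqualified: " + "; ".join(str(r) for r in reasons)
    score = float(scan.get("overall_score", 0.0))
    if score < min_score:
        return False, f"score {score:.2f} below threshold {min_score:.2f}"
    # Pass the server's plain-English verdict through as the explanation.
    return True, str(scan.get("verdict", ""))
```

Checking `is_disqualified` before the score keeps a high-scoring but disqualified group from slipping through.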

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description fully carries burden. It discloses that the tool fetches recent messages, analyzes activity/spam/engagement, and returns scores and a verdict. No destructive side effects are indicated, and no contradiction with missing annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four sentences plus bullet points. It front-loads the main action and every sentence adds value. No redundant or missing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description lists return fields (overall_score, is_disqualified, etc.), providing completeness. It covers purpose, usage, parameters, and returns, making it sufficient for an agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has one parameter with 100% coverage description. The description adds value by specifying that group_id comes from group_discovery.search or group_discovery.list, linking it to the workflow. This is helpful beyond the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Scan a group to evaluate its quality before joining.' It uses a specific verb (scan, evaluate) and resource (group). It distinguishes from siblings by specifying it's for evaluation before joining, as opposed to search, list, add, join, or preview tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'When to use' section provides clear context: after group_discovery.search and before deciding to join. This guides the agent on workflow placement and alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_discovery_search (A)

Search for public groups or channels by topic on Telegram (or other channels). Returns matching groups with title, member count, and whether messages can be previewed.

When to use:

Finding groups related to a topic or niche
Building a list of groups for outreach or monitoring

After searching, use group_discovery.scan to evaluate quality before joining.

Parameters

Name	Required	Description
`limit`	No	Maximum number of results to return (1-50, default 20)
`channel`	Yes	Channel to search on (e.g. 'telegram')
`keywords`	Yes	Search keywords or phrase (e.g. 'crypto trading signals')
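The search-then-scan workflow described above could be chained roughly as follows. `call_tool` is a hypothetical callable that invokes a tool by name and returns its decoded result; the sketch also assumes that search returns a list of records carrying an `id` field and that scan returns `overall_score` and `is_disqualified` as described for that tool.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def rank_candidates(
    call_tool: ToolCaller, channel: str, keywords: str, limit: int = 20
) -> List[Dict[str, Any]]:
    """Search for groups, scan each one, and return survivors sorted by score."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    groups = call_tool(
        "group_discovery_search",
        {"channel": channel, "keywords": keywords, "limit": limit},
    )
    ranked: List[Dict[str, Any]] = []
    for group in groups:
        scan = call_tool("group_discovery_scan", {"group_id": group["id"]})
        if scan.get("is_disqualified"):
            continue  # drop groups the scan has ruled out
        ranked.append(
            {
                "id": group["id"],
                "title": group.get("title"),
                "overall_score": scan["overall_score"],
            }
        )
    ranked.sort(key=lambda g: g["overall_score"], reverse=True)
    return ranked
```

Each scan fetches and analyzes recent messages, so a small `limit` keeps the number of follow-up calls manageable.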

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations present, so description carries full burden. It does not disclose behavioral details like rate limits, authentication, scope (only public groups), error handling, or performance characteristics. The description is minimal on behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise with a clear structure: action, output, when to use, next steps. No redundant sentences. Front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 3 parameters, no output schema, and many sibling tools, the description covers purpose, usage, and return fields adequately. Lacks behavioral details but is otherwise complete for a search tool. Mentioning output fields compensates for missing output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions already clear. The description adds no additional semantics beyond restating the purpose. Baseline 3 appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches for public groups/channels by topic and specifies the return fields (title, member count, previewability). It distinguishes from sibling group_discovery tools by mentioning using scan after search, but could more explicitly differentiate from other search tools like web_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a 'When to use' section with two concrete use cases and suggests using group_discovery.scan after searching. No explicit when-not-to-use or comparison to siblings, but the guidelines are helpful for typical scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

images_generate (A)

Generates a PNG image from a text prompt using Gemini 2.5 Flash Image. Returns a file_id consumable by messages.send(attachments=[...]) and other file-aware tools. Supports up to 3 reference image file_ids for subject-consistent edits and composition (use file IDs from the [ATTACHMENTS] block, files.search, or search.files). Latency: ~8-10s per image. Output: 1024×1024 PNG.

Parameters

Name	Required	Description	Default
`prompt`	Yes	Text description of the image to generate (3-4000 chars).
`aspect_ratio`	No	Output aspect ratio.	1:1
`reference_file_ids`	No	Optional list of up to 3 file_ids whose images should be used as visual references (for edits, subject consistency, or composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif).
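The constraints in the parameter table can be checked on the client before spending 8-10 seconds on a call. The sketch below is one possible pre-flight check; `build_generate_arguments` is a hypothetical helper, and it assumes the caller already knows each reference file's MIME type (passed as a `file_id -> mime` mapping). Valid `aspect_ratio` values other than the default are not listed here, so the value is passed through unchecked.

```python
from typing import Any, Dict, Optional

ALLOWED_REFERENCE_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}
MAX_REFERENCE_FILES = 3  # limit stated in the reference_file_ids row


def build_generate_arguments(
    prompt: str,
    reference_files: Optional[Dict[str, str]] = None,
    aspect_ratio: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs against the documented limits and build the arguments."""
    if not 3 <= len(prompt) <= 4000:
        raise ValueError("prompt must be 3-4000 characters")
    arguments: Dict[str, Any] = {"prompt": prompt}
    if reference_files:
        if len(reference_files) > MAX_REFERENCE_FILES:
            raise ValueError(f"at most {MAX_REFERENCE_FILES} reference files")
        for file_id, mime in reference_files.items():
            if mime not in ALLOWED_REFERENCE_MIME_TYPES:
                raise ValueError(f"{file_id}: unsupported MIME type {mime}")
        arguments["reference_file_ids"] = list(reference_files)
    if aspect_ratio is not None:
        arguments["aspect_ratio"] = aspect_ratio  # omitted means the 1:1 default
    return arguments
```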

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden for behavioral disclosure. It discloses latency (~8-10s per image) and output dimensions (1024x1024 PNG), which is helpful, but does not mention potential rate limits, content safety, or error handling. For a generative tool, this is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each carrying distinct information: main action and output, reference image support, and performance characteristics. It is front-loaded with the primary purpose and avoids any redundant or wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description covers the core behavior, input constraints (implied by schema), reference image limits, latency, and output format. It could be more explicit about the full response structure (e.g., is there a status? any metadata?) but is otherwise complete for an image generation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters have descriptions. The description adds value beyond the schema by explaining the purpose of reference_file_ids (subject-consistent edits and composition) and the latency implications. However, it does not add new details for prompt or aspect_ratio beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a PNG image from a text prompt using Gemini 2.5 Flash Image, specifies the return format (file_id), and distinguishes from sibling tools like images_search which search for existing images rather than generate new ones.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool (generating images from text prompts, with optional reference images) and how the output integrates with other tools (messages.send). However, it does not explicitly contrast with alternatives like images_search or mention when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

images_search (A)

Read-only, Idempotent

Searches images in this workspace by visual content using vector embeddings (Voyage multimodal-3). Pass a text description; returns ranked file_ids with cosine scores and presigned download URLs. Up to 50 results.

Parameters

Name	Required	Description
`limit`	No	Max number of results.
`query`	Yes	Text description of what you're looking for (3-4000 chars).
`mime_type`	No	Optional — restrict to a specific image MIME (e.g. "image/png"). Filter is applied after RAG (same caveat as collection_id).
`collection_id`	No	Optional — restrict to images attached to this collection. Filter is applied after RAG, so you may get fewer than `limit` results; pass a larger limit to broaden if needed.
`score_threshold`	No	Minimum cosine similarity (0.0 returns all, higher = stricter).
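Because `mime_type` and `collection_id` are applied after retrieval, a filtered query can return fewer rows than requested. A minimal sketch of the "pass a larger limit" advice follows; the `search` callable, the `search_images_filtered` helper, and the 3x over-fetch factor are assumptions made for illustration, and the cap of 50 comes from the tool description.

```python
from typing import Any, Callable, Dict, List, Optional

MAX_RESULTS = 50  # "Up to 50 results" per the tool description


def search_images_filtered(
    search: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
    query: str,
    wanted: int,
    collection_id: Optional[str] = None,
    mime_type: Optional[str] = None,
    overfetch: int = 3,
) -> List[Dict[str, Any]]:
    """Request extra rows when a post-retrieval filter is active, then trim."""
    if not 3 <= len(query) <= 4000:
        raise ValueError("query must be 3-4000 characters")
    if not 1 <= wanted <= MAX_RESULTS:
        raise ValueError(f"wanted must be between 1 and {MAX_RESULTS}")
    arguments: Dict[str, Any] = {"query": query, "limit": wanted}
    if collection_id is not None:
        arguments["collection_id"] = collection_id
    if mime_type is not None:
        arguments["mime_type"] = mime_type
    if collection_id is not None or mime_type is not None:
        # Filters run after retrieval, so ask for more than we need.
        arguments["limit"] = min(wanted * overfetch, MAX_RESULTS)
    return list(search(arguments))[:wanted]
```

The results are assumed to arrive ranked by cosine score, so trimming to the first `wanted` rows keeps the best matches.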

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description must disclose behavior. It explains the search mechanism, output format, and result limit (up to 50). It implies a read-only operation but does not explicitly state no side effects or permissions required. The disclosure is adequate for a search tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (three short sentences, roughly 30 words) and front-loaded with key information. Every word adds value; no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and no annotations, the description covers the core functionality, output format, and limit. It omits details like URL expiration or post-processing steps, but the schema descriptions handle parameter specifics. Overall, it is complete enough for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds no new parameter-specific info beyond what the schema already provides (e.g., limit, query, mime_type, collection_id, score_threshold). The mention of 'up to 50 results' and 'vector embeddings' is context about the tool, not parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches images by visual content using vector embeddings, specifies the model (Voyage multimodal-3), and outlines the output (ranked file_ids, cosine scores, presigned download URLs). It is distinct from sibling tools like workspace_search or web_search by focusing on image content search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: pass a text description to find images. It implicitly advises when to use (when needing image search by content) but lacks explicit guidance on when not to use or alternatives. However, the context is sufficient for a typical use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

instagram_list_media (A)

Read-only, Idempotent

List photos and Reels on the connected Instagram Business/Creator account. Returns id, caption, media_type, permalink, thumbnail_url, timestamp.

Parameters

Name	Required	Description	Default
`after`	No	Pagination cursor from a previous call's next_cursor.
`limit`	No	Page size, 1-50. Default 25.
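The `after` parameter implies cursor-based pagination. A rough sketch of following the cursor is shown below; `fetch_page` is a hypothetical callable, and the response shape (a dict with an `items` list next to `next_cursor`) is an assumption, since only `next_cursor` is named in the table.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_media(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 25,
    max_pages: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield media items page by page until no next_cursor is returned."""
    if not 1 <= page_size <= 50:
        raise ValueError("limit must be between 1 and 50")
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop in case a cursor never runs out
        arguments: Dict[str, Any] = {"limit": page_size}
        if cursor:
            arguments["after"] = cursor
        response = fetch_page(arguments)
        yield from response.get("items", [])
        cursor = response.get("next_cursor")
        if not cursor:
            return
```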

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive behavior. Description adds return field list but no additional behavioral details beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence plus a short list of return fields; no extraneous content, highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a list tool without output schema, description adequately covers returned fields and account scope. No additional information needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions ('after', 'limit'). Description does not add further meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List photos and Reels on the connected Instagram Business/Creator account' with specific verb and resource, and distinguishes from sibling tools like instagram_publish_media and instagram_update_media.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear context for when to use the tool (listing media on Instagram) but does not explicitly exclude alternative tools or specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

instagram_publish_media (A)

Publish a photo (IMAGE) or video (REELS) from workspace files to a connected Instagram Business/Creator account. Returns media_id + permalink. Instagram allows ~25 publishes per day.

Parameters

Name	Required	Description	Default
`caption`	No	Post caption (max 2200 chars). OMIT to publish without caption.
`file_id`	Yes	Workspace files.id of the photo or video to publish.
`media_type`	No	'auto' (default, detects from mime), 'image', or 'reels'.	auto
`location_id`	No	Facebook Place ID for location tag.
`share_to_feed`	No	For Reels: also show on profile grid (default true).
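Since the description does not say what happens when the daily limit is exceeded, a client may want to guard against it locally. The sketch below validates the documented caption and media-type constraints and estimates the remaining quota; both helpers are hypothetical, and treating "per day" as a rolling 24-hour window with a hard cap of 25 is an assumption.

```python
from typing import Any, Dict, Iterable, Optional

MAX_CAPTION_CHARS = 2200
MEDIA_TYPES = {"auto", "image", "reels"}
DAILY_PUBLISH_LIMIT = 25  # the description says "~25", so treat as approximate
DAY_SECONDS = 24 * 60 * 60


def build_publish_arguments(
    file_id: str, caption: Optional[str] = None, media_type: str = "auto"
) -> Dict[str, Any]:
    """Validate the documented constraints and build the tool arguments."""
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"media_type must be one of {sorted(MEDIA_TYPES)}")
    if caption is not None and len(caption) > MAX_CAPTION_CHARS:
        raise ValueError(f"caption exceeds {MAX_CAPTION_CHARS} characters")
    arguments: Dict[str, Any] = {"file_id": file_id}
    if caption is not None:
        arguments["caption"] = caption  # omitted means publish without caption
    if media_type != "auto":
        arguments["media_type"] = media_type
    return arguments


def remaining_publishes(publish_times: Iterable[float], now: float) -> int:
    """Count publishes in the last 24 hours and return what is left."""
    recent = sum(1 for t in publish_times if 0 <= now - t < DAY_SECONDS)
    return max(0, DAILY_PUBLISH_LIMIT - recent)
```

A caller would record a Unix timestamp after each successful publish and skip the call when `remaining_publishes` returns 0.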

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (readOnlyHint false, destructiveHint false), so the description carries the burden. It adds that the tool returns media_id and permalink and notes the daily limit, but omits failure scenarios, permission requirements (connected account), or what happens on rate limit exceedance. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three succinct sentences with no redundant information. The first states the purpose, the second the return values, and the third adds a key constraint (the daily publish limit). Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters and no output schema, the description covers basic purpose and a rate limit, but lacks details on prerequisites (connected Instagram account), error handling, or behavior when limits are hit. With sparse annotations, more context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to elaborate on parameters. It confirms the source as 'workspace files' but adds no additional meaning beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool publishes a photo or video to Instagram, specifies the source (workspace files), and lists return values (media_id, permalink). It distinguishes between IMAGE and REELS, and the sibling tools (instagram_list_media, instagram_update_media) show it is distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a daily rate limit ('~25 publishes per day'), which provides some usage context, but it does not specify when to use this tool versus alternatives like instagram_list_media or instagram_update_media. No when-not-to-use or prerequisite guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

instagram_update_media (A)

Update the caption of a published Instagram photo or Reel. Only caption is editable after publish (Instagram limitation).

Parameters

Name	Required	Description	Default
`caption`	Yes	New caption (max 2200 chars).
`media_id`	Yes	Instagram media ID (from list_media or thread metadata).

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-read, non-destructive write operation. The description adds valuable context: 'Only caption is editable after publish (Instagram limitation)', disclosing a key behavioral trait beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no unnecessary words, front-loaded with the primary action. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the operation is simple. The description covers the essential behavioral constraint. Could mention error conditions (e.g., if media not published), but overall sufficient for the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description adds 'from list_media or thread metadata' for media_id, but the schema already provides sufficient meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Update the caption' and the resource 'published Instagram photo or Reel', distinguishing it from other tools like instagram_publish_media. It mentions the Instagram limitation, adding specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (after publish, for caption updates) and notes the Instagram limitation. It lacks explicit when-not or alternative suggestions, but the context of sibling tools provides some differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_complete (A)

Mark the job as completed. This sanitizes PII from the context and records a completion summary. Use when all tasks in the job are done.

Parameters

Name	Required	Description	Default
`job_id`	No	The ID of the job to complete
`summary`	No	Brief summary of what was accomplished

Tool Definition Quality

A 3.7/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden of behavioral disclosure. It mentions sanitizing PII and recording a summary, which are key side effects. However, it does not specify whether the action is reversible, any authorization requirements, or rate limits, leaving gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences totaling 25 words. The first states the core function, the second discloses the key side effects, and the third gives a usage condition. No filler or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple, with 2 optional parameters and no output schema. The description covers purpose and when to use it, but lacks information on the output/return value, on what happens when `job_id` is omitted (it may be logically required even though the schema marks it optional), and on integration with related tools (e.g., `job_read_context`). Given no annotations, the description could be more thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage for both parameters (`job_id` and `summary`), so baseline is 3. The description adds minimal extra meaning (e.g., 'completion summary' relates to `summary`), but does not provide details beyond what the schema already states.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (mark job completed) and resource (job), and adds specific details about sanitizing PII and recording a summary. However, it does not explicitly differentiate from sibling tools like `agents_task_complete` or `job_escalate`, which may cause confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('Use when all tasks in the job are done'), providing clear context. However, it does not mention when NOT to use it or list alternative tools such as `job_escalate` for partial completion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_escalate (A)

Escalate the job to a human. Use when you cannot resolve an issue, someone is not responding, or a situation requires human judgment.

Parameters

Name	Required	Description	Default
`job_id`	No	The ID of the job to escalate
`reason`	Yes	Why escalation is needed

Tool Definition Quality

A 4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the action of escalating to a human, but lacks details on side effects (e.g., job state changes, notifications). With no annotations provided, more transparency would be beneficial, but the description is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff, front-loaded with the action. Every word adds value, making it highly concise and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides enough context for a simple escalation tool, explaining when to use it. Lacks details on return value or confirmation, but that is acceptable given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers both parameters with clear descriptions, and the description adds high-level context for the 'reason' parameter. Baseline 3 is appropriate given full schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Escalate' and the resource 'job', and distinguishes from sibling tools like job_complete and job_update_context by indicating a specific action for human involvement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use the tool (cannot resolve, no response, human judgment needed). Does not mention alternatives or when not to use, but the context is clear for a simple escalation action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_read_context (A)

Read the current job context. Returns the full state of your active job including assignments, escalations, and any data you previously stored.

Parameters

Name	Required	Description	Default
`job_id`	No	The ID of the job to read

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Describes return content but does not clarify behavior when job_id is omitted (optional per schema) or authentication needs. Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with action, no redundant information. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains return value (full state). Missing detail on optional job_id behavior if omitted, but overall sufficient for a simple read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers the single parameter (100%) with the description 'The ID of the job to read'. The tool description adds no meaning beyond the schema, so the baseline of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb 'Read' and resource 'job context', specifies return value (full state including assignments, escalations, stored data). Distinguishes from siblings like job_update_context and job_complete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage for reading context but lacks explicit guidance on when to use vs alternatives, or prerequisites like job activation. No mention of when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_update_context (A)

Update the job context by merging new data. Existing keys are preserved unless explicitly overwritten. Use this to record progress, update assignment statuses, or store intermediate results.

Parameters

Name	Required	Description	Default
`job_id`	No	The ID of the job to update
`updates`	Yes	Key-value pairs to merge into job context
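The merge rule stated in the description ("Existing keys are preserved unless explicitly overwritten") can be illustrated with a small sketch. It assumes a shallow, top-level merge; the listing does not say whether nested objects are merged recursively, and `merge_context` is a hypothetical name used only for illustration.

```python
def merge_context(context: dict, updates: dict) -> dict:
    """Return a new context with `updates` merged over `context`.

    Assumption: the merge is shallow, so a nested value in `updates`
    replaces the whole nested value in `context`.
    """
    merged = dict(context)   # copy so the original context is not mutated
    merged.update(updates)   # keys in `updates` overwrite, all others are preserved
    return merged


context = {"status": "in_progress", "contacted": ["alice"]}
updates = {"status": "waiting_reply", "last_step": "sent reminder"}
print(merge_context(context, updates))
```

Under this assumption, appending to a list such as `contacted` requires reading the context first (via `job_read_context`) and writing back the full updated list.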

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the merge behavior: 'Existing keys are preserved unless explicitly overwritten.' This is a key behavioral trait beyond the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, each adding value. The first states the action, the second explains the merge behavior, and the third gives usage examples. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple merge tool with no output schema and no annotations, the description covers the essential purpose and behavior. It could mention that it's a mutation (implied) or error conditions, but it's adequately complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for both parameters. The description adds the merge semantics, explaining how the 'updates' object interacts with existing context, which goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Update the job context by merging new data.' The verb 'update' and resource 'job context' are explicit. Distinguishes from siblings like job_read_context and job_complete by focusing on context updates.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases: 'record progress, update assignment statuses, or store intermediate results.' While it doesn't explicitly state when not to use it or mention alternatives, the context is clear enough for most usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_find_entity (A)

Read-only, Idempotent

Find an entity by name in the Knowledge Graph.

USE WHEN user mentions a person, project, company by name and you need:

To resolve a name to entity_id for subsequent queries
'Кто работает над X?' ("Who is working on X?") → find X first
'Расскажи про Y' ("Tell me about Y") → find Y first

RETURNS entity_id for use in kg.get_relationships or kg.explore. ALWAYS use this as the FIRST step in KG query chains.

Parameters

Name	Required	Description
`name`	Yes	Entity name to search for. Can be in any language (Russian, English, etc.) - transliteration is automatic.
`limit`	No	Maximum results to return (1-10). Default: 5
`entity_type`	No	Filter by entity type: 'person' (people, contacts); 'project' (projects, tasks); 'organization' (companies, teams); 'event' (meetings, deadlines); 'topic' (discussion topics); 'workspace' (user's own facts, i.e. my/our company). OMIT to include all entity types.
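MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below shows what such a request for `kg_find_entity` might look like, with the table's constraints checked client-side. The helper names `find_entity_args` and `tools_call_request` are hypothetical, and the entity name "Acme" is a placeholder.

```python
import json
from typing import Optional

ENTITY_TYPES = ("person", "project", "organization", "event", "topic", "workspace")


def find_entity_args(name: str, limit: int = 5, entity_type: Optional[str] = None) -> dict:
    """Build arguments for kg_find_entity, enforcing the documented ranges."""
    if not name.strip():
        raise ValueError("name is required")
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    args = {"name": name, "limit": limit}
    if entity_type is not None:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"entity_type must be one of {ENTITY_TYPES}")
        args["entity_type"] = entity_type  # omitted entirely to search all types
    return args


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(1, "kg_find_entity", find_entity_args("Acme", entity_type="organization"))
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this envelope, so only the `arguments` object usually needs to be constructed by hand.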

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Despite no annotations, the description implies a read-only lookup. It adds useful behavioral details like automatic transliteration and language support, but does not explicitly state the operation is non-destructive or provide error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: three short, front-loaded paragraphs. Every sentence has a job: stating the purpose, giving usage guidelines, naming the return value, or instructing on chaining. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is simple; description covers name resolution, return value (entity_id), and chaining behavior. Lacks details on pagination or error handling, but is sufficient given the tool's straightforward nature and the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description does not add significant value beyond what is already in the schema. It repeats the language flexibility for name but does not further clarify parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Find an entity by name in the Knowledge Graph' and explicitly positions it as the first step in KG queries, distinguishing it from siblings like kg_get_relationships and knowledge_query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use scenarios (e.g., user mentions a person/project/company), example queries with translation, and directs to use entity_id in subsequent tools. States 'ALWAYS use this as the FIRST step.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_get_relationships (A)

Read-only, Idempotent

Get relationships for a specific entity from Knowledge Graph.

USE WHEN:

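'Кто работает над X?' ("Who is working on X?") - filter by works_on
'С кем общался Y?' ("Who did Y talk to?") - filter by discussed_with
'Кто из компании Z?' ("Who is from company Z?") - filter by member_of
'Что связано с W?' ("What is related to W?") - no filter, get all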

REQUIRES: entity_id from previous kg.find_entity step. Use: {{step_N.entity_id}} where N is the find_entity step number.

Parameters

Name	Required	Description	Default
`limit`	No	Maximum relationships to return (1-50). Default: 20
`direction`	No	Relationship direction: 'outgoing' (Entity → Others); 'incoming' (Others → Entity); 'both' (all relationships, default).	both
`entity_id`	Yes	Entity ID from kg.find_entity step. Use {{step_N.entity_id}} reference.
`relation_types`	No	Filter by relationship types (optional). People: works_on, works_for, member_of, manages, knows, client_of, provides_service. Communication: discussed_with, participated_in, mentioned_in. Org/Project: developed_by, funded_by, partnered_with, integrates_with, depends_on, part_of. Document: issued_by, issued_to, signed_by, authored_by. Other: uses, located_in, about, follows, owns, related_to.
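The `{{step_N.entity_id}}` reference syntax implies that a planner substitutes values from earlier steps before the call is made. How DialogBrain resolves these references is not documented in this listing; the sketch below is a hypothetical client-side emulation, and the function name `resolve_placeholders` and the sample ID `ent_42` are assumptions.

```python
import re

# Matches references such as {{step_1.entity_id}} (optional inner whitespace).
PLACEHOLDER = re.compile(r"\{\{\s*step_(\d+)\.(\w+)\s*\}\}")


def resolve_placeholders(value, step_results: dict):
    """Recursively replace step references using results keyed by step number."""
    if isinstance(value, str):
        whole = PLACEHOLDER.fullmatch(value)
        if whole:
            # A value that is only a reference keeps the referenced value's type.
            return step_results[int(whole.group(1))][whole.group(2)]
        # References embedded in longer text are substituted as strings.
        return PLACEHOLDER.sub(
            lambda m: str(step_results[int(m.group(1))][m.group(2)]), value
        )
    if isinstance(value, dict):
        return {k: resolve_placeholders(v, step_results) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_placeholders(v, step_results) for v in value]
    return value


# Step 1 is assumed to be a kg_find_entity call that returned an entity_id.
step_results = {1: {"entity_id": "ent_42"}}
args = {
    "entity_id": "{{step_1.entity_id}}",
    "relation_types": ["works_on"],
    "direction": "both",
}
print(resolve_placeholders(args, step_results))
```

A reference to a step or field that does not exist raises `KeyError` here, which surfaces a broken chain before the tool is called.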

Tool Definition Quality

A 4.2/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It implies a read-only query but does not explicitly state that it does not modify data. The parameter descriptions (limit, direction, relation_types) add some behavioral context, but overall transparency is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and well-structured with clear headings (USE WHEN, REQUIRES). Every sentence adds value, and the format is easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 4 parameters and no output schema, the description covers usage context, parameter semantics, and prerequisites. It does not describe the return format, but this is not critical for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds significant semantic context beyond the schema, especially for relation_types, by linking them to user intents (e.g., 'Кто работает над X?', i.e. "Who is working on X?", maps to works_on). It also reinforces the entity_id requirement with a concrete reference syntax.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get relationships for a specific entity from Knowledge Graph' and provides specific use case examples with Russian phrases and filter types. It effectively distinguishes itself from sibling tools like kg_find_entity by focusing on relationships retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'USE WHEN' section with concrete scenarios and explicit requirement for entity_id from kg.find_entity. However, it does not mention when not to use this tool or provide alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_query (A)

Read-only, Idempotent

Answer questions using knowledge base (uploaded documents, handbooks, files).

Use for QUESTIONS that need an answer synthesized from documents or messages. Returns an evidence pack with source citations, KG entities, and extracted numbers.

Modes:

'auto' (default): Smart routing — works for most questions
'rag': Semantic search across documents & messages
'entity': Entity-centric queries (e.g., 'Tell me about [entity]')
'relationship': Two-entity queries (e.g., 'How is [entity A] related to [entity B]?')

Examples:

'What did we discuss about the budget?' → knowledge.query
'Tell me about [entity]' → knowledge.query mode=entity
'How is [A] related to [B]?' → knowledge.query mode=relationship

NOT for finding/listing files, threads, or links — use search.files / search.threads / search.links for that.

Parameters

Name	Required	Description
`date_to`	No	Filter messages until this date (ISO format: YYYY-MM-DD).
`file_ids`	No	Specific file IDs to search within (for pinned files)
`question`	Yes	The question to answer from user's knowledge base. Required even for entity queries.
`date_from`	No	Filter messages from this date (ISO format: YYYY-MM-DD). Use for time-based queries like 'this week', 'last month'.
`thread_id`	No	Limit search to a specific thread/chat
`max_sources`	No	Maximum number of sources to consider (1-10)
`needs_aggregation`	No	True if query asks for totals/sums/counts.
`include_relationships`	No	Include KG relationships in answer (default: true for entity mode)
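The `date_from` and `date_to` rows expect ISO dates (YYYY-MM-DD) for time-based questions such as "this week". The sketch below turns a relative window into those values and assembles an arguments object from the parameters listed above. The helper names are hypothetical, and the `mode` setting mentioned in the description is left out because it does not appear in the table.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def last_n_days(today: date, days: int) -> Tuple[str, str]:
    """Return (date_from, date_to) as ISO strings covering the last `days` days."""
    start = today - timedelta(days=days)
    return start.isoformat(), today.isoformat()


def build_query_args(
    question: str,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    thread_id: Optional[str] = None,
    max_sources: Optional[int] = None,
    needs_aggregation: Optional[bool] = None,
) -> dict:
    """Build knowledge_query arguments, omitting any optional value left unset."""
    if not question.strip():
        raise ValueError("question is required, even for entity queries")
    if max_sources is not None and not 1 <= max_sources <= 10:
        raise ValueError("max_sources must be between 1 and 10")
    optional = {
        "date_from": date_from,
        "date_to": date_to,
        "thread_id": thread_id,
        "max_sources": max_sources,
        "needs_aggregation": needs_aggregation,
    }
    args = {"question": question}
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


start, end = last_n_days(date.today(), 7)
print(build_query_args("What did we discuss about the budget?", date_from=start, date_to=end))
```

Treating `thread_id` as a string is an assumption; the listing does not give its type.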

Tool Definition Quality

A 4.7/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Describes return format: 'evidence pack with source citations, KG entities, and extracted numbers.' Explains modes and their behavior. Does not disclose any potentially destructive actions, but being a query tool, that is expected.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: purpose, usage, return info, modes, examples, exclusions. Front-loaded with key information. No redundant sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (15 parameters, no output schema), the description provides sufficient context including examples, mode explanations, and an explicit exclusion. It enables an agent to correctly select and invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for each parameter. The description adds value by explaining modes with examples and providing context on when to use each parameter, such as date_from for time-based queries.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Answer questions using knowledge base'. It distinguishes itself from the sibling search tools by explicitly stating that it is 'NOT for finding/listing files, threads, or links' and directing the agent to 'use search.files / search.threads / search.links for that.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage guidance: 'Use for QUESTIONS that need an answer synthesized from documents or messages.' Also specifies when not to use it ('NOT for finding/listing files, threads, or links') and directs to alternatives: 'use search.files / search.threads / search.links for that.' Examples illustrate appropriate use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_add_comment (A)

Add a comment to a LinkedIn post. Use post_id from search results or thread data.

Parameters

Name	Required	Description	Default
`text`	Yes	Comment text to post
`post_id`	Yes	LinkedIn post/activity ID (from search results or thread metadata)

Tool Definition Quality

A 3.8/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It only states 'Add a comment', which implies a write operation, but does not mention side effects, rate limits, authentication needs, or error conditions. The lack of behavioral context is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of two concise sentences, front-loaded with the purpose. Every word adds value, with no unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with two simple parameters and no output schema, the description is mostly complete. It explains the action and provides a source for the ID. It could mention potential failure or return behavior, but the overall context is adequate for a straightforward operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters have descriptive schemas (100% coverage). The description adds context for post_id by specifying its source, but does not elaborate on text constraints or formatting. This adds marginal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Add' and the resource 'comment to a LinkedIn post'. It is distinct from sibling tools, which include other LinkedIn actions but no other comment-adding tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a hint on where to obtain the post_id ('from search results or thread data'), which aids usage. However, it does not explicitly state when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_companyA

Read-onlyIdempotent

Inspect

Get a LinkedIn company profile by company ID or vanity name. Returns company name, description, industry, size, and other details.

ParametersJSON Schema

Name	Required	Description	Default
`identifier`	Yes	Company ID or vanity name

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It only lists returned fields but does not disclose any behavioral traits such as authentication requirements, rate limits, or what happens if the identifier is invalid. The description is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first states the purpose and method, the second lists return fields. No extraneous words, front-loaded, and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get tool with one parameter and no output schema, the description is adequate but leaves 'other details' vague. It does not specify whether all company details are returned or only a subset.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'identifier' is described in the schema as 'Company ID or vanity name', and the tool description repeats this. Since schema coverage is 100%, the description adds no new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Get' and resource 'company profile', clearly identifies the two ways to specify the company (ID or vanity name), and distinguishes it from sibling tools like linkedin_get_profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it (when company details are needed) but provides no explicit guidance on alternatives or when not to use it. For a simple tool this is acceptable but not exemplary.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_profileA

Read-onlyIdempotent

Inspect

Get a LinkedIn user profile by ID, public identifier (vanity name), or profile URL. Returns name, headline, location, and other profile information.

ParametersJSON Schema

Name	Required	Description	Default
`identifier`	Yes	LinkedIn member ID, public identifier (vanity name), or full profile URL

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility for behavioral transparency. It indicates a read operation (Get a profile) and lists returned fields, but does not disclose potential rate limits, authentication requirements, or the scope of data (e.g., public vs. private profiles). This is a moderate gap for a typical API tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, directly stating the purpose and key return fields. No extraneous information or repetition. Efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool (one parameter, no output schema), the description adequately covers what the tool does and what it returns. However, it could mention edge cases (e.g., profile not found) or the format of the return data more explicitly. Overall, it is sufficiently complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema's description for 'identifier' already covers the same information as the tool description (ID, vanity name, URL). Since schema coverage is 100%, the description adds no new semantic value beyond what the schema provides. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a LinkedIn user profile using an ID, public identifier, or URL, and lists key returned fields. It distinguishes itself from siblings like linkedin_get_company or linkedin_search by focusing on a single user profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains how to identify the profile (by ID, vanity name, or URL) but does not provide guidance on when to use this tool over alternatives like linkedin_search, which might also find profiles. No explicit 'when not to use' or prerequisites are mentioned, but for a simple get operation it is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_inviteA

Inspect

Send a connection invitation to a LinkedIn user. Optionally include a personalized message (max 300 characters). Rate limited: LinkedIn allows 80-100 invitations per day, max 200 per week.

ParametersJSON Schema

Name	Required	Description	Default
`message`	No	Optional personalized invitation message (max 300 characters)
`provider_id`	Yes	LinkedIn provider ID of the person to invite (from search results or profile)
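
The limits quoted in the description (300-character message, 80-100 invitations per day, 200 per week) can be enforced on the client before the tool is called. The following is a minimal sketch under stated assumptions: the class name `InviteBudget` is hypothetical, the daily cap uses the conservative end of the quoted range, and both windows are treated as rolling windows, which the description does not confirm.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Optional

MAX_MESSAGE_CHARS = 300  # from the tool description
DAILY_BUDGET = 80        # assumption: lower end of the quoted 80-100 per day
WEEKLY_BUDGET = 200      # from the tool description


class InviteBudget:
    """Client-side guard for linkedin_invite (hypothetical helper)."""

    def __init__(self) -> None:
        # Timestamps of invitations already sent, oldest first.
        self._sent: deque = deque()

    def check(self, message: Optional[str], now: datetime) -> None:
        """Raise ValueError if this invitation should not be sent."""
        if message is not None and len(message) > MAX_MESSAGE_CHARS:
            raise ValueError("message exceeds 300 characters")
        # Forget invitations that fell out of the rolling 7-day window.
        while self._sent and now - self._sent[0] >= timedelta(days=7):
            self._sent.popleft()
        last_day = sum(1 for t in self._sent if now - t < timedelta(days=1))
        if last_day >= DAILY_BUDGET or len(self._sent) >= WEEKLY_BUDGET:
            raise ValueError("invitation budget exhausted")

    def record(self, now: datetime) -> None:
        """Record that an invitation was sent at `now` (call in time order)."""
        self._sent.append(now)
```

A caller would run `check` before each `linkedin_invite` call and `record` after a successful one.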

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses rate limiting behavior but lacks details on error handling, success confirmation, or permission requirements. The message length constraint is already in the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences cover purpose, the optional message parameter, and the rate limit with no redundant information. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, no output schema), the description is largely complete. Details on error handling or on what happens when the rate limit is exceeded would improve it, but they are not required.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. The description adds context for provider_id (from search results or profile) and reiterates message max length. While it does not significantly expand on the schema, the provider_id source is helpful.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Send a connection invitation') and the resource ('to a LinkedIn user'). It is distinct from sibling LinkedIn tools such as linkedin_add_comment or linkedin_get_profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit rate limit guidance (80-100 per day, max 200 per week) to inform usage. However, does not specify when not to use the tool or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_connectionsB

Read-onlyIdempotent

Inspect

List your LinkedIn connections, sorted by most recently added.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum connections to return
`cursor`	No	Pagination cursor from previous response
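
The `limit` and `cursor` parameters suggest cursor-based pagination. The sketch below shows one way a client might walk all pages. Since the tool has no output schema, the response keys `items` and `cursor` are hypothetical, as are the helper `iter_connections` and the stand-in `fake_call_tool`, which replaces a real MCP client so the example runs offline.

```python
from typing import Callable, Iterator, Optional


def iter_connections(
    call_tool: Callable[[str, dict], dict],
    page_size: int = 50,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield connections page by page until no cursor is returned."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop so a repeating cursor cannot loop forever
        args: dict = {"limit": page_size}
        if cursor:
            args["cursor"] = cursor
        page = call_tool("linkedin_list_connections", args)
        # ASSUMPTION: "items" and "cursor" are hypothetical response keys.
        yield from page.get("items", [])
        cursor = page.get("cursor")
        if not cursor:
            break


def fake_call_tool(name: str, args: dict) -> dict:
    """Stand-in for a real MCP client; returns two canned pages."""
    pages = {
        None: {"items": [{"id": "a"}, {"id": "b"}], "cursor": "p2"},
        "p2": {"items": [{"id": "c"}], "cursor": None},
    }
    return pages[args.get("cursor")]


if __name__ == "__main__":
    print([c["id"] for c in iter_connections(fake_call_tool)])
```

The same loop shape would apply to `linkedin_list_invitations_sent`, which takes the same two parameters.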

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It only mentions sorting by most recently added, omitting behavioral traits like authentication requirements, rate limits, or that it is a read-only operation. More detail is needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is concise and front-loaded with the action and resource. Every word is functional with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with two parameters and no output schema, the description is adequate but omits pagination behavior, even though the cursor parameter implies that results are paginated. It covers the basics but lacks completeness for optimal usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds no additional meaning beyond the schema's parameter descriptions. It does not elaborate on how 'limit' or 'cursor' affect results.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists LinkedIn connections, sorted by most recently added. It uses a specific verb and resource, and it distinguishes itself from sibling tools like linkedin_get_profile or linkedin_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives, nor does it mention any exclusions or prerequisites. For a simple list tool, some context about pagination or typical use cases would help.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_invitations_sentA

Read-onlyIdempotent

Inspect

List your pending sent connection invitations on LinkedIn.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum invitations to return
`cursor`	No	Pagination cursor from previous response

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must carry disclosure. It states the action is 'list' (read) and that only pending sent invitations are returned, but does not detail pagination behavior or rate limits. Adequate but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no filler. Efficiently communicates the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with 2 parameters and no output schema, the description provides the essential purpose and already states the 'pending sent' scope. It could mention the response format. Minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters (limit max 100, cursor for pagination). Description adds no extra meaning beyond schema, so baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb 'List' and resource 'pending sent connection invitations' with scope 'on LinkedIn'. Distinguishes from sibling tools like linkedin_invite (send) and linkedin_list_connections (accepted).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No mention of when not to use it or what other tools cover related actions like accepted connections.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_reactionsA

Read-onlyIdempotent

Inspect

List all reactions (likes, celebrates, etc.) on a specific LinkedIn post.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum reactions to return
`post_id`	Yes	LinkedIn post/activity ID

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It lists reactions but does not disclose behavioral traits like authentication requirements, rate limits, or handling of invalid post IDs. Minimal behavioral insight.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded and concise. No wasted words; every part serves the purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters and no output schema, the description is adequate. It could mention pagination or default limit, but the schema covers limit details. Overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with both parameters having descriptions. The description adds no additional meaning beyond what the schema already provides, so baseline of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (list all reactions), the resource (specific LinkedIn post), and scope (reactions like likes, celebrates). Distinguishes from sibling tools like linkedin_add_comment or linkedin_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies when to use (to get reactions on a post) but does not explicitly state when not to use or mention alternative tools. No usage guidance beyond the basic purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_raw_requestA

Read-onlyIdempotent

Inspect

Send an arbitrary LinkedIn API request via Unipile's magic route. Only GET and POST methods are allowed. WARNING: This bypasses structured rate limiting and can perform destructive actions. Use this only when no other LinkedIn tool covers the needed functionality.

ParametersJSON Schema

Name	Required	Description	Default
`body`	No	Request body (for POST requests)
`method`	No	HTTP method (only GET and POST allowed)	GET
`request_url`	Yes	Target LinkedIn API endpoint URL
`query_params`	No	URL query parameters
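
Because this tool bypasses structured rate limiting, a thin client-side guard can help keep calls within the documented constraints. The sketch below builds the arguments and rejects any method other than GET or POST. The helper name `raw_request_args` is hypothetical, the URL in the usage line is a placeholder, and refusing a body on GET is a stricter choice made here, not something the description requires.

```python
from typing import Optional

ALLOWED_METHODS = {"GET", "POST"}  # per the tool description


def raw_request_args(
    request_url: str,
    method: str = "GET",
    query_params: Optional[dict] = None,
    body: Optional[dict] = None,
) -> dict:
    """Build arguments for linkedin_raw_request (hypothetical helper)."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method {method} is not allowed; use GET or POST")
    # ASSUMPTION: the schema only says the body is "for POST requests";
    # this sketch treats a body on GET as a caller mistake.
    if method == "GET" and body is not None:
        raise ValueError("body is only meaningful for POST requests")
    args: dict = {"request_url": request_url, "method": method}
    if query_params:
        args["query_params"] = query_params
    if body is not None:
        args["body"] = body
    return args


if __name__ == "__main__":
    # Placeholder URL, not a real LinkedIn endpoint.
    print(raw_request_args("https://example.invalid/endpoint", query_params={"q": "x"}))
```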

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description carries full burden. It warns about bypassing structured rate limiting and performing destructive actions, which is critical behavioral disclosure. It does not detail response format or success/failure behavior, but the key risks are covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: the first states the purpose, the second restricts the allowed methods, and the last two give warnings and usage guidance. Front-loaded, no fluff, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters and no output schema, the description covers purpose, allowed methods, and destructive risk. It does not explain authentication or response format, but for a raw request tool, these are often implied. Sufficient for most usage scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description reinforces allowed methods and adds warnings, but the schema already describes body, method, request_url, and query_params adequately. Description adds marginal value over schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Send an arbitrary LinkedIn API request via Unipile's magic route', identifying the verb and resource. It distinguishes itself from siblings by advising use only when no other LinkedIn tool covers the needed functionality, and the sibling list includes many specific LinkedIn tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this only when no other LinkedIn tool covers the needed functionality' and restricts methods to GET and POST. This provides clear context for when to use the tool, though it gives no explicit list of cases to avoid beyond the generic warning.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_searchA

Read-onlyIdempotent

Inspect

Search LinkedIn for people, companies, jobs, or posts. Supports filtering by keywords, location, industry, network distance, and more. Use linkedin.search_filters first to resolve filter keywords to LinkedIn parameter IDs.

ParametersJSON Schema

Name	Required	Description	Default
`api`	No	LinkedIn product to search with	classic
`url`	No	Direct LinkedIn search URL (alternative to keyword/filter search)
`role`	No	Role/title filter
`limit`	No	Maximum results to return
`category`	No	What to search for	people
`industry`	No	Industry filter IDs
`keywords`	No	Search keywords
`location`	No	Location filter IDs (use linkedin.search_filters to resolve)
`has_job_offers`	No	Filter for people with job offers
`network_distance`	No	Connection degree: F=1st, S=2nd, O=3rd+
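
The single-letter codes for `network_distance` are easy to mix up, so a small lookup can make call sites more readable. This is a sketch only: the helper name `network_distance_code` is hypothetical, and whether the parameter takes one code or a list of codes is not stated in the table above.

```python
# Codes taken from the parameter table: F=1st, S=2nd, O=3rd+.
DEGREE_CODES = {1: "F", 2: "S", 3: "O"}


def network_distance_code(degree: int) -> str:
    """Map a connection degree (1, 2, or 3 and beyond) to its filter code."""
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    # Any degree of 3 or more falls into the "3rd+" bucket.
    return DEGREE_CODES[min(degree, 3)]


if __name__ == "__main__":
    print(network_distance_code(2))
```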

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must convey behavioral traits. It implies a read-only search operation but does not explicitly state non-destructiveness, authentication needs, or rate limits. The description is adequate but could be more explicit about side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences covering purpose, filtering, and prerequisite. Every sentence earns its place with no redundancy. The description is front-loaded with the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with 10 parameters and no output schema, the description adequately covers the core functionality and a key usage hint. However, it omits details like pagination behavior or result format, which would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds value by explaining that filter parameters (industry, location) should use IDs from linkedin.search_filters. This provides meaningful context beyond the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Search LinkedIn for people, companies, jobs, or posts' with a specific verb and resource. It lists filtering capabilities and distinguishes itself from sibling tools like linkedin_get_profile by being the general search entry point.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises using linkedin.search_filters first to resolve filter keywords, providing a clear prerequisite. While it doesn't list alternative tools for when not to use it, the context of sibling tools makes the purpose distinct.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_search_filtersA

Read-onlyIdempotent

Inspect

Get LinkedIn search filter parameter IDs. LinkedIn uses internal IDs instead of text for search filters (location, industry, etc.). Call this before linkedin.search to resolve filter keywords to their LinkedIn parameter IDs.

ParametersJSON Schema

Name	Required	Description
`type`	Yes	Filter category to resolve (e.g. LOCATION, INDUSTRY, SKILL)
`limit`	No	Max results per filter category
`keywords`	Yes	Keywords to resolve to parameter IDs (e.g. 'Thailand' for LOCATION)
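
The resolve-then-search flow described above can be sketched as two chained tool calls. Everything about the response is an assumption here, because neither tool publishes an output schema: the `items` list and its `id` field are hypothetical, as are the helper `search_people_in` and the stand-in `fake_call_tool`. Passing `location` as a list is also a guess based on the plural "Location filter IDs".

```python
from typing import Callable


def search_people_in(
    call_tool: Callable[[str, dict], dict], place: str, keywords: str
) -> dict:
    """Resolve a place name to a LOCATION filter ID, then search people."""
    filters = call_tool(
        "linkedin_search_filters",
        {"type": "LOCATION", "keywords": place, "limit": 5},
    )
    # ASSUMPTION: hypothetical response shape {"items": [{"id": ...}, ...]}.
    items = filters.get("items", [])
    if not items:
        raise LookupError(f"no LOCATION filter matched {place!r}")
    return call_tool(
        "linkedin_search",
        {"category": "people", "keywords": keywords, "location": [items[0]["id"]]},
    )


def fake_call_tool(name: str, args: dict) -> dict:
    """Stand-in for a real MCP client so the sketch runs offline."""
    if name == "linkedin_search_filters":
        return {"items": [{"id": "loc-1"}]} if args["keywords"] == "Thailand" else {}
    return {"echo": args}


if __name__ == "__main__":
    print(search_people_in(fake_call_tool, "Thailand", "data engineer"))
```

Taking the first match is a simplification; a real caller might show the candidates and let the user pick.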

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It describes the tool's behavior (resolve filter keywords to IDs) but does not mention potential errors, rate limits, or side effects. Adequate but could add more context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences that are front-loaded with key information. Every sentence adds value without extraneous detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with 3 params and no output schema, the description is fairly complete. It explains the 'why' and 'when', though it could mention the format of results (list of parameter IDs). Still, sufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 3 params with descriptions (100% coverage). The description adds context about the purpose (resolving to parameter IDs) but does not add meaning beyond the schema descriptions. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: getting LinkedIn search filter parameter IDs. It explains why (LinkedIn uses internal IDs) and when to use it (before linkedin.search). This distinguishes it from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('before linkedin.search') and what it does. It does not provide when-not or alternatives, but given its specific helper role, the usage is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_update_profileA

Inspect

Update the authenticated user's own LinkedIn profile. Supports adding/editing experience entries (role, company, skills, dates). Also supports updating location. Headline, summary, education are NOT supported by the API.

ParametersJSON Schema

Name	Required	Description	Default
`location`	No	Location to set on profile (requires LinkedIn location ID)
`experience`	No	Add or edit a professional experience entry
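
Since the description names fields the API does not support, a caller can screen an update before sending it. The sketch below is one possible guard: the helper name `update_profile_args` is hypothetical, and the `"LOCATION_ID"` value in the usage line is a placeholder. The inner structure of `experience` is left untouched because its key names are not listed here.

```python
SUPPORTED_FIELDS = {"location", "experience"}              # from the parameter table
UNSUPPORTED_FIELDS = {"headline", "summary", "education"}  # from the description


def update_profile_args(**changes) -> dict:
    """Validate top-level fields for linkedin_update_profile (hypothetical helper)."""
    requested = set(changes)
    if not requested:
        raise ValueError("nothing to update")
    blocked = requested & UNSUPPORTED_FIELDS
    if blocked:
        raise ValueError(f"not supported by the API: {sorted(blocked)}")
    unknown = requested - SUPPORTED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return dict(changes)


if __name__ == "__main__":
    # "LOCATION_ID" is a placeholder for an ID resolved elsewhere.
    print(update_profile_args(location="LOCATION_ID"))
```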

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description carries full burden. It accurately describes adding/editing experience and updating location, but does not disclose potential side effects (e.g., whether existing experience is replaced or appended) or authentication details beyond 'authenticated user'. Acceptable but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences: the first states the main action, and the rest list supported and unsupported operations. Front-loaded, and every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity (nested objects, no required params), the description covers primary use cases and limitations. Lacks details on error handling or partial update behavior, but sufficient for most agents.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds meaning by explaining 'omit id to add new' and 'include id to edit', and clarifies that location requires a LinkedIn location ID obtainable via search_filters. This aids agent understanding beyond schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Update the authenticated user's own LinkedIn profile' and lists specific supported operations (experience, location) and unsupported fields (headline, summary, education). This distinguishes it from sibling tools like linkedin_get_profile and linkedin_add_comment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly indicates when to use (updating experience/location) and what is not supported. It also references linkedin.search_filters for location ID lookup. It gives no explicit when-not-to-use guidance or alternatives, but the scope is well-defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

meet_present_tab (A)

Read-only, Idempotent

Make the agent's browser tab visible to everyone in the Meet as a real screen-share. Pass the page_id you got from browser.open. Only usable while the agent is in an active Meet. The presented tab stays the active share until you call meet.present_tab with a different page_id, close the tab via browser.close, or the Meet ends.

ParametersJSON Schema

Name	Required	Description	Default
`page_id`	Yes	page_id returned by browser.open for the tab you want to present. Must be a tab still open in the agent's browser context.
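The lifecycle rules in the description can be read as a small state model. The sketch below is a toy simulation and not the server's implementation: the class `MeetShareModel` and all of its method names are hypothetical, chosen only to make the precondition (an active Meet, a tab that is still open) and the three stop conditions explicit.

```python
from typing import Optional, Set


class MeetShareModel:
    """Toy model of the screen-share lifecycle described above (hypothetical)."""

    def __init__(self) -> None:
        self.in_meet = False
        self.open_tabs: Set[str] = set()
        self.presented: Optional[str] = None

    def join_meet(self) -> None:
        self.in_meet = True

    def open_tab(self, page_id: str) -> None:
        # Stands in for browser.open returning a page_id.
        self.open_tabs.add(page_id)

    def present_tab(self, page_id: str) -> None:
        if not self.in_meet:
            raise RuntimeError("only usable while the agent is in an active Meet")
        if page_id not in self.open_tabs:
            raise ValueError("page_id must refer to a tab that is still open")
        # Presenting a different page_id replaces the current share.
        self.presented = page_id

    def close_tab(self, page_id: str) -> None:
        # Stands in for browser.close; closing the shared tab ends the share.
        self.open_tabs.discard(page_id)
        if self.presented == page_id:
            self.presented = None

    def end_meet(self) -> None:
        # The share cannot outlive the Meet.
        self.in_meet = False
        self.presented = None
```

In this model, calling `present_tab` twice with different IDs leaves only the second tab shared, which mirrors the "stays the active share until" wording of the description.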

Tool Definition Quality

A3.7/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, but the description describes a mutating operation ('make visible as a real screen-share'), a direct contradiction. This undermines trust in behavioral claims.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four focused sentences: the first defines the core purpose, and the rest cover the page_id source, the active-Meet constraint, and the share lifecycle. No redundant wording.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no output schema, description covers key use conditions and behavior. However, the annotation contradiction indicates missing or misleading context, reducing completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers page_id with description. Description adds valuable context: origin from browser.open and requirement that tab is still open. Slightly exceeds schema baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states the action 'Make the agent's browser tab visible' with the specific resource 'tab in Meet'. Clearly distinguishes from sibling tools like calls_send_to_meet by detailing the tab presentation mechanism.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear conditions: requires page_id from browser.open, only usable in active Meet, and notes when presentation stops (different page_id, tab close, Meet end). Does not explicitly list alternatives but gives sufficient context for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_delete (A)

Destructive, Idempotent

Delete a message from a thread. Supports Telegram, WhatsApp, and other connected channels. Note: Some channels have time limits on message deletion.

ParametersJSON Schema

Name	Required	Description	Default
`thread_id`	Yes	Thread/channel ID containing the message
`message_id`	Yes	ID of the message to delete

Tool Definition Quality

A3.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description only states that the tool deletes a message and notes time limits. It does not disclose permissions, irreversibility, error handling, or return behavior, leaving significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at three sentences: the first states the core action, the second lists supported channels, and the third adds a relevant note on time limits. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple deletion tool with no output schema, the description covers the essential purpose and a key constraint (time limits). It does not mention permissions or error cases, but given the tool's simplicity, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with self-explanatory parameter descriptions. The tool description does not add any additional meaning or context to the parameters beyond what is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete a message from a thread') and specifies supported channels (Telegram, WhatsApp, etc.), making the tool's purpose immediately obvious and distinct from siblings like messages_send or messages_read_history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a practical usage note about time limits on deletion, which guides when the tool can be used, but does not explicitly state when to use it over alternatives or exclude certain contexts. The note is helpful but not exhaustive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_forward (A)

Forward a message from one thread to another. Supports native Telegram forwarding (preserves original sender attribution) and text-based forwarding for cross-channel scenarios.

ParametersJSON Schema

Name	Required	Description
`dest_thread_id`	No	Destination thread to forward into. Provide at least one of dest_thread_id or recipient_name. To forward into the active conversation, pass the current thread_id. (If both are provided, dest_thread_id wins and recipient_name is ignored.)
`recipient_name`	No	Name of person to forward to (channel auto-resolved). Provide at least one of dest_thread_id or recipient_name. Use only when forwarding to a different contact than the current conversation.
`source_thread_id`	Yes	Thread containing the message to forward (e.g., 'telegram:123456' or numeric DB ID)
`source_message_id`	Yes	ID of the message to forward
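The destination rules in the table (at least one of `dest_thread_id` or `recipient_name`, with `dest_thread_id` taking precedence) are easy to get wrong when building a call. A minimal sketch of a client-side check follows; `build_forward_args` is a hypothetical helper, not part of the DialogBrain API, and it only assembles the argument dictionary without sending anything.

```python
from typing import Any, Dict, Optional, Union

ThreadRef = Union[int, str]


def build_forward_args(
    source_thread_id: ThreadRef,
    source_message_id: ThreadRef,
    dest_thread_id: Optional[ThreadRef] = None,
    recipient_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble arguments for a messages_forward call (hypothetical helper)."""
    if dest_thread_id is None and not recipient_name:
        raise ValueError("provide at least one of dest_thread_id or recipient_name")

    args: Dict[str, Any] = {
        "source_thread_id": source_thread_id,
        "source_message_id": source_message_id,
    }
    if dest_thread_id is not None:
        # The schema says dest_thread_id wins, so recipient_name is dropped here
        # rather than sent and silently ignored.
        args["dest_thread_id"] = dest_thread_id
    else:
        args["recipient_name"] = recipient_name
    return args
```

Dropping the ignored field on the client side is a design choice of this sketch: it keeps logged calls unambiguous about which destination was actually used.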

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description adds behavioral context about preserving sender attribution in native forwarding and cross-channel capability, but does not disclose other traits like whether the original message is affected, auth requirements, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, front-loaded with the core verb and resource, no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main functionality but lacks details on return values or side effects, especially since there is no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover all parameters (100% coverage), so baseline is 3. The description does not add new meaning beyond the schema, only summarizes the overall behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool forwards a message between threads, with two modes (native and text-based), distinguishing it from sending new messages or reading history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool vs alternatives, nor when to choose native vs text-based forwarding. Some guidance is implicit in the schema (dest_thread_id vs recipient_name), but the description itself lacks explicit usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_read_history (A)

Read-only, Idempotent

Read messages from a conversation thread. Use text_contains to find specific messages by content. Returns the most recent messages, including sender info and timestamps.

Voice calls: each row carries a meta object with allowlisted keys (event_type ∈ 'call_started'|'call_ended'|null, source ∈ 'voice_transcript'|null, call_id, speaker_display_name, duration_seconds, outcome, direction) plus per-message channel. To find calls without scanning every row, use calls.list_history instead.

Usage:

Get thread_id from threads.list first, OR
Use contact_name to auto-resolve thread_id

Examples:

User: 'show me messages from chat with [contact]' → read_history(contact_name='[contact]', limit=10)
User: 'last 5 messages from thread 571' → read_history(thread_id=571, limit=5)

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of messages to return (default: 10, max: 100)
`offset`	No	Number of messages to skip (for pagination, default: 0)
`thread_id`	No	Thread ID to read messages from (e.g., '571' or 'telegram:571'). Optional if contact_name provided.
`contact_name`	No	Contact/thread name to search for (optional if thread_id provided). Example: 'Jane Smith', 'John Doe'
`text_contains`	No	Filter: only return messages containing this text (case-insensitive substring match)
`include_outgoing`	No	Include messages sent by you (default: true)
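The parameter constraints above (one of `thread_id` or `contact_name`, `limit` capped at 100, `offset` for pagination) can be sketched as a small argument builder. Both function names below are hypothetical, and the lower bound of 1 on `limit` is an assumption of this sketch; the schema only states the default and the maximum.

```python
from typing import Any, Dict, List, Optional, Union

MAX_LIMIT = 100  # from the schema: "default: 10, max: 100"


def read_history_args(
    thread_id: Optional[Union[int, str]] = None,
    contact_name: Optional[str] = None,
    limit: int = 10,
    offset: int = 0,
    text_contains: Optional[str] = None,
    include_outgoing: bool = True,
) -> Dict[str, Any]:
    """Assemble arguments for a messages_read_history call (hypothetical helper)."""
    if thread_id is None and not contact_name:
        raise ValueError("provide thread_id or contact_name")
    if not 1 <= limit <= MAX_LIMIT:  # lower bound is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must not be negative")

    args: Dict[str, Any] = {
        "limit": limit,
        "offset": offset,
        "include_outgoing": include_outgoing,
    }
    if thread_id is not None:
        args["thread_id"] = thread_id
    if contact_name:
        args["contact_name"] = contact_name
    if text_contains:
        args["text_contains"] = text_contains
    return args


def page_offsets(total: int, limit: int) -> List[int]:
    """Offsets needed to page through `total` messages, `limit` at a time."""
    return list(range(0, total, limit))
```

For example, `page_offsets(25, 10)` gives the three offsets needed to read 25 messages in pages of 10.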

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes that it returns recent messages with sender info and timestamps, supports filtering and pagination. No annotations, so description carries full burden; lacks mention of error handling or read-only nature but is sufficiently transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with usage steps and examples, every sentence adds value, and it is appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, no output schema, and alternative identification methods, the description covers usage, filtering, pagination, and examples. Omits error cases but is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description adds value by explaining the interplay of thread_id and contact_name, usage patterns, and filtering capabilities beyond schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads messages from a conversation thread, distinguishes it from siblings like messages_send and messages_delete, and points to text_contains for finding specific messages by content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit steps (get thread_id from threads.list or use contact_name) and examples, but does not explicitly state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_send (A)

Send a message to a thread, channel, or contact. Supports Telegram, Email, LinkedIn, and other connected channels. For LinkedIn posts (comment_thread kind), this posts a comment on the post. Can automatically resolve recipients and channels when not specified. Can send files/images/documents as attachments — pass attachments=[file_id, ...] with integer file IDs obtained from collections.list_files, search.files, or files.search. text is optional when attachments are provided.

ParametersJSON Schema

Name	Required	Description	Default
`text`	No	Message text to send. Optional if attachments provided.
`format`	No	Message format	text
`silent`	No	Send without notification
`thread_id`	No	Target thread. OMIT to reply in the same chat you received the triggering message from — the backend defaults to the current thread. Pass an explicit value ONLY to reply in a DIFFERENT thread, and only use: (a) a numeric DB thread id from search.threads, or (b) a channel_ref like 'telegram:-12345'. NEVER use a chat-type word (dm, group, channel, livechat) — those are category labels from the SITUATION block, not ids.
`attachments`	No	Array of integer file IDs to send as attachments (images, documents, any files). Get file IDs from collections.list_files (field `file_id`), search.files (field `file_id`), or files.search. Example: [302237]. The file must already exist in the workspace (status=ready) — no separate upload step needed. When attachments are provided, `text` becomes optional (a caption can be included alongside).
`recipient_name`	No	Name of person to send to (e.g., 'Jane', 'John'). Tool will auto-resolve channel. Optional if thread_id provided.
`reply_to_message_id`	No	ID of message to reply to (optional)
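The `thread_id` and `attachments` rules above lend themselves to a pre-flight check. The sketch below is illustrative only: `build_send_args` is a hypothetical helper, it covers just the parameters shown in the table, and treating "neither text nor attachments" as an error is an inference from "text is optional when attachments are provided", not a documented server behavior.

```python
from typing import Any, Dict, List, Optional, Union

# Category labels that the schema says must never be passed as thread_id.
CHAT_TYPE_WORDS = {"dm", "group", "channel", "livechat"}


def build_send_args(
    text: Optional[str] = None,
    attachments: Optional[List[int]] = None,
    thread_id: Optional[Union[int, str]] = None,
    recipient_name: Optional[str] = None,
    reply_to_message_id: Optional[Union[int, str]] = None,
) -> Dict[str, Any]:
    """Assemble arguments for a messages_send call (hypothetical helper)."""
    if not text and not attachments:
        raise ValueError("text is required unless attachments are provided")
    if attachments and not all(
        isinstance(f, int) and not isinstance(f, bool) for f in attachments
    ):
        raise ValueError("attachments must be integer file IDs")
    if isinstance(thread_id, str) and thread_id.strip().lower() in CHAT_TYPE_WORDS:
        raise ValueError("thread_id must be an id or channel_ref, not a chat-type word")

    args: Dict[str, Any] = {}
    if text:
        args["text"] = text
    if attachments:
        args["attachments"] = list(attachments)
    if thread_id is not None:
        # Omitted thread_id means "reply in the current thread" per the schema.
        args["thread_id"] = thread_id
    if recipient_name:
        args["recipient_name"] = recipient_name
    if reply_to_message_id is not None:
        args["reply_to_message_id"] = reply_to_message_id
    return args
```

A plain reply in the current chat therefore carries only `text`, while a file-only message carries only `attachments`.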

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description discloses key behaviors: automatic recipient/channel resolution, attachment handling (file IDs, requirement of ready workspace file), optional text with attachments, and specific LinkedIn comment behavior. It also warns against using category labels for thread_id. This is thorough, though it does not mention auth requirements or destructive potential (likely non-destructive).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single well-structured paragraph that front-loads the main purpose, then covers channel specifics, attachment handling, and thread_id guidance. Every sentence adds value without redundancy. It is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 13 parameters and no output schema, the description covers the core functionality and parameter nuances well. It explains automatic resolution, attachment requisites, and thread_id semantics. However, it does not mention the return value (e.g., message ID) which would be helpful for follow-up actions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage, but the description adds significant value beyond schema descriptions. For thread_id, it gives explicit instructions on when to omit vs. pass values and which values are valid. For attachments, it explains how to obtain file IDs. For recipient_username, it notes the Telegram-only requirement. These details raise the score above baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Send a message to a thread, channel, or contact' and specifies supported channels (Telegram, Email, LinkedIn, etc.). It also distinguishes special behavior for LinkedIn posts (comment_thread kind). This differentiates it from siblings like messages_forward or messages_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool, including automatic recipient resolution and details on thread_id usage ('OMIT to reply in the same chat', 'Pass an explicit value ONLY to reply in a DIFFERENT thread'). However, it does not explicitly compare to alternative message tools or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_send_email (A)

Compose and send an email — with subject, CC/BCC, and attachments. Use for email; for chat messages (Telegram/WhatsApp/livechat) use messages.send instead.

ParametersJSON Schema

Name	Required	Description
`cc`	No	Email addresses to CC. OMIT to skip.
`bcc`	No	Email addresses to BCC. OMIT to skip.
`text`	No	Email body.
`subject`	No	Email subject line. Required for new emails; for replies it auto-generates 'Re: ...' when omitted.
`attachments`	No	Array of integer file IDs to attach.
`recipient_email`	No	Recipient email address (e.g. 'john@example.com'). Provide to start a new email thread; OMIT to reply in the current email thread.
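The table implies two modes: passing `recipient_email` starts a new thread and requires `subject`, while omitting it replies in the current thread and lets the subject default to 'Re: ...'. A minimal sketch of that branching, using a hypothetical `build_email_args` helper that only assembles arguments:

```python
from typing import Any, Dict, List, Optional


def build_email_args(
    text: Optional[str] = None,
    subject: Optional[str] = None,
    recipient_email: Optional[str] = None,
    cc: Optional[List[str]] = None,
    bcc: Optional[List[str]] = None,
    attachments: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Assemble arguments for a messages_send_email call (hypothetical helper)."""
    if recipient_email and not subject:
        # New thread: the schema marks subject as required for new emails.
        raise ValueError("subject is required when starting a new email thread")

    args: Dict[str, Any] = {}
    if recipient_email:
        args["recipient_email"] = recipient_email
    if subject:
        args["subject"] = subject
    if text:
        args["text"] = text
    # Optional fields are omitted entirely when unused ("OMIT to skip").
    if cc:
        args["cc"] = list(cc)
    if bcc:
        args["bcc"] = list(bcc)
    if attachments:
        args["attachments"] = list(attachments)
    return args
```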

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=false and destructiveHint=false, so the description does not need to repeat these. The description adds no behavioral details beyond the schema (e.g., auto-generation of subject for replies is in schema, not description). It is consistent and adequate, but does not go beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. Front-loaded with the core action and key features, then provides critical differentiation from sibling tool. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter tool with full schema coverage and no output schema, the description provides sufficient context for typical email composition. It does not explain return values, but for a send operation, the success/failure is often implied. The differentiation from sibling tools adds completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters are documented in the schema itself. The tool description mentions subject, CC/BCC, and attachments, but adds no new semantic meaning beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it composes and sends an email with subject, CC/BCC, and attachments. It distinguishes itself from the sibling tool 'messages_send' for chat messages, which is explicit and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use for email; for chat messages (Telegram/WhatsApp/livechat) use messages.send instead.' This provides clear when-to-use and when-not-to-use guidance with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_delete (B)

Destructive, Idempotent

Delete a note by ID from the target notebook. Same identity rules as notes.save — agents can only delete from their own notebook.

ParametersJSON Schema

Name	Required	Description	Default
`note_id`	Yes	ID of the note to delete
`target_agent_id`	No	Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks.

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses identity rules but omits side effects, irreversibility, or error behavior (e.g., what if note_id not found).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy; the key information is front-loaded and each sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, so description should cover return behavior. It does not mention what happens on success or failure, leaving gaps for a deletion operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds context about identity rules but does not enhance parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete a note by ID from the target notebook') and distinguishes it from siblings like notes_save and notes_search by focusing on deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions identity rules and ownership constraints but does not explicitly contrast with other note tools or provide when-to-use vs alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_recall (A)

Read-only, Idempotent

Recall notes from your notebook. By default returns only your own notes (all scopes, newest first). Pass filter_agent_id="57" (a numeric agent_id as a string) to read another agent's notebook, or filter_agent_id="all" (or "*") to read across every agent in the workspace. Pass scope to narrow to global/thread/person. Each result includes agent_id and agent_name of the author.

ParametersJSON Schema

Name	Required	Description
`key`	No	Recall a specific note by key
`limit`	No	Max notes (default 20, max 50). Newest first.
`scope`	No	Optional filter: global | thread | person. Omit for all scopes.
`scope_ref_id`	No	Filter by specific thread_id or person_id
`filter_agent_id`	No	Optional. Omit to read only your own notes. Pass a numeric agent_id as a string (e.g. "57") to read another agent's notebook (read-only). Pass "all" or "*" to read across all agents in the workspace.
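The three forms of `filter_agent_id` (omitted, a numeric ID passed as a string, or "all"/"*") can be normalized before the call. The helper below is a hypothetical sketch; accepting an `int` and converting it to a string is a convenience of this example, and the lower bound of 1 on `limit` is an assumption.

```python
from typing import Any, Dict, Optional, Union

RECALL_SCOPES = {"global", "thread", "person"}  # as listed for notes_recall
MAX_LIMIT = 50  # from the schema: "default 20, max 50"


def recall_args(
    key: Optional[str] = None,
    limit: int = 20,
    scope: Optional[str] = None,
    scope_ref_id: Optional[Union[int, str]] = None,
    filter_agent_id: Optional[Union[int, str]] = None,
) -> Dict[str, Any]:
    """Assemble arguments for a notes_recall call (hypothetical helper)."""
    if not 1 <= limit <= MAX_LIMIT:  # lower bound is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if scope is not None and scope not in RECALL_SCOPES:
        raise ValueError(f"scope must be one of {sorted(RECALL_SCOPES)}")

    args: Dict[str, Any] = {"limit": limit}
    if key:
        args["key"] = key
    if scope:
        args["scope"] = scope
    if scope_ref_id is not None:
        args["scope_ref_id"] = scope_ref_id
    if filter_agent_id is not None:
        value = str(filter_agent_id)
        if value not in ("all", "*") and not value.isdigit():
            raise ValueError('filter_agent_id must be a numeric id, "all", or "*"')
        # The schema asks for the numeric id as a string, e.g. "57".
        args["filter_agent_id"] = value
    return args
```

Calling `recall_args()` with no arguments produces the default request for the caller's own notes.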

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden. It discloses default behavior, filter_agent_id semantics, and result contents (agent_id, agent_name). It does not explicitly state that it is read-only, but nothing in it implies modification.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences, front-loaded with the main purpose, and efficiently covers key details without extraneous information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, so the description should cover return structure. While it mentions each result includes agent_id and agent_name, it does not specify other fields like key, content, or timestamps. The schema parameter descriptions fill some gaps, but overall completeness is moderate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds value by explaining default behavior and result structure beyond schema descriptions. It clarifies filter_agent_id usage and includes example values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool recalls notes from the notebook. It specifies default behavior (own notes, all scopes, newest first) and distinguishes from siblings like notes_search by focusing on recall by key and filtering by agent/scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: when to use defaults, how to read another agent's notebook, and how to filter by scope. It does not explicitly list alternatives but implies which sibling tools to use for other operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_saveA

Save a fact or note into the agent's memory. Use scope to choose visibility: 'workspace' = visible to every agent in this workspace (use for shared facts, project conventions); 'agent' = private to this agent (use for personal working notes); 'thread' = scoped to one conversation (use for thread-specific reminders); 'person' = scoped to one contact (use for per-contact context). If a note with the same key+scope exists it will be updated. Do NOT use this tool for behavioral rules or corrections — use feedback.save for those.

ParametersJSON Schema

Name	Required	Description
`key`	Yes	Short identifier for this note (must not start with '__' — reserved)
`scope`	Yes	Scope of the note. 'workspace' = shared across all agents; 'agent' = private to this agent (was 'global' pre-PR1); 'thread' = per-conversation; 'person' = per-contact. 'global' is accepted as a deprecation alias for 'agent'.
`value`	Yes	The note content
`pinned`	No	Pin this note so it's always loaded first. Default false.
`scope_ref_id`	No	Reference ID — thread_id (for scope=thread) or person_id (for scope=person). Required for thread/person scope. In MCP mode (no thread context), must be passed explicitly.
`target_agent_id`	No	Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks. Ignored when scope='workspace' (workspace memory is shared).
`expires_in_hours`	No	Auto-delete after N hours. Omit for permanent notes.
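
The constraints in the table above can be checked on the client before the call is sent. The following is a minimal sketch, not the server's own validation: `build_notes_save_args` and `tools_call` are hypothetical helper names, the ID values are placeholders whose types (integer versus string) the listing does not specify, and the request envelope assumes a standard MCP JSON-RPC `tools/call` message.

```python
import json

VALID_SCOPES = {"workspace", "agent", "thread", "person"}


def build_notes_save_args(key, scope, value, *, scope_ref_id=None,
                          target_agent_id=None, pinned=False,
                          expires_in_hours=None):
    """Assemble notes_save arguments, enforcing the documented constraints."""
    if key.startswith("__"):
        raise ValueError("keys starting with '__' are reserved")
    if scope == "global":
        scope = "agent"  # 'global' is the documented deprecation alias
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if scope in ("thread", "person") and scope_ref_id is None:
        raise ValueError(f"scope_ref_id is required for scope={scope!r}")

    args = {"key": key, "scope": scope, "value": value}
    if scope_ref_id is not None:
        args["scope_ref_id"] = scope_ref_id
    # target_agent_id is documented as ignored for workspace scope, so omit it.
    if target_agent_id is not None and scope != "workspace":
        args["target_agent_id"] = target_agent_id
    if pinned:
        args["pinned"] = True
    if expires_in_hours is not None:
        args["expires_in_hours"] = expires_in_hours
    return args


def tools_call(name, arguments, request_id=1):
    """Wrap tool arguments in an MCP JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Placeholder IDs: 123 stands in for a person_id, 57 for an agent_id.
request = tools_call(
    "notes_save",
    build_notes_save_args("preferred_language", "person", "Replies in German",
                          scope_ref_id=123, target_agent_id=57),
)
print(json.dumps(request, indent=2))
```

Because target_agent_id is required when calling from MCP, the example passes it explicitly; in agent mode it could be left out.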

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It discloses mode-specific behavior, update semantics, and scope organization. However, it doesn't mention error conditions or return values (no output schema).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with 4 sentences, each adding value. It is front-loaded with the main purpose and uses no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters and no output schema, the description covers key behaviors and mode differences. It could mention persistence guarantee or limits, but overall it is sufficient for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the schema: update behavior on duplicate key+scope, scope usage guidance, and mode-specific target_agent_id requirement. This complements the 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool saves a note to a notebook, differentiating it from siblings like notes_delete, notes_recall, and notes_search. It specifies the action and resource, and mentions update behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context (agent vs MCP mode, scope organization) but does not explicitly contrast with sibling tools. It implies when to use this tool for creating/updating notes, but could be more direct.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_searchA

Read-only, Idempotent

Full-text search in your notebook. By default searches only your own notes. Pass filter_agent_id=<agent_id> to search another agent's notebook, or "all" (or "*") for workspace-wide. Or list all notes for a person/thread by scope_ref_id.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max results (default 10, max 50)
`query`	No	Text to search for in note keys and values. Optional if scope_ref_id is provided.
`scope`	No	Limit search to scope
`scope_ref_id`	No	Filter by specific thread_id or person_id. If provided without query, lists all notes for that ref.
`filter_agent_id`	No	Optional. Omit to search only your own notes. Pass a numeric agent_id as a string (e.g. "57") to search another agent's notebook (read-only). Pass "all" or "*" to search across all agents in the workspace.
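
The parameter interactions above (query optional only when scope_ref_id is given, filter_agent_id passed as a string) can be sketched as a small argument builder. This is an illustration under assumptions: `build_notes_search_args` is a hypothetical helper, the lower bound of 1 on limit is assumed (only the maximum of 50 is documented), and 57 and 123 are placeholder IDs.

```python
def build_notes_search_args(query=None, *, scope_ref_id=None, scope=None,
                            filter_agent_id=None, limit=10):
    """Assemble notes_search arguments following the parameter table."""
    # query may be omitted only when scope_ref_id is provided.
    if query is None and scope_ref_id is None:
        raise ValueError("pass query, scope_ref_id, or both")
    # Maximum of 50 is documented; the minimum of 1 is an assumption.
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")

    args = {"limit": limit}
    if query is not None:
        args["query"] = query
    if scope is not None:
        args["scope"] = scope
    if scope_ref_id is not None:
        args["scope_ref_id"] = scope_ref_id
    if filter_agent_id is not None:
        agent = str(filter_agent_id)  # numeric IDs are sent as strings
        if agent not in ("all", "*") and not agent.isdigit():
            raise ValueError("filter_agent_id must be numeric, 'all', or '*'")
        args["filter_agent_id"] = agent
    return args


own_notes = build_notes_search_args("invoice")
other_agent = build_notes_search_args("invoice", filter_agent_id=57)
all_for_person = build_notes_search_args(scope_ref_id=123, scope="person",
                                         filter_agent_id="all")
print(own_notes)
print(other_agent)
print(all_for_person)
```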

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses default scope, read-only nature when searching other agents' notes, and the ability to list notes without a query. This adequately informs the agent of behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, each carrying essential information. No wasted words. Front-loaded with purpose, then expands on customization options.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no output schema), the description covers all needed aspects: purpose, usage, parameter details, and edge cases. The agent can correctly invoke the tool without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining defaults, acceptable values, and interactions between parameters (e.g., query optional with scope_ref_id, filter_agent_id accepts 'all' or '*').

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool performs full-text search in notebooks, with specific filtering capabilities. It distinguishes itself from sibling tools like notes_recall, notes_delete, and notes_save by focusing on search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains default behavior (own notes), how to search other agents' notebooks or workspace-wide, and how to list all notes for a person/thread. It provides concrete usage examples with parameters, making it clear when to use each option.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_getA

Read-only, Idempotent

Get full content of a prompt template: system instructions (prompt_text) and auto-reply rules.

Run prompts.list first to find the prompt_id.

ParametersJSON Schema

Name	Required	Description	Default
`prompt_id`	Yes	ID of the prompt template to fetch

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided; the description relies on the word 'Get' to imply a read operation but does not explicitly confirm idempotency, permissions, or error handling. For a simple retrieval, this is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences efficiently convey the tool's purpose and a key prerequisite. Every sentence adds value with no redundancy or unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the essential aspects: what is retrieved and how to get the ID. It lacks details on invalid IDs or response structure, but these are common and expected.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter prompt_id is fully described in the schema (100% coverage). The description adds the usage hint about prompts.list but does not elaborate on parameter constraints or formatting beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool retrieves the full content of a prompt template, including system instructions and auto-reply rules. It distinguishes the tool from siblings like prompts_list (which only lists) and prompts_update (which modifies).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises running prompts.list first to obtain the prompt_id, providing a clear prerequisite. However, it does not discuss when to avoid this tool or how it compares with alternatives such as prompts_prompt_history.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_listA

Read-only, Idempotent

List all prompt templates in this workspace.

Returns id + name + description + category so you know which prompt_id to use in prompts.get or prompts.update.

ParametersJSON Schema

Name	Required	Description	Default
No parameters
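
Since the listing exists to supply a prompt_id for later calls, a client might resolve a template name to its ID as sketched below. The result shape is an assumption: no output schema is published, so the sketch supposes a list of objects carrying the id, name, description, and category fields named in the description. `find_prompt_id` and the sample data are hypothetical.

```python
def find_prompt_id(prompts, name):
    """Return the id of the single prompt whose name matches, ignoring case."""
    wanted = name.casefold()
    matches = [p["id"] for p in prompts if p["name"].casefold() == wanted]
    if not matches:
        raise LookupError(f"no prompt named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} prompts named {name!r}: {matches}")
    return matches[0]


# Made-up sample shaped like the assumed prompts_list result.
sample_listing = [
    {"id": 3, "name": "Sales Assistant", "description": "Outbound tone",
     "category": "sales"},
    {"id": 8, "name": "Support Triage", "description": "First-line support",
     "category": "support"},
]

prompt_id = find_prompt_id(sample_listing, "support triage")
print({"prompt_id": prompt_id})  # arguments for a follow-up prompts_get call
```

Failing loudly on zero or multiple matches avoids fetching or updating the wrong template when names are reused.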

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations given, so description carries full burden. It discloses the tool lists all prompts and returns specific fields. Lacks mention of potential performance or pagination, but for a simple list tool this is acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no waste, front-loaded with action and resource, then explains relevance. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters, no output schema, and simplicity of the tool, the description fully covers what an agent needs: what it lists and how to use the result. No significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0 parameters with 100% coverage. Baseline 3 applies since description adds no parameter info, but none is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'List' and the resource 'all prompt templates in this workspace', and specifies the returned fields (id, name, description, category) which distinguish it from sibling tools like prompts_get that retrieve a single prompt.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly connects to subsequent tool use: 'so you know which prompt_id to use in prompts.get or prompts.update'. While it doesn't mention when not to use, the context is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_historyA

Read-only, Idempotent

List past versions of a prompt template's prompt_text. Every edit is snapshotted to an append-only table — use this to browse history and find a version_number for prompts.prompt_restore.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max versions to return (1-200, default 50)
`prompt_id`	Yes	ID of the prompt template
`before_version`	No	Cursor: return versions strictly below this version_number
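
The before_version cursor suggests a simple paging loop: request a page, then ask for versions strictly below the lowest version_number seen so far. The sketch below assumes each call returns a list of objects with a version_number field (the return format is not documented) and uses a hypothetical `call_tool(name, arguments)` callable in place of a real MCP client; `fake_call_tool` only simulates seven stored versions.

```python
def iter_prompt_versions(call_tool, prompt_id, page_size=50):
    """Yield every stored version, paging with the before_version cursor."""
    before = None
    while True:
        args = {"prompt_id": prompt_id, "limit": page_size}
        if before is not None:
            args["before_version"] = before
        page = call_tool("prompts_prompt_history", args)
        if not page:
            return  # an empty page means the history is exhausted
        yield from page
        # Next page: versions strictly below the lowest one seen so far.
        before = min(item["version_number"] for item in page)


def fake_call_tool(name, args):
    """Stand-in for a real client: pretends versions 7..1 exist."""
    before = args.get("before_version", 8)
    matching = [{"version_number": n} for n in range(7, 0, -1) if n < before]
    return matching[: args["limit"]]


versions = [v["version_number"]
            for v in iter_prompt_versions(fake_call_tool, prompt_id=12,
                                          page_size=3)]
print(versions)
```

Using the minimum version_number as the cursor keeps the loop correct whether or not the server returns versions newest first.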

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the append-only nature of the table, indicating no deletions. The 'List' verb implies read-only behavior, but no explicit readOnlyHint is given. Overall adequate transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. The action is front-loaded, and the context about snapshots and restore linkage is included efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, linkage to restore tool, and high-level behavior. Given no output schema, it misses describing the return format or ordering. The cursor parameter (before_version) is described in schema but not repeated in description. Still, it is reasonably complete for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description adds no additional meaning beyond what is in the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List past versions'), specifies the resource ('prompt_text of a prompt template'), and distinguishes itself from sibling tools by mentioning the version_number is for use with prompts.prompt_restore.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly guides the agent to use this tool to browse history and find a version_number for the restore tool. It implicitly distinguishes from siblings, but could be more explicit about not using this for the current prompt text (use prompts_get).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_restoreA

Restore a past version of a prompt template by version_number. Creates a new version pointing at the restored content — history is preserved. Fans out to every agent using this template without a per-agent override; the response includes affected_agents as a receipt of the fan-out.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Optional: why this restore is happening (shows up in history UI)
`prompt_id`	Yes	ID of the prompt template
`version_number`	Yes	The version_number to restore (get it from prompts.prompt_history)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully covers behavioral traits: it creates a new version, preserves history, and fans out to all agents. It also mentions the response includes affected_agents. This is comprehensive for a tool without annotations, though it could detail permissions or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, each serving a clear purpose: action, side effect, and additional detail. It is concise and front-loaded with the key purpose, though it could be slightly more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description mentions that the response includes affected_agents, which is useful. It also explains the fan-out effect. This covers most necessary context for a restore operation, though it omits details on error handling or rate limits.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All three parameters have descriptions in the schema (100% coverage). The description adds only minor context (e.g., 'get version_number from prompts.prompt_history'), which is helpful but does not significantly enhance parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool restores a past version of a prompt template by version_number. It distinguishes itself from siblings like prompts_prompt_history (which lists history) and prompts_update (which modifies current version) by emphasizing historical preservation and fan-out behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use the tool—when restoring a version that affects all agents using the template. However, it does not explicitly state when not to use it or mention alternative tools, so it loses a point for missing exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_updateA

Update a prompt template's name, system instructions, or auto-reply rules.

Changes affect every agent using this template, unless the agent has its own override (set via agents.update → prompt_text).

All parameters except prompt_id are optional — only provided fields are updated.

ParametersJSON Schema

Name	Required	Description
`name`	No	New name for the prompt template
`prompt_id`	Yes	ID of the prompt template to update
`description`	No	New description for the prompt template
`prompt_text`	No	The AI system prompt: persona, tone, rules, behavior.
`auto_reply_rules`	No	Pre-classifier rules that run BEFORE the main AI. Format: bullet list of conditions → actions (SKIP / SIMPLE_REPLY / SEARCH / CALENDAR). Pass null to clear.
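
Because only provided fields are updated and null clears auto_reply_rules, a client has to tell "not provided" apart from "explicitly null". One way to do that, sketched here with a hypothetical `build_prompts_update_args` helper and a placeholder prompt_id, is a sentinel default:

```python
import json

_UNSET = object()  # marks "field not provided", distinct from None


def build_prompts_update_args(prompt_id, *, name=_UNSET, description=_UNSET,
                              prompt_text=_UNSET, auto_reply_rules=_UNSET):
    """Include only the fields the caller actually passed."""
    args = {"prompt_id": prompt_id}
    candidates = {
        "name": name,
        "description": description,
        "prompt_text": prompt_text,
        "auto_reply_rules": auto_reply_rules,
    }
    for field, value in candidates.items():
        if value is not _UNSET:
            args[field] = value  # None is kept and serializes to JSON null
    return args


rename_only = build_prompts_update_args(8, name="Support Triage v2")
clear_rules = build_prompts_update_args(8, auto_reply_rules=None)
print(json.dumps(rename_only))
print(json.dumps(clear_rules))
```

With a plain `None` default, the two cases would be indistinguishable and a rename could unintentionally clear the rules.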

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that changes affect every agent using the template unless overridden, and that only provided fields are updated. This covers major side effects, though immediate effect or caching is not mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with front-loaded purpose. Each sentence adds value: purpose, global impact, optionality. No waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity, the description covers purpose, side effects, and optionality. Could mention error handling or validation, but it is sufficiently complete for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description reiterates optionality and maps high-level fields to parameters but adds no new semantic details beyond the schema's own descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool updates a prompt template and lists the specific fields (name, system instructions, auto-reply rules). By mentioning agent-specific overrides, it also distinguishes the tool from its sibling agents_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states that the tool is for updating a prompt template and mentions agents.update as the alternative for per-agent overrides. It does not say when not to use this tool, but the context is clear enough for an AI agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_cancelB

Cancel an active reminder by its trigger ID.

ParametersJSON Schema

Name	Required	Description	Default
`agent_id`	No	Agent ID (required when calling from MCP; ignored in agentic mode).
`trigger_id`	Yes	ID of the reminder to cancel

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description does not disclose behavioral traits such as whether cancelling an already cancelled or non-existent reminder is idempotent, what side effects occur, or if it requires specific permissions. The term 'active' implies it only works on active reminders, but this is not elaborated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is concise and front-loaded with the key action and required parameter. Every word is necessary, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple cancellation tool with only two parameters and no output schema, the description covers the core functionality. It could mention return values or error conditions, but it is reasonably complete given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the schema itself sufficiently describes the parameters. The description adds no additional meaning beyond what is in the schema, thus baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (cancel) and the resource (reminder) with the required identifier (trigger ID). The verb sets it apart from sibling tools like reminder_set and reminder_list, though the description does not explicitly contrast them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives. It does not mention prerequisites (e.g., obtaining the trigger ID from listing reminders) or scenarios where cancellation might not be appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_listA

Read-only, Idempotent

List your active reminders (both one-time and recurring).

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max results (default 20)
`agent_id`	No	Agent ID (required when calling from MCP; ignored in agentic mode).
`thread_id`	No	Filter by thread
`include_fired`	No	Include already-fired one-time reminders (default false)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but description indicates this is a read-only listing operation (non-destructive). The word 'list' implies safety, though more explicit mention of being safe/idempotent would be better.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, front-loaded sentence that efficiently conveys purpose with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with no output schema, the description adequately indicates that the return value is a list of active reminders. However, it does not mention ordering or pagination, which would be useful but are not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, so baseline is 3. Description adds no additional meaning beyond the schema parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'list', resource 'reminders', and scope 'active (both one-time and recurring)'. It differentiates from sibling tools reminder_set and reminder_cancel by specifying list operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied by the description, but no explicit when-to-use, when-not-to-use, or alternatives are mentioned. The agent must infer these from sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_setB

Schedule a reminder. One-time reminders fire at a specific datetime. Recurring reminders fire on a schedule (daily, weekly, every N days, or every N minutes). Optionally scope to a thread or target another agent.

ParametersJSON Schema

Name	Required	Description
`time`	No	Time of day HH:MM for daily/weekly/every_n_days (e.g. '09:00'). Required for daily/weekly/every_n_days.
`reason`	Yes	What this reminder is for (you'll see this when it fires)
`agent_id`	No	Agent ID (required when calling from MCP; ignored in agentic mode).
`datetime`	No	ISO datetime for one_time (e.g. '2026-04-01T09:00:00+03:00'). Required for one_time.
`timezone`	No	IANA timezone (e.g. 'Europe/Moscow'). Defaults to UTC.
`thread_id`	No	Optional thread ID to scope the reminder to. Omit for workspace-level reminders.
`days_of_week`	No	Days for weekly: 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun. Required for weekly.
`interval_days`	No	For every_n_days: fire every N days (min 2).
`schedule_type`	Yes	one_time = fires once at datetime. daily = fires daily at time. weekly = fires on specific days_of_week at time. every_n_days = fires every N days at time. interval = fires every N minutes.
`interval_minutes`	No	For interval: fire every N minutes (5-1440).
`target_agent_slug`	No	Optional: activate a different staff member instead of yourself when the reminder fires.
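
The conditional requirements in the table above (which fields go with which schedule_type) can be sketched as a client-side check. This is a minimal illustration, not part of the server: the function name is hypothetical, days_of_week is assumed to be a list of integers, and treating interval_days and interval_minutes as required for their schedule types is an assumption, since the table only marks time, datetime, and days_of_week as required.

```python
# Hypothetical client-side check; the rules are copied from the parameter table.
REQUIRED_BY_SCHEDULE = {
    "one_time": ["datetime"],
    "daily": ["time"],
    "weekly": ["time", "days_of_week"],
    "every_n_days": ["time", "interval_days"],   # interval_days: assumed required
    "interval": ["interval_minutes"],            # interval_minutes: assumed required
}


def check_reminder_args(args: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if not args.get("reason"):
        problems.append("reason is required")
    schedule_type = args.get("schedule_type")
    if schedule_type not in REQUIRED_BY_SCHEDULE:
        problems.append(f"unknown schedule_type: {schedule_type!r}")
        return problems
    for name in REQUIRED_BY_SCHEDULE[schedule_type]:
        if args.get(name) is None:
            problems.append(f"{name} is required for {schedule_type}")
    days = args.get("days_of_week")
    if days is not None and any(d not in range(7) for d in days):
        problems.append("days_of_week values must be 0 (Mon) to 6 (Sun)")
    n_days = args.get("interval_days")
    if n_days is not None and n_days < 2:
        problems.append("interval_days must be at least 2")
    minutes = args.get("interval_minutes")
    if minutes is not None and not 5 <= minutes <= 1440:
        problems.append("interval_minutes must be between 5 and 1440")
    return problems


weekly = {
    "reason": "Send weekly report",
    "schedule_type": "weekly",
    "time": "09:00",
    "days_of_week": [0, 4],          # Monday and Friday
    "timezone": "Europe/Moscow",
}
print(check_reminder_args(weekly))   # []
```

Checking before the call is a design choice: the listing does not say how the server reports a missing conditional field, so catching it locally avoids depending on undocumented error behavior.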

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavior. It covers basic scheduling but omits side effects, permissions, or confirmation of success. The description is too minimal for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short sentences, front-loaded with the purpose, and contains no redundant information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters and no output schema, the description adequately covers schedule types and optional scoping. It could mention default timezone behavior, but the schema descriptions cover the rest. Overall complete for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds no new parameter information beyond summarizing schedule types and optional scoping, which is already in parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool schedules reminders and distinguishes one-time from recurring types. It does not explicitly differentiate from sibling tools reminder_cancel and reminder_list, which would elevate the score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like reminder_cancel or reminder_list. It does not specify prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_files A

Read-only, Idempotent

Search files and attachments across the workspace — by content, filename, document type, or origin. For message content use search.messages; for links use search.links.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum files to return.
`query`	No	What to search for (content or filename).
`file_origin`	No	File origin: 'generated' (created by tools), 'received' (from messages), 'uploaded' (manual). Use 'generated' for files the user created/sent. OMIT to include all origins.
`document_type`	No	Filter by document category. OMIT unless the user explicitly mentions one — picking a value narrows the search and is a common cause of zero-result mistakes.
`attachment_name`	No	Exact filename filter. OMIT to skip (do NOT pass an empty string).
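
Several parameters above say to OMIT a filter rather than pass an empty string. A minimal sketch of that rule follows, wrapped in an MCP `tools/call` request. The helper names are hypothetical, and the tool name is taken as listed on this page (the descriptions themselves refer to it as search.files).

```python
import json


def drop_blank(filters: dict) -> dict:
    """Remove filters that are None or blank strings so they are omitted entirely."""
    return {
        key: value
        for key, value in filters.items()
        if value is not None and not (isinstance(value, str) and value.strip() == "")
    }


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": drop_blank(arguments)},
    }


request = tools_call_request(
    1,
    "search_files",
    {
        "query": "Q3 invoice",
        "file_origin": "received",   # files that arrived in messages
        "document_type": None,       # omitted: the user named no category
        "attachment_name": "",       # omitted: an empty string must not be sent
        "limit": 10,
    },
)
print(json.dumps(request["params"]["arguments"]))
# {"query": "Q3 invoice", "file_origin": "received", "limit": 10}
```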

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description's 'Search' aligns. It adds context by specifying search dimensions (content, filename, etc.) but doesn't mention pagination or limits beyond schema. Some added value but modest.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states purpose, second differentiates siblings. No redundant words, front-loaded with core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with 5 fully documented parameters and clear sibling differentiation, the description is complete. No output schema needed for context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters have schema descriptions with detailed guidance (e.g., when to omit document_type), so the description adds no further semantic value. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches files and attachments by content, filename, document type, or origin. It explicitly distinguishes from sibling tools by directing message content searches to search.messages and link searches to search.links.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use and when-not-to-use guidance: use for files/attachments, not for messages or links. It lists criteria for searching, aiding appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_links A

Read-only, Idempotent

Search links/URLs shared across the workspace — by type, owner, or associated contact. For files use search.files; for message content use search.messages.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum links to return.
`owner`	No	Link owner: 'self' (user's own) or 'contact' (from others). OMIT to include links regardless of owner.
`query`	No	What to search for in shared links.
`link_kind`	No	Filter links by type. OMIT to include all kinds — picking a value narrows the search and is a common cause of zero-result mistakes.
`contact_hint`	No	Name hint to filter links for a specific contact. OMIT to skip.

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the agent knows this is a safe, non-destructive read. The description does not add further behavioral context like permissions or rate limits, but the existing annotations suffice.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: one for purpose, one for sibling disambiguation. No wasted words, front-loaded with critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with no output schema, the description adequately covers purpose and parameter usage. It could briefly mention return format, but the input schema and annotations handle most context needs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters well. The description does not add extra meaning beyond what's in the schema, but the schema's note on link_kind provides helpful caution.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches links/URLs across the workspace by type, owner, or contact. It explicitly distinguishes from sibling tools like search.files and search.messages, leaving no ambiguity about its purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description tells the agent when to use this tool (searching links) and when to use alternatives (search.files, search.messages). The input schema for link_kind provides additional guidance to avoid zero-result mistakes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_messages A

Read-only, Idempotent

Search message content across all chats — semantic + keyword. Use to find what was said: quotes, topics, info exchanged. For chats/threads themselves use search.threads; for files use search.files; for links use search.links.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum messages to return.
`query`	No	What to search for in message content.
`date_to`	No	ISO8601 date (YYYY-MM-DD) upper bound. OMIT to skip.
`date_from`	No	ISO8601 date (YYYY-MM-DD) lower bound. OMIT to skip.
`participant_name`	No	Filter to messages involving this participant/contact name. OMIT to search across everyone (do NOT pass an empty string).

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint true, and destructiveHint false. The description adds that search is 'semantic + keyword', providing extra behavioral context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences: first defines purpose, second gives use cases, third differentiates siblings. No unnecessary words; front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with full schema documentation and annotations, the description covers purpose, usage context, and alternatives. No gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add any parameter-specific information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches message content across all chats using semantic and keyword search, and explicitly distinguishes from sibling tools (search.threads, search.files, search.links).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides when to use (find quotes, topics, info) and when not (use siblings for threads, files, links), offering clear alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_threads A

Read-only, Idempotent

Find or list chat threads/conversations — by topic, participant, unread/unanswered status, or recency. Omit query to list threads by filter. For message content use search.messages; for files use search.files. since filters by recency and pairs with only_unread / only_unanswered.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum threads to return.
`query`	No	Topic/keyword to search threads for. OMIT to list threads by filter.
`since`	No	ISO date (YYYY-MM-DD). Only threads with any message activity since this date (recency filter, not 'unanswered'). OMIT to skip.
`only_unread`	No	Limit to threads with unread messages. OMIT to include read threads.
`only_unanswered`	No	Limit to threads where the last message is incoming (you haven't replied). Covers 'threads I haven't replied to'. OMIT to include answered threads too.
`participant_name`	No	Filter to threads with this participant/contact. OMIT to include everyone (do NOT pass an empty string).
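
The pairing of since with only_unanswered can be illustrated with a small argument builder for "threads I haven't replied to in the last N days". This is a sketch under assumptions: the function name is hypothetical, and only_unanswered is assumed to be a boolean, which the table implies but does not state.

```python
from datetime import date, timedelta
from typing import Optional


def unanswered_threads_args(
    days_back: int,
    today: date,
    participant_name: Optional[str] = None,
) -> dict:
    """Arguments for listing unanswered threads with activity in the last N days."""
    since = today - timedelta(days=days_back)
    # No query is set, so the tool lists threads by filter instead of searching.
    args = {"since": since.isoformat(), "only_unanswered": True}
    if participant_name:             # omit rather than send an empty string
        args["participant_name"] = participant_name
    return args


print(unanswered_threads_args(7, date(2026, 5, 28)))
# {'since': '2026-05-21', 'only_unanswered': True}
```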

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and non-destructive behavior. The description adds useful behavioral context (e.g., omitting query for listing, pairing of parameters) without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, front-loaded with purpose and immediately followed by usage guidance and alternatives. Every sentence is impactful with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters and no output schema, the description is largely complete. It covers filtering modes and alternatives but lacks mention of sorting or default ordering. Still, it provides adequate context for a search/list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds value by explaining parameter usage beyond the schema definitions, such as how 'since' pairs with 'only_unread' and 'only_unanswered' and when to omit 'query'. The warning about empty strings for 'participant_name' comes from the schema itself.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Find or list' and resource 'chat threads/conversations', and specifies filtering by topic, participant, status, or recency. It also distinguishes from sibling tools like search.messages and search.files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to omit 'query' to list by filter, and directs to alternative tools for message content and files. It also explains how 'since' pairs with 'only_unread' and 'only_unanswered'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

system_sleep A

Read-only, Idempotent

Pause execution for a given number of seconds (max 30). Use when you need to wait for an external process to complete before retrying — e.g. message sync, backfill, or API propagation. Total sleep per run is capped at 60 seconds.

ParametersJSON Schema

Name	Required	Description	Default
`reason`	No	Why you are waiting (logged for debugging)
`seconds`	Yes	Number of seconds to sleep (1-30)
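
The two caps in the description (30 seconds per call, 60 seconds per run) imply that a longer wait must be split across calls and may be cut short. A minimal sketch of that arithmetic, assuming whole seconds; the function name is hypothetical, and what the server does when a cap is exceeded is not documented here.

```python
MAX_PER_CALL = 30   # per-call cap stated in the description
MAX_PER_RUN = 60    # per-run cap stated in the description


def plan_sleep_calls(wanted_seconds: int, already_slept: int = 0) -> list:
    """Split a wait into system_sleep calls that respect both caps."""
    budget = max(0, MAX_PER_RUN - already_slept)
    remaining = min(wanted_seconds, budget)
    calls = []
    while remaining > 0:
        chunk = min(remaining, MAX_PER_CALL)
        calls.append(chunk)
        remaining -= chunk
    return calls


print(plan_sleep_calls(45))                    # [30, 15]
print(plan_sleep_calls(90))                    # [30, 30]
print(plan_sleep_calls(20, already_slept=50))  # [10]
```

Each entry in the returned list would be the seconds argument of one call. An empty list means the run has no sleep budget left.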

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It mentions a cap on total sleep per run (60 seconds) and per-call max (30 seconds), but does not describe error behavior, idempotency, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise at three sentences. Information is front-loaded: action, limit, use case. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the description covers purpose, usage, and key constraints. Lacks return value or error info, but sleep operations are straightforward.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds minimal value beyond schema: 'reason' for debugging, 'seconds' range implied by max 30. Could add more detail on parameter constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Pause execution') and resource ('execution') and clearly states the action. It distinguishes itself from sibling tools by focusing on timing delays.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'wait for an external process to complete before retrying' with examples like message sync and API propagation. Lacks explicit when-not-to-use, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_create C

Create a new task in your to-do list.

ParametersJSON Schema

Name	Required	Description
`title`	Yes	Task title
`due_at`	No	ISO datetime when task is due (e.g. '2026-03-31T15:00:00')
`agent_id`	No	Agent ID whose tasks to access. Required when calling from MCP.
`due_date`	No	Date when task is due (e.g. '2026-03-31'). Use with due_time or alone.
`due_time`	No	Time when task is due (e.g. '15:00'). Used with due_date.
`priority`	No	Task priority (default: medium)
`thread_id`	No	Related thread ID
`description`	No	Detailed description
`assigned_to_contact_id`	No	Contact ID if assigned to someone
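
The table offers two ways to express a due time: a single due_at datetime, or due_date with an optional due_time. How the server resolves a call that sends both styles is not stated, so the sketch below simply refuses to mix them. The function name is hypothetical and the either/or rule is an assumption, not documented behavior.

```python
def task_due_fields(due_at=None, due_date=None, due_time=None) -> dict:
    """Return due-date arguments in exactly one of the two styles."""
    if due_at is not None and (due_date is not None or due_time is not None):
        # Assumption: mixing styles is ambiguous, so reject it client-side.
        raise ValueError("use either due_at or due_date/due_time, not both")
    if due_time is not None and due_date is None:
        raise ValueError("due_time is only meaningful together with due_date")
    fields = {"due_at": due_at, "due_date": due_date, "due_time": due_time}
    return {name: value for name, value in fields.items() if value is not None}


args = {"title": "Send contract", "priority": "medium"}
args.update(task_due_fields(due_date="2026-03-31", due_time="15:00"))
print(args)
# {'title': 'Send contract', 'priority': 'medium', 'due_date': '2026-03-31', 'due_time': '15:00'}
```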

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description does not disclose any behavioral traits beyond the obvious 'create' action. With no annotations, the agent is left uninformed about side effects, required permissions, or what happens on success/failure. The description adds no value beyond the tool's name.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single clear sentence with no wasted words. It is front-loaded and immediately understandable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite good schema coverage, the description lacks any mention of return values or output behavior. With no output schema, the agent would benefit from knowing what the tool returns (e.g., the created task object). The minimal description does not fill this gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Since the input schema covers 100% of parameters with descriptions, the description does not need to add more. Baseline 3 applies because the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('create') and the resource ('task in your to-do list'), providing a clear purpose. However, it does not differentiate this tool from sibling tools like tasks_update or tasks_delete, missing an opportunity to clarify its specific role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., when to create vs. update), nor are any prerequisites or exclusions mentioned. The description assumes the agent knows when creation is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_delete B

Delete a task from your to-do list by its ID.

ParametersJSON Schema

Name	Required	Description	Default
`task_id`	Yes	ID of the task to delete
`agent_id`	No	Agent ID whose task to delete. Required when calling from MCP.

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits. It only says 'delete' without specifying permanence, required permissions, or side effects. For a destructive action, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, front-loaded sentence with no extraneous words. Efficiently conveys the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is present, so the description should clarify return behavior (e.g., success confirmation) and the required agent_id for MCP calls. Both are missing, leaving the agent uninformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds no extra meaning beyond what the schema provides, placing it at baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and the resource ('a task from your to-do list'), and includes the identifier ('by its ID'). It distinguishes from siblings like tasks_create and tasks_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use delete vs update, nor any prerequisites or conditions. The description does not mention that agent_id is required when calling from MCP, which is important context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_list A

Read-only, Idempotent

List your tasks, or another agent's tasks (read-only) using from_agent_id. Use filters to narrow results.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max results (default 20)
`status`	No
`overdue`	No	If true, only return tasks past due_at that are not done
`agent_id`	No	Agent ID whose tasks to list. Required when calling from MCP.
`thread_id`	No	Filter by related thread
`from_agent_id`	No	List tasks of another agent (read-only). Omit to list your own.
`assigned_to_contact_id`	No	Filter by assigned contact

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It mentions 'read-only' for another agent's tasks but does not clarify whether listing one's own tasks also is read-only or has any side effects. It fails to explicitly confirm non-destructiveness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that achieve clarity without waste. Every word serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having 7 parameters and no output schema, the description does not explain return format, pagination, ordering, or error conditions. It is incomplete for a tool with this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 86%, so the schema already documents most parameters. The description adds context about from_agent_id and filters, but no new semantic meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists tasks, including the ability to list another agent's tasks (read-only). It effectively distinguishes from sibling tools like tasks_create or tasks_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides guidance on when to use the tool, highlighting the use of from_agent_id for read-only access to other agents' tasks and mentioning filters. However, it does not explicitly state when not to use it or direct to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_update A

Update an existing task. Set status='done' to complete it, 'cancelled' to cancel. Use summary for completion notes.

ParametersJSON Schema

Name	Required	Description
`due_at`	No	ISO datetime
`status`	No
`summary`	No	Completion note (stored when marking done)
`task_id`	Yes	ID of the task to update
`agent_id`	No	Agent ID whose task to update. Required when calling from MCP.
`priority`	No
`description`	No
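
Closing a task, as the description suggests, comes down to a status value plus an optional summary. A minimal sketch follows; the function name and the task ID format are hypothetical. It sends only the fields being changed, but whether the server leaves omitted fields untouched is not documented in this listing.

```python
def close_task_args(task_id, summary=None, agent_id=None, cancelled=False) -> dict:
    """Arguments that mark a task done (or cancelled) via tasks_update."""
    args = {"task_id": task_id, "status": "cancelled" if cancelled else "done"}
    if summary and not cancelled:
        # Per the schema, summary is a completion note stored when marking done.
        args["summary"] = summary
    if agent_id is not None:
        # Per the schema, agent_id is required when calling from MCP.
        args["agent_id"] = agent_id
    return args


print(close_task_args("task-123", summary="Sent the contract"))
# {'task_id': 'task-123', 'status': 'done', 'summary': 'Sent the contract'}
```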

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden but only mentions updating status and summary. It fails to disclose mutation behavior, permissions required, or whether partial updates overwrite unmentioned fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with front-loaded purpose and actionable details. No redundant or extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool has 7 parameters, no output schema, and no annotations. Description is too minimal; it does not explain return values, error handling, or behavior for partial updates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (57%), and description adds meaning for status and summary (e.g., 'summary for completion notes'), but ignores due_at, priority, description, agent_id, and task_id parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing task.' This specific verb+resource combination distinguishes it from sibling tools like tasks_create, tasks_delete, and tasks_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides specific guidance on using status and summary fields (e.g., 'Set status='done' to complete it'), but lacks explicit when-to-use or when-not-to-use instructions compared to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threads_update A

✏️ Update a conversation thread: rename it, add notes/description, or move to a folder.

When to use:

User wants to rename a chat or group
User wants to add notes/context about a conversation
User wants to organize threads into folders

For DM threads, renaming also updates the linked contact's display name by default. Requires thread_id from threads.list.

ParametersJSON Schema

Name	Required	Description
`title`	No	New title for the thread (max 255 chars)
`folder_id`	No	Move thread to this folder (null removes from folder)
`thread_id`	Yes	Thread ID from threads.list
`description`	No	AI context / notes for this thread. Empty string clears description.
`update_contact`	No	For DM threads, also rename the linked contact (default: true)
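
The table distinguishes between leaving a field out and sending an explicit null or empty string (`folder_id` null removes the thread from its folder, an empty `description` clears it). A minimal sketch of an argument builder that keeps those cases apart is shown below. The function name and the thread ID are hypothetical.

```python
from typing import Any

_UNSET = object()  # sentinel: distinguishes "not provided" from an explicit None


def threads_update_args(thread_id: str, title: Any = _UNSET, folder_id: Any = _UNSET,
                        description: Any = _UNSET, update_contact: bool = True) -> dict[str, Any]:
    """Assemble the arguments object for threads_update (hypothetical helper)."""
    args: dict[str, Any] = {"thread_id": thread_id}
    if title is not _UNSET:
        if len(title) > 255:
            raise ValueError("title exceeds 255 characters")
        args["title"] = title
    if folder_id is not _UNSET:
        args["folder_id"] = folder_id  # None serializes to JSON null: remove from folder
    if description is not _UNSET:
        args["description"] = description  # "" clears the description
    if not update_contact:
        args["update_contact"] = False  # documented default is true, so only send the override
    return args


# Rename a DM thread without touching the linked contact's display name.
rename_only = threads_update_args("thr_1", title="Acme onboarding", update_contact=False)
# Remove the thread from its folder and change nothing else.
unfile = threads_update_args("thr_1", folder_id=None)
```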

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that DM thread renaming updates linked contact display name by default, and that update_contact controls this. No annotations provided, but description could mention permissions, idempotency, or failure cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with a concise summary and bullet point list for usage. Could be slightly more terse without losing clarity, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main operations, dependencies, and a side effect (DM contact rename). Lacks discussion of error scenarios or permissions, but adequate for a simple update tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description largely repeats schema descriptions. Adds minor context (e.g., max chars for title) but does not significantly extend meaning beyond input schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'update' and resource 'conversation thread', and lists specific actions (rename, add notes/description, move to folder). It distinguishes from sibling tools by detailing DM thread behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a 'When to use' section with three specific scenarios and mentions dependency on threads.list. However, it does not explicitly state when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

videos_generate A

Generate a short video (5-10s) from a text prompt using BytePlus Seedance. Optionally accepts up to 12 image file IDs from the user's attached files (visible in the [ATTACHMENTS] block) as reference_file_ids for style and composition. Returns immediately with a job_id; the video is delivered back via continuation when the job completes (~30-90s for fast model, ~2-5min for pro). Reference images are temporarily re-hosted on a third-party CDN (imgbb) for the duration of generation and deleted on completion — don't submit confidential references. Gated behind a workspace opt-in flag.

ParametersJSON Schema

Name	Required	Description	Default
`seed`	No	Random seed for reproducibility (0-2147483647). Omit for random.
`model`	No	Video model. Recommended: 'wan2.6-i2v-flash' (default, cheap, 720p/1080p, optional audio), 'wan2.6-i2v' (premium, always-on audio), 'wan2.6-t2v' (text-only input, 720p/1080p, no audio), 'wan2.2-i2v-flash' (cheapest, 480p/720p, no audio). Legacy BytePlus: 'seedance-2-fast', 'seedance-2-pro' (720p only).	wan2.6-i2v-flash
`style`	No	Style preset. Seedance models only. OMIT for no style preset.
`prompt`	Yes	Text description of the video to generate (3-4000 chars).
`duration`	No	Output video duration in seconds. Single-clip: 5 or 10. Long-form (chained, i2v models only): 15, 20, 30, 45, or 60. Long-form videos are silent (no audio in v1) and use only reference_file_ids[0] when refs are provided.
`shot_type`	No	Shot mode: 'single' (continuous) or 'multi' (scene cuts). wan2.6-t2v only. OMIT to use the model default.
`resolution`	No	Output resolution. '720p' is the safe default; '1080p' is wan2.6 only; '480p' is wan2.2-i2v-flash only. Per-model support enforced by validation.	720p
`aspect_ratio`	No	Output aspect ratio. Wan supports '16:9', '9:16', '1:1'; Seedance also supports '4:3', '3:4', '21:9'. Per-model support enforced by validation.	16:9
`camera_motion`	No	Camera motion preset. Seedance models only. OMIT for no camera motion.
`generate_audio`	No	Whether the model should produce native audio. For wan2.6-i2v-flash this doubles the per-second rate (e.g., 720p+audio is $0.05/s vs $0.025/s silent) — set False for cheaper silent clips. wan2.6-i2v always produces audio regardless of this flag. wan2.6-t2v / wan2.2-i2v-flash / seedance-2-fast never produce audio.
`negative_prompt`	No	Optional text describing what to AVOID in the output. Honored by Wan and Seedance models.
`reference_file_ids`	No	Optional list of up to 12 image file_ids to use as visual references (style, composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif). Get IDs from the [ATTACHMENTS] block, files.search, or search.files.
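
The per-model resolution limits and the audio pricing quoted above can be checked client-side before submitting a job. The sketch below encodes only what the parameter table states: `wan2.6-i2v` is left out because its resolutions are not listed, "720p only" is read as applying to both Seedance models, and the two rates are the ones quoted for `wan2.6-i2v-flash` at 720p. The function names are hypothetical, and the server's own validation remains authoritative.

```python
from decimal import Decimal

# Resolution support as read from the parameter table (assumption where ambiguous).
RESOLUTIONS = {
    "wan2.6-i2v-flash": {"720p", "1080p"},
    "wan2.6-t2v": {"720p", "1080p"},
    "wan2.2-i2v-flash": {"480p", "720p"},
    "seedance-2-fast": {"720p"},
    "seedance-2-pro": {"720p"},
}

# Per-second USD rates quoted for wan2.6-i2v-flash at 720p, keyed by generate_audio.
FLASH_720P_RATE = {False: Decimal("0.025"), True: Decimal("0.05")}


def resolution_supported(model: str, resolution: str) -> bool:
    """Return True if the table lists this resolution for the model."""
    return resolution in RESOLUTIONS.get(model, set())


def estimate_flash_720p_cost(duration_s: int, generate_audio: bool) -> Decimal:
    """Estimate cost in USD for a wan2.6-i2v-flash clip at 720p."""
    return FLASH_720P_RATE[generate_audio] * duration_s


silent_10s = estimate_flash_720p_cost(10, generate_audio=False)
audio_10s = estimate_flash_720p_cost(10, generate_audio=True)
print(silent_10s, audio_10s)
```

Using `Decimal` rather than floats keeps the currency arithmetic exact.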

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: asynchronous job with immediate job_id return, estimated completion times for both models, temporary re-hosting of reference images on a third-party CDN with deletion after completion, and a warning about confidential references. It also notes the workspace opt-in flag. This is comprehensive for a generation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but efficiently packs all essential information: purpose, input options, behavior, timing, security notes, and gating. Every sentence adds value, and the most critical information is front-loaded. No word is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description adequately explains the return mechanism (job_id, continuation). It covers input requirements, parameter behavior, timing, security considerations, and access control. It does not detail error handling or cancellation, but for this tool's complexity, the description is complete enough for an agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds significant value beyond the schema: it clarifies that reference_file_ids must come from attachments, states the file MIME types, explains that generate_audio is ignored for the fast model, and mentions the cost scaling for duration=10. This enriches the agent's understanding of parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: generating a short video from a text prompt using BytePlus Seedance. It specifies the output length (5-10s) and mentions optional image references. The verb 'generate' and resource 'video' are specific, and the inclusion of the service name distinguishes it from generic video tools. Although it does not explicitly contrast with siblings like images_generate, the difference is obvious.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (to generate a video from text) and provides context for optional image references. It mentions the workspace opt-in gate, indicating a prerequisite. However, it does not compare with alternatives or state when not to use it. The guidance is clear but lacks exclusions or alternative tool references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vision_query A

Read-only, Idempotent

Look at the screen currently being shared in a meeting and answer a question about it. Returns a natural-language answer based on the visual content. Use ONLY when the user explicitly asks about the screen/slide/document being shown.

ParametersJSON Schema

Name	Required	Description	Default
`question`	Yes	Question about the shared screen.
`image_b64`	No	Base64-encoded JPEG image of the screen-share frame.
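
For callers that supply their own frame, `image_b64` expects a base64-encoded JPEG. A minimal sketch of preparing the arguments with the standard library follows. The helper name is hypothetical, and the check against the JPEG start-of-image marker is a client-side sanity test, not something the listing says the server enforces.

```python
import base64
from typing import Any, Optional

JPEG_SOI = b"\xff\xd8"  # every JPEG stream begins with this start-of-image marker


def vision_query_args(question: str, frame: Optional[bytes] = None) -> dict[str, Any]:
    """Assemble the arguments object for vision_query (hypothetical helper)."""
    args: dict[str, Any] = {"question": question}
    if frame is not None:
        if not frame.startswith(JPEG_SOI):
            raise ValueError("frame does not look like JPEG data")
        # b64encode returns bytes; JSON needs a str.
        args["image_b64"] = base64.b64encode(frame).decode("ascii")
    return args


fake_frame = JPEG_SOI + b"\x00" * 4  # placeholder bytes, not a real image
args = vision_query_args("What is the title of this slide?", fake_frame)
```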

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full weight. It discloses the behavior (returns a natural-language answer based on visual content) and implies a read-only operation. Minor omission: no mention of what happens if no screen is being shared, but the description is still transparent enough for typical use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, with no unnecessary words. Every sentence adds crucial information: what the tool does, what it returns, and when to use it. Highly efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple query tool with two well-documented parameters and no output schema, the description provides all necessary context. It explains the purpose, usage constraint, and return type (natural-language answer). No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameter descriptions in the schema already cover the meaning. The description adds no extra semantic value beyond what's in the schema's property descriptions; it restates them. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Look at the screen currently being shared in a meeting and answer a question about it.' This specific verb-resource combination distinguishes it from any sibling tools, none of which involve screen sharing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'Use ONLY when the user explicitly asks about the screen/slide/document being shown.' This provides clear usage context without needing to mention alternatives, as no sibling serves a similar purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_fetch A

Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL.

Modes (extract):

'auto' (default): picks the right mode based on response content type.
'markdown': for HTML pages; returns cleaned markdown plus the page title.
'text': for JSON/XML/plaintext APIs; returns the raw decoded body.
'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read.

Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn.

Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	URL to fetch (http or https). Must be publicly reachable.
`extract`	No	How to handle the response: 'auto' (default), 'markdown' (HTML → markdown), 'text' (raw body), or 'file' (ingest as binary, return file_id).	auto
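
When a caller already knows the response content type and wants to set `extract` explicitly rather than rely on 'auto', the mode descriptions above suggest a mapping along the following lines. This is a client-side sketch only: the function name is hypothetical, and how the server's 'auto' mode actually decides is not documented in this listing.

```python
def pick_extract_mode(content_type: str) -> str:
    """Map a Content-Type header value to an explicit extract mode."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in ("text/html", "application/xhtml+xml"):
        return "markdown"  # HTML pages
    text_types = ("application/json", "application/xml", "text/xml", "text/plain")
    if mime in text_types or mime.endswith(("+json", "+xml")):
        return "text"  # JSON / XML / plaintext APIs
    return "file"  # images, PDFs, audio, video, archives, any other binary


for header in ("text/html; charset=utf-8", "application/json", "application/pdf"):
    print(header, "->", pick_extract_mode(header))
```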

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It explains extract modes and their outputs (markdown: cleaned markdown+title; text: raw body; file: returns file_id with usage hints). However, it misses potential error behaviors, timeouts, or limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a one-line summary followed by mode details and usage guidance. Every sentence adds value, though it could be slightly more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately covers output formats for each mode (markdown: content+title, text: raw body, file: file_id) and distinguishes from siblings web_search and files_upload. Lacks error handling details but is sufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. The description adds value by detailing what each extract mode returns (e.g., markdown gives title, text raw body, file provides file_id and usage) beyond the schema's enum descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Fetches') and resource ('a single URL') and immediately distinguishes from sibling tools like web_search and files_upload by specifying when to use this tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (specific URL in mind, after web.search returns a link, or user pastes a URL) and when not to use (use web.search when no specific URL, use web.fetch not files.upload for immediate file_id). Provides clear alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web__local_search A

Read-only, Idempotent

Multi-source web research with citations. Returns a synthesized answer with numbered [^1] markers and a citations array of {url, title, snippet, index}. Use for evidence-backed synthesis (competitive analysis, regulatory summary, whitepaper section). For quick fact lookups use web.search instead.

ParametersJSON Schema

Name	Required	Description	Default
`query`	Yes	Research question. Specific scoped questions outperform vague keywords.
`language`	No	Search language hint (BCP-47, e.g. 'en', 'ru'). Defaults to 'en'. The synthesis output language matches the query language regardless.	en
`num_sources`	No	How many top search results to fetch and synthesize (1-4, default 4). Lower = faster + cheaper, higher = more comprehensive.
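
The description says the answer carries numbered `[^1]` markers alongside a citations array of `{url, title, snippet, index}`. A short sketch of resolving the markers back to their sources is given below. It assumes each marker number equals the matching citation's `index` field, which the listing implies but does not state. The sample answer and citations are invented for illustration.

```python
import re
from typing import Any

MARKER = re.compile(r"\[\^(\d+)\]")


def cited_sources(answer: str, citations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the citations referenced in the answer, in first-use order, without duplicates."""
    by_index = {c["index"]: c for c in citations}
    used: list[dict[str, Any]] = []
    for match in MARKER.finditer(answer):
        citation = by_index.get(int(match.group(1)))
        # Skip markers with no matching entry and entries already collected.
        if citation is not None and citation not in used:
            used.append(citation)
    return used


# Invented sample data, shaped like the documented output.
answer = "Rates rose in 2024 [^2]. Analysts disagree on the cause [^1][^2]."
citations = [
    {"url": "https://example.com/a", "title": "A", "snippet": "...", "index": 1},
    {"url": "https://example.com/b", "title": "B", "snippet": "...", "index": 2},
]
sources = cited_sources(answer, citations)
```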

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully describes the output format (synthesized answer with numbered markers and citations array). It does not mention potential limitations like rate limits or latency, but gives sufficient behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no wasted words. The output format and citation mechanism are front-loaded, and usage guidance is immediately clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema, the description explains the return structure sufficiently. Combined with schema, it covers purpose, usage, parameters, and output. No gaps for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds extra guidance: query should be scoped, language synthesis behavior, and num_sources trade-off. This enriches the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs multi-source web research and returns a synthesized answer with citations. It distinguishes from the sibling web.search by noting that web.search is for quick fact lookups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool (evidence-backed synthesis like competitive analysis, regulatory summary, whitepaper section) and when not (quick fact lookups, pointing to web.search).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_search A

Read-only, Idempotent

Search the web for current information, news, facts, prices, or events. Use this when the user asks about something that requires up-to-date information from the internet, or when internal knowledge base doesn't have the answer. Examples: recent news, stock prices, weather, product information, current events.

ParametersJSON Schema

Name	Required	Description	Default
`query`	Yes	Search query - what to search for on the web.
`num_results`	No	Number of results to return (1-10).
`search_type`	No	Type of search: 'search' for general web, 'news' for news articles.	search
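
A minimal sketch of preparing arguments for this tool, keeping `num_results` inside the documented 1-10 range and restricting `search_type` to the two documented values. The helper name is hypothetical, and clamping (instead of rejecting) out-of-range values is a design choice made here, not documented server behavior.

```python
from typing import Any, Optional

SEARCH_TYPES = ("search", "news")


def web_search_args(query: str, num_results: Optional[int] = None,
                    search_type: str = "search") -> dict[str, Any]:
    """Assemble the arguments object for web_search (hypothetical helper)."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {SEARCH_TYPES}")
    args: dict[str, Any] = {"query": query, "search_type": search_type}
    if num_results is not None:
        # Clamp into the documented 1-10 range.
        args["num_results"] = max(1, min(10, num_results))
    return args


news_args = web_search_args("central bank rate decision", num_results=25, search_type="news")
```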

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden. It states the tool searches the web but does not disclose any behavioral traits such as rate limits, authentication requirements, or return format. It is adequate for a simple search but lacks detail on side effects or limitations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main action, includes examples, and is concise with no unnecessary words. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description is reasonably complete for purpose and usage but lacks information about the return format (e.g., titles, URLs, snippets). It could be more helpful by describing what the agent will receive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description only adds value by contextualizing the parameters with examples like 'stock prices' and 'weather'. It does not provide deeper semantics beyond the schema's own descriptions, meeting the baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Search the web for current information, news, facts, prices, or events.' It provides specific examples and distinguishes from internal knowledge base, effectively differentiating from siblings like knowledge_query and web_fetch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use guidance: when the user needs up-to-date internet info or internal knowledge base falls short. It implies alternatives but does not list when-not-to-use or other sibling tools like web_fetch for fetching a URL.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_create A

Create a new livechat widget for your website.

The widget will be created with default settings. You can customize theme, auto-reply mode, and more.

Use this when user wants to add a chat widget to their site.

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	Name for the widget (e.g., 'Website Chat', 'Support Widget')
`position`	No	Widget position on screen	bottom-right
`display_mode`	No	Visual mode of the widget. Pick exactly one: - 'chat' (default): full chat panel + voice mic — use for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly — pick only when the user explicitly asks for a voice-only widget (e.g. 'just a voice button', 'no chat, just call'). - 'headless': no UI; customer drives via window.DialogBrain JS API — pick only when the user explicitly says 'embed in our own design' / 'no widget chrome'.	chat
`header_title`	No	Title shown in chat header	Chat with us
`primary_color`	No	Primary color for widget theme (hex, e.g., '#2563eb')	#2563eb
`auto_reply_mode`	No	Auto-reply mode: 'draft' (review before sending) or 'auto' (send immediately)	draft
`voice_button_label`	No	Localized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if omitted.
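
Several constraints in the table interact: `voice_button_label` only applies when `display_mode` is 'voice_only', is limited to 100 characters, and `primary_color` is a hex value. The sketch below checks these before a call. The helper name is hypothetical, and the six-digit hex pattern is an assumption based on the '#2563eb' example, since the listing does not say whether shorthand forms are accepted.

```python
import re
from typing import Any, Optional

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")  # assumes 6-digit hex like '#2563eb'
DISPLAY_MODES = {"chat", "voice_only", "headless"}


def widgets_create_args(name: str, display_mode: str = "chat",
                        primary_color: Optional[str] = None,
                        voice_button_label: Optional[str] = None) -> dict[str, Any]:
    """Assemble the arguments object for widgets_create (hypothetical helper)."""
    if display_mode not in DISPLAY_MODES:
        raise ValueError(f"unknown display_mode: {display_mode!r}")
    args: dict[str, Any] = {"name": name, "display_mode": display_mode}
    if primary_color is not None:
        if not HEX_COLOR.match(primary_color):
            raise ValueError("primary_color must look like '#2563eb'")
        args["primary_color"] = primary_color
    if voice_button_label is not None:
        if display_mode != "voice_only":
            raise ValueError("voice_button_label only applies to display_mode='voice_only'")
        if len(voice_button_label) > 100:
            raise ValueError("voice_button_label exceeds 100 characters")
        args["voice_button_label"] = voice_button_label
    return args


voice_widget = widgets_create_args("Support Widget", display_mode="voice_only",
                                   voice_button_label="Talk to sales")
```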

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description notes that the widget is created with default settings and can be customized, but without annotations, it lacks details on side effects, prerequisites, or failure modes. It provides basic transparency but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short sentences with no redundancy. Every sentence adds value: purpose, default behavior, customization options, and usage context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the purpose and when to use, but does not specify the return value (e.g., widget ID) or error conditions. Given the absence of an output schema, this is a notable gap for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 7 parameters are fully described in the input schema (100% coverage). The description adds only a vague reference to customization (theme, auto-reply mode), which does not significantly augment the schema's semantic information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it creates a new livechat widget, mentions default settings and customization, and distinguishes from sibling tools like widgets_update and widgets_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to add a chat widget to their site,' providing clear context. However, it does not mention when not to use it or alternative tools for updating or deleting widgets.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_delete A

Destructive, Idempotent

Delete a livechat widget permanently.

This will remove the widget and its embed code will stop working. Existing chat history will be preserved.

Use this when user wants to remove a chat widget.

ParametersJSON Schema

Name	Required	Description	Default
`widget_id`	Yes	ID of the widget to delete

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses the permanent nature, effect on embed code, and preservation of chat history. It appropriately informs about the destructive behavior beyond just saying 'delete'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: purpose, two consequences, usage guidance. Every sentence earns its place, no fluff, front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with one parameter and no output schema, the description covers essential behavioral and usage context. It could mention return values or error conditions, but it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single parameter 'widget_id' described as 'ID of the widget to delete'. The description adds no additional semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and the resource ('livechat widget'), with the word 'permanently' adding specificity. It is distinct from sibling tools like widgets_get or widgets_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to remove a chat widget' and details consequences (embed code stops working, chat history preserved). It does not mention alternatives or when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_getA

Read-only, Idempotent

Get full configuration of a single livechat widget.

Returns all settings including theme, identification, actions, and more.

Use this when user wants to see or verify a specific widget's settings.

Parameters

Name	Required	Description	Default
`widget_id`	Yes	ID of the widget to retrieve

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the burden. It implies a read-only operation by saying 'Get full configuration', but does not explicitly state that no data is modified, nor does it mention any authorization or rate limits. Adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each adding distinct value: action and resource, return content, and usage guidance. No redundant or irrelevant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get tool with one parameter and no output schema, the description adequately explains what it does, what it returns, and when to use it. No gaps are apparent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema provides a description for the single parameter 'widget_id', achieving 100% coverage. The description adds no further meaning beyond the schema, so it meets the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves full configuration of a single widget, with specifics on returned settings (theme, identification, actions). The phrase 'Use this when user wants to see or verify a specific widget's settings' implicitly distinguishes it from sibling tools like widgets_list (list all) and widgets_update (modify).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description concludes with explicit guidance: 'Use this when user wants to see or verify a specific widget's settings.' This provides clear context for when to use the tool, though it does not mention when not to use it or point to specific alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_get_embed_codeA

Read-only, Idempotent

Get the embed code snippet for a livechat widget.

Returns HTML/JavaScript code to add to your website. The code should be placed before the closing tag.

Use this when user wants to install the chat widget on their site.

Parameters

Name	Required	Description	Default
`widget_id`	Yes	ID of the widget to get embed code for

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the return type (HTML/JavaScript) and placement instruction, which is adequate for a read-only tool. However, it does not explicitly state that the operation is read-only or discuss any prerequisites or error conditions, leaving some behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise at four sentences: purpose, return type, placement instruction, and usage guidance. No extraneous information, and the key points are front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, so the description must clarify the return value. It does so by stating 'Returns HTML/JavaScript code' and provides a placement instruction. It could mention more about prerequisites like widget existence, but overall it is sufficient for the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'widget_id', and the schema already describes it as 'ID of the widget to get embed code for'. The description adds no additional semantic meaning beyond the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'get' and the resource 'embed code snippet for a livechat widget', and distinguishes this tool from sibling tools like 'widgets_get' or 'widgets_list' by specifying that it returns embed code for installation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to install the chat widget on their site.', providing clear usage context. It does not list when not to use or alternatives, but the sibling tools include other widget operations, making the purpose distinct enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_listA

Read-only, Idempotent

List all livechat widgets.

Returns widgets with their configuration, embed code, and status.

Use this when user wants to see their widgets or chat widgets.

Parameters

Name	Required	Description	Default
`active_only`	No	Only return active widgets. OMIT to include inactive widgets too.

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description states it returns widgets with configuration, embed code, and status, which indicates the output fields. However, it does not mention read-only nature, auth requirements, or any side effects. The behavioral coverage is adequate but minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences, each serving a distinct purpose: stating action, describing output, and offering usage guidance. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one optional parameter, the description adequately covers the purpose, output, and usage context. It does not mention pagination or sorting, but these are reasonable omissions for such a tool. The presence of sibling widget tools is not addressed, but that is not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the only parameter (active_only). The tool description does not add extra meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all livechat widgets' using a specific verb and resource. It distinguishes this tool from siblings like widgets_get, which retrieves a single widget.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises 'Use this when user wants to see their widgets or chat widgets', which is helpful but does not explicitly exclude cases where alternatives (e.g., widgets_get) are better suited. No when-not-to-use or alternative naming is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_updateA

Update an existing livechat widget configuration.

You can change name, theme, auto-reply mode, and other settings. Only provided fields will be updated.

Use this when user wants to modify their chat widget settings.

Parameters

Name	Required	Description
`name`	No	New name for the widget
`position`	No	Widget position on screen. OMIT to leave the position unchanged.
`is_active`	No	Enable or disable the widget. OMIT to leave the active flag unchanged.
`widget_id`	Yes	ID of the widget to update
`website_url`	No	Website URL for product/site search integration
`calendly_url`	No	Booking URL for calendar action (e.g., 'https://calendly.com/yourname')
`color_scheme`	No	Widget color scheme. 'auto' follows the visitor's OS dark/light mode preference. OMIT to leave the color scheme unchanged.
`display_mode`	No	Visual mode of the widget. Pick exactly one: - 'chat': full chat panel + voice mic — default for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly — pick only when the user explicitly asks for a voice-only widget. - 'headless': no UI; customer drives via window.DialogBrain JS API — pick only when the user explicitly says 'embed in our own design'. OMIT to leave the display mode unchanged.
`header_title`	No	Title shown in chat header
`greeting_text`	No	Custom greeting message shown when visitor opens the chat (e.g., 'Hello! How can I help you today?')
`primary_color`	No	Primary color for widget theme (hex, e.g., '#2563eb')
`voice_greeting`	No	Spoken opening line when a visitor starts a voice call through this widget. Played via TTS before the AI model runs. Empty string disables the greeting.
`allowed_domains`	No	List of allowed domains for the widget
`auto_reply_mode`	No	Auto-reply mode: 'draft' or 'auto'. OMIT to leave the auto-reply mode unchanged.
`header_subtitle`	No	Subtitle shown in chat header
`greeting_enabled`	No	Enable or disable the proactive greeting. OMIT to leave this flag unchanged.
`greeting_behavior`	No	notification = show badge after delay; auto_open = open widget automatically after delay; on_open = greet only when visitor manually opens. OMIT to leave the greeting behavior unchanged.
`enable_form_action`	No	Enable or disable the contact form action button. OMIT to leave this flag unchanged.
`voice_button_label`	No	Localized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if not set.
`contact_form_fields`	No	Fields to collect in contact form (e.g., ['name', 'email', 'phone'])
`enable_search_action`	No	Enable or disable the search action button. OMIT to leave this flag unchanged.
`show_visitor_history`	No	Show full chat history to returning visitors. OMIT to leave this flag unchanged.
`identification_fields`	No	Fields to require for visitor identification (e.g., ['name', 'email'])
`enable_calendar_action`	No	Enable or disable the calendar booking action button. OMIT to leave this flag unchanged.
`greeting_delay_seconds`	No	Delay in seconds before the proactive greeting appears (0–300). 0 = send immediately on page load. Default: 30.
`require_identification`	No	Require visitor to identify before chatting. OMIT to leave the identification policy unchanged.
`returning_greeting_text`	No	Greeting for returning visitors who already have chat history (e.g., 'Welcome back! How can I help you today?'). Falls back to greeting_text if not set.
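
The partial-update rule ("Only provided fields will be updated") can be sketched as a small client-side helper. This is a minimal sketch under stated assumptions, not the server's implementation: the helper name `build_widget_update_args`, the use of `None` to mean "leave unchanged", and the id `"w_123"` are all hypothetical. The allowed values and limits are taken from the parameter table above.

```python
from typing import Any

# Allowed values as listed in the parameter table above.
ENUMS = {
    "display_mode": {"chat", "voice_only", "headless"},
    "auto_reply_mode": {"draft", "auto"},
    "greeting_behavior": {"notification", "auto_open", "on_open"},
}


def build_widget_update_args(widget_id: Any, **changes: Any) -> dict[str, Any]:
    """Build a partial-update payload: widget_id plus only the fields to change."""
    args: dict[str, Any] = {"widget_id": widget_id}
    for field, value in changes.items():
        if value is None:
            # In this sketch None means "leave unchanged", so the key is omitted.
            continue
        if field in ENUMS and value not in ENUMS[field]:
            raise ValueError(f"{field} must be one of {sorted(ENUMS[field])}")
        if field == "greeting_delay_seconds" and not 0 <= value <= 300:
            raise ValueError("greeting_delay_seconds must be between 0 and 300")
        if field == "voice_button_label" and len(value) > 100:
            raise ValueError("voice_button_label must be at most 100 characters")
        args[field] = value
    return args


example = build_widget_update_args(
    "w_123",                   # hypothetical widget id
    header_title="Support",
    voice_greeting="",         # kept: an empty string disables the spoken greeting
    color_scheme=None,         # dropped: the color scheme stays unchanged
    greeting_delay_seconds=0,  # kept: 0 is a valid delay, not "unset"
)
print(example)
```

Note that the helper distinguishes "omitted" from falsy values: `""` and `0` are meaningful settings according to the table, so only `None` is filtered out.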

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that 'Only provided fields will be updated', indicating a partial update. However, it does not mention other behavioral aspects such as side effects, return value, or prerequisites, making it adequate but limited.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with four sentences, front-loaded with the core purpose and scope. No redundant or irrelevant information is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high parameter count (27), the description covers the essential aspects: partial update, usability context, and a brief list of changeable categories. It does not explicitly state the return value, but that can be inferred. Overall, it is fairly complete for a standard update tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds a high-level overview ('name, theme, auto-reply mode, and other settings') but does not provide additional meaning beyond the schema's detailed parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Update' and the resource 'existing livechat widget configuration', and provides examples of settable fields. It distinguishes from siblings like widgets_create, widgets_delete, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to modify their chat widget settings', providing clear context for when to use it. It does not mention when not to use or alternatives, but it is sufficient given the sibling diversity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_currentA

Read-only, Idempotent

Return the workspace this MCP API key is currently routed to, with the caller's role inside it. Use this to confirm context before/after workspace.switch.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Openly describes return values (workspace and role) and implies read-only operation; sufficient given no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two succinct sentences, front-loaded, no extraneous words—maximizes clarity in minimal space.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Fully describes purpose and return content; though no output schema, the text conveys needed info for this simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool takes no parameters, so schema coverage is trivially 100%. A baseline of 4 applies: the description adds no parameter information, but none is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it returns the current workspace and the caller's role, distinct from sibling tools like workspace_list or workspace_switch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly guides to use for confirming context before/after workspace.switch, providing clear when-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_listA

Read-only, Idempotent

List every workspace the caller is a member of, with is_current marking the workspace this MCP key is currently routed to. Pair with workspace.switch to change the active workspace without reconnecting.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but the description is straightforward: lists workspaces with an `is_current` marker. It lacks disclosure of potential permissions, rate limits, or output structure beyond the mention of `is_current`, which is adequate for a simple list tool with no parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short, front-loaded sentences with no unnecessary text. Every word contributes: purpose, unique feature, and pairing guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and no output schema, the description fully covers what the tool does, the key output field, and how to use it in workflow. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so schema coverage is 100%. The description adds value by explaining the `is_current` field and suggesting usage with `workspace.switch`, which goes beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and resource 'workspaces', specifies scope 'the caller is a member of', and highlights the unique `is_current` field. It distinguishes itself from siblings like `workspace_search` and `workspace_current`.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly suggests pairing with `workspace.switch` to change active workspace without reconnecting, providing clear guidance on when to use this tool and how it relates to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_switchA

Re-point the active MCP API key to a different workspace. Pass exactly one of workspace_id or slug (find them via workspace.list). Takes effect on the very next tool call — no MCP reconnect, no new API key. Sequential checkpoint: do not parallelize tool calls across a switch — calls already in flight when the switch commits will run against the previous workspace.

Parameters

Name	Required	Description	Default
`slug`	No	Workspace slug to switch to. Resolved within the caller's memberships, so cross-tenant slug collisions are not possible. Mutually exclusive with `workspace_id`.
`workspace_id`	No	Numeric workspace id to switch to. Mutually exclusive with `slug`.
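
The "exactly one of workspace_id or slug" rule and the sequential-checkpoint advice can be illustrated client-side. The following is a sketch only: `call_tool` is a hypothetical callable `(tool_name, arguments) -> result` standing in for whatever MCP client is in use, and the slug `"acme"` is made up. Nothing here reflects the server's actual response shapes.

```python
from typing import Any, Callable, Optional


def build_switch_args(
    workspace_id: Optional[int] = None, slug: Optional[str] = None
) -> dict[str, Any]:
    """Return arguments for workspace_switch, enforcing exactly one selector."""
    if (workspace_id is None) == (slug is None):
        raise ValueError("pass exactly one of workspace_id or slug")
    if workspace_id is not None:
        return {"workspace_id": workspace_id}
    return {"slug": slug}


def switch_and_confirm(call_tool: Callable[[str, dict], Any], **selector: Any) -> Any:
    """Switch, then read back the current workspace, strictly one call at a time."""
    # No other tool calls should be in flight here: calls issued before the
    # switch commits run against the previous workspace.
    call_tool("workspace_switch", build_switch_args(**selector))
    return call_tool("workspace_current", {})


# Stub client that only records the order of calls (for illustration).
calls: list = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    return {}


switch_and_confirm(fake_call_tool, slug="acme")
print(calls)
```

Keeping the switch and the confirmation in one helper makes it harder to accidentally interleave other calls across the switch.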

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It discloses that the switch takes effect on the next tool call, no reconnect or new API key is needed, and highlights a sequential checkpoint constraint. Slightly lacks explicit statement about error behavior if both parameters are passed, but overall very transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences with no extraneous content. The main purpose is stated first, followed by parameter usage, behavioral effects, and a specific constraint. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, but the description sufficiently explains the effect and a key constraint (sequential checkpoint). Could be improved by noting potential error scenarios (e.g., invalid workspace), but for a simple switch tool it covers the necessary context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers both parameters with descriptions of mutual exclusivity. The description adds value by specifying where to find the parameters (via workspace.list) and reinforcing the mutual exclusivity with 'pass exactly one'. Schema coverage is 100%, so baseline is 3; the added context raises it to 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool 'Re-point the active MCP API key to a different workspace', using a specific verb and resource. It clearly distinguishes from sibling tools like workspace_list (list workspaces) and workspace_current (get current workspace).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage instructions: pass exactly one of workspace_id or slug, and references workspace.list to find them. Also includes explicit when-not-to-use guidance regarding not parallelizing calls across a switch and warns about calls in flight.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_commentA

Destructive, Idempotent

Permanently delete a YouTube comment by id (or 'youtube:comment:<id>'). Cannot be undone. Costs 50 quota units.

Parameters

Name	Required	Description	Default
`comment_id`	Yes	Bare commentId OR 'youtube:comment:<id>'.
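
Because the parameter accepts either a bare id or the prefixed 'youtube:comment:<id>' form, a client that logs or deduplicates ids may want to normalize them first. A minimal sketch, assuming the prefix format shown in the table; the helper name `to_bare_id` and the sample id `abc123` are hypothetical, and the tool itself does not require this step.

```python
def to_bare_id(ref: str, kind: str) -> str:
    """Strip an optional 'youtube:<kind>:' prefix and return the bare id."""
    prefix = f"youtube:{kind}:"
    bare = ref[len(prefix):] if ref.startswith(prefix) else ref
    if not bare or ":" in bare:
        # Catches empty ids and references of the wrong kind,
        # e.g. a 'youtube:video:...' value passed where a comment is expected.
        raise ValueError(f"not a valid {kind} reference: {ref!r}")
    return bare


print(to_bare_id("youtube:comment:abc123", "comment"))  # abc123
print(to_bare_id("abc123", "comment"))                  # abc123
```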

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description effectively discloses destructive behavior ('Permanently delete', 'cannot be undone') and cost (50 quota units).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with the core action. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with one parameter and no output schema. Description fully covers purpose, input format, destructive nature, and cost. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with clear description of comment_id format. Description repeats this format and adds cost/irreversibility, but these are not parameter-specific. Meets baseline but adds little extra meaning for the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Permanently delete a YouTube comment by id' with specific verb and resource. Distinguishes from sibling tools like youtube_list_comments or youtube_moderate_comment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context ('cannot be undone'), but lacks explicit guidance on when to use vs alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_videoA

Destructive, Idempotent

Permanently delete a YouTube video by id (or 'youtube:video:<id>'). Cannot be undone. Costs 50 quota units. Caller must own the channel.

Parameters

Name	Required	Description	Default
`video_id`	Yes	Bare videoId OR 'youtube:video:<id>'.
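
Since each delete is stated to cost 50 quota units, a caller handling several deletions may want to check the budget before starting. The arithmetic is sketched below; `remaining_units` is an assumption (a figure the caller tracks on its own side), and `plan_deletes` is a hypothetical helper, not part of the server.

```python
DELETE_COST_UNITS = 50  # per-call cost stated in the tool descriptions above


def plan_deletes(ids: list, remaining_units: int) -> tuple:
    """Split ids into (affordable_now, deferred) under the remaining quota."""
    affordable = max(remaining_units, 0) // DELETE_COST_UNITS
    return ids[:affordable], ids[affordable:]


now, later = plan_deletes(["a", "b", "c"], remaining_units=120)
print(now, later)  # ['a', 'b'] ['c']
```

With 120 units left, two deletes fit (100 units) and the third is deferred.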

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully bears the burden. It discloses permanence, quota cost (50 units), and ownership requirement, which is excellent transparency for a destructive operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description comprises four short sentences with zero waste: concise yet complete.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple deletion tool with one parameter and no output schema, the description covers purpose, input, side effects, cost, and prerequisites adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the only parameter, and the description reiterates the format. No additional semantic depth needed beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'delete' and resource 'YouTube video', and distinguishes it from sibling tools like youtube_delete_comment or youtube_upload_video. It specifies the input format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes the prerequisite ('Caller must own the channel') and the irrevocable nature ('Cannot be undone'), guiding when to use. It does not explicitly mention alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_commentsB

Read-only, Idempotent

Inspect

List comment threads on a YouTube video. Pass video_id (e.g. 'dQw4w9WgXcQ') or channel_ref ('youtube:video:<id>'). Returns top-level comments with inline replies.

ParametersJSON Schema

Name	Required	Description
`video_id`	Yes	YouTube videoId — bare 11-char form OR full 'youtube:video:<id>'.
`page_token`	No	Pagination cursor from a previous call's `next_page_token`.
`max_results`	No	Page size, 1-100. Default 25.
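
Because `video_id` accepts either the bare 11-character form or the prefixed `youtube:video:<id>` form, a caller may want to normalize both to one shape before comparing or logging IDs. The following is a minimal sketch, not the server's own logic: the helper name `normalize_video_id` and the allowed character set are assumptions.

```python
import re

# Assumption: a videoId is 11 characters from [A-Za-z0-9_-].
# The page only states "bare 11-char form"; the alphabet is not specified there.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
_PREFIX = "youtube:video:"


def normalize_video_id(value: str) -> str:
    """Return the bare videoId from either accepted form (hypothetical helper)."""
    bare = value[len(_PREFIX):] if value.startswith(_PREFIX) else value
    if not _VIDEO_ID.fullmatch(bare):
        raise ValueError(f"not a valid videoId: {value!r}")
    return bare
```

Both `normalize_video_id("dQw4w9WgXcQ")` and `normalize_video_id("youtube:video:dQw4w9WgXcQ")` yield the same bare ID, so either form can be used as a dictionary key on the client side.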

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool returns top-level comments with inline replies, but does not specify whether it requires authentication, rate limits, what happens if the video has no comments, or any side effects. The description is minimal on behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: roughly 20 words across three sentences. It front-loads the purpose and provides a concrete example. Every sentence is necessary and information-dense.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no annotations and no output schema, the description covers the primary purpose and input format. However, it lacks details on pagination behavior, error handling (e.g., invalid video_id), and does not explain the structure of the response beyond 'top-level comments with inline replies'. This is adequate but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. The description repeats the video_id format (already in schema) and does not add meaning for page_token or max_results. It adds no additional semantic value beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List comment threads') and the resource ('a YouTube video'). It distinguishes from sibling tools by specifying the function (listing vs deleting, moderating, posting replies). The example video ID adds clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool versus alternatives like youtube_delete_comment or youtube_moderate_comment. It omits prerequisites, context for read vs write operations, and does not mention that this is a read-only action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_videosA

Read-only, Idempotent

Inspect

List videos on the connected YouTube channel. Returns id, title, published_at, view_count. Paginate via page_token.

ParametersJSON Schema

Name	Required	Description	Default
`page_token`	No	Pagination cursor returned in a previous call's `next_page_token`. Omit for the first page.
`max_results`	No	Page size, 1-50. Default 25.
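
The pagination contract described here (omit `page_token` on the first call, then pass back the previous `next_page_token`) can be sketched as a generator. This is an illustration under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the `items` key for the result list is assumed, since the page does not document the response shape.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical client signature: (tool_name, arguments) -> parsed result dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_videos(call_tool: ToolCall, max_results: int = 25) -> Iterator[Dict[str, Any]]:
    """Yield every video by following next_page_token until it is absent."""
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    args: Dict[str, Any] = {"max_results": max_results}  # first page: no page_token
    while True:
        page = call_tool("youtube_list_videos", args)
        # Assumption: the videos arrive under an "items" key.
        yield from page.get("items", [])
        token = page.get("next_page_token")
        if not token:
            break
        args = {"max_results": max_results, "page_token": token}
```

The same loop should apply to `youtube_list_comments`, with the tool name changed, a `video_id` argument added, and the upper bound raised to 100.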

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that it lists videos and returns specific fields, which is minimal. It does not mention authentication requirements, rate limits, or behavior when there are no videos. Behavioral traits beyond the basic operation are missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at three short sentences, front-loading the action and key details. Every word earns its place, with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 2 parameters, no output schema, and no annotations. The description covers core functionality, pagination, and returned fields adequately. It could mention error handling or auth, but for a simple list operation, it provides sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters well. The description adds 'Paginate via page_token', which reinforces the schema but does not provide new semantic meaning. The baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list', the resource 'videos on the connected YouTube channel', and the specific fields returned (id, title, published_at, view_count). It distinguishes itself from siblings like youtube_list_comments or youtube_delete_comment by focusing on videos.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions pagination via page_token, providing context on how to iterate results. However, it does not explicitly state when to use this tool versus alternatives (e.g., youtube_list_comments for comments), nor does it provide exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_moderate_commentAInspect

Apply a moderation status to a YouTube comment. Allowed status values: heldForReview, published, rejected, spam. Costs 50 quota units.

ParametersJSON Schema

Name	Required	Description	Default
`status`	Yes	One of: heldForReview, published, rejected, spam.
`comment_id`	Yes	Bare commentId OR 'youtube:comment:<id>'.
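
Since each call costs 50 quota units, it may be worth rejecting a misspelled `status` before the request is sent. A minimal client-side sketch follows; the helper name `moderation_args` is hypothetical, and the assumption that the server treats status values as case-sensitive is not confirmed by the page.

```python
ALLOWED_STATUSES = ("heldForReview", "published", "rejected", "spam")


def moderation_args(comment_id: str, status: str) -> dict:
    """Build arguments for youtube_moderate_comment, validating status locally."""
    if status not in ALLOWED_STATUSES:
        # Suggest the canonical spelling when only the letter case differs.
        by_lower = {s.lower(): s for s in ALLOWED_STATUSES}
        hint = by_lower.get(status.lower())
        suffix = f" (did you mean {hint!r}?)" if hint else ""
        raise ValueError(f"unsupported status {status!r}{suffix}")
    return {"comment_id": comment_id, "status": status}
```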

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must cover behavioral traits. It mentions quota cost (50 units), but does not disclose if the action is reversible, idempotent, or triggers notifications. Basic transparency but incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences that front-load the purpose, then list the allowed values and the quota cost. Every sentence adds value, with no wasted words. Appropriate length for the tool's simplicity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (2 parameters, no output schema), the description covers the essential aspects: action, allowed values, quota cost, and parameter format. It could mention success response or error conditions, but overall is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers both parameters with descriptions, achieving 100% coverage. The description adds value by specifying the exact allowed values for 'status' and the alternative format for 'comment_id' (Bare commentId or 'youtube:comment:<id>'), which is not in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the action ('apply a moderation status'), the resource ('YouTube comment'), and lists the allowed status values. It clearly distinguishes from sibling tools like youtube_delete_comment or youtube_list_comments by specifying the moderation action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives (e.g., delete, list). It lacks context on prerequisites, such as needing the comment to exist or required permissions. No explicit when-to-use or when-not-to-use hints.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_post_comment_replyAInspect

Post a comment on a YouTube video, or reply to an existing comment. Pass video_id for a top-level comment, OR parent_comment_id to reply. AI-disclosure suffix appended automatically when configured.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	Comment body. 1-10000 chars. AI-disclosure suffix may be auto-appended.
`video_id`	No	Bare videoId or 'youtube:video:<id>' — for a top-level comment.
`parent_comment_id`	No	Bare commentId or 'youtube:comment:<id>' — for a reply.
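
The either/or relationship between `video_id` and `parent_comment_id` can be enforced on the client before the call. The sketch below reads the description's "OR" as exclusive, which is an interpretation rather than documented behavior; `comment_args` is a hypothetical helper, and the length check applies to the text as written, before any AI-disclosure suffix the server may append.

```python
from typing import Dict, Optional


def comment_args(
    text: str,
    video_id: Optional[str] = None,
    parent_comment_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build arguments for youtube_post_comment_reply (top-level comment or reply)."""
    if not 1 <= len(text) <= 10000:
        raise ValueError("text must be 1-10000 characters")
    # Assumption: exactly one target is allowed, never both and never neither.
    if (video_id is None) == (parent_comment_id is None):
        raise ValueError("pass exactly one of video_id or parent_comment_id")
    if video_id is not None:
        return {"text": text, "video_id": video_id}
    return {"text": text, "parent_comment_id": parent_comment_id}
```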

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the automatic AI-disclosure suffix, which is a behavioral trait. However, it does not mention whether authentication is required, any rate limits, or what happens on failure. The return value (e.g., comment ID) is not described, and since there is no output schema, this gap is notable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences convey all essential information. The description is front-loaded with the primary action and immediately provides usage instructions. No redundant or vague language.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with only three parameters and no output schema. The description covers purpose, usage, and a behavioral trait. However, missing details such as expected return values (e.g., the created comment ID) or error scenarios leave the description slightly incomplete for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema coverage is 100% and the schema already describes parameters, the description adds meaningful context: it clarifies that video_id is for top-level comments and parent_comment_id for replies, and notes the potential auto-appended suffix on text. This goes beyond the schema's minimal descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Post a comment' or 'reply'), specifies the resource ('YouTube video' or 'existing comment'), and distinguishes between two use cases (top-level comment vs reply). It also differentiates from sibling tools like youtube_delete_comment and youtube_list_comments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs when to use video_id (for a top-level comment) and parent_comment_id (for a reply), providing clear usage context. It does not mention alternatives or exclusions (e.g., when not to use this tool); with distinct sibling actions, this is acceptable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_update_videoAInspect

Update title, description, privacy, or tags on a YouTube video. Costs 1600 quota units. Only fields provided are changed.

ParametersJSON Schema

Name	Required	Description
`tags`	No	New tags list. Omit to keep current.
`title`	No	New title (max 100 chars). Omit to keep current.
`privacy`	No	'private', 'unlisted', or 'public'. Omit to keep current.
`video_id`	Yes	Bare videoId OR 'youtube:video:<id>'.
`description`	No	New description (max 5000 chars). Omit to keep current.
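
"Only fields provided are changed" means the argument object should contain just the fields being updated, with everything else left out rather than sent as null or empty. A minimal sketch of that pattern, using the limits from the table above; the helper name `update_video_args` is hypothetical, and refusing an empty update is a client-side choice, not documented server behavior.

```python
from typing import Any, Dict, List, Optional

PRIVACY_VALUES = ("private", "unlisted", "public")


def update_video_args(
    video_id: str,
    *,
    title: Optional[str] = None,
    description: Optional[str] = None,
    privacy: Optional[str] = None,
    tags: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a partial-update payload: omitted (None) fields are not sent at all."""
    args: Dict[str, Any] = {"video_id": video_id}
    if title is not None:
        if len(title) > 100:
            raise ValueError("title exceeds 100 characters")
        args["title"] = title
    if description is not None:
        if len(description) > 5000:
            raise ValueError("description exceeds 5000 characters")
        args["description"] = description
    if privacy is not None:
        if privacy not in PRIVACY_VALUES:
            raise ValueError(f"privacy must be one of {PRIVACY_VALUES}")
        args["privacy"] = privacy
    if tags is not None:
        args["tags"] = list(tags)
    if len(args) == 1:
        # Client-side guard: avoid spending quota on a call that changes nothing.
        raise ValueError("no fields to update")
    return args
```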

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-read-only, non-destructive operation. The description adds important behavioral details: the cost of 1600 quota units and the partial update semantics. This goes beyond annotations to help the agent understand resource consumption and the update model.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, all valuable. The first identifies the action and scope; the second and third add cost and update behavior. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the core action, fields, cost, and partial update. It does not specify the return value (common for updates), but given the simplicity of the tool and lack of output schema, it is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described. The description reinforces that omitted fields remain unchanged, which adds marginal value beyond the schema. It does not introduce new semantics but confirms behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Update' and the resource 'YouTube video', listing specific fields (title, description, privacy, tags) that can be modified. It distinguishes itself from sibling tools like youtube_delete_video and youtube_upload_video by specifying the update action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage guideline: 'Only fields provided are changed,' indicating partial update behavior. It does not explicitly state when not to use this tool or mention alternatives like youtube_delete_video for removal, but the context of sibling tools helps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_upload_videoAInspect

Upload a workspace-owned video file (file_id) to the connected YouTube channel. Returns video_id + thread_id. Costs 1600 quota units. Default privacy is 'private' — pass privacy='public' to publish.

ParametersJSON Schema

Name	Required	Description	Default
`tags`	No	Optional list of tag strings (max ~500 chars total).
`title`	Yes	Video title (max 100 chars).
`file_id`	Yes	Workspace `files.id` of the video to upload. Must be a video/* MIME type and `status='ready'`. Get IDs from the [ATTACHMENTS] block, files.search, or search.files.
`privacy`	No	Privacy status. 'private' (default), 'unlisted', or 'public'.	private
`category_id`	No	YouTube category ID (default '22' = People & Blogs). See https://developers.google.com/youtube/v3/docs/videoCategories/list.	22
`description`	No	Video description (max 5000 chars). OMIT to upload without a description.
`made_for_kids`	No	COPPA flag. OMIT for the standard (non-kids) default.
`channel_account_id`	No	The connected YouTube channel_account.id. OMIT to auto-resolve the workspace's YouTube account.
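
Because uploads are quota-heavy, an agent may want to total the cost of a planned batch before starting. The sketch below uses only the per-call costs quoted in the tool descriptions on this page (1600 for upload and update, 50 for comment moderation and video deletion); the function names and the budget figure in the usage note are hypothetical, and tools without a stated cost are deliberately left out.

```python
from typing import Dict

# Per-call quota costs as stated in the tool descriptions on this page.
QUOTA_COST: Dict[str, int] = {
    "youtube_upload_video": 1600,
    "youtube_update_video": 1600,
    "youtube_moderate_comment": 50,
    "youtube_delete_video": 50,
}


def plan_cost(calls: Dict[str, int]) -> int:
    """Total quota units for a plan given as {tool_name: number_of_calls}."""
    unknown = set(calls) - set(QUOTA_COST)
    if unknown:
        raise KeyError(f"no documented quota cost for: {sorted(unknown)}")
    return sum(QUOTA_COST[name] * count for name, count in calls.items())


def fits_budget(calls: Dict[str, int], budget: int) -> bool:
    """True when the planned calls stay within the caller-supplied budget."""
    return plan_cost(calls) <= budget
```

For example, two uploads plus ten moderation calls come to 2 × 1600 + 10 × 50 = 3700 units, so `fits_budget({"youtube_upload_video": 2, "youtube_moderate_comment": 10}, 10000)` holds for an assumed budget of 10000 units.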

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses quota cost (1600 units), default privacy, and return values, but lacks details on error behavior, idempotency, or specific prerequisites beyond file_id validity. This is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no redundancy, front-loaded with action and output. Every word adds value. Extremely concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 8 parameters (3 required) and no output schema or annotations, the description covers core functionality but omits error handling, file constraints, and channel authentication details. More context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds a note about file_id sources and default privacy behavior, but this only reinforces schema info. No significant new parameter semantics beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Upload'), the resource ('workspace-owned video file to connected YouTube channel'), and the output ('Returns video_id + thread_id'). It distinguishes this from sibling tools like youtube_list_comments or youtube_delete_comment by specifying the upload purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (workspace-owned video, connected channel) but does not explicitly state when to use or when not to use this tool versus alternatives. No sibling tool performs uploads, so alternatives are absent, but guidelines could be more explicit about prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_video_queryA

Read-only, Idempotent

Inspect

Ask Gemini about a YouTube video. Pass a video URL and any prompt — verbatim transcript with timestamps, summary, targeted Q&A about content or visuals, translation, etc. Works on any public/unlisted video.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	YouTube video URL. Supported forms: youtube.com/watch?v=…, youtu.be/…, youtube.com/shorts/…, m.youtube.com/watch?v=…. Pass-through to Gemini verbatim.
`prompt`	Yes	What to ask Gemini about the video. Examples: 'Provide a verbatim transcript with [HH:MM:SS] timestamps.' / 'What is the main claim made in the first 30 seconds?' / 'Describe what's shown on screen at 0:30.' / 'Translate the spoken Spanish to English.'
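
Since the URL is passed through to Gemini verbatim, a caller might check that it matches one of the supported forms before spending a call. This is a rough pre-flight sketch using only the standard library; `extract_video_id` is a hypothetical helper, and accepting the `www.youtube.com` host is an assumption, as the page lists only `youtube.com`, `youtu.be`, and `m.youtube.com`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: "www." is accepted alongside the hosts listed in the schema.
_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL matches a supported form, else None."""
    # The schema shows scheme-less forms, so add a scheme for parsing only.
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    if host == "youtu.be":
        return path.lstrip("/").split("/")[0] or None
    if host in _WATCH_HOSTS:
        if path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        if path.startswith("/shorts/"):
            return path[len("/shorts/"):].split("/")[0] or None
    return None
```

The helper only inspects the URL; the original string is what should be sent to the tool, unchanged.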

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavioral traits. It mentions the tool works on public/unlisted videos and passes URLs verbatim, but lacks details on potential rate limits, response format, or whether it is asynchronous. This is adequate but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences that front-load the purpose and immediately give actionable examples. Every sentence adds value with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with two required parameters and no output schema, the description is adequate. It covers the main use cases and video types. However, it does not mention error handling, return value structure, or limitations (e.g., prompt length), leaving some gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. The tool description provides additional examples for the prompt parameter, but the schema already describes both parameters well. No further semantics are added beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool asks Gemini about a YouTube video, specifying the verb ('Ask Gemini') and resource ('a YouTube video'). It distinguishes itself from sibling tools like other YouTube operations (e.g., youtube_list_comments) and other query tools (e.g., vision_query) by focusing on video content analysis via Gemini.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides examples of prompts (e.g., transcript, summary, Q&A) and notes it works on public/unlisted videos, giving clear usage context. However, it does not explicitly state when not to use this tool or mention alternatives like vision_query for images or knowledge_query for database queries, leaving room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
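
If the file is produced by a build or deploy script, it can be generated rather than hand-written. A minimal sketch that emits the same structure; the function name `glama_claim` is hypothetical, and listing more than one maintainer is inferred from `maintainers` being an array, not stated on the page.

```python
import json
from typing import List


def glama_claim(emails: List[str]) -> str:
    """Serialize the /.well-known/glama.json body for the given maintainer emails."""
    if not emails:
        raise ValueError("at least one maintainer email is required")
    body = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email} for email in emails],
    }
    return json.dumps(body, indent=2)
```

The returned string would then be written to `.well-known/glama.json` under the web root of the server's domain.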
