Glama

Server Details

AI-powered unified inbox with MCP tools for managing conversations, contacts, and knowledge across WhatsApp, Telegram, Instagram, Email, and LinkedIn.

Status: Healthy
Last Tested
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.


Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions (A)

Average 4.2/5 across 179 of 179 tools scored. Lowest: 2.8/5.

Server Coherence (A)

Disambiguation: 4/5

With 179 tools, there is some potential for overlap, especially among similar domains like agent management and browser automation. However, detailed descriptions and distinct naming help differentiate most tools. A few tool pairs, such as 'agents_traces_list' and 'agents_trace_get', might cause minor confusion, but overall the boundaries are clear.

Naming Consistency: 5/5

Tool names consistently follow a 'domain_action' pattern using snake_case (e.g., 'agents_create', 'calendar_list_events'). The only minor deviation is 'web__local_search' with double underscores, but this is still readable. Overall, the naming is highly consistent and predictable.

Tool Count: 1/5

179 tools is an extreme volume for an MCP server. Typical servers have 3-15 tools; this one has over an order of magnitude more. The sheer number overwhelms the agent's decision space and risks poor selection performance, regardless of how well individual tools are designed.

Completeness: 5/5

The tool set covers an exceptionally broad range of domains: agents, AI filters/tags, browser automation, calendar, calls, contacts, documents, files, folders, groups, images, Instagram, knowledge, LinkedIn, messages, notes, prompts, reminders, search, tasks, threads, videos, vision, web, widgets, workspace, and YouTube. Most entities have full CRUD operations, and there are debugging and simulation tools. No obvious gaps for the stated purpose.

Available Tools

179 tools
agent_handoff (A)
Read-only, Idempotent

Delegate a multi-step task (research, composing messages, booking, scheduling) to the full agentic planner. Use when a user ask needs more than a direct answer. The specialist runs synchronously — its response is already shown to the user in real-time. Summarize the OUTCOME in past tense (e.g. 'The Media Creator generated your video' or 'The Document Composer failed because...'). Do NOT say 'I will delegate' — the delegation already happened. If status is timeout or error, explain what went wrong and offer to retry.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| mode | No | Execution mode: 'sync' (wait for result, default) or 'async' (fire and forget, child runs in background). Async is only available in background/trigger context. | sync |
| agent_id | No | Optional ID of another agent in the same workspace to delegate the task to. When set, this becomes cross-agent delegation; the target agent runs with ITS OWN prompt, tools, and model. Use this for specialty tasks (see agents.list to discover specialists). Prefer the in-loop variant (no `agent_id`) for one-off escalations. Spawns a new trace linked back to this trace via parent_trace_id (visible in the admin lineage card). | |
| target_slug | No | Optional stable slug of a system-template specialist to delegate to (e.g. 'doc-composer' for the Document Composer). Env-portable alternative to agent_id; resolves the workspace's fork of that template (auto-forking on first use). Used by async handoffs that target a specialist without knowing its per-workspace id. | |
| task_description | Yes | Plain-language description of what the planner should accomplish. Include everything the planner needs: the user's goal, constraints, and any context already gathered in this voice call. | |

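As a rough illustration of how these parameters fit together, the sketch below builds a standard MCP `tools/call` JSON-RPC request for agent_handoff. It only constructs the payload and sends nothing. `build_handoff_request` is a hypothetical helper and the task text is made up; the parameter names and the 'doc-composer' slug come from the table above.

```python
import json


def build_handoff_request(task_description, request_id=1, mode="sync",
                          agent_id=None, target_slug=None):
    """Build an MCP tools/call request for agent_handoff (hypothetical helper)."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    arguments = {"task_description": task_description, "mode": mode}
    # Optional targets are included only when the caller sets them.
    if agent_id is not None:
        arguments["agent_id"] = agent_id
    if target_slug is not None:
        arguments["target_slug"] = target_slug
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "agent_handoff", "arguments": arguments},
    }


# Made-up task text; 'doc-composer' is the slug quoted in the parameter table.
request = build_handoff_request(
    "Compose a one-page summary of the open support threads.",
    target_slug="doc-composer",
)
print(json.dumps(request, indent=2))
```

Leaving out both `agent_id` and `target_slug` corresponds to the in-loop variant that the parameter table recommends for one-off escalations.
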
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains execution modes (sync/async) and outcome reporting, adding context beyond annotations. However, annotations declare readOnlyHint:true and destructiveHint:false, which contradict the tool's likely side effects (performing tasks). This inconsistency is not addressed. Score lowered due to contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise and front-loaded. Every sentence serves a purpose: stating the action, when to use, how to summarize, and error handling. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the annotations and schema, the description covers usage scenarios, behavioral notes (sync/async), and result summarization. It lacks output schema details but that is not required. Could be more explicit about error handling format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to add much. It reiterates the high-level purpose but does not elaborate on parameter nuances beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Delegate a multi-step task...to the full agentic planner.' It provides examples (research, composing) and usage condition ('when a user ask needs more than a direct answer'). It distinguishes from simple direct-answer tools but does not explicitly name sibling tools like agents_ask.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance: 'Use when a user ask needs more than a direct answer.' It also instructs on how to summarize outcomes and handle timeout/error statuses. However, it lacks exclusions or explicit alternatives, which would make it a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_activity (A)
Read-only, Idempotent

See what you (or another agent in your workspace) actually did over a time window: messages sent, documents created, calls made, plus a summary (run counts, per-day, top tools). Use this to answer 'what did I do today / yesterday / last week / in the last hour?' or 'what did [agent] do?' with real data instead of guessing.

Omit agent for your own activity, or pass another workspace agent's name, slug, or id. Pass since/until as ISO datetimes (e.g. '2026-06-03T09:00:00') for sub-day windows like the last hour, or plain dates ('2026-06-03') for whole days — compute them from the current date/time you were given. Defaults to the last 24h. Traces are retained 30 days.

Times are interpreted as UTC — if the current time you were given is in another timezone, convert to UTC before passing since/until.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent | No | Target agent: name, slug, or numeric id. OMIT for yourself. | |
| limit | No | Max actions / recent runs to return. | |
| since | No | Window start: ISO datetime or date. OMIT for last 24h. | |
| until | No | Window end: ISO datetime or date (exclusive day-end for a bare date). OMIT for now. | |

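Because since/until are interpreted as UTC, a local time has to be converted before it is passed. The sketch below shows one way to compute a "last N hours" window. `last_hours_window` is a hypothetical helper, and dropping the UTC offset from the output is an assumption made to match the '2026-06-03T09:00:00' format quoted in the description.

```python
from datetime import datetime, timedelta, timezone


def last_hours_window(hours, now=None):
    """Return (since, until) as UTC ISO datetimes for the last `hours` hours.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    # Convert to UTC, then drop the offset so the strings look like
    # '2026-06-03T09:00:00' (assumed to be the expected format).
    until = now.astimezone(timezone.utc).replace(microsecond=0, tzinfo=None)
    since = until - timedelta(hours=hours)
    return since.isoformat(), until.isoformat()


# Example: the caller's clock reads 11:30 in a UTC+2 timezone.
local_now = datetime(2026, 6, 3, 11, 30, tzinfo=timezone(timedelta(hours=2)))
since, until = last_hours_window(1, now=local_now)
print(since, until)  # 2026-06-03T08:30:00 2026-06-03T09:30:00
```

For whole-day questions ("what did I do yesterday?"), plain dates such as '2026-06-03' are enough and no conversion helper is needed.
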
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. Description adds that traces are retained 30 days, defaults to last 24h, and timezone handling. No contradictions; additional context is useful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise, front-loaded with purpose, and well-structured. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, description explains return contents (messages, documents, calls, summary). Covers defaults, retention, and timezone. Complete for a read-only activity tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description adds practical guidance: how to format since/until (ISO datetime or plain date), how to compute them, and when to omit agent. Adds significant value beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool retrieves activity (messages, documents, calls, summary) for the current agent or another agent. It distinguishes itself from sibling tools like agents_traces_list by focusing on a broader activity summary.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases ('what did I do today?') and parameter usage (omit agent for self, since/until formats). Does not explicitly contrast with alternatives, but context is clear enough for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_add_file (A)

Attach a file to this agent's private knowledge (agent-specific files, not shared with other agents).

Workflow:

  1. Upload the file with files_upload (pass source_url for remote files)

  2. Index it with files_ingest (pass the file_id)

  3. Call this tool with agent_id + file_id

Returns chunk_count — shows 0 while still processing. Call agents.list_files later to see the final chunk count once indexing completes.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_id | Yes | file_id returned by files_upload or files_ingest | |
| agent_id | Yes | ID of the agent to attach the file to | |

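The three-step workflow above can be sketched as follows. `call_tool(name, arguments)` is a hypothetical callable standing in for whatever MCP client is in use, and the assumption that each tool returns a dict with `file_id` or `chunk_count` keys is illustrative only; the tool names and argument names are taken from the description. The fake client exists just to show the call order.

```python
def attach_remote_file(call_tool, agent_id, source_url):
    """Upload, index, and attach a remote file to an agent's private knowledge.

    call_tool(name, arguments) is a hypothetical callable that invokes an MCP
    tool and returns its result as a dict.
    """
    # Step 1: upload the remote file.
    uploaded = call_tool("files_upload", {"source_url": source_url})
    file_id = uploaded["file_id"]
    # Step 2: index it.
    call_tool("files_ingest", {"file_id": file_id})
    # Step 3: attach it to the agent.
    attached = call_tool("agents_add_file",
                         {"agent_id": agent_id, "file_id": file_id})
    # chunk_count stays 0 while indexing is still in progress.
    return file_id, attached.get("chunk_count", 0)


# Stand-in for a real MCP client, used only to show the call order.
calls = []


def fake_call_tool(name, arguments):
    calls.append((name, arguments))
    if name == "files_upload":
        return {"file_id": "file_123"}  # made-up id
    if name == "agents_add_file":
        return {"chunk_count": 0}  # still processing
    return {}


file_id, chunk_count = attach_remote_file(
    fake_call_tool, "agent_42", "https://example.com/handbook.pdf")
print(file_id, chunk_count)
```

A chunk_count of 0 here does not signal failure; per the description, the final count shows up later in the agent's file listing once indexing completes.
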
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate the tool is not read-only and not destructive. The description goes beyond annotations by explaining the asynchronous nature: 'chunk_count — shows 0 while still processing' and the need to poll via agents.list_files. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: first sentence states purpose, then a clear numbered workflow, then a note about the return value. Every sentence is informative and there is no redundant text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains the return value and its interpretation. It covers the workflow and prerequisites. However, it does not address idempotency (idempotentHint false) or potential errors, which would make it more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with descriptions for both parameters. The description adds value by explaining the workflow connecting file_id to prior steps (files_upload/files_ingest) and the significance of the return value (chunk_count). This additional context justifies a score above the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Attach a file to this agent's private knowledge,' which is a specific verb+resource. It distinguishes itself from sibling tools like agents_remove_file and agents_list_files by emphasizing agent-specific, non-shared files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear three-step workflow (upload, ingest, then call this tool) and advises checking agents.list_files for final status. It implicitly guides when to use this tool (after upload/ingest) but does not explicitly mention when not to use it or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_approve_draft (A)

Approve a pending agent draft and send the message.

The draft will be sent to the conversation it was generated for. You can optionally edit the text before sending.

Use this when user says:

  • 'Approve this draft'

  • 'Send this reply'

  • 'Approve and send'

  • 'Looks good, send it'

IMPORTANT: This will send a message to a real person.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| draft_id | Yes | ID of the draft to approve | |
| edited_text | No | Optional edited response text (if user wants to modify before sending) | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-readonly and non-destructive behavior. The description adds critical context: 'This will send a message to a real person.' It also explains the draft is sent to the original conversation and that optional text editing is possible. This goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise and well-structured. It uses bullet points for example user phrases and a clear warning. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully explains the tool's effect: approval, sending, optional editing, and the real-person warning. It covers all necessary context for an agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so both parameters (draft_id, edited_text) are described. The description adds context that edited_text is optional and used for modifications before sending, which is helpful but not essential. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Approve a pending agent draft and send the message.' It is specific about the action (approve and send) and the resource (agent draft), distinguishing it from siblings like agents_reject_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit example phrases like 'Approve this draft', 'Send this reply', etc., guiding when to use the tool. It also notes that it sends to a real person, implying caution. It lacks explicit when-not-to-use cases, but the examples are sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_ask (A)

Send a message to an AI agent and get its response.

The agent runs with its configured prompt, tools, and knowledge. Use this to test agents or have them process a task.

Returns: {status: 'replied'|'silent', response_text, messages[], full_reply, model_used, tokens_*, send_mode, execution_mode}. messages[] carries each messages.send invocation the agent made (text, subject, reply_to_message_id, timestamp, message_id, attachments=[{file_id,name,mime}]). full_reply concatenates text only — attachment-only sends show up in messages but not full_reply. status='silent' iff both response_text is empty AND messages is empty.

Execution may take 10-60s depending on agent complexity.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| message | Yes | Message/goal to send to the agent | |
| agent_id | Yes | ID of the AI agent to ask | |
| send_mode | No | Send mode for the agent run: 'draft' = create drafts, 'auto' = send directly. Defaults to the agent's configured default_send_mode. Does NOT change execution_mode; that is fixed by the agent's config. | |

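The return shape described above has two edge cases that are easy to miss: the 'silent' condition and attachment-only sends that never appear in full_reply. The sketch below checks both on a made-up result dict. The helper names and sample values are hypothetical; only the field names come from the description.

```python
def is_silent(result):
    """True when the agent produced neither reply text nor any sent messages."""
    return not result.get("response_text") and not result.get("messages")


def attachment_only_messages(result):
    """Messages that carry attachments but no text (absent from full_reply)."""
    return [
        m for m in result.get("messages", [])
        if m.get("attachments") and not m.get("text")
    ]


# Made-up result for illustration; real responses carry more fields.
sample = {
    "status": "replied",
    "response_text": "",
    "messages": [
        {"text": "", "attachments": [
            {"file_id": "f1", "name": "quote.pdf", "mime": "application/pdf"}]},
        {"text": "Here is the quote.", "attachments": []},
    ],
    "full_reply": "Here is the quote.",
}

print(is_silent(sample))                      # False
print(len(attachment_only_messages(sample)))  # 1
```

A caller that reads only full_reply would miss the PDF in this sample, which is why messages[] is worth inspecting whenever attachments matter.
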
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds extensive behavioral detail beyond annotations: explains return format (status, response_text, messages[], full_reply, etc.), special status='silent' condition, field contents, and execution time. No contradictions with annotations (readOnlyHint: false aligns with mutation).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and efficient: first sentence states purpose, second gives usage, then detailed return info. Every sentence serves a purpose with no fluff. Front-loaded with key info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully compensates by detailing the return structure, including edge cases (silent status, attachment-only sends). It covers execution time, model usage, and field semantics, making the tool behavior completely understandable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds value by explaining send_mode default behavior and its non-effect on execution_mode. This nuance is helpful beyond the schema's basic descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Send a message to an AI agent and get its response.' It specifies the resource (agent), verb (send), and context (configured prompt, tools, knowledge). This distinguishes it from siblings like agents_create or agents_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit usage guidance: 'Use this to test agents or have them process a task.' While it doesn't list alternatives or when not to use, the context is clear enough to decide. No exclusions provided, but adequate for most use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_create (A)

Create a new AI agent in the workspace.

Execution modes:

  • ai_assisted (default, recommended): Two-phase AI — fast pre-classifier (Haiku) for keyword filtering and simple replies, then full AI with tools for complex messages. Best for: auto-replies, group monitoring, keyword-based filtering.

  • agentic: Autonomous multi-step agent with planning and tool execution. Best for: complex scheduled tasks, multi-step automation.

  • rule_based: Simple pattern matching without AI.

For keyword filtering: use ai_assisted mode + set keywords in trigger conditions (free, deterministic) and/or auto_reply_rules (smart, LLM-based) via agents.update.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | Name of the AI agent (1-100 characters) | |
| prompt_id | No | ID of the prompt to assign to this agent | |
| send_mode | No | Default send mode: 'auto' or 'draft'. OMIT to use 'draft' (the default). | |
| description | No | Optional description of what this agent does | |
| text_engine | No | Text-execution engine: 'rule_based', 'ai_assisted', 'agentic' (default), or 'claude_channels'. Voice is derived from triggers, not engine. OMIT to use the default ('agentic'). | |

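Since the schema asks callers to omit optional fields rather than pass defaults, a small argument builder can enforce that along with the documented constraints. This is a minimal sketch: `build_create_agent_args` is a hypothetical helper, and the validation only mirrors what the parameter table states (name length, allowed `send_mode` and `text_engine` values). The server may apply further checks.

```python
TEXT_ENGINES = ("rule_based", "ai_assisted", "agentic", "claude_channels")
SEND_MODES = ("auto", "draft")


def build_create_agent_args(name, text_engine=None, send_mode=None,
                            description=None, prompt_id=None):
    """Build the arguments dict for agents_create (hypothetical helper)."""
    if not 1 <= len(name) <= 100:
        raise ValueError("name must be 1-100 characters")
    if text_engine is not None and text_engine not in TEXT_ENGINES:
        raise ValueError(f"unknown text_engine: {text_engine!r}")
    if send_mode is not None and send_mode not in SEND_MODES:
        raise ValueError(f"unknown send_mode: {send_mode!r}")
    args = {"name": name}
    optional = {
        "text_engine": text_engine,
        "send_mode": send_mode,
        "description": description,
        "prompt_id": prompt_id,
    }
    # Omit unset fields so the server applies its own defaults.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


args = build_create_agent_args("Support triage", text_engine="ai_assisted")
print(args)  # {'name': 'Support triage', 'text_engine': 'ai_assisted'}
```

Passing `text_engine` explicitly, as in the usage line, avoids depending on which engine the server treats as the default.
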
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a non-read-only, non-destructive operation, which aligns with creation. The description adds context about execution modes and their behaviors, but doesn't disclose potential failure modes (e.g., duplicate names) or required permissions. However, it does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose and well-organized into execution modes. However, the 'Execution modes' section could be slightly more concise, as it repeats some information already in the schema.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description does not specify return values, which is a minor gap. It covers the core functionality and execution modes thoroughly. For a creation tool with many siblings, it provides sufficient context for correct selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage, so baseline is 3. The description adds value by explaining defaults (OMIT for send_mode and text_engine) and detailing the text_engine enum options beyond the schema. However, it does not elaborate on other parameters like name constraints or prompt_id usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Create a new AI agent in the workspace' and differentiates it from siblings like agents_update or agents_delete by focusing solely on creation. The specific verb+resource pair is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use each execution mode (ai_assisted, agentic, rule_based) with concrete 'Best for' examples. It also directs users to agents.update for setting keywords, clarifying boundaries between tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_delete (B)

Permanently delete an AI agent.

WARNING: This cannot be undone. The agent and all its triggers will be removed.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent_id | Yes | ID of the agent to delete | |

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims permanent deletion and irreversibility, but annotations set destructiveHint=false, contradicting the described behavior. This is a serious inconsistency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two front-loaded sentences with no wasted words: purpose stated first, warning second.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple deletion tool: warns about permanence and trigger removal. Lacks details on prerequisites but sufficient given low complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter agent_id, so the description adds no additional meaning. Baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action 'permanently delete' and the resource 'AI agent'. Distinguishes from siblings like agents_create and agents_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a warning about irreversibility but no explicit guidance on when to use versus alternatives like agents_update or agents_list. Implies usage for deletion but no exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_get (A)
Read-only, Idempotent

Get detailed information about a specific AI agent.

Returns full agent config including:

  • Execution configuration

  • Tool configuration

  • Knowledge configuration

  • Escalation configuration

  • Triggers list

  • Knowledge collections

  • Custom AI instructions (prompt_text)

  • Auto-reply rules override (auto_reply_rules)

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent_id | Yes | ID of the AI agent to fetch | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the safety profile is clear. The description adds value by detailing the returned configuration fields, surpassing what annotations alone provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loaded with the main purpose, and uses a clear bullet list for return details. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 'get' tool with one required parameter and no output schema, the description is fairly complete. It details return fields but lacks mention of error handling (e.g., agent not found) or response format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (agent_id) with schema description 'ID of the AI agent to fetch'. Schema coverage is 100%, so baseline 3 is appropriate. The description does not add extra context like format, examples, or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get detailed information about a specific AI agent' and lists specific configuration sections returned, distinguishing it from sibling tools like agents_list (which lists all agents) and agents_create/agents_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives (e.g., agents_list for a summary, agents_get for full details), nor does it provide prerequisites or exclusions. Usage is implied but not guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_silence (grade A)
Read-only, Idempotent

End this turn without sending any message. Use when the thread is owned by a human operator after job.escalate, when the guest is self-resolving, when the message is a duplicate, or for observation-only turns. Calling this tool is the ONLY correct way to stay silent — narrated silence text (e.g. '(Staying silent…)', 'Internal:…') would be delivered to the guest verbatim.

Parameters (JSON Schema)
Name | Required | Description | Default
reason | Yes | Free-form explanation for admin audit. Stored in trace_tool_executions.tool_params (ClickHouse String; reason filters are scan-only). | -
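
As an illustration, a client might invoke this tool with a request like the sketch below. The JSON-RPC envelope follows the MCP tools/call convention; the helper name build_tool_call and the reason text are assumptions made for this example and are not taken from the server.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request body (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Stay silent on a duplicate message. The reason string is a made-up example;
# per the parameter table it is free-form text stored for admin audit.
request = build_tool_call(
    "agent_silence",
    {"reason": "Duplicate of the previous inbound message; no reply needed."},
)
print(json.dumps(request, indent=2))
```

The point of the sketch is that silence is expressed as a tool call carrying an audit reason, rather than as reply text, which would be delivered to the guest.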
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate safety. The description adds that this is the ONLY correct way to stay silent, explaining behavioral implications of alternatives beyond structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences that efficiently convey purpose, use cases, and a warning. No extraneous text; front-loaded with core function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one param, no output schema), the description covers all necessary aspects: purpose, when to use, behavioral uniqueness, and parameter detail. Fully adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the single 'reason' parameter. The description adds value by explaining it's for admin audit, storage location, and filter limitations, enhancing the schema's meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool stops the turn without sending a message, and lists specific scenarios (human operator after job.escalate, guest self-resolving, duplicate, observation-only). It contrasts with narrated silence text which would be delivered verbatim, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases and warns against using narrated silence text. While it doesn't directly compare to sibling tools like agent_handoff or agents_ask, the guidance is sufficient for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list (grade A)
Read-only, Idempotent

List all AI agents configured in the workspace.

Returns agents with their basic info, trigger count, and knowledge collection count.

Each agent's description field tells you when that agent is useful. If you're a router-style agent deciding whether to delegate via agent.handoff, read descriptions and pick the best fit.

Use this to:

  • See all configured AI agents

  • Filter by status (active/paused/archived)

  • Get agent IDs for further operations

Parameters (JSON Schema)
Name | Required | Description | Default
status | No | Filter by status ('active' / 'paused' / 'archived'). Omit for all. | -
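
A minimal sketch of how a caller might build the arguments for this tool, assuming the three status values listed above are the only accepted ones. The function name agents_list_arguments is hypothetical.

```python
from typing import Optional

# Status values taken from the parameter description above.
ALLOWED_STATUSES = {"active", "paused", "archived"}


def agents_list_arguments(status: Optional[str] = None) -> dict:
    """Return the arguments object for agents_list (hypothetical helper).

    Omitting status returns agents in every state, so None maps to {}.
    """
    if status is None:
        return {}
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    return {"status": status}


print(agents_list_arguments())          # no filter
print(agents_list_arguments("paused"))  # only paused agents
```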
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds value by detailing the returned data (basic info, trigger count, knowledge collection count) and the role of the description field for routing, which goes beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized into a clear opening, details about return fields, guidance for router agents, and bulleted use cases. It is efficient with no unnecessary words, though slightly longer than strictly needed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple listing tool with no output schema, the description adequately explains what is returned (basic info, counts, descriptions) and the only parameter. It provides sufficient context for an agent to use the tool correctly, especially for discovering and selecting agents.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description covers the status parameter completely (enum values and explanation). The tool description reinforces the filtering use case but does not add new semantic information beyond what is in the schema. With 100% schema coverage, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all AI agents configured in the workspace' with a specific verb and resource. It also lists the return fields (basic info, trigger count, knowledge collection count). It does not explicitly differentiate itself from sibling listing tools like agents_list_drafts, though the purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides use cases: see all agents, filter by status, get IDs. It also advises router agents on how to use agent descriptions for delegation. However, it does not explicitly state when not to use this tool or contrast it with alternatives like agents_get or agents_list_drafts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_drafts (grade A)
Read-only, Idempotent

List pending agent drafts awaiting approval.

Shows drafts that have been generated by AI agents but not yet sent. Each draft includes:

  • Thread/conversation info

  • Trigger message (what prompted the reply)

  • Generated response text

  • Creation time and expiration

Use this when user asks:

  • 'Show pending agent drafts'

  • 'What messages are waiting for approval?'

  • 'List drafts to approve'

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | Maximum number of drafts to return | -
thread_id | No | Filter by specific thread ID (optional) | -
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds value by disclosing that drafts are 'awaiting approval' and what each draft includes (thread info, trigger message, generated response, creation time, expiration). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: starts with the core purpose, lists included details, and ends with example queries. Every sentence serves a purpose, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only listing tool with two optional parameters and no output schema, the description adequately explains what the tool returns and when to use it. It could mention expiration details but is otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add extra meaning beyond the schema's parameter descriptions, which are already clear for 'limit' and 'thread_id'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List pending agent drafts awaiting approval.' It specifies the resource (drafts) and action (list), and distinguishes from sibling tools like agents_approve_draft and agents_reject_draft by focusing on the listing aspect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit user queries that indicate when to use this tool, such as 'Show pending agent drafts' and 'List drafts to approve'. While it doesn't explicitly state when not to use it, the examples give clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_list_files (grade A)
Read-only, Idempotent

List files directly attached to this agent (agent-specific files, not shared collections).

Returns file_id, title, status, and chunk_count for each file. chunk_count shows how many indexed chunks were created — 0 means the file is still processing.

Use agents.add_file to attach a new file, or agents.remove_file to detach one.

Parameters (JSON Schema)
Name | Required | Description | Default
agent_id | Yes | ID of the agent whose files to list | -
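
The chunk_count rule described above (0 means the file is still processing) can be sketched as a small client-side check. The response shape used here, a list of objects with file_id, title, status, and chunk_count, is an assumption based on the field names in the description; the sample values, including the status strings, are invented for illustration.

```python
def split_by_indexing_state(files):
    """Separate indexed files from ones that are still processing.

    A chunk_count of 0 is treated as "still processing", as the tool
    description states.
    """
    ready = [f for f in files if f["chunk_count"] > 0]
    processing = [f for f in files if f["chunk_count"] == 0]
    return ready, processing


# Hypothetical response data; real IDs and status values may differ.
sample_files = [
    {"file_id": "file_1", "title": "Pricing FAQ", "status": "ready", "chunk_count": 12},
    {"file_id": "file_2", "title": "Refund policy", "status": "processing", "chunk_count": 0},
]

ready, processing = split_by_indexing_state(sample_files)
print("ready:", [f["file_id"] for f in ready])
print("still processing:", [f["file_id"] for f in processing])
```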
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. Description adds valuable context about chunk_count behavior (0 means still processing), which goes beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Highly concise: two sentences plus a note on chunk_count and references to sibling tools. Front-loaded with purpose, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 required param, no output schema, annotations present), the description covers purpose, usage, and behavioral nuance completely. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'agent_id'. The description does not add extra meaning beyond what the schema provides. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb and resource: 'List files directly attached to this agent'. Distinguishes from sibling tools like collections_list_files by specifying 'not shared collections'. Also mentions return fields.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (list attached files) and references alternative tools: agents.add_file for attaching and agents.remove_file for detaching. Provides context on interpreting chunk_count for processing status.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_history (grade A)
Read-only, Idempotent

List past versions of an agent's prompt_text. Every edit to the agent's prompt is snapshotted to an append-only table — use this tool to browse history, find a prior known-good version, and copy it into agents.prompt_restore.

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | Max versions to return (1-200, default 50) | -
agent_id | Yes | ID of the agent | -
before_version | No | Cursor: return versions strictly below this version_number | -
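
The before_version cursor suggests a simple pagination loop, sketched below under stated assumptions: call_tool is a hypothetical callable that sends a tool call and returns a list of version objects, and each object carries a version_number field. The page does not document the response format, so treat the shape as a guess. fake_call_tool stands in for the server so the sketch runs offline.

```python
from typing import Callable, Iterator


def iter_prompt_versions(call_tool: Callable[[str, dict], list],
                         agent_id: str,
                         page_size: int = 50) -> Iterator[dict]:
    """Yield every stored prompt version, one page at a time."""
    if not 1 <= page_size <= 200:  # range taken from the limit parameter
        raise ValueError("page_size must be between 1 and 200")
    before = None
    while True:
        args = {"agent_id": agent_id, "limit": page_size}
        if before is not None:
            args["before_version"] = before
        page = call_tool("agents_prompt_history", args)
        if not page:
            return
        yield from page
        # The next page must be strictly below the lowest version seen so far.
        before = min(v["version_number"] for v in page)


def fake_call_tool(name: str, args: dict) -> list:
    """In-memory stand-in for the server with five versions (assumed shape)."""
    versions = [{"version_number": n, "prompt_text": f"prompt v{n}"}
                for n in range(5, 0, -1)]
    before = args.get("before_version")
    if before is not None:
        versions = [v for v in versions if v["version_number"] < before]
    return versions[: args["limit"]]


numbers = [v["version_number"]
           for v in iter_prompt_versions(fake_call_tool, "agent-123", page_size=2)]
print(numbers)  # [5, 4, 3, 2, 1]
```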
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that the table is append-only and that every edit is snapshotted, which is useful. However, it does not describe pagination or the return format, which are behavioral aspects beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no fluff. First sentence defines the action and resource, second sentence provides usage guidance. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks details about the return structure (fields like version_number, timestamp, prompt_text). Since there is no output schema, the description should compensate. It does imply a list of versions, and the workflow with `agents_prompt_restore` suggests each entry carries a version_number, but the remaining fields are left for the agent to guess.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters. The description does not add additional meaning beyond what the schema provides. Baseline of 3 is appropriate given high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it lists past versions of an agent's prompt_text (verb+resource+scope). It mentions the append-only table and the purpose of browsing history. However, it does not explicitly differentiate from the sibling tool `prompts_prompt_history`, which may serve a similar role for prompts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly guides the agent to use this tool to browse history, find a prior known-good version, and copy it into `agents.prompt_restore`. This provides a clear workflow, but it does not say when not to use the tool or mention alternatives such as `prompts_prompt_history`.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_prompt_restore (grade A)

Restore a past version of an agent's prompt_text by version_number. Creates a new version pointing at the restored content — history is preserved. Use agents.prompt_history first to find the version_number you want.

Parameters (JSON Schema)
Name | Required | Description | Default
reason | No | Optional: why this restore is happening (shows up in history UI) | -
agent_id | Yes | ID of the agent | -
version_number | Yes | The version_number to restore (get it from agents.prompt_history) | -
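
The "restore creates a new version, history is preserved" behavior can be pictured with a toy in-memory model. This is not the server's implementation: the list-of-dicts history, the restored_from field, and the choice to copy the text rather than point at it are all assumptions made to keep the sketch self-contained.

```python
from typing import Optional


def restore_version(history: list, version_number: int,
                    reason: Optional[str] = None) -> list:
    """Return a new history with the chosen version appended as the latest.

    Existing entries are never modified or removed (append-only model).
    """
    source = next((v for v in history if v["version_number"] == version_number), None)
    if source is None:
        raise KeyError(f"version {version_number} not found")
    new_version = {
        "version_number": max(v["version_number"] for v in history) + 1,
        "prompt_text": source["prompt_text"],  # copied here; the real tool may reference it
        "restored_from": version_number,       # hypothetical field
        "reason": reason,
    }
    return history + [new_version]


history = [
    {"version_number": 1, "prompt_text": "Be concise."},
    {"version_number": 2, "prompt_text": "Be concise and friendly."},
]
updated = restore_version(history, 1, reason="v2 regressed tone")
print([v["version_number"] for v in updated])  # [1, 2, 3]
print(updated[-1]["prompt_text"])              # Be concise.
```

Under this model a restore is itself reversible: restoring version 2 afterwards would simply append a version 4.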
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool creates a new version and preserves history, which is consistent with annotations (readOnlyHint=false indicating mutation, destructiveHint=false). Adds context beyond annotations about the versioning mechanism.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the core action and key behavioral trait (history preserved), then providing usage guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains the effect (creates a new version). It references the prerequisite tool. Missing minor details like error handling, but overall complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters. The description adds value by specifying that version_number comes from agents.prompt_history and that reason appears in history UI, enhancing understanding beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool restores a past version of an agent's prompt_text by version_number, creating a new version while preserving history. It distinguishes itself from sibling tools like agents_prompt_history (which lists versions) and prompts_prompt_restore (for prompts vs agents).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs the agent to use agents.prompt_history first to find the version_number, providing clear context for when to use this tool. It could be more explicit about when not to use it, but the guidance is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_reject_draft (grade A)

Reject a pending agent draft without sending.

The draft will be marked as rejected and won't be sent. Use this when the generated response isn't appropriate.

Use this when user says:

  • 'Reject this draft'

  • 'Don't send this'

  • 'Cancel this reply'

  • 'Delete this draft'

  • 'This response is wrong'

Parameters (JSON Schema)
Name | Required | Description | Default
reason | No | Optional reason for rejection (for logging/feedback) | -
draft_id | Yes | ID of the draft to reject | -
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains the effect: 'The draft will be marked as rejected and won't be sent.' This adds behavioral context beyond the annotations (which don't indicate destructive behavior). There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with only five short sentences, each serving a purpose. No wasted words, and the most important information (action, effect, usage triggers) is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool (2 parameters, no output schema), the description fully explains what the tool does, when to use it, and what happens to the draft. It is complete for the agent's decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters. The description adds minor context for 'reason' ('for logging/feedback'), but does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Reject a pending agent draft without sending.' It specifies the resource (agent draft) and the action (reject), and distinguishes it from siblings like agents_approve_draft.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use scenarios with example user phrases like 'Reject this draft', 'Don't send this', etc., and states the condition 'when the generated response isn't appropriate.' This offers clear guidance for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_remove_file (grade A)

Remove a file from this agent's private knowledge.

The file itself is not deleted — it's just detached from this agent. Use agents.list_files to find the file_id to remove.

Parameters (JSON Schema)
Name | Required | Description | Default
file_id | Yes | ID of the file to detach (from agents.list_files) | -
agent_id | Yes | ID of the agent to remove the file from | -
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-read-only and non-destructive. The description adds valuable context that the file is not deleted but detached, which is not obvious from the annotations alone. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: two essential sentences plus a practical tip. All information is front-loaded and no word is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two required parameters and no output schema, the description covers the purpose, effect, parameter source, and non-destructive nature. It is complete and leaves no ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for both parameters. The description adds extra value by explaining how to obtain the file_id (via agents.list_files), which aids in correct parameter selection beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Remove a file from this agent's private knowledge.' It specifies the verb (remove) and the resource (file from agent), and distinguishes from siblings like agents_delete and agents_add_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear guidance: the file is only detached, not deleted, and directs to use agents.list_files to get the file_id. This helps the agent understand prerequisites and the non-destructive nature, though it does not explicitly state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_simulate_inbound (grade A)
Read-only, Idempotent

Replay an inbound message on a thread through the real trigger pipeline and return what would have happened. The router auto-picks the winning enabled agent + trigger by priority/specificity (same logic as production). By default send_mode='draft' so no real message is sent; pass send_mode='auto' on a test account to let the matched agent actually deliver (drafts get overwritten by the next draft, so 'auto' is the only way to verify Telegram/email delivery end-to-end).

Use to verify routing for a thread: which agent answers, which trigger wins, or — when nothing matches — the structured skip reason. Pass blockchain_tx_data instead of message_text to simulate a blockchain:transfer event on the thread.

Returns: {matched: true, matched_agent: {id, name, execution_mode}, matched_trigger: {id, trigger_type, conditions, specificity_score}, routing_reason, response_text, messages[], execution_mode, send_mode, model_used, tokens_input, tokens_output, latency_ms, rag_queries_made, rag_results_used} on a hit, or {matched: false, skip_reason, simulator_warnings} on a miss.
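
A minimal sketch of how a caller might branch on the two result shapes documented above. The field names come from the Returns line; the function name summarize_simulation and every sample value are invented for illustration.

```python
def summarize_simulation(result: dict) -> str:
    """Turn a simulation result into a one-line routing summary."""
    if result.get("matched"):
        agent = result["matched_agent"]
        trigger = result["matched_trigger"]
        return (
            f"agent {agent['name']} ({agent['id']}) answered via trigger "
            f"{trigger['id']} [{trigger['trigger_type']}], "
            f"send_mode={result['send_mode']}"
        )
    # On a miss the tool returns a structured skip reason instead.
    return f"no match: {result['skip_reason']}"


# Hypothetical results; real IDs, trigger types, and skip reasons will differ.
hit = {
    "matched": True,
    "matched_agent": {"id": "agt_1", "name": "Support", "execution_mode": "auto"},
    "matched_trigger": {"id": "trg_9", "trigger_type": "channel:message:new",
                        "conditions": {}, "specificity_score": 3},
    "routing_reason": "highest priority",
    "send_mode": "draft",
}
miss = {"matched": False, "skip_reason": "no_enabled_trigger", "simulator_warnings": []}

print(summarize_simulation(hit))
print(summarize_simulation(miss))
```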

Parameters (JSON Schema)
Name | Required | Description | Default
send_mode | No | How the matched agent should deliver its reply. 'draft' (default, safe) creates a draft only: no real send, no idempotency key. 'auto' lets the agent deliver through the channel adapter exactly as it would in production; use this on a test account to verify Telegram/email delivery end-to-end. Drafts get overwritten by the next draft on the thread, so 'auto' is required when you want to see the message persisted. | draft
thread_id | Yes | Thread ID to route the simulated event from. Must belong to the API key's workspace. | -
message_text | No | Inbound message body to simulate. Defaults to '[MCP simulation test]' when omitted. | -
system_message | No | Tag the simulated inbound as a system/service-message row (missed call, group join, pinned message, etc.) so the `excluded_system_message_kinds` trigger filter can be exercised end-to-end. Shape: {"category": <one of: call_event, membership_change, contact_signup, pinned_message, chat_metadata_change, voice_chat_event, other_service>, "native_kind": <free-form upstream event class name, e.g. 'MessageActionPhoneCall'>}. The category is written into `message.meta.system_message` (mirroring the real Telegram ingest path) AND surfaced on the synthetic IncomingEvent so the trigger evaluator honors the block-list. Omit for a normal text-message simulation. | -
blockchain_tx_data | No | When set, simulate a blockchain:transfer event instead of a channel:message:new event. Expected keys: chain, to_address / from_address, tx_hash. | -
attachment_file_ids | No | Optional list of workspace file IDs to attach to the simulated inbound message, the same shape as a real Telegram message with image/document attachments. Use this to test agent behavior on incoming messages that carry images (e.g. logos for invoices) or documents the agent must reference. File IDs must belong to the API key's workspace. | -
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description aligns with annotations (readOnlyHint, destructiveHint, idempotentHint). Discloses default draft mode prevents real sends, and warns auto mode is only for test accounts. Explains overwrite behavior of drafts. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections for purpose, usage, and return format. Slightly verbose, but every sentence adds value; it could be tightened without losing effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all important aspects: core functionality, use cases, parameter details, return structure for both hit and miss, edge cases like blockchain and system messages. No output schema, but description provides comprehensive return format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds significant value beyond schema: explains send_mode's effects, system_message shape and purpose, blockchain_tx_data expected keys, attachment_file_ids usage. Each parameter's semantics are enriched.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool replays an inbound message through the trigger pipeline to check routing, using specific verbs and resources. It distinguishes from siblings like agents_ask by focusing on simulation rather than direct interaction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases: verifying routing, determining the matched agent/trigger, and finding skip reasons. Advises using 'auto' only on test accounts. Could be improved by explicitly stating when not to use it (e.g., for actual messaging).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_task_complete A

Report that a Claude Code agent task has been completed. Call this when you finish processing an agent_task from DialogBrain.

Parameters

  • success (required): Whether the task completed successfully

  • summary (optional): Brief summary of what was done

  • trace_id (required): Trace ID from the agent task event
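As a rough illustration of the three parameters, the arguments for a completion report might be assembled like this. The helper name `task_complete_args` and the string type for `trace_id` are assumptions; the listing only says the ID comes from the agent task event.

```python
import json
from typing import Optional


def task_complete_args(trace_id: str, success: bool, summary: Optional[str] = None) -> dict:
    """Build arguments for agents_task_complete; summary is left out when not given."""
    args = {"trace_id": trace_id, "success": success}
    if summary is not None:
        args["summary"] = summary
    return args


# "trace-123" is a placeholder; a real value comes from the agent task event.
args = task_complete_args("trace-123", True, "Replied to the customer")
print(json.dumps(args, indent=2))
```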
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false. The description adds that this tool reports task completion, implying a state change. However, it does not detail side effects (e.g., what happens if called multiple times) or other behavioral nuances beyond what annotations already convey. The description is adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the purpose and usage condition. Every word serves a function with no redundancy. It is appropriately concise for the tool's simplicity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 parameters, no output schema, no nested objects), the description covers the essential points: what it does and when to use it. It does not explain the return value or error handling, but for a reporting tool this is likely sufficient. Slightly more detail on prerequisites or effects could improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptive parameter names and descriptions. The description adds minimal parameter context by mentioning 'agent_task from DialogBrain' which relates to trace_id, but does not elaborate on success or summary beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Report') and resource ('that a Claude Code agent task has been completed'), clearly distinguishing this tool from sibling tools like agents_create or agents_ask. It also specifies the triggering context ('finish processing an agent_task from DialogBrain'), leaving no ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to call the tool: 'when you finish processing an agent_task from DialogBrain.' This provides clear usage context. Although it does not explicitly list alternatives or when not to use, the condition is specific enough to guide the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trace_get A
Read-only, Idempotent

Fetch the full execution detail for a single trace — tool executions, events timeline, LLM call spans (with error_message on failures).

Use after agents.traces_list identifies a specific trace of interest (failed run, slow run, unexpected outcome).

By default LLM system_prompt and prompt_messages are stripped — set include_llm_bodies=true to fetch them when diagnosing prompt engineering issues (emits a WARNING audit log). Set full=true to disable all field truncation. completion_text on failed LLM calls is always returned (capped at 8 KB).

Parameters

  • full (optional): Disable all field truncation. Escape hatch for a human operator. OMIT for the standard truncated view.

  • agent_id (required): Expected agent_id — used for scope validation. Mismatch returns not_found.

  • trace_id (required): Trace identifier returned by agents.traces_list.

  • include_llm_bodies (optional): Include system_prompt and prompt_messages in LLM spans. Audited at WARNING level. OMIT to keep them stripped (the default).
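The "OMIT for the default" guidance can be sketched as an argument builder that only sets the optional flags when they are explicitly requested. The helper name `trace_get_args` and the example IDs are hypothetical.

```python
def trace_get_args(agent_id, trace_id, include_llm_bodies=False, full=False) -> dict:
    """Build arguments for agents_trace_get, omitting optional flags unless enabled."""
    args = {"agent_id": agent_id, "trace_id": trace_id}
    if include_llm_bodies:
        # Per the description, this is audited at WARNING level.
        args["include_llm_bodies"] = True
    if full:
        # Disables all field truncation; described as an operator escape hatch.
        args["full"] = True
    return args


# Placeholder IDs; real values come from agents.traces_list.
default_view = trace_get_args(42, "trace-123")
prompt_debug = trace_get_args(42, "trace-123", include_llm_bodies=True)
print(default_view)
print(prompt_debug)
```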
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnly, idempotent), the description reveals default stripping of LLM bodies, audit logging when including them, truncation behavior, and an 8 KB cap on completion_text, which is critical behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: one clear sentence on purpose, followed by usage guidance and parameter behavior. No redundant information, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, and solid annotations, the description covers return details, default behaviors, edge cases (stripped bodies, audit, truncation, capped text), and practical use cases. Fully adequate for agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds explanations for each parameter: `full` as an escape hatch, `include_llm_bodies` as audited, `trace_id` from traces_list, `agent_id` for validation. This adds meaningful context beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches full execution detail for a single trace, listing specific components (tool executions, events timeline, LLM call spans) and distinguishes from the sibling tool `agents.traces_list` which lists traces.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly recommends use after `agents.traces_list` to drill into specific traces, providing context (failed run, slow run, unexpected outcome). Does not explicitly exclude alternatives but the guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_traces_list A
Read-only, Idempotent

List recent execution traces for an agent — the same data as /admin/requests, scoped to one agent and readable by an LLM.

Use this when an agent call timed out, drafted the wrong response, or you want to know which tool/LLM call burned the latency. Pair with agents.trace_get for full detail on a specific trace.

Filters: status, success, source (single value or comma-separated: agent,voice), date_from/date_to (ISO-8601), pagination via limit/offset.

Returns returned_count, dropped_on_page (should be 0 — positive means the backend agent_id predicate let something through), and has_more. Edge case: a raw page of all-dedup-dropped rows yields returned_count=0, has_more=true; re-call with offset += limit.
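The pagination edge case above (an empty page that still reports `has_more=true`) suggests looping on `has_more` rather than on the row count. A minimal sketch follows, assuming a caller-supplied `call_tool` function that sends the arguments and returns the parsed response, and assuming the rows arrive under a `traces` key; neither is confirmed by the listing.

```python
from typing import Callable, Iterator


def iter_traces(call_tool: Callable[[dict], dict], agent_id: int,
                limit: int = 50, max_pages: int = 100) -> Iterator[dict]:
    """Yield traces page by page, advancing offset by limit while has_more is true."""
    offset = 0
    for _ in range(max_pages):  # hard stop so a misbehaving backend cannot loop forever
        page = call_tool({"agent_id": agent_id, "limit": limit, "offset": offset})
        # "traces" is an assumed key name for the returned rows.
        yield from page.get("traces", [])
        if not page.get("has_more"):
            break
        offset += limit


def fake_call_tool(args: dict) -> dict:
    """Stand-in for the real tool call: first page is empty but has_more is true."""
    pages = {
        0: {"traces": [], "returned_count": 0, "has_more": True},
        2: {"traces": [{"trace_id": "t-1"}], "returned_count": 1, "has_more": False},
    }
    return pages[args["offset"]]


traces = list(iter_traces(fake_call_tool, agent_id=42, limit=2))
print(traces)
```

Stopping when `returned_count` is 0 would end the loop on the first page here and miss the trace on the second.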

Parameters

  • limit (optional): Max rows per page (1–100).

  • offset (optional): Rows to skip for pagination. OMIT to start at row 0 (default).

  • source (optional): Filter by trace source. Single value or comma-separated, e.g. 'agent,voice'. Values: agent / auto_reply / agentic / outreach / voice. Note: source='agent' also matches voice traces today (known upstream bug).

  • status (optional): Filter by status. OMIT to include all statuses.

  • date_to (optional): ISO-8601 upper bound on created_at.

  • success (optional): Filter to succeeded (true) or failed (false) runs only. OMIT to include both.

  • agent_id (required): Agent ID to pull traces for (must belong to your workspace).

  • date_from (optional): ISO-8601 lower bound on created_at, e.g. '2026-04-10T00:00:00Z'.
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false. The description adds behavioral context beyond this: it explains the agent_id predicate dropping rows, the `returned_count` vs `dropped_on_page` fields, and the edge case of all rows being dropped. It also mentions a known upstream bug for source filtering, which is valuable for the agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized into clear sections (purpose, usage, filters, response) but is somewhat lengthy. Every sentence adds information, but it could be tightened slightly without loss.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 parameters, no output schema, and complex filtering, the description covers purpose, usage, filter semantics, pagination, return fields, and an edge case. It is fully adequate for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining response fields (returned_count, dropped_on_page, has_more) and the edge case for pagination, which the schema does not cover. However, most parameter semantics are already in the schema, so the increase is modest.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List recent execution traces for an agent' and distinguishes it from sibling tools like `agents_trace_get` (for full detail) and `agents_traces_stats`. It also references the same data as /admin/requests, providing a clear verb-resource-scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit use cases ('when an agent call timed out, drafted the wrong response, or you want to know which tool/LLM call burned the latency') and recommends pairing with `agents.trace_get` for full detail. It also lists filters and pagination, making when-to-use clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_traces_stats A
Read-only, Idempotent

Aggregated trace statistics for one agent over the last N days — total runs, success rate, avg duration, error breakdown, top tools used, runs-per-day histogram.

Use this when you want a bird's-eye view of an agent's health before diving into individual traces with agents.traces_list / agents.trace_get. Scoped to the target agent (exact match, no substring bleed). days is capped at 30 — matches the ClickHouse request_traces TTL.

Parameters

  • days (optional): Rolling window in days (1–30).

  • agent_id (required): Agent ID to compute stats for (must belong to your workspace).
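Since `days` is limited to 1 through 30, a client can check the window before calling. This sketch uses a hypothetical helper `traces_stats_args`; the listing does not show a default for `days`, so the key is simply omitted when no value is given.

```python
from typing import Optional


def traces_stats_args(agent_id: int, days: Optional[int] = None) -> dict:
    """Build arguments for agents_traces_stats, enforcing the 1-30 day window."""
    args = {"agent_id": agent_id}
    if days is not None:
        if not 1 <= days <= 30:
            raise ValueError("days must be between 1 and 30")
        args["days"] = days
    return args


# Placeholder agent ID.
weekly = traces_stats_args(42, days=7)
print(weekly)
```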
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, but the description adds significant behavioral context: the exact-match scoping, days cap due to TTL, and the aggregated nature of the stats (not per-trace). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with key purpose and metrics, no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a stats aggregation tool with two parameters and no output schema, the description completely covers the output (list of metrics), usage context (bird's-eye view), constraints (exact match, days cap), and relationship to siblings. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides complete descriptions for both parameters (days with min/max/default, agent_id with workspace ownership). The description adds context about the days cap and scoping but does not add new semantics beyond what the schema provides. Baseline 3 is appropriate given 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description names a specific resource ('Aggregated trace statistics for one agent'), clearly lists the metrics (total runs, success rate, etc.), and distinguishes the tool from its siblings by presenting it as an aggregated bird's-eye view to consult before diving into individual traces with agents.traces_list / agents.trace_get.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use ('when you want a bird's-eye view of an agent's health before diving into individual traces'), mentions the scope (exact agent match, no substring bleed), and states the days cap (30) tied to TTL. Provides clear context on when to use vs alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trigger_create A

Create a new trigger for an AI agent.

Triggers determine when the agent activates.

Trigger types:

  • incoming_message: Activates on new incoming messages

  • schedule: Activates on a schedule

  • webhook: Activates on webhook events

  • event: Activates on system events

Parameters

  • enabled (optional): Whether the trigger is enabled. OMIT to use the default (true).

  • agent_id (required): ID of the agent to create a trigger for

  • priority (optional): Trigger priority — lower numbers run first (default: 100)

  • send_mode (optional): Send mode override for this trigger. OMIT to inherit from the agent.

  • conditions (optional): Trigger conditions (JSON).

    Supported fields for incoming_message:
    - keywords: ["pricing","demo"] — message must contain keyword(s) (free, no LLM cost)
    - keyword_match: "any" (default, OR) or "all" (AND)
    - channel_types: ["telegram","whatsapp","livechat_voice","twilio_voice","telegram_voice","voice",...] — filter by channel. For voice, use EITHER the three per-channel keys (scoped) OR "voice" alone (wildcard matching all three) — mixing them is redundant. Per-channel keys: "livechat_voice" (web widget), "twilio_voice" (PSTN inbound), "telegram_voice" (Telegram p2p calls)
    - context_types: ["dm","group","channel","livechat"] — filter by chat type
    - group_mode: "mentions_only" or "questions" — for group chats
    - channel_account_ids: ["123"] — restrict to specific accounts
    - folder_ids: [5,10] — restrict to threads in folders
    - ai_tag_ids: [1,2] — restrict to threads with AI tags
    - ai_filter_ids: [1,2] — semantic intent filters (message matched via embedding similarity, works in noisy groups)
    - ai_filter_mode: "any" (default, OR) or "all" (AND) — how multiple AI filters combine
    - ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API). If a filter with the same name already exists, it is reused by id. Prefer referencing existing filters by id when available. Use ai_filters.create + ai_filters.test for fine-tuning before assigning.
    - contact_states: ["active"] — filter by contact state
    - cooldown_seconds: 30 — min gap between runs per thread
    - max_runs_per_thread_per_hour: 5 — rate limit

    Supported fields for job_completed (proactive callback when a delegated job finishes):
    - source_agent_id: <int> — fire only when this agent's job completed
    - source_agent_slug: <str> — alternate to source_agent_id
    - job_type: "agentic_session" — match a specific job type (default: any)
    - outcome: ["completed"] | ["escalated"] | ["completed","escalated"] — default ["completed"]
    - min_duration_seconds: <int> — skip very-short jobs (noise filter)
    - thread_filter: {thread_ids: [<int>...]} — restrict to specific threads

    Supported fields for calendar_event (fires N minutes before a Google Calendar event starts):
    - window_minutes_before: <int 1-1440> — REQUIRED, fire when an event starts within this window
    - channel_account_ids: [<int>...] — restrict to specific calendar accounts (default: all)
    - keywords: ["standup"] — word-boundary match on event title
    - prepare_meet_join: true — pre-invite pool bots to the event (enables unattended Meet join)

    Generic run-mode fields (incoming_message AND calendar_event):
    - run_mode: "text" (default, normal agent run) or "voice" (deterministically join the meeting/call resolved from the event or message — requires send_mode=auto)
    - voice: {speak_first: <bool — greet immediately vs stay silent until addressed>, vision_mode: "off"|"on_demand"|"continuous_0_3fps"}

  • thread_ids (optional): Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, the trigger only fires for messages in these threads. Maps to conditions.thread_filter.thread_ids.

  • trigger_type (required): Type of trigger: 'incoming_message', 'incoming_call', 'schedule', 'webhook', 'event', 'blockchain_event', 'job_completed', or 'calendar_event'
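To make the `conditions` fields more concrete, here is a minimal sketch that assembles arguments for two trigger types. The function names are hypothetical, the field names and limits are taken from the parameter descriptions, and the client-side checks are an assumption about what is worth validating before the call, not a description of server behavior.

```python
from typing import List, Optional


def incoming_message_trigger(agent_id: int, keywords: List[str],
                             channel_types: List[str], cooldown_seconds: int = 30) -> dict:
    """Arguments for a keyword-filtered incoming_message trigger on direct messages."""
    return {
        "agent_id": agent_id,
        "trigger_type": "incoming_message",
        "conditions": {
            "keywords": keywords,
            "keyword_match": "any",          # OR semantics; "all" would require every keyword
            "channel_types": channel_types,
            "context_types": ["dm"],
            "cooldown_seconds": cooldown_seconds,
        },
    }


def calendar_event_trigger(agent_id: int, window_minutes_before: int,
                           run_mode: str = "text", send_mode: Optional[str] = None) -> dict:
    """Arguments for a calendar_event trigger, checking the documented constraints."""
    if not 1 <= window_minutes_before <= 1440:
        raise ValueError("window_minutes_before must be between 1 and 1440")
    if run_mode == "voice" and send_mode != "auto":
        raise ValueError("run_mode='voice' requires send_mode='auto'")
    args = {
        "agent_id": agent_id,
        "trigger_type": "calendar_event",
        "conditions": {"window_minutes_before": window_minutes_before, "run_mode": run_mode},
    }
    if send_mode is not None:
        args["send_mode"] = send_mode  # omitted otherwise, so the agent's mode is inherited
    return args


# Placeholder agent ID 7.
msg = incoming_message_trigger(7, ["pricing", "demo"], ["telegram", "whatsapp"])
cal = calendar_event_trigger(7, 10, run_mode="voice", send_mode="auto")
print(msg)
print(cal)
```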
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide minimal info (readOnlyHint=false, destructiveHint=false). The description adds context about trigger types and conditions, which informs behavior, but does not disclose creation side effects, auth needs, or rate limits. With sparse annotations, the description partially compensates.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with a clear purpose but becomes verbose with a long conditions section. It is structured with bullet points and sections, but could be more concise overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, nested objects, multiple trigger types), the description is comprehensive. It lists four trigger types, and the conditions parameter documents fields for incoming_message, job_completed, and calendar_event, although the schema accepts eight trigger types. No output schema exists, so return values are not required, but the description is complete for configuration.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds significant meaning beyond the schema for the 'conditions' parameter, detailing supported fields for each trigger type. This compensates for the complexity, raising the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new trigger for an AI agent' and explains what triggers are. It lists trigger types with brief explanations, distinguishing it from siblings like agents_trigger_update and agents_trigger_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives. It implies usage through the list of trigger types and conditions, but lacks explicit guidance on choice or when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trigger_delete B

Delete a trigger from an AI agent.

WARNING: This cannot be undone.

Parameters

  • agent_id (required): ID of the agent that owns this trigger

  • trigger_id (required): ID of the trigger to delete
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds a warning 'cannot be undone' which contradicts the destructiveHint=false annotation (Annotation Contradiction). Despite adding behavioral context, the contradiction undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no wasted words. The warning is important and front-loaded effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, and the description does not explain return values or confirmation. It is missing context about what happens after deletion and could mention related tools like agents_trigger_update.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add extra meaning beyond the schema; it only names the tool and provides a warning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete a trigger from an AI agent' using a specific verb and resource. It distinguishes from sibling tools like agents_trigger_create and agents_trigger_update.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Lacks context such as prerequisites or when to avoid deletion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_trigger_update B

Update an existing AI agent trigger.

All parameters are optional — only provided fields will be updated.

Parameters

  • enabled (optional): Enable or disable this trigger. OMIT to leave the enabled flag unchanged.

  • agent_id (required): ID of the agent that owns this trigger

  • priority (optional): Trigger priority — lower numbers run first

  • send_mode (optional): New send mode override. OMIT to leave the send-mode unchanged.

  • conditions (optional): New trigger conditions (replaces existing). Same fields as trigger_create: keywords, keyword_match, channel_types, context_types, group_mode, channel_account_ids, folder_ids, ai_tag_ids, ai_filter_ids, ai_filter_mode, ai_filters: [{id: 1}, {name: "...", description: "..."}] — shorthand: reference existing by id or create inline (calls Voyage embedding API). If a filter with the same name already exists, it is reused by id. contact_states, cooldown_seconds, max_runs_per_thread_per_hour. calendar_event: window_minutes_before (1-1440, required), channel_account_ids, keywords, prepare_meet_join. Generic (incoming_message + calendar_event): run_mode "text"|"voice" (voice requires send_mode=auto), voice: {speak_first, vision_mode}

  • thread_ids (optional): Restrict this trigger to specific threads (chats) by their numeric thread IDs. When set, merged into conditions.thread_filter.thread_ids. If conditions is also provided, thread_ids is merged into it.

  • trigger_id (required): ID of the trigger to update

  • trigger_type (optional): New trigger type. OMIT to keep the existing type unchanged.
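Because omitted fields are left unchanged, a partial update only needs the two required IDs plus the fields being changed. A minimal sketch, with a hypothetical helper `trigger_update_args` that rejects field names not listed in the parameters above:

```python
# Optional fields listed in the agents_trigger_update parameters.
UPDATABLE_FIELDS = {"enabled", "priority", "send_mode", "conditions", "thread_ids", "trigger_type"}


def trigger_update_args(agent_id: int, trigger_id: int, **changes) -> dict:
    """Build a partial-update payload: required IDs plus only the fields being changed."""
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return {"agent_id": agent_id, "trigger_id": trigger_id, **changes}


# Placeholder IDs. Disable a trigger without touching its conditions.
pause = trigger_update_args(7, 3, enabled=False)
print(pause)
```

Note that passing `conditions` replaces the existing conditions rather than patching individual keys, so a caller changing one condition field would need to send the full conditions object.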
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (readOnlyHint, destructiveHint both false), so the description adds value by clarifying 'only provided fields will be updated', indicating a non-destructive, partial update behavior. However, it does not disclose potential side effects, permissions, or behavior of omitted fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences with no redundant information. The first sentence clearly identifies the action, and the second provides key behavioral context. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's complexity (8 parameters, nested objects, no output schema), the description is minimal. It does not explain the return value, error handling, or implications of updating conditions. Leaves significant gaps for an agent to understand the full behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds the insight that only provided fields are updated, but incorrectly claims 'all parameters are optional' when trigger_id and agent_id are required in the schema. This misrepresentation reduces the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it updates an existing AI agent trigger. The verb 'Update' and resource 'AI agent trigger' are specific. It implicitly distinguishes from sibling tools like agents_trigger_create and agents_trigger_delete, but does not explicitly contrast them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., create or delete). The note 'all parameters are optional' hints at partial updates but does not clarify prerequisites or exclusions. Missing explicit when-to-use or when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_update A

Update an existing AI agent's configuration.

All parameters are optional — only provided fields will be updated.

Use this to:

  • Enable or disable an agent

  • Change agent name or description

  • Assign or detach a prompt

  • Change default send mode

  • Replace knowledge collections

  • Update agent status

  • Change agent priority for trigger matching (lower number = higher priority)

  • Override which tools the agent can/can't call on triggered runs

  • Override which context sections (situation, communication style, job state, conversation history, thread summary) the agent receives

  • Opt into boilerplate prompt sections (safety guidelines, data confidentiality, factual accuracy) — all default OFF
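
The partial-update behavior described above can be sketched as a client-side helper that builds an MCP `tools/call` request. This is a minimal sketch under stated assumptions, not the server's implementation: `build_agents_update_call` and the `UNSET` sentinel are hypothetical names, and `agent_123` is a placeholder ID. It assumes the convention stated in the parameter descriptions, where an omitted field is left unchanged and `None` (JSON null) is an explicit value for the parameters that accept it.

```python
import json

# Hypothetical sentinel meaning "do not send this field at all".
UNSET = object()


def build_agents_update_call(agent_id, request_id=1, **fields):
    """Build a JSON-RPC 2.0 `tools/call` request for agents_update.

    Fields set to UNSET are dropped, so the server leaves them unchanged.
    Fields set to None are kept and serialize to JSON null.
    """
    if not agent_id:
        # agent_id is marked as required in the parameter table.
        raise ValueError("agent_id is required")
    arguments = {"agent_id": agent_id}
    for key, value in fields.items():
        if value is not UNSET:
            arguments[key] = value
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "agents_update", "arguments": arguments},
    }


# Pause an agent and detach its prompt; the name is not sent at all.
call = build_agents_update_call(
    "agent_123",  # placeholder ID (assumption)
    status="paused",
    prompt_id=None,
    name=UNSET,
)
print(json.dumps(call, indent=2))
```

Keeping "omitted" and "null" distinct matters here because several parameters treat null as "revert to default" or "detach", which is a different request from leaving the field alone.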

Parameters
Name | Required | Description | Default
name | No | New name for the agent |
model | No | Canonical source for which LLM the agent runs on. To switch models pass JUST this — do NOT also rewrite prompt_text (any 'duty model' section in the prompt is stale doc, not the config). OMIT to leave the model unchanged. |
status | No | Agent status: 'active', 'paused', or 'archived'. OMIT to leave the status unchanged. |
agent_id | Yes | ID of the agent to update |
priority | No | Agent priority for trigger matching. LOWER number = HIGHER priority (wins tiebreaks). Typical range 1-100. Fallback auto-reply agents use 10; specialised/topical agents use 100. When two agents match the same incoming message, the one with the lower priority number fires. |
prompt_id | No | Prompt ID to assign (null to detach) |
send_mode | No | Default send mode: 'auto' or 'draft'. OMIT to leave the send-mode unchanged. |
fast_model | No | Model for the fast-path responder (voice, text auto-reply, agent executor). Defaults to deepseek-chat when unset. Non-Anthropic models (deepseek-chat, gpt-4.1-nano, kimi-k2.6) do NOT use BYOK today — they use the system API key + credits. Pass null to revert to default. |
api_surface | No | OpenAI HTTPS endpoint for this agent's LLM calls (Phase 3a). 'chat_completions' (default, also when null) routes to /v1/chat/completions. 'responses' routes to /v1/responses — required for OpenAI native server tools (web_search, code_interpreter, image_generation, input_file PDFs). Capability still wins: agents whose tool list triggers the server_tool_responses_api substitution always route to Responses regardless of this setting. Ignored on non-OpenAI models (Anthropic, DeepSeek, Moonshot). OMIT to leave the api_surface unchanged. |
description | No | New description for the agent |
prompt_text | No | DESTRUCTIVE — REPLACES the entire system prompt. Pass ONLY when the user explicitly asks to edit/rewrite the prompt. To READ the prompt use prompts.get. When updating other fields (model, name, …) OMIT this. To append, prompts.get first then concatenate. Pass null to revert to the linked template. |
text_engine | No | Text-execution engine: 'agentic', 'ai_assisted', 'rule_based', or 'claude_channels'. Replaces the legacy execution_mode field (20260523_002). Voice is now derived from triggers, not engine. OMIT to leave unchanged. |
denied_tools | No | Block-list of tool IDs the agent must not call on triggered runs. Applied after allowed_tools and default visibility. Empty list [] = clear the block-list. |
allowed_tools | No | Explicit allow-list of tool IDs this agent can call on triggered runs (e.g. ['messages.send', 'agent.handoff']). Empty list [] = clear the allow-list and fall back to system defaults. When set, only these tools (minus denied_tools) are exposed to the agent. Does NOT affect the My AI dropdown path. |
max_iterations | No | Hard cap on agentic-loop turns (LLM round-trips) per run, 1-50 (default 10). Each turn can call tools; the loop stops when the model replies with no tool call OR this cap is hit. Raise it for multi-step tool chains (e.g. browser automation: open → snapshot → fill → confirm → reply) that otherwise exhaust their turns before producing a final answer. OMIT to leave it unchanged. |
vision_enabled | No | Per-agent opt-in for vision content. When true, the executor splices recent image attachments from the active thread into the LLM call (Phase 3a continuous vision for Meet bot screen-share, plus any future channel that uploads images). Requires the agent's model to support vision (model_has_vision check). Default false; new calls pay zero token cost until the operator opts in. OMIT to leave the vision flag unchanged. |
voice_greeting | No | Opening line the agent speaks when the call connects. Pass an empty string "" to clear. Omit or null leaves unchanged. |
voice_stt_model | No | Speech-to-text model: 'flux' (alias for flux-general-en), 'flux-general-en' (English Flux, LLM-powered end-of-turn), 'flux-general-multi' (multilingual Flux), or 'nova-3' (silence-based fallback). Flux variants are more responsive; nova-3 is the fallback when your Deepgram plan lacks Flux. OMIT to leave the STT model unchanged. |
voice_tts_speed | No | TTS playback speed multiplier (0.5-2.0, default 1.0). Yandex/OpenAI/Cartesia only — ignored for Deepgram. |
voice_tts_voice | No | TTS voice id — provider-specific (e.g. 'aura-2-thalia-en' for Deepgram, 'alloy' for OpenAI, 'alena' for Yandex, Cartesia voice UUID). Pass null to revert to provider default. |
auto_reply_rules | No | Plain-English rules injected into the fast model's system prompt as a `## Rules` block. No reserved keywords — the fast model reads them as guidance and decides per turn whether to reply directly or escalate to the main model for tools. Example: '- If the user greets, reply "Hi! How can I help?"\n- If the user asks what you can do, reply with a 1-sentence summary\n- If the question needs live data (prices, stock, booking), escalate' Engagement filtering (SKIP) belongs in trigger `conditions` (keywords, ai_filters, channel_types, cooldown), NOT here — if a message should be ignored the trigger shouldn't have fired. Pass null to clear. |
voice_max_tokens | No | Max TTS tokens per voice reply (40-200, default 100). Lower = snappier, higher = more detail. |
include_job_state | No | Include current job state (active job context, tasks, notes) in the agent's prompt. OMIT to leave this flag unchanged. |
include_situation | No | Include situation context (channel, sender info, trigger type) in the agent's prompt. OMIT to leave this flag unchanged. |
voice_stt_keyterms | No | Domain-vocab bias for STT — names, product SKUs, etc. Passed verbatim as repeated `&keyterm=<w>` query params. Works on both Nova-3 and Flux. Prefer short phrases over full sentences. Empty list [] = no bias. Omit leaves unchanged. |
voice_stt_language | No | STT language hint. 'multi' (default) enables code-switching; singletons like 'en', 'ru', 'es' give higher accuracy when the caller language is known. Use 'multi' for bilingual callers. OMIT to leave the STT language unchanged. |
voice_tts_language | No | TTS language code, BCP-47 lite e.g. 'en', 'es', 'pt-BR' (Cartesia only, default 'en'). |
voice_tts_provider | No | Text-to-speech provider: 'deepgram' (default, Aura-2 EN-only), 'openai' (multilingual), 'yandex' (best Russian), or 'cartesia' (Sonic-3 ultra-low TTFB). OMIT to leave the TTS provider unchanged. |
include_specialists | No | Inject a [SPECIALISTS] block (~50–200 tokens) listing the workspace's delegation-capable agents so a router-style agent can pick a handoff target without first calling agents.list. Default OFF for new agents; the Router template ships with this ON. Agentic mode only. OMIT to leave this flag unchanged. |
voice_primary_model | No | Primary LLM for voice turns (e.g. 'gpt-4.1-mini', 'claude-haiku-4-5-20251001'). gpt-4.1-nano is too weak for reliable turn tracking; mini is the recommended floor. Pass null to revert to default. |
fast_prompt_override | No | Full fast-path prompt override. Placeholders substituted via .replace(): {message}, {history}, {rules}, {tools}, {output_contract}. agent.prompt_text is NOT injected into fast_prompt_override — include it yourself if you want it. Pass null to clear. |
voice_filler_enabled | No | Emit 'thinking' filler audio while tools run so the caller hears life on the line (default true). OMIT to leave this flag unchanged. |
voice_max_tool_calls | No | Max tool calls per voice turn (1-10, default 3). OMIT to leave unchanged. |
voice_thinking_texts | No | Pool of phrases spoken while the agent sets up the turn before calling the LLM (e.g. ['Hmm', 'So', 'One sec']). Pre-rendered to PCM at call start; one is picked at random per turn so the agent doesn't repeat the same word. Pass [] to clear. Omit or null leaves unchanged. |
include_learned_style | No | Include learned communication style (per-contact tone, dormancy state) in the agent's prompt. OMIT to leave this flag unchanged. |
include_thread_summary | No | Include condensed summary of older thread messages in the agent's prompt. OMIT to leave this flag unchanged. |
include_factual_accuracy | No | Inject the Factual Accuracy block (~100 tokens, generic anti-hallucination rules) into the system prompt. Default OFF — skip if you write domain-specific accuracy rules in Instructions. Agentic mode only. OMIT to leave this flag unchanged. |
knowledge_collection_ids | No | Replace all knowledge collections with these IDs (empty list = clear all) |
include_safety_guidelines | No | Inject the generic Safety Guidelines block (~80 tokens) into the system prompt. Default OFF — enable only if you don't already write safety rules in your Instructions. Agentic mode only. OMIT to leave this flag unchanged. |
include_tool_call_history | No | Include the agent's own tool calls and results from the last 3 runs on this thread, compacted to IDs + top hits (~200-1000 tokens). Lets the agent recall file IDs, search hits, and decisions it already made across turns. Default ON. Agentic mode only. OMIT to leave this flag unchanged. |
voice_endpointing_min_delay | No | Silence after end-of-utterance before agent replies (0.1-2.0s, default 0.3). Higher = fewer false interrupts; lower = snappier. |
voice_preemptive_generation | No | Speculatively start the LLM on STT partials so the agent begins responding before end-of-utterance. Matches LiveKit stock template. Default true. OMIT to leave this flag unchanged. |
include_conversation_history | No | Include recent messages from this thread (up to 20) in the agent's prompt. OMIT to leave this flag unchanged. |
include_data_confidentiality | No | Inject the Data Confidentiality block (~250 tokens, cross-contact PII isolation + prompt-injection defense) into the system prompt. Recommended for multi-tenant workspaces. Default OFF. Agentic mode only. OMIT to leave this flag unchanged. |
voice_greeting_interruptible | No | Allow the caller to barge in during the opener TTS. Default true (trial-friendly — long greetings can be interrupted). Set false on outbound-call agents whose configured opener would otherwise get preempted by the caller's 'Hello?' triggering an off-script auto-turn. OMIT to leave this flag unchanged. |
voice_interruption_min_duration | No | Min caller speech duration to interrupt the agent (0.1-1.5s, default 0.25). Higher = ignore short fillers like 'uh-huh'. |
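
The interaction between `allowed_tools` and `denied_tools` can be sketched as a small resolution function. This is a minimal sketch based only on the parameter descriptions, not the server's actual logic: `visible_tools` is a hypothetical name, `contacts.search` is an invented tool ID, and the contents of the "system defaults" are assumed. It also assumes an allow-list is used as given rather than intersected with the defaults, which the descriptions do not specify.

```python
def visible_tools(default_tools, allowed_tools=None, denied_tools=None):
    """Return the tool IDs exposed on a triggered run, in stable order.

    An empty or missing allow-list falls back to the defaults; the
    block-list is applied last, whichever base list was chosen.
    """
    base = allowed_tools if allowed_tools else default_tools
    denied = set(denied_tools or [])
    return [tool for tool in base if tool not in denied]


# Assumed defaults; 'contacts.search' is a made-up ID for illustration.
defaults = ["messages.send", "agent.handoff", "contacts.search"]

no_overrides = visible_tools(defaults)
allow_then_deny = visible_tools(
    defaults,
    allowed_tools=["messages.send", "agent.handoff"],
    denied_tools=["agent.handoff"],
)
cleared_allow_list = visible_tools(
    defaults, allowed_tools=[], denied_tools=["contacts.search"]
)
print(no_overrides, allow_then_deny, cleared_allow_list, sep="\n")
```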
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=false and destructiveHint=false, which the description does not contradict. However, the description does not disclose that some parameters (e.g., prompt_text) are destructive. The parameter schema carries part of that burden, but the main description could be more transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose, but the lengthy bullet list (10 items) could be trimmed. It is informative, but not maximally concise given that the schema already documents every parameter in detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers many use cases but lacks information on return values (no output schema) and error conditions. Additional context would be beneficial for a tool with 46 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The bullet list in the description summarizes some parameters but does not add significant meaning beyond the detailed schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing AI agent's configuration.' It lists specific use cases (e.g., enable/disable, change name, assign prompt), making the tool's purpose obvious and distinct from siblings like agents_create and agents_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'Use this to:' bullet list provides explicit guidance on when to use the tool. However, it does not explicitly contrast with siblings (e.g., when NOT to use it), which would improve decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agents_update_from_template A

Update a forked agent's instructions (prompt) to the latest version of the system template it was created from.

Use when the platform has improved a template and the user wants their forked agent to pick up the new prompt. This OVERWRITES the agent's prompt_text with the template's current prompt — any customizations to the prompt are replaced (recoverable via prompt history). Tool/model/execution settings are NOT changed. Only works on agents forked from a template (not from-scratch agents or templates themselves).

Parameters
Name | Required | Description | Default
agent_id | Yes | ID of the forked agent to update from its template |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits beyond annotations: it overwrites prompt_text (with recoverability via history), does not change tool/model/execution settings, and only works for forked agents. This adds significant context that the annotations (only readOnlyHint and destructiveHint) do not cover. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured, with a clear first sentence stating the core action, followed by detailed usage guidelines and behavioral notes. Every sentence adds value, and there is no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema) and adequate annotations, the description fully covers the necessary context: what changes, what does not, conditions for use, and recovery. It leaves no ambiguity for an AI agent to misapply the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes agent_id thoroughly ('ID of the forked agent to update from its template'), so schema coverage is 100%. The description adds the contextual constraint that the agent must be forked from a template, but this is implied in the parameter description. Thus, the description adds minimal additional meaning over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the verb ('update') and resource ('forked agent's instructions') and distinguishes it from sibling tools like agents_update by specifying it updates from the originating template. It explicitly states the action: 'Update a forked agent's instructions (prompt) to the latest version of the system template it was created from.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use ('Use when the platform has improved a template...') and includes exclusions ('Only works on agents forked from a template (not from-scratch agents or templates themselves)'). It also provides clear context about what is overwritten and what is preserved.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_filters_create A

Create a new AI filter for semantic intent-based message matching.

AI filters use vector embeddings (via Voyage AI) to detect whether an incoming message matches a specific intent or topic. The filter's description is embedded as a reference vector at creation time. When a message arrives, its embedding is compared against this reference using cosine similarity.

The description field is the most important part — it becomes the reference embedding that all incoming messages are compared against. Write it as a clear statement of what kind of messages should match:

  • 'Customer asking about pricing, subscription plans, or billing'

  • 'User reporting a bug, crash, or unexpected behavior in the product'

  • 'Inbound sales lead expressing interest in purchasing or trialing'

The threshold controls sensitivity: 0.5 is a balanced default, lower values (0.3) cast a wider net, higher values (0.8) require closer matches.

Note: This tool calls the Voyage AI embedding API to generate the reference vector.
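
The matching rule described above can be illustrated with plain vectors. This is a minimal sketch, not the service's implementation: real reference and message vectors would come from the Voyage AI embedding API, which is not called here. The two-dimensional vectors and the function names `cosine_similarity` and `filter_matches` are illustrative assumptions, and the `>=` comparison follows the rule stated in the ai_filters_test description.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def filter_matches(message_vec, reference_vec, threshold=0.5):
    """True when the message is at least `threshold` similar to the reference."""
    return cosine_similarity(message_vec, reference_vec) >= threshold


# Toy stand-ins for embeddings (assumption): similarity is about 0.6.
reference = [1.0, 0.0]
message = [0.6, 0.8]

score = cosine_similarity(message, reference)
balanced = filter_matches(message, reference, threshold=0.5)
strict = filter_matches(message, reference, threshold=0.8)
print(round(score, 3), balanced, strict)
```

With these toy vectors the same message passes the balanced default of 0.5 but fails a stricter 0.8 threshold, which is the sensitivity trade-off the description refers to.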

Parameters
Name | Required | Description | Default
name | Yes | Filter name — a short, human-readable label (max 100 chars) |
threshold | No | Cosine similarity threshold for a message to be considered a match. Range 0.1–1.0. Default 0.50. Lower values (e.g. 0.3) are more permissive and catch more messages. Higher values (e.g. 0.8) require closer semantic similarity. |
description | Yes | Reference text that defines what messages should match this filter. This text is embedded as a vector and used for cosine similarity comparison against all incoming messages. Be specific and descriptive — the quality of this text directly determines filter accuracy. E.g. 'Customer asking about pricing, subscription costs, or billing issues'. Max 500 chars. |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by explaining that the filter uses Voyage AI embeddings, how the description becomes a reference vector, and the threshold's effect on sensitivity. Annotations only state readOnlyHint=false and destructiveHint=false, so the description adds valuable behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: it starts with the core purpose, explains the mechanism, gives practical examples, and ends with a note on the embedding API call. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's behavior thoroughly but omits mention of the return value after creation. Since there is no output schema, the agent is left uncertain about what the response includes (e.g., created filter ID). Otherwise, it is complete for a creation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides high-coverage descriptions (100%), but the description adds extra guidance: emphasizing the description field's importance, providing example phrasing, and elaborating on threshold sensitivity. This adds meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Create a new AI filter for semantic intent-based message matching.' It uses a specific verb (Create) and resource (AI filter), and the context distinguishes it from sibling tools like ai_filters_delete and ai_filters_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on how the filter works (vector embeddings, cosine similarity) and explains the threshold usage, but does not explicitly mention when to use this tool versus alternatives like updating or testing filters. The logic is implied given the distinct 'create' action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_filters_delete A
Destructive, Idempotent

Permanently delete an AI filter.

When to use:

  • User wants to remove a filter they no longer need

This action cannot be undone. Any triggers that reference this filter by ID will no longer match it — review and update those triggers after deletion.

Parameters
Name | Required | Description | Default
filter_id | Yes | ID of the filter to delete |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare destructiveHint=true, which the description reinforces by stating 'Permanently delete' and 'This action cannot be undone.' It adds valuable context about triggers referencing the filter, going beyond the bare annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with the purpose in the first sentence, followed by a usage bullet and two sentences on consequences. Every sentence adds value, and the structure is front-loaded and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with one parameter and comprehensive annotations, the description fully covers the key aspects: purpose, usage guidance, irreversibility, and impact on triggers. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage for the single parameter (filter_id). The description does not add any additional meaning beyond the schema's description, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Permanently delete an AI filter,' which clearly identifies the verb (delete) and resource (AI filter). The tool name and sibling tools like ai_filters_create and ai_filters_update distinguish it effectively.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' bullet that specifies the user wants to remove a filter, providing clear context. It also warns about irreversibility and impact on triggers, though it does not explicitly mention when not to use or compare with siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_filters_list A
Read-only, Idempotent

List all AI filters for the current workspace.

AI filters are semantic intent-based message filters that use embeddings (vector representations) to detect whether an incoming message matches a specific intent or topic. Unlike keyword filters, they understand meaning: 'I need help with my order' and 'my package hasn't arrived' both match a 'shipping support' filter even without shared keywords.

Each filter stores a reference embedding of its description. When a message arrives, its embedding is compared via cosine similarity against the filter's reference vector. If the similarity exceeds the threshold, the filter matches.

When to use:

  • Check which semantic filters already exist before creating a new one

  • Get filter IDs for use in trigger conditions

  • Review thresholds and active status of existing filters

Returns all filters with id, name, description, threshold, and is_active.
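
The first two use cases above can be sketched against the returned fields. This is a hedged illustration only: the sample records, their ID format, and the helper names `find_filter_by_name` and `active_filter_ids` are assumptions, and the exact response envelope is not documented here beyond the five field names.

```python
def find_filter_by_name(filters, name):
    """Return the first filter whose name matches case-insensitively, else None."""
    wanted = name.strip().lower()
    for item in filters:
        if item["name"].strip().lower() == wanted:
            return item
    return None


def active_filter_ids(filters):
    """Collect IDs of active filters, e.g. for use in trigger conditions."""
    return [item["id"] for item in filters if item["is_active"]]


# Made-up records shaped like the documented fields (assumption).
filters = [
    {"id": "flt_1", "name": "Pricing", "description": "Customer asking about pricing",
     "threshold": 0.5, "is_active": True},
    {"id": "flt_2", "name": "Bug reports", "description": "User reporting a bug",
     "threshold": 0.6, "is_active": False},
]

existing = find_filter_by_name(filters, "pricing")
missing = find_filter_by_name(filters, "Sales leads")
usable_ids = active_filter_ids(filters)
print(existing["id"], missing, usable_ids)
```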

Parameters

No parameters

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, but the description adds significant detail: it explains the semantic filtering mechanism (embeddings, cosine similarity), what fields are returned (id, name, description, threshold, is_active), and the overall behavior. There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat lengthy but well-structured: it starts with the main purpose, then explains the underlying technology, followed by usage guidance and return fields. Every sentence serves a purpose, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, no output schema, and adequate annotations, the description comprehensively explains what the tool does, how it works, when to use it, and what it returns. It fully satisfies the context needed for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters, so schema coverage is trivially 100% and there is nothing for the description to elaborate on. With no parameters the baseline is 4, and the description adds no unnecessary parameter information, so the score stays at that baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all AI filters for the current workspace' and explains the concept of AI filters, distinguishing it from sibling tools like ai_filters_create, ai_filters_delete, etc. It uses specific verbs and a defined resource, achieving high clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' section with three specific use cases (check existing filters, get IDs for triggers, review thresholds/status), providing clear context. However, it does not explicitly state when not to use this tool or mention alternatives, which keeps it from a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_filters_test A
Read-only, Idempotent

Test a message against an AI filter to check whether it would match.

This tool embeds the provided message using Voyage AI and computes the cosine similarity between the message vector and the filter's stored reference vector. It returns the similarity score, whether the message would match (similarity >= threshold), and the filter's threshold value.

Use this to:

  • Verify a filter works as intended before using it in a trigger

  • Tune the threshold by testing borderline messages

  • Debug why a message did or did not match a filter in production

Returns: {similarity: float, matched: bool, threshold: float}

Note: This tool calls the Voyage AI embedding API to embed the test message.
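
The threshold-tuning use case above can be sketched as follows. This is one simple heuristic under stated assumptions, not a method the tool prescribes: `suggest_threshold` is a hypothetical helper, the result dictionaries are invented values shaped like the documented return format, and taking the midpoint between the two groups is just one reasonable choice.

```python
def suggest_threshold(match_results, non_match_results):
    """Suggest a threshold that separates wanted from unwanted messages.

    Each result is a dict shaped like the documented return value.
    Returns None when the groups overlap, because no single threshold
    can separate them.
    """
    hits = [r["similarity"] for r in match_results]
    misses = [r["similarity"] for r in non_match_results]
    if not hits or not misses:
        raise ValueError("need at least one result in each group")
    lowest_hit = min(hits)
    highest_miss = max(misses)
    if highest_miss >= lowest_hit:
        return None
    # Midpoint between the weakest wanted and strongest unwanted message.
    return round((lowest_hit + highest_miss) / 2, 2)


# Invented test results for messages that should and should not match.
wanted = [
    {"similarity": 0.71, "matched": True, "threshold": 0.5},
    {"similarity": 0.58, "matched": True, "threshold": 0.5},
]
unwanted = [
    {"similarity": 0.42, "matched": False, "threshold": 0.5},
    {"similarity": 0.31, "matched": False, "threshold": 0.5},
]

suggestion = suggest_threshold(wanted, unwanted)
overlap = suggest_threshold(wanted, [{"similarity": 0.6, "matched": True, "threshold": 0.5}])
print(suggestion, overlap)
```

A `None` result suggests that the filter's description, rather than its threshold, needs refining.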

Parameters
Name | Required | Description | Default
message | Yes | The message text to test. This is embedded and compared against the filter's reference vector via cosine similarity. |
filter_id | Yes | ID of the filter to test against |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, but the description adds crucial behavioral details: it calls the Voyage AI embedding API, computes cosine similarity, and returns similarity score, match boolean, and threshold. This goes beyond annotations by disclosing external dependency and computational steps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: a clear lead sentence, technical explanation, bulleted use cases, return format, and an important note about the external API call. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully explains the return value (similarity, matched, threshold) and the threshold logic. It also covers practical use cases and external API implications, making it complete for an agent to understand the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, both parameters are already well-described in the input schema. The description does not add new semantic information about the parameters beyond what the schema provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Test a message against an AI filter to check whether it would match.' This distinct verb+resource combination differentiates it from sibling tools like ai_filters_create and ai_filters_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases: 'Verify a filter works as intended before using it in a trigger', 'Tune the threshold', and 'Debug why a message did or did not match a filter'. This guides the agent on when to apply this tool over its siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_filters_update (A)

Update an existing AI filter's name, description, threshold, or active state.

When to use:

  • User wants to rename a filter

  • User wants to refine the filter description to improve match accuracy

  • User wants to adjust the similarity threshold (higher = stricter matching)

  • User wants to enable or disable a filter without deleting it

Provide only the fields you want to change. At least one field is required.

Note: If the description is changed, this tool calls the Voyage AI embedding API to re-generate the reference vector with the new description text.

Parameters
Name | Required | Description | Default
name | No | New filter name (max 100 chars, optional) |
filter_id | Yes | ID of the filter to update |
is_active | No | Enable (true) or disable (false) the filter. OMIT to leave the active flag unchanged. |
threshold | No | New cosine similarity threshold. Range 0.1–1.0. Optional. |
description | No | New reference description text. If changed, the Voyage AI embedding API is called to re-generate the reference vector. Max 500 chars. Optional. |
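
A client-side sketch of the partial-update rule and the limits listed in the parameters can help catch bad arguments before the call. `build_filter_update` is a hypothetical helper, not part of the server, the filter ID `"f-1"` is a placeholder, and the server's own validation may differ.

```python
def build_filter_update(filter_id, name=None, description=None,
                        threshold=None, is_active=None):
    """Build ai_filters_update arguments containing only the changed fields."""
    args = {"filter_id": filter_id}
    if name is not None:
        if len(name) > 100:
            raise ValueError("name exceeds 100 characters")
        args["name"] = name
    if description is not None:
        if len(description) > 500:
            raise ValueError("description exceeds 500 characters")
        # Per the tool note, a changed description triggers re-embedding.
        args["description"] = description
    if threshold is not None:
        if not 0.1 <= threshold <= 1.0:
            raise ValueError("threshold must be within 0.1 to 1.0")
        args["threshold"] = threshold
    if is_active is not None:
        # None means "omit": the active flag is left unchanged.
        args["is_active"] = is_active
    if len(args) == 1:
        raise ValueError("at least one field to change is required")
    return args


print(build_filter_update("f-1", threshold=0.8))
# prints {'filter_id': 'f-1', 'threshold': 0.8}
```

Using `None` as the "omit" marker keeps `is_active=False` distinguishable from "leave unchanged".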
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses a key behavioral trait: changing the description triggers a Voyage AI embedding API call to re-generate the reference vector. Annotations (readOnlyHint=false, destructiveHint=false) do not contradict. This adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, then organized into clear sections for usage and side effects. Every sentence adds value; no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers the update operation, partial updates, and side effects. It could mention what is returned (e.g., the updated filter), but overall it is sufficiently complete for an update tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds meaning: for threshold, explains 'higher = stricter matching'; for is_active, clarifies 'OMIT to leave unchanged'; for description, mentions the side effect. This enhances understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing AI filter's name, description, threshold, or active state,' specifying the resource (AI filter) and action (update). Siblings include create, delete, list, test—so it is well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

A 'When to use' section lists four concrete scenarios (rename, refine description, adjust threshold, enable/disable). It does not explicitly state when not to use or suggest alternatives like deleting a filter, but the guidance is clear and context-rich.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_add_to_thread (A)

Apply one or more AI tags to a thread (manually).

When to use:

  • User wants to label a conversation with one or more tags

  • User asks to categorize or tag a thread

Provide the thread_id (integer) and an array of tag_ids to apply. If a tag is already applied it will be updated to is_manual=true.

Parameters
Name | Required | Description | Default
tag_ids | Yes | Array of tag IDs to apply (1–20 IDs) |
thread_id | Yes | ID of the thread to tag |
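
The re-apply behavior can be modeled as an upsert. The sketch below is illustrative only: it represents a thread's tags as a plain dict from tag ID to an `is_manual` flag, which is an assumed shape, and it assumes newly applied tags are also stored with `is_manual=True` because this tool applies tags manually.

```python
def add_tags_to_thread(thread_tags, tag_ids):
    """Return a copy of thread_tags with every tag in tag_ids marked manual.

    thread_tags maps tag ID -> is_manual flag (hypothetical data shape).
    """
    if not 1 <= len(tag_ids) <= 20:
        raise ValueError("tag_ids must contain 1 to 20 IDs")
    updated = dict(thread_tags)
    for tag_id in tag_ids:
        # New tags are added; tags already present are flipped to manual.
        updated[tag_id] = True
    return updated


# Tag 7 was applied automatically; tag 9 is not applied yet.
before = {7: False}
after = add_tags_to_thread(before, [7, 9])
print(after)  # prints {7: True, 9: True}
```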
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false, destructiveHint=false, idempotentHint=false, which are consistent with a mutation tool. The description adds valuable behavioral detail: 'If a tag is already applied it will be updated to is_manual=true.' No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is exceptionally concise, with a single purpose sentence, two bullet-point usage guidelines, and two sentences for parameter and behavior. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple mutation tool with no output schema, the description covers purpose, usage, parameters, and idempotency behavior. It does not elaborate on return values or auth, but these are not critical for correct invocation given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with clear descriptions for both parameters. The description restates the parameter requirements ('Provide the thread_id (integer) and an array of tag_ids to apply') but does not add significant semantic value beyond the schema. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Apply' and the resource 'AI tags to a thread', with the parenthetical '(manually)' distinguishing from automatic tagging. It effectively differentiates from siblings like ai_tags_remove_from_thread and ai_tags_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use:' bullets, covering the common scenarios of labeling or categorizing a conversation. It does not mention alternatives or when not to use, but the context is clear given the sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_create (A)

Create a new AI tag (automatic message filter).

AI tags are lightweight classifiers that run on every incoming message. When a message matches the tag's description/criteria, the thread is automatically labelled — so AI agents can cheaply pre-filter threads instead of running full LLM analysis on everything. Good descriptions are the key: they tell the classifier exactly when to apply this tag.

When to use:

  • User wants to auto-classify incoming messages (e.g. bug reports, sales leads, support requests)

  • User wants to reduce AI agent costs by pre-filtering threads by topic or intent

Tips for the description field:

  • Be specific: 'Messages reporting errors, crashes, or unexpected behavior in the product'

  • Include examples of what qualifies and what doesn't

Limit: 20 active personal tags / 50 active team tags.

Parameters
Name | Required | Description | Default
icon | No | Emoji icon for the tag (max 10 chars, optional) |
name | Yes | Tag name (max 100 chars) |
color | No | Tailwind color key for the tag badge. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to use the default color. |
description | No | Classifier prompt: describe exactly when this tag should be applied to a thread. The more specific, the better the auto-classification accuracy. E.g. 'Messages reporting software errors, crashes, or unexpected behavior'. Max 500 chars. |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that tags run on every incoming message and provides limits (20 personal/50 team tags). Annotations indicate non-destructive write, so no contradiction. Adds context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with purpose, then explanation, use cases, and tips. Slightly wordy but each paragraph adds value. Efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks mention of what is returned after creation (e.g., tag ID). No output schema, so this is a gap. However, covers inputs well with limits and usage tips.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds valuable tips for the 'description' parameter (classifier prompt, examples) and explains its role. This goes beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new AI tag' and explains its function as an automatic message filter. It distinguishes from sibling ai_tags_* tools (e.g., add/delete) by focusing on creation of the classifier definition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a dedicated 'When to use' section with specific use cases (auto-classify incoming messages, reduce costs). Missing explicit when-not-to-use and alternatives like ai_filters_create, but the guidance is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_delete (A)
Destructive, Idempotent

Delete a personal AI tag. All thread associations are removed automatically.

When to use:

  • User wants to permanently remove a tag they no longer need

This cannot be undone. Threads are NOT deleted — they just lose this tag.

Parameters
Name | Required | Description | Default
tag_id | Yes | ID of the tag to delete |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark destructiveHint true. The description adds that all thread associations are removed automatically and threads are not deleted, providing useful behavioral context beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, with three short sentences and a separate 'When to use' line. It is front-loaded with the primary action and efficiently covers key details without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter, destructive tool with annotations and full schema coverage, the description covers purpose, usage guidance, behavioral effects, and consequences. No missing elements are apparent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already covers 100% of parameters with the description 'ID of the tag to delete'. The tool description adds no additional meaning to parameter semantics beyond that.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Delete' and resource 'personal AI tag'. It distinguishes from sibling tool ai_tags_remove_from_thread by emphasizing that all thread associations are removed automatically and the tag is permanently deleted.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'When to use' section names the one scenario it covers: the user wants to permanently remove a tag. The description also warns that the action is irreversible and that threads are not deleted. However, it does not say when not to use the tool or list alternative tools for partial operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_list (A)
Read-only, Idempotent

List all personal AI tags.

AI tags are automatic message filters: the system runs a lightweight classifier on every incoming message and applies matching tags to threads. This lets AI agents skip expensive full analysis on most messages — they only act on threads that match relevant tags, dramatically cutting LLM costs.

When to use:

  • Check which auto-classification filters exist before creating one

  • Get tag IDs for add_to_thread / remove_from_thread

  • See how many threads each tag currently matches

Returns all tags with thread counts (non-archived, included threads only).
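
The cost-saving idea described above, acting only on threads that carry a relevant tag, can be sketched as a simple pre-filter. The thread shape (`id`, `tag_ids`) and the function name are assumptions made for illustration; the source does not specify how threads expose their tags.

```python
def threads_needing_analysis(threads, relevant_tag_ids):
    """Keep only threads that carry at least one relevant tag."""
    relevant = set(relevant_tag_ids)
    # A non-empty intersection means the thread matched a relevant tag.
    return [t for t in threads if relevant & set(t["tag_ids"])]


# Hypothetical data: three threads, two of which carry a relevant tag.
threads = [
    {"id": 1, "tag_ids": [3]},
    {"id": 2, "tag_ids": []},
    {"id": 3, "tag_ids": [5, 8]},
]
selected = threads_needing_analysis(threads, [3, 8])
print([t["id"] for t in selected])  # prints [1, 3]
```

Only the selected threads would then be passed to the more expensive LLM analysis.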

Parameters

No parameters

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that it returns tags with thread counts for non-archived, included threads only, and explains the underlying lightweight classifier mechanism. This provides useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a title line, explanatory paragraph, bullet points for usage, and final line about return value. It is concise, front-loaded, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters or output schema, the description fully covers the tool's purpose, usage, and return information (tags with thread counts). It is complete and self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters (schema coverage 100% vacuously). The description does not need to add parameter info. It effectively explains the return value and usage, meeting the baseline for zero-parameter tools.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all personal AI tags' with a specific verb and resource. It distinguishes from sibling tools like ai_tags_create, ai_tags_add_to_thread, etc., by explaining that it lists auto-classification filters.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit 'When to use' bullet points: check before creating, get IDs for add/remove, see thread counts. This provides clear guidance on when to use this tool versus alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_remove_from_thread (A)
Destructive, Idempotent

Remove a specific AI tag from a thread.

When to use:

  • User wants to un-label or remove a specific tag from a conversation

  • User wants to correct an incorrectly applied tag

Provide both thread_id and tag_id.

Parameters
Name | Required | Description | Default
tag_id | Yes | ID of the tag to remove |
thread_id | Yes | ID of the thread to remove the tag from |

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true and idempotentHint=true. The description adds no additional behavioral context beyond restating 'remove'. It does not disclose side effects or constraints beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the action, and it uses bullet points for the use cases. Every sentence is purposeful, with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple removal tool with no output schema, the description combined with annotations provides sufficient context. It clearly states what it does and when to use it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and description does not add extra meaning beyond reiterating the need for both parameters. No format or constraint details beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Remove a specific AI tag from a thread' with a clear verb and resource. The sibling tool 'ai_tags_add_to_thread' provides contrast, making the purpose distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'When to use' section provides two common scenarios (un-label, correct tag) and reminds to provide both IDs. It lacks explicit 'when not to use' or alternatives, but for a simple removal, this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_tags_update (A)

Update an existing personal AI tag's name, description, icon, color, or active state.

When to use:

  • User wants to rename a tag

  • User wants to change a tag's icon, color, or description

  • User wants to enable or disable a tag

Provide only the fields you want to change. At least one field is required.

Parameters
Name | Required | Description | Default
icon | No | New emoji icon (max 10 chars, optional) |
name | No | New tag name (max 100 chars, optional) |
color | No | New color key. Allowed: amber, blue, green, red, purple, yellow, slate. OMIT to leave the color unchanged. |
tag_id | Yes | ID of the tag to update |
is_active | No | Enable (true) or disable (false) the tag. OMIT to leave the active flag unchanged. |
description | No | New LLM hint (max 500 chars; empty string clears it, optional) |

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false, etc.) already indicate mutation. Description accurately describes update operation but adds no additional behavioral context like permissions or side effects. With annotations covering safety profile, description offers minimal extra transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences: purpose, usage scenarios, instruction to provide changed fields, and requirement. No fluff, front-loaded with main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, parameter requirements, and usage scenarios. Lacks explanation of return value or error conditions, but for a simple update tool with no output schema, it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The description adds the critical constraint 'At least one field is required' (beyond the required tag_id), which is not in the schema. It also repeats the 'OMIT' hints, but the schema already includes them, so the added value is moderate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (update) and resource (personal AI tag), listing specific attributes (name, description, icon, color, active state). It distinguishes from sibling tools like ai_tags_create or ai_tags_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes a 'When to use' section with specific scenarios (rename, change icon/color/description, enable/disable). Provides guidance to provide only changed fields and states at least one field is required. Lacks explicit when-not-to-use but is clear from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_attach_identity (A)
Read-only, Idempotent

Switch the page's identity by loading saved cookies + storage. Use only when switching identity mid-page; for first navigation, pass identity_name to browser.open instead.

Parameters
Name | Required | Description | Default
page_id | Yes | |
identity_name | Yes | |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds context about identity switching via cookies/storage, consistent with annotations, but doesn't detail potential side effects beyond loading state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no unnecessary words, efficiently conveying the core information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 required string params, no output schema) and supportive annotations, the description is mostly complete. However, it could mention prerequisites like the page being open or identity existence.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description does not elaborate on the meaning or format of page_id or identity_name beyond what the parameter names imply, failing to compensate for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Switch the page's identity') and mechanism ('loading saved cookies + storage'), and effectively distinguishes it from the sibling tool browser.open.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('switching identity mid-page') and when not to ('for first navigation, pass identity_name to browser.open instead'), providing a direct alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_click (A)
Read-only, Idempotent

Click an element. ref is either an aria-ref token from browser.snapshot ('e7') OR a CSS selector ('button.submit'). Prefer the aria-ref token.

Parameters
Name | Required | Description | Default
ref | Yes | |
page_id | Yes | |

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide safety info (readOnlyHint, idempotentHint, destructiveHint). The description adds context about ref formats but no further behavioral details (e.g., click effects, page interaction). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with front-loaded action. No unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple click tool, the description is largely complete. It lacks explanation of return values and page_id, but the action is clear and the ref guidance helps. Adequate for agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description explains 'ref' (aria-ref or CSS selector). It does not explain 'page_id', which is a required parameter. Partial coverage raises it slightly above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Click an element' with a specific verb and resource. It distinguishes from sibling browser tools like browser_fill, browser_type, etc., by specifying the click action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises 'Prefer the aria-ref token,' providing guidance on ref selection. However, it does not explicitly state when not to use or compare with alternatives like browser_hover or browser_drag.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_close (B)
Read-only, Idempotent

Close a page opened by browser.open.

Parameters
Name | Required | Description | Default
page_id | Yes | - | -
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description says 'Close' (write action) but annotation readOnlyHint=true contradicts this. No disclosure of side effects or idempotency despite annotations.
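
A contradiction of this kind could in principle be caught mechanically. The following is a minimal sketch, assuming a check on the first word of the description only. `readOnlyHint` is the annotation named in the review; `WRITE_VERBS` and `contradicts_read_only` are hypothetical names chosen for illustration, and this is not how the scores on this page are produced.

```python
# Hypothetical list of verbs that imply a state change; an assumption, not an official list.
WRITE_VERBS = {"close", "fill", "click", "drag", "navigate", "attach", "respond"}


def contradicts_read_only(description, annotations):
    """Return True when a read-only annotation sits next to a write-style leading verb."""
    words = description.split()
    if not words:
        return False
    first_word = words[0].lower().strip(".,:")
    # Only the leading verb is inspected, so this is a coarse heuristic.
    return bool(annotations.get("readOnlyHint")) and first_word in WRITE_VERBS
```

For example, the browser_close description starts with "Close", so the sketch flags it when `readOnlyHint` is true, while a description starting with "Return" is not flagged.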

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no wasted words; perfectly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with one param, but no output schema or return value description. Adequate for a close action but incomplete given the annotation contradiction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, description should clarify page_id format or usage. It only says 'page opened by browser.open', adding minimal value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb 'Close' and resource 'page opened by browser.open', clearly distinguishing from siblings like browser_open and browser_tabs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied through pairing with browser.open, but no explicit when/when-not guidance or alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_console_messages (A)
Read-only, Idempotent

Return console.log/warn/error events captured since the last drain. Filter by level ('log'|'info'|'warning'|'error'|'debug') and/or pattern (regex). Buffer caps at 500 entries; oldest are dropped first. Set clear=false to peek without draining.
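
The buffer semantics described above (500-entry cap, oldest dropped first, drain versus peek) can be modelled in a few lines. This is a sketch under stated assumptions, not the server's implementation: `ConsoleBuffer` and its methods are hypothetical names, and it assumes a drain empties the whole buffer rather than only the entries that matched the filter.

```python
import re
from collections import deque

BUFFER_CAP = 500  # cap stated in the tool description


class ConsoleBuffer:
    """Hypothetical model of a per-page console buffer."""

    def __init__(self, cap=BUFFER_CAP):
        # deque(maxlen=...) discards the oldest entry when a new one overflows the cap.
        self._entries = deque(maxlen=cap)

    def record(self, level, text):
        self._entries.append({"level": level, "text": text})

    def read(self, level=None, pattern=None, clear=True):
        regex = re.compile(pattern) if pattern else None
        matched = [
            entry for entry in self._entries
            if (level is None or entry["level"] == level)
            and (regex is None or regex.search(entry["text"]))
        ]
        if clear:
            # Assumption: draining clears everything, not just the matched entries.
            self._entries.clear()
        return matched
```

Calling `read(clear=False)` corresponds to the peek behaviour: the same entries are still there on the next call.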

Parameters
Name | Required | Description | Default
clear | No | - | -
level | No | - | -
page_id | Yes | - | -
pattern | No | - | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds significant behavioral context beyond annotations: buffer cap of 500 entries, oldest dropped first, drainage behavior. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four efficient sentences, front-loaded with the main purpose, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main behavior, filtering, buffer limit, and drain/peek behavior. Missing details about return value format and role of page_id, but acceptable for a simple retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Explains level, pattern, and clear parameters well, but fails to mention the required page_id parameter. Schema coverage is 0%, so description partially compensates but misses a key parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states that the tool returns console.log/warn/error events, with filtering by level and pattern. It distinguishes the tool from sibling browser tools by focusing on console messages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context for usage (since last drain, buffer caps, clear parameter for peeking) but does not explicitly compare to alternatives or state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_drag (A)
Read-only, Idempotent

Drag one element onto another. source_ref is the element to grab; target_ref is where to drop. Both are CSS selectors. Used for slider captchas, kanban, drag-and-drop uploads.

Parameters
Name | Required | Description | Default
page_id | Yes | - | -
source_ref | Yes | - | -
target_ref | Yes | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only and idempotent. The description adds behavioral context by stating it performs a drag operation using CSS selectors and lists use cases. It does not contradict annotations. It could have disclosed more about the page events a drag may trigger, but what it provides is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with four focused sentences covering the action, the parameter roles, the selector format, and use cases. Every sentence adds value with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and good annotations, the description covers the action, parameter semantics, and use cases. It does not explain return values or error conditions, which might be assumed, but overall it provides sufficient context for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the meaning of source_ref (element to grab) and target_ref (drop target) and specifies they are CSS selectors, adding value beyond the schema which has no descriptions. However, the required page_id parameter is not explained, leaving a small gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (drag) and resource (elements on a page), explains the parameters, and gives specific use cases (slider captchas, kanban, drag-and-drop uploads). This distinguishes it from sibling tools like browser_click or browser_hover.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists three use cases which imply when to use the tool, but it does not explicitly state when not to use it or suggest alternative tools for similar actions. Thus, it provides clear context but no exclusions or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_evaluate (A)
Read-only, Idempotent

Run JavaScript in the page context and return the result. Use for state not in the a11y tree, captcha iframe inspection, DOM events. Expression is either a plain JS value ('document.title') or a zero-arg IIFE ('(() => { … })()'). Inline any runtime values into the expression itself. Result is JSON-serialized; non-serializable values become strings. 256KB cap on output.
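
The advice to inline runtime values into the expression can be illustrated with a small helper. This is a sketch, assuming the caller builds the expression string client-side; `build_expression` and `evaluate_arguments` are hypothetical helpers, and the `page-1` identifier in the usage note is made up. Only the `page_id` and `expression` parameter names come from the tool's parameter list.

```python
import json


def build_expression(body, **values):
    """Wrap `body` in a zero-arg IIFE with runtime values inlined as JS constants."""
    # json.dumps produces literals that are also valid JavaScript
    # for strings, numbers, booleans, None (null), lists and dicts.
    decls = "".join(
        f"const {name} = {json.dumps(value)}; " for name, value in values.items()
    )
    return f"(() => {{ {decls}{body} }})()"


def evaluate_arguments(page_id, expression):
    """Assemble the arguments object for a browser_evaluate call."""
    return {"page_id": page_id, "expression": expression}
```

For instance, `build_expression("return document.querySelector(sel).textContent;", sel="#total")` yields a self-contained IIFE, which can then be passed to `evaluate_arguments("page-1", ...)`. Serializing with `json.dumps` avoids hand-escaping quotes in the inlined value.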

Parameters
Name | Required | Description | Default
page_id | Yes | - | -
expression | Yes | - | -
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint. The description adds beyond that: result serialization details, 256KB output cap, and requirement to inline runtime values. This gives the agent a clear understanding of limitations and behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is six sentences, front-loaded with the core purpose. Every sentence adds unique value: use cases, expression format, value inlining, output serialization, and the size limit. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers output format and size limit. It explains when to use the tool and expression syntax. Missing is mention of error handling (e.g., syntax errors) but that is a minor gap for a simple eval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning for 'expression' (plain JS value or IIFE, inline values) but doesn't elaborate on 'page_id'. With only two parameters, the description partially compensates but not fully.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs JavaScript in the page context and returns the result. It lists specific use cases (state not in a11y tree, captcha iframe inspection, DOM events) that distinguish it from sibling browser tools like browser_snapshot or browser_fill.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes when to use it ('Use for state not in the a11y tree...') and provides guidance on expression format and inlining values. It could be more explicit about when not to use it, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_file_upload (A)
Read-only, Idempotent

Attach files to an <input type=file>. Pass either local_paths (absolute host paths) or data (list of {name, mime, base64} blobs written to /tmp). 25MB cap per file.
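
A client might prepare the `data` blobs along the following lines. This is a sketch under assumptions: it reads "25MB" as 25 MiB measured on the raw bytes before base64 encoding, and it treats `local_paths` and `data` as mutually exclusive. `make_blob` and `upload_arguments` are hypothetical helpers; the `name`, `mime`, `base64`, `ref`, `page_id`, `data`, and `local_paths` keys come from the description and parameter list.

```python
import base64

# Assumption: the 25MB cap is interpreted as 25 MiB of raw (pre-encoding) bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024


def make_blob(name, mime, content):
    """Build one {name, mime, base64} blob from raw bytes."""
    if len(content) > MAX_FILE_BYTES:
        raise ValueError(f"{name} exceeds the 25MB per-file cap")
    return {
        "name": name,
        "mime": mime,
        "base64": base64.b64encode(content).decode("ascii"),
    }


def upload_arguments(page_id, ref, blobs=None, local_paths=None):
    """Assemble arguments, passing exactly one of blobs or local_paths."""
    if (blobs is None) == (local_paths is None):
        raise ValueError("pass exactly one of blobs or local_paths")
    args = {"page_id": page_id, "ref": ref}
    if blobs is not None:
        args["data"] = blobs
    else:
        args["local_paths"] = local_paths
    return args
```

Checking the size before encoding keeps an oversized file from being serialized at all.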

Parameters
Name | Required | Description | Default
ref | Yes | - | -
data | No | - | -
page_id | Yes | - | -
local_paths | No | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds value by disclosing the 25MB cap per file and the /tmp side effect for data blobs. However, it lacks details on error handling or prerequisites (e.g., the page must have a file input). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three sentences with zero wasted words. It is front-loaded with the purpose and immediately gives usage details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description adequately covers the core functionality and constraints. However, it lacks explanation of what happens after the file is attached (e.g., success indication, no output schema), and the required parameters ref and page_id are not explained, leaving gaps for the AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates partially by explaining the two data parameters (local_paths and data) and their formats. However, it does not explain the required parameters ref and page_id, which are crucial for the tool's operation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Attach files to an <input type=file>.' This is a specific verb+resource, and it distinguishes itself from sibling browser tools like browser_click or browser_fill by focusing on file uploads to file input elements.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance on how to invoke the tool: 'Pass either local_paths (absolute host paths) or data (list of {name, mime, base64} blobs written to /tmp).' It also mentions a 25MB cap. It does not state when not to use this tool or name alternatives, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fill (B)
Read-only, Idempotent

Fill an input or textarea with the given value. ref is either an aria-ref token from browser.snapshot ('e7') OR a CSS selector ('input[name=email]'). Prefer the aria-ref token — it's stable and matches exactly what snapshot returned.

Parameters
Name | Required | Description | Default
ref | Yes | - | -
value | Yes | - | -
page_id | Yes | - | -
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description contradicts the readOnlyHint annotation; filling an input is a write operation, not read-only. This is a serious inconsistency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no unnecessary information. Clear and to the point.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the tool is simple, the description lacks explanation of return values or post-fill effects. The page_id parameter is not clarified. The annotation contradiction also reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the 'ref' parameter in detail (aria-ref vs CSS selector), but does not cover 'value' or 'page_id'. Schema description coverage is 0%, so description partially compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fills an input/textarea with a value, and distinguishes between aria-ref tokens and CSS selectors. It is a specific verb+resource description.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on preferring aria-ref tokens over CSS selectors, but does not explicitly state when to use this tool versus alternatives like browser_fill_form or browser_type.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_fill_form (A)
Read-only, Idempotent

Fill multiple form fields in one call. fields is a list of {ref, value} dicts. ref is a CSS selector; value is a string (text) or boolean (checkbox). Saves N round-trips vs calling browser.fill repeatedly.
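
The `fields` structure can be built from a plain mapping of selectors to values. This is a minimal sketch; `form_fields` is a hypothetical helper, and rejecting anything other than a string or boolean is an assumption drawn from the description's wording rather than documented server behaviour.

```python
def form_fields(values):
    """Turn {css_selector: value} into the list of {ref, value} dicts."""
    fields = []
    for ref, value in values.items():
        # The description allows a string (text) or a boolean (checkbox).
        if not isinstance(value, (str, bool)):
            raise TypeError(f"value for {ref!r} must be str or bool")
        fields.append({"ref": ref, "value": value})
    return fields


def fill_form_arguments(page_id, values):
    """Assemble the arguments object for one browser_fill_form call."""
    return {"page_id": page_id, "fields": form_fields(values)}
```

A single call built this way replaces one browser_fill call per field, which is the round-trip saving the description refers to.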

Parameters
Name | Required | Description | Default
fields | Yes | - | -
page_id | Yes | - | -
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims 'Fill' which is a write/modification operation, but the annotations set 'readOnlyHint' to true, indicating a non-modifying operation. This is a direct contradiction. Additionally, the description does not disclose other behavioral traits like triggering events or requiring page load. The annotation contradiction severely undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: four sentences that state the purpose, explain the key parameter structure, and provide a benefit comparison. No extraneous words, front-loaded with the primary action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers the essential usage. It explains the input format and the performance benefit. It does not describe the return value or error behavior, but for a form-filling tool this is acceptable. The annotation contradiction is a gap, though not one in the description's completeness itself.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds meaning for the 'fields' parameter (CSS selector, value types) but does not explain the 'page_id' parameter. The explanation for 'fields' is useful, but the lack of coverage for 'page_id' keeps this at a moderate score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Fill', the resource 'multiple form fields', and the scope 'in one call'. It distinguishes the tool from the sibling 'browser_fill' by emphasizing batching, so the agent knows exactly what this tool does.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly contrasts with 'browser.fill repeatedly', implying use when multiple fields need filling. It also explains the structure of the 'fields' parameter. However, it does not mention when not to use it or prerequisites like the form being present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_handle_dialog (A)
Read-only, Idempotent

Respond to a pending JS dialog (alert/confirm/prompt). Pass accept=true for OK or false for Cancel. For prompt() dialogs also pass prompt_text. Dialogs are queued at page-open time; returns {pending: false} if none is waiting.
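
The parameter rules above translate into a small argument builder plus a check on the documented "nothing waiting" result. This is a sketch: `dialog_arguments` and `no_dialog_was_pending` are hypothetical helpers, and the only result shape assumed is the `{pending: false}` object the description mentions. The shape returned when a dialog was actually handled is not documented here, so the sketch does not interpret it.

```python
def dialog_arguments(page_id, accept, prompt_text=None):
    """Assemble arguments for browser_handle_dialog."""
    args = {"page_id": page_id, "accept": accept}
    # prompt_text is only meaningful for prompt() dialogs, so omit it otherwise.
    if prompt_text is not None:
        args["prompt_text"] = prompt_text
    return args


def no_dialog_was_pending(result):
    """True only for the documented {pending: false} result."""
    return result.get("pending") is False
```

Omitting `prompt_text` for alert and confirm dialogs keeps the call aligned with the description, which asks for it only on prompt() dialogs.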

Parameters
Name | Required | Description | Default
accept | Yes | - | -
page_id | Yes | - | -
prompt_text | No | - | -
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds context about queuing and return value but does not significantly expand behavioral disclosure. There is a possible contradiction between 'respond' (mutation) and readOnlyHint=true.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four front-loaded sentences with no wasted words. Efficiently covers purpose, parameter usage, and behavior.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description mentions the return value. It explains the queuing mechanism and required parameters, making the tool self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning beyond the schema: accept=true for OK, false for Cancel; prompt_text for prompt() dialogs. This is crucial given 0% schema description coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the action: responding to a pending JS dialog (alert/confirm/prompt). It distinguishes itself from other browser tools by focusing on dialog handling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explains when to use the tool (when a dialog is pending) and mentions that dialogs are queued at page-open time, with return value indicating no waiting dialog. It does not explicitly say when not to use it or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_hover (A)
Read-only, Idempotent

Hover the mouse over an element (reveals tooltips + hover menus). ref is a CSS selector.

Parameters
Name | Required | Description | Default
ref | Yes | - | -
page_id | Yes | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, idempotent, and non-destructive. The description adds behavioral context by stating it reveals tooltips and hover menus, which is useful beyond the annotation data. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, no wasted words, action stated upfront. Highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple hover tool with two parameters and no output schema, the description covers the core purpose and effect. Missing explanation of 'page_id' is a minor gap, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds meaning to the 'ref' parameter (CSS selector) but does not explain 'page_id'. For a two-parameter tool, this is partial compensation. A higher score would require explanation of both parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's action ('hover the mouse over an element') and its effect ('reveals tooltips + hover menus'), using a specific verb and resource. It distinguishes the tool from sibling browser tools like browser_click or browser_drag.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied (for hovering to reveal tooltips/menus) but no explicit when-to-use or when-not-to-use guidance is provided. Among many sibling browser tools, there is no direction on when to prefer hover over click, wait, etc.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_navigate_back (B)
Read-only, Idempotent

Navigate back in the page's history (browser back button). Returns the new URL + title.

Parameters
Name | Required | Description | Default
page_id | Yes | - | -
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that the tool returns new URL and title, but does not mention side effects, authorization needs, or edge cases (e.g., no history). There is a contradiction: readOnlyHint=true suggests no state change, but 'navigate back' implies changing the current page, which may be considered a state modification. This inconsistency lowers transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences that state the purpose and the return value. It is front-loaded and efficient. However, it could add a bit more detail (e.g., what happens when history is empty) without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple navigation tool, the description provides basic purpose and return info. However, it lacks explanation of the page_id parameter and does not cover edge cases like empty history or behavior when already at the first page. Given the availability of annotations for safety, it is barely adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'page_id' has no description in the schema (0% coverage). The description does not explain what page_id is, how to obtain it, or its role. With no additional meaning beyond the schema, the parameter semantics are severely lacking.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Navigate back in the page's history (browser back button).' It uses a specific verb ('Navigate') and resource ('page's history'), and distinguishes itself from sibling browser tools like browser_open or browser_click by explicitly mentioning navigation backward. Also specifies what it returns: new URL and title.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage (when you want to go back in history) but provides no explicit guidance on when to use vs alternatives, such as other navigation tools like browser_open or browser_click. No exclusions or context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_network_requests (A)
Read-only, Idempotent

List HTTP requests the page made since open or last drain. Optional filters: method (GET/POST/...), url_pattern (regex), status_min (e.g. 400 for errors). Captures up to 200 most recent requests per page.
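
As an illustration of the optional filters, a request for failed POST calls could be assembled as below. This is a sketch assuming the standard MCP `tools/call` JSON-RPC envelope; `tools_call` is a hypothetical helper, and the `page-1` identifier and `/api/` pattern are made-up values. The `clear` parameter is left out because the description does not say how it behaves.

```python
import json


def tools_call(request_id, name, arguments):
    """Wrap tool arguments in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tools_call(
    1,
    "browser_network_requests",
    {
        "page_id": "page-1",      # hypothetical page identifier
        "method": "POST",         # only POST requests
        "url_pattern": r"/api/",  # regex matched against the request URL
        "status_min": 400,        # 400 and above, i.e. errors
    },
)

payload = json.dumps(request)
```

Because only the 200 most recent requests per page are kept, a filter like this is best run soon after the action under investigation.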

Parameters
Name | Required | Description | Default
clear | No | - | -
method | No | - | -
page_id | Yes | - | -
status_min | No | - | -
url_pattern | No | - | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and idempotentHint=true, so the description's contribution is the behavioral detail: it captures up to 200 most recent requests per page and resets on 'drain'. However, the 'clear' parameter is not explained, which could be important for controlling the drain behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, consisting of three short sentences. The first states the action, the second lists the optional filters, and the third gives the capture limit. Every phrase serves a purpose, and no redundant information is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's moderate complexity (5 parameters, no output schema), the description omits critical details. It explains neither the format of the returned requests nor the clear parameter. An agent would have to infer the behavior from the parameter names alone, which is not enough to call the tool reliably.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description does little to explain the parameters. It mentions method, url_pattern, and status_min as optional filters but explains neither the required page_id parameter nor the clear parameter, so it adds minimal meaning beyond the parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses the specific verb 'List' and resource 'HTTP requests', making it clear this tool retrieves network activity. It distinguishes itself from sibling browser interaction tools (e.g., browser_click, browser_snapshot) by focusing on network data. The scope 'since open or last drain' adds precision.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool versus alternatives. There are no sibling tools that also list network requests, but the lack of usage context (e.g., 'Use this when debugging page load issues') reduces clarity. No conditions or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_openA
Read-onlyIdempotent
Inspect

Open a URL in a remote browser. Saved login cookies are auto-attached when the URL domain matches a claimed browser identity. Pass identity_name to override auto-matching or force a specific identity.
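
The auto-matching rule is not spelled out, so the following is only a sketch of one plausible reading: an identity applies when the URL's host equals its claimed domain or is a subdomain of it, and an explicit identity_name wins over matching. The function `pick_identity`, the `identities` mapping, and the suffix-matching rule are all hypothetical.

```python
from urllib.parse import urlparse


def pick_identity(url, identities, identity_name=None):
    """Choose an identity name for `url`.

    `identities` maps an identity name to the domain it claims (hypothetical shape).
    """
    if identity_name is not None:
        # An explicit name overrides auto-matching.
        return identity_name if identity_name in identities else None
    host = urlparse(url).hostname or ""
    for name, domain in identities.items():
        # Assumption: exact host or any subdomain of the claimed domain matches.
        if host == domain or host.endswith("." + domain):
            return name
    return None


identities = {"work": "linkedin.com", "personal": "instagram.com"}
matched = pick_identity("https://www.linkedin.com/feed", identities)
forced = pick_identity("https://www.linkedin.com/feed", identities, identity_name="personal")
```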

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYes
identity_nameNo
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds valuable context about cookie auto-attachment and identity override, which goes beyond the structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences. The first states the purpose; the second and third add behavioral detail about cookie attachment and identity override. No unnecessary words, front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with no output schema, description covers key behavior. Could mention what happens on invalid URL or missing identity, but acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so description must compensate. It explains identity_name well (override auto-matching) but does not describe url format or constraints. Partial compensation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Open a URL in a remote browser' with a specific verb and resource. It is distinct from sibling browser tools like click, fill, etc., which are for other actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context about auto-attaching cookies and overriding identity. Implicitly tells when to use (open URL) but does not explicitly exclude alternatives or mention when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_press_keyB
Read-onlyIdempotent
Inspect

Press a keyboard key (e.g., 'Enter', 'Tab', 'Escape', 'ArrowDown') or a single character. Optional ref focuses an element first — aria-ref token from browser.snapshot ('e7') or a CSS selector.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyYes
refNo
page_idYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description says 'Press a keyboard key', which is a write action, but annotations declare readOnlyHint=true, a clear contradiction. The description does not disclose side effects (e.g., triggering events, navigation) and does not resolve the annotation inconsistency.
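
A contradiction of this kind can be caught with a simple lint. The sketch below is a heuristic written for illustration, not the scoring method used on this page; the `WRITE_VERBS` list and the function name are assumptions.

```python
# Hypothetical verb list; extend it to suit the server being checked.
WRITE_VERBS = {"press", "type", "click", "fill", "close", "send", "delete", "create", "update"}


def contradicts_read_only(description: str, annotations: dict) -> bool:
    """Flag a tool whose description opens with a write verb despite readOnlyHint=true."""
    if not annotations.get("readOnlyHint", False):
        return False
    words = description.split()
    first_word = words[0].lower().strip(".,:;") if words else ""
    return first_word in WRITE_VERBS


flagged = contradicts_read_only("Press a keyboard key", {"readOnlyHint": True})
```

Checking only the leading verb is crude, but it would flag descriptions such as "Press a keyboard key" or "Type text into an element" while leaving "List HTTP requests" alone.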

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the action, examples in parentheses, no redundant text. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description omits return value or behavioral effects (e.g., whether the press triggers navigation, submits forms). Given the simple action and available annotations, the description should clarify the actual impact, especially to resolve the readOnlyHint contradiction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage; description explains 'key' with examples and 'ref' (aria-ref or CSS selector) but does not explain 'page_id', which is required. This partially compensates for missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool presses a keyboard key, with specific examples (Enter, Tab, Escape, ArrowDown) and indicates it can press single characters. This distinguishes it from sibling tools like browser_type (for typing strings) and browser_click (for clicking).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies use for key presses but does not explicitly guide when to use versus alternatives like browser_type or browser_click. It provides context (optional ref for focusing) but no exclusions or comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_resizeA
Read-onlyIdempotent
Inspect

Resize the page viewport. Useful when a site serves different HTML based on viewport width (mobile vs desktop) or when an anti-bot scores risk by viewport dimensions.

ParametersJSON Schema
NameRequiredDescriptionDefault
widthYes
heightYes
page_idYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. Description adds valuable behavioral context about viewport changes affecting HTML rendering and anti-bot detection, which goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences with no redundancy. The key purpose and use cases are front-loaded, making it efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple resize tool with no output schema and basic annotations, the description adequately covers purpose and context but misses parameter semantics. This is sufficient for a minimal viable description but has clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% coverage, but description does not explain any parameter details (e.g., units for width/height, that page_id refers to an open tab). It only implies that width and height are dimensions, leaving considerable ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Resize the page viewport' with specific verb and resource. It distinguishes from sibling browser tools by focusing solely on viewport sizing and providing concrete use cases like responsive design testing and anti-bot evasion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description gives explicit when-to-use scenarios (site serving different HTML, anti-bot scoring). While it doesn't explicitly state when not to use or list alternatives, the context is clear and no direct sibling competition exists for this specific action.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_select_optionA
Read-onlyIdempotent
Inspect

Pick option(s) in a native <select> dropdown. Pass value (matches the option's value attr) OR label (matches its visible text). Lists allowed for multi-select.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
labelNo
valueNo
page_idYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and destructiveHint=false. The description adds that it works on native <select> elements, which is a key behavioral constraint not covered by annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the purpose, and every sentence adds value. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema and 0% schema coverage, the description covers some parameter usage but omits ref and page_id explanations and does not mention error conditions or the outcome of selection. Adequate but with gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds meaning for value and label parameters, explaining their use. However, required parameters ref and page_id are not explained, leaving gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Pick option(s) in a native <select> dropdown,' which is a specific verb and resource. It distinguishes this tool from sibling tools like browser_click or browser_fill by focusing on select dropdowns.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains how to use parameters (value or label) and mentions lists for multi-select. However, it does not explicitly state when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_snapshotA
Read-onlyIdempotent
Inspect

Return a YAML aria_snapshot of the page DOM. Each interactive node is tagged with [ref=eN] (e.g. [ref=e7]). Pass that exact token as the ref arg to browser.click / browser.fill / browser.type / browser.press_key. Do NOT pass the role name ('combobox', 'button') as ref — only the eN token. Truncated at 32KB.
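
To make the ref convention concrete, the following sketch pulls the eN tokens out of a snapshot with a regular expression. The sample snapshot text is invented for the example and the exact YAML layout returned by the tool is assumed; only the `[ref=eN]` tag format comes from the description.

```python
import re

# Matches tags such as [ref=e7] and captures the bare token (e7).
REF_TOKEN = re.compile(r"\[ref=(e\d+)\]")


def extract_refs(snapshot_yaml: str) -> list[str]:
    """Return every eN token in document order."""
    return REF_TOKEN.findall(snapshot_yaml)


# Invented sample; a real snapshot may be laid out differently.
sample = """\
- heading "Sign in" [level=1]
- textbox "Email" [ref=e3]
- combobox "Country" [ref=e7]
- button "Continue" [ref=e9]
"""

refs = extract_refs(sample)
```

The value to pass as ref for the dropdown is "e7", never the role name "combobox".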

ParametersJSON Schema
NameRequiredDescriptionDefault
page_idYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds behavioral context such as truncation at 32KB and the format of ref tokens, which is helpful beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with no unnecessary words. It front-loads the purpose and then provides critical usage details in a well-structured manner.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the return format, ref token usage, and truncation limit. However, it lacks an explanation of the page_id parameter, which is a minor gap given that other sibling tools also use page_id and context may fill it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has one required parameter (page_id) with 0% description coverage. The description does not explain what page_id refers to or how to obtain it, leaving a gap in understanding. Since the parameter is not described, the agent may misuse the tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a YAML aria_snapshot of the page DOM, tagging interactive nodes with `[ref=eN]` tokens. It distinguishes itself from sibling tools like browser_click and browser_fill by explaining how the ref tokens are used.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines: pass the eN token as the `ref` argument to other browser tools, and do NOT pass the role name. This helps the agent select and use the tool correctly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_tabsB
Read-onlyIdempotent
Inspect

Manage tabs within the same BrowserContext as page_id. action ∈ {list, switch, close, new}. For list, returns all open tab metadata; for new, returns the new tab's page_id.
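
Since the schema carries no enum, a caller may want to validate the action before sending it. This sketch assumes that tab_id applies to switch and close and that url applies to new; the description does not confirm either pairing, and `build_tabs_args` is a hypothetical helper.

```python
VALID_ACTIONS = {"list", "switch", "close", "new"}  # the set given in the description


def build_tabs_args(page_id, action, tab_id=None, url=None):
    """Assemble an argument dict for browser_tabs, rejecting unknown actions."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    args = {"page_id": page_id, "action": action}
    # Assumption: tab_id is only meaningful for switch and close.
    if action in {"switch", "close"} and tab_id is not None:
        args["tab_id"] = tab_id
    # Assumption: url is only meaningful when opening a new tab.
    if action == "new" and url is not None:
        args["url"] = url
    return args


new_tab = build_tabs_args("page-1", "new", url="https://example.com")
```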

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNo
actionYes
tab_idNo
page_idYes
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description lists actions including close and new, which are write operations that modify tab state, yet annotations declare readOnlyHint=true, creating a direct contradiction. No additional behavioral details such as side effects, permissions, or error conditions are disclosed beyond the annotation mismatch.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences), front-loads the core purpose, and lists actions. However, it sacrifices completeness for brevity, especially regarding parameter roles. Still, it is efficient and easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema and enums, the description should provide more detail on return values for all actions and parameter behavior. It covers list and new returns but omits switch and close behavior. The overall description is adequate for basic use but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate but only partially. It explains the action parameter's domain and return for list and new, but does not clarify the purpose of url (likely for new action) or tab_id (likely for switch/close). The optional parameters remain semantically under-defined.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that this tool manages tabs within the same BrowserContext as a given page_id, enumerates four specific actions (list, switch, close, new), and distinguishes itself from sibling browser tools like browser_close that close entire contexts. The verb 'manage' and resource 'tabs' with specific actions are precise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage within a specific BrowserContext and for tab management, but it does not explicitly state when to use this tool over alternatives like browser_close or browser_click. No when-not-to-use or exclusions are provided, leaving the agent to infer from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_take_screenshotA
Read-onlyIdempotent
Inspect

Capture a PNG screenshot of the page or a specific element. Returns base64-encoded image bytes AND a file_id (persisted in DialogBrain files storage). Pass file_id straight to messages.send(attachment_file_ids=[file_id]) — do NOT call files.upload again. Use sparingly — favor browser.snapshot for structured DOM understanding.
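
A caller handling the result might decode the inline bytes and reuse the file_id as follows. The result keys `image_base64` and `file_id` are assumed names for this sketch, since no output schema is published; only the `attachment_file_ids=[file_id]` argument comes from the description.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header of every PNG file


def decode_png(image_b64: str) -> bytes:
    """Decode base64 image data and check that it starts with the PNG signature."""
    data = base64.b64decode(image_b64)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    return data


# Hypothetical result shape with a stand-in payload (signature plus filler bytes).
result = {
    "image_base64": base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii"),
    "file_id": "file_123",
}

png_bytes = decode_png(result["image_base64"])
# Reuse the persisted file_id directly instead of uploading the bytes again.
send_args = {"attachment_file_ids": [result["file_id"]]}
```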

ParametersJSON Schema
NameRequiredDescriptionDefault
refNo
page_idYes
full_pageNo
inline_bytesNo
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds behavioral context: it returns both base64 and a persisted file_id, warns against re-uploading, and implies performance cost with 'Use sparingly.' No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with purpose, and packs essential information: action, output, usage tip, and alternative. Every sentence earns its place without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the annotations (readOnly, idempotent) and no output schema, the description covers purpose, output, usage guidelines, and a performance hint. However, it omits details on 'full_page' and 'inline_bytes' parameters, making it slightly incomplete but still adequate for most agents.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% with 4 parameters. The description mentions capturing the page or a specific element, which hints at the 'ref' parameter for element selection, but does not explain 'full_page' or 'inline_bytes'. It adds some semantic value but not full compensation for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Capture a PNG screenshot of the page or a specific element.' It specifies the output (base64-encoded image bytes and a file_id) and distinguishes from sibling 'browser.snapshot' by advising to favor the latter for structured DOM understanding.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidelines: 'Use sparingly — favor browser.snapshot for structured DOM understanding.' It also instructs how to use the result: 'Pass file_id straight to messages.send(attachment_file_ids=[file_id]) — do NOT call files.upload again.' This clarifies when to use and how to handle the output.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_typeA
Read-onlyIdempotent
Inspect

Type text into an element with per-keystroke delay (organic). Each character dispatches keydown/keypress/keyup, unlike browser.fill which replaces .value instantly. Use when the page listens to keystroke events or for typing-speed fingerprint checks. ref is an aria-ref token from browser.snapshot ('e7') or a CSS selector. delay_ms defaults to 50.
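
The difference from browser.fill can be pictured as an event sequence. This is a simplified model rather than the server's implementation: it emits the three event kinds named in the description for each character and estimates the added delay under the assumption of one pause per character.

```python
def keystroke_events(text: str) -> list[tuple[str, str]]:
    """Model the per-character events that typing dispatches."""
    events = []
    for ch in text:
        for kind in ("keydown", "keypress", "keyup"):
            events.append((kind, ch))
    return events


def estimated_typing_ms(text: str, delay_ms: int = 50) -> int:
    """Rough delay estimate, assuming one delay_ms pause per character."""
    return len(text) * delay_ms


events = keystroke_events("hi")
pause = estimated_typing_ms("hello")
```

Under this model a five-character string adds about 250 ms at the default delay, which is the cost paid for looking like organic input.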

ParametersJSON Schema
NameRequiredDescriptionDefault
refYes
textYes
page_idYes
delay_msNo
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description describes a write operation (typing text into an element), but the annotation readOnlyHint is true, which contradicts the described behavior. Under the scoring rule, a description that contradicts its annotations scores 1.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five short sentences, front-loaded with the core behavior and event details, followed by usage guidance and parameter notes. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema and only 4 parameters, the description provides all necessary context: event dispatch, comparison to fill, when to use, ref format, delay default. It is sufficient for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description explains ref as an aria-ref token or CSS selector and delay_ms defaulting to 50. However, it does not clarify page_id or text beyond the initial sentence, leaving some meaning to be inferred.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool types text into an element with per-keystroke delays, and distinguishes itself from the sibling tool browser.fill by explaining the difference in event dispatching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use: 'Use when the page listens to keystroke events or for typing-speed fingerprint checks', providing clear context and contrasting with the alternative browser.fill.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browser_wait_forA
Read-onlyIdempotent
Inspect

Wait for a selector to appear OR a navigation URL to match a glob pattern. Provide ref (selector) OR url_pattern (glob).
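
The either/or contract can be enforced before the call is made. The sketch below treats "OR" as "exactly one", which is an assumption, and uses Python's `fnmatchcase` as a stand-in for glob matching; the server's own glob rules may differ (for example in how `*` treats slashes). `build_wait_args` and `url_matches` are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def build_wait_args(page_id, ref=None, url_pattern=None, timeout_ms=None):
    """Assemble arguments for browser_wait_for with exactly one wait condition."""
    # Assumption: supplying both or neither condition is an error.
    if (ref is None) == (url_pattern is None):
        raise ValueError("provide exactly one of ref or url_pattern")
    args = {"page_id": page_id}
    if ref is not None:
        args["ref"] = ref
    else:
        args["url_pattern"] = url_pattern
    if timeout_ms is not None:
        args["timeout_ms"] = timeout_ms
    return args


def url_matches(url: str, pattern: str) -> bool:
    """Approximate glob matching; fnmatch lets * cross path separators."""
    return fnmatchcase(url, pattern)


args = build_wait_args("page-1", url_pattern="**/dashboard/*", timeout_ms=5000)
```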

ParametersJSON Schema
NameRequiredDescriptionDefault
refNo
page_idYes
timeout_msNo
url_patternNo
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, non-destructive behavior. The description adds that it waits for conditions (selector or URL), which is consistent with annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the action and conditions, with no redundant words. It efficiently conveys the core behavior and parameter choice.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations and the simplicity of the tool, the description covers the main intent but lacks details on the required 'page_id' parameter and the 'timeout_ms' parameter. No output schema exists, so return behavior is not explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description only explains 'ref' and 'url_pattern' (the OR condition) but omits 'page_id' (required) and 'timeout_ms'. These missing parameters reduce the utility for an agent that needs to know how to configure the tool fully.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'Wait for' and specifies two conditions: selector appearance or URL pattern match. It clearly distinguishes this from other browser tools which are actions (click, navigate, etc.), making the purpose highly specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description instructs the user to provide 'ref' OR 'url_pattern', clarifying the parameter choice. However, it does not explicitly state when to use this tool over alternatives (e.g., browser_snapshot), though the context implies it's for waiting before proceeding.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_check_availability (A)
Read-only, Idempotent

Check when you have free time in Google Calendar. Shows busy periods and free slots in a given time range. Useful for finding meeting times or checking schedule conflicts.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| end_time | No | End date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to end of start_time day, or 7 days from now. | |
| start_time | No | Start date/time to check availability (YYYY-MM-DD or ISO 8601). Defaults to start of today. | |
| calendar_id | No | Calendar ID to check. Defaults to primary calendar. | primary |
| working_hours_only | No | If true, only show free slots during working hours (9 AM - 6 PM). OMIT to show all free time (the default). | |
| min_duration_minutes | No | Minimum duration in minutes for free slots. Filters out short gaps. Default: 30 minutes. | |
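
The min_duration_minutes and working_hours_only parameters imply a gap-finding step over the busy periods. The sketch below shows one way such filtering could work; it is not the server's implementation. The function names `free_slots` and `_clip_to_working_hours` are hypothetical, and the 9:00 to 18:00 window is taken from the working_hours_only description.

```python
from datetime import datetime, timedelta


def _clip_to_working_hours(start, end, open_hour=9, close_hour=18):
    """Yield the parts of [start, end] that fall inside each day's working window."""
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < end:
        lo = max(start, day + timedelta(hours=open_hour))
        hi = min(end, day + timedelta(hours=close_hour))
        if lo < hi:
            yield lo, hi
        day += timedelta(days=1)


def free_slots(busy, start, end, min_duration_minutes=30, working_hours_only=False):
    """Return (slot_start, slot_end) gaps between busy periods inside [start, end]."""
    slots = []
    cursor = start
    for busy_start, busy_end in sorted(busy):
        if busy_start > cursor:
            # Gap before this busy period, capped at the end of the range.
            slots.append((cursor, min(busy_start, end)))
        cursor = max(cursor, busy_end)
        if cursor >= end:
            break
    if cursor < end:
        slots.append((cursor, end))
    if working_hours_only:
        slots = [part for slot in slots for part in _clip_to_working_hours(*slot)]
    # Drop gaps shorter than the requested minimum duration.
    min_len = timedelta(minutes=min_duration_minutes)
    return [(s, e) for s, e in slots if e - s >= min_len]
```

With busy periods at 10:00-11:00 and 11:15-12:00, the 15-minute gap is filtered out under the default 30-minute minimum.
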
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the bar is lowered. The description adds that it shows both busy and free slots. This is valuable context beyond annotations, though it does not specify output format or edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the primary action, and every sentence adds value. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 params, no output schema) and rich annotations, the description is fairly complete. It explains the purpose, usage, and key behavior. However, it does not describe the return structure, which would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters are documented in the schema. The tool's description does not add new parameter meaning beyond indicating the time range context, which is already clear from schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks availability in Google Calendar, showing busy periods and free slots. It distinguishes from sibling tools like calendar_list_events (lists events) and calendar_create_event (creates events) by focusing on availability rather than event management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it is useful for finding meeting times or checking schedule conflicts, providing clear context. However, it does not explicitly mention when not to use it or direct to alternatives like calendar_list_events for detailed event viewing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_create_event (A)

Create a new event in Google Calendar. Specify the title, start time, end time, and optionally invite attendees. Use ISO 8601 format for dates (e.g., 2024-12-15T14:00:00).

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| end | No | Event end time in ISO 8601 format. If not provided, defaults to 1 hour after start. Also accepts 'end_time' as alias. | |
| start | No | Event start time in ISO 8601 format (e.g., 2024-12-15T14:00:00). Also accepts 'start_time' as alias. | |
| title | No | Alias for summary - event title. | |
| summary | No | Event title/summary. Required. Also accepts 'title' as alias. | |
| end_time | No | Alias for end - event end time. | |
| location | No | Event location (physical address or virtual meeting link). | |
| timezone | No | Timezone for the event (e.g., 'America/New_York', 'UTC'). | |
| attendees | No | List of attendee email addresses to invite. | |
| start_time | No | Alias for start - event start time in ISO 8601 format. | |
| calendar_id | No | Calendar ID to create event in. Defaults to primary calendar. | primary |
| description | No | Event description/notes. | |
| add_google_meet | No | If true, automatically creates a Google Meet link for the event. OMIT to skip Meet link. | |
| conference_data | No | Conference data for Google Meet. Alternative to add_google_meet flag. | |
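
The aliases (title/summary, start_time/start, end_time/end) and the one-hour default for end can be made explicit on the client side before calling the tool. The following is a minimal sketch under stated assumptions: `normalize_event_args` is a hypothetical helper, not part of the server, and treating both summary and start as mandatory is an assumption based on the parameter descriptions rather than the Required column.

```python
from datetime import datetime, timedelta

# Alias -> canonical parameter name, as listed in the parameter descriptions.
ALIASES = {"title": "summary", "start_time": "start", "end_time": "end"}


def normalize_event_args(args):
    """Map aliases to canonical names and fill in the default end time."""
    out = {}
    for key, value in args.items():
        canonical = ALIASES.get(key, key)
        if key != canonical and canonical in args:
            continue  # the canonical name was given explicitly; ignore the alias
        out[canonical] = value
    missing = [name for name in ("summary", "start") if not out.get(name)]
    if missing:
        raise ValueError(f"missing required event fields: {missing}")
    if "end" not in out:
        # Documented default: end is one hour after start.
        start = datetime.fromisoformat(out["start"])
        out["end"] = (start + timedelta(hours=1)).isoformat()
    return out
```

For example, passing only `title` and `start_time` yields `summary`, `start`, and an `end` one hour later.
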
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the tool creates events (a write operation) and mentions ISO 8601 format and optional attendees. However, it does not cover other behavioral aspects like default calendar (though schema shows calendar_id defaults to 'primary'), authentication needs, or what happens with overlapping times. Annotations only indicate readOnlyHint=false, so description carries the burden but is incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loaded with the action, and contains no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 13 parameters (many optional) and no output schema, the description is minimal. It does not explain return values, constraints (e.g., time ranges, overlapping events), or detailed behavior for advanced features like conference data. The description only covers basic usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Since the schema covers 100% of parameter descriptions, the baseline is 3. The description adds value by highlighting key parameters (title, start, end, attendees) and providing ISO 8601 format guidance, but does not mention several other parameters like location, timezone, or add_google_meet.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it creates a new event in Google Calendar, which is a specific verb+resource. It distinguishes itself from sibling tools (calendar_check_availability, calendar_delete_event, calendar_list_events, calendar_update_event) by focusing solely on creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use the tool (to create an event) but does not explicitly mention when not to use it or provide alternatives. However, given the distinct sibling actions, the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_delete_event (A)
Destructive, Idempotent

Delete an event from Google Calendar. This action cannot be undone. Use with caution.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| event_id | Yes | ID of the event to delete. Required. | |
| calendar_id | No | Calendar ID containing the event. Defaults to primary. | primary |
| send_notifications | No | Whether to send cancellation notifications to attendees. | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark destructiveHint=true, but the description adds context about irreversibility ('This action cannot be undone'), which reinforces the destructive nature and advises caution. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences, front-loaded with the action, followed by a caution. Every word is necessary, with no fluff, making it highly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description adequately covers the core behavior and irreversibility for a simple deletion tool with strong annotations. However, it could briefly mention the effect on attendees (e.g., cancellations sent) to enhance completeness, though schema covers send_notifications.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; all three parameters (event_id, calendar_id, send_notifications) are documented in the input schema. The description does not add additional parameter-level information, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete an event from Google Calendar' with a specific verb and resource. It distinguishes from sibling tools like calendar_create_event or calendar_update_event by focusing on deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use with caution' but provides no explicit guidance on when to use this tool versus alternatives (e.g., updating events or canceling with notifications). It implies deletion context but lacks alternative comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_list_events (A)
Read-only, Idempotent

List events from Google Calendar. Shows upcoming events by default. Can filter by date range and search query.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | No | Free text search query to filter events. | |
| date_to | No | End date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to 7 days from now. Alias: time_max. | |
| date_from | No | Start date/time to query (YYYY-MM-DD or ISO 8601 format). Defaults to now. Alias: time_min. | |
| calendar_id | No | Calendar ID to list events from. Defaults to primary calendar. | primary |
| max_results | No | Maximum number of events to return. | |
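
Over MCP, a tool such as calendar_list_events is invoked with a JSON-RPC `tools/call` request whose params carry the tool name and an arguments object. The sketch below only builds that request body and sends nothing; `tools_call_request` is a hypothetical helper, and the argument values are illustrative.

```python
import json


def tools_call_request(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": name,
            # Omit unset optional parameters so the tool's defaults apply.
            "arguments": {k: v for k, v in arguments.items() if v is not None},
        },
    }


request = tools_call_request(
    "calendar_list_events",
    {"date_from": "2024-12-15", "date_to": "2024-12-22", "query": None, "max_results": 10},
)
print(json.dumps(request, indent=2))
```

Leaving out `query` and `calendar_id` here relies on the documented defaults (no text filter, primary calendar).
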
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the safety profile is clear. The description adds default behavior and filter options, which is useful but not critical.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, front-loaded with the verb and resource, no unnecessary words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers default behavior, filtering scope (date range, search query), and default calendar. No output schema, but return values are implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description adds no meaning beyond what the parameter descriptions already provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list' and the resource 'events from Google Calendar', distinguishing it from sibling tools that create, update, or delete events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing events but does not explicitly contrast with alternatives like calendar_check_availability or calendar_create_event.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calendar_update_event (A)

Update an existing event in Google Calendar. Can modify title, time, location, description, and attendees. Only specified fields will be updated.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| end | No | New end time in ISO 8601 format. Optional. | |
| start | No | New start time in ISO 8601 format. Optional. | |
| summary | No | New event title/summary. Optional. | |
| event_id | Yes | ID of the event to update. Required. | |
| location | No | New event location. Optional. | |
| attendees | No | New list of attendee emails. Replaces existing attendees. | |
| calendar_id | No | Calendar ID containing the event. Defaults to primary. | primary |
| description | No | New event description. Optional. | |
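
The partial-update rule ("Only specified fields will be updated") combined with the replace semantics of attendees can be illustrated with a small merge function. This is a sketch of the documented behavior on plain dictionaries, not the server's code; `apply_update` is a hypothetical name, and treating `None` as "not specified" is an assumption.

```python
UPDATABLE_FIELDS = {"summary", "start", "end", "location", "description", "attendees"}


def apply_update(event, changes):
    """Return a copy of `event` with only the specified fields changed."""
    updated = dict(event)
    for field, value in changes.items():
        if field not in UPDATABLE_FIELDS or value is None:
            continue  # unspecified or unknown fields leave the event untouched
        # attendees is replaced wholesale, not merged with the existing list.
        updated[field] = list(value) if field == "attendees" else value
    return updated


event = {
    "summary": "Weekly sync",
    "location": "Room 1",
    "attendees": ["a@example.com", "b@example.com"],
}
updated = apply_update(event, {"summary": "Monthly sync", "attendees": ["c@example.com"]})
```

Because attendees is replaced rather than appended to, a caller who wants to add one person needs to send the full desired list.
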
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false and destructiveHint=false, so the description confirms it's a mutation tool. It adds the partial update behavior ('Only specified fields will be updated'), but does not disclose potential side effects, permissions, or response behavior. Annotations carry the burden, so description adds marginal value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: three sentences, front-loaded with the action, then listing the modifiable fields and the partial-update rule. Every sentence adds value with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 8 parameters, the description covers the core purpose and partial update behavior. However, it does not explain return values (no output schema), error scenarios, or required permissions. Given the complexity and lack of output schema, the description is adequate but leaves gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 8 parameters. The description adds semantic value by summarizing the modifiable fields and clarifying partial update semantics, which is not fully captured in the schema. This helps the agent understand that only provided fields are changed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Update an existing event'), the resource ('in Google Calendar'), and the specific fields that can be modified (title, time, location, description, attendees). This distinguishes it from sibling tools like calendar_create_event, calendar_delete_event, and calendar_list_events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly specify when to use this tool versus alternatives. It implies partial update behavior ('Only specified fields will be updated') but lacks context such as prerequisites, when not to use, or comparison with create or delete. Usage is inferred but not directly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_get_transcript (A)
Read-only, Idempotent

Get the structured transcript and final state of a voice call by call_id. Returns per-turn rows in chronological order, call status (active/completed/failed/abandoned), duration, and an outcome field telling whether the recipient picked up (answered/no_answer/busy/declined/failed/unknown). answered_at is non-null once the recipient picked up. Returns active turns if the call is still in progress.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| call_id | Yes | Call ID returned by calls.make in _meta.call_id. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations confirm readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds significant value by detailing return fields (chronological transcript rows, call status, duration, outcome, answered_at) and explaining behavior for in-progress calls.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four sentences that front-load the main purpose and sequentially explain key features. No unnecessary or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool without an output schema, the description covers the essential return fields and edge cases (active calls). Could be more complete with notes on potential large transcript handling, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter `call_id` has a schema description specifying where to find it. The tool description does not add extra meaning beyond stating 'by call_id', but with 100% schema coverage, the baseline is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool retrieves the structured transcript and final state of a voice call by call_id. This distinguishes it from sibling call tools like `calls_list_active` or `calls_list_history`, which list calls but do not provide transcripts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is used when needing detailed call transcript data for a specific call. It explains behavior for both completed and active calls, but lacks explicit guidance on when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_hangup (A)
Read-only, Idempotent

Hang up an active voice call by call_id. Use after calls.make when the agent decides to terminate before the callee does, or to abort a stuck call. Idempotent: returns success if the call is already terminal.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| reason | No | Short internal reason for ending the call (e.g. 'campaign timeout'). Stored on voice_sessions.metadata. | |
| call_id | Yes | Call ID returned by calls.make in _meta.call_id. | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds idempotent behavior ('returns success if the call is already terminal'), but there is a contradiction: the readOnlyHint annotation claims the tool is read-only, while hanging up a call is a mutation. This inconsistency reduces transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no fluff. Front-loaded with the core action, then usage and idempotent note. Every sentence is valuable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, no output schema, and annotations covering idempotence, the description provides complete context: what it does, when to use, parameter hint, and behavioral trait.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, baseline is 3. The description adds context: 'reason' is described as 'Short internal reason for ending the call (e.g. 'campaign timeout'). Stored on voice_sessions.metadata.' This adds meaning beyond the schema, raising the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Hang up an active voice call by call_id.' It uses a specific verb and resource, and distinguishes from sibling tools like calls_make or calls_wait.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage context: 'Use after calls.make when the agent decides to terminate before the callee does, or to abort a stuck call.' It does not explicitly state when not to use, but the guidance is clear and helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_active (A)
Read-only, Idempotent

List active voice calls in this workspace. Use before calls.make on a Telegram account (only one MTProto call per account at a time) to check whether the line is free.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| channel | No | Filter by voice channel. OMIT to include both telegram and twilio. | |
| channel_account_id | No | Filter by channel_account.id (the calling Telegram account or Twilio number). Combine with channel for a per-line busy check. | |
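
The per-line busy check described above (combine channel with channel_account_id before placing a call) might look like the following on the client side. This is a sketch under loud assumptions: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and because the tool has no output schema, the assumption that the result is a list of active-call records is unverified.

```python
def line_is_free(call_tool, channel_account_id, channel="telegram"):
    """Return True if no active call is reported for this account on this channel.

    `call_tool(name, arguments)` is a hypothetical stand-in for an MCP client
    call; it is ASSUMED to return a list of active-call records.
    """
    active = call_tool(
        "calls_list_active",
        {"channel": channel, "channel_account_id": channel_account_id},
    )
    return len(active) == 0


def fake_call_tool(name, arguments):
    """Stub used only to exercise the sketch; not a real client."""
    if arguments["channel_account_id"] == "acct-busy":
        return [{"call_id": "c1"}]
    return []


free = line_is_free(fake_call_tool, "acct-free")
busy = line_is_free(fake_call_tool, "acct-busy")
```

Checking first matters mainly for Telegram accounts, since the description states that only one MTProto call per account can be active at a time.
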
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only and safe behavior. The description further explains the operational context (checking line free) and the effect of parameters, adding value beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences that front-load the primary purpose and immediately follow with usage guidance. No superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the essential context for using the tool. Although there is no output schema, the purpose is clear and the lack of return value description is acceptable for a simple list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds meaning by explaining how to combine channel and channel_account_id for a per-line busy check, which goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list), the resource (active voice calls), and the scope (workspace). It distinguishes from siblings like calls_list_history by specifying 'active'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using this tool before calls.make to check line availability, citing the constraint of one MTProto call per account at a time. This provides clear context for when to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_list_history (A)
Read-only, Idempotent

Search historical voice calls in this workspace by participant name, contact_id, thread, channel, source, and/or date range. Returns one row per call (NOT per turn) with call_id, duration_seconds, outcome, direction, started_at, source, channel_label, and parent_thread_id (the originating chat thread for Telegram-group / Twilio-outbound / Meet calls). Pair with calls.get_transcript(call_id) for the full per-turn transcript. Use this instead of messages.read_history for cross-thread call queries — group calls and Meet sessions live on per-call sub-threads, not on the parent chat thread.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum calls to return (default 20, max 100). | |
| since | No | ISO date or datetime lower bound (inclusive). Default: 90 days ago. Naive timestamps are interpreted as UTC. | |
| until | No | ISO date or datetime upper bound (inclusive). Default: now. | |
| source | No | Filter by voice_sessions.source: 'telegram' (1:1 + group), 'twilio' (PSTN), 'meet' (Google Meet bot), 'livechat' (in-app voice). OMIT to include all sources. | |
| channel | No | Filter by message-level channel of the call thread: 'telegram' (1:1 voice or group call sub-thread), 'twilio_voice', 'meet_voice', 'livechat_voice'. OMIT to include all voice channels. | |
| thread_id | No | Restrict to calls on this thread OR with this thread as their originating parent (Telegram group → call sub-thread back-link, Twilio outbound source_thread_id back-link). | |
| contact_id | No | Filter by exact entity_id (from contacts.find). Mutually exclusive with participant_name when both target the same person. | |
| participant_name | No | Filter to calls whose parent thread has a participant matching this name (substring match against entity.title). Resolves group calls via the parent group's roster, not the per-call thread's speaker list. | |
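
The documented defaults for limit, since, and until (20 capped at 100, 90 days ago, now, with naive timestamps read as UTC) can be spelled out when building the arguments. The sketch below mirrors those rules on the client side; the server presumably applies them itself, and `history_args` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def history_args(limit=20, since=None, until=None, now=None, **filters):
    """Build calls_list_history arguments with the documented defaults made explicit."""
    now = now or datetime.now(timezone.utc)

    def as_utc(ts):
        # Naive timestamps are interpreted as UTC, per the `since` description.
        if ts.tzinfo is None:
            return ts.replace(tzinfo=timezone.utc)
        return ts.astimezone(timezone.utc)

    since = as_utc(since) if since else now - timedelta(days=90)
    until = as_utc(until) if until else now
    if since > until:
        raise ValueError("since must not be later than until")
    args = {
        "limit": max(1, min(limit, 100)),  # documented maximum is 100
        "since": since.isoformat(),
        "until": until.isoformat(),
    }
    # Optional filters (source, channel, thread_id, ...) are passed only when set.
    args.update({k: v for k, v in filters.items() if v is not None})
    return args
```

Each returned row carries a call_id, which can then be passed to the transcript tool for the per-turn detail.
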
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint. The description adds detailed behavioral context: returns one row per call with specific fields, mentions pagination via limit parameter, and explains pairing with get_transcript. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph, but it is well-structured with a front-loaded purpose statement. It is slightly verbose but every sentence contributes essential information. Could be shortened without loss, but still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description fully compensates by listing returned fields and referencing the sibling tool for transcripts. It covers filtering, usage criteria, and behavioral notes, making it complete for a complex tool with 8 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, baseline is 3. The description adds value by explaining participant_name resolves via parent group roster and thread_id includes back-links, which goes beyond the schema descriptions. This extra semantic context justifies a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Search historical voice calls') and the resource (voice calls in workspace) with specific filtering criteria. It explicitly distinguishes from messages.read_history, making the tool's purpose unambiguous and differentiated from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs when to use this tool: 'Use this instead of messages.read_history for cross-thread call queries.' It also advises pairing with calls.get_transcript for per-turn transcripts, providing clear usage context and alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_make (A)

Place an outbound AUDIO/VOICE phone call via Twilio (PSTN) or Telegram (MTProto 1:1 call). Use this any time the user asks to 'call', 'ring', 'phone', 'dial', or have a spoken conversation. Do NOT use messages.send when the user asks to call someone — a call is real-time voice, not a text message. You conduct the conversation as the voice agent using the provided greeting and instructions.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| channel | No | Voice transport: 'twilio' (phone via PSTN — requires phone_number in E.164) or 'telegram' (MTProto 1:1 call — requires telegram_user_id, NOT a phone number or thread_id). OMIT to use the current conversation's channel (e.g. a Telegram DM → a Telegram call to that contact). | |
| greeting | Yes | The first sentence the agent speaks immediately when the call connects. ALWAYS provide a greeting — without it the caller hears silence. Keep it short and natural. Example: 'Hi, this is Diana calling from DialogBrain. Do you have a moment to chat?' | |
| report_back | No | When to re-invoke you after the call ends. 'on_answer' (default) = only if the call was answered, 'always' = even on missed/failed calls, 'never' = fire and forget. Transcript is always stored regardless of this setting. | |
| instructions | No | What to do during the call — objective, questions, tone. The AI generates a natural opening and guides the conversation. Example: 'Call about invoice #1234. Ask if they received it and when payment is expected. Be friendly and professional.' | |
| phone_number | No | Destination phone number in E.164 format (e.g., '+15551234567', '+66812345678'). Required when channel='twilio'. | |
| voice_agent_id | No | ID of the agent that conducts the call (an `id` from agents.list). If omitted, uses the workspace's default voice-capable agent when one exists. Pass this when the call fails with 'No voice agent configured'. | |
| telegram_user_id | No | Destination Telegram user ID (decimal int64 as string, e.g. '123456789'). Required when channel='telegram'. The caller account must have had prior interaction with this user — a cold contact cannot be reached via voice. | |
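As a rough illustration of the parameter rules above, the following Python sketch assembles an argument payload for calls_make and rejects combinations the table rules out. The helper name `build_call_args` and the decision to validate on the client side are assumptions made for this example; nothing here is confirmed server behavior.

```python
import re

# E.164: a leading "+", then at most 15 digits, the first of which is non-zero.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")
REPORT_BACK_VALUES = {"on_answer", "always", "never"}


def build_call_args(greeting, channel=None, phone_number=None,
                    telegram_user_id=None, instructions=None, report_back=None):
    """Hypothetical helper: build a calls_make argument dict, omitting unset fields."""
    if not greeting or not greeting.strip():
        raise ValueError("greeting is required")
    if channel == "twilio":
        if not phone_number or not E164_PATTERN.fullmatch(phone_number):
            raise ValueError("channel='twilio' requires phone_number in E.164 format")
    elif channel == "telegram":
        if not telegram_user_id or not re.fullmatch(r"[0-9]+", telegram_user_id):
            raise ValueError("channel='telegram' requires a decimal telegram_user_id string")
    elif channel is not None:
        raise ValueError("channel must be 'twilio', 'telegram', or omitted")
    if report_back is not None and report_back not in REPORT_BACK_VALUES:
        raise ValueError("report_back must be 'on_answer', 'always', or 'never'")

    optional = {
        "channel": channel,
        "phone_number": phone_number,
        "telegram_user_id": telegram_user_id,
        "instructions": instructions,
        "report_back": report_back,
    }
    args = {"greeting": greeting}
    # Leave out anything the caller did not set so server-side defaults apply.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

Omitting `channel` skips the destination checks entirely, which mirrors the table's note that the current conversation's channel is used in that case.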
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=false, so write behavior is expected. Description adds context: call is real-time voice, greeting/instructions are used, transcript always stored. No contradiction; adds useful behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with purpose first, then usage, then parameter details. It is concise for the complexity but some sentences (e.g., 'You conduct the conversation as the voice agent using the provided greeting and instructions') could be merged.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers tool purpose, usage, and parameters well. However, it does not mention the return value (e.g., call ID, connection status). Without output schema, the agent is left unsure what response to expect after placing a call.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% parameter coverage, but the description adds valuable context: it explains channel omission behavior, specifies the greeting requirement, clarifies when to use voice_agent_id, and notes that telegram_user_id requires prior interaction. It goes beyond the schema for multiple parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool places outbound audio voice calls via Twilio or Telegram. It lists specific user intent triggers ('call', 'ring', 'phone', 'dial') and distinguishes from messages_send by emphasizing real-time voice vs. text message.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (user asks for call) and when not to use (do not use messages.send). Provides context about conducting conversation as voice agent. Lacks guidance on prerequisites like channel configuration or when to use sibling call tools like calls_hangup.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_meet (A)
Read-only, Idempotent

Dispatch a workspace AI agent into an active Google Meet call. The agent joins as a participant — it can hear the conversation, respond via TTS, see the shared screen (when vision is enabled on the agent), and answer questions about what's on screen. Use when the operator wants to delegate live meeting attendance to an agent (notes, Q&A, summarization, real-time support). The Meet URL must be in canonical 3-4-3 form, e.g. https://meet.google.com/abc-defg-hij. Lookup-redirect URLs are not supported — operator must use the share-link form.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| agent_id | Yes | ID of an active agent in this workspace. Get it from agents.list. Any active agent can be dispatched — a voice trigger is NOT required (the runner attaches the agent you name directly). | |
| meet_url | Yes | Canonical Google Meet URL — must match https://meet.google.com/<3 letters>-<4 letters>-<3 letters>, e.g. https://meet.google.com/abc-defg-hij. lookup/ redirects are NOT supported. | |
| vision_mode | No | Screen-share capture mode. 'off' = no vision (default), 'on_demand' = the agent can call the vision_query tool for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and the executor splices the latest scene-change into each agent turn as ambient low-detail context. OMIT to use 'off' (the default). | |
| instructions | No | What the agent should do once it joins — its task brief, e.g. 'greet everyone and present the overview deck' or 'take notes and answer questions about the roadmap'. Woven into the agent's system prompt for the session. OMIT for a generic listening agent. | |
| start_immediately | No | If true, the agent starts talking as soon as it joins — it greets everyone and begins the task in `instructions` without waiting for someone to say a wake-word. OMIT (default false) to stay silent until addressed. | |
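The canonical 3-4-3 URL rule can be checked before dispatching an agent. The sketch below is one possible client-side check in Python; the function name `is_canonical_meet_url` is hypothetical, and restricting the code to lowercase ASCII letters is an assumption, since the source only says "letters".

```python
import re

# Assumed pattern: three, four, and three lowercase letters separated by hyphens.
MEET_URL_PATTERN = re.compile(r"https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}")


def is_canonical_meet_url(url):
    """Return True only when the whole string is a canonical 3-4-3 Meet URL."""
    return MEET_URL_PATTERN.fullmatch(url) is not None
```

Using `fullmatch` means trailing path segments or query strings are rejected rather than silently accepted, which keeps lookup-style redirect links out.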
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description states the tool dispatches an agent into a live meeting, which is a state-modifying action. However, annotations include readOnlyHint: true, implying no state changes. This is a clear contradiction, so score is 1 per rules.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single dense paragraph but efficiently covers all necessary information. The main verb is front-loaded. Could be more structured (e.g., bullet points) but no wasted sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters, 2 required, and no output schema, the description covers all aspects: what the tool does, parameter details, constraints, and behavioral expectations. Complete and actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters are described with context beyond the schema: agent_id mentions source and voice-trigger exemption, meet_url gives format rules, vision_mode explains each enum in detail, instructions and start_immediately describe expected behavior. Adds significant value despite 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Dispatch a workspace AI agent into an active Google Meet call') and specifies the resource and purpose. It distinguishes this tool from siblings like agents_ask or calls_make by focusing on meeting delegation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use scenarios ('delegate live meeting attendance') and critical constraints (URL must be canonical 3-4-3 form, no lookup-redirects). Clearly differentiates from alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_send_to_telegram_call (B)
Read-only, Idempotent

Dispatch a workspace AI agent into a live Telegram GROUP voice chat OR an encrypted call-link conference. FOUR ways to target the call: (1) chat_id — the group's numeric id, e.g. -1001234567890 (use for a private group with no @username); (2) target=@username or a t.me/ link; (3) omit both target and chat_id to use the CURRENT thread (when it's a Telegram group); (4) target=t.me/call/ encrypted group-call link. start_if_none=true spawns a new voice chat if the group has none active. The agent joins via the workspace's Telegram account — hears the call, replies via TTS, and sees shared screens (when vision is enabled). NOTE: joining a regular in-group voice chat does NOT need a slug link — pass chat_id directly.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| target | No | Telegram call or group to join. Either: (1) encrypted group-call slug (https://t.me/call/<slug> URL or bare slug token, 12-64 chars), (2) group reference (@username or https://t.me/<group> URL), or (3) omit this (and chat_id) to use the current thread (must be a Telegram group chat). Max ~200 chars. | |
| chat_id | No | Telegram group chat id to join the voice chat of (e.g. -1001234567890 for a supergroup, -4766727451 for a basic group). Use this to target a PRIVATE group that has no @username and isn't the current thread. Overrides `target` when both are given. Group mode only (ignored for slug links). | |
| agent_id | Yes | ID of an active agent in this workspace. Get it from agents.list. Any active agent can be dispatched — a voice trigger is NOT required (the runner attaches the agent you name directly). | |
| vision_mode | No | Screen-share capture mode. 'off' = no vision (default), 'on_demand' = the agent can call vision_query for fine-detail reads, 'continuous_0_3fps' = the bot captures the screen at 1 fps with phash dedupe and splices the latest scene-change into each agent turn. OMIT to use 'off' (the default). | |
| instructions | No | What the agent should do once it joins — its task brief, woven into the voice system prompt. e.g. 'present the overview deck' or 'greet everyone and summarize the discussion'. Optional. | |
| start_if_none | No | When joining a group voice chat (not a slug-based encrypted call), spawn a new voice chat if none is active. Default false. Ignored when target is a call slug (a slug IS the call). | |
| channel_account_id | No | Workspace Telegram channel account ID that joins as the bot. Optional — when the workspace has exactly one Telegram account, it's used by default. Required when multiple Telegram accounts exist. | |
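The interplay between target and chat_id can be read as a small decision procedure. The Python sketch below encodes one reading of the table: a t.me/call/ link always selects the encrypted call, otherwise chat_id overrides target. The function name `resolve_target_mode` and the returned labels are hypothetical, and bare slug tokens are deliberately not handled because the source does not say how they are told apart from other strings.

```python
def resolve_target_mode(target=None, chat_id=None):
    """Hypothetical helper: label which of the four targeting modes applies."""
    if target and "t.me/call/" in target:
        # A call link identifies the call itself; chat_id is ignored for slug links.
        return "call_link"
    if chat_id is not None:
        # In group mode, chat_id overrides target when both are given.
        return "chat_id"
    if target:
        # @username or a t.me/<group> link.
        return "group_reference"
    # Neither field set: fall back to the current thread.
    return "current_thread"
```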
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the annotations: it describes a write/mutation operation (dispatching an agent), but the annotations set readOnlyHint=true. This is a major inconsistency. Additionally, while the description details some behaviors (agent joining, hearing, TTS), it omits other important aspects like permissions, rate limits, or what happens if the call ends.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with an overview followed by enumerated options. It is front-loaded with the main action. While slightly long, every sentence adds value. Could be more concise but effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, no output schema), the description covers the main behavior and most parameters thoroughly. However, it lacks information about return values (success/failure indication), preconditions (e.g., agent must be active, workspace must have a Telegram account), and potential errors. This leaves some gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant meaning by explaining the four targeting methods in detail, clarifying the interplay between target and chat_id, and describing vision_mode and start_if_none behavior more concretely than the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: dispatching an AI agent into a Telegram group voice chat or call link. It specifies the resource and the verb, and the targeting methods provide specificity. However, it does not explicitly differentiate from sibling tools like calls_make, though the Telegram-specific context helps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear guidance on when to use different targeting methods (chat_id for private groups, target for @username or link, omit for current thread, slug for encrypted calls). It also explains start_if_none behavior and notes about not needing slug for in-group chats. However, it does not explicitly state when not to use this tool (e.g., for phone calls) or contrast with related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calls_wait (A)
Read-only, Idempotent

Block until a voice call ends (status changes from 'active') or timeout elapses. Returns ended=true with final state when the call has ended; ended=false on timeout (re-issue to keep waiting). The returned state includes outcome so callers can branch on pickup vs. no-answer (answered/no_answer/busy/declined/failed/unknown). Default timeout 90s; cap 110s — bounded by nginx proxy_read_timeout 120s on /mcp.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| call_id | Yes | Call ID returned by calls.make in _meta.call_id. | |
| timeout_seconds | No | Max seconds to wait. Default 90, cap 110 (bounded below nginx 120s proxy_read_timeout). On expiry returns ended=False with status='active' so the caller can re-issue to keep waiting. | |
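The re-issue pattern described for timeouts could look like the following Python sketch. Here `call_tool` stands in for whatever function a client uses to invoke an MCP tool; that callable, the helper name `wait_for_call_end`, and the assumption that the result is a dict with an `ended` key are all illustrative rather than taken from the server.

```python
MAX_TIMEOUT_SECONDS = 110  # cap stated in the tool description


def wait_for_call_end(call_tool, call_id, per_wait_seconds=90, max_attempts=10):
    """Hypothetical helper: re-issue calls_wait until the call ends or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Never ask for more than the documented cap.
    timeout = min(per_wait_seconds, MAX_TIMEOUT_SECONDS)
    result = {}
    for _ in range(max_attempts):
        result = call_tool("calls_wait", {"call_id": call_id, "timeout_seconds": timeout})
        if result.get("ended"):
            break
    # The last result is returned either way; callers should check result["ended"].
    return result
```

Bounding the number of attempts keeps an agent from blocking indefinitely on a call that never changes state.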
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behaviors: blocking nature, timeout mechanism, return values (ended=true/false, outcome field), and the technical cap (nginx proxy_read_timeout 120s). Annotations (readOnlyHint, idempotentHint) are consistent: the tool reads state repeatedly and is safe. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the core purpose, followed by return behavior and constraints. Every sentence adds value; no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 2 simple parameters and no output schema, the description fully covers return values (ended, outcome) and edge cases (timeout, re-issue). It also explains the nginx bound to preempt technical questions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters (call_id, timeout_seconds) have schema descriptions, but the description adds valuable context: call_id comes from calls.make, timeout default 90s with cap 110s, and expiry behavior. This enhances understanding beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool blocks until a voice call ends or timeout elapses. The verb 'Block' and resource 'voice call' are specific, and it distinguishes from sibling tools like calls_hangup (which terminates a call) and calls_list_active (which lists active calls without blocking).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use (to wait for call completion) and what happens on timeout (re-issue to keep waiting). It does not explicitly mention alternatives like calls_list_active for non-blocking status checks, but the context is sufficient for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

channels_connect_telegram_bot (A)

🤖 Connect a Telegram Bot (Bot API) channel from its bot token.

When to use:

  • After a bot was created via @BotFather and you have its token.

  • The token alone is sufficient — no Telegram user account is needed.

Validates the token, creates the channel account, and registers the webhook so the bot starts receiving messages immediately.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| bot_token | Yes | Telegram bot token from @BotFather (e.g. '123456789:ABCdef...'). | |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool validates the token, creates a channel account, and registers a webhook for immediate message reception. Annotations already indicate mutation (readOnlyHint=false) and no destruction; description adds process details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: a clear title line, a 'When to use' section with two bullet points, and a summary of operations. No wasted words; information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one required parameter, no output schema), the description fully covers the purpose, prerequisites, and operational steps. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'bot_token' is described in both schema and description. Description adds value by stating the token is sufficient and implies validation, enhancing the schema's example. Schema coverage is 100%, so baseline is 3; description moves it to 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Connect a Telegram Bot (Bot API) channel' and specifies it uses the bot token. This is distinct among sibling tools; no other tool connects Telegram bots.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a 'When to use' section with two clear prerequisites: after bot creation via @BotFather and that token alone is sufficient. While it doesn't explicitly exclude scenarios, it gives adequate context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_add_file (A)

Add a file to a knowledge collection.

The file must be uploaded and indexed first (files_upload + files_ingest). If the file was previously removed, it is re-enabled.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| file_id | Yes | ID of the file to add (from files_upload) | |
| collection_id | Yes | ID of the collection | |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=false, destructiveHint=false) are consistent. The description adds the nuance that adding a file re-enables it if previously removed. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states the action, the second gives the prerequisites, and the third notes a side effect. Every sentence is necessary and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two parameters and no output schema, the description covers the action, prerequisites, and behavioral nuance (re-enabling). Complete for the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add extra meaning beyond what the schema already provides, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Add a file to a knowledge collection,' which is a specific verb+resource. It distinguishes from sibling tools like collections_remove_file and collections_list_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states prerequisites: the file must be uploaded and indexed first via files_upload and files_ingest. It also notes the re-enabling behavior for previously removed files, guiding when to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_assign_agent (A)

Assign a knowledge collection to an AI agent.

Once assigned, the agent's knowledge.query will automatically scope RAG search to files in its assigned collections.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| agent_id | Yes | ID of the AI agent | |
| collection_id | Yes | ID of the collection to assign | |
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the behavioral effect (scoping RAG search) beyond annotations. Annotations provide minimal safety info (all false), so the description adds value but is not deeply detailed. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences with no fluff. Every sentence adds value, and key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple assignment operation and lack of output schema, the description adequately covers the purpose and effect. Not mentioning return value is acceptable for this action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers both parameters with descriptions already. The description adds no parameter-specific meaning beyond the schema, so baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('assign') and resource ('knowledge collection to an AI agent'), and the effect is clearly stated. It distinguishes itself from sibling 'collections_unassign_agent'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the consequence of assignment (scoping RAG search) which implies when to use it. However, it lacks explicit when-not-to-use or alternative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_create (A)

Create a named knowledge collection.

Collections group files for RAG search. After creating, add files with collections.add_file and assign to agents with collections.assign_agent.

Parameters (JSON Schema)
| Name | Required | Description | Default |
| name | Yes | Collection name (must be unique per user) | |
| description | No | Optional description of the collection | |
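The follow-up steps named in the description (add files, then assign the collection to an agent) can be laid out as an ordered plan. The Python sketch below only builds the list of tool calls; the helper name `plan_collection_setup` is hypothetical, and the IDs are taken as inputs because the return shape of collections_create is not documented here. It also assumes every file has already been uploaded and indexed.

```python
def plan_collection_setup(collection_id, file_ids, agent_id):
    """Hypothetical helper: list the tool calls that populate and assign a collection."""
    steps = []
    for file_id in file_ids:
        # Each file must already be uploaded and ingested before it can be added.
        steps.append(("collections_add_file",
                      {"file_id": file_id, "collection_id": collection_id}))
    # Assign last, so the agent's scoped search sees a populated collection.
    steps.append(("collections_assign_agent",
                  {"agent_id": agent_id, "collection_id": collection_id}))
    return steps
```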
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate it's a non-read-only, non-destructive write operation. Description adds the context of grouping files for RAG search but does not disclose additional behavioral traits beyond creation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: purpose, explanation of collections, and next steps. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description is clear for a simple creation tool. Lacks mention of return value (e.g., collection ID or confirmation). However, given no output schema, the description is otherwise complete for the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and describes both parameters (name with uniqueness constraint, optional description). Description does not add substantial meaning beyond 'named knowledge collection' and 'Optional description', so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'Create' and resource 'named knowledge collection', and explains that collections group files for RAG search. It distinguishes from siblings by referencing specific follow-up tools (collections.add_file, collections.assign_agent).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on when to use (to create a collection for RAG) and next steps (add files, assign agents). However, does not explicitly state when not to use or provide alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_delete (grade: A)
Destructive, Idempotent

Delete a knowledge collection.

If the collection is assigned to agents, prompts, or channels, pass force=true to delete anyway. CASCADE removes all assignments automatically.
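
The safe-default behavior described above can be sketched as a small request builder. This is a minimal illustration rather than the server's own client code: it assumes the tool is invoked through a standard MCP `tools/call` JSON-RPC request under the name `collections_delete`, and the helper names, the integer ID type, and the ID value 42 are hypothetical.

```python
import json
from typing import Any


def tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def delete_collection_call(collection_id: int, force: bool = False) -> dict[str, Any]:
    """Omit `force` unless the caller explicitly opts in to deleting an in-use collection."""
    # The ID type is an assumption; the listing does not state it.
    arguments: dict[str, Any] = {"collection_id": collection_id}
    if force:
        arguments["force"] = True
    return tool_call("collections_delete", arguments)


safe = delete_collection_call(42)                 # hypothetical ID, safe default
forced = delete_collection_call(42, force=True)   # explicit opt-in
print(json.dumps(safe["params"]))
print(json.dumps(forced["params"]))
```

Leaving `force` out entirely, instead of sending `force=false`, follows the parameter description's instruction to omit it for the safe default.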

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| force | No | Force delete even if collection is in use. OMIT for the safe default (refuse to delete in-use collections). | |
| collection_id | Yes | ID of the collection to delete | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructive behavior. Description adds context by explaining conditional deletion based on assignments and the force parameter. Provides additional behavioral detail beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, straight to the point. Front-loaded with core purpose. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main behavior and special case. No output schema needed. Lacks details on permissions or error handling, but adequate for a simple deletion tool with good annotations and full schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions. Description adds context on when force applies (assigned collections) beyond the schema's 'in use' phrasing, which adds slight value and raises the baseline 3 to a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Delete a knowledge collection.' with a specific verb and resource. Distinguishes from sibling tools like collections_unassign_agent by focusing on deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on when to use force=true (when collection is assigned) and mentions CASCADE for automatic assignment removal. However, the description of CASCADE is slightly ambiguous as it could be misinterpreted as an alternative parameter. Lacks explicit when-not-to-use pointers.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list (grade: A)
Read-only, Idempotent

List all knowledge collections in the workspace.

Collections are named groups of files used for RAG search. Auto-created collections (per-agent, per-prompt) are hidden by default.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| include_inactive | No | Include inactive collections. OMIT to list only active collections (the default). | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, non-destructive, idempotent behavior. The description adds context that auto-created collections are hidden by default, which is behavioral information beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences: the first states the action, and the other two provide essential context. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one optional parameter and good annotations, the description is mostly complete. It explains collections and default hiding. It does not mention the response format, and because there is no output schema, a slight gap remains.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage for the single parameter 'include_inactive', the schema already documents its purpose. The tool description adds no extra parameter information, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all knowledge collections in the workspace' with a specific verb and resource. It also explains what collections are and distinguishes auto-created from manual ones, making it unambiguous compared to siblings like 'collections_create' or 'collections_list_files'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool over alternatives or provide exclusions. The usage is implied by the purpose, but no guidance is given for scenarios like when to use 'collections_list_files' instead.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_list_files (grade: A)
Read-only, Idempotent

List all files in a knowledge collection with their indexing status and chunk counts. Each returned file has a file_id (integer) that can be passed to messages.send as attachments=[file_id] to send the file to a contact, or to files.read to read its text content.
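
As a rough sketch of the hand-off described above, the snippet below collects integer `file_id` values from a listing and shapes them into the `attachments=[file_id]` argument fragment. The shape of the listing (a list of records) is an assumption, since only the `file_id` field is named in the description; the function name is hypothetical, and the remaining `messages.send` arguments are not covered here.

```python
from typing import Any


def attachment_arguments(files: list[dict[str, Any]]) -> dict[str, list[int]]:
    """Collect integer file_id values into the attachments=[...] argument fragment."""
    file_ids: list[int] = []
    for record in files:
        file_id = record.get("file_id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(file_id, int) and not isinstance(file_id, bool):
            file_ids.append(file_id)
    return {"attachments": file_ids}


# Hypothetical listing; real records also carry indexing status and chunk counts.
listing = [{"file_id": 7}, {"file_id": 12}, {"file_id": "not-an-int"}]
print(attachment_arguments(listing))
```

Filtering on the integer type reflects the description's note that `file_id` is an integer; anything else is dropped rather than forwarded.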

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| collection_id | Yes | ID of the collection | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only and non-destructive. The description adds behavioral context by specifying that the return includes indexing status, chunk counts, and a file_id that can be used with other tools. However, it does not disclose potential pagination or error handling, keeping it from a 5.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose, and every sentence adds value. No superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (1 param, no output schema) and presence of annotations, the description provides sufficient context about the tool's behavior and return structure. It could be more complete by mentioning limits or pagination, but it adequately covers the essentials.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the single parameter (collection_id). Per the baseline rule, a score of 3 is appropriate as the description does not add additional meaning to the parameter beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists files in a knowledge collection with indexing status and chunk counts. The verb 'list' and resource 'files in a knowledge collection' are specific. It additionally explains the utility of the returned file_id, which enhances clarity and distinguishes it from sibling tools like collections_add_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when one needs to enumerate files in a collection, but it does not provide explicit guidance on when to use this tool versus alternatives (e.g., search_files). No when-not-to-use or alternative tool is mentioned, meeting only the minimum viable standard.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_remove_file (grade: A)

Remove a file from a knowledge collection.

The file itself is not deleted — only the collection membership is removed.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_id | Yes | ID of the file to remove | |
| collection_id | Yes | ID of the collection | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral context beyond annotations: clarifies that the file itself is not deleted, only membership. No contradictions with annotations (destructiveHint=false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with main action, no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Sufficient for a simple two-parameter tool with no output schema; explains effect and what it does not do.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters; description adds no additional parameter-level detail beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States verb + resource ('remove a file from a knowledge collection') and distinguishes from deletion of the file itself. Differentiates from sibling tools like collections_add_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage via the action description, but no explicit guidance on when to use vs alternatives like agents_remove_file or collections_add_file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

collections_unassign_agent (grade: A)

Remove a knowledge collection from an AI agent.

The collection and its files are not deleted — only the agent assignment is removed.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent_id | Yes | ID of the AI agent | |
| collection_id | Yes | ID of the collection to unassign | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-destructive (destructiveHint=false), and the description adds that only the assignment is removed, not files. This clarifies the exact effect beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states the primary action, second clarifies the scope. No superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple unassign operation, the description covers purpose and behavioral nuance. Required parameters are clear from schema. No output schema needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters are fully described in the schema (100% coverage). The description adds no additional semantics beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Specifies the action 'Remove a knowledge collection from an AI agent' and clarifies it is distinct from deletion by stating the collection and files are not deleted. Differentiates from sibling 'collections_assign_agent'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context that this operation only unassigns and does not delete, helping the agent decide against using 'collections_delete'. However, no explicit when-not-to-use or alternatives beyond the deletion clarification.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_add_channel (grade: A)

🔗 Link a new channel identity (email, phone, LinkedIn, etc.) to an existing contact.

When to use:

  • User learns a contact's email or phone and wants to save it

  • User wants to link a LinkedIn/Instagram profile to an existing contact

  • Adding a second channel for an existing person

Requires contact_id (entity_id) from contacts.find.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| value | Yes | Email address, phone number, or username for this channel | |
| channel | Yes | Channel type to add | |
| contact_id | Yes | entity_id from contacts.find | |
| display_name | No | Optional display label for this identity | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate mutation (readOnlyHint=false) and non-destructive behavior (destructiveHint=false). The description adds that it requires a contact_id from contacts.find, but does not cover potential side effects like overwriting existing channels or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences plus a bullet list, front-loaded with the purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with four parameters, the description covers purpose, usage scenarios, and a prerequisite. It lacks information about the return value or handling of duplicate channels, but is largely complete for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All four parameters are described in the schema with 100% coverage. The description provides no additional detail beyond the schema, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Link' and resource 'channel identity to an existing contact', listing specific channel types (email, phone, LinkedIn, etc.). It distinguishes from sibling tools like contacts_update or contacts_find by focusing on adding channels.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'When to use' section provides three concrete scenarios and explicitly mentions the prerequisite (contact_id from contacts.find). However, it does not include when not to use or contrast with alternatives like updating a contact.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_discover (grade: A)
Read-only, Idempotent

Search for a contact on a live channel (Telegram, WhatsApp, etc.) before adding them. Use this to look up a person by username or phone number before calling contacts.sync. This is the right tool when asked to add or find a specific person by @username or phone (use contacts.sync afterwards to actually add them) — not group_discovery.
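
The lookup-then-add workflow above can be sketched as two request bodies built in order. This assumes the tools are reached through standard MCP `tools/call` JSON-RPC requests under the names `contacts_discover` and `contacts_sync`; the helper names and the handle are hypothetical, and the sketch does not interpret the lookup response because its shape is not documented here.

```python
import json
from typing import Any


def tool_call(name: str, arguments: dict[str, Any], request_id: int) -> dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def discover_then_sync(channel: str, handle: str) -> list[dict[str, Any]]:
    """Return the lookup request followed by the add request for the same person."""
    lookup = tool_call("contacts_discover", {"query": handle, "channel": channel}, 1)
    # Send this second request only after the lookup confirms the person exists;
    # that check is left to the caller because the response shape is unknown.
    add = tool_call("contacts_sync", {"channel": channel, "identifier": handle}, 2)
    return [lookup, add]


calls = discover_then_sync("telegram", "@example_user")  # hypothetical handle
for request in calls:
    print(json.dumps(request))
```

Reusing the same channel and handle for both requests keeps the person who was verified and the person who is added identical.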

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Username, phone, or name to search for | |
| channel | Yes | Channel name: telegram, whatsapp, etc. | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, idempotent, non-destructive. The description adds that it looks up on a live channel and is a preliminary step before adding. This adds useful context beyond what annotations provide, though it doesn't detail response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with key information, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with no output schema, the description covers purpose, workflow, and parameters adequately. It mentions the follow-up action (contacts.sync), which provides good context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters (query, channel). The description reinforces query's purpose but doesn't add new meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches for a contact on a live channel (Telegram, WhatsApp, etc.) before adding them, distinguishing it from contacts.sync and group_discovery. It specifies the verb 'search' and the resource 'contact on a live channel'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (when asked to add or find a specific person by @username or phone) and provides a workflow (use this then contacts.sync). It also clarifies what not to use (not group_discovery).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_find (grade: A)
Read-only, Idempotent

👤 Search for contacts in your address book by name or username.

When to use:

  • User asks 'find contact X' or 'who is Y?'

  • User wants to know someone's username or ID

  • Before sending a message to verify contact exists

  • To get contact's channel reference for messaging

Examples:

❓ User: 'find contact named [name]' → contacts_find(query='[name]', limit=5)

❓ User: 'who is [full name]?' → contacts_find(query='[full name]', limit=1)

❓ User: 'search for @username' → contacts_find(query='username', limit=10)

Returns: name, username, channel, channel_ref, similarity_score, match_type. Plus:

  • entity_id: local DB key — pass to contacts.profile. Null for live-discovered contacts (skip contacts.profile for those).

  • telegram_user_id (when channel='telegram'): the Telegram user ID — pass to calls.make / messages.send. NOT entity_id.
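
The two ID rules above are easy to mix up, so here is a minimal sketch of how a caller might pick the right one. It assumes each search result is a dictionary carrying the field names listed in the description; the function names and sample values are hypothetical.

```python
from typing import Any, Optional


def profile_contact_id(result: dict[str, Any]) -> Optional[Any]:
    """Return the ID for contacts.profile, or None for live-discovered contacts."""
    # entity_id is null for live-discovered contacts, so None means "skip contacts.profile".
    return result.get("entity_id")


def telegram_target_id(result: dict[str, Any]) -> Optional[Any]:
    """Return the ID for calls.make / messages.send on Telegram, never entity_id."""
    if result.get("channel") == "telegram":
        return result.get("telegram_user_id")
    # Other channels are outside what the description specifies for this field.
    return None


# Hypothetical results: one stored locally, one discovered live.
local = {"channel": "telegram", "entity_id": 101, "telegram_user_id": 555000}
live = {"channel": "telegram", "entity_id": None, "telegram_user_id": 555001}

print(profile_contact_id(local), telegram_target_id(local))
print(profile_contact_id(live), telegram_target_id(live))
```

Keeping the two lookups in separate functions makes it harder to pass `entity_id` where `telegram_user_id` is expected.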

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of results to return | |
| query | Yes | Name or username to search for (supports partial matches) | |
| channel | No | Filter by channel. OMIT to search across all channels. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, openWorldHint=true, idempotentHint=true, destructiveHint=false. Description adds crucial details about entity_id behavior (null for live-discovered, pass to contacts.profile) and telegram_user_id for messaging, enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured: summary, when-to-use, examples with user queries and tool calls, return field explanation. Every sentence is purposeful, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description fully compensates by detailing return fields and their special behaviors (entity_id, telegram_user_id). Also references related tools (contacts.profile, calls.make, messages.send) for complete context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds value with example query formats (name, @username) and reinforces channel behavior ('OMIT to search across all channels'). Concrete examples improve usability.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Search for contacts in your address book by name or username' with specific verb and resource, and examples distinguish it from sibling tools like contacts_profile, contacts_discover, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'When to use' list with scenarios like verifying contacts before messaging. Does not explicitly state when not to use, but links to contacts.profile subtly. Overall clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_profile (grade: A)
Read-only, Idempotent

👤 Get full profile for a contact: all channel identities, notes, role, capabilities, birthday.

When to use:

  • After contacts.find to get complete info about a specific person

  • To see all channels a contact is reachable on

  • To read notes, role, or capabilities for a contact

Requires contact_id (entity_id) from contacts.find.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| contact_id | Yes | entity_id from contacts.find | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds return content details (channels, notes, etc.) and requirement for contact_id from contacts.find. Adequate but not rich beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Six lines structured with bullet points for usage scenarios. Front-loaded purpose sentence. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, and parameter origin. Lacks explicit output structure (no output schema), but lists return fields. For a simple retrieval tool with one param, this is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% with description 'entity_id from contacts.find'. Description repeats the origin but adds no new syntax or format info. Baseline of 3 is appropriate as schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Get full profile for a contact' with specific resources: channel identities, notes, role, capabilities, birthday. Differentiates from sibling contacts.find by indicating it provides complete info after finding.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: after contacts.find, to see channels, read notes/role/capabilities. Mentions required contact_id from contacts.find. Lacks explicit when-not-to-use guidance, but the positive guidance is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_sync (grade: A)

Add a discovered contact and open a conversation thread. Returns thread_id for the new conversation. Call contacts.discover first to verify the contact exists.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| channel | Yes | Channel name: telegram, whatsapp, etc. | |
| identifier | Yes | Username or phone number to add | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutability and non-idempotency. Description adds that adding a contact opens a conversation and returns thread_id, which is useful beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the main action, no wasted words. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two params and no output schema, the description covers purpose, prerequisite, and return value. Missing only minor details like error handling, but sufficient for most agents.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add additional meaning to the parameters beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (add, open) and the resource (contact, conversation thread), and distinguishes from siblings by mentioning the prerequisite to call contacts.discover first.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to call contacts.discover first, providing clear prerequisite guidance. Does not explicitly list when not to use, but the context is sufficient for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contacts_update (Grade A)

✏️ Update a contact's profile: name, notes, role, capabilities, birthday, preferred channel.

When to use:

  • User wants to add notes about a contact

  • User wants to set/update role or capabilities for a contact

  • User wants to rename a contact or update birthday

Requires contact_id (entity_id) from contacts.find. At least one optional field must be provided.
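The constraints stated here and in this tool's parameter descriptions (at least one optional field, birthday day and month set together, numeric ranges, name length) can be pre-checked on the client before calling the tool. The sketch below is an illustration under that reading; `validate_contact_update` is a hypothetical helper, and the server may validate differently.

```python
OPTIONAL_FIELDS = {
    "role", "notes", "birthday_day", "capabilities", "display_name",
    "birthday_year", "birthday_month", "preferred_channel",
}
RANGES = {"birthday_day": (1, 31), "birthday_month": (1, 12), "birthday_year": (1900, 2100)}

def validate_contact_update(args: dict) -> list[str]:
    """Return a list of problems found in contacts_update arguments (empty if none)."""
    errors = []
    if "contact_id" not in args:
        errors.append("contact_id is required")
    if not OPTIONAL_FIELDS.intersection(args):
        errors.append("at least one optional field must be provided")
    # Day and month are documented as a pair; year may be set on its own.
    if ("birthday_day" in args) != ("birthday_month" in args):
        errors.append("birthday_day and birthday_month must be set together")
    for field, (low, high) in RANGES.items():
        if field in args and not low <= args[field] <= high:
            errors.append(f"{field} must be in {low}-{high}")
    if len(args.get("display_name", "")) > 255:
        errors.append("display_name exceeds 255 characters")
    return errors

print(validate_contact_update({"contact_id": "c1", "birthday_day": 5}))
```

An empty string for `role` or `notes` passes the check on purpose, since the schema documents it as the way to clear those fields.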

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| role | No | Contact role (e.g. developer, client, partner). Empty string clears role. | |
| notes | No | Free-text notes/context about this contact. Empty string clears notes. | |
| contact_id | Yes | entity_id from contacts.find | |
| birthday_day | No | Birth day 1-31 (must be set together with birthday_month) | |
| capabilities | No | List of capabilities (e.g. ['backend', 'design']) | |
| display_name | No | New display name (max 255 chars) | |
| birthday_year | No | Birth year 1900-2100 (optional, standalone) | |
| birthday_month | No | Birth month 1-12 (must be set together with birthday_day) | |
| preferred_channel | No | Preferred channel for contacting this person. OMIT to leave the preferred channel unchanged. | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-read-only, non-destructive mutation. The description adds the important constraint that at least one optional field must be provided, which is not in the annotations. However, it does not disclose other behavioral details like idempotency or side effects, which is acceptable given the simple nature of the update.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: a single sentence for purpose, three bullet points for usage, and one line for prerequisites. Every sentence is useful and front-loaded. No fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters and rich schema documentation, the description covers the essential usage scenarios and the key constraint of requiring at least one optional field. The only minor gap is no explicit mention of birthday field pairing (day/month), but the schema handles that. Overall, it provides sufficient context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters are described in the schema. The description mentions some fields (e.g., notes, role, capabilities) but does not add new semantics beyond the schema. The 'at least one optional field' rule is added, but it's a usage constraint rather than parameter semantics. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it updates a contact's profile and lists the specific fields (name, notes, role, capabilities, birthday, preferred channel). The verb 'update' and resource 'contact profile' are explicit, and it distinguishes from sibling tools like contacts_find and contacts_add_channel by focusing on profile modifications.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' bullet points (add notes, set/update role/capabilities, rename, update birthday) and mentions the prerequisite of obtaining contact_id from contacts.find. While it does not explicitly state when not to use or list alternatives, the context is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

documents_create (Grade A)

Render a document (PDF / HTML / PPTX / DOCX) and save it to the workspace.

This tool has two input pipelines — pass exactly one of content_html or content_markdown.

Pipeline A — content_html (canonical for decks, proposals, designed pages)

You author full HTML+CSS. A baked-in design-system preamble ships first (<style> with Inter/Manrope as data-URI fonts, CSS-variable palette tokens, 8px spacing scale, and pre-styled layout helpers); your markup and any of your own <style> blocks land after the preamble so you can override anything. Chromium renders the assembled document into a static PDF — JavaScript is disabled and DNS is blackholed, so external font / image / script fetches will fail by configuration.

Required when this pipeline is used:

  • title — human-readable, used for PDF metadata and the saved filename.

  • content_html — the <body> and any custom <style> blocks. The renderer wraps this in <html>…</html> and injects the preamble + a canonical <meta charset> + <title>. Do NOT emit <script>, <iframe>, <object>, <embed>, <meta>, <link>, <base>, <form>, or event handlers — the sanitizer strips them.

  • output_type: "pdf" or "html". ("pptx" and "docx" require content_markdown since they need structured markdown intermediates.)

Optional:

  • page_preset: "slide_16_9" (default for any deck), "a4" (default for flowing documents; used if omitted), "letter", or "none" (you declare your own @page rule).

  • design_tokens — flat dict overriding the preamble's CSS variables. Whitelisted keys: brand_primary, accent, surface_dark (hex color), font_display, font_body (font name from ['Inter', 'Manrope', 'monospace', 'sans-serif', 'serif', 'system-ui', 'ui-monospace', 'ui-sans-serif', 'ui-serif']).

  • language — BCP-47 tag (default "en"). Drives <html lang>.
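The Pipeline A rules listed above lend themselves to a client-side pre-check before the call is made. The following is a minimal sketch under that reading; `check_html_pipeline` is a hypothetical helper, not part of the server, and the real renderer may report problems differently (for example, it drops unknown design tokens with a warning rather than failing).

```python
ALLOWED_PRESETS = {"slide_16_9", "a4", "letter", "none"}
TOKEN_KEYS = {"brand_primary", "accent", "surface_dark", "font_display", "font_body"}

def check_html_pipeline(args: dict) -> list[str]:
    """Return problems found in documents_create arguments for the content_html pipeline."""
    errors = []
    # Exactly one of the two body parameters may be present.
    if ("content_html" in args) == ("content_markdown" in args):
        errors.append("pass exactly one of content_html or content_markdown")
    if "title" not in args:
        errors.append("title is required")
    if "content_html" in args:
        if args.get("output_type") not in {"pdf", "html"}:
            errors.append('content_html supports output_type "pdf" or "html" only')
        if args.get("page_preset", "a4") not in ALLOWED_PRESETS:
            errors.append("unknown page_preset")
        unknown = set(args.get("design_tokens", {})) - TOKEN_KEYS
        if unknown:
            errors.append(f"unknown design_tokens keys: {sorted(unknown)}")
    return errors

deck_args = {
    "title": "Quarterly deck",
    "content_html": '<section class="slide slide-cover"><h1>Quarterly deck</h1></section>',
    "output_type": "pdf",
    "page_preset": "slide_16_9",
}
print(check_html_pipeline(deck_args))
```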

Slide structure (page_preset="slide_16_9")

Each slide is <section class="slide …">…</section>. The base .slide class is what sizes it to the viewport and forces the page break — do not drop it. Composable variants (apply alongside .slide):

  • .slide-cover — gradient hero, big display title.

  • .slide-split — two equal columns, image + narrative.

  • .slide-stats — three-up KPI cards (use <div class="stat"> with .stat-value + .stat-label inside).

  • .slide-quote — centered pull quote + <cite> attribution.

Layout helpers (work in any preset): .grid-2, .grid-3, .split, .stack, .cluster, .callout, .muted, .kbd.

Speaker notes

<aside class="notes">…text…</aside> inside a <section class="slide">. The sanitizer strips them from the rendered PDF and returns them as slide_notes[] (parallel to slide order). Orphan notes outside any slide are dropped with a warning.
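As an illustration of the slide and speaker-note structure described above, the sketch below assembles a two-slide `content_html` body. The `slide` helper and the sample text are hypothetical; only the class names come from the description.

```python
from html import escape

def slide(body_html: str, variant: str = "", notes: str = "") -> str:
    """Wrap markup in a slide section, keeping the base .slide class and optional notes."""
    classes = "slide" + (f" {variant}" if variant else "")
    aside = f'<aside class="notes">{escape(notes)}</aside>' if notes else ""
    return f'<section class="{classes}">{body_html}{aside}</section>'

deck = "\n".join([
    slide("<h1>Project update</h1>", "slide-cover", notes="Open with the headline number."),
    slide(
        '<div class="stat"><div class="stat-value">12</div>'
        '<div class="stat-label">Placeholder metric</div></div>',
        "slide-stats",
    ),
])
print(deck)
```

Placing each note inside its own section matters because, per the description, notes outside any slide are dropped with a warning.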

Images

Only these src schemes resolve:

  • file:NNN — workspace file_id.

  • data:image/...;base64,... — inline.

  • https://<host> where <host> is in DOCUMENTS_MEDIA_URL_ALLOWLIST. Other URLs are dropped and replaced with an HTML comment placeholder.
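The three accepted `src` schemes can be approximated with a small classifier, which helps catch images that would be dropped before rendering. This is a sketch under assumptions: the allowlist value is a made-up stand-in for DOCUMENTS_MEDIA_URL_ALLOWLIST, `file:NNN` is assumed to mean a numeric file_id, and the server's sanitizer may be stricter.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for the server-side DOCUMENTS_MEDIA_URL_ALLOWLIST.
ALLOWLIST = {"media.example.com"}

def src_resolves(src: str, allowlist: set[str] = ALLOWLIST) -> bool:
    """Return True if an <img> src matches one of the three documented schemes."""
    if re.fullmatch(r"file:\d+", src):  # workspace file_id, assumed numeric
        return True
    if re.match(r"data:image/[\w.+-]+;base64,", src):  # inline image
        return True
    parsed = urlparse(src)
    return parsed.scheme == "https" and parsed.hostname in allowlist

for candidate in ["file:123", "https://media.example.com/a.png", "http://media.example.com/a.png"]:
    print(candidate, src_resolves(candidate))
```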

Pipeline B — content_markdown (invoice / contract only)

Required:

  • title, content_markdown, output_type.

Optional:

  • theme: "invoice" or "contract". Triggers the corresponding exemplar styling and (for invoices) the arithmetic validator that fail-closes on missing or mismatched totals.

  • language — BCP-47 (default "en").
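The description does not say how the invoice arithmetic validator works. As an independent sketch of the kind of check it implies, the code below recomputes line totals, subtotal, tax, and total using the figures from the INVOICE (English) exemplar in this description. `check_invoice` is a hypothetical helper, not the server's validator.

```python
from decimal import Decimal

def check_invoice(lines, subtotal, tax_rate, tax, total) -> list[str]:
    """Recompute invoice arithmetic with exact decimals and list any mismatches."""
    problems = []
    for description, qty, unit_price, line_total in lines:
        if Decimal(qty) * Decimal(unit_price) != Decimal(line_total):
            problems.append(f"line total mismatch: {description}")
    if sum(Decimal(line_total) for *_, line_total in lines) != Decimal(subtotal):
        problems.append("subtotal mismatch")
    if Decimal(subtotal) * Decimal(tax_rate) != Decimal(tax):
        problems.append("tax mismatch")
    if Decimal(subtotal) + Decimal(tax) != Decimal(total):
        problems.append("total mismatch")
    return problems

exemplar_lines = [
    ("{Service 1}", "1", "1500.00", "1500.00"),
    ("{Service 2}", "2", "500.00", "1000.00"),
]
print(check_invoice(exemplar_lines, "2500.00", "0.20", "500.00", "3000.00"))
```

Using `Decimal` on string inputs avoids the rounding surprises that binary floats can introduce in currency arithmetic.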

Delivery contract (CRITICAL)

After this tool returns file_id, deliver the file with messages.send(attachments=[file_id], text="<short caption>"). Embedding the file_id in a markdown link, sandbox: URL, or /api/files/<id>/download text will render as plain text on the recipient's channel — the attachments parameter is the only way the file actually attaches.
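A small guard can enforce the delivery rule above when building the follow-up call. The sketch is illustrative only: `build_delivery` is a hypothetical helper, it covers just the two arguments named in the description, and any other arguments messages.send may require (such as a target thread) are not shown.

```python
import re

# Patterns the description says will render as plain text instead of attaching.
EMBED_PATTERN = re.compile(r"sandbox:|/api/files/[^/\s]+/download")

def build_delivery(file_id, caption: str) -> dict:
    """Build messages.send arguments that attach the file rather than linking to it."""
    if EMBED_PATTERN.search(caption):
        raise ValueError("caption embeds a file link; pass the file via attachments instead")
    return {"attachments": [file_id], "text": caption}

print(build_delivery(17, "Here is the proposal."))
```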

Exemplars

INVOICE (English):

Invoice INV-{YYYYMMDD-HHMMSS}

From: {Issuer Legal Name}, {Address}, {Tax ID}
To: {Customer Name}, {Customer Address}, {Customer Tax ID}
Issue date: {YYYY-MM-DD}
Due date: {YYYY-MM-DD}

| Description | Qty | Unit price | Total |
| --- | --- | --- | --- |
| {Service 1} | 1 | 1500.00 | 1500.00 |
| {Service 2} | 2 | 500.00 | 1000.00 |

Subtotal: USD 2500.00
Tax (20%): USD 500.00
Total: USD 3000.00

Payment: {bank details OR crypto wallet — never both}

INVOICE (Russian):

Счёт-фактура № INV-{YYYYMMDD-HHMMSS}

От: {Юридическое название организации}, {Адрес}, ИНН {Tax ID}
Кому: {Название клиента}, {Адрес клиента}, ИНН {Tax ID}
Дата: {YYYY-MM-DD}
Срок оплаты: {YYYY-MM-DD}

| Описание | Кол-во | Цена | Сумма |
| --- | --- | --- | --- |
| {Услуга 1} | 1 | 1500.00 | 1500.00 |
| {Услуга 2} | 2 | 500.00 | 1000.00 |

Подытог: USD 2500.00
НДС (20%): USD 500.00
Итого: USD 3000.00

Реквизиты: {банковские реквизиты ИЛИ криптокошелёк — не оба сразу}

CONTRACT (English):

Service Agreement

Between: {Provider Legal Name}, {Address} ("Provider")
And: {Client Legal Name}, {Address} ("Client")
Effective date: {YYYY-MM-DD}

1. Scope of services

{Concise description of what Provider agrees to deliver.}

2. Term

This Agreement begins on the Effective date and continues until {termination condition or end date}.

3. Compensation

Client pays Provider {amount and currency} according to {payment schedule}.

4. Confidentiality

Both parties agree to keep proprietary information of the other party confidential during and after the term of this Agreement.

5. Termination

Either party may terminate with {N} days' written notice.

6. Governing law

{Jurisdiction}.


Provider: ____________________ Client: ____________________ {Provider signatory name} {Client signatory name}

CONTRACT (Russian):

Договор оказания услуг

Между: {Юридическое название Исполнителя}, {Адрес} ("Исполнитель")
И: {Юридическое название Заказчика}, {Адрес} ("Заказчик")
Дата вступления в силу: {YYYY-MM-DD}

1. Предмет договора

{Краткое описание услуг, которые Исполнитель обязуется оказать.}

2. Срок действия

Договор вступает в силу с указанной даты и действует до {условие прекращения или дата окончания}.

3. Стоимость и порядок оплаты

Заказчик оплачивает услуги Исполнителя в размере {сумма и валюта} в порядке {график платежей}.

4. Конфиденциальность

Стороны обязуются сохранять конфиденциальность сведений, полученных в ходе исполнения настоящего Договора, в течение срока его действия и после его прекращения.

5. Расторжение

Любая из сторон вправе расторгнуть Договор, направив письменное уведомление не менее чем за {N} дней.

6. Применимое право

{Юрисдикция}.


Исполнитель: ____________________ Заказчик: ____________________ {ФИО подписанта Исполнителя} {ФИО подписанта Заказчика}

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| theme | No | Invoice or contract styling for content_markdown. Rejected with content_html (use design_tokens + your own CSS instead). OMIT for default (unthemed) styling. | |
| title | Yes | Short human-readable title for the document. | |
| language | No | BCP-47 language tag (e.g. 'en', 'ru', 'zh', 'ja'). Drives <html lang> and (markdown path) font fallback for non-Latin scripts. | en |
| output_type | Yes | Renderer target: 'pdf', 'pptx', 'docx', or 'html'. PPTX/DOCX require content_markdown. | |
| page_preset | No | Page geometry for content_html. 'slide_16_9' = 1280x720 deck, 'a4'/'letter' = flowing document, 'none' = LLM declares its own @page. Defaults to 'a4' inside the html branch when omitted. Rejected with content_markdown. | |
| content_html | No | Full HTML body (with optional <style> blocks) for the canonical Chromium pipeline. Mutually exclusive with content_markdown. | |
| design_tokens | No | Flat dict of CSS-variable overrides for content_html. Whitelisted keys: brand_primary, accent, surface_dark (hex color), font_display, font_body (Inter, Manrope, system-ui, ui-sans-serif, ui-serif, ui-monospace, sans-serif, serif, monospace). Unknown keys / invalid values are dropped with a warning. Rejected with content_markdown. | |
| content_markdown | No | Markdown body for the invoice/contract pipeline. Mutually exclusive with content_html. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description extensively discloses behavioral details beyond the minimal annotations: Chromium rendering with JavaScript disabled and DNS blackholed, sanitization rules, allowed image sources, speaker notes handling, slide structure, and the critical delivery contract (must use attachments). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with headings, bullet points, and exemplars. It front-loads the core purpose and then provides detailed sections. While very thorough, some redundancy exists (e.g., repeating pipeline requirements), so it is not maximally concise but still well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (8 parameters, two pipelines, no output schema), the description is highly complete. It covers the return value (file_id), the follow-up delivery step via attachments, image rules, speaker notes, slide structure, and comprehensive exemplars for invoices and contracts. This is more than enough for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 100% coverage, and the description adds substantial meaning: mutual exclusivity of content_html/content_markdown, defaults (page_preset defaults to a4 when omitted), constraints (theme rejected with content_html), and exemplars for markdown templates. It provides much more context than the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Render a document (PDF / HTML / PPTX / DOCX) and save it to the workspace,' which is a specific verb+resource. It distinguishes from sibling tools by detailing unique input pipelines and output types, making it clear this is for document creation, not file upload or ingestion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use each pipeline (content_html for decks/proposals, content_markdown for invoices/contracts) and output type restrictions (PPTX/DOCX require content_markdown). However, it does not explicitly exclude alternatives like files_upload or other document-related tools, though the specificity of pipelines makes usage clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

feedback_save (Grade A)

Save a behavioral rule, preference, or correction that should guide future agent behavior. Use this when the user gives explicit guidance like 'always reply in Russian', 'don't suggest meetings before 11am', or 'invoice link goes via email, not chat'. Structure the rule as: the rule itself, why it matters (if stated), and how to apply it. Scope: 'workspace' for org-wide rules, 'agent' for per-agent overrides, 'person' for per-contact preferences. Prefer feedback.save over notes.save for anything that's instructive rather than informational.
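The scope rules above, together with the constraints in this tool's parameter descriptions (reserved `__` key prefix, `scope_ref_id` required for thread and person scopes, `global` as a deprecated alias for `agent`), can be sketched as an argument builder. `feedback_args` is a hypothetical helper for illustration; it does not model the `target_agent_id` requirement, which depends on the calling mode.

```python
SCOPES = {"workspace", "agent", "thread", "person"}

def feedback_args(key, rule, scope, why=None, how_to_apply=None,
                  scope_ref_id=None, target_agent_id=None) -> dict:
    """Build feedback_save arguments, rejecting combinations the schema rules out."""
    if key.startswith("__"):
        raise ValueError("keys starting with '__' are reserved")
    if scope == "global":  # documented deprecation alias
        scope = "agent"
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if scope in {"thread", "person"} and scope_ref_id is None:
        raise ValueError(f"scope_ref_id is required for scope='{scope}'")
    args = {"key": key, "rule": rule, "scope": scope}
    optional = {
        "why": why,
        "how_to_apply": how_to_apply,
        "scope_ref_id": scope_ref_id,
        "target_agent_id": target_agent_id,
    }
    # Omit unset optional fields instead of sending nulls.
    args.update({name: value for name, value in optional.items() if value is not None})
    return args

print(feedback_args("reply_language", "Always reply in Russian.", "workspace"))
```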

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| key | Yes | Short identifier for this rule (e.g. 'reply_language', 'meeting_hours'). Must not start with '__' (reserved). | |
| why | No | Why this rule matters (optional but recommended for the distiller). | |
| rule | Yes | The rule itself, in imperative form. Required. | |
| scope | Yes | Scope of the rule. 'workspace' for org-wide rules; 'agent' for per-agent overrides; 'thread' for conversation-specific guidance; 'person' for per-contact preferences. 'global' accepted as deprecation alias for 'agent'. | |
| how_to_apply | No | When/how to apply the rule (optional). Helpful for conditional rules like 'apply when speaking to Russian-speaking customers'. | |
| scope_ref_id | No | Required for scope='thread' (thread_id) and scope='person' (person_id). | |
| target_agent_id | No | Target agent. In agent mode optional (defaults to self); required from MCP. Ignored when scope='workspace'. | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false, and the description adds context about the structuring and application of rules. No contradictions, though it could mention if updates are idempotent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that is clear and front-loaded with purpose. It is concise but could be slightly more streamlined. No unnecessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters and no output schema, the description covers purpose, usage, parameter structure, and examples adequately. It provides enough context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The description adds value by explaining how to structure the rule and scope usage, going beyond the schema's individual field descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it saves behavioral rules, preferences, or corrections to guide future agent behavior. It provides concrete examples ('always reply in Russian') and distinguishes from similar tools like notes.save.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly tells when to use it (the user gives explicit guidance) and names the alternative to avoid for such content (prefer feedback.save over notes.save when the content is instructive rather than informational). Also explains the scope options.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_get_base64 (Grade A)
Annotations: Read-only, Idempotent

Download one or more files server-side and return their content as base64-encoded strings. Use this to inspect images, PDFs, or any binary file attached to messages when you cannot access presigned S3 URLs directly. Supports up to 5 files per call, max 15 MB each. For large files, batch in groups of 1-2 to avoid oversized responses.
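The batching advice follows from base64 overhead: every 3 raw bytes become 4 encoded characters, so a 15 MB file is roughly 20 MB in the response. A minimal sketch of the batching and decoding steps follows; the helper names are hypothetical, and the shape of the tool's response is not documented here, so only a bare base64 string is decoded.

```python
import base64

MAX_FILES_PER_CALL = 5  # limit stated in the description

def encoded_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)

def batches(file_ids: list, size: int = MAX_FILES_PER_CALL) -> list[list]:
    """Split file IDs into per-call groups; use size 1-2 for large files."""
    if not 1 <= size <= MAX_FILES_PER_CALL:
        raise ValueError("size must be between 1 and 5")
    return [file_ids[i:i + size] for i in range(0, len(file_ids), size)]

def decode_file(b64_content: str) -> bytes:
    """Decode one base64 string back into the original bytes."""
    return base64.b64decode(b64_content)

print(batches([1, 2, 3, 4, 5, 6, 7]))
print(encoded_size(15 * 1024 * 1024))
```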

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_ids | Yes | List of file IDs to fetch as base64 (max 5). Get IDs from files.info or message attachment_file_ids. | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only and idempotent. Description adds server-side download, response format (base64), file ID sources, and size limits. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: purpose, use case, limits, batching advice. Front-loaded, no fluff, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, constraints, and parameter source. Lacks details on response structure but sufficient for a simple tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers parameter fully. Description adds guidance on where to get file IDs (files.info or message attachments), adding value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool downloads files and returns base64 content. Specifies resource (files), action (download and return base64), and use case (inspect when cannot access S3). Distinguishes from sibling file tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (when cannot access presigned S3 URLs) and provides batch size limits and advice. No explicit exclusions, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_info (Grade A)
Annotations: Read-only, Idempotent

Get metadata and download URLs for files by their IDs.

When to use:

  • After messages_read_history returns attachment_file_ids

  • To get a presigned download URL to read a received file

Returns: filename, mime_type, byte_size, download_url (1-hour presigned URL).

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_ids | Yes | List of file IDs (max 20) | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe read operation; description adds important temporal behavior: download_url is valid for 1 hour. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Short, front-loaded with purpose, structured usage guidelines, and return values. No unnecessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description lists return fields (filename, mime_type, byte_size, download_url) and expiry, making it complete for a simple retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema describes file_ids as list of integers (max 20); description adds context that these IDs come from messages, but parameter semantics are already covered well by schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb 'Get' and resource 'metadata and download URLs for files by their IDs', distinguishing it from siblings like files_read or files_upload.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidelines: after messages_read_history returns attachment_file_ids, or to get a presigned download URL. Provides clear context and preconditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_ingest (Grade A)

Save and index a file into the knowledge base. Use this when the user asks to save, store, or remember a document. The file will be processed (OCR if needed) and indexed for future search.

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| tags | No | Optional list of tags for categorization (e.g., ['presentation', 'dextrade']). | |
| title | No | Human-readable title for the file (e.g., 'Project Presentation', 'Q1 Report'). If not provided, uses original filename. | |
| file_id | Yes | ID of the file to ingest (from attachment_file_ids in context). | |
| thread_id | No | Optional thread ID to associate the file with. If not provided, uses context thread. | |
| description | No | Optional description of the file contents. | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint: false, so the agent knows it's a write operation. The description adds behavioral details beyond annotations: 'The file will be processed (OCR if needed) and indexed for future search', explaining what happens after ingestion. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: three sentences that front-load the purpose and usage context. Every sentence adds value with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the processing (OCR, indexing) but does not mention the return value or success/failure response. With no output schema, the return behavior should be hinted at for completeness. However, it provides enough context for an agent to decide to call it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description does not add significant meaning beyond the schema for individual parameters; it only provides general behavioral context. The schema already describes each parameter adequately, so no extra value needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Save and index a file into the knowledge base'. It specifies the verb (save and index) and the resource (file into knowledge base), and distinguishes from siblings like files_upload by emphasizing indexing for future search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use: 'when the user asks to save, store, or remember a document'. This provides clear context for invocation. However, it does not mention when NOT to use the tool or suggest alternatives, missing an opportunity to differentiate from sibling tools like agents_add_file.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_read (Grade A)
Annotations: Read-only, Idempotent

Read text content of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files: for IMAGES use files.get_base64; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run files.ingest first, THEN files.read. Calling on a binary mime-type returns an error, so reading this routing hint before deciding saves you a turn.
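The routing hint above can be sketched as a function from mime type to the ordered tool calls to make. The mime-type mapping is an assumption made for illustration (the description speaks of file kinds, not mime strings), and `route_file` is a hypothetical helper.

```python
def route_file(mime_type: str) -> list[str]:
    """Return the ordered tool names to call for a file, or [] if none applies."""
    if mime_type.startswith("image/"):
        return ["files_get_base64"]
    if mime_type.startswith(("audio/", "video/")):
        return []  # cannot be transcribed via files_read
    if mime_type.startswith("text/") or mime_type == "application/json":
        return ["files_read"]
    # PDFs and other documents: ingest first so text is extracted, then read.
    return ["files_ingest", "files_read"]

for mime in ["image/png", "audio/mpeg", "text/markdown", "application/pdf"]:
    print(mime, route_file(mime))
```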

Parameters
| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_id | Yes | ID of the file to read (from attachment_file_ids in context). | |
| encoding | No | Text encoding to use (default: utf-8). | utf-8 |
| max_chars | No | Maximum characters to return (default: 10000). Use smaller values for large files. | |
| summarize | No | If true, generate AI summary instead of returning raw content. Use for 'summary', 'summarize', 'краткое содержание' requests. OMIT to return raw content (the default). | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, but description adds critical behavioral context: that calling on binary mime-types returns an error, that PDFs require prior ingest, and that non-PDF docs need files_ingest first. This goes beyond annotations and provides actionable guidance.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3-4 sentences) and front-loaded with the main action and acceptable file types. Uses bold for emphasis. A slight redundancy in 'DO NOT call on binary files' could be trimmed, but overall it's well-structured and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity (4 params), the description covers all necessary context: what the tool does, when to use it versus alternatives, error cases, and prerequisites (ingest). It does not explain the return format, but that is acceptable since reading file content typically returns text. Complete for usage decisions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for all 4 parameters. The description adds value by explaining permissible file types (relevant to file_id parameter) and error behavior, but does not detail each parameter individually. Given high schema coverage, baseline is 3; the additional context pushes it to 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Explicitly states it reads text content of attached files, lists supported types (.txt, .md, .json, code, PDFs after ingest), and clearly distinguishes from sibling tools (files_get_base64 for images, files_ingest for non-PDF docs). The verb 'Read' with specific resources and clear boundaries earns top score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use and when-not-to-use guidance: works for text files and for PDFs after ingest; do not call on binary files; for images use files_get_base64; audio/video cannot be transcribed; for non-PDF documents run files_ingest first. Also mentions that calling on a binary file returns an error, which saves a turn. No gaps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_upload (grade A)

Upload a file to DialogBrain and get a file_id for use in messages_send.

When to use:

  • User wants to send a file/image to a contact

  • Before calling messages_send with an attachment

Returns: file_id (integer) to pass to messages_send attachments parameter.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Optional display title | |
| content | No | Base64-encoded file bytes. Either content OR source_url is required. | |
| filename | No | Filename with extension (e.g. 'photo.png') | upload |
| mime_type | No | MIME type (e.g. 'image/png', 'application/pdf') | application/octet-stream |
| source_url | No | Public URL to fetch file from. Either content OR source_url is required. | |

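
As an illustration of the content/source_url rule, the sketch below builds a files_upload argument dictionary from either raw bytes or a URL. The helper name `build_upload_args` is hypothetical, and treating the two fields as mutually exclusive is an assumption; the schema only says that one of them is required.

```python
import base64
from typing import Optional


def build_upload_args(
    data: Optional[bytes] = None,
    source_url: Optional[str] = None,
    filename: str = "upload",
    mime_type: str = "application/octet-stream",
) -> dict:
    """Build files_upload arguments from raw bytes or a public URL."""
    # Assumption: exactly one of the two sources should be supplied.
    if (data is None) == (source_url is None):
        raise ValueError("provide exactly one of data or source_url")
    args = {"filename": filename, "mime_type": mime_type}
    if data is not None:
        # The schema asks for Base64-encoded file bytes in `content`.
        args["content"] = base64.b64encode(data).decode("ascii")
    else:
        args["source_url"] = source_url
    return args


print(build_upload_args(data=b"hi", filename="note.txt", mime_type="text/plain"))
```

The returned file_id would then be passed to the attachments parameter of messages_send, as the description states.
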
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (readOnlyHint false, destructiveHint false), so the description must carry the behavioral burden. It discloses the mutation behavior ('upload') and the return value (file_id integer). However, it lacks details on permissions, file size limits, or storage implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three short sections (purpose, usage, return) with no fluff. Every sentence serves a purpose, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is straightforward, and the description covers the essential aspects: what it does, when to use, and what it returns. The output format is explained (file_id integer) even though no output schema is provided. Minor missing details like file persistence or limits are acceptable for this tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% field description coverage, so the schema already explains each parameter. The overall description adds little beyond noting the mutual exclusivity of 'content' and 'source_url', which is already in the schema. Thus, baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the primary purpose: 'Upload a file to DialogBrain and get a file_id for use in messages_send.' It specifies the verb 'upload' and the resource 'DialogBrain', and distinguishes from other file-related sibling tools by tying its usage directly to sending messages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance is provided: 'When to use: - User wants to send a file/image to a contact - Before calling messages_send with an attachment.' This clearly indicates when to employ the tool. However, it does not explicitly mention when not to use it or alternative tools for file operations not related to messaging.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_create (grade A)

📁 Create a new inbox folder to organize threads.

When to use:

  • User wants to create a folder to group related conversations

  • User wants to organize threads by topic, project, or contact type

After creating a folder, use threads.update with folder_id to move threads into it.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| icon | No | Emoji icon for the folder (max 10 chars, optional) | |
| name | Yes | Folder name (max 100 chars) | |

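
The length limits in the schema and the follow-up step from the description can be sketched as two small argument builders. Both helper names are hypothetical. The `thread_id` argument name for threads.update is an assumption (only `folder_id` is named in the description), as are the non-empty name check and counting "chars" as Python code points.

```python
from typing import Optional


def build_folder_args(name: str, icon: Optional[str] = None) -> dict:
    """Validate and build arguments for folders_create."""
    name = name.strip()
    # Assumption: an empty name is rejected; the schema only states the maximum.
    if not 1 <= len(name) <= 100:
        raise ValueError("name must be 1-100 characters")
    args = {"name": name}
    if icon is not None:
        # len() counts code points; how the server counts "chars" is not stated.
        if len(icon) > 10:
            raise ValueError("icon must be at most 10 characters")
        args["icon"] = icon
    return args


def build_move_args(thread_id: int, folder_id: int) -> dict:
    """Arguments for the follow-up threads.update call (thread_id is an assumed name)."""
    return {"thread_id": thread_id, "folder_id": folder_id}


print(build_folder_args("Leads", "📁"))
```
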
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate it's a write operation (readOnlyHint=false) and non-destructive. The description adds minimal extra behavioral context beyond the creation action and a tip about folder assignment via threads.update. No mention of side effects, auth requirements, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, well-structured with a title line, 'When to use' section, and a follow-up tip. No wasted words, effectively front-loads purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple create tool with 2 parameters and no output schema, the description covers purpose, usage context, and next steps. It lacks mention of response format or potential errors, but these are not critical for a straightforward creation operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description does not add any additional meaning beyond what is already in the schema. Baseline of 3 is appropriate as the description contributes no extra parameter insight.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (create), resource (inbox folder), and context (organize threads). It distinguishes from sibling tools like folders_delete by specifying the creation action and the intended use for organizing conversations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'When to use' section with two bullet points provides clear context. It also advises using threads.update to move threads into the folder after creation, effectively excluding this tool for thread movement and offering an alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

folders_delete (grade A)

🗑️ Delete an inbox folder. Threads inside become unfiled (not deleted).

When to use:

  • User wants to remove a folder they no longer need

  • User wants to clean up their inbox organization

Threads inside the folder are NOT deleted — they simply move back to the inbox.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| folder_id | Yes | ID of the folder to delete | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a write operation (readOnlyHint=false) that is not destructive (destructiveHint=false). The description adds value beyond annotations by explaining the exact effect on threads (they become unfiled, not deleted), confirming non-destructive behavior for threads. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: a single-line action, two usage bullets, and one clarifying note. Every sentence adds value with no unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one required parameter and no output schema, the description covers the essential aspects: action, effect on threads, and use cases. It does not mention return values or error conditions, but these are standard and often not needed in descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter (folder_id) is fully described in the input schema (100% coverage). The description does not add additional parameter information, but the schema already provides sufficient meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete an inbox folder') and the resource ('folder'), and distinguishes from other folder tools by explaining that threads move back to the inbox rather than being deleted. This is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit 'When to use' scenarios (remove a folder, clean up inbox organization). Although it doesn't name alternative tools, it clarifies that threads are not deleted, which helps agents decide when to use this vs. other deletion tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_add (grade A)

Add a specific group to your discovery list by @username or invite link (t.me/...).

Groups and channels only — this does NOT add an individual person/contact. To add a person by @username (e.g. a customer or lead), use contacts.discover then contacts.sync instead.

When to use:

  • You already know the group's @username or invite link

  • Adding a known group without searching

Returns: group metadata including id, title, member_count.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| link | Yes | The group's @username or invite link (e.g. '@phuket' or 't.me/...') | |
| channel | Yes | Channel the group is on (e.g. 'telegram') | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the action as adding to a discovery list and notes that group metadata is returned. Annotations already mark the tool as non-read-only, so the description confirms write behavior. It could be more explicit about idempotency or side effects, but it is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise and well-structured: main action first, then clarification, then usage guidelines, then output. Every sentence adds value with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple add tool with only 2 parameters and no output schema, the description covers the key aspects: what it does, when to use, what it returns. No critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3 applies. Description adds little beyond schema examples: it mentions '@username or invite link' which matches schema description for 'link', and 'Groups and channels only' which does not add specific parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'add', the resource 'group to discovery list', and the means '@username or invite link'. Distinguishes from adding individual persons, and its purpose is distinct from siblings like group_create, group_join, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: knowing the group's @username or invite link, and when not to use: for adding individuals, with alternative tools provided (contacts.discover then contacts.sync).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_add_member (grade A)

Add a member to an existing group on Telegram or WhatsApp.

What this does:

  • Adds the specified member to the group

  • Resolves the member by username, phone number, or JID

  • Reports if the member is already in the group

Returns: success, chat_id, member, already_member.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| member | Yes | The member to add (format depends on channel: @username on Telegram, phone on WhatsApp) | |
| channel | Yes | Channel where the group exists (e.g., 'telegram', 'whatsapp') | |
| chat_id | Yes | ID of the group/channel to add the member to | |

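
A caller might interpret the documented return fields (success, chat_id, member, already_member) along these lines. This is a sketch: the helper name `describe_add_result` is hypothetical, and the result is assumed to arrive as a plain dictionary with boolean flags.

```python
def describe_add_result(result: dict) -> str:
    """Summarize a group_add_member result for the user."""
    member = result.get("member", "member")
    # Check already_member first so a repeated add reads as a no-op,
    # whatever the server reports in `success` for that case.
    if result.get("already_member"):
        return f"{member} is already in the group"
    if result.get("success"):
        return f"{member} was added to chat {result.get('chat_id')}"
    return f"adding {member} failed"


print(describe_add_result({"success": True, "chat_id": 42, "member": "@ana", "already_member": False}))
```
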
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds behavioral context beyond annotations: member resolution, idempotency reporting (already_member). No contradictions with annotations (readOnlyHint=false, idempotentHint=false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise with bullet points, no fluff. Every sentence adds value. Structure is clear and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given full schema coverage and no output schema, the description adequately explains returns (success, chat_id, member, already_member). Adequate for a simple 3-param tool with no required output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. Description adds meaningful detail: format depends on channel (Telegram @username, WhatsApp phone), and explains member resolution, adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Add a member to an existing group on Telegram or WhatsApp' with a clear verb and resource. It distinguishes from sibling tools like group_create, group_join, and group_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains how members are resolved (username, phone, JID) but does not explicitly state when to use this tool versus alternatives or provide exclusions. Usage context is implied but not fully explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_create (grade A)

Create a new group on a channel (Telegram or WhatsApp). Returns the new group's chat ID and invite link.

What this does:

  • Creates a new group with the specified title

  • Returns chat_id, invite_link, and channel_ref for further operations

  • Optionally registers the group in your inbox for monitoring

Returns: success, chat_id, channel_ref, title, thread_id.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| about | No | Optional description or about text for the group | |
| title | Yes | Title/name of the group to create | |
| channel | Yes | Channel to create the group on (e.g., 'telegram', 'whatsapp') | |
| group_type | No | Type of group to create. Options: 'supergroup' (default), 'basic'. Telegram-only; ignored on WhatsApp. | supergroup |
| register_in_inbox | No | Auto-register the created group in your inbox for monitoring. Default: true. | |

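
The channel-specific handling of group_type can be sketched as an argument builder. The helper name `build_group_create_args` is hypothetical, and omitting group_type for non-Telegram channels is a design choice of this sketch; the schema only says the field is ignored on WhatsApp.

```python
from typing import Optional

GROUP_TYPES = ("supergroup", "basic")


def build_group_create_args(
    title: str,
    channel: str,
    about: Optional[str] = None,
    group_type: str = "supergroup",
    register_in_inbox: bool = True,
) -> dict:
    """Build arguments for group_create."""
    if group_type not in GROUP_TYPES:
        raise ValueError(f"group_type must be one of {GROUP_TYPES}")
    args = {"title": title, "channel": channel, "register_in_inbox": register_in_inbox}
    if about:
        args["about"] = about
    if channel == "telegram":
        # group_type is Telegram-only, so it is sent only for that channel.
        args["group_type"] = group_type
    return args


print(build_group_create_args("Support", "whatsapp"))
```
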
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate mutation (readOnlyHint=false). The description adds context: what is returned (chat_id, invite_link, channel_ref), optional registration, and channel-specific behavior (group_type Telegram-only). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the main action and includes bullet points for clarity. However, some redundancy exists between the initial sentence and the 'What this does' section, slightly reducing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description provides a list of return values (success, chat_id, etc.) which is helpful. It covers the main aspects of the tool, though it could mention error scenarios or prerequisites. Overall, adequately complete for the complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. The description adds value beyond schema by explaining the purpose of 'channel' (Telegram/WhatsApp) and that 'group_type' is Telegram-only. This extra context justifies a higher score than baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new group on a channel (Telegram or WhatsApp)' with specific verb and resource. It distinguishes from sibling tools like group_add, group_join, etc., which focus on adding members or joining existing groups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (for creating a new group) but does not explicitly contrast with sibling tools or state when not to use. It mentions optional inbox registration but lacks direct usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_join (grade A)

Join a group and start syncing its messages to your inbox. The group must be in your discovery list (use group.search or group.add first).

What this does:

  • Joins the group on Telegram (or other channel)

  • Creates a thread in your inbox for syncing messages

  • Optionally enables AI auto-reply drafts

Returns: success, thread_id, auto_reply_enabled.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| group_id | Yes | ID of the discovered group (from group.search or group.list) | |
| enable_auto_reply | No | Enable AI auto-reply drafts for messages in this group. Drafts can be reviewed and sent manually. Default: true. | |

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behavioral traits beyond annotations: joining the group, creating a thread for syncing, and optionally enabling auto-reply drafts. Annotations only provide readOnlyHint=false and destructiveHint=false, so the description adds meaningful context about what the tool does.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: a summary line, a prerequisite note, bullet points for effects, and a return format line. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (2 params, no output schema), the description covers prerequisites, actions, and return values fully. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description mentions group_id and enable_auto_reply in the bullet points but does not add significant meaning beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Join a group and start syncing its messages to your inbox.' It identifies the specific action (join) and the effect (syncing), distinguishing from sibling tools like group_search or group_add which deal with discovery.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states the prerequisite that the group must be in the discovery list and references group.search or group.add. While it doesn't explicitly say when not to use it, the prerequisite provides clear guidance for correct usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_list (grade A)
Read-only, Idempotent

List groups you've found and joined in this workspace.

Lifecycle values:

  • discovered: found but not yet evaluated

  • bookmarked: saved for later

  • monitored: joined and actively syncing messages

  • dismissed: hidden

By default, dismissed groups are excluded. Returns: id, title, member_count, lifecycle, scan_status, overall_score.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of results (1-100, default 20) | |
| offset | No | Pagination offset. OMIT to start at row 0 (default). | |
| channel | No | Filter by channel (e.g. 'telegram'). Optional. | |
| lifecycle | No | Filter by state: discovered, bookmarked, monitored (=joined/syncing), dismissed. OMIT to include all states (dismissed excluded by default elsewhere). | |
| min_score | No | Minimum overall score (0.0-1.0). Optional. | |

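
Paging with limit and offset can be sketched as a generator. The `call_tool(name, arguments)` function is a hypothetical stand-in for an MCP client, and the response shape (a dictionary with a `groups` list) is an assumption, since the listing gives the returned fields but no output schema.

```python
from typing import Any, Callable, Dict, Iterator, Optional

LIFECYCLES = {"discovered", "bookmarked", "monitored", "dismissed"}


def iter_groups(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    lifecycle: Optional[str] = None,
    min_score: Optional[float] = None,
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield groups page by page until a short page signals the end."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    if lifecycle is not None and lifecycle not in LIFECYCLES:
        raise ValueError(f"unknown lifecycle: {lifecycle}")
    offset = 0
    while True:
        args: Dict[str, Any] = {"limit": page_size, "offset": offset}
        if lifecycle is not None:
            args["lifecycle"] = lifecycle
        if min_score is not None:
            args["min_score"] = min_score
        # Assumed response shape: {"groups": [...]}.
        page = call_tool("group_list", args).get("groups", [])
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

Stopping on a short page costs one extra call when the total is an exact multiple of the page size, but it avoids relying on a total-count field that the listing does not document.
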
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds value by explaining lifecycle state meanings, default exclusion of dismissed groups, and return fields, which go beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded: first sentence states purpose, followed by lifecycle explanations, default behavior, and return fields. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description lists return fields. Parameters are fully documented in the schema. The description covers default behavior and lifecycle, making it complete for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the baseline is 3. The description adds meaning by explaining lifecycle enum values and default exclusion of dismissed groups, which is not fully captured in the schema's parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists groups the user has found and joined in the workspace. It distinguishes itself from sibling tools like group_create, group_join, and group_search by focusing on listing existing groups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on default behavior (dismissed groups excluded) and explains lifecycle values for filtering, but does not explicitly state when to use this tool versus alternatives like group_search.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_preview_messages (grade A)
Read-only, Idempotent

Read recent public messages from a group without joining it. Only works for groups where can_preview_history=true.

Use this to manually evaluate message quality before deciding to join. For an automated quality score, use group.scan instead.

Returns: list of recent messages with sender, text, date, is_reply.

Parameters

| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Number of recent messages to fetch (1-100, default 20) | |
| group_id | Yes | ID of the discovered group (from group.search or group.list) | |

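
The choice between a manual preview and the automated scan can be sketched as a small selector. The helper name `pick_evaluation_tool` is hypothetical, and it assumes that `can_preview_history` is available as a field on the group's metadata; the tool name `group_scan` follows the underscore spelling used in the listing for the tool the description calls group.scan.

```python
from typing import Optional


def pick_evaluation_tool(group: dict, want_score: bool) -> Optional[str]:
    """Pick the tool for evaluating a group before joining, or None if neither fits."""
    if want_score:
        # Automated quality score, per the description.
        return "group_scan"
    if group.get("can_preview_history"):
        # Manual read of recent public messages without joining.
        return "group_preview_messages"
    return None


print(pick_evaluation_tool({"id": 7, "can_preview_history": True}, want_score=False))
```
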
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark readOnlyHint=true and destructiveHint=false. Description adds the return format (list of messages with sender, text, date, is_reply) and the precondition, which is helpful beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise multi-sentence description with no extraneous content. Each sentence adds value: main action, precondition, use case, alternative, return format.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers key aspects: purpose, precondition, alternative, return fields. Lacks mention of ordering (e.g., most recent first) but not critical. Overall adequate for a read-only list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for both group_id and limit. The description adds no new semantic information beyond what's already in the input schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads recent public messages without joining, a specific verb-resource combination. It distinguishes itself from the sibling group.scan by noting manual evaluation vs automated scoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (manual evaluation before joining), and gives alternative (group.scan for automated score). Also notes precondition: only works if can_preview_history=true.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_promote_admin: A

Promote a member to admin in an existing group on Telegram or WhatsApp.

What this does:

  • Gives the specified member admin status in the group

  • On Telegram, this grants visibility of all group messages (even if not a bot)

  • Defaults to minimal/empty rights; specify custom rights if needed

Returns: success, chat_id, member.
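The channel-dependent `member` format and the Telegram-only `rights` dict can be sketched as a small argument builder. This is only an illustration: the helper name, the string type of `chat_id`, and the decision to reject `rights` on WhatsApp on the client side are assumptions, and the server may behave differently.

```python
from typing import Any, Optional


def build_promote_args(
    channel: str,
    chat_id: str,
    member: str,
    rights: Optional[dict[str, bool]] = None,
) -> dict[str, Any]:
    """Assemble group_promote_admin arguments with basic client-side checks."""
    # Member format depends on the channel: @username on Telegram, phone on WhatsApp.
    if channel == "telegram" and not member.startswith("@"):
        raise ValueError("on Telegram, member is expected as @username")
    if channel == "whatsapp" and member.startswith("@"):
        raise ValueError("on WhatsApp, member is expected as a phone number")
    args: dict[str, Any] = {"channel": channel, "chat_id": chat_id, "member": member}
    if rights:
        # Assumption: refuse rights outside Telegram, since the schema calls them Telegram-specific.
        if channel != "telegram":
            raise ValueError("rights are Telegram-specific")
        args["rights"] = rights
    return args


# Hypothetical IDs; the rights keys come from the example in the parameter description.
args = build_promote_args(
    "telegram", "chat-42", "@alice",
    rights={"post_messages": True, "edit_messages": True},
)
```

Omitting `rights` leaves the promotion at the minimal admin status the description mentions as the default.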

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| member | Yes | The member to promote (format depends on channel: @username on Telegram, phone on WhatsApp) | |
| rights | No | Optional admin rights dict (Telegram-specific). If not provided, defaults to minimal/admin status only. Example: {"post_messages": true, "edit_messages": true} | |
| channel | Yes | Channel where the group exists (e.g., 'telegram', 'whatsapp') | |
| chat_id | Yes | ID of the group/channel where the member will be promoted | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds behavioral details beyond annotations: Telegram grants message visibility even to non-bots, defaults to minimal/empty rights, and allows custom rights. This provides useful context about side effects and configuration, though it doesn't cover WhatsApp specifics or reversibility.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise with front-loaded purpose, bullet points for details, and a returns line. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 params, nested object, no output schema), the description covers key aspects: function, behavioral notes, defaults, and returns. It is adequate but could benefit from clarifying prerequisites (e.g., member must exist in group).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with clear parameter descriptions. The tool description adds the return format ('Returns: success, chat_id, member') but does not deepen understanding of parameters beyond schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Promote a member to admin in an existing group on Telegram or WhatsApp', specifying the verb (promote), resource (member to admin), and scope (existing group, channels). It differentiates from sibling tools like group_add_member by focusing on promoting to admin.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies the group must already exist but provides no explicit guidance on when to use this tool versus alternatives like group_add_member. No exclusion cases or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

group_scan: A

Scan a group to evaluate its quality before joining. Fetches recent messages, analyzes activity, spam, and engagement, then returns a quality score and plain-English verdict.

When to use:

  • After finding groups with group.search

  • Before deciding which groups to join

Returns: overall_score (0-1), is_disqualified, disqualify_reasons, individual scores, and a verdict string.
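One way an agent might act on the returned fields is to drop disqualified groups and rank the rest by `overall_score`. The sketch below assumes each scan result has been paired with its `group_id` by the caller, and the 0.7 threshold is an arbitrary illustrative value, not something the tool defines.

```python
from typing import Any


def rank_joinable(scans: list[dict[str, Any]], min_score: float = 0.7) -> list[str]:
    """Return group IDs that pass the scan, best score first."""
    passing = [
        s for s in scans
        if not s["is_disqualified"] and s["overall_score"] >= min_score
    ]
    passing.sort(key=lambda s: s["overall_score"], reverse=True)
    return [s["group_id"] for s in passing]


# Hypothetical scan results; group_id is attached by the caller for bookkeeping.
scans = [
    {"group_id": "g-1", "overall_score": 0.82, "is_disqualified": False},
    {"group_id": "g-2", "overall_score": 0.91, "is_disqualified": True},
    {"group_id": "g-3", "overall_score": 0.55, "is_disqualified": False},
    {"group_id": "g-4", "overall_score": 0.88, "is_disqualified": False},
]
print(rank_joinable(scans))  # ['g-4', 'g-1']
```

Note that `is_disqualified` is checked independently of the score, so a high-scoring but disqualified group (g-2 here) is still excluded.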

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| group_id | Yes | ID of the discovered group (from group.search or group.list) | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds behavioral context (fetches messages, analyzes spam/engagement) beyond annotations. No contradiction, but could mention if any state changes occur, though annotations indicate non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, well-structured with clear sections (purpose, when to use, returns). No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all needed context: single parameter, return values listed, usage flow explained. No output schema but description compensates fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and description reinforces that group_id comes from specific sources (group.search, group.list). Adds value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool scans a group to evaluate quality before joining. Verb 'scan' and resource 'group' are specific, and it distinguishes from siblings like group_search (finds groups) and group_join (joins groups).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance: use after group.search and before deciding to join. Also lists return values, helping agent understand when to invoke.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

images_generate: A

Generates a PNG image from a text prompt using Gemini 2.5 Flash Image. Returns a file_id consumable by messages.send(attachments=[...]) and other file-aware tools. Supports up to 12 reference image file_ids for subject-consistent edits and composition (use file IDs from the [ATTACHMENTS] block, files.search, or search.files). Latency: ~8-10s per image. Output: 1024×1024 PNG.
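A client can guard against the documented limits before calling the tool. The sketch below is illustrative only: the helper name is hypothetical, and because the description (up to 12 references) and the schema (up to 3) disagree, it assumes the stricter schema value as the safe cap.

```python
from typing import Any, Sequence

# Assumption: use the stricter of the two documented caps (schema says 3, description says 12).
MAX_REFERENCE_FILES = 3


def build_image_args(
    prompt: str,
    reference_file_ids: Sequence[str] = (),
    aspect_ratio: str = "1:1",
) -> dict[str, Any]:
    """Assemble images_generate arguments after checking the documented limits."""
    # Prompt length range from the parameter description (3-4000 chars).
    if not 3 <= len(prompt) <= 4000:
        raise ValueError("prompt must be 3-4000 characters")
    refs = list(reference_file_ids)
    if len(refs) > MAX_REFERENCE_FILES:
        raise ValueError(f"at most {MAX_REFERENCE_FILES} reference files are accepted here")
    args: dict[str, Any] = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if refs:
        args["reference_file_ids"] = refs
    return args


# Hypothetical file ID, e.g. taken from a files.search result.
args = build_image_args("A red bicycle leaning on a brick wall", ["file-1"])
print(args)
```

Given the reported ~8-10s latency per image, rejecting invalid input locally is cheaper than discovering it through a failed call.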

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Text description of the image to generate (3-4000 chars). | |
| aspect_ratio | No | Output aspect ratio. | 1:1 |
| reference_file_ids | No | Optional list of up to 3 file_ids whose images should be used as visual references (for edits, subject consistency, or composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif). | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate mutability but no destructiveness. Description adds latency, output dimensions, model name, and consumable file_id. However, it contradicts the schema on reference_file_ids count (says 12 vs schema says 3).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise, front-loaded sentences covering function, output, references, and latency. No redundant text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 params and no output schema, the description adequately covers purpose, output format, and constraints. Missing content policy notes or deeper explanation of 'subject-consistent edits', but still functional.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds prompt char limit (3-4000) and usage hints for reference_file_ids, but the count mismatch hurts reliability.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates a PNG image from a text prompt using Gemini 2.5 Flash Image, explains the output format (file_id consumable by other tools), and distinguishes from siblings like videos_generate and images_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explains when to use the tool (generate image from prompt, with optional references) and hints at usage context (file IDs from attachments/search), but lacks explicit when-not-to-use guidance or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

instagram_list_media: A
Read-only, Idempotent

List photos and Reels on the connected Instagram Business/Creator account. Returns id, caption, media_type, permalink, thumbnail_url, timestamp.
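Cursor pagination with `after` and `next_cursor` can be sketched as a generator that keeps requesting pages until no cursor is returned. The `call_tool` callable, the `items` response key, and the stub below are assumptions made so the example runs on its own; only `after`, `limit`, and `next_cursor` come from the tool's documentation.

```python
from typing import Any, Callable, Iterator, Optional

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_media(call_tool: ToolCaller, page_size: int = 25) -> Iterator[dict[str, Any]]:
    """Yield every media item by following next_cursor until it is empty."""
    if not 1 <= page_size <= 50:
        raise ValueError("limit must be between 1 and 50")
    cursor: Optional[str] = None
    while True:
        args: dict[str, Any] = {"limit": page_size}
        if cursor:
            args["after"] = cursor
        page = call_tool("instagram_list_media", args)
        yield from page.get("items", [])  # "items" is an assumed response key
        cursor = page.get("next_cursor")
        if not cursor:
            break


def fake_call_tool(tool: str, args: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real MCP client call: serves two canned pages."""
    pages = {
        None: {"items": [{"id": "1"}, {"id": "2"}], "next_cursor": "c1"},
        "c1": {"items": [{"id": "3"}], "next_cursor": None},
    }
    return pages[args.get("after")]


ids = [m["id"] for m in iter_media(fake_call_tool)]
print(ids)  # ['1', '2', '3']
```

Because the function is a generator, a caller can stop early (for example, after finding a specific permalink) without fetching the remaining pages.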

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| after | No | Pagination cursor from a previous call's next_cursor. | |
| limit | No | Page size, 1-50. Default 25. | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive behavior. Description adds return field list but no additional behavioral details beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence plus a short list of return fields; no extraneous content, highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a list tool without output schema, description adequately covers returned fields and account scope. No additional information needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions ('after', 'limit'). Description does not add further meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List photos and Reels on the connected Instagram Business/Creator account' with specific verb and resource, and distinguishes from sibling tools like instagram_publish_media and instagram_update_media.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear context for when to use the tool (listing media on Instagram) but does not explicitly exclude alternative tools or specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

instagram_publish_media: A

Publish a photo (IMAGE) or video (REELS) from workspace files to a connected Instagram Business/Creator account. Returns media_id + permalink. Instagram allows ~25 publishes per day.
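Since the description does not say what happens when the daily limit is hit, a cautious client might track its own publish count and validate the caption before calling. The following is a sketch under those assumptions: the helper name and the caller-maintained `published_today` counter are hypothetical, and 25 is treated as approximate because the description says "~25".

```python
from typing import Any, Optional

MAX_CAPTION_CHARS = 2200
APPROX_DAILY_PUBLISH_LIMIT = 25  # the description says "~25", so this is approximate


def build_publish_args(
    file_id: str,
    caption: Optional[str] = None,
    media_type: str = "auto",
    published_today: int = 0,
) -> dict[str, Any]:
    """Assemble instagram_publish_media arguments, refusing obviously invalid calls."""
    # published_today is a counter the caller maintains; the tool does not report it.
    if published_today >= APPROX_DAILY_PUBLISH_LIMIT:
        raise RuntimeError("daily publish budget is likely exhausted")
    if media_type not in {"auto", "image", "reels"}:
        raise ValueError("media_type must be 'auto', 'image', or 'reels'")
    args: dict[str, Any] = {"file_id": file_id, "media_type": media_type}
    # The schema says to omit caption entirely to publish without one.
    if caption is not None:
        if len(caption) > MAX_CAPTION_CHARS:
            raise ValueError("caption exceeds 2200 characters")
        args["caption"] = caption
    return args


args = build_publish_args("file-9", caption="Launch day")
print(args)  # {'file_id': 'file-9', 'media_type': 'auto', 'caption': 'Launch day'}
```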

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| caption | No | Post caption (max 2200 chars). OMIT to publish without caption. | |
| file_id | Yes | Workspace files.id of the photo or video to publish. | |
| media_type | No | 'auto' (default, detects from mime), 'image', or 'reels'. | auto |
| location_id | No | Facebook Place ID for location tag. | |
| share_to_feed | No | For Reels: also show on profile grid (default true). | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (readOnlyHint false, destructiveHint false), so the description carries the burden. It adds that the tool returns media_id and permalink and notes the daily limit, but omits failure scenarios, permission requirements (connected account), or what happens on rate limit exceedance. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two succinct sentences with no redundant information. The first sentence states purpose and returns, the second adds a key constraint. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters and no output schema, the description covers basic purpose and a rate limit, but lacks details on prerequisites (connected Instagram account), error handling, or behavior when limits are hit. With sparse annotations, more context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to elaborate on parameters. It confirms the source as 'workspace files' but adds no additional meaning beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool publishes a photo or video to Instagram, specifies the source (workspace files), and lists return values (media_id, permalink). It distinguishes between IMAGE and REELS, and the sibling tools (instagram_list_media, instagram_update_media) show it is distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a daily rate limit ('~25 publishes per day'), which provides some usage context, but it does not specify when to use this tool versus alternatives like instagram_list_media or instagram_update_media. No when-not-to-use or prerequisite guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

instagram_update_media: A

Update the caption of a published Instagram photo or Reel. Only caption is editable after publish (Instagram limitation).

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| caption | Yes | New caption (max 2200 chars). | |
| media_id | Yes | Instagram media ID (from list_media or thread metadata). | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-read, non-destructive write operation. The description adds valuable context: 'Only caption is editable after publish (Instagram limitation)', disclosing a key behavioral trait beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no unnecessary words, front-loaded with the primary action. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the operation is simple. The description covers the essential behavioral constraint. Could mention error conditions (e.g., if media not published), but overall sufficient for the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description adds 'from list_media or thread metadata' for media_id, but the schema already provides sufficient meaning. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Update the caption' and the resource 'published Instagram photo or Reel', distinguishing it from other tools like instagram_publish_media. It mentions the Instagram limitation, adding specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (after publish, for caption updates) and notes the Instagram limitation. It lacks explicit when-not or alternative suggestions, but the context of sibling tools provides some differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_complete: A

Mark the job as completed. This sanitizes PII from the context and records a completion summary. Use when all tasks in the job are done.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| job_id | No | The ID of the job to complete | |
| summary | No | Brief summary of what was accomplished | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate write and non-destructive behavior. The description adds value by mentioning PII sanitization and summary recording, but does not cover idempotency, side effects, or whether the job must be active.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences efficiently convey purpose, side effects, and usage context with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks mention of return value, optional parameters' implications, or prerequisites like job state. Given no output schema, more detail on behavior would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add extra meaning beyond the schema; it mentions 'summary' but schema already describes that parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool marks a job as completed, including PII sanitization and summary recording. It is specific, but does not explicitly distinguish from sibling tools like job_escalate or job_update_context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear condition ('Use when all tasks in the job are done'), offering context for when to invoke. However, it does not mention exclusions or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_escalate: A

Escalate the job to a human. Use when you cannot resolve an issue, someone is not responding, or a situation requires human judgment.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| job_id | No | The ID of the job to escalate | |
| reason | Yes | Why escalation is needed | |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and destructiveHint=false, consistent with escalation being a non-destructive mutation. The description adds that it involves a human, but lacks details on side effects (e.g., ticket creation, notifications) or post-escalation state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at two sentences, with the purpose stated first and usage guidance following. No redundant or unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple handoff tool with no output schema, the description covers purpose and usage, but lacks information about the outcome or what the agent should expect after calling the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters have schema descriptions (job_id, reason). The description does not add additional meaning beyond the schema, so the baseline score of 3 applies due to high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Escalate the job to a human.' It specifies the resource (job) and the action (escalate), and distinguishes from sibling tools like job_complete or job_update_context that handle different aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use: 'when you cannot resolve an issue, someone is not responding, or a situation requires human judgment.' It clearly defines the context, though it does not mention alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_read_context: B

Read the current job context. Returns the full state of your active job including assignments, escalations, and any data you previously stored.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| job_id | No | The ID of the job to read | |

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims read-only behavior but readOnlyHint annotation is false, indicating potential side effects. This is a contradiction. No additional behavioral traits disclosed beyond 'returns full state'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with action verb. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequately describes return content (assignments, escalations, stored data) but lacks specifics on output format, error handling, or behavior when job_id is omitted.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds minimal value beyond parameter description, lacking clarity on 'current' vs parameterized reading.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Read the current job context' with verb and resource. However, 'current' conflicts with the optional job_id parameter allowing reading of any job, causing slight ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus siblings like job_update_context. Usage is implied but not detailed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_update_context: A

Update the job context by merging new data. Existing keys are preserved unless explicitly overwritten. Use this to record progress, update assignment statuses, or store intermediate results.
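The merge behavior described here ("existing keys are preserved unless explicitly overwritten") can be modeled locally to predict what the context will look like after an update. This is a sketch that assumes a shallow, top-level merge; the description does not say whether nested objects are merged recursively, so nested values may behave differently on the server.

```python
from typing import Any


def merge_context(context: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Model a shallow merge: keys in updates win, all other keys are kept."""
    merged = dict(context)  # copy so the original context is not mutated
    merged.update(updates)
    return merged


# Hypothetical job context and update payload.
context = {"status": "in_progress", "assignee": "anna"}
updates = {"status": "waiting_reply", "notes": "sent follow-up"}

merged = merge_context(context, updates)
print(merged)
# {'status': 'waiting_reply', 'assignee': 'anna', 'notes': 'sent follow-up'}
```

Under this model, sending only the keys that changed is enough; untouched keys such as `assignee` do not need to be resent.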

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| job_id | No | The ID of the job to update | |
| updates | Yes | Key-value pairs to merge into job context | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses merge semantics (keys preserved unless overwritten) and context-friendly use cases. Annotations are neutral, so description adds valuable behavioral context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences: first states action and behavior, second provides use cases. No fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple update tool: covers purpose, usage, and parameter behavior. Lacks error handling or size limits, but overall complete given no output schema and simple schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds meaning by explaining that 'updates' are merged and giving examples, going beyond the schema's 'Key-value pairs'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (Update), resource (job context), and critical behavior (merging, preserving existing keys). It distinguishes from siblings like job_read_context and job_complete by specifying the update operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases: 'record progress, update assignment statuses, or store intermediate results.' Does not state when not to use, but the merge behavior implies it is for incremental updates rather than replacements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_find_entity (A)
Read-only, Idempotent

Find an entity by name in the Knowledge Graph.

USE WHEN user mentions a person, project, company by name and you need:

  • To resolve a name to entity_id for subsequent queries

  • 'Who is working on X?' → find X first

  • 'Tell me about Y' → find Y first

RETURNS entity_id for use in kg.get_relationships or kg.explore. ALWAYS use this as the FIRST step in KG query chains.

Parameters
Name | Required | Description | Default
name | Yes | Entity name to search for. Can be in any language (Russian, English, etc.); transliteration is automatic. |
limit | No | Maximum results to return (1-10). Default: 5 |
entity_type | No | Filter by entity type: 'person' (people, contacts); 'project' (projects, tasks); 'organization' (companies, teams); 'event' (meetings, deadlines); 'topic' (discussion topics); 'workspace' (user's own facts: my/our company). Omit to include all entity types. |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds context about returning entity_id and chaining, aligning with annotations and providing additional behavioral insight.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with front-loaded purpose and usage guidelines, but is slightly verbose. Every sentence adds value, but could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with 3 parameters, the description adequately covers purpose, usage, and return value. It does not describe error handling or limitations, but is sufficient for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The description adds minor value (e.g., transliteration note for name) but does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool finds an entity by name in the Knowledge Graph, with specific verb and resource. It distinguishes from sibling tools like kg_get_relationships and kg_explore by emphasizing its role as the first step in query chains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use scenarios with examples ('Who is working on X?', 'Tell me about Y') and instructs agents to ALWAYS use it as the first step, giving clear context and naming the follow-up tools (kg.get_relationships, kg.explore).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kg_get_relationships (A)
Read-only, Idempotent

Get relationships for a specific entity from Knowledge Graph.

USE WHEN:

  • 'Who is working on X?' - filter by works_on

  • 'Who has Y been in contact with?' - filter by discussed_with

  • 'Who is from company Z?' - filter by member_of

  • 'What is related to W?' - no filter, get all

REQUIRES: entity_id from previous kg.find_entity step. Use: {{step_N.entity_id}} where N is the find_entity step number.

Parameters
Name | Required | Description | Default
limit | No | Maximum relationships to return (1-50). Default: 20 |
direction | No | Relationship direction: 'outgoing' (Entity → Others); 'incoming' (Others → Entity); 'both' (all relationships, default) | both
entity_id | Yes | Entity ID from kg.find_entity step. Use {{step_N.entity_id}} reference. |
relation_types | No | Filter by relationship types (optional). People: works_on, works_for, member_of, manages, knows, client_of, provides_service. Communication: discussed_with, participated_in, mentioned_in. Org/Project: developed_by, funded_by, partnered_with, integrates_with, depends_on, part_of. Document: issued_by, issued_to, signed_by, authored_by. Other: uses, located_in, about, follows, owns, related_to. |
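
A minimal sketch of how a client or planner might resolve the `{{step_N.entity_id}}` reference before calling this tool. The `resolve_refs` helper, the plan structure, the 1-based step numbering, and the sample `ent_123` ID are all assumptions made for illustration; the server's actual template handling is not documented on this page.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {{step_1.entity_id}}
STEP_REF = re.compile(r"\{\{step_(\d+)\.(\w+)\}\}")


def resolve_refs(arguments: Dict[str, Any], step_outputs: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Replace {{step_N.field}} placeholders in string arguments with earlier outputs."""

    def substitute(value: Any) -> Any:
        if not isinstance(value, str):
            return value  # lists, numbers, etc. pass through untouched
        whole = STEP_REF.fullmatch(value)
        if whole:
            # The argument is exactly one reference: return the raw value.
            return step_outputs[int(whole.group(1))][whole.group(2)]
        # Otherwise substitute references embedded in a longer string.
        return STEP_REF.sub(lambda m: str(step_outputs[int(m.group(1))][m.group(2)]), value)

    return {key: substitute(value) for key, value in arguments.items()}


# Hypothetical two-step plan for "Who is working on Project X?"
plan = [
    {"tool": "kg_find_entity", "arguments": {"name": "Project X", "entity_type": "project"}},
    {
        "tool": "kg_get_relationships",
        "arguments": {"entity_id": "{{step_1.entity_id}}", "relation_types": ["works_on"]},
    },
]

# Stand-in for the result of step 1; a real run would use the tool's response.
step_outputs = {1: {"entity_id": "ent_123"}}
resolved = resolve_refs(plan[1]["arguments"], step_outputs)
```

A reference to a step that has not run raises `KeyError` in this sketch, which surfaces ordering mistakes early instead of sending a literal placeholder to the tool.
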
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the tool is a safe read operation. The description adds behavioral context: it requires a prior entity lookup, and it supports filtering by relation types and direction. However, it doesn't describe the return format or pagination, which is a minor gap given no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (few sentences plus bullet list) and front-loaded with the main action. Every sentence adds value: purpose, usage examples, prerequisite, parameter reference. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters (1 required) and no output schema, the description provides sufficient context: purpose, when to use, prerequisite, and parameter usage through examples. The annotations confirm it's safe and idempotent. Complete for a read-only lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers all 4 parameters with descriptions (100% coverage). The description adds value by mapping natural language queries to specific relation_types (e.g., 'works_on', 'discussed_with'), which enhances understanding beyond the enum list. No further parameter details needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with a clear verb+resource: 'Get relationships for a specific entity from Knowledge Graph.' It distinguishes from sibling tools like kg_find_entity (which finds entities) and knowledge_query (general query). The 'USE WHEN' examples further clarify the purpose with specific relation types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'USE WHEN' with natural language mappings to relation types (e.g., 'Who is working on X?' → filter by works_on). It also specifies the prerequisite: entity_id from kg.find_entity step, and provides a template reference. This gives clear guidance on when and how to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_query (A)
Read-only, Idempotent

Answer questions using knowledge base (uploaded documents, handbooks, files).

Use for QUESTIONS that need an answer synthesized from documents or messages. Returns an evidence pack with source citations, KG entities, and extracted numbers.

Modes:

  • 'auto' (default): Smart routing — works for most questions

  • 'rag': Semantic search across documents & messages

  • 'entity': Entity-centric queries (e.g., 'Tell me about [entity]')

  • 'relationship': Two-entity queries (e.g., 'How is [entity A] related to [entity B]?')

Examples:

  • 'What did we discuss about the budget?' → knowledge.query

  • 'Tell me about [entity]' → knowledge.query mode=entity

  • 'How is [A] related to [B]?' → knowledge.query mode=relationship

NOT for finding/listing files, threads, or links — use search.files / search.threads / search.links for that.

Parameters
Name | Required | Description | Default
date_to | No | Filter messages until this date (ISO format: YYYY-MM-DD). |
file_ids | No | Specific file IDs to search within (for pinned files) |
question | Yes | The question to answer from user's knowledge base. Required even for entity queries. |
date_from | No | Filter messages from this date (ISO format: YYYY-MM-DD). Use for time-based queries like 'this week', 'last month'. |
thread_id | No | Limit search to a specific thread/chat |
max_sources | No | Maximum number of sources to consider (1-10) |
needs_aggregation | No | True if query asks for totals/sums/counts. |
include_relationships | No | Include KG relationships in answer (default: true for entity mode) |
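
As a rough sketch, the arguments for a time-bounded question could be assembled like this. `build_knowledge_query` and its `days_back` and `today` inputs are hypothetical conveniences; only the argument names, the ISO date format, and the 1-10 range come from the schema above. Because the listed schema has no `mode` parameter, the sketch leaves mode selection to the default routing.

```python
from datetime import date, timedelta
from typing import Any, Dict, Optional


def build_knowledge_query(
    question: str,
    *,
    days_back: Optional[int] = None,
    today: Optional[date] = None,
    max_sources: Optional[int] = None,
    needs_aggregation: bool = False,
) -> Dict[str, Any]:
    """Build knowledge_query arguments, validating against the documented ranges."""
    if not question.strip():
        raise ValueError("question is required, even for entity queries")
    args: Dict[str, Any] = {"question": question}
    if days_back is not None:
        end = today or date.today()
        # Both dates use the ISO format (YYYY-MM-DD) the schema asks for.
        args["date_from"] = (end - timedelta(days=days_back)).isoformat()
        args["date_to"] = end.isoformat()
    if max_sources is not None:
        if not 1 <= max_sources <= 10:
            raise ValueError("max_sources must be between 1 and 10")
        args["max_sources"] = max_sources
    if needs_aggregation:
        args["needs_aggregation"] = True
    return args


# "What did we discuss about the budget this week?" with a fixed date for reproducibility
query_args = build_knowledge_query(
    "What did we discuss about the budget?",
    days_back=7,
    today=date(2024, 5, 15),
    max_sources=5,
)
```
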
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and idempotentHint=true. Description adds details about return format (evidence pack with citations, KG entities, extracted numbers) and mode behaviors (auto, rag, entity, relationship). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured into sections: purpose, returns, modes, examples, exclusions. Minimally verbose yet informative. Slightly longer than necessary due to mode explanations but justified by tool complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description adequately explains return format (evidence pack). Covers all key aspects: scope, modes, exclusions. No gaps given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with detailed parameter descriptions. The description adds usage context, but the modes it describes have no corresponding 'mode' parameter in the schema, and it does not explain how a mode is selected. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool answers questions using a knowledge base, with specific verb 'Answer questions using knowledge base'. It distinguishes itself by explicitly excluding file/thread/link finding and naming alternatives (search.files, search.threads, search.links). Examples reinforce its scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (questions from documents/messages) and when not to use (finding files/threads/links). Provides mode-specific guidance with examples and alternative tools. No ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_add_comment (A)

Add a comment to a LinkedIn post. Use post_id from search results or thread data.

Parameters
Name | Required | Description | Default
text | Yes | Comment text to post |
post_id | Yes | LinkedIn post/activity ID (from search results or thread metadata) |
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=false, implying a write operation, but the description adds no further behavioral context beyond 'Add a comment'. It does not disclose potential error conditions, visibility settings, or any side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences, front-loading the purpose. No superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple write operation, the description covers the basic action and parameter source. However, it lacks information about return values, constraints (e.g., text length), or success confirmation, making it only moderately complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description adds minimal extra context for post_id (source hint), but does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Add a comment') and the resource ('LinkedIn post'), with a specific verb and resource. It distinguishes from sibling tools as no other tool adds comments to LinkedIn posts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides guidance on where to obtain the post_id ('from search results or thread data'), indicating usage context. However, it does not mention when not to use this tool or alternative approaches.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_company (A)
Read-only, Idempotent

Get a LinkedIn company profile by company ID or vanity name. Returns company name, description, industry, size, and other details.

Parameters
Name | Required | Description | Default
identifier | Yes | Company ID or vanity name |
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is clear. The description adds value by indicating the return content (company details). It does not contradict annotations and provides some behavioral context beyond the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences that front-load the purpose and include essential details. Every part adds value with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with one parameter and no output schema, the description is complete. It covers input, action, and output fields. A minor improvement would be to mention the return format (for example, JSON), though this is not necessary.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with the parameter description matching the tool description ('Company ID or vanity name'). The tool description does not add additional meaning beyond what the schema already provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get a LinkedIn company profile') and the resource (company profile). It specifies the input method (by company ID or vanity name) and lists the returned fields (name, description, industry, size, and other details). This distinguishes it from sibling tools like linkedin_get_profile which likely returns person profiles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use this tool (when you need company details) but does not explicitly state when not to use it or provide alternatives. Given the large set of sibling tools, explicit guidance would help. However, the purpose is clear enough that an agent can infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_get_profile (A)
Read-only, Idempotent

Get a LinkedIn user profile by ID, public identifier (vanity name), or profile URL. Returns name, headline, location, and other profile information.

Parameters
Name | Required | Description | Default
identifier | Yes | LinkedIn member ID, public identifier (vanity name), or full profile URL |
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds that it returns name, headline, location, and other profile information, which is useful but expected. No additional behavioral details (e.g., error handling, rate limits) are provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy, front-loading the main action ('Get a LinkedIn user profile'). Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 required parameter, no output schema), the description is largely complete. It specifies the return content (name, headline, location) but lacks details on other potential fields or error conditions. Annotations cover safety, so this is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents the parameter. The description goes beyond by clarifying acceptable formats (ID, vanity name, profile URL), adding concrete usage context. This exceeds the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a LinkedIn profile and specifies multiple identifier formats (ID, vanity name, URL). However, it does not differentiate from sibling tools like linkedin_get_company or linkedin_search, which is acceptable given the distinct resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, no prerequisites mentioned, and no exclusions stated. The agent is left to infer usage from the tool name and description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_invite (A)

Send a connection invitation to a LinkedIn user. Optionally include a personalized message (max 300 characters). Rate limited: LinkedIn allows 80-100 invitations per day, max 200 per week.

Parameters
Name | Required | Description | Default
message | No | Optional personalized invitation message (max 300 characters) |
provider_id | Yes | LinkedIn provider ID of the person to invite (from search results or profile) |
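
The stated limits lend themselves to a client-side guard. The sketch below is only illustrative: `InviteBudget` and `build_invite` are hypothetical names, the rolling 24-hour and 7-day windows are an assumption (the description does not say how LinkedIn counts days or weeks), and the daily cap uses the conservative end of the stated 80-100 range.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

MAX_MESSAGE_CHARS = 300  # from the tool description
DAILY_CAP = 80           # conservative end of the stated 80-100 per day
WEEKLY_CAP = 200         # stated weekly maximum


@dataclass
class InviteBudget:
    """Tracks send times so a client can stay under the stated limits."""

    sent_at: List[datetime] = field(default_factory=list)

    def can_send(self, now: datetime) -> bool:
        # Rolling windows are an assumption, not documented behavior.
        last_day = sum(1 for t in self.sent_at if now - t < timedelta(days=1))
        last_week = sum(1 for t in self.sent_at if now - t < timedelta(days=7))
        return last_day < DAILY_CAP and last_week < WEEKLY_CAP

    def record(self, now: datetime) -> None:
        self.sent_at.append(now)


def build_invite(provider_id: str, message: Optional[str] = None) -> Dict[str, str]:
    """Build linkedin_invite arguments, rejecting over-long messages."""
    if message is not None and len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    args = {"provider_id": provider_id}
    if message:
        args["message"] = message
    return args


budget = InviteBudget()
now = datetime(2024, 1, 1, 12, 0)
if budget.can_send(now):
    invite_args = build_invite("provider-123", "Great talk last week, let's connect.")
    budget.record(now)  # record only after the tool call would have been made
```
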
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-read-only, non-idempotent, non-destructive behavior. The description adds important behavioral context: rate limiting, which is not in annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words. The purpose is front-loaded, and rate limits are concisely stated. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two parameters, no output schema, and no nested objects, the description covers the essentials: action, optional message, and rate limits. It does not need to explain return values, so it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so schema already documents parameters. The description adds value by hinting where to obtain provider_id (from search or profile) and reiterating message max length, providing context beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sends a connection invitation to a LinkedIn user. It uses a specific verb-resource combination and is distinct from siblings, as no other tool covers invitation sending.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides rate limits (80-100 per day, max 200 per week), which guide usage. While no explicit alternatives are given, the tool is unique among siblings, so usage context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_connections (B)
Read-only, Idempotent

List your LinkedIn connections, sorted by most recently added.

Parameters
Name | Required | Description | Default
limit | No | Maximum connections to return |
cursor | No | Pagination cursor from previous response |
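
Cursor pagination with these two parameters might look like the sketch below. The response shape is a guess: the tool has no output schema, so the `items` and `next_cursor` field names, the `call_tool` callable, and the canned pages in `fake_call_tool` are all hypothetical and would need adjusting to the real response.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Signature of a generic "call this MCP tool with these arguments" function (assumed).
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_connections(call_tool: ToolCaller, page_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield connections page by page until no further cursor is returned."""
    cursor: Optional[str] = None
    while True:
        args: Dict[str, Any] = {"limit": page_size}
        if cursor is not None:
            args["cursor"] = cursor  # cursor from the previous response
        page = call_tool("linkedin_list_connections", args)
        yield from page.get("items", [])      # hypothetical field name
        cursor = page.get("next_cursor")      # hypothetical field name
        if not cursor:
            break


def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real MCP client call, returning two canned pages."""
    pages = {
        None: {"items": [{"id": "c1"}, {"id": "c2"}], "next_cursor": "p2"},
        "p2": {"items": [{"id": "c3"}], "next_cursor": None},
    }
    return pages[args.get("cursor")]


connections = list(iter_connections(fake_call_tool))
```
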
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds the sorting behavior, but does not disclose pagination specifics, rate limits, or authentication needs. With annotations covering safety, this is minimal but acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: one sentence that front-loads the main action and sort order. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with a clear schema, the description is adequate. It could mention the return format or common use cases, though this is not critical given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for both parameters (limit, cursor). The description adds no further meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists LinkedIn connections with a specific sort order (most recently added). It distinguishes from sibling tools like linkedin_get_profile or linkedin_search by focusing on the list of connections.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. For example, it does not mention that for a single connection's details, use linkedin_get_profile, or for searching connections, use linkedin_search.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_invitations_sent (A)
Read-only, Idempotent

List your pending sent connection invitations on LinkedIn.

Parameters
Name | Required | Description | Default
limit | No | Maximum invitations to return |
cursor | No | Pagination cursor from previous response |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, destructiveHint, and idempotentHint. The description adds the specificity 'pending sent,' which clarifies the type of invitations returned, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, focused sentence that immediately conveys the tool's purpose with no unnecessary words or information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool, the description adequately specifies what is being listed (pending sent invitations). It does not describe the response format, but even without an output schema, agents can infer typical LinkedIn invitation data. Pagination behavior is not described either, although the cursor parameter largely covers that gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions for 'limit' and 'cursor'. The description does not add additional meaning to these parameters, so it meets the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (list), resource (pending sent connection invitations), and platform (LinkedIn), distinguishing it from sibling tools like linkedin_list_connections (accepted connections) and linkedin_invite (send invitations).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing pending sent invitations but does not explicitly guide when to use this tool versus alternatives like linkedin_list_connections or linkedin_invite. No exclusion criteria or alternative suggestions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_list_reactions: A
Read-only, Idempotent

List all reactions (likes, celebrates, etc.) on a specific LinkedIn post.

Parameters
Name | Required | Description | Default
limit | No | Maximum reactions to return | -
post_id | Yes | LinkedIn post/activity ID | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds content type (likes, celebrates) but no additional behavioral traits like pagination, rate limits, or auth requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, 11 words, front-loaded with purpose. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two parameters and full schema coverage, the description is adequate but lacks any mention of return values or pagination behavior, which would be helpful given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add meaning beyond the schema; no parameter explanations or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists all reactions (likes, celebrates, etc.) on a specific LinkedIn post, using specific verb and resource. It distinguishes from sibling tools like linkedin_list_connections which list connections.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use or when not to use, nor alternatives. The tool's purpose is implied but lacks context such as prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_raw_request: A
Read-only, Idempotent

Send an arbitrary LinkedIn API request via Unipile's magic route. Only GET and POST methods are allowed. WARNING: This bypasses structured rate limiting and can perform destructive actions. Use this only when no other LinkedIn tool covers the needed functionality.

Parameters
Name | Required | Description | Default
body | No | Request body (for POST requests) | -
method | No | HTTP method (only GET and POST allowed) | GET
request_url | Yes | Target LinkedIn API endpoint URL | -
query_params | No | URL query parameters | -

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description warns that the tool 'bypasses structured rate limiting and can perform destructive actions,' which directly contradicts the annotations (readOnlyHint=true, destructiveHint=false). This is a critical inconsistency that undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
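
A contradiction like the one described above could be caught mechanically. The following is a rough sketch of such a check, not the scoring method used on this page: the keyword list is a hypothetical heuristic, while the annotation names (`readOnlyHint`, `idempotentHint`, `destructiveHint`) are the ones quoted in the review.

```python
import re

# Hypothetical heuristic: words that suggest writes or side effects.
RISK_PATTERN = re.compile(r"\b(destructive|delete|bypass\w*|overwrite)\b", re.IGNORECASE)


def find_annotation_conflicts(description: str, annotations: dict) -> list[str]:
    """Return human-readable conflicts between a description and its hints."""
    conflicts = []
    risky_words = sorted({m.group(0).lower() for m in RISK_PATTERN.finditer(description)})
    if risky_words and annotations.get("readOnlyHint") is True:
        conflicts.append(
            f"readOnlyHint=true but description mentions: {', '.join(risky_words)}"
        )
    if "destructive" in risky_words and annotations.get("destructiveHint") is False:
        conflicts.append("destructiveHint=false but description warns of destructive actions")
    return conflicts


raw_request_description = (
    "Send an arbitrary LinkedIn API request via Unipile's magic route. "
    "Only GET and POST methods are allowed. WARNING: This bypasses structured "
    "rate limiting and can perform destructive actions."
)
raw_request_annotations = {
    "readOnlyHint": True,
    "idempotentHint": True,
    "destructiveHint": False,
}

conflicts = find_annotation_conflicts(raw_request_description, raw_request_annotations)
```

A keyword match only flags candidates for human review; a description can mention deletion without the tool performing it.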

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, each serving a distinct purpose: stating the function, restricting methods, warning about risks, and giving usage guidance. No unnecessary words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, method restrictions, and usage context, but lacks information about response format, error handling, or how to construct valid request URLs. For a raw request tool, these details are important for effective agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds no extra meaning to parameters. The description does not elaborate on any parameter beyond what the schema already provides, earning a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
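
As an illustration of the constraints the schema states for linkedin_raw_request, here is a small client-side validation sketch. The helper name is hypothetical, the URL is a placeholder, and rejecting a body on GET is an assumption drawn from the schema's "for POST requests" wording.

```python
from typing import Any, Optional

ALLOWED_METHODS = {"GET", "POST"}


def build_raw_request_args(
    request_url: str,
    method: str = "GET",
    query_params: Optional[dict] = None,
    body: Optional[Any] = None,
) -> dict:
    """Validate and assemble arguments for linkedin_raw_request."""
    normalized = method.upper()
    if normalized not in ALLOWED_METHODS:
        raise ValueError(f"method must be GET or POST, got {method!r}")
    if body is not None and normalized != "POST":
        # Assumption: the schema only describes body "for POST requests".
        raise ValueError("body is only supported for POST requests")
    args: dict = {"request_url": request_url, "method": normalized}
    if query_params:
        args["query_params"] = query_params
    if body is not None:
        args["body"] = body
    return args


# Placeholder URL; real endpoint paths are not given in the listing.
get_args = build_raw_request_args(
    "https://example.invalid/endpoint", query_params={"count": 10}
)
```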

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sends an arbitrary LinkedIn API request via Unipile's magic route, specifying allowed methods (GET and POST) and explicitly noting it's a fallback for when no other LinkedIn tool covers the functionality. This differentiates it from sibling LinkedIn-specific tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: 'Use this only when no other LinkedIn tool covers the needed functionality' and warns about bypassing rate limiting and potential destructive actions. However, it does not name specific alternative tools, making the guidance slightly less actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

linkedin_search_filters: A
Read-only, Idempotent

Get LinkedIn search filter parameter IDs. LinkedIn uses internal IDs instead of text for search filters (location, industry, etc.). Call this before linkedin.search to resolve filter keywords to their LinkedIn parameter IDs.

Parameters
Name | Required | Description | Default
type | Yes | Filter category to resolve (e.g. LOCATION, INDUSTRY, SKILL) | -
limit | No | Max results per filter category | -
keywords | Yes | Keywords to resolve to parameter IDs (e.g. 'Thailand' for LOCATION) | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds the key behavioral detail that the tool resolves filter keywords to internal IDs, which aids understanding without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. The first clearly states the core purpose, the second gives the necessary context about LinkedIn's internal IDs, and the third establishes the prerequisite relationship to linkedin.search.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with full schema coverage and clear annotations, the description explains the why, when, and prerequisite. It could mention the return format (list of parameter IDs) but is complete enough for use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents each parameter. The description provides overarching context but does not add new detail about individual parameters beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states exactly what the tool does ('Get LinkedIn search filter parameter IDs'), explains why it's needed (LinkedIn uses internal IDs), and distinguishes it from the sibling tool linkedin_search by positioning it as a prerequisite.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to call this before linkedin.search, providing a clear usage context. Does not discuss when not to use or alternatives, but the single clear directive is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
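
The "resolve filters first, then search" ordering for linkedin_search_filters can be sketched as two MCP `tools/call` requests. The JSON-RPC envelope follows the MCP convention; everything about the search step is an assumption, since the listing documents neither the filter tool's response format nor the search tool's argument names. The `resolved` value and the `location` key are placeholders for illustration only.

```python
from itertools import count

_request_ids = count(1)


def tools_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: resolve the keyword to LinkedIn's internal parameter ID.
resolve_request = tools_call(
    "linkedin_search_filters",
    {"type": "LOCATION", "keywords": "Thailand", "limit": 5},
)

# Hypothetical result of step 1; the real response format is not documented.
resolved = [{"id": "PLACEHOLDER_ID", "title": "Thailand"}]

# Step 2: pass the resolved ID to the search tool.
# The "location" argument name is an assumption, not a documented parameter.
search_request = tools_call(
    "linkedin_search",
    {"keywords": "data engineer", "location": [resolved[0]["id"]]},
)
```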

linkedin_update_profile: A

Update the authenticated user's own LinkedIn profile. Supports adding/editing experience entries (role, company, skills, dates). Also supports updating location. Headline, summary, education are NOT supported by the API.

Parameters
Name | Required | Description | Default
location | No | Location to set on profile (requires LinkedIn location ID) | -
experience | No | Add or edit a professional experience entry | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations show readOnlyHint=false and destructiveHint=false; description adds value by detailing supported/unsupported fields and API limitations, providing context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences: the first front-loads the main purpose, and the rest add clear details. No redundant or extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 2 parameters and no output schema, the description covers primary use cases and limitations. However, it could mention if any side effects or confirmations occur.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions. The description adds high-level context (e.g., 'Supports adding/editing experience entries') and clarifies what is not supported, enhancing the schema's meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
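
To show how the supported and unsupported fields of linkedin_update_profile might be handled on the caller side, here is a minimal sketch. The helper name and the sub-field names inside `experience` are hypothetical; the description only says experience entries cover role, company, skills, and dates.

```python
# Top-level parameters the tool accepts, per its schema.
SUPPORTED_FIELDS = {"experience", "location"}


def split_profile_update(requested: dict) -> tuple[dict, list[str]]:
    """Separate fields the tool can update from those it cannot.

    Headline, summary, and education land in the rejected list because
    the description says the API does not support them.
    """
    sendable = {key: value for key, value in requested.items() if key in SUPPORTED_FIELDS}
    rejected = sorted(key for key in requested if key not in SUPPORTED_FIELDS)
    return sendable, rejected


requested_changes = {
    "headline": "Staff Engineer",
    # Sub-field names below are illustrative assumptions.
    "experience": {"role": "Staff Engineer", "company": "Example Corp"},
}
sendable, rejected = split_profile_update(requested_changes)
```

An agent could send `sendable` as the tool arguments and report `rejected` back to the user instead of failing silently.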

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'Update the authenticated user's own LinkedIn profile' and lists supported (experience, location) and unsupported (headline, summary, education) features, clearly distinguishing from sibling tools like linkedin_get_profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying what is supported, but does not explicitly state when to use this tool versus alternatives like linkedin_add_comment or linkedin_get_profile. No when-not guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_delete: A
Destructive, Idempotent

Delete a message from a thread. Supports Telegram, WhatsApp, and other connected channels. Note: Some channels have time limits on message deletion.

Parameters
Name | Required | Description | Default
thread_id | Yes | Thread/channel ID containing the message | -
message_id | Yes | ID of the message to delete | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructive behavior and idempotency. The description adds useful information about time limits on deletion, which goes beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no fluff: the first states the action, the second lists supported channels, and the third adds an important limitation. Efficient and to the point.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with no output schema, the description covers the key aspects: what it does, supported channels, and a notable limitation. It is sufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions in the schema itself. The description does not add extra meaning beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete a message from a thread' with a specific verb and resource, and distinguishes from sibling tools like messages_send or messages_forward by its action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides helpful context about channel support and time limits, which aids in deciding when to use the tool, but does not explicitly compare to alternatives or state prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_forward: A

Forward a message from one thread to another. Supports native Telegram forwarding (preserves original sender attribution) and text-based forwarding for cross-channel scenarios.

Parameters
Name | Required | Description | Default
dest_thread_id | No | Destination thread to forward into. Provide at least one of dest_thread_id or recipient_name. To forward into the active conversation, pass the current thread_id. (If both are provided, dest_thread_id wins and recipient_name is ignored.) | -
recipient_name | No | Name of person to forward to (channel auto-resolved). Provide at least one of dest_thread_id or recipient_name. Use only when forwarding to a different contact than the current conversation. | -
source_thread_id | Yes | Thread containing the message to forward (e.g., 'telegram:123456' or numeric DB ID) | -
source_message_id | Yes | ID of the message to forward | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavioral differences between native and text-based forwarding (preserving sender attribution). Annotations already indicate non-destructive, non-read-only, non-idempotent behavior, so the description adds useful context about forwarding behaviors beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with only two sentences. It leads with the primary action and quickly covers the two forwarding modes without extraneous detail. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description adequately explains the two forwarding modes but does not mention return values or error behavior. Given no output schema and moderate complexity, it is fairly complete but could be slightly more comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with detailed descriptions for all parameters. The tool description does not add new semantic information beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
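
The messages_forward schema does state one non-obvious parameter relationship: at least one of dest_thread_id or recipient_name is needed, and dest_thread_id wins when both are given. A minimal sketch of enforcing that before the call follows; the helper name is hypothetical and IDs are treated as strings for simplicity.

```python
from typing import Optional


def build_forward_args(
    source_thread_id: str,
    source_message_id: str,
    dest_thread_id: Optional[str] = None,
    recipient_name: Optional[str] = None,
) -> dict:
    """Assemble messages_forward arguments with one unambiguous destination."""
    if dest_thread_id is None and recipient_name is None:
        raise ValueError("provide at least one of dest_thread_id or recipient_name")
    args = {
        "source_thread_id": source_thread_id,
        "source_message_id": source_message_id,
    }
    if dest_thread_id is not None:
        # dest_thread_id wins; recipient_name would be ignored, so drop it.
        args["dest_thread_id"] = dest_thread_id
    else:
        args["recipient_name"] = recipient_name
    return args


forward_args = build_forward_args(
    "telegram:123456", "42", dest_thread_id="571", recipient_name="Jane Smith"
)
```

Dropping the ignored parameter locally keeps the call log unambiguous about where the message was meant to go.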

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's action: forwarding a message from one thread to another. It specifies support for two distinct modes (native Telegram forwarding with preserved attribution and text-based for cross-channel), which distinguishes it from sibling tools like messages_send.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions two forwarding modes but does not provide explicit guidance on when to use each or when to prefer this tool over alternatives like messages_send. It implies cross-channel scenarios for text-based forwarding but lacks clear exclusion criteria or comparison.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_read_history: A
Read-only, Idempotent

Read messages from a conversation thread. Use text_contains to find specific messages by content. Returns the most recent messages, including sender info and timestamps.

Voice calls: each row carries a meta object with allowlisted keys (event_type ∈ 'call_started'|'call_ended'|null, source ∈ 'voice_transcript'|null, call_id, speaker_display_name, duration_seconds, outcome, direction) plus per-message channel. To find calls without scanning every row, use calls.list_history instead.
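
As a rough illustration of the meta object described above, the sketch below picks out call-end rows from a page of history. Only the meta key names and the event_type values come from the description; the surrounding row shape (`text`) and the sample values for `direction` and `outcome` are assumptions.

```python
def ended_calls(rows: list[dict]) -> list[dict]:
    """Return the meta objects of rows that mark the end of a voice call."""
    return [
        row["meta"]
        for row in rows
        if (row.get("meta") or {}).get("event_type") == "call_ended"
    ]


# Fabricated sample rows for demonstration only.
sample_rows = [
    {"text": "hi", "meta": {"event_type": None, "source": None}},
    {"text": "", "meta": {"event_type": "call_started", "call_id": "c1", "direction": "inbound"}},
    {
        "text": "",
        "meta": {
            "event_type": "call_ended",
            "call_id": "c1",
            "duration_seconds": 42,
            "outcome": "completed",
        },
    },
]

total_seconds = sum(meta.get("duration_seconds", 0) for meta in ended_calls(sample_rows))
```

Scanning rows like this is the slow path; the description itself recommends calls.list_history when only calls are needed.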

Usage:

  1. Get thread_id from threads.list first, OR

  2. Use contact_name to auto-resolve thread_id

Examples:

  • User: 'show me messages from chat with [contact]' → read_history(contact_name='[contact]', limit=10)

  • User: 'last 5 messages from thread 571' → read_history(thread_id=571, limit=5)
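
The two usage paths and the documented limit bounds can be sketched as a small argument builder. The helper name is hypothetical, and the lower bound of 1 on limit is an assumption; the schema only states a default of 10 and a maximum of 100.

```python
from typing import Optional, Union


def build_read_history_args(
    thread_id: Optional[Union[int, str]] = None,
    contact_name: Optional[str] = None,
    limit: int = 10,
    offset: int = 0,
    text_contains: Optional[str] = None,
) -> dict:
    """Assemble messages_read_history arguments from either lookup path."""
    if thread_id is None and contact_name is None:
        raise ValueError("provide thread_id or contact_name")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    args: dict = {"limit": limit, "offset": offset}
    if thread_id is not None:
        args["thread_id"] = thread_id
    if contact_name is not None:
        args["contact_name"] = contact_name
    if text_contains:
        args["text_contains"] = text_contains
    return args


# Mirrors the second example above: last 5 messages from thread 571.
by_thread = build_read_history_args(thread_id=571, limit=5)
# Mirrors the first example: resolve the thread from a contact name.
by_contact = build_read_history_args(contact_name="Jane Smith")
```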

Parameters
Name | Required | Description | Default
limit | No | Maximum number of messages to return (default: 10, max: 100) | -
offset | No | Number of messages to skip (for pagination, default: 0) | -
thread_id | No | Thread ID to read messages from (e.g., '571' or 'telegram:571'). Optional if contact_name provided. | -
contact_name | No | Contact/thread name to search for (optional if thread_id provided). Example: 'Jane Smith', 'John Doe' | -
text_contains | No | Filter: only return messages containing this text (case-insensitive substring match) | -
include_outgoing | No | Include messages sent by you (default: true) | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, non-destructive. Description adds details on return content (sender info, timestamps, meta object for calls) and filtering behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with core purpose, then structured sections for voice calls and usage. Slightly lengthy but each sentence adds essential context; no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with 6 well-described parameters, it covers return format, filtering, pagination (offset/limit), and special cases (voice calls). No output schema, but description adequately informs agent of expected data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. Description adds value by explaining usage patterns (e.g., 'Use text_contains to find specific messages') and providing examples that tie parameters to user intents.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads messages from a conversation thread, distinguishing it from siblings like calls.list_history and mentioning text_contains for content search. Examples reinforce purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit instructions: get thread_id via threads.list or use contact_name. Provides alternatives (calls.list_history) and example queries for common user intents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_send: A

Send a message to a thread, channel, or contact. Supports Telegram, Email, LinkedIn, and other connected channels. For LinkedIn posts (comment_thread kind), this posts a comment on the post. Can automatically resolve recipients and channels when not specified. Can send files/images/documents as attachments — pass attachments=[file_id, ...] with integer file IDs obtained from collections.list_files, search.files, or files.search. text is optional when attachments are provided.

Parameters
Name | Required | Description | Default
text | No | Message text to send. Optional if attachments provided. | -
format | No | Message format | text
silent | No | Send without notification | -
thread_id | No | Target thread. OMIT to reply in the same chat you received the triggering message from; the backend defaults to the current thread. Pass an explicit value ONLY to reply in a DIFFERENT thread, and only use: (a) a numeric DB thread id from search.threads, or (b) a channel_ref like 'telegram:-12345'. NEVER use a chat-type word (dm, group, channel, livechat); those are category labels from the SITUATION block, not ids. | -
attachments | No | Array of integer file IDs to send as attachments (images, documents, any files). Get file IDs from collections.list_files (field `file_id`), search.files (field `file_id`), or files.search. Example: [302237]. The file must already exist in the workspace (status=ready); no separate upload step needed. When attachments are provided, `text` becomes optional (a caption can be included alongside). | -
recipient_name | No | Name of person to send to (e.g., 'Jane', 'John'). Tool will auto-resolve channel. Optional if thread_id provided. | -
recipient_username | No | Telegram @username to message (e.g. '@some_username'). Use this for a Telegram user NOT yet in contacts: it resolves the handle, adds the contact, and creates the thread. Telegram only; for existing contacts prefer thread_id or recipient_name. | -
reply_to_message_id | No | ID of message to reply to (optional) | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate not read-only and not destructive. Description adds context on auto-resolution, attachment handling, and LinkedIn comment behavior. No contradictions, and it provides useful behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured and front-loaded with core purpose. Some sentences are lengthy, but the description is clear and efficient overall. Could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 params, no required, no output schema), the description covers essential aspects: sending, attachments, auto-resolution, channel-specific behavior. Does not explain return values, but that is acceptable without output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the schema itself carries detailed guidance on thread_id (when to omit, valid value types) and recipient_username (use case for new Telegram contacts). The description adds meaning on attachments (how to obtain file IDs, and that text becomes optional when attachments are provided). Exceeds baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
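
The thread_id and attachments rules for messages_send lend themselves to a pre-flight check. The sketch below is one possible reading of those rules; the channel_ref pattern is generalized from the single example 'telegram:-12345' and is an assumption, as are the helper names.

```python
import re
from typing import Optional, Union

# Category labels the schema says must never be used as a thread_id.
CHAT_TYPE_WORDS = {"dm", "group", "channel", "livechat"}
# Assumed channel_ref shape, generalized from the example 'telegram:-12345'.
CHANNEL_REF = re.compile(r"^[a-z]+:-?\w+$")


def is_valid_thread_id(thread_id: Union[int, str]) -> bool:
    """Accept numeric DB ids and channel_refs; reject chat-type labels."""
    if isinstance(thread_id, int):
        return True
    if thread_id.lower() in CHAT_TYPE_WORDS:
        return False
    return thread_id.isdigit() or bool(CHANNEL_REF.match(thread_id))


def build_send_args(
    text: Optional[str] = None,
    attachments: Optional[list] = None,
    thread_id: Optional[Union[int, str]] = None,
) -> dict:
    """Assemble messages_send arguments, enforcing the documented rules."""
    if not text and not attachments:
        raise ValueError("text is required unless attachments are provided")
    if attachments and not all(isinstance(file_id, int) for file_id in attachments):
        raise ValueError("attachments must be integer file IDs")
    args: dict = {}
    if text:
        args["text"] = text
    if attachments:
        args["attachments"] = attachments
    if thread_id is not None:
        if not is_valid_thread_id(thread_id):
            raise ValueError(f"not a usable thread_id: {thread_id!r}")
        args["thread_id"] = thread_id
    # When thread_id is omitted, the backend replies in the current thread.
    return args


attachment_only = build_send_args(attachments=[302237])
```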

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sends a message to a thread, channel, or contact, and specifies supported channels (Telegram, Email, LinkedIn). It distinguishes from siblings like messages_send_email by covering general messaging.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear guidance on parameter usage, auto-resolution of recipients/channels, and when to use different parameters (thread_id vs recipient_name vs recipient_username). Lacks explicit comparison to sibling tools like messages_forward or messages_send_email.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_send_email: A

Compose and send an email — with subject, CC/BCC, and attachments. Use for email; for chat messages (Telegram/WhatsApp/livechat) use messages.send instead.

Parameters
Name | Required | Description | Default
cc | No | Email addresses to CC. OMIT to skip. | -
bcc | No | Email addresses to BCC. OMIT to skip. | -
text | No | Email body. | -
subject | No | Email subject line. Required for new emails; for replies it auto-generates 'Re: ...' when omitted. | -
attachments | No | Array of integer file IDs to attach. | -
recipient_email | No | Recipient email address (e.g. 'john@example.com'). Provide to start a new email thread; OMIT to reply in the current email thread. | -

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=false and destructiveHint=false, so the description does not need to repeat these. The description adds no behavioral details beyond the schema (e.g., auto-generation of subject for replies is in schema, not description). It is consistent and adequate, but does not go beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. Front-loaded with the core action and key features, then provides critical differentiation from sibling tool. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter tool with full schema coverage and no output schema, the description provides sufficient context for typical email composition. It does not explain return values, but for a send operation, the success/failure is often implied. The differentiation from sibling tools adds completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters are documented in the schema itself. The tool description mentions subject, CC/BCC, and attachments, but adds no new semantic meaning beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
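
One relationship the messages_send_email schema does spell out is that recipient_email starts a new thread, and a new thread needs a subject. A minimal sketch of that rule follows, under the assumption that cc, bcc, and attachments are lists; the helper name is hypothetical, and text is made mandatory here only to keep the example short.

```python
from typing import Optional


def build_email_args(
    text: str,
    subject: Optional[str] = None,
    recipient_email: Optional[str] = None,
    cc: Optional[list] = None,
    bcc: Optional[list] = None,
    attachments: Optional[list] = None,
) -> dict:
    """Assemble messages_send_email arguments, omitting unset fields."""
    starting_new_thread = recipient_email is not None
    if starting_new_thread and not subject:
        raise ValueError("subject is required when starting a new email thread")
    optional = {
        "subject": subject,
        "recipient_email": recipient_email,
        "cc": cc,
        "bcc": bcc,
        "attachments": attachments,
    }
    args = {"text": text}
    # The schema says to OMIT unused fields rather than send empty values.
    args.update({key: value for key, value in optional.items() if value})
    return args


# Reply in the current thread: no recipient, subject auto-generated server-side.
reply_args = build_email_args("Thanks, received.")
# New thread: recipient and subject are both supplied.
new_thread_args = build_email_args(
    "Agenda attached.",
    subject="Kickoff",
    recipient_email="john@example.com",
    attachments=[302237],
)
```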

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it composes and sends an email with subject, CC/BCC, and attachments. It distinguishes itself from the sibling tool 'messages_send' for chat messages, which is explicit and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use for email; for chat messages (Telegram/WhatsApp/livechat) use messages.send instead.' This provides clear when-to-use and when-not-to-use guidance with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_deleteA
Destructive, Idempotent

Delete a note by ID from the target notebook. Same identity rules as notes.save — agents can only delete from their own notebook.

Parameters
Name | Required | Description | Default
note_id | Yes | ID of the note to delete |
target_agent_id | No | Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks. |
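The identity rule in the target_agent_id description can be sketched as a small resolver. This is a hypothetical client-side illustration, not the server's implementation: the function name `resolve_target_notebook`, the `mode` values "agent" and "mcp", and the exception types are all assumptions. Only the rule itself (optional in agent mode, required from MCP, no cross-agent targets) comes from the parameter description.

```python
from typing import Optional


def resolve_target_notebook(mode: str, own_agent_id: Optional[str],
                            target_agent_id: Optional[str]) -> Optional[str]:
    """Return the notebook a notes_delete call would act on (illustrative only)."""
    if mode == "agent":
        # In agent mode the target defaults to the caller's own notebook.
        if target_agent_id is None or target_agent_id == own_agent_id:
            return own_agent_id
        raise PermissionError("agents cannot target other agents' notebooks")
    if mode == "mcp":
        # From MCP there is no implicit agent, so the target must be explicit.
        if target_agent_id is None:
            raise ValueError("target_agent_id is required from MCP")
        return target_agent_id
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, an agent with ID "57" that omits target_agent_id resolves to its own notebook, while the same omission from MCP is rejected before any call is made.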
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses deletion behavior and identity rules beyond annotations (destructiveHint, idempotentHint). Could detail error handling for missing note_id or unauthorized access, but adds significant context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with primary action. No wasted words; includes essential identity context efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool with two parameters (one required) and informative annotations, the description covers the identity rules critical for correct usage. It is complete enough for an AI agent to use the tool appropriately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already covers both parameters with descriptions. Description adds value by referencing identity rules for target_agent_id behavior, enhancing understanding beyond schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'delete', resource 'note by ID', and context 'from the target notebook'. Distinguishes from sibling tools like notes.save and notes.search by specifying the delete action and identity restrictions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage context: agents can only delete from their own notebook, referencing identity rules from notes.save. Lacks explicit 'when not to use' or alternative tools, but the restriction is clearly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_recallA
Read-only, Idempotent

Recall notes from your notebook. By default returns only your own notes (all scopes, newest first). Pass filter_agent_id=<agent_id> to read another agent's notebook, or filter_agent_id="all" (or "*") to read across every agent in the workspace. Pass scope to narrow to global/thread/person. Each result includes agent_id and agent_name of the author.

Parameters
Name | Required | Description | Default
key | No | Recall a specific note by key |
limit | No | Max notes (default 20, max 50). Newest first. |
scope | No | Optional filter: global | thread | person. Omit for all scopes. |
scope_ref_id | No | Filter by specific thread_id or person_id |
filter_agent_id | No | Optional. Omit to read only your own notes. Pass a numeric agent_id as a string (e.g. "57") to read another agent's notebook (read-only). Pass "all" or "*" to read across all agents in the workspace. |
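As a rough illustration of how these filters combine, the sketch below builds an arguments object for notes_recall and rejects values the schema rules out. The helper name `recall_arguments` is hypothetical, and the lower bound of 1 on limit is an assumption (the schema states only the default of 20 and the maximum of 50).

```python
from typing import Any, Dict, Optional

VALID_SCOPES = {"global", "thread", "person"}


def recall_arguments(filter_agent_id: Optional[str] = None,
                     scope: Optional[str] = None,
                     limit: int = 20) -> Dict[str, Any]:
    """Build an arguments object for notes_recall (hypothetical helper)."""
    # Upper bound 50 is from the schema; the lower bound 1 is assumed.
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    args: Dict[str, Any] = {"limit": limit}
    if scope is not None:
        if scope not in VALID_SCOPES:
            raise ValueError(f"scope must be one of {sorted(VALID_SCOPES)}")
        args["scope"] = scope
    if filter_agent_id is not None:
        # Either a numeric agent_id passed as a string, or "all" / "*".
        if not (filter_agent_id in ("all", "*") or filter_agent_id.isdigit()):
            raise ValueError("filter_agent_id must be a numeric string, 'all' or '*'")
        args["filter_agent_id"] = filter_agent_id
    return args
```

Omitting filter_agent_id entirely, rather than sending an empty value, is what selects the caller's own notebook in this sketch, matching the schema's "Omit to read only your own notes".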
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds value by explaining result structure (includes agent_id and agent_name), default ordering, and filtering behavior, which goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with 5 sentences, front-loading the core purpose. Each sentence adds useful information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only recall tool with no output schema, the description sufficiently covers return values (includes agent_id and agent_name), ordering, and filtering; the default limit is documented in the schema. All parameters are well documented in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. The description adds semantic value by explaining how to use filter_agent_id with examples and noting scope filtering, which enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Recall notes from your notebook.' It specifies default behavior (own notes, all scopes, newest first) and distinguishes from sibling tools like notes_save, notes_delete, and notes_search by focusing on retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on when to use the tool and how to filter: by default returns own notes, but can read another agent's notebook via filter_agent_id or all agents. Lacks explicit when-not-to-use guidance, but the filtering options are well explained.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

notes_saveA

Save a fact or note into the agent's memory. Use scope to choose visibility: 'workspace' = visible to every agent in this workspace (use for shared facts, project conventions); 'agent' = private to this agent (use for personal working notes); 'thread' = scoped to one conversation (use for thread-specific reminders); 'person' = scoped to one contact (use for per-contact context). If a note with the same key+scope exists it will be updated. Do NOT use this tool for behavioral rules or corrections — use feedback.save for those.

Parameters
Name | Required | Description | Default
key | Yes | Short identifier for this note (must not start with '__', which is reserved) |
scope | Yes | Scope of the note. 'workspace' = shared across all agents; 'agent' = private to this agent (was 'global' pre-PR1); 'thread' = per-conversation; 'person' = per-contact. 'global' is accepted as a deprecation alias for 'agent'. |
value | Yes | The note content |
pinned | No | Pin this note so it's always loaded first. Default false. |
scope_ref_id | No | Reference ID: thread_id (for scope=thread) or person_id (for scope=person). Required for thread/person scope. In MCP mode (no thread context), must be passed explicitly. |
target_agent_id | No | Target notebook. In agent mode optional (defaults to your own); required from MCP. Agents cannot target other agents' notebooks. Ignored when scope='workspace' (workspace memory is shared). |
expires_in_hours | No | Auto-delete after N hours. Omit for permanent notes. |
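The scope rules above can be sketched as a small validating builder. It is a hypothetical helper, not part of the server: the name `save_arguments` and the string type of scope_ref_id are assumptions, and it models MCP mode, where there is no thread context to fall back on. Rewriting 'global' to 'agent' on the client is this sketch's own choice; the schema says only that the server accepts 'global' as a deprecated alias.

```python
from typing import Any, Dict, Optional

SCOPES = {"workspace", "agent", "thread", "person"}


def save_arguments(key: str, value: str, scope: str,
                   scope_ref_id: Optional[str] = None) -> Dict[str, Any]:
    """Build an arguments object for notes_save (hypothetical helper, MCP mode)."""
    if key.startswith("__"):
        raise ValueError("keys starting with '__' are reserved")
    if scope == "global":
        # Deprecated alias for 'agent'; normalising here is an assumption.
        scope = "agent"
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {sorted(SCOPES)}")
    args: Dict[str, Any] = {"key": key, "value": value, "scope": scope}
    if scope in ("thread", "person"):
        # thread/person notes need the thread_id or person_id they attach to.
        if scope_ref_id is None:
            raise ValueError("scope_ref_id is required for thread/person scope")
        args["scope_ref_id"] = scope_ref_id
    return args
```

Because a note with the same key and scope is updated rather than duplicated, calling the tool twice with the output of this builder would be expected to leave a single note.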
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already show readOnlyHint=false and destructiveHint=false. The description adds that existing notes with same key+scope will be updated, which is useful behavioral context. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured and front-loaded with the main purpose. It is somewhat lengthy, but each sentence adds value and there is no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 required parameters out of 7, the description covers the complex scoping and distinguishes the tool from its siblings. Edge cases such as MCP mode for scope_ref_id are handled in the schema. No output schema is present, but the description sufficiently explains behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds value by explaining scope semantics in detail, with a use case for each scope. The deprecated 'global' alias and the reserved '__' key prefix appear only in the schema, but the overall context is enhanced.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Save a fact or note into the agent's memory' with specific verb and resource. Distinguishes from sibling feedback.save by explicitly saying 'Do NOT use for behavioral rules'. Also differentiates scopes (workspace, agent, thread, person) which adds clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use each scope with examples. Also states when NOT to use: 'Do NOT use this tool for behavioral rules or corrections — use feedback.save', giving a clear alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

present_tabA
Read-only, Idempotent

Share the agent's browser tab on the live call so everyone sees it as a real screen-share. Pass the page_id you got from browser.open. Only usable while the agent is in an active voice call. The shared tab stays the active share until you call present_tab with a different page_id, close the tab via browser.close, or the call ends.

Parameters
Name | Required | Description | Default
page_id | Yes | page_id returned by browser.open for the tab you want to share. Must be a tab still open in the agent's browser context. |
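The share lifecycle described above can be pictured as a tiny state machine. The class below is a toy model written for illustration; `CallShareState` and its method names are hypothetical and say nothing about how the server tracks calls or tabs.

```python
from typing import Optional, Set


class CallShareState:
    """Toy model of which tab is currently shared on a call (illustrative only)."""

    def __init__(self) -> None:
        self.in_call = False
        self.open_tabs: Set[str] = set()
        self.shared: Optional[str] = None

    def present_tab(self, page_id: str) -> None:
        if not self.in_call:
            raise RuntimeError("present_tab is only usable during an active voice call")
        if page_id not in self.open_tabs:
            raise ValueError("page_id must refer to a tab that is still open")
        # Presenting a different tab replaces the previous share.
        self.shared = page_id

    def close_tab(self, page_id: str) -> None:
        self.open_tabs.discard(page_id)
        if self.shared == page_id:
            # Closing the shared tab ends the share.
            self.shared = None

    def end_call(self) -> None:
        # Ending the call ends the share as well.
        self.in_call = False
        self.shared = None
```

The three ways a share ends in this model (a new page_id, closing the tab, the call ending) mirror the three conditions listed in the description.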
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true (no data modification) and idempotentHint=true (multiple identical calls have same effect). The description adds context beyond annotations: it explains the sharing persists until a new page_id is provided, the tab is closed, or the call ends. It also clarifies the prerequisite (active call), which is not in annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long. The first states the purpose, the second specifies the input, the third gives the prerequisite (an active voice call), and the fourth explains how long the share lasts. Every sentence is essential and front-loaded with key information. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, no output schema, no nested objects), the description covers purpose, input, prerequisites, and behavioral lifecycle. It does not specify return values, but for an action tool like this, the behavior description is sufficient. The context signals show high schema coverage, so the description is adequately complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (one parameter with a clear description). The description adds value by emphasizing that the page_id must come from browser.open and that the tab must still be open. This reinforces the schema description and provides additional operational context. Baseline is 3 due to full coverage, and the extra context raises the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Share the agent's browser tab on the live call so everyone sees it as a real screen-share.' It specifies the required input (page_id from browser.open) and distinguishes itself by noting it is only usable in an active voice call. This differentiation from sibling tools like browser.open or web_fetch is explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'Only usable while the agent is in an active voice call.' It also explains the lifecycle: the shared tab remains active until a different page_id is presented, the tab is closed, or the call ends. This provides clear guidance and implicitly communicates when not to use (e.g., outside a call).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_getA
Read-only, Idempotent

Get full content of a prompt template: system instructions (prompt_text) and auto-reply rules.

Run prompts.list first to find the prompt_id.

Parameters
Name | Required | Description | Default
prompt_id | Yes | ID of the prompt template to fetch |
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the agent knows this is a safe read operation. The description adds no extra behavioral details beyond the return content. With annotations covering safety, a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences: the first states the purpose and contents, the second gives a usage guideline. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple, read-only tool with one parameter and no output schema, the description adequately explains what is returned (system instructions and auto-reply rules). It does not mention the full return structure but is complete enough for the agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (one parameter with a description). The description does not add additional meaning beyond what the schema already provides for the 'prompt_id' parameter. Baseline of 3 is correct.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves 'full content of a prompt template' and specifies the contents: 'system instructions (prompt_text) and auto-reply rules.' It differentiates from siblings like 'prompts_list' (which lists templates) and 'prompts_update' (which modifies).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises 'Run prompts.list first to find the prompt_id,' providing a clear prerequisite step and context for when the tool should be used. It implicitly tells the agent not to use this tool before listing prompts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_listA
Read-only, Idempotent

List all prompt templates in this workspace.

Returns id + name + description + category so you know which prompt_id to use in prompts.get or prompts.update.
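The list-then-get flow might look like the sketch below. The `tools/call` envelope follows the MCP JSON-RPC request shape; everything else is an assumption made for illustration: the helper names, the use of the underscore tool name `prompts_get`, and the idea that the listing arrives as a list of objects with `id` and `name` keys (the description names the fields but the page shows no output schema).

```python
from typing import Any, Dict, List, Optional


def tools_call(name: str, arguments: Dict[str, Any], request_id: int) -> Dict[str, Any]:
    """Wrap a tool invocation in an MCP JSON-RPC 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def find_prompt_id(listing: List[Dict[str, Any]], name: str) -> Optional[Any]:
    """Pick the id of the template with the given name (assumed listing shape)."""
    for row in listing:
        if row.get("name") == name:
            return row.get("id")
    return None


# Made-up listing standing in for a prompts_list result.
listing = [
    {"id": 3, "name": "Support", "description": "Helpdesk tone", "category": "service"},
    {"id": 8, "name": "Sales", "description": "Outbound pitch", "category": "growth"},
]

list_request = tools_call("prompts_list", {}, request_id=1)
get_request = tools_call("prompts_get", {"prompt_id": find_prompt_id(listing, "Sales")}, request_id=2)
```

Looking the ID up from the listing, rather than hard-coding it, is the point of running the list tool first.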

Parameters

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description's addition of returning id+name+description+category is valuable context. No contradictions. However, it does not mention any other behaviors like ordering or pagination, which is acceptable given no parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences, the first stating the action, the second detailing return value and relation to other tools. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters and a simple list operation, the description is complete. It explains what is returned and how to use that output with sibling tools, which is sufficient for an agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100% (empty). Per guidelines, a baseline of 4 is appropriate since the description adds no parameter information but none is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists prompt templates and returns specific fields. It distinguishes itself from sibling tools by mentioning that the returned IDs are used in prompts.get and prompts.update, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies that the tool should be used before prompts.get or prompts.update to obtain the necessary prompt_id. It provides clear context by naming the alternative tools, but does not explicitly state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_historyA
Read-only, Idempotent

List past versions of a prompt template's prompt_text. Every edit is snapshotted to an append-only table — use this to browse history and find a version_number for prompts.prompt_restore.

Parameters
Name | Required | Description | Default
limit | No | Max versions to return (1-200, default 50) |
prompt_id | Yes | ID of the prompt template |
before_version | No | Cursor: return versions strictly below this version_number |
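The before_version cursor suggests a simple paging loop, sketched below against an in-memory stand-in. The response shape (a list of objects carrying `version_number`) and the newest-first ordering of the fake data are assumptions, since the page shows no output schema. `iter_versions`, `fake_fetch` and `HISTORY` are hypothetical names.

```python
from typing import Callable, Dict, Iterator, List, Optional

Version = Dict[str, int]
FetchPage = Callable[[Optional[int], int], List[Version]]


def iter_versions(fetch_page: FetchPage, limit: int = 50) -> Iterator[Version]:
    """Walk the whole history, using before_version as a cursor."""
    before: Optional[int] = None
    while True:
        page = fetch_page(before, limit)
        if not page:
            return
        yield from page
        # Next page: versions strictly below the smallest one seen so far.
        before = min(v["version_number"] for v in page)


# In-memory stand-in for the real tool call, for demonstration only.
HISTORY: List[Version] = [{"version_number": n} for n in range(7, 0, -1)]


def fake_fetch(before: Optional[int], limit: int) -> List[Version]:
    rows = [v for v in HISTORY if before is None or v["version_number"] < before]
    return rows[:limit]
```

Taking the minimum version_number of each page as the next cursor keeps the loop correct even if the order within a page differs from what is assumed here.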
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds that every edit is snapshotted to an append-only table, which extends transparency beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, each with distinct value: purpose then usage guidance. No redundant or extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with annotations and full schema coverage, the description is nearly complete. It mentions the key output field (version_number) and ties to a sibling tool, though it could explicitly list all return fields in the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions for all parameters. The description adds no new parameter-level details beyond mentioning version_number, but this is already covered by the schema's before_version field.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List') and the resource ('past versions of a prompt template's prompt_text'). It distinguishes from siblings by mentioning that the output includes version_number for use with prompts.prompt_restore, a sibling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says to use this tool to browse history and find a version_number for prompts.prompt_restore. It provides clear context but does not mention alternatives such as prompts_get for reading the current version.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_prompt_restoreA

Restore a past version of a prompt template by version_number. Creates a new version pointing at the restored content — history is preserved. Fans out to every agent using this template without a per-agent override; the response includes affected_agents as a receipt of the fan-out.

Parameters
Name | Required | Description | Default
reason | No | Optional: why this restore is happening (shows up in history UI) |
prompt_id | Yes | ID of the prompt template |
version_number | Yes | The version_number to restore (get it from prompts.prompt_history) |
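The restore semantics (a restore appends a new version instead of rewinding history, then reaches every agent without an override) can be modelled in a few lines. This is a toy in-memory model under stated assumptions, not the server's storage design: `PromptTemplate` and `affected_agents` are hypothetical names, and the model copies the restored text where the description says the new version points at it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PromptTemplate:
    # versions[i] holds the text of version_number i + 1; entries are never rewritten.
    versions: List[str] = field(default_factory=list)

    def edit(self, text: str) -> int:
        """Append a new version and return its version_number."""
        self.versions.append(text)
        return len(self.versions)

    def restore(self, version_number: int) -> int:
        """A restore is just another append, so earlier history stays intact."""
        return self.edit(self.versions[version_number - 1])

    @property
    def current(self) -> str:
        return self.versions[-1]


def affected_agents(overrides: Dict[str, Optional[str]]) -> List[str]:
    """Agents with no per-agent override pick up the template's current text."""
    return sorted(agent for agent, override in overrides.items() if override is None)
```

In this model, restoring version 1 of a two-version template yields version 3 with the text of version 1, and versions 1 and 2 remain available for a later restore.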
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=false, destructiveHint=false, etc., confirming a write operation without destruction. The description adds valuable behavioral context: 'Creates a new version... history is preserved' (non-destructive), 'Fans out to every agent... the response includes affected_agents' (side effects and output). No mention of authorization or rate limits, but the additional context is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no wasted words. The first states the primary action, the second explains that history is preserved, and the third covers the side effects (fan-out, response). Information is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description explains what the response contains ('affected_agents'). Parameters are fully defined in the input schema. The description covers the essential behavior: restoration mechanics, history preservation, and the fan-out effect. No gaps for the tool's complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the input schema already describes the parameters (prompt_id, version_number, reason). The description does not add extra meaning to the parameters beyond the schema content (e.g., version_number's schema description already says 'get it from prompts.prompt_history'). Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Restore a past version of a prompt template by version_number') and the result ('Creates a new version pointing at the restored content — history is preserved'). It distinguishes from similar tools (e.g., prompts_update, prompts_prompt_history) by mentioning the fan-out effect, but does not explicitly contrast with alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives like prompts_update or agents_prompt_restore. No when-not-to-use scenarios are mentioned, leaving the agent to infer usage solely from the description and sibling names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prompts_updateA

Update a prompt template's name, system instructions, or auto-reply rules.

Changes affect every agent using this template, unless the agent has its own override (set via agents.update → prompt_text).

All parameters except prompt_id are optional — only provided fields are updated.

Parameters
Name | Required | Description | Default
name | No | New name for the prompt template |
prompt_id | Yes | ID of the prompt template to update |
description | No | New description for the prompt template |
prompt_text | No | The AI system prompt: persona, tone, rules, behavior. |
auto_reply_rules | No | Pre-classifier rules that run BEFORE the main AI. Format: bullet list of conditions → actions (SKIP / SIMPLE_REPLY / SEARCH / CALENDAR). Pass null to clear. |
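Partial updates hinge on the difference between leaving a field out and sending an explicit null. A minimal sketch of a client-side builder that keeps the two apart is shown below; the helper name `update_arguments`, the `_UNSET` sentinel, and the loose typing of prompt_id are assumptions made for illustration.

```python
from typing import Any, Dict

_UNSET = object()  # marks "argument not provided", distinct from an explicit None


def update_arguments(prompt_id: Any, name: Any = _UNSET, description: Any = _UNSET,
                     prompt_text: Any = _UNSET, auto_reply_rules: Any = _UNSET) -> Dict[str, Any]:
    """Build an arguments object for prompts_update (hypothetical helper)."""
    args: Dict[str, Any] = {"prompt_id": prompt_id}
    optional = {
        "name": name,
        "description": description,
        "prompt_text": prompt_text,
        "auto_reply_rules": auto_reply_rules,
    }
    for field_name, value in optional.items():
        # Omitted fields are left untouched; None is kept so it serialises to null.
        if value is not _UNSET:
            args[field_name] = value
    return args
```

With this builder, `update_arguments(7, name="Sales")` sends only the new name, while `update_arguments(7, auto_reply_rules=None)` sends a null that asks for the rules to be cleared.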
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations, the description discloses that changes affect all agents unless overridden, and that all parameters except prompt_id are optional. This adds useful behavioral context about side effects and partial updates. It does not, however, describe error conditions or confirmation behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three well-structured sentences: first states purpose, second explains impact, third notes optionality. No redundant or filler content; every sentence adds value and is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a write tool with no output schema, the description is missing information about return values (e.g., does it return the updated prompt?). It is adequate for a simple update with well-described schema parameters, but lacks full completeness regarding response or error handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description only repeats which fields can be updated (name, system instructions, auto-reply rules) without adding new semantic information beyond what the schema already provides for each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Update' and the resource 'prompt template', and specifies the fields that can be updated (name, system instructions, auto-reply rules). It distinguishes from sibling tools like prompts_get and prompts_list by focusing on mutation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on the impact of changes: 'Changes affect every agent using this template, unless the agent has its own override'. This helps decide when to use this tool versus agents.update. However, it does not explicitly state when not to use or mention alternative tools for creating templates.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_cancel (A)

Cancel an active reminder by its trigger ID.

Parameters
Name | Required | Description | Default
agent_id | No | Agent ID (required when calling from MCP; ignored in agentic mode). |
trigger_id | Yes | ID of the reminder to cancel |

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false, destructiveHint=false, etc. The description adds the basic behavioral fact that it cancels a reminder, but does not provide additional context like permissions, reversibility, or side effects beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single, front-loaded sentence with no extraneous information. It efficiently communicates the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple nature of this tool (one required parameter, no output schema), the description is fully complete. It tells the agent exactly what the tool does without any missing information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% - both parameters (agent_id, trigger_id) are well-described in the schema. The tool description does not add new meaning beyond what the schema already provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'Cancel', the resource 'active reminder', and the identifier 'by its trigger ID'. It effectively distinguishes from sibling tools like reminder_set (create) and reminder_list (list).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly indicates when to use this tool (to cancel a reminder) and how (using trigger ID). It does not explicitly mention when not to use it or alternatives, but the context of sibling tools and the simplicity of the tool make it adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_list (A)
Read-only, Idempotent

List your active reminders (both one-time and recurring).

Parameters
Name | Required | Description | Default
limit | No | Max results (default 20) |
agent_id | No | Agent ID (required when calling from MCP; ignored in agentic mode). |
thread_id | No | Filter by thread |
include_fired | No | Include already-fired one-time reminders (default false) |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, so the description correctly adds that it lists 'active' reminders, implying filtering behavior. It is consistent with annotations and adds useful context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, front-loaded sentence that is concise and without wasted words. It effectively communicates the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of annotations, the description is fairly complete. It could mention the default limit or return format, but it covers the core functionality adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description need not add much. It mentions 'active' but does not elaborate on the parameters beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists active reminders, both one-time and recurring, using a specific verb and resource. It distinguishes itself from sibling tools like reminder_set and reminder_cancel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as reminder_set or reminder_cancel. It does not mention prerequisites or context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reminder_set (A)

Schedule a reminder. One-time reminders fire at a specific datetime. Recurring reminders fire on a schedule (daily, weekly, every N days, or every N minutes). Optionally scope to a thread or target another agent.

Parameters
Name | Required | Description | Default
time | No | Time of day HH:MM for daily/weekly/every_n_days (e.g. '09:00'). Required for daily/weekly/every_n_days. |
reason | Yes | What this reminder is for (you'll see this when it fires) |
agent_id | No | Agent ID (required when calling from MCP; ignored in agentic mode). |
datetime | No | ISO datetime for one_time (e.g. '2026-04-01T09:00:00+03:00'). Required for one_time. |
timezone | No | IANA timezone (e.g. 'Europe/Moscow'). Defaults to UTC. |
thread_id | No | Optional thread ID to scope the reminder to. Omit for workspace-level reminders. |
days_of_week | No | Days for weekly: 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun. Required for weekly. |
interval_days | No | For every_n_days: fire every N days (min 2). |
schedule_type | Yes | one_time = fires once at datetime. daily = fires daily at time. weekly = fires on specific days_of_week at time. every_n_days = fires every N days at time. interval = fires every N minutes. |
interval_minutes | No | For interval: fire every N minutes (5-1440). |
target_agent_slug | No | Optional: activate a different staff member instead of yourself when the reminder fires. |

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations offer no behavioral hints, so the description carries full burden. It adds context about schedule types and optional scoping, but omits details like idempotency or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the core action, with no unnecessary words. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 11 parameters and no output schema, the description covers core functionality but misses return value information and some parameter dependencies.
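The parameter dependencies mentioned here can be made concrete with a small client-side check. The sketch below is illustrative only: `check_reminder_args` is a hypothetical helper, not part of the server, and the rules are transcribed from the reminder_set parameter table. Treating interval_days and interval_minutes as required for their schedule types is an assumption, since the schema states their ranges but does not say they are mandatory.

```python
from typing import Any

# Fields each schedule_type needs, transcribed from the reminder_set schema.
# interval_days / interval_minutes being mandatory is an assumption.
REQUIRED_BY_SCHEDULE = {
    "one_time": ["datetime"],
    "daily": ["time"],
    "weekly": ["time", "days_of_week"],
    "every_n_days": ["time", "interval_days"],
    "interval": ["interval_minutes"],
}


def check_reminder_args(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look consistent."""
    problems = []
    for field in ("reason", "schedule_type"):
        if field not in args:
            problems.append(f"missing required field: {field}")
    schedule = args.get("schedule_type")
    if schedule is None:
        return problems
    if schedule not in REQUIRED_BY_SCHEDULE:
        problems.append(f"unknown schedule_type: {schedule}")
        return problems
    for field in REQUIRED_BY_SCHEDULE[schedule]:
        if field not in args:
            problems.append(f"{schedule} requires {field}")
    # Ranges stated in the schema: interval_days min 2, interval_minutes 5-1440.
    if "interval_days" in args and args["interval_days"] < 2:
        problems.append("interval_days must be at least 2")
    if "interval_minutes" in args and not 5 <= args["interval_minutes"] <= 1440:
        problems.append("interval_minutes must be between 5 and 1440")
    return problems
```

Running such a check before the call would catch, for example, a weekly reminder submitted without days_of_week. How the server itself reports these cases is not documented on this page.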

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds high-level context for schedule types and scoping but does not enhance individual parameter meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool schedules a reminder, distinguishing one-time from recurring. It does not explicitly differentiate from sibling tools like reminder_cancel, but the purpose is specific and distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (scheduling reminders) but does not provide explicit guidance on when not to use or alternatives like reminder_cancel for cancellation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_files (A)
Read-only, Idempotent

Search files and attachments across the workspace — by content, filename, document type, or origin. For message content use search.messages; for links use search.links.

Parameters
Name | Required | Description | Default
limit | No | Maximum files to return. |
query | No | What to search for (content or filename). |
file_origin | No | File origin: 'generated' (created by tools), 'received' (from messages), 'uploaded' (manual). Use 'generated' for files the user created/sent. OMIT to include all origins. |
document_type | No | Filter by document category. OMIT unless the user explicitly mentions one — picking a value narrows the search and is a common cause of zero-result mistakes. |
attachment_name | No | Exact filename filter. OMIT to skip (do NOT pass an empty string). |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description's 'Search' aligns. It adds context by specifying search dimensions (content, filename, etc.) but doesn't mention pagination or limits beyond schema. Some added value but modest.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states purpose, second differentiates siblings. No redundant words, front-loaded with core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with 5 fully documented parameters and clear sibling differentiation, the description is complete. No output schema needed for context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters have schema descriptions with detailed guidance (e.g., when to omit document_type), so the description adds no further semantic value. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches files and attachments by content, filename, document type, or origin. It explicitly distinguishes from sibling tools by directing message content searches to search.messages and link searches to search.links.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use and when-not-to-use guidance: use for files/attachments, not for messages or links. It lists criteria for searching, aiding appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_messages (A)
Read-only, Idempotent

Search message content across all chats — semantic + keyword. Use to find what was said: quotes, topics, info exchanged. For chats/threads themselves use search.threads; for files use search.files; for links use search.links.

Parameters
Name | Required | Description | Default
limit | No | Maximum messages to return. |
query | No | What to search for in message content. |
date_to | No | ISO8601 date (YYYY-MM-DD) upper bound. OMIT to skip. |
date_from | No | ISO8601 date (YYYY-MM-DD) lower bound. OMIT to skip. |
participant_name | No | Filter to messages involving this participant/contact name. OMIT to search across everyone (do NOT pass an empty string). |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint true, and destructiveHint false. The description adds that search is 'semantic + keyword', providing extra behavioral context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences: first defines purpose, second gives use cases, third differentiates siblings. No unnecessary words; front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with full schema documentation and annotations, the description covers purpose, usage context, and alternatives. No gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add any parameter-specific information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches message content across all chats using semantic and keyword search, and explicitly distinguishes from sibling tools (search.threads, search.files, search.links).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides when to use (find quotes, topics, info) and when not (use siblings for threads, files, links), offering clear alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_threads (A)
Read-only, Idempotent

Find or list chat threads/conversations — by topic, participant, unread/unanswered status, or recency. Omit query to list threads by filter. For message content use search.messages; for files use search.files. The 'since' parameter filters by recency and pairs with only_unread / only_unanswered.

Parameters
Name | Required | Description | Default
limit | No | Maximum threads to return. |
query | No | Topic/keyword to search threads for. OMIT to list threads by filter. |
since | No | ISO date (YYYY-MM-DD). Only threads with any message activity since this date (recency filter, not 'unanswered'). OMIT to skip. |
only_unread | No | Limit to threads with unread messages. OMIT to include read threads. |
only_unanswered | No | Limit to threads where the last message is incoming (you haven't replied). Covers 'threads I haven't replied to'. OMIT to include answered threads too. |
participant_name | No | Filter to threads with this participant/contact. OMIT to include everyone (do NOT pass an empty string). |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and non-destructive behavior. The description adds useful behavioral context (e.g., omitting query for listing, pairing of parameters) without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, front-loaded with purpose, followed by usage guidance, alternatives, and a note on how the filters combine. Every sentence is impactful with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters and no output schema, the description is largely complete. It covers filtering modes and alternatives but lacks mention of sorting or default ordering. Still, it provides adequate context for a search/list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds value by explaining parameter usage beyond the schema definitions, such as omitting 'query' to list threads by filter and pairing 'since' with 'only_unread' / 'only_unanswered'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Find or list' and resource 'chat threads/conversations', and specifies filtering by topic, participant, status, or recency. It also distinguishes from sibling tools like search.messages and search.files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to omit 'query' to list by filter, and directs to alternative tools for message content and files. It also explains how 'since' pairs with 'only_unread' and 'only_unanswered'.
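The "omit rather than pass an empty value" guidance can be illustrated with a small argument builder. This is a sketch under assumptions: `build_search_threads_args` is a hypothetical client-side helper, and only the parameter names come from the search_threads schema.

```python
from typing import Any, Optional


def build_search_threads_args(
    query: Optional[str] = None,
    since: Optional[str] = None,
    only_unread: bool = False,
    only_unanswered: bool = False,
    participant_name: Optional[str] = None,
    limit: Optional[int] = None,
) -> dict[str, Any]:
    """Build search_threads arguments, leaving out anything that is unset or empty."""
    args: dict[str, Any] = {}
    if query:  # no query means "list threads by filter"
        args["query"] = query
    if since:  # ISO date (YYYY-MM-DD), recency filter
        args["since"] = since
    if only_unread:
        args["only_unread"] = True
    if only_unanswered:
        args["only_unanswered"] = True
    if participant_name:  # never send an empty string
        args["participant_name"] = participant_name
    if limit is not None:
        args["limit"] = limit
    return args
```

For a request such as "threads I haven't replied to since the start of March", the helper would produce only `since` and `only_unanswered`, with no `query` key at all.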

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

system_sleep (A)
Read-only, Idempotent

Pause execution for a given number of seconds (max 30). Use when you need to wait for an external process to complete before retrying — e.g. message sync, backfill, or API propagation. Total sleep per run is capped at 60 seconds.

Parameters
Name | Required | Description | Default
reason | No | Why you are waiting (logged for debugging) |
seconds | Yes | Number of seconds to sleep (1-30) |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, destructiveHint, and idempotentHint. The description adds the total sleep cap (60s) and per-call max (30s), which are not in annotations, providing additional behavioral context beyond the structured fields.
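The two limits interact: a single call may ask for at most 30 seconds, and all calls in a run share a 60-second allowance. A minimal sketch of tracking that on the caller's side follows. `SleepBudget` is a hypothetical helper, and clamping the request is an assumption, since this page does not say whether the server rejects or truncates an over-limit request.

```python
PER_CALL_MAX = 30  # seconds per call, from the tool description
PER_RUN_MAX = 60   # total seconds per run, from the tool description


class SleepBudget:
    """Tracks how much of the per-run sleep allowance has been used."""

    def __init__(self) -> None:
        self.used = 0

    def grant(self, requested: int) -> int:
        """Return the seconds that can still be requested (0 once the budget is spent)."""
        if requested < 1:
            raise ValueError("seconds must be at least 1")
        remaining = PER_RUN_MAX - self.used
        allowed = max(min(requested, PER_CALL_MAX, remaining), 0)
        self.used += allowed
        return allowed
```

A caller would pass the granted value as `seconds` and skip the call entirely when the grant is 0.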

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, followed by usage and a global constraint. No unnecessary words or repetition; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple, no-output tool with full annotation coverage, the description covers purpose, limits, and usage examples. It is complete enough for an agent to decide when and how to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters (reason and seconds). The description repeats the max of 30 seconds for the seconds parameter but adds no new semantic information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (pause execution) and resource (given number of seconds), with a specific max of 30 seconds. No sibling tool duplicates this function, making it distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases (wait for external process, message sync, backfill, API propagation) and a global cap of 60 seconds. Lacks explicit 'when not to use' but offers sufficient context for common scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_create (C)

Create a new task in your to-do list.

Parameters
Name | Required | Description | Default
title | Yes | Task title |
due_at | No | ISO datetime when task is due (e.g. '2026-03-31T15:00:00') |
agent_id | No | Agent ID whose tasks to access. Required when calling from MCP. |
due_date | No | Date when task is due (e.g. '2026-03-31'). Use with due_time or alone. |
due_time | No | Time when task is due (e.g. '15:00'). Used with due_date. |
priority | No | Task priority (default: medium) |
thread_id | No | Related thread ID |
description | No | Detailed description |
assigned_to_contact_id | No | Contact ID if assigned to someone |

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-read-only and non-destructive, which is consistent with creation. However, the description adds no behavioral details such as side effects, authorization needs, or return behavior beyond the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no wasted words. Front-loaded with the essential purpose, achieving maximum conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description should clarify what the tool returns (e.g., the created task object). It does not. Given 9 parameters, some usage context would improve completeness.
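For orientation, a call to this tool over MCP is a JSON-RPC 2.0 `tools/call` request. The sketch below builds one. The helper name `tools_call_request` and all argument values are placeholders for illustration. Because the response shape is undocumented, a client should inspect the returned result instead of assuming it contains the created task.

```python
import json
from typing import Any


def tools_call_request(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Placeholder values; only the parameter names come from the schema above.
payload = tools_call_request(1, "tasks_create", {
    "title": "Send the quarterly report",
    "due_date": "2026-03-31",
    "due_time": "15:00",
    "agent_id": "agent_123",
})
```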

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the schema itself fully documents parameters. The description adds no additional semantic value, thus baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a task in a to-do list. It uses a specific verb-resource combination and is distinct from siblings like tasks_delete or tasks_update, but does not explicitly differentiate from them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No exclusions, prerequisites, or contextual advice are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_delete (C)

Delete a task from your to-do list by its ID.

Parameters
Name | Required | Description | Default
task_id | Yes | ID of the task to delete |
agent_id | No | Agent ID whose task to delete. Required when calling from MCP. |

Behavior 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims deletion, a destructive action, but annotations set destructiveHint: false, creating a contradiction. No additional behavioral context provided.
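A mismatch of this kind is easy to detect mechanically. The following is a rough sketch of such a check, not the scoring method used on this page: `contradicts_destructive_hint` and the verb list are hypothetical, and only the `destructiveHint` annotation name comes from MCP.

```python
DESTRUCTIVE_VERBS = ("delete", "remove", "purge", "erase")  # illustrative, not exhaustive


def contradicts_destructive_hint(description: str, annotations: dict) -> bool:
    """True when the description leads with a destructive verb but destructiveHint is false."""
    first_word = description.strip().split(" ", 1)[0].lower()
    claims_destruction = first_word.startswith(DESTRUCTIVE_VERBS)
    return claims_destruction and annotations.get("destructiveHint") is False
```

Applied to "Delete a task from your to-do list by its ID." with `destructiveHint: false`, the check flags the tool. A description that starts with "Cancel" would pass under this particular verb list.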

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no filler. Efficient and front-loaded, earning its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

A simple tool, but critical information is missing: the annotation contradiction is unresolved, the return value is not described, and the agent_id requirement is not reinforced. Incomplete for safe usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and includes agent_id description, but the description mentions only 'by its ID' and omits the agent_id requirement context. No added value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Delete a task from your to-do list by its ID,' using specific verb and resource, and distinguishes from sibling tools like tasks_create and tasks_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives (e.g., tasks_update) or prerequisites (task must exist). No explicit when-not or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_list (A)
Read-only, Idempotent

List your tasks, or another agent's tasks (read-only) using from_agent_id. Use filters to narrow results.

Parameters
Name | Required | Description | Default
limit | No | Max results (default 20) |
status | No | |
overdue | No | If true, only return tasks past due_at that are not done |
agent_id | No | Agent ID whose tasks to list. Required when calling from MCP. |
thread_id | No | Filter by related thread |
from_agent_id | No | List tasks of another agent (read-only). Omit to list your own. |
assigned_to_contact_id | No | Filter by assigned contact |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description reinforces the read-only nature for from_agent_id and adds the nuance of 'your tasks' vs 'another agent's tasks', which is useful. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, direct and front-loaded. Every sentence adds value without extraneous detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with good annotations and schema coverage, the description provides sufficient context. No output schema is needed as the return is standard for a listing operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is high (86%), so the schema already documents most parameters. The description mentions using from_agent_id and filters but does not add significant meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists tasks, distinguishes between listing own tasks and another agent's tasks (read-only via from_agent_id), and mentions filters. Among sibling tools like tasks_create, tasks_delete, tasks_update, this listing tool is clearly distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides guidance on using from_agent_id for read-only access to other agents' tasks, and suggests using filters to narrow results. However, it does not explicitly state when not to use this tool or mention alternatives, though the context is clear for a list operation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tasks_update A

Update an existing task. Set status='done' to complete it, 'cancelled' to cancel. Use summary for completion notes.

Parameters
Name | Required | Description | Default
due_at | No | ISO datetime |
status | No | |
summary | No | Completion note (stored when marking done) |
task_id | Yes | ID of the task to update |
agent_id | No | Agent ID whose task to update. Required when calling from MCP. |
priority | No | |
description | No | |
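A minimal sketch of assembling tasks_update arguments, assuming only the fields being changed need to be sent (the listing does not confirm partial-update semantics). The `build_task_update` helper and the IDs are hypothetical. The check that ties summary to status='done' is this sketch's own reading of "stored when marking done", not a documented server rule.

```python
def build_task_update(task_id, agent_id, status=None, summary=None, due_at=None):
    """Assemble tasks_update arguments, including only the fields that are set."""
    if summary is not None and status != "done":
        # Assumption: the schema says the summary is stored when marking done,
        # so this sketch refuses to send it with any other status.
        raise ValueError("summary is only meaningful with status='done'")
    args = {"task_id": task_id, "agent_id": agent_id}
    for key, value in (("status", status), ("summary", summary), ("due_at", due_at)):
        if value is not None:
            args[key] = value
    return args


# Complete a task with a completion note (placeholder IDs).
done = build_task_update("task_42", "agent_123", status="done", summary="Invoice sent")

# Cancel a task; no summary is attached.
cancelled = build_task_update("task_43", "agent_123", status="cancelled")
```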
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide basic safety hints (readOnlyHint=false, destructiveHint=false). The description adds behavioral context specific to status updates and summary usage, but lacks detail on other parameters like agent_id (required from MCP), due_at, priority, and description. It does not explain partial update semantics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: a single sentence that front-loads the purpose, followed by specific instructions for key fields. No extraneous information; every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the core use case (updating status and summary) but omits details on other fields (due_at, priority, description) and does not reinforce the required agent_id from MCP context. Given 7 parameters and no output schema, the description leaves gaps for complete usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful semantics for status (setting to 'done' or 'cancelled') and summary (for completion notes), going beyond the schema's enum/string definitions. For other parameters like due_at, priority, description, no additional meaning is provided. Schema description coverage is 57%, and the description compensates for key fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing task' with a specific verb and resource. It provides concrete examples of using status fields ('done' and 'cancelled') and summary for completion notes, distinguishing it from sibling tools like tasks_create, tasks_delete, and tasks_list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use this tool (to modify an existing task) but does not explicitly contrast with alternatives or state when not to use it. The guidance on status values and summary provides clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threads_update A

✏️ Update a conversation thread: rename it, add notes/description, or move to a folder.

When to use:

  • User wants to rename a chat or group

  • User wants to add notes/context about a conversation

  • User wants to organize threads into folders

For DM threads, renaming also updates the linked contact's display name by default. Requires thread_id from threads.list.

Parameters
Name | Required | Description | Default
title | No | New title for the thread (max 255 chars) |
folder_id | No | Move thread to this folder (null removes from folder) |
thread_id | Yes | Thread ID from threads.list |
description | No | AI context / notes for this thread. Empty string clears description. |
update_contact | No | For DM threads, also rename the linked contact (default: true) |
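Because a null folder_id and an empty description both carry meaning here, a client probably needs to tell "leave unchanged" apart from "clear". One way to sketch that is a sentinel default. The helper name, the thread ID, and the assumption that omitted fields stay untouched are all illustrative rather than confirmed by the listing.

```python
_UNSET = object()  # sentinel meaning "do not send this field"


def build_thread_update(thread_id, title=_UNSET, folder_id=_UNSET,
                        description=_UNSET, update_contact=_UNSET):
    """Assemble threads_update arguments, keeping None and "" as real values."""
    if title is not _UNSET and len(title) > 255:
        raise ValueError("title exceeds the 255-character limit")
    args = {"thread_id": thread_id}
    fields = (
        ("title", title),
        ("folder_id", folder_id),
        ("description", description),
        ("update_contact", update_contact),
    )
    for key, value in fields:
        if value is not _UNSET:
            args[key] = value
    return args


# Rename a DM thread without renaming the linked contact.
rename = build_thread_update("thread_1", title="Acme onboarding", update_contact=False)

# Remove the thread from its folder (null) and clear its notes (empty string).
tidy = build_thread_update("thread_1", folder_id=None, description="")
```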
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that for DM threads, renaming updates the linked contact's display name by default, and that thread_id comes from threads.list. Annotations are minimal (readOnlyHint=false, destructiveHint=false), so description adds useful context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, starting with the main action, then a 'When to use' list, and two important notes. Every sentence adds value with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 5 parameters, the description covers main use cases, side effects, and prerequisites. It lacks details on return values or error handling, but is reasonably complete for an update operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. Description adds meaning by mapping actions to parameters (e.g., 'rename it' to title, 'notes' to description) and notes the default for update_contact. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates a conversation thread, listing specific actions (rename, add notes, move to folder). It uses a specific verb+resource and distinguishes from siblings by focusing on thread updates.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'when to use' bullet points covering rename, notes, and folder organization. However, it does not mention when not to use or alternative tools, missing some guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

videos_generate A

Generate a short video (5-10s) from a text prompt using BytePlus Seedance. Optionally accepts up to 12 image file IDs from the user's attached files (visible in the [ATTACHMENTS] block) as reference_file_ids for style and composition. Returns immediately with a job_id; the video is delivered back via continuation when the job completes (~30-90s for fast model, ~2-5min for pro). Reference images are temporarily re-hosted on a third-party CDN (imgbb) for the duration of generation and deleted on completion — don't submit confidential references. Gated behind a workspace opt-in flag.

Parameters
Name | Required | Description | Default
seed | No | Random seed for reproducibility (0-2147483647). Omit for random. |
model | No | Video model. Recommended: 'wan2.6-i2v-flash' (default, cheap, 720p/1080p, optional audio), 'wan2.6-i2v' (premium, always-on audio), 'wan2.6-t2v' (text-only input, 720p/1080p, no audio), 'wan2.2-i2v-flash' (cheapest, 480p/720p, no audio). Legacy BytePlus: 'seedance-2-fast', 'seedance-2-pro' (720p only). | wan2.6-i2v-flash
style | No | Style preset. Seedance models only. OMIT for no style preset. |
prompt | Yes | Text description of the video to generate (3-4000 chars). |
duration | No | Output video duration in seconds. Single-clip: 5 or 10. Long-form (chained, i2v models only): 15, 20, 30, 45, or 60. Long-form videos are silent (no audio in v1) and use only reference_file_ids[0] when refs are provided. |
shot_type | No | Shot mode: 'single' (continuous) or 'multi' (scene cuts). wan2.6-t2v only. OMIT to use the model default. |
resolution | No | Output resolution. '720p' is the safe default; '1080p' is wan2.6 only; '480p' is wan2.2-i2v-flash only. Per-model support enforced by validation. | 720p
aspect_ratio | No | Output aspect ratio. Wan supports '16:9', '9:16', '1:1'; Seedance also supports '4:3', '3:4', '21:9'. Per-model support enforced by validation. | 16:9
camera_motion | No | Camera motion preset. Seedance models only. OMIT for no camera motion. |
generate_audio | No | Whether the model should produce native audio. For wan2.6-i2v-flash this doubles the per-second rate (e.g., 720p+audio is $0.05/s vs $0.025/s silent); set False for cheaper silent clips. wan2.6-i2v always produces audio regardless of this flag. wan2.6-t2v / wan2.2-i2v-flash / seedance-2-fast never produce audio. |
negative_prompt | No | Optional text describing what to AVOID in the output. Honored by Wan and Seedance models. |
reference_file_ids | No | Optional list of up to 12 image file_ids to use as visual references (style, composition). Files must be image MIME types (image/png, image/jpeg, image/webp, image/gif). Get IDs from the [ATTACHMENTS] block, files.search, or search.files. |
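The duration and pricing notes above lend themselves to a small worked example. This sketch classifies a duration as single-clip or long-form and estimates the cost of a single 720p clip on wan2.6-i2v-flash, using only the two per-second rates quoted in the generate_audio description. The function names are hypothetical, and long-form pricing is not stated in the listing, so the estimator rejects those durations rather than guessing.

```python
from decimal import Decimal

# 720p per-second rates for wan2.6-i2v-flash, as quoted in the schema text.
RATE_SILENT = Decimal("0.025")
RATE_AUDIO = Decimal("0.05")

SINGLE_CLIP_DURATIONS = {5, 10}
LONG_FORM_DURATIONS = {15, 20, 30, 45, 60}


def clip_kind(duration):
    """Classify a duration (seconds) using the values listed in the schema."""
    if duration in SINGLE_CLIP_DURATIONS:
        return "single-clip"
    if duration in LONG_FORM_DURATIONS:
        return "long-form"
    raise ValueError(f"unsupported duration: {duration}")


def estimate_single_clip_cost(duration, generate_audio):
    """Estimate USD cost of one 720p wan2.6-i2v-flash clip."""
    if clip_kind(duration) != "single-clip":
        # Assumption: long-form pricing is not documented, so do not guess.
        raise ValueError("pricing is only quoted for single clips")
    rate = RATE_AUDIO if generate_audio else RATE_SILENT
    return rate * duration


with_audio = estimate_single_clip_cost(10, generate_audio=True)
silent = estimate_single_clip_cost(10, generate_audio=False)
print(f"10s with audio: ${with_audio}, silent: ${silent}")
```

Under these rates, turning audio off halves the cost of the same clip, which matches the "doubles the per-second rate" note in the schema.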
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behaviors beyond annotations: asynchronous job with continuation, temporary third-party CDN hosting of reference images with deletion on completion, and model-specific audio behavior (e.g., wan2.6-i2v-flash doubles rate for audio). Annotations indicate mutation (readOnlyHint=false) and the description confirms this, with no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (five sentences) and front-loaded with the core purpose. Each sentence adds essential information: basic function, optional image inputs, job lifecycle, a security/privacy note, and the workspace opt-in gate. No unnecessary words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (12 parameters, no output schema), the description covers the main behavioral aspects: return type, job lifecycle, security considerations, and model-specific options. It could optionally mention error handling or how to poll for completion, but the description is comprehensive enough for effective tool use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 12 parameters are described in the schema (100% coverage). The description adds contextual value by summarizing key nuances (e.g., reference_file_ids from attachments, long-form duration limitations, per-model resolution support). While schema is already detailed, the description provides helpful synthesis.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's primary function: generating a short video from a text prompt using BytePlus Seedance. It also specifies optional image references and distinguishes from other tools by focusing on video generation. No other sibling tool produces video from text, so differentiation is implicit but clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides contextual guidance, including that it is gated behind a workspace opt-in flag and that the tool returns a job_id for asynchronous completion. It does not explicitly state when not to use it or mention alternatives, but the context is sufficient for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vision_query A
Read-only, Idempotent

Look at the screen currently being shared in a meeting and answer a question about it. Returns a natural-language answer based on the visual content. Use ONLY when the user explicitly asks about the screen/slide/document being shown.

Parameters
Name | Required | Description | Default
question | Yes | Question about the shared screen. |
image_b64 | No | Base64-encoded JPEG image of the screen-share frame. |
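For the optional image_b64 parameter, a client has to base64-encode the JPEG frame itself. A minimal sketch follows, assuming the frame is already available as raw bytes. The helper name is hypothetical, the sample bytes are a stand-in rather than a real image, and the JPEG marker check is an extra safeguard added here, not something the listing requires.

```python
import base64


def build_vision_query(question, jpeg_bytes=None):
    """Assemble vision_query arguments, encoding the frame if one is given."""
    args = {"question": question}
    if jpeg_bytes is not None:
        # JPEG data begins with the start-of-image marker FF D8.
        if not jpeg_bytes.startswith(b"\xff\xd8"):
            raise ValueError("expected JPEG data")
        args["image_b64"] = base64.b64encode(jpeg_bytes).decode("ascii")
    return args


# Stand-in bytes that only mimic a JPEG header.
frame = b"\xff\xd8\xff\xe0" + b"\x00" * 16
query = build_vision_query("What is the title of this slide?", frame)
```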
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, covering safety. The description adds that it returns a natural-language answer, but does not elaborate on potential failure cases or requirements (e.g., ongoing screenshare). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only three sentences, front-loaded and free of fluff. Every word adds value: the first sentence defines the action, the second the return value, and the third the usage restriction.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is complete for a simple read-only tool: explains what it does, when to use it, and what it returns. No output schema is needed; the natural-language answer description suffices.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both 'question' and 'image_b64' already described adequately in the schema. The tool description does not add additional parameter semantics beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('look at the screen') and the resource ('currently being shared in a meeting'), specifying the exact use case. It distinguishes itself from sibling tools like 'calls_get_transcript' or 'knowledge_query' by focusing on live visual content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly directs to 'Use ONLY when the user explicitly asks about the screen/slide/document being shown,' providing clear when-to-use guidance. It does not list alternatives but the context of sibling tools makes it evident when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_fetch A

Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL.

Modes (extract):

  • 'auto' (default): picks the right mode based on response content type.

  • 'markdown': for HTML pages; returns cleaned markdown plus the page title.

  • 'text': for JSON/XML/plaintext APIs; returns the raw decoded body.

  • 'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read.

Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn.

Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.

Parameters
Name | Required | Description | Default
url | Yes | URL to fetch (http or https). Must be publicly reachable. |
extract | No | How to handle the response: 'auto' (default), 'markdown' (HTML → markdown), 'text' (raw body), or 'file' (ingest as binary, return file_id). | auto
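The listing says 'auto' picks a mode from the response content type but does not spell out the mapping. The sketch below is one plausible reading of the mode descriptions, not the server's actual logic. The function name and the exact media types it checks are assumptions.

```python
def pick_extract_mode(content_type):
    """Guess which extract mode 'auto' might choose for a Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "markdown"  # HTML pages become cleaned markdown
    text_types = ("application/json", "application/xml", "text/xml", "text/plain")
    if media_type in text_types or media_type.endswith(("+json", "+xml")):
        return "text"  # JSON/XML/plaintext bodies are returned raw
    return "file"  # images, PDFs, audio, video, archives, other binaries


for header in ("text/html; charset=utf-8", "application/json", "application/pdf"):
    print(header, "->", pick_extract_mode(header))
```

A caller who already knows what a URL serves can skip the guess and pass extract explicitly.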
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Details four extraction modes and their behaviors, notes that the URL must be publicly reachable, and explains that file mode ingests the file within the same turn, unlike the async files.upload. No contradiction with annotations (readOnlyHint=false, destructiveHint=false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-organized with clear sections for modes and usage guidance. Slightly lengthy but every sentence contributes meaningful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 2 parameters and no output schema, the description fully covers the tool's behavior, return expectations for each mode, and practical use cases, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds context: URL must be publicly reachable, extract modes elaborated with examples. Slight redundancy with schema enum descriptions but still adds value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Fetches a single URL and returns its content' and distinguishes from web.search and files.upload, making its specific purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use web.fetch (specific URL) vs web.search (need to find URL) and files.upload (async), with practical examples like after a web search or when a user pastes a URL.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_create A

Create a new livechat widget for your website.

The widget will be created with default settings. You can customize theme, auto-reply mode, and more.

Use this when user wants to add a chat widget to their site.

Parameters
Name | Required | Description | Default
name | Yes | Name for the widget (e.g., 'Website Chat', 'Support Widget') |
position | No | Widget position on screen | bottom-right
display_mode | No | Visual mode of the widget. Pick exactly one: - 'chat' (default): full chat panel + voice mic; use for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly; pick only when the user explicitly asks for a voice-only widget (e.g. 'just a voice button', 'no chat, just call'). - 'headless': no UI; customer drives via window.DialogBrain JS API; pick only when the user explicitly says 'embed in our own design' / 'no widget chrome'. | chat
header_title | No | Title shown in chat header | Chat with us
primary_color | No | Primary color for widget theme (hex, e.g., '#2563eb') | #2563eb
auto_reply_mode | No | Auto-reply mode: 'draft' (review before sending) or 'auto' (send immediately) | draft
voice_button_label | No | Localized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if omitted. |
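Several of these parameters constrain one another, so a client-side check before calling widgets_create may save a round trip. The sketch below encodes the constraints stated in the schema. The helper name is hypothetical, and accepting only six-digit hex colors is an assumption based on the '#2563eb' example.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")  # assumption: six-digit hex only
DISPLAY_MODES = {"chat", "voice_only", "headless"}
AUTO_REPLY_MODES = {"draft", "auto"}


def build_widget_args(name, display_mode="chat", primary_color="#2563eb",
                      auto_reply_mode="draft", voice_button_label=None):
    """Validate and assemble widgets_create arguments."""
    if display_mode not in DISPLAY_MODES:
        raise ValueError(f"unknown display_mode: {display_mode!r}")
    if auto_reply_mode not in AUTO_REPLY_MODES:
        raise ValueError(f"unknown auto_reply_mode: {auto_reply_mode!r}")
    if not HEX_COLOR.fullmatch(primary_color):
        raise ValueError(f"primary_color must look like '#2563eb': {primary_color!r}")
    args = {
        "name": name,
        "display_mode": display_mode,
        "primary_color": primary_color,
        "auto_reply_mode": auto_reply_mode,
    }
    if voice_button_label is not None:
        # The label is only used by the voice-only mic bubble.
        if display_mode != "voice_only":
            raise ValueError("voice_button_label requires display_mode='voice_only'")
        if len(voice_button_label) > 100:
            raise ValueError("voice_button_label is limited to 100 characters")
        args["voice_button_label"] = voice_button_label
    return args


widget = build_widget_args(
    "Support Widget", display_mode="voice_only", voice_button_label="Call support"
)
```

Keeping auto_reply_mode at its 'draft' default means replies are reviewed before sending, which is the more cautious choice for a new widget.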
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide minimal behavioral hints. Description only mentions 'default settings' and customization, but fails to disclose potential side effects, permissions, or rate limits beyond the obvious creation behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is four sentences, front-loaded with purpose, no fluff, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with a clear action and fully described schema, the description adequately covers what the tool does and when to use it. Minor omission: no mention of widget activation after creation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The tool description adds 'customize theme, auto-reply mode, and more,' which maps to parameters but offers no additional meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Create a new livechat widget for your website.' with a specific verb and resource. It distinguishes from sibling tools like widgets_update, widgets_delete, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Use this when user wants to add a chat widget to their site.' which is clear context but does not explicitly state when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_delete A
Destructive, Idempotent

Delete a livechat widget permanently.

This will remove the widget and its embed code will stop working. Existing chat history will be preserved.

Use this when user wants to remove a chat widget.

Parameters
Name | Required | Description | Default
widget_id | Yes | ID of the widget to delete |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that deletion is permanent, embed code stops working, and existing chat history is preserved. Adds behavioral details beyond annotations (destructiveHint, idempotentHint).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no wasted words. The first states the action, the second and third explain the consequences, and the fourth gives the usage context. Efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool (1 param, no output schema), the description covers the action, effects, and usage. No gaps for the complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single parameter (widget_id) described. Description does not add parameter-specific details beyond confirming the widget type. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it deletes a livechat widget permanently. Distinguishes from sibling tools like widgets_create, widgets_update, etc. Verb+resource is specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this when user wants to remove a chat widget.' Provides clear context for when to use, though no explicit exclusions or alternatives mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_get A
Read-only, Idempotent

Get full configuration of a single livechat widget.

Returns all settings including theme, identification, actions, and more.

Use this when user wants to see or verify a specific widget's settings.

Parameters
Name | Required | Description | Default
widget_id | Yes | ID of the widget to retrieve |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds value by detailing what is returned: 'Returns all settings including theme, identification, actions, and more.' No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences efficiently convey purpose, content, and usage guidance. Front-loaded with key information, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is simple with one parameter. Annotations are rich. Description adequately covers return content and usage context. Complete for the tool's scope.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the parameter. The tool description does not add additional parameter semantics beyond what is in the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'Get full configuration of a single livechat widget', clearly specifying verb and resource. It also distinguishes from sibling tools like widgets_list (list all) and widgets_update (modify).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes explicit usage guidance: 'Use this when user wants to see or verify a specific widget's settings.' Provides clear context for when to use, though does not mention when not to use or alternative tools explicitly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_get_embed_code A
Read-only, Idempotent

Get the embed code snippet for a livechat widget.

Returns HTML/JavaScript code to add to your website. The code should be placed before the closing </body> tag.

Use this when user wants to install the chat widget on their site.

Parameters
Name | Required | Description | Default
widget_id | Yes | ID of the widget to get embed code for |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and no destructiveness. The description adds value by explaining the output (HTML/JavaScript) and where to place it (before </body>). No contradictions.
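
As a rough illustration of that placement guidance, the sketch below inserts a snippet just before the last closing </body> tag of an HTML string. The helper name `insert_before_body_close` and the placeholder script URL are assumptions made for this example; the real snippet is whatever widgets_get_embed_code returns.

```python
import re

# Matches a closing body tag, case-insensitively, allowing stray whitespace.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def insert_before_body_close(html: str, snippet: str) -> str:
    """Return html with snippet placed just before the last </body> tag."""
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        # No closing body tag found: fall back to appending at the end.
        return html + "\n" + snippet
    idx = matches[-1].start()
    return html[:idx] + snippet + "\n" + html[idx:]


page = "<html><body><h1>Shop</h1></body></html>"
# Placeholder only; this is not the real embed code.
snippet = '<script src="https://example.invalid/widget.js"></script>'
result = insert_before_body_close(page, snippet)
print(result)
```

A regular expression is used here only to locate the tag; a full HTML parser would be the safer choice for pages with unusual markup.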

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each conveying essential information: purpose, return type, placement, usage context. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description explains the return value and installation guidance, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a description for widget_id. The description does not add extra parameter information, but baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it gets the embed code snippet for a livechat widget, specifying verb and resource. It distinguishes itself from sibling tools like widgets_get, widgets_list, etc., which have different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to install the chat widget on their site.' Provides a clear use case, though no explicit when-not-to-use or alternatives are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_list (A)
Read-only, Idempotent

List all livechat widgets.

Returns widgets with their configuration, embed code, and status.

Use this when user wants to see their widgets or chat widgets.

Parameters
Name | Required | Description | Default
active_only | No | Only return active widgets. OMIT to include inactive widgets too. |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description adds value by specifying the returned content (configuration, embed code, status). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus a usage note, all front-loaded and concise. Every sentence adds value with zero redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one optional parameter, no output schema, and clear annotations, the description is complete enough. It explains the return value and usage context, though it omits potential pagination or ordering—acceptable for a simple list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the only parameter (active_only). The tool description does not add additional parameter information beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists livechat widgets and specifies what is returned (configuration, embed code, status). It distinguishes from sibling tools like widgets_get (single widget) and widgets_create.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to see their widgets or chat widgets,' providing clear context. However, it does not mention alternatives or when not to use it, which is acceptable given the straightforward nature of the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

widgets_update (A)

Update an existing livechat widget configuration.

You can change name, theme, auto-reply mode, and other settings. Only provided fields will be updated.

Use this when user wants to modify their chat widget settings.

Parameters
Name | Required | Description | Default
name | No | New name for the widget |
position | No | Widget position on screen. OMIT to leave the position unchanged. |
is_active | No | Enable or disable the widget. OMIT to leave the active flag unchanged. |
widget_id | Yes | ID of the widget to update |
website_url | No | Website URL for product/site search integration |
calendly_url | No | Booking URL for calendar action (e.g., 'https://calendly.com/yourname') |
color_scheme | No | Widget color scheme. 'auto' follows the visitor's OS dark/light mode preference. OMIT to leave the color scheme unchanged. |
display_mode | No | Visual mode of the widget. Pick exactly one: - 'chat': full chat panel + voice mic; default for support / sales / general. - 'voice_only': mic-only bubble that launches a voice call directly; pick only when the user explicitly asks for a voice-only widget. - 'headless': no UI; customer drives via window.DialogBrain JS API; pick only when the user explicitly says 'embed in our own design'. OMIT to leave the display mode unchanged. |
header_title | No | Title shown in chat header |
greeting_text | No | Custom greeting message shown when visitor opens the chat (e.g., 'Hello! How can I help you today?') |
primary_color | No | Primary color for widget theme (hex, e.g., '#2563eb') |
voice_greeting | No | Spoken opening line when a visitor starts a voice call through this widget. Played via TTS before the AI model runs. Empty string disables the greeting. |
allowed_domains | No | List of allowed domains for the widget |
auto_reply_mode | No | Auto-reply mode: 'draft' or 'auto'. OMIT to leave the auto-reply mode unchanged. |
header_subtitle | No | Subtitle shown in chat header |
greeting_enabled | No | Enable or disable the proactive greeting. OMIT to leave this flag unchanged. |
greeting_behavior | No | notification = show badge after delay; auto_open = open widget automatically after delay; on_open = greet only when visitor manually opens. OMIT to leave the greeting behavior unchanged. |
enable_form_action | No | Enable or disable the contact form action button. OMIT to leave this flag unchanged. |
voice_button_label | No | Localized aria-label and hover tooltip for the voice-only mic bubble (only used when display_mode='voice_only'). ≤ 100 chars. Defaults to 'Talk to agent' if not set. |
contact_form_fields | No | Fields to collect in contact form (e.g., ['name', 'email', 'phone']) |
enable_search_action | No | Enable or disable the search action button. OMIT to leave this flag unchanged. |
show_visitor_history | No | Show full chat history to returning visitors. OMIT to leave this flag unchanged. |
identification_fields | No | Fields to require for visitor identification (e.g., ['name', 'email']) |
enable_calendar_action | No | Enable or disable the calendar booking action button. OMIT to leave this flag unchanged. |
greeting_delay_seconds | No | Delay in seconds before the proactive greeting appears (0–300). 0 = send immediately on page load. Default: 30. |
require_identification | No | Require visitor to identify before chatting. OMIT to leave the identification policy unchanged. |
returning_greeting_text | No | Greeting for returning visitors who already have chat history (e.g., 'Welcome back! How can I help you today?'). Falls back to greeting_text if not set. |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false, destructiveHint=false, and idempotentHint=false. The description adds important behavior: 'Only provided fields will be updated' (partial update semantics), which is not captured by annotations.
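
The partial-update behavior can be sketched on the client side as follows. This is a minimal illustration, not the server's implementation: `build_widget_update` is a hypothetical helper, the widget id value is made up, and using `None` to mean "omit this field" is a convention chosen for this example. The two range checks mirror limits stated in the parameter descriptions (0–300 seconds, at most 100 characters).

```python
from typing import Any


def build_widget_update(widget_id: Any, **changes: Any) -> dict:
    """Build widgets_update arguments that contain only the fields to change."""
    args = {"widget_id": widget_id}
    for field, value in changes.items():
        if value is None:
            # None means "leave unchanged", so the key is omitted entirely.
            continue
        # Empty strings are kept on purpose: per the parameter description,
        # voice_greeting="" disables the greeting rather than leaving it as is.
        args[field] = value

    delay = args.get("greeting_delay_seconds")
    if delay is not None and not 0 <= delay <= 300:
        raise ValueError("greeting_delay_seconds must be between 0 and 300")

    label = args.get("voice_button_label")
    if label is not None and len(label) > 100:
        raise ValueError("voice_button_label must be at most 100 characters")

    return args


args = build_widget_update(
    "widget-123",          # hypothetical id
    name="Support",
    voice_greeting="",     # explicitly disable the spoken greeting
    is_active=None,        # omitted, so the active flag stays unchanged
)
print(args)
```

Distinguishing "omitted" from "empty" matters here because several fields treat an empty value as a meaningful setting.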

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a short paragraph that quickly states the purpose and then lists changeable settings. It is front-loaded and minimal, though it could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (27 parameters, no output schema), the description provides the essential purpose and update behavior. However, it does not cover error cases or return values, leaving some gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter having a descriptive definition. The description adds a general note about partial updates but does not enhance individual parameter semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Update an existing livechat widget configuration' with specific verb and resource. It lists the types of settings (name, theme, auto-reply mode) and distinguishes from sibling tools like widgets_create and widgets_delete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when user wants to modify their chat widget settings,' providing clear context. It does not mention alternatives or when not to use, but the sibling suite makes the tool's role obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_current (A)
Read-only, Idempotent

Return the workspace this MCP API key is currently routed to, with the caller's role inside it. Use this to confirm context before/after workspace.switch.

Parameters
Name | Required | Description | Default

No parameters

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true, so the safety profile is clear. The description adds the return value details but does not disclose additional behavioral traits beyond those already documented.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the core action. Every word earns its place; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters, no output schema, and rich annotations, the description fully explains the tool's purpose and recommended usage context. It is complete for a simple read operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are no parameters, so the description need not add any parameter-level meaning. The empty schema is fully covered, and the description adds no extraneous information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns the current workspace and the caller's role, with a specific verb and resource. It distinguishes itself from sibling workspace tools by focusing on the currently routed workspace, and implicitly contrasts with workspace_list which would list all workspaces.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is given: 'Use this to confirm context before/after workspace.switch.' This tells the agent exactly when to invoke this tool, providing clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_list (A)
Read-only, Idempotent

List every workspace the caller is a member of, with is_current marking the workspace this MCP key is currently routed to. Pair with workspace.switch to change the active workspace without reconnecting.

Parameters
Name | Required | Description | Default

No parameters

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description adds context about the `is_current` field and the membership scope, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with purpose, followed by usage pairing. No superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters, no output schema, and annotations covering safety, the description fully explains the tool's behavior, output, and usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist; schema coverage is 100%. The description does not need to add parameter info. Baseline 4 for zero-param tools.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'List' with the resource 'workspaces', explicitly states the output includes `is_current` marking, and distinguishes from siblings like `workspace_current` and `workspace_switch`.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit pairing with `workspace.switch` to change active workspace, implying when to use this tool versus alternatives. Offers clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workspace_switch (A)

Re-point the active MCP API key to a different workspace. Pass exactly one of workspace_id or slug (find them via workspace.list). Takes effect on the very next tool call — no MCP reconnect, no new API key. Sequential checkpoint: do not parallelize tool calls across a switch — calls already in flight when the switch commits will run against the previous workspace.

Parameters
Name | Required | Description | Default
slug | No | Workspace slug to switch to. Resolved within the caller's memberships, so cross-tenant slug collisions are not possible. Mutually exclusive with `workspace_id`. |
workspace_id | No | Numeric workspace id to switch to. Mutually exclusive with `slug`. |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (no readOnly, no idempotent), but the description richly explains behavior: no reconnect needed, immediate effect on next call, sequential checkpoint requirement. Adds critical behavioral context beyond structured annotations.
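
A small sketch of how a caller might respect both constraints: the exactly-one-of rule for the arguments and the sequential checkpoint around the switch. `FakeClient` is a stand-in written for this example and does not reflect any real MCP client API; the workspace names "alpha" and "beta" are invented.

```python
import asyncio
from typing import Any, Optional


def switch_args(workspace_id: Optional[int] = None, slug: Optional[str] = None) -> dict:
    """Build workspace_switch arguments, enforcing exactly one identifier."""
    if (workspace_id is None) == (slug is None):
        raise ValueError("pass exactly one of workspace_id or slug")
    return {"workspace_id": workspace_id} if workspace_id is not None else {"slug": slug}


class FakeClient:
    """Hypothetical client that records which workspace each call ran against."""

    def __init__(self) -> None:
        self.active = "alpha"
        self.log: list = []

    async def call_tool(self, name: str, args: dict) -> None:
        await asyncio.sleep(0)  # simulate a network round trip
        if name == "workspace_switch":
            self.active = args.get("slug") or str(args["workspace_id"])
        self.log.append((name, self.active))


async def main() -> list:
    client = FakeClient()
    # Calls for the old workspace may run in parallel with each other...
    await asyncio.gather(
        client.call_tool("widgets_list", {}),
        client.call_tool("youtube_list_videos", {}),
    )
    # ...but the switch is awaited alone, after they have all finished.
    await client.call_tool("workspace_switch", switch_args(slug="beta"))
    # Only now are calls for the new workspace issued.
    await client.call_tool("widgets_list", {})
    return client.log


log = asyncio.run(main())
print(log)
```

The key point is that the switch is never placed inside the same `gather` as other tool calls.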

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences, each serving a distinct role: purpose, usage instruction, effect timing, and concurrency warning. No extraneous words, well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, usage, and concurrency fully. No output schema is needed. Missing minor details like error behavior if both parameters are provided, but overall complete for a state-switching tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage and already describes mutual exclusivity. The description adds value by stating 'pass exactly one' and referencing workspace.list for lookup, but the schema already carries the semantic load. Minor improvement over baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 're-point' and the resource 'active MCP API key' to a different workspace. Distinguishes from sibling tools like workspace.list and workspace.current by specifying that it changes the active workspace for subsequent calls.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to pass exactly one of workspace_id or slug, directs to workspace.list for discovery, explains the effect timing (next tool call), and warns against parallelizing tool calls across the switch. Provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_comment (A)
Destructive, Idempotent

Permanently delete a YouTube comment by id (or 'youtube:comment:<id>'). Cannot be undone. Costs 50 quota units.

Parameters
Name | Required | Description | Default
comment_id | Yes | Bare commentId OR 'youtube:comment:<id>'. |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds the permanent nature and quota cost beyond the annotations (destructiveHint, idempotentHint). No contradiction with annotations.
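
Since the tool accepts either a bare id or the prefixed 'youtube:comment:<id>' form, a caller that wants to compare or deduplicate ids may find it useful to reduce both to one shape first. The helper below is a hypothetical client-side sketch; how the server itself parses the reference is not stated in the source.

```python
def bare_youtube_id(value: str, kind: str) -> str:
    """Strip an optional 'youtube:<kind>:' prefix and return the bare id."""
    prefix = f"youtube:{kind}:"
    bare = value[len(prefix):] if value.startswith(prefix) else value
    if not bare:
        raise ValueError("id is empty after removing the prefix")
    return bare


# Both spellings of the same (made-up) comment id reduce to one value.
refs = ["youtube:comment:abc123", "abc123"]
unique_ids = {bare_youtube_id(ref, "comment") for ref in refs}
print(unique_ids)
```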

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (three short sentences) and front-loaded with the essential action. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with one parameter and no output schema, the description covers all necessary aspects: purpose, id format, irreversibility, and cost.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter, and its description in the schema already matches the tool description. The tool description adds no new meaning beyond the schema's description, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (delete), the resource (YouTube comment), and the id format. It distinguishes from siblings like youtube_moderate_comment or youtube_post_comment_reply by specifying permanent deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context by noting that deletion is permanent and costs quota, but does not explicitly mention when not to use it or suggest alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_delete_video (A)
Destructive, Idempotent

Permanently delete a YouTube video by id (or 'youtube:video:<id>'). Cannot be undone. Costs 50 quota units. Caller must own the channel.

Parameters
Name | Required | Description | Default
video_id | Yes | Bare videoId OR 'youtube:video:<id>'. |
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds beyond annotations: it states the action is permanent ('Cannot be undone'), costs 50 quota units, and requires channel ownership. Annotations already mark destructiveHint=true, and the description reinforces this with concrete details.
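
The stated cost of 50 quota units per delete lends itself to a quick budget check before a batch of destructive calls. The sketch below is plain arithmetic; the 10,000-unit remaining budget is a hypothetical figure chosen for the example, not something the source states.

```python
DELETE_COST_UNITS = 50  # per the tool description


def max_deletes(remaining_units: int, cost: int = DELETE_COST_UNITS) -> int:
    """How many delete calls fit into the remaining quota."""
    if remaining_units < 0:
        raise ValueError("remaining_units must not be negative")
    return remaining_units // cost


def units_needed(call_count: int, cost: int = DELETE_COST_UNITS) -> int:
    """Quota units consumed by call_count delete calls."""
    return call_count * cost


remaining = 10_000  # hypothetical remaining budget
print(max_deletes(remaining))  # 200
print(units_needed(3))         # 150
```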

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: four short sentences that front-load the core action and quickly cover critical notes (permanence, cost, ownership). No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter, strong annotations, and no output schema, the description is complete. It covers the action, input format, behavioral impact, cost, and authorization.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers 100% of parameters, providing a clear description for video_id. The description merely restates the format without adding new meaning or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'delete' and the resource 'YouTube video', and specifies the input format as 'by id (or 'youtube:video:<id>')'. It distinguishes from sibling tools like youtube_update_video and youtube_upload_video by its destructive nature and permanence.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: it is for deleting a video permanently. It states that the caller must own the channel, but does not compare the tool with alternatives such as 'unlist' or 'update'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_comments (A)
Read-only, Idempotent

List comment threads on a YouTube video. Pass video_id (e.g. 'dQw4w9WgXcQ') or channel_ref ('youtube:video:<id>'). Returns top-level comments with inline replies.

Parameters
Name | Required | Description | Default
video_id | Yes | YouTube videoId: bare 11-char form OR full 'youtube:video:<id>'. |
page_token | No | Pagination cursor from a previous call's `next_page_token`. |
max_results | No | Page size, 1-100. Default 25. |
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true. Description adds that it returns top-level comments with inline replies, but does not disclose additional behavioral traits like pagination limits or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words. Front-loads purpose and includes an example. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is simple with no output schema. Description covers input and return type (top-level comments with inline replies). Could be more detailed on response fields, but sufficient for a read-only list tool with good schema and annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with all parameters described. Description adds useful clarification that video_id can be bare or with 'youtube:video:' prefix, which is not in schema. This adds value beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'list comment threads on a YouTube video' with specific resource (video) and verb (list). Distinguishes from sibling write tools like youtube_post_comment_reply and youtube_delete_comment by context, though not explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs alternatives, such as when to read vs moderate or delete comments. Context implies usage for reading, but lacks formal when-not statements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_list_videos (A)
Read-only, Idempotent

List videos on the connected YouTube channel. Returns id, title, published_at, view_count. Paginate via page_token.

Parameters
Name | Required | Description | Default
page_token | No | Pagination cursor returned in a previous call's `next_page_token`. Omit for the first page. |
max_results | No | Page size, 1-50. Default 25. |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. Description adds pagination behavior and lists returned fields, which is useful beyond annotations. No contradictions.
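
The pagination behavior can be sketched as a loop that feeds each response's `next_page_token` back in as `page_token`. Only those two names and the 1-50 page size come from the schema; the `call_tool` callable, the `items` response key, and the fake pages are assumptions made so the example runs on its own.

```python
from typing import Callable, Iterator, Optional


def iter_videos(call_tool: Callable[[str, dict], dict], page_size: int = 25) -> Iterator[dict]:
    """Yield every video by following next_page_token until it runs out."""
    if not 1 <= page_size <= 50:
        raise ValueError("max_results must be between 1 and 50")
    token: Optional[str] = None
    while True:
        args = {"max_results": page_size}
        if token is not None:
            # page_token is omitted on the first call, as the schema advises.
            args["page_token"] = token
        page = call_tool("youtube_list_videos", args)
        yield from page.get("items", [])  # "items" is an assumed response key
        token = page.get("next_page_token")
        if not token:
            break


# Fake two-page backend keyed by the incoming page_token (None = first page).
_PAGES = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "next_page_token": "p2"},
    "p2": {"items": [{"id": "c"}]},
}


def fake_call_tool(name: str, args: dict) -> dict:
    return _PAGES[args.get("page_token")]


ids = [video["id"] for video in iter_videos(fake_call_tool)]
print(ids)
```

Writing the loop as a generator lets a caller stop early without fetching the remaining pages.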

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, front-loaded with purpose and key return fields, no redundancy. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with two optional params and no output schema. Description lists return fields and pagination mechanism, which is sufficient given complexity and annotation richness. Could mention default max_results but schema already covers defaults.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both params described. Description mentions pagination via page_token but does not add new semantic detail beyond the schema. Baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States the specific verb 'list' and the resource 'videos on the connected YouTube channel'. Clearly distinguishes itself from sibling tools like youtube_delete_video or youtube_list_comments by focusing on listing videos and returning specific fields.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance. However, sibling context clarifies this is the sole tool for listing videos, so implicit usage is clear. Lacks exclusions or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_moderate_comment A

Apply a moderation status to a YouTube comment. Allowed status values: heldForReview, published, rejected, spam. Costs 50 quota units.
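A minimal sketch of client-side argument checking for this tool, assuming the caller wants to catch a bad status before spending quota. The helper names are hypothetical. The tool itself accepts either ID form, so the prefix stripping here only keeps logs and de-duplication keys consistent.

```python
ALLOWED_STATUSES = {"heldForReview", "published", "rejected", "spam"}
COMMENT_PREFIX = "youtube:comment:"


def bare_comment_id(comment_id: str) -> str:
    """Strip the optional 'youtube:comment:' prefix and reject empty IDs."""
    if comment_id.startswith(COMMENT_PREFIX):
        comment_id = comment_id[len(COMMENT_PREFIX):]
    if not comment_id:
        raise ValueError("comment_id is empty")
    return comment_id


def moderation_args(comment_id: str, status: str) -> dict:
    """Build the argument dict for youtube_moderate_comment."""
    if status not in ALLOWED_STATUSES:
        # Status values are case-sensitive as listed in the description.
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    return {"comment_id": bare_comment_id(comment_id), "status": status}
```

For example, `moderation_args("youtube:comment:abc123", "spam")` produces arguments with the bare ID `abc123`.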

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| status | Yes | One of: heldForReview, published, rejected, spam. | |
| comment_id | Yes | Bare commentId OR 'youtube:comment:<id>'. | |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses quota cost (50 units), which is helpful. Annotations show readOnlyHint=false, destructiveHint=false, and description doesn't contradict. No further behavioral details provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, front-loaded with purpose, then allowed values and cost. Every sentence adds value; no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-param action, description covers purpose, allowed values, and cost. No output schema exists; response format not described, but acceptable given simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema already describes both parameters fully (100% coverage). Description repeats allowed status values, adding no new semantic information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Apply a moderation status to a YouTube comment', with specific verb and resource. Distinguishes from sibling tools like youtube_delete_comment or youtube_list_comments by focusing on status change.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance. Context implies it's for moderation but doesn't contrast with alternatives like deleting or listing comments.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_post_comment_reply A

Post a comment on a YouTube video, or reply to an existing comment. Pass video_id for a top-level comment, OR parent_comment_id to reply. AI-disclosure suffix appended automatically when configured.
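The either/or rule above can be enforced before the call is made. The sketch below assumes that passing both IDs (or neither) is an error; the description only implies this, and the server's actual behavior is not documented here. The function name is hypothetical. Whether the auto-appended AI-disclosure suffix counts toward the 10000-character limit is also unknown, so the check covers the caller's text only.

```python
from typing import Dict, Optional

MAX_TEXT_CHARS = 10_000  # upper bound from the `text` parameter description


def comment_args(
    text: str,
    video_id: Optional[str] = None,
    parent_comment_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build arguments for youtube_post_comment_reply with exactly one target."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError("text must be 1-10000 characters")
    # Assumed rule: exactly one of the two IDs must be supplied.
    if (video_id is None) == (parent_comment_id is None):
        raise ValueError("pass exactly one of video_id or parent_comment_id")
    args = {"text": text}
    if video_id is not None:
        args["video_id"] = video_id  # top-level comment
    else:
        args["parent_comment_id"] = parent_comment_id  # reply
    return args
```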

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| text | Yes | Comment body. 1-10000 chars. AI-disclosure suffix may be auto-appended. | |
| video_id | No | Bare videoId or 'youtube:video:<id>', for a top-level comment. | |
| parent_comment_id | No | Bare commentId or 'youtube:comment:<id>', for a reply. | |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-destructive write operation. The description adds value by noting the auto-appended AI-disclosure suffix, which is a behavioral detail beyond annotations. It doesn't cover permissions or rate limits, but given annotations, it's adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three tightly packed sentences with no filler. The purpose is front-loaded, and every phrase earns its place (e.g., 'AI-disclosure suffix appended automatically when configured').

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple post action with 3 well-documented parameters and no output schema, the description covers essential usage. It could mention immediate vs delayed posting or confirmations, but overall it's sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already described. The description adds meaning by explaining the conditional use of video_id versus parent_comment_id, which goes beyond the schema. The character limit and the auto-appended suffix are already covered by the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('post a comment or reply'), the resource ('YouTube video/comment'), and distinguishes the two use cases with parameter guidance. It also mentions the AI-disclosure suffix, setting it apart from siblings like youtube_list_comments.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on using video_id for top-level comments and parent_comment_id for replies, implying exclusivity. It doesn't explicitly say not to pass both, nor does it compare itself with the moderation tools, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_update_video A

Update title, description, privacy, or tags on a YouTube video. Costs 1600 quota units. Only fields provided are changed.
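Because only the fields provided are changed, a caller can build the argument dict by dropping everything left unset. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the limits (100-character title, 5000-character description, three privacy values) are copied from the parameter descriptions rather than verified against the server.

```python
from typing import Any, Dict, List, Optional

PRIVACY_VALUES = {"private", "unlisted", "public"}


def update_args(
    video_id: str,
    title: Optional[str] = None,
    description: Optional[str] = None,
    privacy: Optional[str] = None,
    tags: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a partial-update payload: None means 'keep the current value'."""
    if title is not None and len(title) > 100:
        raise ValueError("title is limited to 100 characters")
    if description is not None and len(description) > 5000:
        raise ValueError("description is limited to 5000 characters")
    if privacy is not None and privacy not in PRIVACY_VALUES:
        raise ValueError(f"privacy must be one of {sorted(PRIVACY_VALUES)}")
    candidate = {
        "title": title,
        "description": description,
        "privacy": privacy,
        "tags": tags,
    }
    # Omit unset fields so the server leaves them untouched.
    changed = {key: value for key, value in candidate.items() if value is not None}
    if not changed:
        raise ValueError("nothing to update; the call would only spend quota")
    return {"video_id": video_id, **changed}
```

Note that an empty list for `tags` is passed through as a provided value; whether the server treats that as "clear all tags" is not stated on this page.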

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| tags | No | New tags list. Omit to keep current. | |
| title | No | New title (max 100 chars). Omit to keep current. | |
| privacy | No | 'private', 'unlisted', or 'public'. Omit to keep current. | |
| video_id | Yes | Bare videoId OR 'youtube:video:<id>'. | |
| description | No | New description (max 5000 chars). Omit to keep current. | |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-read-only, non-destructive operation. The description adds important behavioral details: the cost of 1600 quota units and the partial update semantics. This goes beyond annotations to help the agent understand resource consumption and the update model.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, all valuable. The first identifies the action and scope; the second and third add cost and update behavior. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the core action, fields, cost, and partial update. It does not specify the return value (common for updates), but given the simplicity of the tool and lack of output schema, it is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described. The description reinforces that omitted fields remain unchanged, which adds marginal value beyond the schema. It does not introduce new semantics but confirms behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Update' and the resource 'YouTube video', listing specific fields (title, description, privacy, tags) that can be modified. It distinguishes itself from sibling tools like youtube_delete_video and youtube_upload_video by specifying the update action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage guideline: 'Only fields provided are changed,' indicating partial update behavior. It does not explicitly state when not to use this tool or mention alternatives like youtube_delete_video for removal, but the context of sibling tools helps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_upload_video A

Upload a workspace-owned video file (file_id) to the connected YouTube channel. Returns video_id + thread_id. Costs 1600 quota units. Default privacy is 'private' — pass privacy='public' to publish.
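Since several of these tools state a quota cost, an agent may want to total the cost of a planned batch before starting. The arithmetic below uses only the per-call costs quoted in the tool descriptions on this page (50, 1600, and 1600 units). The helper names are hypothetical, and the budget is a caller-supplied number, because this page does not state the quota available to a workspace.

```python
from typing import Dict

# Per-call costs as quoted in the tool descriptions on this page.
QUOTA_COSTS: Dict[str, int] = {
    "youtube_moderate_comment": 50,
    "youtube_update_video": 1600,
    "youtube_upload_video": 1600,
}


def plan_cost(calls: Dict[str, int]) -> int:
    """Total quota units for a plan of {tool_name: number_of_calls}."""
    return sum(QUOTA_COSTS[tool] * count for tool, count in calls.items())


def max_calls(tool: str, budget: int) -> int:
    """How many calls to one tool fit in a given budget."""
    return budget // QUOTA_COSTS[tool]


def fits(calls: Dict[str, int], budget: int) -> bool:
    """True when the whole plan stays within the budget."""
    return plan_cost(calls) <= budget
```

For instance, two uploads plus ten moderation calls come to 2 × 1600 + 10 × 50 = 3700 units.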

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| tags | No | Optional list of tag strings (max ~500 chars total). | |
| title | Yes | Video title (max 100 chars). | |
| file_id | Yes | Workspace `files.id` of the video to upload. Must be a video/* MIME type and `status='ready'`. Get IDs from the [ATTACHMENTS] block, files.search, or search.files. | |
| privacy | No | Privacy status. 'private' (default), 'unlisted', or 'public'. | private |
| category_id | No | YouTube category ID (default '22' = People & Blogs). See https://developers.google.com/youtube/v3/docs/videoCategories/list. | 22 |
| description | No | Video description (max 5000 chars). OMIT to upload without a description. | |
| made_for_kids | No | COPPA flag. OMIT for the standard (non-kids) default. | |
| channel_account_id | No | The connected YouTube channel_account.id. OMIT to auto-resolve the workspace's YouTube account. | |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds disclosure of quota cost (1600 units) and default privacy behavior, which goes beyond annotations. However, it does not mention potential delays or failure modes. Annotations already indicate it is not read-only or idempotent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, front-loading the main action and key details in a few sentences with no filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 8 parameters and no output schema, the description covers the core purpose, returns, quota, and a key default. It could include notes on upload time or error handling, but is still quite complete for an upload tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage, so the description's additional context on file_id and privacy is helpful but minimal. It does not elaborate on other parameters like tags, description, or category_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action 'upload', the resource 'workspace-owned video file to the connected YouTube channel', and mentions return values (video_id + thread_id). It is distinct from sibling tools like youtube_list_videos or youtube_delete_video.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides context on when to use (uploading a video) and mentions default privacy and quota cost. However, it does not explicitly contrast with alternatives like youtube_update_video for modifications.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

youtube_video_query A
Read-only, Idempotent

Ask Gemini about a YouTube video. Pass a video URL and any prompt — verbatim transcript with timestamps, summary, targeted Q&A about content or visuals, translation, etc. Works on any public/unlisted video.
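A caller might pre-check the URL against the forms this tool lists as supported (youtube.com/watch?v=…, youtu.be/…, youtube.com/shorts/…, m.youtube.com/watch?v=…) before sending it. The sketch below is a rough client-side filter under two assumptions not stated on this page: that a leading `www.` is acceptable, and that a URL without a scheme should be treated as https. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def is_supported_youtube_url(url: str) -> bool:
    """Return True when the URL matches one of the listed YouTube forms."""
    # Assumption: scheme-less input such as "youtu.be/abc" is treated as https.
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www.youtube.com behaves like youtube.com
    path = parsed.path
    if host in ("youtube.com", "m.youtube.com") and path == "/watch":
        # The watch form needs a non-empty v= query parameter.
        return bool(parse_qs(parsed.query).get("v"))
    if host == "youtube.com" and path.startswith("/shorts/"):
        return len(path) > len("/shorts/")
    if host == "youtu.be":
        return len(path) > 1
    return False
```

This only checks the URL's shape. It cannot tell whether the video is public or unlisted, which the tool also requires.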

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | YouTube video URL. Supported forms: youtube.com/watch?v=…, youtu.be/…, youtube.com/shorts/…, m.youtube.com/watch?v=…. Pass-through to Gemini verbatim. | |
| prompt | Yes | What to ask Gemini about the video. Examples: 'Provide a verbatim transcript with [HH:MM:SS] timestamps.' / 'What is the main claim made in the first 30 seconds?' / 'Describe what's shown on screen at 0:30.' / 'Translate the spoken Spanish to English.' | |

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds that the tool works on any public/unlisted video and passes the URL and prompt to Gemini, providing context not in annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences front-loaded with purpose, no filler. Every sentence earns its place, covering what, how, and scope.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple query tool with no output schema, the description adequately explains input requirements and capabilities. It could mention that the response is Gemini's answer, but this is implied by 'Ask Gemini'. The tool has only two parameters and clear behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides 100% coverage, including the supported URL forms and sample prompts. The description adds value by naming the kinds of request the prompt can carry (verbatim transcript with timestamps, summary, targeted Q&A, translation) and by stating that the video must be public or unlisted.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Ask Gemini about a YouTube video' with a specific verb and resource. It lists diverse use cases (transcript, summary, Q&A, translation) and specifies it works on public/unlisted videos, distinguishing it from sibling youtube_* tools that modify or list videos.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use: pass a video URL and any prompt. It lists example prompts and mentions compatibility with public and unlisted videos. It could explicitly state not to use for video management tasks (covered by siblings like youtube_delete_video), but the context makes it clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


Resources