Roboflow (Official)
Server Details
Roboflow computer vision for AI agents: datasets, annotation, versioning, workflows, inference.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.1/5 across 46 of 46 tools scored. Lowest: 3.4/5.
Most tools have distinct purposes, but there are a few potentially confusing pairs like trainings_cancel vs. trainings_stop and workflow_specs_run vs. workflows_run, which could cause misselection by an agent.
Names follow a general verb_noun pattern but with inconsistent ordering: some are domain_first (annotation_batches_get) and others action_first (versions_export). This mix may reduce predictability.
With 46 tools, the set is large but covers many subdomains of a computer vision platform. It's on the heavy side but not excessive given the scope.
The toolset covers CRUD operations for projects, models, devices, and workflows, plus additional features like training and inference. Minor gaps exist (e.g., annotation deletion) but core workflows are well-supported.
Available Tools
67 tools
agent_chat (A)
Chat with the Roboflow AI agent.
Use this tool for:
- Roboflow Q&A: the agent has the full Roboflow documentation indexed (SDKs, REST API, deployment options, training, batch processing, Universe, blocks, pricing, etc.). Ask it anything about how Roboflow works.
- Advanced workflow building: workflows complex enough that direct block composition via workflow_blocks_* is impractical. The agent knows every block and connection pattern.
- Solution planning: pass mode="plan" and the user's problem; the agent uses a stronger planning model to scope a CV solution end-to-end before any building happens.
For straightforward workflows you can construct yourself, the direct workflow_* tools are fine; you don't have to route every workflow through the agent.
Conversation flow
The agent runs a multi-step conversation. It may ask
clarifying questions, recommend a model, or (in plan mode)
produce a plan for confirmation. Pass the returned
conversation_id back on follow-up calls to keep context.
Use agent_conversations_list and agent_conversation_get
to find and resume past conversations.
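A minimal sketch of threading conversation_id through follow-up calls. The `call_tool(name, arguments)` callable is a hypothetical stand-in for whatever MCP client you use, and the `AgentConversation` class is illustrative only. The tool name, the `message`, `mode`, and `conversation_id` parameters, and the `conversation_id` response field are the only parts taken from this page.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class AgentConversation:
    """Keeps the conversation_id returned by agent_chat and resends it."""

    def __init__(self, call_tool: ToolCaller, mode: Optional[str] = None) -> None:
        self._call_tool = call_tool
        self._mode = mode  # "plan" or "agent"; None leaves the server default
        self.conversation_id: Optional[str] = None

    def send(self, message: str) -> Dict[str, Any]:
        arguments: Dict[str, Any] = {"message": message}
        if self._mode is not None:
            arguments["mode"] = self._mode
        if self.conversation_id is not None:
            # Omitting conversation_id would start a new conversation.
            arguments["conversation_id"] = self.conversation_id
        response = self._call_tool("agent_chat", arguments)
        # Remember the id so the next send() continues the same thread.
        self.conversation_id = response.get("conversation_id", self.conversation_id)
        return response
```

To resume an older conversation, set `conversation_id` on the object to an id obtained from agent_conversations_list before the first `send()`.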
CRITICAL: the agent NEVER publishes workflows
Every workflow the agent creates or edits is saved as a draft. The published version, which is what callers running the workflow by id reach, stays unchanged until you explicitly publish it.
To make agent edits live, call agent_workflow_publish with
the workflow url returned in the chat response.
Running an agent-built workflow
Two options:
- Run the draft directly without publishing: pass the specification returned in the chat response to workflow_specs_run. Best for testing the draft, or for one-off runs where you don't want to disturb the currently-published version.
- Publish, then run by id: call agent_workflow_publish(workflow_url=...) then workflows_run(workflow_id=..., images=...). Use this when you want the change to go live for everyone using the workflow by id.
Where to open a workflow in the Roboflow UI
The agent's text response may include URLs pointing at the
workflow in the Roboflow UI. Ignore those URLs — the agent
sometimes picks the wrong host or path. Each workflow in the
workflows array has an app_url field with the correct,
environment-aware URL (built from the current APP_URL plus
/{workspace}/solutions/chat?workflowUrl=...) — show that
one to the user instead.
Response shape
- text: the agent's reply.
- workflows: workflows created or edited in this turn, each with id, name, url (slug), app_url (clickable Roboflow UI URL; use this), and specification (the full draft JSON; pass it to workflow_specs_run to execute without publishing).
- conversation_id: pass back to continue the conversation.
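The response fields above can be consumed roughly as follows. This is a sketch under stated assumptions: the helper names `workflow_links` and `next_call` are hypothetical, and the argument name `specification` for workflow_specs_run is a guess, since this page does not list that tool's parameters (it may also require images).

```python
from typing import Any, Dict, List, Tuple


def workflow_links(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (name, app_url) pairs for the workflows touched in this turn.

    URLs embedded in response["text"] are deliberately ignored, since the
    description says they may point at the wrong host or path.
    """
    return [(wf["name"], wf["app_url"]) for wf in response.get("workflows", [])]


def next_call(workflow: Dict[str, Any], publish: bool) -> Tuple[str, Dict[str, Any]]:
    """Pick the tool call that follows an agent edit.

    publish=False runs the draft specification without touching the
    published version; publish=True promotes the draft first.
    """
    if publish:
        return "agent_workflow_publish", {"workflow_url": workflow["url"]}
    # Assumption: the draft JSON is passed under the key "specification".
    return "workflow_specs_run", {"specification": workflow["specification"]}
```

Keeping the decision in one small function makes it harder to publish by accident when the intent was only to test a draft.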
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | Use 'plan' to plan a CV solution end-to-end before building (uses a stronger planning model). Use 'agent' (default) for Roboflow Q&A and advanced workflow construction. | |
| message | Yes | Message for the agent (a Roboflow question, a description of an advanced workflow to build, or a request to plan a CV solution). | |
| conversation_id | No | Conversation ID to continue a previous agent chat. Omit to start a new conversation. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully covers behavior: multi-step conversation, draft-only saving, never auto-publishes, clarifying questions, incorrect URL warning, and response shape.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Long but well-structured with sections, bullet points, and bold text; every sentence is justified. Slightly verbose but all information is relevant.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all important aspects: conversation flow, publishing, running workflows, response shape, and URL discrepancies. Despite complexity, no gaps remain.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and description adds meaningful context to each parameter (mode options, message purpose, conversation_id usage), enhancing beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool is for chatting with the Roboflow AI agent, lists specific use cases (Q&A, complex workflows, planning), and distinguishes from sibling workflow_* tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly describes when to use this tool (e.g., advanced workflows, planning) and when not to (straightforward workflows using direct tools). Also provides flow details, critical publishing notes, and URL handling.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_conversation_get (A)
Get a single agent conversation with its full message history.
Returns the conversation metadata and the ordered list of
messages (role + text parts). Pass the same conversation_id
back to agent_chat to continue the conversation.
| Name | Required | Description | Default |
|---|---|---|---|
| conversation_id | Yes | Conversation ID returned by agent_chat or agent_conversations_list. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It states it returns metadata and ordered messages, but does not explicitly confirm it's read-only or mention any side effects. For a get operation, this is acceptable but lacks full transparency.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with purpose, no redundant words. Every sentence contributes meaning without being verbose.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one parameter) and the presence of an output schema, the description covers purpose, usage, and parameter meaning adequately. Minor lack of error condition details, but overall sufficient.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds value by explaining the usage of conversation_id in relation to agent_chat, beyond the schema's description. This helps the agent understand the workflow.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Get a single agent conversation with its full message history', specifying verb and resource. It distinguishes from siblings like agent_chat and agent_conversations_list by indicating it returns the full history, not a list or continuation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly instructs to pass the conversation_id back to agent_chat to continue the conversation, providing clear context on when to use this tool (retrieve history) and how it relates to another tool.
agent_conversations_list (A)
List Roboflow agent conversations in the workspace.
Returns conversation metadata (id, name, source, timestamps,
linked workflow ids). Use agent_conversation_get to load
the full message history for a single conversation, or pass
the conversation id back to agent_chat to resume it.
| Name | Required | Description | Default |
|---|---|---|---|
| source | No | Filter by origin: 'api' for conversations started via the MCP/API, 'web' for the Roboflow UI. Omit for all. | |
| workflow_url | No | Filter to conversations linked to this workflow URL slug. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden for behavioral disclosure. It states it returns metadata but does not mention pagination, limits, sorting, or side effects, leaving some gaps.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, each earning its place: purpose, return type, and sibling guidance. Front-loaded and no wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple listing tool with two optional filters, the description is adequate. It explains what is returned and references siblings. Lacks pagination or rate limit details but still sufficient for basic usage.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds no parameter-specific information beyond what the schema already provides (source and workflow_url filters).
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists agent conversations in the workspace and specifies the returned metadata fields (id, name, source, timestamps, linked workflow ids). It also distinguishes itself from siblings by referencing agent_conversation_get for full history and agent_chat for resuming conversations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly guides usage by directing to siblings for detailed history or resuming conversations. However, it lacks explicit when-to-use versus alternatives statements or prerequisites.
agent_workflow_publish (A)
Publish the latest agent-edited draft of a workflow.
The agent never publishes on its own — every workflow it
creates or edits is saved as a draft. This tool promotes the
latest draft to a published version so it goes live for
callers using the workflow by id (workflows_run).
Errors with 400 if there is no draft to publish (i.e. the published version already matches the latest draft).
Returns { workflowId, workflowUrl, versionId, status }.
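A minimal sketch of treating the "no draft" case as a non-error. The `ToolHTTPError` class and the `call_tool` callable are hypothetical; how your MCP client surfaces a 400 may differ, and a 400 could in principle signal other bad requests, so inspect the error message before relying on this.

```python
from typing import Any, Callable, Dict, Optional


class ToolHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed tool call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"tool call failed with status {status}")
        self.status = status


def publish_latest_draft(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    workflow_url: str,
) -> Optional[Dict[str, Any]]:
    """Publish the latest draft; return None when there is nothing to publish."""
    try:
        return call_tool("agent_workflow_publish", {"workflow_url": workflow_url})
    except ToolHTTPError as exc:
        if exc.status == 400:
            # Per the description: no draft, published version is already current.
            return None
        raise
```

On success the returned dict is expected to carry workflowId, workflowUrl, versionId, and status, as listed above.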
| Name | Required | Description | Default |
|---|---|---|---|
| workflow_url | Yes | URL slug of the workflow to publish (the 'url' field returned by agent_chat). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It explains the promotion behavior, error condition (400 if no draft), and the return object. It could mention that publishing is a mutation but it's implied by 'goes live'.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with a clear front-loaded purpose, followed by context and error conditions. Every sentence adds value.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema provided, the description fully specifies the expected return ({ workflowId, workflowUrl, versionId, status }) and mentions the error case. It is complete for this tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and the parameter description is already clear ('URL slug...'). The description adds context by linking the parameter to agent_chat output, enhancing usability.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Publish the latest agent-edited draft of a workflow' with specific verb 'publish' and resource 'draft'. It distinguishes from siblings by explaining that the agent never publishes on its own.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says when to use (when there is a draft to promote) and when not to use (errors if no draft). It also clarifies the outcome: 'goes live for callers using the workflow by id'.
annotation_batches_get (A, Read-only, Idempotent)
Get details about a specific batch.
Returns batch details including image count and status.
| Name | Required | Description | Default |
|---|---|---|---|
| batch_id | Yes | Batch ID | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (read-only, open-world, idempotent, non-destructive). The description adds value by specifying what details are returned (image count and status), which goes beyond annotations. It doesn't contradict annotations (which correctly indicate a safe read operation) and provides useful context about the return content.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with just two sentences. The first sentence states the core purpose, and the second specifies what details are returned. Every word earns its place with zero redundancy or unnecessary elaboration.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 required parameters), comprehensive annotations, and existence of an output schema, the description is reasonably complete. It specifies what details are returned (image count and status), which is helpful since the output schema isn't visible here. However, it could mention that this is for annotation batches specifically (implied by tool name but not stated).
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters clearly documented in the schema. The description doesn't add any parameter-specific information beyond what the schema provides (batch_id and project_id). However, it doesn't need to compensate since schema coverage is complete, so baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get details about a specific batch' with specific resources (batch details including image count and status). It distinguishes from sibling 'annotation_batches_list' by focusing on a single batch rather than listing multiple. However, it doesn't explicitly contrast with other batch-related tools that might exist.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying 'a specific batch' and mentioning what details are returned. It differentiates from 'annotation_batches_list' by focusing on individual batch retrieval rather than listing. However, it lacks explicit guidance on when to use this versus alternatives like 'projects_get' or 'versions_get' for related metadata, or prerequisites for accessing batch details.
annotation_batches_list (B, Read-only, Idempotent)
List upload batches in a project.
Returns a list of batches with id, name, image count, and upload info.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare this as read-only, non-destructive, idempotent, and open-world, so the agent knows it's a safe, repeatable query. The description adds useful context about what information is returned (id, name, image count, upload info), which isn't covered by annotations. However, it doesn't mention pagination, rate limits, or authentication needs beyond what annotations imply.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences: the first states the action and scope, the second specifies the return data. Every word serves a purpose, and key information is front-loaded. There's no redundancy or unnecessary elaboration.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one parameter, read-only operation), rich annotations, and the presence of an output schema (which handles return value documentation), the description is reasonably complete. It covers the core purpose and output structure. The main gap is lack of usage guidance relative to sibling tools, but overall it provides sufficient context for basic use.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents the single required parameter 'project_id'. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., it doesn't clarify format variations or constraints). This meets the baseline of 3 when the schema handles parameter documentation effectively.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('upload batches in a project'), making the purpose immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'projects_list' or 'workflows_list', which also list resources within projects, leaving some ambiguity about when this specific listing tool is appropriate.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'annotation_batches_get' (which might retrieve a single batch) or explain why one would list batches instead of using other listing tools. There's no context about prerequisites, timing, or exclusions.
annotation_jobs_create (A)
Create an annotation job to assign a batch of images to a labeler.
Returns the created job details including id, name, and status.
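A minimal sketch of assembling the six required arguments before calling the tool. The function name `annotation_job_arguments` is hypothetical, and the local checks (positive image count, an "@" in each email) are assumptions made for illustration, not documented server-side rules.

```python
from typing import Any, Dict


def annotation_job_arguments(
    project_id: str,
    batch_id: str,
    name: str,
    num_images: int,
    labeler_email: str,
    reviewer_email: str,
) -> Dict[str, Any]:
    """Assemble the six required arguments for annotation_jobs_create."""
    # Local sanity checks only; these are assumptions, not documented server rules.
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    for field, value in (
        ("labeler_email", labeler_email),
        ("reviewer_email", reviewer_email),
    ):
        if "@" not in value:
            raise ValueError(f"{field} does not look like an email: {value!r}")
    return {
        "name": name,
        "batch_id": batch_id,
        "num_images": num_images,
        "project_id": project_id,
        "labeler_email": labeler_email,
        "reviewer_email": reviewer_email,
    }
```

The batch_id would typically come from annotation_batches_list, whose output includes each batch's id and image count.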
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Job name | |
| batch_id | Yes | Source batch ID containing images to annotate | |
| num_images | Yes | Number of images to include in the job | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| labeler_email | Yes | Email of the workspace member who will label | |
| reviewer_email | Yes | Email of the reviewer | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-readOnly, non-destructive, non-idempotent, open-world operation. The description adds value by specifying it's for batch image assignment and returns job details, but doesn't disclose additional behavioral traits like rate limits, auth needs, or side effects beyond what annotations cover.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence and adds return details in the second, with zero wasted words. It's appropriately sized and structured for clarity without redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (6 required parameters), rich annotations, and presence of an output schema, the description is mostly complete. It covers creation purpose and return values, but could improve by addressing usage context or behavioral nuances, though the output schema reduces need for return value details.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are fully documented in the schema. The description doesn't add meaning beyond the schema, such as explaining relationships between parameters (e.g., batch_id must exist). Baseline 3 is appropriate as the schema handles parameter documentation.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Create an annotation job') and resource ('batch of images to a labeler'), distinguishing it from sibling tools like annotation_batches_get or models_train. It precisely defines the tool's function without being vague or tautological.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as workflows_create or projects_create, nor does it mention prerequisites like needing an existing batch or project. It lacks explicit context or exclusions for usage.
annotations_save (B, Idempotent)
Save an annotation for an existing image.
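As an illustration, a Darknet/TXT annotation with its labelmap could be prepared like this. The sketch assumes the common Darknet line layout (class index followed by normalised centre x, centre y, width, height) and assumes labelmap is passed as a plain mapping; neither detail is confirmed on this page, and `darknet_annotation_arguments` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple

# (class index, x centre, y centre, width, height); coordinates normalised to 0..1.
Box = Tuple[int, float, float, float, float]


def darknet_annotation_arguments(
    project_id: str,
    image_id: str,
    annotation_name: str,
    boxes: Iterable[Box],
    labelmap: Mapping[str, str],
) -> Dict[str, Any]:
    """Build annotations_save arguments for a Darknet-style TXT annotation."""
    lines = []
    for class_index, cx, cy, w, h in boxes:
        # Every class index used in the file should resolve to a label name.
        if str(class_index) not in labelmap:
            raise ValueError(f"class index {class_index} missing from labelmap")
        lines.append(f"{class_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return {
        "project_id": project_id,
        "image_id": image_id,
        "annotation_name": annotation_name,
        "annotation_content": "\n".join(lines),
        "labelmap": dict(labelmap),
    }
```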
| Name | Required | Description | Default |
|---|---|---|---|
| image_id | Yes | ID of the image to annotate | |
| labelmap | No | Label map for Darknet/TXT annotations, e.g. {'0': 'cat', '1': 'dog'} | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| annotation_name | Yes | Filename for the annotation (e.g. 'image1.xml') | |
| annotation_content | Yes | The annotation content (XML, JSON, or text) | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=false, openWorldHint=true, idempotentHint=true, and destructiveHint=false, covering key behavioral traits. The description adds minimal context beyond this, stating it saves annotations for existing images but not detailing effects like overwriting behavior, authentication needs, or rate limits. It doesn't contradict annotations, so a baseline score is appropriate.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's front-loaded and wastes no space, making it easy for an agent to parse quickly.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (5 parameters, 4 required), rich annotations, and the presence of an output schema, the description is reasonably complete. It covers the core action but lacks details on error conditions or integration with sibling tools. The output schema reduces the need to explain return values, keeping the description adequate though not exhaustive.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the input schema. The description doesn't add any meaningful semantic information beyond what the schema provides, such as explaining relationships between parameters or usage nuances. With high schema coverage, the baseline score of 3 is justified.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Save') and resource ('annotation for an existing image'), providing a specific verb+resource combination. However, it doesn't differentiate from sibling tools like 'annotation_batches_get' or 'annotation_jobs_create', which also involve annotations but serve different purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing image), exclusions, or comparisons to sibling tools like 'annotation_batches_get' or 'annotation_jobs_create', leaving the agent with no contextual usage information.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
async_tasks_get (Grade: A; Read-only; Idempotent)
Poll an async task by id. Poll every 5 seconds; processing may take up to 30 seconds to start.
Returns { taskId, status, progress, result?, error? }. status is one of created, running, completed, failed.
| Name | Required | Description | Default |
|---|---|---|---|
| task_id | Yes | Task ID returned by an async-enqueueing tool, e.g. projects_fork | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
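The polling guidance in the description can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the server's own client: `fetch_status` is a hypothetical callable standing in for however your MCP client invokes async_tasks_get, and the 120-second overall timeout is an assumed value (the description only gives the 5-second interval and the up-to-30-second start delay).

```python
import time
from typing import Any, Callable, Dict

# Terminal values taken from the status list in the tool description.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(
    fetch_status: Callable[[str], Dict[str, Any]],  # hypothetical wrapper around async_tasks_get
    task_id: str,
    interval_s: float = 5.0,    # polling interval recommended by the description
    timeout_s: float = 120.0,   # assumed overall deadline, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reports completed or failed, or the deadline passes."""
    waited = 0.0
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if waited >= timeout_s:
            raise TimeoutError(
                f"task {task_id} still {task.get('status')!r} after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller would normally leave it at the default.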
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnly, idempotent), the description adds polling frequency, start delay, and return shape including statuses. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences that front-load purpose and provide necessary behavioral details without redundancy or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Output schema is present, and the description already outlines the return shape. Combined with annotations and parameter description, the tool definition is fully complete for a polling operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers task_id with description; the description adds context that the ID comes from an async-enqueueing tool (e.g., projects_fork), enhancing understanding beyond the schema alone.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states 'Poll an async task by id,' providing a specific verb and resource. It includes polling details (every 5 seconds, up to 30 seconds to start) that distinguish it from sibling tools focusing on other entities.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description gives clear usage guidance on polling interval and expected processing time. It does not explicitly exclude alternatives, but the context of sibling tools indicates this is the dedicated polling tool for async tasks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
autolabel_job_get (Grade: A; Read-only; Idempotent)
Get per-subjob status and progress for an auto-label job.
Returns status, model/project type, image counts, subjob progress, ontology, confidence thresholds, and the linked annotation job id.
| Name | Required | Description | Default |
|---|---|---|---|
| autolabel_job_id | Yes | Auto-label job ID returned from autolabel_start | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint, openWorldHint, idempotentHint, and destructiveHint=false. The description adds value by listing specific return fields (status, model/project type, image counts, subjob progress, ontology, confidence thresholds, linked annotation job id), which go beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences with no wasted words. The first sentence fronts the core purpose, the second enumerates return items. Perfectly sized for the tool's simplicity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given an output schema exists (covering return structure) and annotations cover safety, the description adequately explains the tool's output. It could mention error conditions or pagination for completeness, but for a simple get-by-id tool, it's largely complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema covers the sole parameter (autolabel_job_id) with 100% description coverage. The description adds no new parameter-specific meaning beyond implying it identifies the job, which is already clear from the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves per-subjob status and progress for an auto-label job. It specifies the action (get) and resource (per-subjob status/progress). This distinguishes it from sibling tools like autolabel_start, which initiates jobs.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies use after starting a job but lacks explicit guidance on when to use versus alternatives, such as checking the linked annotation job. No when-not-to-use or context for alternatives is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
autolabel_start (Grade: A)
Start a hosted auto-label job over a batch of images.
Returns {jobId, annotationJobId, message}. Poll progress with autolabel_job_get using the returned jobId.
| Name | Required | Description | Default |
|---|---|---|---|
| model | Yes | Model identifier. For model_type='foundational', a foundational model name (e.g. 'sam3'). For model_type='roboflow', a Roboflow model id like 'project-slug/version' or 'workspace-slug/model-id'. | |
| run_nms | No | Whether to run non-max suppression (default True server-side) | |
| batch_id | Yes | Source batch ID containing images to auto-label | |
| ontology | No | Mapping of class name -> text prompt used to label, e.g. {'cat': 'a cat', 'dog': 'a dog'}. Required for foundational models (sam3: bare nouns, max 50 classes). Optional for model_type='roboflow' — when omitted, the trained model's own classes are used. | |
| model_type | Yes | How to interpret `model`. 'foundational' uses a hosted base model (sent as-is). 'roboflow' uses a Roboflow-trained model and is sent as 'custom_roboflow' with the id passed in modelOptions.modelId. | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| model_options | No | Model-specific options — e.g. {'outputFormat': 'polygon'} or {'outputFormat': 'rle'} for segmentation output shape. | |
| reviewer_email | No | Email of the reviewer for the resulting annotation job. Must be a workspace member. Defaults to the workspace owner. | |
| default_confidence | No | Confidence threshold applied to every ontology class (mirrors the UI slider). The backend fans this out across the ontology when confidence_thresholds is omitted. | |
| num_images_to_label | No | Number of images from the batch to auto-label. Defaults to the entire batch. | |
| confidence_thresholds | No | Per-class confidence threshold override, e.g. {'cat': 0.5, 'dog': 0.6}. Takes precedence over default_confidence. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
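The parameter rules in the table above (ontology required for foundational models, at most 50 classes for sam3, optional fields omitted when unset) can be checked client-side before the call. The helper below is a hedged sketch: `build_autolabel_args` is a hypothetical name, the argument keys mirror the table, and the server remains the source of truth for validation.

```python
from typing import Any, Dict, Optional

MAX_SAM3_CLASSES = 50  # limit quoted in the ontology parameter description


def build_autolabel_args(
    project_id: str,
    batch_id: str,
    model: str,
    model_type: str,
    ontology: Optional[Dict[str, str]] = None,
    default_confidence: Optional[float] = None,
    confidence_thresholds: Optional[Dict[str, float]] = None,
) -> Dict[str, Any]:
    """Assemble an autolabel_start argument dict, rejecting obviously invalid input."""
    if model_type not in ("foundational", "roboflow"):
        raise ValueError("model_type must be 'foundational' or 'roboflow'")
    if model_type == "foundational" and not ontology:
        raise ValueError("ontology is required for foundational models")
    if model == "sam3" and ontology and len(ontology) > MAX_SAM3_CLASSES:
        raise ValueError(f"sam3 accepts at most {MAX_SAM3_CLASSES} classes")

    args: Dict[str, Any] = {
        "project_id": project_id,
        "batch_id": batch_id,
        "model": model,
        "model_type": model_type,
    }
    optional = {
        "ontology": ontology,
        "default_confidence": default_confidence,
        "confidence_thresholds": confidence_thresholds,
    }
    # Leave unset options out so server-side defaults apply.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args
```

For example, `build_autolabel_args("my-project", "batch-1", "sam3", "foundational", ontology={"cat": "cat"})` yields a dict with the four required keys plus `ontology`; the batch id here is a placeholder.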
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=false, destructiveHint=false, openWorldHint=true, and idempotentHint=false. The description adds that the tool returns jobId and annotationJobId, and that it starts a job (non-idempotent). No contradictions, but it doesn't elaborate on side effects or state changes beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences: the first states the purpose, the second provides the return value and a pointer to the sibling tool for follow-up. No redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (11 parameters, nested objects, output schema), the description covers the core functionality and return value. It suggests polling with autolabel_job_get but does not explain when to use this over other annotation tools. Overall, it is fairly complete for a job-starting tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema already explains each parameter thoroughly. The tool description adds no additional parameter semantics beyond what's in the schema, earning a baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Start'), the resource ('hosted auto-label job'), and the scope ('over a batch of images'). It also specifies the return format and a sibling tool for polling, helping distinguish from autolabel_job_get and other annotation tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly tells the agent to poll progress with autolabel_job_get, providing a clear next step. However, it does not explicitly state when not to use this tool or compare it with other job creation tools like annotation_jobs_create.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_project_app (Grade: B; Read-only; Idempotent)
Open a Prefab UI so the human can create a Roboflow project (calls projects_create when they confirm).
Use this when project settings are unknown, debatable, or should be chosen or reviewed in a form—not inferred entirely by the agent. MCP alone lacks the UX to settle those fields confidently; the UI collects name, project type, annotation label text, and license before anything is created.
Prefer projects_create only when the agent already has every field and should create programmatically without a confirmation surface.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true, but the description states it calls projects_create, which is a write operation, causing a contradiction. No additional behavioral traits disclosed beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description runs longer than a single sentence but is front-loaded: the first sentence states the purpose, and the rest covers when to use the form versus projects_create, without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the tool has no parameters and no output schema, the description lacks details about what the interactive app does, how to interact, or what the user can expect. Minimal but adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters to describe; schema coverage is 100%. Baseline score of 4 for zero-parameter tools applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it opens an interactive create-project app that calls projects_create, distinguishing it from direct creation tools like projects_create. However, it does not explicitly differentiate from other interactive tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description says to use this tool when project settings are unknown, debatable, or should be reviewed by a human in a form, and to prefer projects_create when the agent already has every field. It does not list prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_create (Grade: A)
Provision a new v2 device.
Requires the device:update scope. Returns {deviceId, installId, offlineProvisioningQrPayload?}:
- installId is the short-lived token used by the device installer (GET /devices/v2/:installId/install.sh).
- offlineProvisioningQrPayload is only present for AI1 devices provisioned with offline_mode=true.
| Name | Required | Description | Default |
|---|---|---|---|
| tags | No | Optional list of tag strings. | |
| device_name | Yes | Human-readable device name | |
| device_type | No | 'ai1', 'edge', or any custom string. Optional. | |
| workflow_id | No | Optional initial workflow assignment. For AI1 devices this seeds the default 'aione' stream; other device types typically bind workflows later when streams are configured. | |
| offline_mode | No | Only valid for AI1 devices on workspaces with the roboflowLiteMode feature flag. | |
| source_device_id | No | When set, duplicate this existing device's config instead of generating a fresh one. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
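The installer route quoted in the description (GET /devices/v2/:installId/install.sh) can be turned into a concrete URL once installId is returned. A minimal sketch follows; the API host is not given in the source, so `base_url` is an assumption supplied by the caller, and `installer_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def installer_url(base_url: str, install_id: str) -> str:
    """Fill the :installId placeholder in the install-script route.

    base_url is an assumption: the source gives only the path, not the host.
    """
    if not install_id:
        raise ValueError("install_id is empty")
    # Percent-encode the token so it cannot alter the path structure.
    return f"{base_url.rstrip('/')}/devices/v2/{quote(install_id, safe='')}/install.sh"
```

Because installId is described as short-lived, the URL should be built and used soon after provisioning rather than stored.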
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnlyHint false), description adds: required scope, return format with fields explanation (deviceId, installId, offlineProvisioningQrPayload), and special case for AI1 devices. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences: purpose, scope and return values, followed by a clarifying detail. No unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With annotations and rich schema, the description covers purpose, auth, return structure, and edge case. Lacks explicit mention of output schema but describes the returned fields adequately.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage for all parameters. Description does not add significant new parameter-level info beyond the schema except explaining the return value conditions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description states 'Provision a new v2 device' with a clear verb+resource. Differentiates from sibling tools like 'devices_get' and 'devices_list' which are for retrieval or listing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies use for creating a new device, and the scope requirement is specified. No explicit alternatives or when-not-to-use, but the sibling tools are clearly different operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get (Grade: A; Read-only; Idempotent)
Get a single device by id.
Returns the Device object. 404 if the device does not exist or belongs to a different workspace.
When inspecting a device, also call devices_get_config for its running services and devices_get_default_config for the workspace's recommended versions. Comparing the two services maps reveals whether the device is up to date or pinned to an older build of any service.
| Name | Required | Description | Default |
|---|---|---|---|
| device_id | Yes | Device id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
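The services-map comparison suggested in the description might look like the sketch below. The exact shape of a services entry is not documented here, so the code assumes each entry is either a bare version string or a mapping with a `version` key; the service names in the usage note are made up for illustration.

```python
from typing import Any, Dict, Mapping


def service_version(entry: Any) -> Any:
    """Extract a version. Assumed shape: a bare string or a mapping with 'version'."""
    if isinstance(entry, Mapping):
        return entry.get("version")
    return entry


def diff_services(
    running: Mapping[str, Any],       # services map from devices_get_config
    recommended: Mapping[str, Any],   # services map from devices_get_default_config
) -> Dict[str, Dict[str, Any]]:
    """Return only the services whose running version differs from the recommended one."""
    drift: Dict[str, Dict[str, Any]] = {}
    for name in sorted(set(running) | set(recommended)):
        have = service_version(running.get(name))
        want = service_version(recommended.get(name))
        if have != want:
            drift[name] = {"running": have, "recommended": want}
    return drift
```

An empty result means every service matches the workspace default; a service missing on one side shows up with `None` for that side.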
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds that a 404 is returned if the device does not exist or belongs to a different workspace, which is useful but does not cover other behavioral aspects like permissions or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: it front-loads the core purpose, states the return type and the 404 case, and closes with a short pointer to the related config tools, with no unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple retrieval nature, the description adequately covers purpose, return type, and an error case. However, with many sibling tools for device data, a brief note distinguishing base device object from config/telemetry would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema fully describes the only parameter ('device_id' with a clear description), and the description does not add additional semantic value beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Get a single device by id' with a specific verb and resource, and it distinguishes itself from siblings like 'devices_list' and 'devices_create' through the singular retrieval focus.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description recommends also calling devices_get_config and devices_get_default_config when inspecting a device, but it does not say when to choose this tool over alternatives such as 'devices_get_telemetry', and it lists no exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get_config (Grade: A; Read-only; Idempotent)
Get the device's current runtime configuration.
Returns the full RFDM config JSON: device_id, device_name, workspace_id, version, last_updated, config, services. The response passes through environment_variables and any embedded integration credentials, so treat the payload as sensitive.
Tip: pair this with devices_get_default_config and diff the services map. The default config carries the workspace's recommended version for each service, so comparing the two tells you whether the device is up to date or pinned to an older build.
| Name | Required | Description | Default |
|---|---|---|---|
| device_id | Yes | Device id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
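Since the description warns that the payload carries environment_variables and embedded credentials, one cautious pattern is to redact the config before logging it or passing it further along. The sketch below is an assumption-heavy illustration: only `environment_variables` is named in the source, and the key-name fragments used to spot credentials are guesses that should be adjusted to the real payload.

```python
from typing import Any

# Named explicitly in the tool description.
SENSITIVE_KEYS = {"environment_variables"}
# Assumed heuristics for credential-like keys; not taken from the source.
SENSITIVE_FRAGMENTS = ("credential", "secret", "token", "password", "api_key")

REDACTED = "<redacted>"


def redact(value: Any) -> Any:
    """Return a copy of a config payload with sensitive-looking values masked."""
    if isinstance(value, dict):
        cleaned = {}
        for key, inner in value.items():
            lowered = str(key).lower()
            if lowered in SENSITIVE_KEYS or any(f in lowered for f in SENSITIVE_FRAGMENTS):
                cleaned[key] = REDACTED
            else:
                cleaned[key] = redact(inner)
        return cleaned
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

The function builds new containers rather than mutating its input, so the original payload stays available for the services diff.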
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds critical behavioral context: the response includes sensitive data (environment_variables and integration credentials), advising caution. This goes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences: the first states the core purpose, the second enriches with return details and a security note. No wasted words, front-loaded, and every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read tool with one parameter, output schema, and rich annotations, the description is nearly complete. It covers purpose, return fields, and sensitivity. Minor omission: no mention of error cases or prerequisites, but it's still sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter device_id has 100% schema description coverage ('Device id'). The tool description does not add any additional meaning or constraints beyond the schema, so it meets the baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Get the device's current runtime configuration' with a specific verb and resource. It distinguishes from sibling tools by focusing on 'current runtime configuration' and listing returned fields, differentiating it from devices_get, devices_get_config_history, etc.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for retrieving the full current config but does not explicitly state when to use this tool vs alternatives like devices_get or devices_get_config_history. No exclusions or alternative guidance is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get_config_history (Grade: A; Read-only; Idempotent)
List prior configuration revisions, newest first.
Returns {data: ConfigRevision[], pagination: {next_cursor, has_more, limit}}. Each ConfigRevision carries revision_id, created_at, and created_by.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Page size (1-500, default 10). | |
| cursor | No | ISO timestamp from a previous page's next_cursor. | |
| device_id | Yes | Device id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
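The cursor-based return shape described above lends itself to a generator that walks every page. This is a minimal sketch: `fetch_page` is a hypothetical callable wrapping devices_get_config_history, and the loop assumes that `has_more` being false or `next_cursor` being absent marks the last page.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_revisions(
    fetch_page: Callable[..., Dict[str, Any]],  # hypothetical wrapper around the tool call
    device_id: str,
    limit: int = 10,  # default page size from the parameter table
) -> Iterator[Dict[str, Any]]:
    """Yield ConfigRevision entries across pages, newest first."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(device_id=device_id, limit=limit, cursor=cursor)
        yield from page.get("data", [])
        pagination = page.get("pagination", {})
        cursor = pagination.get("next_cursor")
        # Stop when the server reports no further pages or gives no cursor.
        if not pagination.get("has_more") or cursor is None:
            return
```

Because it is a generator, a caller looking for one specific revision can stop iterating early and avoid fetching later pages.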
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds value by detailing the return structure (pagination with next_cursor, has_more, limit) and key fields (revision_id, created_at, created_by), going beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is exceptionally concise: two sentences, no filler, front-loaded with the primary action and ordering. Every sentence adds essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple list tool with pagination, the description is adequate. It explains the return format, ordering, and pagination. It could mention rate limits or error handling, but given the annotations and output schema context, it is sufficiently complete for an agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for all three parameters. The description does not add new information about parameters beyond what the schema already provides, but it does clarify the return structure's fields, which is acceptable for a baseline 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists prior configuration revisions, specifies newest-first ordering, and distinguishes from sibling tools like devices_get_config (current config) and devices_get_logs.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Usage context is clear (listing history), but there is no explicit guidance on when to use this tool vs alternatives like devices_get_config, nor any exclusions or prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get_default_config (Grade: A; Read-only; Idempotent)
Get the workspace's default device configuration.
Returns {config, patch}: config is the base default config with the workspace's config_patch applied (the template used when provisioning new devices, including default service versions), and patch is the raw workspace-level patch object.
Useful for inspecting which service versions a fresh device would be created with, or for seeding a payload to devices_update_config.
Requires the device:read scope. Device-scoped api_keys are rejected with 403 because this endpoint is workspace-wide.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
No output parameters | ||
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only, idempotent, non-destructive. Description adds return structure details, scope requirement, and that device-scoped keys are rejected, providing useful behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, each adding value: purpose, return structure, use cases, scope, and caveat. Front-loaded with main action. Minimal and clear.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a zero-parameter tool with an output schema, the description explains the return format, use cases, and authorization requirements. Nothing missing for an agent to invoke correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters in the input schema, so no explanation needed. Baseline 4 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Get the workspace's default device configuration' and explains the return structure {config, patch}. It distinguishes from siblings by specifying that this is workspace-wide and device-scoped keys are rejected, setting it apart from per-device config tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides concrete use cases: inspecting service versions for new devices and seeding payloads to devices_update_config. Mentions the required scope. It could explicitly say when not to use the tool (e.g., for per-device config), but the guidance is strong overall.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get_events (A, Read-only, Idempotent)
List device and stream lifecycle events.
Returns {data: Event[], pagination: {next_cursor, prev_cursor, has_more, limit}}.
| Name | Required | Description | Default |
|---|---|---|---|
| event | No | Filter by event name. | |
| limit | No | Page size (1-1000, default 100). | |
| cursor | No | Opaque cursor from a previous page. | |
| end_time | No | ISO timestamp upper bound. | |
| device_id | Yes | Device id | |
| direction | No | Pagination direction. Default backward. | |
| entity_id | No | Filter to a single entity id. | |
| start_time | No | ISO timestamp lower bound. | |
| entity_type | No | Filter to one entity type (e.g. 'stream', 'device'). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
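The pagination shape above can be walked with a simple cursor loop. The following is a minimal sketch, not the server's client: `call_tool` is a hypothetical callable that sends the arguments to devices_get_events and returns the parsed response, and the sketch assumes that passing `next_cursor` back as `cursor` continues in the current direction.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: any callable that takes the tool arguments and
# returns the parsed {data, pagination} response described above.
CallTool = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_events(call_tool: CallTool, device_id: str, limit: int = 100,
                max_pages: int = 50, **filters: Any) -> Iterator[Dict[str, Any]]:
    """Yield events page by page until has_more is false or max_pages is hit."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    cursor: Optional[str] = None
    for _ in range(max_pages):
        args: Dict[str, Any] = {"device_id": device_id, "limit": limit, **filters}
        if cursor is not None:
            args["cursor"] = cursor
        page = call_tool(args)
        yield from page.get("data", [])
        pagination = page.get("pagination", {})
        # Assumption: next_cursor continues in the requested direction.
        cursor = pagination.get("next_cursor")
        if not pagination.get("has_more") or cursor is None:
            break
```

The `max_pages` cap is a design choice of this sketch: it keeps an agent from paging indefinitely through a long event history.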
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, destructiveHint, idempotentHint, and openWorldHint. The description adds the return shape (pagination structure) but does not disclose additional behavioral traits beyond what annotations provide. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise: two sentences front-loading purpose and return shape with no wasted words or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (9 parameters, paginated output), the description is minimal. It states purpose and return structure but does not explain event types or pagination details. However, with rich annotations and output schema present, it is minimally adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, with all parameters described in the input schema. The description adds no extra parameter details beyond what the schema already provides. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists device and stream lifecycle events, using a specific verb and resource. It distinguishes from sibling tools like devices_get_logs and devices_get_telemetry.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description does not provide any guidance on when to use this tool versus alternatives, nor does it mention any exclusions or conditions. It only states the basic purpose.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get_logs (A, Read-only, Idempotent)
Fetch device logs from Elasticsearch.
Rate-limited at two tiers: 5 requests / minute per IP, and 50 requests / minute globally across all callers (Elasticsearch protection).
Returns {data: LogEntry[], pagination: {next_cursor, has_more, limit}}.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Page size (1-1000, default 100). | |
| cursor | No | ISO timestamp from a previous page's next_cursor. | |
| service | No | Comma-separated service names to filter by. | |
| end_time | No | ISO timestamp upper bound. | |
| severity | No | Comma-separated severities (INFO, WARN, ERROR, ...). | |
| device_id | Yes | Device id | |
| start_time | No | ISO timestamp lower bound. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
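As an illustration of how the parameters in the table fit together, the sketch below assembles an argument payload for devices_get_logs. The helper name `build_log_args` is hypothetical, and the exact ISO 8601 variant the server accepts is an assumption; only the parameter names, the 1-1000 limit, and the comma-separated format come from the table.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, Optional


def build_log_args(device_id: str, now: datetime, lookback: timedelta,
                   severities: Iterable[str] = (), services: Iterable[str] = (),
                   limit: int = 100, cursor: Optional[str] = None) -> Dict[str, Any]:
    """Build devices_get_logs arguments for the window [now - lookback, now]."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    args: Dict[str, Any] = {
        "device_id": device_id,
        "limit": limit,
        # Assumption: timezone-aware ISO 8601 strings are accepted.
        "start_time": (now - lookback).isoformat(),
        "end_time": now.isoformat(),
    }
    severity = ",".join(severities)
    if severity:
        args["severity"] = severity  # comma-separated, e.g. "WARN,ERROR"
    service = ",".join(services)
    if service:
        args["service"] = service
    if cursor is not None:
        args["cursor"] = cursor  # next_cursor from a previous page
    return args


example = build_log_args(
    "my-device",  # hypothetical device id
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    lookback=timedelta(hours=1),
    severities=["WARN", "ERROR"],
)
```

Narrowing the time window and severities up front keeps the number of pages, and therefore the number of rate-limited requests, small.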
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true. The description adds specific rate limits (5 req/min per IP, 50 req/min global) and the return format, providing behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: two sentences plus a return format. It is front-loaded with the purpose, then rate limits, then output. No superfluous information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers rate limits and return format, and the schema documents all parameters. Minor details are missing, such as time-range inclusivity and sort order, but the description is fairly complete given the context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so each parameter is already documented. The description does not add extra meaning to parameters beyond what's in the schema, so a baseline of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Fetch device logs from Elasticsearch', which is a specific verb-resource pair. It distinguishes from sibling tools like devices_get (device info), devices_get_events (events), and devices_get_telemetry (telemetry).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides rate limits that inform usage but does not explicitly state when to use this tool versus alternatives or when not to use it. Given the many sibling tools, more guidance would be beneficial.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_get_telemetry (A, Read-only, Idempotent)
Get aggregated hardware metrics (cpu, memory, disk, gpu).
Rate-limited per-device: 60 requests / minute steady, plus a burst limit of 10 requests / 10 seconds.
Returns {time_period, bucket_interval, fill_interval_seconds, buckets: [...]}.
| Name | Required | Description | Default |
|---|---|---|---|
| device_id | Yes | Device id | |
| time_period | No | Aggregation window. Default 24h. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
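A caller can stay under both limits quoted above by tracking its own request timestamps. The following is a client-side sketch only, assuming sliding windows; how the server actually counts requests is not stated, and the class name `TwoTierLimiter` is hypothetical.

```python
from collections import deque
from typing import Deque, Sequence, Tuple


class TwoTierLimiter:
    """Client-side sliding-window limiter, one deque of send times per device."""

    def __init__(self, tiers: Sequence[Tuple[int, float]] = ((60, 60.0), (10, 10.0))):
        # Each tier is (max_requests, window_seconds).
        self._tiers = list(tiers)
        self._sent: Deque[float] = deque()

    def wait_time(self, now: float) -> float:
        """Seconds to wait before one more request fits inside every window."""
        longest = max(window for _, window in self._tiers)
        while self._sent and now - self._sent[0] >= longest:
            self._sent.popleft()
        wait = 0.0
        for max_requests, window in self._tiers:
            recent = [t for t in self._sent if now - t < window]
            if len(recent) >= max_requests:
                # This send must age out before another request is allowed.
                blocking = recent[len(recent) - max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self, now: float) -> None:
        """Register that a request was sent at time `now`."""
        self._sent.append(now)
```

In use, a caller would check `wait_time(time.monotonic())`, sleep for that long, send the request, and then call `record`. Passing the clock value in explicitly keeps the limiter easy to test.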
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate safe read operations. The description adds specific rate-limiting details (60 requests/minute steady, burst 10/10s) and return structure, which are valuable beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences: first defines purpose, second adds rate constraints, third gives return format. No redundancy, front-loaded, and each sentence serves a clear function.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With simple parameters, 100% schema coverage, an output schema implied by the return format, and annotations covering safety, the description suffices for correct invocation. It lacks guidance on tool selection among siblings, but is informative overall for a telemetry retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers 100% of parameters with descriptions. The description mentions rate limiting per device and the return structure, which subtly relates to device_id and time_period, but does not add explicit semantic details beyond the schema. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it retrieves aggregated hardware metrics (CPU, memory, disk, GPU) with a specific verb ('Get') and resource ('aggregated hardware metrics'). It is distinct from sibling tools like devices_get (general info), devices_get_events, and devices_get_logs, though it does not explicitly compare to them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives such as devices_get, devices_get_events, or devices_get_logs. It mentions rate limits but does not provide criteria for selection or exclusions, leaving the agent to infer usage from the description.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_list (A, Read-only, Idempotent)
List devices registered in the workspace.
Returns a list of Device objects with id, name, status, last_heartbeat, platform/hardware info, tags, and created_at. status is online if a heartbeat was received within the last 5 minutes, otherwise offline (or unknown for devices that have never reported).
Requires the device:read scope. Device-scoped api_keys cannot call this endpoint and will receive 403.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
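The status rule in the description can be restated as a small function, which may help when reasoning about a stale last_heartbeat value. This is a sketch of the rule as worded above, not the server's code; treating exactly 5 minutes as still online is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEARTBEAT_WINDOW = timedelta(minutes=5)


def derive_status(last_heartbeat: Optional[datetime], now: datetime) -> str:
    """Return 'online', 'offline', or 'unknown' per the 5-minute heartbeat rule."""
    if last_heartbeat is None:
        return "unknown"  # the device has never reported
    # Assumption: the boundary (exactly 5 minutes) counts as online.
    return "online" if now - last_heartbeat <= HEARTBEAT_WINDOW else "offline"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
statuses = [
    derive_status(None, now),
    derive_status(now - timedelta(minutes=4), now),
    derive_status(now - timedelta(minutes=6), now),
]
```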
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already mark it as read-only, idempotent, non-destructive. The description adds crucial context: the 5-minute heartbeat status rule, required scope, and the 403 error for device-scoped keys. This goes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and well-structured: action first, then return fields, then status explanation, then auth. Every sentence provides value with no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given zero parameters and an output schema likely covering return types, the description fully covers what the tool does, what it returns, and key constraints (auth, status). It is complete for a list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters exist, so the description need not explain them. Baseline for zero parameters is 4, and the description does not add unnecessary parameter info.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it lists devices in the workspace and details the returned fields (id, name, status, etc.), distinguishing it from sibling tools like devices_get (single device) and devices_create (create). The verb 'list' and resource 'devices' are specific.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies when to use (to get all devices) and includes authentication requirements (device:read scope, device-scoped keys get 403). However, it does not explicitly contrast with other list-like tools or state when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_streams_get (A, Read-only, Idempotent)
Get a single stream by id.
Returns the Stream object. 404 if the stream does not exist on the device.
| Name | Required | Description | Default |
|---|---|---|---|
| device_id | Yes | Device id | |
| stream_id | Yes | Stream id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent behavior. The description adds that it returns the Stream object and a 404 error if missing, providing useful context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences with no redundant information. Front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given its simplicity and the presence of an output schema, the description covers all necessary information: what it does, what it returns, and error behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear parameter descriptions. The description adds no further semantic information beyond what is already in the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states 'Get a single stream by id', using a specific verb and resource. It clearly distinguishes from the sibling tool devices_streams_list.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives. The purpose is clear, but it lacks context for when not to use or any prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_streams_list (A, Read-only, Idempotent)
List streams configured on the device.
Returns a list of Stream objects. Credential-bearing fields (URL userinfo, password, api_key, etc.) are redacted from the source field.
| Name | Required | Description | Default |
|---|---|---|---|
| device_id | Yes | Device id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
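To make "URL userinfo" concrete, the sketch below strips the user and password portion from a stream URL. It only illustrates the kind of redaction described; the placeholder text, the example URL, and the function name are assumptions, and the server's actual redacted format is not documented here.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_userinfo(url: str, placeholder: str = "REDACTED") -> str:
    """Replace the user:password@ part of a URL with a placeholder."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing credential-bearing in the netloc
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"{placeholder}@{host}",
                       parts.path, parts.query, parts.fragment))


# Hypothetical camera URL, for illustration only.
redacted = redact_userinfo("rtsp://admin:secret@192.168.1.10:554/stream1")
```

An agent that echoes stream sources back to a user could apply a check like this defensively, even though the tool already redacts them.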
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds value by noting that credential-bearing fields are redacted from the source field, providing useful behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with two sentences that are front-loaded: first states the action, then notes the return type and redaction behavior. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description does not need to detail return values. It covers the redaction behavior and lists streams adequately. It could explicitly state that it returns all streams on the device, but it is still fairly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with a single parameter described as 'Device id'. The description does not add additional meaning for this parameter, so it meets the baseline of 3 but does not exceed it.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'List streams configured on the device' which clearly identifies the action and resource. However, it does not differentiate from sibling devices_streams_get (which likely gets a specific stream), so a score of 4 is appropriate.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like devices_streams_get. It lacks explicit when-to-use or when-not-to-use information, resulting in a low score.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
devices_update_config (A, Destructive)
Update the device's runtime configuration.
Requires the device:update scope. The merge is a Firestore shallow update: omitted top-level fields stay as-is, but any top-level field you supply replaces its entire nested object wholesale. The payload is validated against the RFDM config schema, and a new entry is appended to the config history.
| Name | Required | Description | Default |
|---|---|---|---|
| config | Yes | Partial config payload. Top-level keys you include overwrite their entire nested object (no deep merge); top-level keys you omit are preserved. To change a nested value, fetch the current config with devices_get_config, splice your change into the relevant top-level object, and pass that object back here. | |
| device_id | Yes | Device id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
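The shallow-merge behavior is easy to get wrong, so here is a local sketch of it together with the fetch, splice, and pass-back approach the config parameter recommends. The config keys (`inference`, `network`, and their nested fields) are hypothetical; `shallow_merge` only models the semantics described above and does not call the server.

```python
import copy
from typing import Any, Dict


def shallow_merge(current: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Model the update: supplied top-level keys replace their whole value."""
    merged = dict(current)
    merged.update(patch)
    return merged


def build_nested_patch(current: Dict[str, Any], top_key: str,
                       nested_key: str, value: Any) -> Dict[str, Any]:
    """Splice one nested change into a full copy of its top-level object."""
    section = copy.deepcopy(current.get(top_key, {}))
    section[nested_key] = value
    return {top_key: section}


# Hypothetical config as it might come back from devices_get_config.
current = {
    "inference": {"model": "a", "threshold": 0.5},
    "network": {"proxy": None},
}

# Naive patch: the sibling key "model" is lost because "inference" is replaced.
naive = shallow_merge(current, {"inference": {"threshold": 0.7}})

# Safe patch: send the whole "inference" object with the change spliced in.
safe = shallow_merge(current, build_nested_patch(current, "inference", "threshold", 0.7))
```

In the naive case `naive["inference"]` contains only `threshold`, while the safe case keeps `model` and leaves `network` untouched.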
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description covers the update's destructive nature (shallow merge replaces top-level fields), validation against schema, and config history appending. It provides rich behavioral context beyond the annotations (destructiveHint=true), including how nested changes should be handled.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is four sentences: one states the action, two cover the required scope and the merge behavior, and one notes validation and history. It is concise, front-loaded, and contains no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has an output schema (not shown), and the description covers the update operation's key aspects: purpose, required scope, behavior (no deep merge), validation, and historical tracking. It is complete enough for an agent to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the description adds substantial meaning for the 'config' parameter by explaining the shallow merge effect and the correct approach for nested updates. For 'device_id', it adds no extra meaning but the schema description is sufficient. Overall, it significantly enhances parameter understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it updates the device's runtime configuration with a specific verb and resource, and distinguishes from sibling tools like devices_get_config and devices_get_config_history by explaining the update behavior.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions the required scope ('device:update') and explains the merge behavior, implying when to use it. It also provides guidance on updating nested values by fetching first. However, it does not explicitly state when to avoid this tool or name alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_prepare_upload (A, Read-only, Idempotent)
Get an upload URL to upload a single image to a project.
Returns a pre-built upload URL and instructions. The caller must perform the actual upload using curl since the MCP server cannot access local files.
This endpoint uploads images only. To add annotations, call annotations_save with the image ID from the upload response. For bulk uploads with annotations, use images_prepare_upload_zip.
| Name | Required | Description | Default |
|---|---|---|---|
| split | No | Dataset split | train |
| tag_names | No | Tags to attach to the image | |
| batch_name | No | Group uploads under a named batch | |
| image_name | Yes | Filename for the image (e.g. 'photo.jpg') | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable context beyond the annotations. Annotations indicate read-only, open-world, idempotent, and non-destructive behavior, but the description explains that the tool returns a pre-built upload URL and instructions, and that the caller must perform the actual upload using curl since the MCP server cannot access local files. This clarifies the tool's operational behavior and limitations, though it doesn't detail rate limits or auth needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and well-structured, with three short paragraphs that each serve a distinct purpose: stating the tool's function, explaining the upload process, and providing usage alternatives. There is no wasted text, and information is front-loaded, making it easy to understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, rich annotations (read-only, open-world, idempotent, non-destructive), 100% schema description coverage, and the presence of an output schema, the description is complete. It covers the tool's purpose, behavioral context, and usage guidelines without needing to explain parameters or return values, which are handled by structured fields.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description does not mention any input parameters, focusing instead on the tool's purpose and usage. However, the input schema has 100% description coverage, with all parameters well-documented (e.g., 'project_id' as a project slug, 'image_name' as a filename). Since the schema provides comprehensive parameter details, the description's lack of parameter information is acceptable, resulting in a baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get an upload URL to upload a single image to a project.' It specifies the verb ('Get'), resource ('upload URL'), and scope ('single image'), and distinguishes it from sibling tools like 'images_prepare_upload_zip' for bulk uploads and 'annotations_save' for adding annotations. This is specific and avoids tautology.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool versus alternatives. It states: 'This endpoint uploads images only. To add annotations, call annotations_save with the image ID from the upload response. For bulk uploads with annotations, use images_prepare_upload_zip.' This clearly defines the tool's scope and directs users to other tools for related tasks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_prepare_upload_zip (A)
Prepare a zip upload of images and annotations to a project.
Supports zip archives containing images with COCO, YOLO, Pascal VOC, or classification-by-folder annotations. Up to 2 GB / 10k files.
Returns a signed URL and task ID. The caller must:
1. PUT the zip file to the signed URL
2. Poll the task status until completed
The signed URL expires in 1 hour.
| Name | Required | Description | Default |
|---|---|---|---|
| split | No | Default split for images | train |
| tag_names | No | Tags to attach to every uploaded image | |
| batch_name | No | Group uploads under a named batch and annotation job | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
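The limits and the polling step above can be sketched on the caller's side as follows. Several things here are assumptions: that "2 GB" means 2 × 10^9 bytes of archive size, that the 10k limit counts non-directory entries, and that a status string of "failed" exists alongside "completed". `get_status` is a hypothetical callable wrapping whatever status tool the caller uses. The PUT itself is left out because the headers the signed URL requires are not stated.

```python
import os
import time
import zipfile
from typing import Callable

MAX_BYTES = 2 * 1000**3  # assumption: "2 GB" read as 2 * 10^9 bytes
MAX_FILES = 10_000       # assumption: counts non-directory entries


def check_zip_limits(path: str, max_bytes: int = MAX_BYTES,
                     max_files: int = MAX_FILES) -> None:
    """Raise ValueError if the archive exceeds the documented limits."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"zip is {size} bytes, limit is {max_bytes}")
    with zipfile.ZipFile(path) as zf:
        count = sum(1 for info in zf.infolist() if not info.is_dir())
    if count > max_files:
        raise ValueError(f"zip has {count} files, limit is {max_files}")


def wait_for_task(get_status: Callable[[], str], interval: float = 5.0,
                  timeout: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the task reports a terminal status or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):  # "failed" is an assumed status
            return status
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

Checking the archive locally before requesting a signed URL avoids spending part of the one-hour URL lifetime on an upload that would be rejected.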
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-readOnly, non-destructive operation, which the description aligns with by describing a preparation step (not the actual upload). The description adds valuable behavioral context beyond annotations: it specifies file size/quantity limits (2 GB/10k files), the two-step process (signed URL + polling), and URL expiration (1 hour). However, it doesn't mention rate limits, authentication needs, or error handling, leaving some gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by key constraints and required steps. Every sentence adds essential information: supported formats, limits, return values, and caller responsibilities. There's no redundant or vague language, and it's structured logically for quick comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of a two-step upload process, the description is complete: it explains the tool's purpose, constraints, return values (signed URL and task ID), and required follow-up actions. With an output schema present, it doesn't need to detail return values further. The annotations cover safety aspects, and the description fills in behavioral gaps appropriately.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents all 4 parameters. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain how 'split' or 'tag_names' interact with the zip content). This meets the baseline of 3, as the schema carries the full burden of parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Prepare a zip upload') and resource ('images and annotations to a project'), distinguishing it from sibling tools like 'images_prepare_upload' (which likely handles individual files) and 'images_upload_zip_status' (which polls status). It explicitly mentions supported annotation formats (COCO, YOLO, Pascal VOC, classification-by-folder), making the purpose highly specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: for bulk uploads of images with annotations in zip format (up to 2 GB/10k files). It also outlines the required follow-up steps (PUT to signed URL, poll task status) and mentions the signed URL expiration (1 hour), which implicitly suggests not using it for small or immediate uploads without polling capability.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_search A Read-only Idempotent
Search for images inside a project.
| Name | Required | Description | Default |
|---|---|---|---|
| tag | No | Filter results by tag | |
| batch | No | Filter to images in any batch | |
| limit | No | Maximum number of results | |
| query | Yes | Search prompt for project-scoped image discovery | |
| fields | No | Fields to include in each result | |
| offset | No | Result offset for pagination | |
| batch_id | No | Filter to a specific batch id | |
| class_name | No | Filter results by class name | |
| in_dataset | No | Filter to images currently in the dataset | |
| like_image | No | Find images visually similar to this image id/name | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| annotation_job | No | Filter to images assigned to any annotation job | |
| annotation_job_id | No | Filter to a specific annotation job id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
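A minimal sketch of paging through results with limit and offset, using only the parameter names in the table above. The helper name, the page size of 25, and the page cap are illustrative assumptions; the server's real defaults and maximums are not stated here.

```python
from typing import Any, Dict, Iterator, Optional


def search_page_args(
    project_id: str,
    query: str,
    limit: int = 50,
    max_pages: int = 3,
    class_name: Optional[str] = None,
    tag: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one images_search argument dict per page (hypothetical helper)."""
    for page in range(max_pages):
        args: Dict[str, Any] = {
            "project_id": project_id,
            "query": query,
            "limit": limit,
            "offset": page * limit,  # advance by one page each time
        }
        # Optional filters are left out entirely when unset.
        if class_name is not None:
            args["class_name"] = class_name
        if tag is not None:
            args["tag"] = tag
        yield args


pages = list(search_page_args("my-project", "forklift at night", limit=25, max_pages=2))
```

Each yielded dict would be passed as the arguments of one images_search call; a caller would normally stop early once a page comes back with fewer than limit results.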
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds minimal behavioral context beyond this, but doesn't contradict annotations. It implies a search operation which aligns with the read-only nature.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with zero wasted words. It's appropriately sized and front-loaded, communicating the essential purpose efficiently without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (covering safety and idempotency), 100% schema description coverage, and existence of an output schema, the description provides adequate context for a search tool. However, it lacks guidance on when to use this versus sibling search tools, which is a minor gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, all 13 parameters are well-documented in the schema itself. The description doesn't add any parameter-specific information beyond what's already in the schema, so it meets the baseline but doesn't enhance understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Search for images') and scope ('inside a project'), providing a specific verb+resource combination. However, it doesn't differentiate from sibling tools like 'universe_search' or 'images_prepare_upload', which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'universe_search' or other image-related tools. There's no mention of prerequisites, constraints, or comparative use cases with sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_upload_zip_status A Read-only Idempotent
Check the status of a zip upload task.
Returns status (created, running, completed, failed), progress, and result when completed (uploaded count, duplicates, annotation errors, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| task_id | Yes | Task ID from images_prepare_upload_zip response | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
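Because the task moves through created, running, and then completed or failed, a caller typically polls until a terminal status appears. The sketch below assumes the response is a dict with a "status" key and that `get_status` is a hypothetical wrapper around the images_upload_zip_status call; the interval and poll budget are arbitrary.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}  # status names from the description


def wait_for_zip_upload(
    get_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task is completed or failed, or the poll budget runs out."""
    for _ in range(max_polls):
        response = get_status(task_id)
        if response.get("status") in TERMINAL_STATUSES:
            return response
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")


# Stand-in for the real tool call: replays a canned status sequence.
_canned = iter([{"status": "created"}, {"status": "running"}, {"status": "completed"}])
final = wait_for_zip_upload(lambda _task_id: next(_canned), "task-123", sleep=lambda _s: None)
```

Injecting `sleep` keeps the loop testable without real delays; since the tool is marked read-only and idempotent, repeated polling should be safe.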
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, idempotentHint=true, etc., covering safety traits. The description adds valuable context beyond annotations: it details return values (status, progress, result with counts like duplicates) and hints at asynchronous task behavior (statuses like 'running'), which helps the agent understand operational semantics without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific return details. Both sentences earn their place by providing essential information without redundancy, making it appropriately sized and efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter), rich annotations (readOnly, idempotent, etc.), and presence of an output schema, the description is complete enough. It covers purpose, usage hint, and return semantics, aligning well with structured data without needing to explain basic behaviors or output details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the parameter 'task_id' fully documented in the schema. The description adds minimal semantics by referencing the source ('from images_prepare_upload_zip response'), but this is marginal beyond the schema. Baseline 3 is appropriate as the schema carries most of the burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Check') and resource ('status of a zip upload task'), making the purpose specific. It distinguishes from sibling tools like 'images_prepare_upload_zip' (which initiates uploads) and 'images_search' (which searches images), avoiding tautology by not just repeating the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by referencing 'task_id from images_prepare_upload_zip response', guiding when to use this tool after that sibling. However, it lacks explicit when-not-to-use statements or alternatives, such as whether to use other status-checking tools or retry mechanisms.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
meta_feedback_send A
Report a bug, missing feature, UX friction, or documentation issue.
Call this proactively when you encounter errors using Roboflow tools, when the user expresses frustration, when a tool is missing for the task at hand, or when a parameter is poorly documented.
Returns confirmation that the feedback was recorded.
| Name | Required | Description | Default |
|---|---|---|---|
| message | Yes | What happened, what was expected, or what's missing | |
| category | No | Type of feedback | ux-friction |
| tool_name | No | Which tool this relates to, if any | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it explains the proactive nature of calling ('call this proactively'), specifies the types of issues to report, and mentions the return value ('confirmation that feedback was recorded'). Annotations cover safety aspects (non-destructive, non-idempotent, open-world), but the description provides practical usage context that enhances transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well structured and concise: three sentences that efficiently communicate purpose, usage guidelines, and the return value. Every sentence earns its place with no redundant information. It's front-loaded with the core purpose followed by specific usage instructions.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's straightforward purpose, comprehensive annotations, 100% schema coverage, and existence of an output schema, the description provides complete contextual information. It explains what the tool does, when to use it, and what to expect in return, which is sufficient for this type of feedback submission tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents all three parameters. The description doesn't add any parameter-specific information beyond what's in the schema, so it meets the baseline of 3. The description focuses on usage context rather than parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool's purpose as reporting bugs, missing features, UX friction, or documentation issues. It uses specific verbs ('Report') and resources ('feedback'), and clearly distinguishes itself from all sibling tools which are operational Roboflow tools, while this is a meta-feedback mechanism.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'when you encounter errors using Roboflow tools, when the user expresses frustration, when a tool is missing for the task at hand, or when a parameter is poorly documented.' It clearly defines the triggering conditions without needing to reference alternatives since this is the only feedback tool among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get A Read-only Idempotent
Get the top-level summary for a single model evaluation.
Returns the eval metadata plus a summary of mAP / precision / recall on done evals, and an app_url deep link into the UI panel.
| Name | Required | Description | Default |
|---|---|---|---|
| eval_id | Yes | Model evaluation id. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and destructiveHint=false, so the tool is known to be non-destructive. The description adds further context by detailing what the tool returns (a summary of metrics and an app_url), which goes beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two concise sentences. The first sentence states the core purpose, and the second elaborates on the return values. Every word contributes meaning without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one parameter, full schema coverage, rich annotations, and an output schema), the description provides all essential information: what it does and what it returns. It is complete for a retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with one required parameter 'eval_id' described as 'Model evaluation id.' The description does not add any additional meaning or format details beyond the schema, so the baseline score of 3 is justified.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get'), the resource ('a single model evaluation'), and the scope ('top-level summary'). It distinguishes from sibling tools like model_evals_list and model_evals_get_* by specifying that it returns a summary and an app_url deep link.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description does not provide any guidance on when to use this tool versus alternatives like model_evals_get_map_results or model_evals_get_performance_by_class. There is no explicit statement of use cases, prerequisites, or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_confidence_sweep A Read-only Idempotent
Get the precision/recall/F1 confidence sweep for an eval.
For each split returns perThreshold (metrics keyed by threshold like "0.20"), the optimalThreshold and optimalMetrics, and perClass sweeps.
| Name | Required | Description | Default |
|---|---|---|---|
| eval_id | Yes | Model evaluation id. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
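To illustrate how a perThreshold map keyed by strings like "0.20" can be read, the sketch below picks the entry with the highest F1. The "f1" field name inside each entry and the sample numbers are assumptions made for illustration; the tool already reports optimalThreshold, so this is only a way to cross-check or apply a different selection rule.

```python
from typing import Dict, Tuple


def best_f1_threshold(per_threshold: Dict[str, Dict[str, float]]) -> Tuple[float, float]:
    """Return (threshold, f1) for the entry with the highest F1."""
    key = max(per_threshold, key=lambda k: per_threshold[k]["f1"])
    return float(key), per_threshold[key]["f1"]


# Made-up sweep data in the assumed shape.
sweep = {
    "0.20": {"precision": 0.70, "recall": 0.90, "f1": 0.7875},
    "0.40": {"precision": 0.82, "recall": 0.84, "f1": 0.8299},
    "0.60": {"precision": 0.91, "recall": 0.66, "f1": 0.7651},
}
threshold, f1 = best_f1_threshold(sweep)
```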
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint. The description adds value by detailing the output structure (perThreshold, optimalThreshold, etc.), which helps the agent understand the response format without accessing the output schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise: two sentences that front-load the main action and then detail the return structure. No unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description provides a solid overview of the tool's output, noting splits, thresholds, and per-class sweeps. While it could mention the presence of an output schema, the detail is sufficient for a single-parameter read-only tool with good annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage for the single parameter eval_id with a basic description. The tool description does not add further semantic detail beyond the schema, so the baseline score of 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it retrieves a precision/recall/F1 confidence sweep for an eval, which is a specific resource and action. This distinguishes it from sibling tools like get_confusion_matrix or get_performance_by_class.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains what the tool returns but does not provide guidance on when to use it versus alternative eval analysis tools (e.g., confusion matrix, vector analysis). No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_confusion_matrix A Read-only Idempotent
Get the confusion matrix for an eval.
Returns {split, confidenceThreshold, classes, matrix} where matrix[actual][predicted] is the count.
| Name | Required | Description | Default |
|---|---|---|---|
| split | No | Split to inspect — defaults to 'test'. | |
| eval_id | Yes | Model evaluation id. | |
| confidence | No | Integer confidence threshold 0-100. Defaults to the eval's computed optimal threshold when omitted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
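Since matrix[actual][predicted] holds counts, per-class precision and recall follow from column and row sums. The sketch below assumes matrix is a list of lists ordered the same way as classes; whether the real response includes an extra background row or column is not stated here, so the sample uses a plain two-class matrix.

```python
from typing import Dict, List


def per_class_precision_recall(
    classes: List[str], matrix: List[List[int]]
) -> Dict[str, Dict[str, float]]:
    """Derive precision and recall per class from matrix[actual][predicted]."""
    stats: Dict[str, Dict[str, float]] = {}
    for i, name in enumerate(classes):
        true_positive = matrix[i][i]
        actual_total = sum(matrix[i])                    # row sum: everything that is `name`
        predicted_total = sum(row[i] for row in matrix)  # column sum: everything predicted as `name`
        stats[name] = {
            "precision": true_positive / predicted_total if predicted_total else 0.0,
            "recall": true_positive / actual_total if actual_total else 0.0,
        }
    return stats


# Made-up counts: 8 cats predicted as cat, 2 cats predicted as dog, and so on.
stats = per_class_precision_recall(["cat", "dog"], [[8, 2], [1, 9]])
```

Here recall for "cat" is 8 / (8 + 2) and precision is 8 / (8 + 1), which is the row-versus-column reading that the matrix[actual][predicted] convention implies.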
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare this as safe (readOnly, idempotent, not destructive). The description adds value by detailing the return structure (split, confidenceThreshold, classes, matrix) and how matrix indices are interpreted, which goes beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with two sentences: the first states the purpose, and the second explains the return format. There is no unnecessary information or repetition.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the moderate complexity (3 parameters, output schema present), the description covers the essential behavior. It explains the return structure in detail, though it omits error cases or caveats. Still, it is sufficient for an API-focused tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so all parameters are already documented. The tool description does not add any additional meaning or usage hints for the parameters beyond what is in the schema, meeting the baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves the confusion matrix for an evaluation. The verb 'Get' and resource 'confusion matrix' are specific, and the sibling tools (e.g., model_evals_get, model_evals_get_performance_by_class) suggest this is a distinct metric retrieval tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description does not explicitly state when to use this tool versus alternatives like model_evals_get or model_evals_get_performance_by_class. The purpose is implied by the name, but no usage context or exclusions are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_image_predictions A Read-only Idempotent
Get per-image prediction stats for an eval (paginated).
Returns {split, confidenceThreshold, totalImages, offset, limit, images: [...]}. Each image carries TP/FP/FN counts, precision, recall, F1, the assigned cluster, and a per-class confusion list.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Page size (default 200, max 1000). | |
| split | No | Split to inspect — defaults to 'test'. | |
| offset | No | Offset into the result set for pagination. | |
| eval_id | Yes | Model evaluation id. | |
| confidence | No | Integer confidence threshold 0-100. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
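Two small calculations are implied by this tool: which offsets cover totalImages at a given page size, and how F1 relates to the TP/FP/FN counts each image carries. The sketch below uses hypothetical helper names and the standard F1 formula; it does not claim to reproduce how the server computes its values.

```python
from typing import List


def page_offsets(total_images: int, limit: int = 200) -> List[int]:
    """Offsets needed to cover total_images with the given page size."""
    limit = max(1, min(limit, 1000))  # the schema lists default 200, max 1000
    return list(range(0, total_images, limit))


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


offsets = page_offsets(450)
score = f1_from_counts(tp=8, fp=2, fn=2)
```

With totalImages taken from the first response, the remaining offsets can be requested one page at a time.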
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnly, idempotent, non-destructive behavior. The description adds pagination details and return format, which supplements the annotations without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences: the first states the purpose, the second gives the return format, and the third lists the per-image fields. Front-loaded and efficient, with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema and rich annotations, the description covers the return format and pagination. It is complete for a read-only evaluation stats tool with clear parameters.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
All parameters have descriptions in the schema (100% coverage). The description mentions return fields but adds minimal extra meaning beyond the schema, meeting the baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's function: 'Get per-image prediction stats for an eval (paginated).' It specifies the verb, resource, and scope, and distinguishes itself from sibling tools like model_evals_get_confidence_sweep and model_evals_get_confusion_matrix.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives, such as other model_evals_get_* tools. The description only declares what it does, without context on selection or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_map_results A Read-only Idempotent
Get per-split mAP results for an eval.
Returns {splits: {train, valid, test}} where each split has overall map50 / map50_95 / map75, byObjectSize (small/medium/large), and perClass breakdowns.
| Name | Required | Description | Default |
|---|---|---|---|
| eval_id | Yes | Model evaluation id. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
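As a rough illustration of reading the {splits: {train, valid, test}} shape, the sketch below collects one metric per split and compares train against test. It assumes each split object exposes map50 as a top-level key; the sample values are made up.

```python
from typing import Any, Dict


def metric_by_split(results: Dict[str, Any], metric: str = "map50") -> Dict[str, float]:
    """Collect one overall metric for every split that reports it."""
    return {
        name: split[metric]
        for name, split in results["splits"].items()
        if metric in split
    }


# Made-up response in the assumed shape.
sample = {
    "splits": {
        "train": {"map50": 0.75, "map50_95": 0.5},
        "valid": {"map50": 0.625, "map50_95": 0.4},
        "test": {"map50": 0.5, "map50_95": 0.3},
    }
}
by_split = metric_by_split(sample)
gap = by_split["train"] - by_split["test"]  # a large gap can hint at overfitting
```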
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint false. The description adds behavioral detail by specifying the exact return structure (e.g., splits, map50/75, byObjectSize, perClass), which is valuable beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (two sentences) and front-loaded with the primary purpose. The inline return structure is informative, though slightly verbose for a tool description.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema and the detailed return description, the tool is fully documented. All key aspects (purpose, parameters, behavior, output) are covered, making it complete for an information-retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description does not add extra meaning to the eval_id parameter beyond what the schema already provides ('Model evaluation id').
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (Get) and the resource (per-split mAP results for an eval), specifying exactly what data is returned, which distinguishes it from sibling tools like model_evals_get_confidence_sweep or model_evals_get_image_predictions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies it's for obtaining mAP metrics per split, but it does not explicitly state when to use this tool versus other model_evals_* tools, leaving it to the agent to infer from the specialization.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_performance_by_class B Read-only Idempotent
Get per-class performance metrics for a single split.
Returns {split, classes: [{className, map50, map50_95, map75, precision, recall, f1, optimalThreshold}, ...]}.
| Name | Required | Description | Default |
|---|---|---|---|
| split | No | Split to inspect — defaults to 'test'. 'all' is rejected by the API. | |
| eval_id | Yes | Model evaluation id. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
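One common use of a per-class breakdown is finding the classes that drag overall performance down. The sketch below ranks classes by a chosen metric using the className and f1 field names from the return shape above; the helper name and sample values are illustrative only.

```python
from typing import Any, Dict, List


def weakest_classes(response: Dict[str, Any], n: int = 3, metric: str = "f1") -> List[str]:
    """Names of the n classes with the lowest value for `metric`."""
    ranked = sorted(response["classes"], key=lambda c: c[metric])
    return [c["className"] for c in ranked[:n]]


# Made-up response in the documented shape (only the fields used here).
sample = {
    "split": "test",
    "classes": [
        {"className": "helmet", "f1": 0.9},
        {"className": "vest", "f1": 0.6},
        {"className": "glove", "f1": 0.75},
    ],
}
worst = weakest_classes(sample, n=2)
```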
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description adds the return structure but no additional behavioral traits (e.g., auth needs, rate limits, side effects). It does not contradict the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences: first states purpose, second gives return format. No extraneous text. Front-loaded and efficient. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (two parameters, one required) and presence of an output schema, the description covers the essentials. It explains what is returned and the default split behavior. Could add more context about when classes appear or edge cases, but it is complete for most use cases.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, and both parameters have descriptive schema text. The description repeats the split default and rejection of 'all' already in the schema, adding no new semantics. Baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Get per-class performance metrics for a single split' with a specific verb and resource. It also provides the return format. However, it does not explicitly differentiate from sibling tools like model_evals_get_map_results or model_evals_get_confidence_sweep, though the name and description imply the scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives. The description does not mention when-not-to-use, prerequisites, or exclusions. Sibling tools exist for other metrics (e.g., confusion matrix, recommendations), but the description offers no direction on choosing among them.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_recommendations (Grade A, Read-only, Idempotent)
Get the LLM-generated recommendations for an eval, if available.
Returns {generated: false} until the recommendations job has
run. When ready, returns {generated: true, generatedAt,
recommendations: {summary, items}}.
| Name | Required | Description | Default |
|---|---|---|---|
| eval_id | Yes | Model evaluation id. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
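The two return shapes in the description above suggest a simple client-side check. The sketch below is illustrative only: `extract_recommendations` is a hypothetical helper, the sample responses are made up, and the type of `items` is assumed to be a list because the description does not specify it.

```python
from typing import Any, Optional


def extract_recommendations(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the recommendations payload, or None while the job has not run."""
    if not response.get("generated", False):
        return None  # {generated: false}: nothing to read yet
    recommendations = response["recommendations"]
    return {
        "generated_at": response.get("generatedAt"),
        "summary": recommendations["summary"],
        "items": recommendations["items"],
    }


# Made-up responses that follow the two documented shapes.
pending = {"generated": False}
ready = {
    "generated": True,
    "generatedAt": "2025-01-01T00:00:00Z",
    "recommendations": {"summary": "Example summary text.", "items": []},
}

print(extract_recommendations(pending))
print(extract_recommendations(ready)["summary"])
```

Returning `None` for the pending case lets a caller decide whether to retry later instead of treating the missing payload as an error.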
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true; description adds behavioral context by stating it returns {generated: false} until the job runs and then includes recommendations. No auth or rate limit details are disclosed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences covering purpose, return format, and behavioral nuance. No extraneous information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
An output schema exists (not shown here), and the description also explains the return shape. Together with the annotations this is fairly complete, though the description could state whether recommendations are always available and what prerequisites apply.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage for eval_id. The description does not add additional meaning beyond what the schema provides, so baseline 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool gets LLM-generated recommendations for an eval, distinguishing it from sibling tools like model_evals_get which likely retrieves the eval itself.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Usage is implied (call it when you want recommendations after an eval), but the description gives no explicit when-not-to-use guidance and names no alternatives. It also lacks guidance on prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_get_vector_analysis (Grade A, Read-only, Idempotent)
Get UMAP + HDBSCAN clustering of image embeddings for an eval.
Returns {clustering, preprocessing, clusters: [...]} — useful
for finding pockets of poor-performing images. Each cluster carries
numImages, splitDistribution, F1 stats, and sample image refs.
| Name | Required | Description | Default |
|---|---|---|---|
| eval_id | Yes | Model evaluation id. | |
| confidence | No | Integer confidence threshold 0-100 for cluster F1 stats. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
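To show how the cluster list could be used to find pockets of poor-performing images, here is a minimal sketch that ranks clusters by F1. It rests on assumptions: the description only says each cluster carries "F1 stats", so the `f1` key is hypothetical, as are the `id` key, the `worst_clusters` helper, and the sample data. Only `clusters` and `numImages` come from the description.

```python
from typing import Any


def worst_clusters(
    response: dict[str, Any], top_n: int = 3, min_images: int = 1
) -> list[dict[str, Any]]:
    """Return up to top_n clusters with the lowest F1, smallest first."""
    clusters = [
        cluster
        for cluster in response.get("clusters", [])
        # "f1" is a hypothetical key; the source only says clusters carry F1 stats.
        if cluster.get("numImages", 0) >= min_images and cluster.get("f1") is not None
    ]
    return sorted(clusters, key=lambda cluster: cluster["f1"])[:top_n]


# Made-up response for illustration; "id" is also a hypothetical key.
sample = {
    "clusters": [
        {"id": 0, "numImages": 120, "f1": 0.91},
        {"id": 1, "numImages": 14, "f1": 0.42},
        {"id": 2, "numImages": 0, "f1": 0.10},
        {"id": 3, "numImages": 33, "f1": 0.67},
    ]
}

lowest = worst_clusters(sample, top_n=2)
print([cluster["id"] for cluster in lowest])
```

The `min_images` filter is a design choice for the sketch: very small clusters can have unstable F1 values, so a caller may want to exclude them before ranking.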
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, openWorldHint, and idempotentHint; description adds value by detailing the return structure and cluster attributes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences: first states purpose, second shows return format, third explains utility. No wasted words and front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists, the description sufficiently covers purpose, return structure, and use case. No gaps for this moderately complex tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage; the description does not add extra meaning beyond what is already in the schema for eval_id and confidence.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool retrieves UMAP and HDBSCAN clustering of image embeddings for an eval, distinguishing it from other model_evals_get_* tools like confusion matrix or confidence sweep.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions it is useful for finding poor-performing image pockets, implying when to use, but lacks explicit guidance on when not to use or alternative tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
model_evals_list (Grade A, Read-only, Idempotent)
List model evaluations in the current workspace.
Returns {evals: [...]} where each entry has evalId, status,
project (URL slug), versionId, modelId, and createdAt. Use
model_evals_get for the headline summary (mAP/precision/recall)
on a specific eval.
At most one of project / version / model may be set per
call (most-specific wins). Combinations return 400.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of evals to return (default 50, max 200). | |
| model | No | Filter by model id. | |
| status | No | Filter by eval status. | |
| project | No | Filter by project URL slug (e.g. 'chess-pieces-fmhpz'). | |
| version | No | Filter by version number (string). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
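Because combining `project`, `version`, and `model` returns a 400, a caller may prefer to reject the combination before sending the request. The following is a minimal sketch of such a check; `build_model_evals_list_args` is a hypothetical helper, and the lower bound of 1 on `limit` is an assumption (the table only states the default of 50 and the maximum of 200).

```python
from typing import Any, Optional


def build_model_evals_list_args(
    project: Optional[str] = None,
    version: Optional[str] = None,
    model: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 50,
) -> dict[str, Any]:
    """Build the arguments dict, enforcing the one-scope-filter rule locally."""
    scope = {"project": project, "version": version, "model": model}
    chosen = {name: value for name, value in scope.items() if value is not None}
    if len(chosen) > 1:
        raise ValueError(
            f"set at most one of project/version/model, got {sorted(chosen)}"
        )
    # Upper bound is documented; the lower bound of 1 is an assumption.
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    arguments: dict[str, Any] = {"limit": limit, **chosen}
    if status is not None:
        arguments["status"] = status
    return arguments


print(build_model_evals_list_args(project="chess-pieces-fmhpz"))
```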
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnly, openWorld, idempotent, non-destructive. Description adds constraints on filter combinations and the return structure, providing behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two focused sentences plus a constraint note. No fluff, front-loaded with purpose, and uses clear structure.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With output schema present, description doesn't need to detail return values. It covers filtering constraints, sibling differentiation, and basic behavior. Complete for a list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema descriptions cover all 5 parameters (100% coverage). Description adds the rule that at most one of project/version/model may be set, returning 400 otherwise, which is crucial usage info not in schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it lists model evaluations in the current workspace, specifying the returned fields. Distinguishes from sibling model_evals_get by directing to it for detailed summary.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly tells when to use model_evals_get instead for detailed summaries on a specific eval. Also warns that at most one filter out of project/version/model can be set, avoiding errors.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_get (Grade B, Read-only, Idempotent)
Get details for a trained model.
| Name | Required | Description | Default |
|---|---|---|---|
| model_id | Yes | Model id (URL slug). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a read-only, non-destructive, idempotent operation with open-world data, so the description doesn't need to repeat these safety traits. It adds value by specifying that it retrieves 'details' for a 'trained model', which implies comprehensive metadata beyond basic info, but doesn't elaborate on what those details include or any behavioral nuances like error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with no wasted words. It's front-loaded with the core action and resource, making it easy to parse quickly without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the annotations cover key behavioral aspects (read-only, non-destructive, etc.) and an output schema exists (so return values are documented elsewhere), the description is reasonably complete for a simple lookup tool. However, it could be more helpful by clarifying the scope of 'details' or distinguishing it from sibling tools, especially since the context includes many model-related alternatives.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents the single required parameter 'model_id' as a 'Model id (URL slug)'. The description doesn't add any parameter-specific information beyond implying that 'model_id' corresponds to a 'trained model', which is already inferred from the schema. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get details') and resource ('trained model'), making the purpose immediately understandable. However, it doesn't differentiate this from sibling tools like 'models_get_training_status' or 'models_list', which also retrieve model-related information but with different scopes or details.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose 'models_get' over 'models_list' (for listing all models) or 'models_get_training_status' (for status details), nor does it specify prerequisites like needing a specific model ID.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_get_training_status (Grade A, Read-only, Idempotent)
Get the training progress and metrics for a dataset version.
Use this tool to check on a training job started with models_train.
Returns training status, progress (current/total epochs), latest metrics (mAP, loss), and the URL to view training in the dashboard.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number being trained | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
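Checking on a job started with models_train usually means calling this tool repeatedly. The sketch below shows one way to structure that loop without assuming anything about the transport: `fetch_status` stands in for whatever makes the tool call, and `is_finished` is supplied by the caller because the listing does not enumerate the status values. The `status` key and the values `"running"` and `"finished"` in the usage lines are hypothetical.

```python
import time
from typing import Any, Callable


def wait_for_training(
    fetch_status: Callable[[], dict[str, Any]],
    is_finished: Callable[[dict[str, Any]], bool],
    interval_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until is_finished(status) is true or max_polls is exhausted."""
    for attempt in range(max_polls):
        status = fetch_status()
        if is_finished(status):
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # no sleep after the final poll
    raise TimeoutError(f"training not finished after {max_polls} polls")


# Usage with a fake status source; key and values are made up.
fake_responses = iter(
    [{"status": "running"}, {"status": "running"}, {"status": "finished"}]
)
final = wait_for_training(
    fetch_status=lambda: next(fake_responses),
    is_finished=lambda status: status["status"] == "finished",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final)
```

Injecting `sleep` keeps the loop testable and lets a caller swap in a different waiting strategy.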
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare this as read-only, non-destructive, idempotent, and open-world, covering the core safety profile. The description adds valuable behavioral context by specifying what information is returned (training status, progress, metrics, dashboard URL) and that it's for monitoring ongoing training jobs, which enhances understanding beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly structured: first sentence states the purpose, second provides usage guidance, third details the return values. Every sentence earns its place with zero waste, and it's front-loaded with the most important information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that annotations cover safety aspects, schema covers parameters fully, and an output schema exists (so return values don't need explanation in description), the description provides exactly what's needed: clear purpose, usage guidance, and context about what information is returned. It's complete for this tool's complexity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters (project_id and version_number) fully documented in the schema. The description doesn't add any parameter-specific information beyond what's already in the schema, so it meets the baseline of 3 for adequate coverage without extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get the training progress and metrics') and resource ('for a dataset version'), distinguishing it from sibling tools like models_train (which starts training) and models_get (which likely retrieves model metadata). It precisely identifies what this tool does.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool: 'Use this tool to check on a training job started with models_train.' This provides clear context and directly names the alternative tool (models_train) for comparison, giving perfect guidance on when this tool is appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_infer (Grade A, Read-only, Idempotent)
Run hosted inference on an image using a trained model. Returns JSON predictions only. For visualized/annotated images, use workflow_specs_run with a visualization block instead.
| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Image as an HTTPS URL or base64-encoded string. | |
| overlap | No | Overlap threshold between 0.0 and 1.0 for object detection | |
| model_id | Yes | Hosted model as 'project_id/version'. | |
| confidence | No | Confidence threshold between 0.0 and 1.0 | |
| project_type | No | Optional project type override. If omitted, MCP derives it from project metadata. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
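As an illustration of the parameter constraints in the table above, the following sketch builds an MCP `tools/call` request for models_infer and validates the documented ranges first. `build_models_infer_call` is a hypothetical helper, and the image URL and model id in the usage lines are placeholders, not real resources.

```python
import json
from typing import Any, Optional


def build_models_infer_call(
    image: str,
    model_id: str,
    confidence: Optional[float] = None,
    overlap: Optional[float] = None,
    request_id: int = 1,
) -> dict[str, Any]:
    """Return a JSON-RPC 2.0 tools/call request for the models_infer tool."""
    project, separator, version = model_id.rpartition("/")
    if not separator or not project or not version:
        raise ValueError("model_id must look like 'project_id/version'")
    arguments: dict[str, Any] = {"image": image, "model_id": model_id}
    for name, value in (("confidence", confidence), ("overlap", overlap)):
        if value is None:
            continue  # optional thresholds are omitted when unset
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        arguments[name] = value
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "models_infer", "arguments": arguments},
    }


# Placeholder image URL and model id for illustration only.
request = build_models_infer_call(
    image="https://example.com/sample.jpg",
    model_id="my-project/3",
    confidence=0.4,
)
print(json.dumps(request, indent=2))
```

Note that this tool takes thresholds as fractions between 0.0 and 1.0, whereas model_evals_get_vector_analysis documents `confidence` as an integer from 0 to 100, so a value should not be passed between the two unchanged.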
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it specifies the output format ('Returns JSON predictions only'), which annotations don't cover. Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, so the agent knows this is a safe, repeatable query operation. The description complements this by clarifying the output type, earning a high score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded: the first sentence states the core purpose, and the second provides critical usage guidance. Every sentence earns its place with no wasted words, making it highly efficient for an AI agent to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (inference with multiple parameters), rich annotations (covering safety and behavior), and the presence of an output schema, the description is complete enough. It clarifies the purpose, distinguishes from siblings, and specifies output format, addressing key contextual gaps without redundancy.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description does not add any parameter-specific information beyond what the input schema provides. However, with schema description coverage at 100%, the baseline score is 3. The schema already fully documents all parameters (image, overlap, model_id, confidence, project_type), so no additional compensation is needed from the description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Run hosted inference on an image using a trained model') and distinguishes it from a sibling tool ('For visualized/annotated images, use workflow_specs_run with a visualization block instead'). It identifies both the verb (run inference) and resource (image with trained model) with precision.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool versus an alternative: 'For visualized/annotated images, use workflow_specs_run with a visualization block instead.' This directly addresses the key decision point between this tool and its sibling, offering clear exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_list (Grade A, Read-only, Idempotent)
List trained models associated with a project.
Each row carries metrics and (for NAS children) nasFamily, group,
train.results.{hardware,latency,map5095,paretoOptimalFor}, plus a
derived recommended flag when the model is the recommended pick for
any (metric, hardware) bucket on its parent version.
Pass group=<modelGroup> to filter to a single NAS run — that is the
canonical "list NAS models per run" path.
| Name | Required | Description | Default |
|---|---|---|---|
| group | No | Optional NAS modelGroup to scope the list to a single NAS run. Get this value from trainings_get_results. | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
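The `recommended` flag and `group` field described above can be combined to pick deployable candidates from a NAS run. This is a minimal sketch under assumptions: `recommended_model_ids` is a hypothetical helper, the rows are made up, and the flat placement of `recommended`, `group`, and `url` on each row is inferred from the description rather than from a published schema.

```python
from typing import Any, Optional


def recommended_model_ids(
    rows: list[dict[str, Any]], group: Optional[str] = None
) -> list[str]:
    """Return the `url` of every row flagged recommended, optionally per group."""
    return [
        row["url"]
        for row in rows
        if row.get("recommended") and (group is None or row.get("group") == group)
    ]


# Made-up rows for illustration; ids and group names are placeholders.
rows = [
    {"url": "demo-nas-gpu-a", "group": "run-1", "recommended": True},
    {"url": "demo-nas-gpu-b", "group": "run-1", "recommended": False},
    {"url": "demo-nas-cpu-a", "group": "run-2", "recommended": True},
    {"url": "demo-baseline"},  # non-NAS row without NAS fields
]

print(recommended_model_ids(rows))
print(recommended_model_ids(rows, group="run-1"))
```

Passing `group` to the tool itself filters on the server and is the path the description calls canonical; the client-side filter here is only useful when the full list has already been fetched.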
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a read-only, non-destructive, idempotent operation with open-world data. The description adds value by clarifying the scope ('trained models associated with a project'), which isn't covered by annotations. It doesn't contradict annotations—listing is consistent with read-only behavior—and provides useful context about what data is returned.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that efficiently conveys the tool's purpose without unnecessary words. It's front-loaded with the core action and resource, making it easy to parse. Every part of the sentence earns its place by specifying scope ('trained models') and context ('associated with a project').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (one required parameter), rich annotations (covering safety and behavior), and presence of an output schema (which handles return values), the description is reasonably complete. It covers the essential 'what' and 'scope', though it could be enhanced with more usage guidance. For a simple list operation with good structured data, this is sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the single parameter 'project_id' fully documented in the schema. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., no examples or constraints). Baseline is 3 since the schema handles parameter documentation adequately, and the description doesn't compensate with extra insights.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('trained models associated with a project'), making the purpose immediately understandable. It distinguishes from siblings like 'models_get' (singular) and 'models_train' (creation), though it doesn't explicitly name alternatives. The description avoids tautology by specifying scope beyond just the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'associated with a project', suggesting this tool is for retrieving models within a specific project. However, it doesn't provide explicit guidance on when to use this versus alternatives like 'models_get' (for single model details) or 'projects_list' (for listing projects). No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_star_nas (Grade A, Idempotent)
Star or unstar a NAS-trained model.
NAS-only by design — the server rejects non-NAS modelTypes with a
MODEL_NOT_NAS error. Starring triggers TRT compilation for the
model's recommended hardware so the model becomes deployable as an
edge target.
| Name | Required | Description | Default |
|---|---|---|---|
| starred | No | True to star, false to unstar. | |
| model_id | Yes | Public model id (the `url` field from models_list[] or models[].modelId from trainings_get_results), e.g. 'beer-can-hackathon-410-nas-gpu-b'. Just the bare id — workspace prefix added automatically. |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotency and non-destructiveness. The description adds critical behavioral context: triggering TRT compilation and the server error for non-NAS models. No contradictions; description enriches understanding beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, straight to the point, with no extraneous information. Every sentence adds value: first defines action and scope, second explains limitation and side effect. Highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 parameters, 1 required) and the presence of an output schema, the description is fully adequate. It covers purpose, constraints, and side effects, leaving no major gaps for an agent to understand correct usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and each parameter has a clear description in the schema. The tool description does not add significant new semantic meaning beyond what the schema already provides, so a baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Star or unstar a NAS-trained model', identifying the verb and resource. It distinguishes from sibling tools by specifying NAS-only scope and triggering TRT compilation, making the tool's unique role evident.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context: 'NAS-only by design — the server rejects non-NAS modelTypes' tells exactly when to use and when to avoid. It also explains the effect of starring, but does not explicitly contrast with alternatives (though none exist). Leaves little ambiguity about applicability.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_train (Grade A)
Start training a model on a dataset version.
IMPORTANT: A dataset version must exist before training. Use the versions_generate tool first to create one with the desired preprocessing and augmentation settings.
IMPORTANT: Each version can only have ONE trained model. If this version already has a model, you must generate a new version first with versions_generate, then train on that new version.
This tool validates prerequisites before starting training: it checks the version has no existing model and that the required dataset export is ready. If the export is not ready, it will be triggered automatically — wait ~30 seconds and retry.
Training runs in the background on Roboflow servers.
| Name | Required | Description | Default |
|---|---|---|---|
| speed | No | Training speed (deprecated — model_type is usually sufficient) | |
| epochs | No | Number of training epochs (max 300) | |
| checkpoint | No | Checkpoint to initialize from (COCO, a Universe model, or a previous version) | |
| model_type | No | Model architecture ID. Examples: 'rfdetr-medium' (object detection, recommended), 'yolov11n' (fast YOLO), 'rfdetr-seg-medium' (instance segmentation), 'yolo26n-pose' (keypoint), 'vit-base-patch16-224-in21k' (classification), 'qwen3_5-2b-peft' (multimodal/VLM). For NAS sweeps: 'rfdetr-nas-parent' (requires ≥15 validation images). If an invalid ID is passed, the API returns the full list of valid model types for the project's task. Read the training-and-evaluation skill for the complete list and model selection guidance. | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number to train on | |
| business_context | No | Brief description of the business problem being solved. The AI assistant should fill this from conversation context. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
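The "wait ~30 seconds and retry" guidance above can be sketched as a small retry wrapper, paired with a check for the documented 300-epoch maximum. Everything named here is hypothetical: `build_models_train_args`, `start_training_with_retry`, and the `export_ready` key in the usage lines. How the tool actually signals that the export is not ready is not stated in the listing, so the caller supplies that test as `export_pending`.

```python
import time
from typing import Any, Callable, Optional

MAX_EPOCHS = 300  # documented maximum for the epochs parameter


def build_models_train_args(
    project_id: str,
    version_number: int,
    model_type: Optional[str] = None,
    epochs: Optional[int] = None,
) -> dict[str, Any]:
    """Build models_train arguments, rejecting out-of-range epochs locally."""
    if epochs is not None and not 1 <= epochs <= MAX_EPOCHS:
        raise ValueError(f"epochs must be between 1 and {MAX_EPOCHS}")
    arguments: dict[str, Any] = {
        "project_id": project_id,
        "version_number": version_number,
    }
    if model_type is not None:
        arguments["model_type"] = model_type
    if epochs is not None:
        arguments["epochs"] = epochs
    return arguments


def start_training_with_retry(
    start: Callable[[], dict[str, Any]],
    export_pending: Callable[[dict[str, Any]], bool],
    wait_seconds: float = 30.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call start() again while the response says the export is not ready."""
    for attempt in range(1, max_attempts + 1):
        response = start()
        if not export_pending(response):
            return response
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise TimeoutError(f"export still pending after {max_attempts} attempts")


# Usage with a fake tool call; "export_ready" is a made-up response key.
train_args = build_models_train_args("my-project", 3, "rfdetr-medium", epochs=100)
fake_responses = iter([{"export_ready": False}, {"export_ready": True}])
waits: list[float] = []
outcome = start_training_with_retry(
    start=lambda: next(fake_responses),
    export_pending=lambda response: not response["export_ready"],
    sleep=waits.append,  # record waits instead of sleeping in the demo
)
print(train_args, outcome, waits)
```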
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond what annotations provide: it explains the asynchronous nature ('returns immediately', 'training runs in the background'), specifies the prerequisite checks needed, and mentions the return includes a progress monitoring URL. Annotations already indicate this is not read-only, not idempotent, and not destructive, but the description enriches this with operational details without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with clear sectioning: it states the core purpose upfront, provides two IMPORTANT prerequisite warnings, explains the asynchronous behavior, and describes the return value. Every sentence serves a distinct purpose with zero redundancy, making it easy to scan and understand.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (asynchronous training operation with prerequisites), the description provides complete contextual guidance. It covers prerequisites, behavioral expectations, and return information. With both comprehensive annotations and an output schema available, the description appropriately focuses on operational context rather than repeating structured data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already documents all 7 parameters thoroughly. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., it doesn't explain default behaviors for optional parameters like speed or epochs). This meets the baseline expectation when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Start training a model') on a specific resource ('on a dataset version'). It distinguishes this tool from siblings like models_get_training_status (monitoring) and versions_generate (prerequisite). The verb 'start' accurately reflects the asynchronous nature of the operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit, imperative guidance on when to use this tool: it names two prerequisite tools (versions_generate and versions_get) that must be called first, specifies the conditions that must be met (dataset version must exist, must have train/validation images), and distinguishes this from monitoring tools by noting it 'returns immediately' while training runs in background.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_create (B)
Create a new computer vision project.
Returns the created project's id, name, type, and url.
| Name | Required | Description | Default |
|---|---|---|---|
| license | No | Project license. Required for public/research workspaces. | |
| annotation | Yes | What you're annotating (e.g. 'defects', 'vehicles') | |
| project_name | Yes | Display name for the project | |
| project_type | Yes | Computer vision task type | |
| business_context | No | Brief description of the business problem being solved. The AI assistant should fill this from conversation context. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a write operation (readOnlyHint=false) that's non-destructive and non-idempotent. The description adds minimal behavioral context beyond this - it mentions the return values (which the output schema would cover) but doesn't provide additional behavioral details like authentication requirements, rate limits, or what happens if creation fails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two focused sentences: one stating the action and one describing the return values. Every word earns its place, and the information is front-loaded with the primary purpose stated first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a creation tool with comprehensive annotations and a complete input schema, the description provides adequate context. The presence of an output schema means the description doesn't need to detail return values. However, it could better address when to use this tool versus alternatives given the sibling tools available.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents all 5 parameters. The description adds no parameter-specific information beyond what's in the schema. The baseline score of 3 reflects adequate coverage when the schema does all the work.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a new computer vision project') and the resource ('computer vision project'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from sibling tools like 'workflows_create' or 'annotation_jobs_create' which might also create related resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. There's no mention of prerequisites, when this should be used instead of other creation tools like 'workflows_create', or what context should trigger its use. The agent must infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_fork (A)
Enqueue an async fork of a public Universe project into your workspace.
Provide either a Universe url or source_project.
Returns the platform's 202 payload verbatim: at minimum { taskId, url }, where url points at the async-task polling endpoint.
Poll for completion with async_tasks_get(task_id=taskId) every 5 seconds until status is terminal (completed or failed).
Processing may take up to 30 seconds to start.
| Name | Required | Description | Default |
|---|---|---|---|
| url | No | Full Universe project URL, e.g. https://universe.roboflow.com/workspace/project | |
| source_project | No | Universe project slug to fork into your workspace; optional if url is provided | |
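The fork-then-poll flow described above can be sketched as follows. This is a minimal sketch under stated assumptions: `get_task` stands in for whatever function your MCP client exposes to call async_tasks_get, the helper names are hypothetical, and the 300-second overall timeout is an arbitrary choice. Only the 5-second interval and the terminal states (completed, failed) come from the tool description.

```python
import time
from typing import Any, Callable, Optional

TERMINAL_STATUSES = {"completed", "failed"}  # terminal states named in the description


def build_fork_args(url: Optional[str] = None, source_project: Optional[str] = None) -> dict[str, str]:
    """Hypothetical helper: require at least one of url or source_project."""
    if not url and not source_project:
        raise ValueError("provide either url or source_project")
    args: dict[str, str] = {}
    if url:
        args["url"] = url
    if source_project:
        args["source_project"] = source_project
    return args


def poll_until_terminal(
    get_task: Callable[[str], dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,    # polling interval from the description
    timeout_s: float = 300.0,   # assumption: overall cap chosen for this sketch
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call get_task(task_id) every interval_s seconds until status is terminal."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not terminal after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A caller would still need to inspect the returned status, since "failed" is terminal too.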
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description fully discloses the async nature, the need to poll for completion, the expected startup delay (processing may take up to 30 seconds to begin), and the minimum response fields. This adds significant value beyond the annotations, which only provide hints about readOnly, openWorld, idempotent, and destructive behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (5 sentences), front-loaded with the main action, and every sentence adds necessary information without redundancy. It is well-structured and efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (async, polling needed, two parameters), the description covers all critical aspects: how to invoke it, what the response contains, how to check completion, and approximate timing. The output schema is mentioned, so further detail is not needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for both parameters, and the description reinforces that either url or source_project should be provided, with an example URL. While the schema already documents the parameters, the description adds context about their relationship and usage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool enqueues an async fork of a public Universe project into the user's workspace, specifying inputs (url or source_project) and the response shape, which distinguishes it from sibling tools like projects_create or projects_get.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear usage context: it tells the agent to provide either a Universe url or source_project, and explains that the response includes a taskId for polling via async_tasks_get. It does not explicitly state when not to use it or list alternatives, but the guidance is strong enough for correct selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_get (A, Read-only, Idempotent)
Get detailed info about a project including versions, classes, splits, and trained models.
Returns full project details with workspace, project info, versions, and classes.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (readOnly, openWorld, idempotent, non-destructive). The description adds context about what information is returned (workspace, project info, versions, classes) which helps the agent understand the scope of data retrieved. However, it doesn't mention rate limits, authentication requirements, or pagination behavior beyond what annotations cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two clear sentences that efficiently convey the tool's purpose and return value. The first sentence states what the tool does, the second clarifies the scope of returned data. No redundant information or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has comprehensive annotations, 100% schema coverage, and an output schema exists, the description provides adequate context. It clearly explains this is a retrieval operation for detailed project information. The main gap is lack of explicit guidance on when to use versus sibling tools like 'projects_list'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with the single parameter 'project_id' fully documented in the schema. The description doesn't add any parameter-specific information beyond what's in the schema. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but doesn't need to.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose as 'Get detailed info about a project' with specific components mentioned (versions, classes, splits, trained models). It distinguishes from siblings like 'projects_list' (which likely lists projects without details) and 'projects_create' (which creates rather than retrieves). However, it doesn't explicitly contrast with 'projects_list' in the description text itself.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when detailed project information is needed, but doesn't explicitly state when to use this tool versus alternatives like 'projects_list' (for listing projects) or 'versions_get' (for specific version details). No explicit 'when-not' guidance or prerequisite context is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_health (A, Read-only, Idempotent)
Get dataset health check for a project.
Running this for the first time or with regenerate=True might take up to a couple of minutes.
Returns class balance, image quality issues, annotation statistics, and other dataset health metrics.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| regenerate | No | Set to true to regenerate the health report from scratch. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds context about potential multi-minute latency and the nature of returned metrics, which is useful beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences: purpose, caveat, then return items. Every sentence adds value, no fluff, well front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description adequately covers purpose, timing, and return contents. It is complete for a 2-param tool with good annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage, with both parameters explained. The description only adds a timing remark for regenerate, which does not significantly enhance semantic understanding beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb 'Get' and resource 'dataset health check', and distinguishes from siblings as no other tool mentions health checks. It clearly states what it returns (class balance, image quality, etc.).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a timing caveat for first use or with regenerate=True, implying when to expect delays. However, it does not explicitly state when to use this tool vs alternatives, nor does it mention when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_list (A, Read-only, Idempotent)
List all projects in the workspace associated with the API key.
Returns a list of projects with id, name, type, image count, and more.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (readOnlyHint: true, openWorldHint: true, idempotentHint: true, destructiveHint: false), so the description's burden is lower. It adds some value by specifying the return format ('list of projects with id, name, type, image count, and more'), but doesn't disclose additional behavioral traits like pagination, rate limits, or authentication requirements beyond the API key mention.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and well-structured with just two sentences: the first states the purpose, and the second describes the return format. Every word earns its place, and it's front-loaded with the core functionality.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (0 parameters, rich annotations, and an output schema exists), the description is reasonably complete. It covers the purpose and return format, though it could benefit from usage guidelines relative to siblings. The output schema likely details the return values, so the description doesn't need to fully explain them.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0 parameters and 100% schema description coverage, the schema fully documents the lack of inputs. The description appropriately doesn't discuss parameters, which is correct for this tool, earning a high baseline score. No additional parameter semantics are needed or provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List all projects') and resource ('projects in the workspace associated with the API key'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this from sibling tools like 'projects_get' or 'projects_create', which would be needed for a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'projects_get' (for a single project) or 'projects_create' (for creating new projects). It also doesn't mention any prerequisites, context, or exclusions for usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
trainings_cancel (A, Destructive, Idempotent)
Cancel an in-flight training run.
Works for any architecture. For NAS runs, the underlying handler accepts the mining status, so the same call cancels either the mining or the training phase.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug; 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number whose training to cancel. | |
| continue_if_no_refund | No | If true, cancel even when the run is past the refund window. Default false (the server will return refund:false without cancelling). | |
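One way a client might handle the refund guard is to attempt a refund-safe cancel first and retry with continue_if_no_refund only when the caller has agreed to forgo the refund. The sketch below assumes a generic `call_tool(name, arguments)` callable supplied by your MCP client; that callable and the helper name `cancel_training` are hypothetical. The `refund` field is taken from the parameter description above, and the rest of the response shape is not assumed.

```python
from typing import Any, Callable

# Assumption: the MCP client exposes something shaped like call_tool(name, arguments).
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def cancel_training(
    call_tool: ToolCaller,
    project_id: str,
    version_number: int,
    allow_no_refund: bool = False,
) -> dict[str, Any]:
    """Try a refund-safe cancel first; retry without the guard only if permitted."""
    args = {"project_id": project_id, "version_number": version_number}
    result = call_tool("trainings_cancel", args)
    # Per the parameter description, the server answers refund:false without
    # cancelling when the run is past the refund window.
    if result.get("refund") is False and allow_no_refund:
        result = call_tool("trainings_cancel", {**args, "continue_if_no_refund": True})
    return result
```

Keeping `allow_no_refund` off by default mirrors the tool's own default, so a cancel past the refund window requires an explicit decision.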
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already mark destructiveHint=true and idempotentHint=true. The description adds useful behavioral context: it works for any architecture and for NAS runs the same call cancels mining or training phases. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with primary action, efficient coverage of applicability and NAS details. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given output schema exists and parameters are well-documented, the description covers the core action and edge cases (NAS). Missing explicit differentiation from sibling 'trainings_stop', but overall sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage with clear parameter descriptions. The description adds no additional parameter semantics beyond the schema, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Cancel' and resource 'training run', with specificity 'in-flight'. It distinguishes from siblings like 'trainings_stop' by noting it works for any architecture and handles NAS phases.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use (cancelling in-flight runs) and adds NAS-specific nuance. However, it does not explicitly compare to sibling 'trainings_stop' or state when not to use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
trainings_get_results (A, Read-only, Idempotent)
Run-level training results bundle.
For NAS sweeps (one Training produces many child Models) returns:
{ trainingId, status, modelGroup, modelCount,
recommendedByHardware: {[hardware]: modelId},
mining?: { mining: {progress, frontier, ...}, baseline? },
models: [ { modelId, nasFamily, metrics, recommended }, ... ] }.
For non-NAS trainings returns a minimal bundle with the produced model.
Use this for the run-level dashboard. For full per-model metadata, call models_list with the returned modelGroup.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug; 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number whose training to summarize. | |
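To show how the NAS bundle described above might be consumed, the sketch below picks a model ID for a given hardware target. The field names (`recommendedByHardware`, `models`, `modelId`, `recommended`) come from the description; the helper name `pick_model_id`, the fallback order, and the hardware keys used in the usage note are assumptions. The exact shape of the non-NAS "minimal bundle" is not documented here, so the code only reads fields defensively.

```python
from typing import Any, Optional


def pick_model_id(bundle: dict[str, Any], hardware: str) -> Optional[str]:
    """Return the model recommended for `hardware`, else the first flagged model."""
    by_hardware = bundle.get("recommendedByHardware") or {}
    if hardware in by_hardware:
        return by_hardware[hardware]
    models = bundle.get("models") or []
    for model in models:
        if model.get("recommended"):
            return model.get("modelId")
    # Assumption: with nothing flagged, fall back to the first listed model.
    return models[0].get("modelId") if models else None
```

For example, with a hypothetical bundle whose `recommendedByHardware` is `{"t4": "m-2"}`, calling `pick_model_id(bundle, "t4")` returns `"m-2"`. Full per-model metadata would still come from models_list with the returned modelGroup.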
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate a safe read operation. The description adds behavioral detail about conditional returns (NAS vs non-NAS), though it doesn't elaborate on error states. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and front-loaded with the purpose. It uses a clear structure with examples for NAS and non-NAS, though the code block makes it slightly longer than necessary.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (different return structures for NAS vs non-NAS) and the presence of an output schema, the description adequately covers what the agent needs to know. It explains when to use the tool and what to expect.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% coverage with clear descriptions for both parameters. The description does not add extra meaning beyond the schema, so baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it returns 'Run-level training results bundle' and distinguishes between NAS sweeps (returns many models) and non-NAS trainings (minimal bundle). This ensures a specific verb+resource match and differentiates from siblings like models_list.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says 'Use this for the run-level dashboard. For full per-model metadata, call models_list with the returned modelGroup.' This provides clear context on when to use this tool versus alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
trainings_stop (A, Destructive, Idempotent)
Request an early stop on an in-flight training run.
Distinct from cancel: the run finishes the current phase gracefully (mining or training) instead of terminating immediately.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug; 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number whose training to stop early. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses that the stop is graceful (the run finishes its current phase). This adds context beyond the annotations, which mark the tool as destructive, and does not contradict them.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, first gives the action, second contrasts with a sibling. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has an output schema and only two required parameters, and the description covers purpose and behavioral nuance adequately for a straightforward tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema already provides 100% coverage for parameters (project_id and version_number), and the description adds minimal extra meaning beyond mentioning 'in-flight training run'.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool requests an early stop on an in-flight training run, and distinguishes it from cancel by explaining it finishes the current phase gracefully.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly contrasts with cancel, providing clear context for when to use this tool, though it doesn't mention when not to use it or list alternatives beyond cancel.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
universe_dataset_images_search (A, Read-only, Idempotent)
Search images inside a public Universe dataset URL.
The MCP app runs inside a host iframe, so URL parsing belongs on the server. This tool accepts the selected Universe result URL and derives the workspace and project slugs before calling the same image-search backend.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of results | |
| query | No | Search prompt for image discovery inside this public dataset | |
| offset | No | Result offset for pagination | |
| dataset_url | Yes | Full Universe dataset URL, e.g. https://universe.roboflow.com/workspace/project | |
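The slug derivation mentioned in the description happens on the server, and its implementation is not shown here. As an illustration only, the sketch below shows one way a URL of the documented form could be split into workspace and project slugs, plus a small helper for limit/offset pagination. The function names, the default page size of 20, and the reading of `offset` as a count of results to skip are all assumptions.

```python
from urllib.parse import urlparse

UNIVERSE_HOST = "universe.roboflow.com"  # host taken from the example URL in the table


def parse_universe_url(dataset_url: str) -> tuple[str, str]:
    """Split a Universe dataset URL into (workspace, project) slugs."""
    parsed = urlparse(dataset_url)
    if parsed.hostname != UNIVERSE_HOST:
        raise ValueError(f"not a Universe URL: {dataset_url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected a path of the form /workspace/project")
    return parts[0], parts[1]


def page_args(dataset_url: str, query: str, page: int, limit: int = 20) -> dict:
    """Build search arguments for a zero-based page using limit/offset.

    Assumption: offset counts results to skip, so page N starts at N * limit.
    """
    return {"dataset_url": dataset_url, "query": query, "limit": limit, "offset": page * limit}
```

Extra path segments and query strings after the project slug are ignored in this sketch; the real server may be stricter or more lenient.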
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint, idempotentHint, and non-destructive nature. The description adds useful context about server-side URL parsing and consistent backend, without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, efficiently front-loading the purpose, rationale, and method. No superfluous content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With an output schema and annotations present, the description covers the key points. It does not mention error handling for invalid URLs, but it is sufficient overall for a search tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear parameter descriptions. The tool description does not add extra semantic value beyond what the schema provides, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description specifies the verb 'search images' and resource 'public Universe dataset URL', clearly distinguishing it from sibling tools like universe_search_app or images_search by focusing on image search within a dataset.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains the tool's specific use case (accepting a Universe dataset URL) and mentions the rationale (URL parsing on server), but does not explicitly contrast with similar tools like images_search or state when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
universe_search (Grade A; Read-only; Idempotent)
Search Roboflow Universe for datasets or models.
The query supports operators mixed with free text:
- Add 'model' to return only datasets with trained models
- 'class:helmet,person' filters by class names
- 'images>500' filters by image count (>=, <, and <= are also supported)
- 'sort:stars' sorts results (stars, images, downloads, views, updated)
- 'object detection' filters by project type
- 'updated:30d' filters by recency
Example: 'fire smoke class:fire,smoke images>200 model sort:stars'
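As a rough illustration of the operator syntax above, the sketch below assembles a query string from keyword arguments. The function `build_universe_query` and its parameter names are hypothetical. The operator order simply mirrors the example query; whether order matters to the search backend is not stated here.

```python
# Sort keys listed in the tool description.
ALLOWED_SORTS = {"stars", "images", "downloads", "views", "updated"}


def build_universe_query(text, classes=None, min_images=None,
                         with_model=False, sort=None, updated=None):
    """Join free text and optional operators into one query string.

    Hypothetical helper; it only formats the operators the description lists.
    """
    parts = [text.strip()]
    if classes:
        parts.append("class:" + ",".join(classes))
    if min_images is not None:
        parts.append(f"images>{min_images}")
    if with_model:
        parts.append("model")
    if sort is not None:
        if sort not in ALLOWED_SORTS:
            raise ValueError(f"unsupported sort key: {sort!r}")
        parts.append(f"sort:{sort}")
    if updated is not None:
        parts.append(f"updated:{updated}")
    return " ".join(parts)


query = build_universe_query(
    "fire smoke", classes=["fire", "smoke"], min_images=200,
    with_model=True, sort="stars",
)
print(query)
```

The resulting string would be passed as the `query` argument of universe_search.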
| Name | Required | Description | Default |
|---|---|---|---|
| page | No | Page number starting at 1 | |
| limit | No | Maximum number of results | |
| query | Yes | Universe search query | |
| result_type | No | Optional result type filter | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover safety (readOnlyHint=true, destructiveHint=false) and idempotency, but the description adds valuable behavioral context: it explains the query syntax with operators (e.g., 'model', 'class:', 'images>500'), sorting options, and filtering capabilities. This goes beyond annotations by detailing how the search behaves and what inputs it supports.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by detailed query syntax and an example. Every sentence adds value: the first states the purpose, the next explains operators, and the last provides an illustrative query. No wasted words, and it's structured for clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (search with operators), rich annotations, 100% schema coverage, and the presence of an output schema, the description is complete. It explains the search behavior, query syntax, and provides an example, covering what's needed beyond structured fields without redundancy.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters. The description adds context for the 'query' parameter by explaining operators and examples, but it does not provide additional meaning for 'page', 'limit', or 'result_type' beyond what the schema states. Baseline 3 is appropriate as the schema handles most param documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Search Roboflow Universe for datasets or models.' It specifies the verb ('Search'), resource ('Roboflow Universe'), and target ('datasets or models'), distinguishing it from siblings like 'images_search' or 'models_list' which search different resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage by detailing query operators and examples, but it does not explicitly state when to use this tool versus alternatives like 'images_search' or 'models_list'. It implies usage for searching Universe content but lacks explicit comparisons or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
universe_search_app (Grade C; Read-only; Idempotent)
Open a Prefab Universe UI for search, visual comparison, image skim, and fork; a human picks the dataset.
Use this when the next step needs human judgment or visible UX: exploring queries, comparing public projects, opening thumbnails/previews, or confirming which dataset to fork into the workspace. Choosing among datasets is data the user must see; ranked JSON from universe_search alone usually cannot substitute for that decision surface.
Prefer universe_search when the agent only needs structured results, pagination, or scripted lookup without a person clicking through options.
For labeling or editing annotations after import, send the user to the Roboflow web app (see product-navigation skill); this tool does not host the full annotate editor.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, so the description's mention of 'read-only dataset image browse' aligns. However, the description does not add significant behavioral context beyond what annotations provide, such as the meaning of 'fork workflow' or any side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence with no wasted words. However, it could be more informative while remaining concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema and the presence of sibling tools, the description is too minimal. It does not explain what 'fork workflow' entails, how the tool is used, or what the output or effect is. This leaves significant gaps for an agent to correctly select and invoke the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has no parameters, so there is no need for parameter documentation. The description does not discuss parameters, but this is acceptable as schema coverage is 100% by default.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Open Universe search' with specifics like 'fork workflow' and 'read-only dataset image browse', which gives a clear verb and resource. However, it is ambiguous whether this opens a UI or performs a search, and it does not clearly differentiate from sibling tools like 'universe_search' and 'universe_dataset_images_search'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. With siblings like 'universe_search' and 'universe_dataset_images_search', the description should indicate the context for using this specific tool, but it does not.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
versions_export (Grade A; Idempotent)
Check or trigger a dataset export for a version.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| export_format | Yes | Export format such as 'coco', 'yolov8', or 'jsonl' | |
| version_number | Yes | Version number | |
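The table above says `project_id` accepts either a bare slug or a `workspace/project` form. A minimal sketch of checking that shape on the client before assembling the three required arguments might look like this. Both helper names are hypothetical, and the rule that at most one slash is allowed is an assumption drawn from the two documented forms.

```python
def split_project_id(project_id: str):
    """Return (workspace, project); workspace is None for a bare slug.

    Assumes only 'my-project' and 'workspace/my-project' are valid shapes.
    """
    workspace, sep, project = project_id.rpartition("/")
    if not project or "/" in workspace or (sep and not workspace):
        raise ValueError(f"unexpected project_id shape: {project_id!r}")
    return (workspace or None), project


def export_arguments(project_id: str, export_format: str, version_number: int) -> dict:
    """Build the argument payload using the parameter names from the table."""
    split_project_id(project_id)  # raises ValueError on a malformed id
    return {
        "project_id": project_id,
        "export_format": export_format,
        "version_number": version_number,
    }


print(export_arguments("workspace/my-project", "coco", 3))
```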
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide hints (e.g., not read-only, idempotent, non-destructive), but the description adds valuable context by specifying the dual functionality ('check or trigger'), which isn't covered by annotations. It doesn't disclose rate limits, auth needs, or detailed behavioral traits, but it compensates with operational clarity beyond the structured data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core functionality without any wasted words. It's appropriately sized for the tool's complexity, making it easy for an agent to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of annotations and an output schema, the description is reasonably complete for a tool with clear parameters. It covers the basic action and resource, but could benefit from more context on usage scenarios or output interpretation, though the output schema mitigates some of this need.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema fully documents the parameters. The description doesn't add any extra meaning or examples beyond what's in the schema (e.g., it doesn't explain the interaction between 'check' and 'trigger' modes with the parameters), resulting in a baseline score of 3 as the schema handles the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('check or trigger') and resource ('dataset export for a version'), making the purpose understandable. However, it doesn't differentiate from sibling tools like 'versions_generate' or 'versions_get', which might handle related version operations, leaving some ambiguity about when to choose this specific tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'versions_generate' or other export-related tools that might exist. It lacks context about prerequisites, timing, or scenarios where this is the appropriate choice, leaving the agent to infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
versions_generate (Grade A)
Create a new dataset version with optional preprocessing and augmentation.
IMPORTANT: Before calling this tool, you MUST ask the user which preprocessing and augmentation options they want to apply. Present them with the available options listed below and let them choose. Do not assume defaults — explicitly confirm their choices before generating.
IMPORTANT: This operation can take several minutes for large datasets. You MUST spawn a sub-agent to run this tool in the background.
Returns the generated version number, image count, and split sizes.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| augmentation | No | Augmentation settings dict | |
| preprocessing | No | Preprocessing settings dict | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover readOnlyHint=false, openWorldHint=true, idempotentHint=false, and destructiveHint=false. The description adds valuable behavioral context beyond annotations: it discloses that the operation 'can take several minutes for large datasets' (performance characteristic) and specifies the return format ('Returns the generated version number, image count, and split sizes'). This provides practical guidance not captured in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized but not optimally structured. It front-loads the purpose but includes lengthy IMPORTANT sections that could be streamlined. Every sentence earns its place by providing critical guidance, but the formatting with all-caps sections reduces readability slightly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (creation tool with processing options, long-running operation), the description provides excellent contextual completeness. It covers purpose, prerequisites, performance characteristics, execution strategy (sub-agent), and return values. With annotations covering safety aspects and an output schema presumably detailing the return structure, this description gives the agent everything needed to use the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters. The description mentions 'optional preprocessing and augmentation' which aligns with the schema but doesn't add semantic details beyond what's in the schema descriptions. The baseline score of 3 is appropriate since the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Create a new dataset version') and resources involved ('with optional preprocessing and augmentation'). It distinguishes from sibling tools like 'versions_get' (read) and 'versions_export' (export) by emphasizing creation with processing options.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage instructions: 'Before calling this tool, you MUST ask the user which preprocessing and augmentation options they want to apply' and 'Do not assume defaults — explicitly confirm their choices before generating.' It also specifies when to use a sub-agent ('You MUST spawn a sub-agent to run this tool in the background').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
versions_get (Grade A; Read-only; Idempotent)
Get info about a dataset version including splits and model metrics.
Returns version details with id, name, images, splits, preprocessing, augmentation, and model info if trained.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds context about what information is returned (splits, model metrics, preprocessing, etc.), which is useful beyond annotations, but doesn't mention rate limits, auth needs, or other behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loaded with the core purpose and followed by return details. It's efficient with minimal waste, though the second sentence could be slightly more structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (2 required parameters), rich annotations, and presence of an output schema, the description is reasonably complete. It covers the purpose and return content adequately, though it could benefit from more usage context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters well-documented in the schema. The description doesn't add any parameter-specific details beyond what the schema provides, so it meets the baseline of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and resource 'info about a dataset version', specifying it includes splits and model metrics. It distinguishes from sibling tools like 'projects_get' or 'models_get' by focusing on dataset versions, though it doesn't explicitly contrast with them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing version details, but doesn't specify when to use this tool versus alternatives like 'versions_export' or 'versions_generate'. No explicit when-not-to-use guidance or prerequisites are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_blocks_get_schema (Grade A; Read-only; Idempotent)
Get the full schema of a specific Workflow block.
Returns all properties, required fields, and descriptions for a block identified by its manifest name (as returned by workflow_blocks_list).
| Name | Required | Description | Default |
|---|---|---|---|
| manifest | Yes | Manifest key of the block (e.g. 'ObjectDetectionModelManifest') | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover read-only, open-world, idempotent, and non-destructive traits, but the description adds value by specifying that it returns 'all properties, required fields, and descriptions,' which clarifies the output scope beyond what annotations provide. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by additional details in a second sentence. Both sentences are essential, with no wasted words, making it highly efficient and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter), rich annotations, and the presence of an output schema, the description is complete. It adequately explains the purpose, usage, and output scope without needing to detail return values or behavioral traits already covered elsewhere.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the parameter 'manifest' fully documented in the schema. The description adds minimal semantics by noting it's a 'manifest name' from 'workflow_blocks_list,' but this is redundant with the schema's description. Baseline 3 is appropriate as the schema carries the burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and the resource 'full schema of a specific Workflow block,' specifying it returns properties, required fields, and descriptions. It distinguishes from sibling 'workflow_blocks_list' by focusing on a single block's schema rather than listing blocks.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool: for a block identified by its manifest name, as returned by 'workflow_blocks_list.' This provides clear context and an alternative tool, guiding the agent on proper usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_blocks_list (Grade A; Read-only; Idempotent)
List all available Workflow blocks with a short summary of each.
Returns a list of blocks, each with manifest (schema key), name, block_type, and short_description. Use workflow_blocks_get_schema to get the full schema of a specific block.
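The list-then-inspect flow described above can be sketched as a two-step lookup. Everything here is an assumption for illustration: `call_tool` stands in for whatever MCP client function invokes a tool by name, the stub's return shapes are guesses based on the field names in the descriptions, and the `name` and `block_type` values in the stub are invented.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded result.
ToolCaller = Callable[[str, dict], Any]


def schemas_by_manifest(call_tool: ToolCaller, block_type=None) -> dict:
    """List blocks, then fetch the full schema for each manifest key."""
    args = {} if block_type is None else {"block_type": block_type}
    blocks = call_tool("workflow_blocks_list", args)
    return {
        block["manifest"]: call_tool(
            "workflow_blocks_get_schema", {"manifest": block["manifest"]}
        )
        for block in blocks
    }


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Offline stub so the sketch runs without a server; shapes are assumed."""
    if name == "workflow_blocks_list":
        return [{
            "manifest": "ObjectDetectionModelManifest",
            "name": "Object Detection Model",   # invented display name
            "block_type": "model",              # invented category value
            "short_description": "Runs a detection model.",
        }]
    if name == "workflow_blocks_get_schema":
        return {"manifest": arguments["manifest"], "properties": {}, "required": []}
    raise KeyError(name)


schemas = schemas_by_manifest(fake_call_tool)
print(sorted(schemas))
```

Fetching every schema eagerly is only sensible for a filtered list; an agent would more likely fetch one schema for the block it has chosen.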
| Name | Required | Description | Default |
|---|---|---|---|
| block_type | No | Filter by block category. If omitted, all blocks are returned. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (readOnly, openWorld, idempotent, non-destructive), so the bar is lower. The description adds valuable context about the return format ('list of blocks, each with manifest, name, block_type, and short_description') and the filtering capability via the block_type parameter. However, it doesn't mention pagination, rate limits, or authentication requirements, keeping it from a perfect score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with three sentences that each serve a distinct purpose: stating the tool's function, providing usage guidance, and describing the return format. There is no wasted text, and key information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (single optional parameter), comprehensive annotations, and the presence of an output schema, the description provides complete contextual information. It explains what the tool does, when to use it, what it returns, and how it relates to other tools, leaving no significant gaps for an AI agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with the parameter fully documented in the schema. The description mentions 'Filter by block category' which aligns with the schema but doesn't add significant semantic value beyond what's already in the structured data. This meets the baseline expectation when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verb ('List') and resource ('all available Workflow blocks'), and distinguishes it from its sibling 'workflow_blocks_get_schema' by explaining this tool provides summaries while the sibling provides full schemas. This explicit differentiation earns the highest score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('to discover which blocks can be used when building a Workflow definition') and when to use an alternative ('To get the full schema... call workflow_blocks_get_schema'). This clear context and named alternative meet the criteria for a perfect score.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_create (Grade A)
Create and save a new Workflow in the workspace.
IMPORTANT: Always validate the config with workflow_specs_validate before creating the workflow.
The config is the same JSON format used by workflow_specs_run and workflow_specs_validate. Once saved, the workflow can be executed by ID via workflows_run.
Returns the created workflow including its document ID. Save this ID — it is required for workflows_update.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Human-readable workflow name | |
| config | Yes | Workflow JSON definition with 'version', 'inputs', 'steps', and 'outputs' | |
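A minimal sketch of the validate-before-create ordering, under stated assumptions: `validate` and `create` are injected stand-ins for calls to workflow_specs_validate and workflows_create, `validate` is assumed to raise when the config is rejected, and the local key check covers only the four top-level keys named in the table. The example config values, the `"1.0"` version string, and the `wf-1` ID are placeholders.

```python
REQUIRED_CONFIG_KEYS = ("version", "inputs", "steps", "outputs")


def missing_config_keys(config: dict) -> list:
    """Return the required top-level keys absent from the config."""
    return [key for key in REQUIRED_CONFIG_KEYS if key not in config]


def validate_then_create(validate, create, name: str, config: dict):
    """Run a cheap local check, then validate remotely, then create.

    `validate` and `create` are hypothetical callables wrapping the two tools.
    """
    missing = missing_config_keys(config)
    if missing:
        raise ValueError(f"config is missing keys: {missing}")
    validate(config)  # assumed to raise if the server rejects the config
    return create({"name": name, "config": config})


# Offline demo with stub callables that record the call order.
calls = []
created = validate_then_create(
    validate=lambda cfg: calls.append("validate"),
    create=lambda args: calls.append("create") or {"id": "wf-1"},
    name="demo",
    config={"version": "1.0", "inputs": [], "steps": [], "outputs": []},
)
print(calls, created)
```

The local check only saves a round trip for obviously incomplete configs; it does not replace the server-side validation the description requires.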
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover basic hints (e.g., not read-only, not destructive), but the description adds valuable behavioral context: it specifies that the config must be validated first, describes the return value ('Returns the created workflow including its document ID'), and advises saving the ID for future use with workflows_update. This goes beyond annotations without contradicting them.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the core purpose. Each sentence adds value: the first states the action, the second gives a critical prerequisite, the third explains config format and execution, and the fourth details the return. There is no wasted text, making it efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (creation tool with validation prerequisite), rich annotations, and the presence of an output schema, the description is complete. It covers purpose, usage guidelines, behavioral details like validation and ID usage, and references to sibling tools, without needing to explain return values since an output schema exists.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, providing clear details for both parameters (name and config). The description adds some semantics by noting that the config uses the 'same JSON format' as workflow_specs_run and workflow_specs_validate, but this is minimal enhancement. Baseline 3 is appropriate as the schema already does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Create and save') and resource ('a new Workflow in the workspace'), making the purpose specific. It distinguishes from sibling tools like workflows_get, workflows_list, workflows_update, and workflows_run by focusing on creation rather than retrieval, listing, modification, or execution.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: it instructs to 'Always validate the config with workflow_specs_validate before creating the workflow,' naming a specific alternative tool. It also mentions that the created workflow can be executed via workflows_run, offering context on related actions without redundancy.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_get (grade B, Read-only, Idempotent)
Get details for a saved workflow.
| Name | Required | Description | Default |
|---|---|---|---|
| workflow_id | Yes | Workflow URL slug or ID | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already cover key behavioral traits (readOnlyHint: true, destructiveHint: false, etc.), so the bar is lower. The description adds minimal context beyond this, stating it retrieves details but not elaborating on aspects like error handling or response format. It doesn't contradict annotations, but offers limited additional behavioral insight.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with no wasted words, making it highly concise and front-loaded. Every part of the sentence directly contributes to understanding the tool's purpose without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter), rich annotations, and the presence of an output schema, the description is reasonably complete. It covers the basic action, though it could benefit from more context on usage relative to siblings or error cases, but the structured data compensates well.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents the single parameter 'workflow_id'. The description adds no extra meaning about parameters beyond what the schema provides, such as format examples or usage tips, meeting the baseline for high coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('details for a saved workflow'), making the purpose evident. However, it doesn't differentiate from sibling tools like 'workflows_list', which would require more specificity to score a 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as 'workflows_list' for listing workflows. It lacks explicit when/when-not instructions or named alternatives, offering only basic usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_list (grade A, Read-only, Idempotent)
List saved workflows in the current workspace.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already cover key behavioral traits (read-only, open-world, idempotent, non-destructive), so the bar is lower. The description adds minimal context beyond this, specifying the scope ('current workspace') but not detailing aspects like pagination, sorting, or response format. It doesn't contradict annotations, providing some value but limited behavioral insight.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. Every word earns its place, making it highly concise and well-structured for quick understanding.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (0 parameters, no nested objects), rich annotations, and presence of an output schema, the description is reasonably complete. It covers the basic purpose and scope, though it could benefit from more usage guidance or behavioral details to be fully comprehensive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0 parameters and 100% schema description coverage, the baseline is high. The description doesn't need to explain parameters, and it appropriately avoids redundant information. It adds value by clarifying the resource scope ('saved workflows in the current workspace'), which is useful semantic context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('saved workflows'), with the scope 'in the current workspace' providing useful context. However, it doesn't explicitly differentiate from sibling tools like 'workflow_blocks_list' or 'projects_list', which also list resources in the workspace, missing full sibling distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'workflows_get' (for a single workflow) or 'workflow_specs_run' (for execution). It lacks explicit when/when-not instructions or named alternatives, offering only basic context without usage differentiation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_specs_run (grade A)
Execute a Workflow from an inline JSON definition.
Unlike workflows_run which runs a saved workflow by ID,
this tool accepts a full workflow JSON spec and executes it
directly. Useful for testing workflows before saving them, or
for running an agent-built draft without publishing — pass
the specification returned by agent_chat.
IMPORTANT: Always call workflow_specs_validate first to
check the definition is valid before running it.
IMPORTANT: Images must be public URLs or base64-encoded data. Local file paths do NOT work — the API runs remotely and cannot access your filesystem.
Returns workflow outputs as defined by the workflow's output blocks.
| Name | Required | Description | Default |
|---|---|---|---|
| images | Yes | Map of input names to image values (HTTPS URLs or base64). Example: {'image': 'https://...'} | |
| parameters | No | Optional runtime parameters defined in the workflow | |
| specification | Yes | Full Workflow JSON definition with 'version', 'inputs', 'steps', and 'outputs' | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
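Because local file paths do not work, a client has to turn any local image into base64 before building the `images` map. The sketch below shows one way to do that with the Python standard library. The helper names are hypothetical, and it assumes the API accepts raw base64 text (no `data:` URI prefix), which the listing does not specify.

```python
import base64
from pathlib import Path
from typing import Dict


def to_image_value(source: str) -> str:
    """Return a value usable in the `images` map.

    URLs pass through unchanged; anything else is treated as a local
    file path and encoded as base64 text.
    """
    if source.startswith(("http://", "https://")):
        return source
    # ASSUMPTION: raw base64 (no "data:image/...;base64," prefix) is accepted.
    return base64.b64encode(Path(source).read_bytes()).decode("ascii")


def build_images_argument(sources: Dict[str, str]) -> Dict[str, str]:
    """Map each workflow input name to a URL or base64 payload."""
    return {name: to_image_value(src) for name, src in sources.items()}
```

The keys of the resulting map must match the input names declared in the workflow, for example `image` for an input defined as `{"type": "WorkflowImage", "name": "image"}`.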
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover basic hints (e.g., not read-only, not destructive), but the description adds valuable behavioral context beyond annotations: it advises validation before execution, recommends background processing for large image sets to prevent user blocking, and mentions that it returns workflow outputs. This enhances transparency without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the core purpose, followed by sibling differentiation and usage guidelines. Every sentence adds value (e.g., testing use case, validation requirement, background processing advice, return information), with no wasted words, making it efficient and clear.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (executing workflows with inline JSON), the description is complete: it covers purpose, sibling differentiation, usage guidelines, behavioral advice (validation and background processing), and output information. With annotations providing safety hints and an output schema existing, the description adds necessary context without redundancy, making it fully adequate for agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description does not add specific parameter details beyond what the schema provides (e.g., it mentions 'full workflow JSON spec' but doesn't elaborate on structure). Baseline 3 is appropriate as the schema handles parameter documentation adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Execute[s] a Workflow from an inline JSON definition,' specifying the verb (execute) and resource (workflow). It explicitly distinguishes it from sibling 'workflows_run' by noting this tool uses an inline JSON spec versus a saved workflow ID, providing clear differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidelines: it states when to use this tool (for testing workflows before saving) versus 'workflows_run' (for saved workflows by ID). It includes two IMPORTANT notes: always call 'workflow_specs_validate' first for validation, and spawn a sub-agent for more than 10 images to avoid blocking the user, offering clear when/when-not and alternative actions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_specs_validate (grade A, Read-only, Idempotent)
Validate a Workflow JSON definition without executing it.
Check whether a workflow definition is syntactically and semantically correct before saving or running it.
Example workflow definition — detects objects, enlarges bounding boxes, crops, runs a second detection filtering for dogs, and classifies the breed only when exactly one dog is found:
.. code-block:: json

    {
      "version": "1.0",
      "inputs": [
        {"type": "WorkflowImage", "name": "image"}
      ],
      "steps": [
        {
          "type": "ObjectDetectionModel",
          "name": "first_detection",
          "image": "$inputs.image",
          "model_id": "yolov8n-640"
        },
        {
          "type": "DetectionsTransformation",
          "name": "enlarging_boxes",
          "predictions": "$steps.first_detection.predictions",
          "operations": [
            {"type": "DetectionsOffset", "offset_x": 50, "offset_y": 50}
          ]
        },
        {
          "type": "Crop",
          "name": "first_crop",
          "image": "$inputs.image",
          "predictions": "$steps.enlarging_boxes.predictions"
        },
        {
          "type": "ObjectDetectionModel",
          "name": "second_detection",
          "image": "$steps.first_crop.crops",
          "model_id": "yolov8n-640",
          "class_filter": ["dog"]
        },
        {
          "type": "ContinueIf",
          "name": "continue_if",
          "condition_statement": {
            "type": "StatementGroup",
            "statements": [
              {
                "type": "BinaryStatement",
                "left_operand": {
                  "type": "DynamicOperand",
                  "operand_name": "prediction",
                  "operations": [{"type": "SequenceLength"}]
                },
                "comparator": {"type": "(Number) =="},
                "right_operand": {
                  "type": "StaticOperand",
                  "value": 1
                }
              }
            ]
          },
          "evaluation_parameters": {
            "prediction": "$steps.second_detection.predictions"
          },
          "next_steps": ["$steps.classification"]
        },
        {
          "type": "ClassificationModel",
          "name": "classification",
          "image": "$steps.first_crop.crops",
          "model_id": "dog-breed-xpaq6/1"
        }
      ],
      "outputs": [
        {
          "type": "JsonField",
          "name": "dog_classification",
          "selector": "$steps.classification.predictions"
        }
      ]
    }

Key patterns shown above:
$inputs.<name> references a workflow input.
$steps.<step_name>.<output> references another step's output.
ContinueIf enables conditional branching based on runtime values.
Steps can chain: detect → transform → crop → detect → classify.
Returns validation status. A valid workflow returns
{"status": "ok"}. An invalid one returns error details.
| Name | Required | Description | Default |
|---|---|---|---|
| specification | Yes | Full Workflow JSON definition with 'version', 'inputs', 'steps', and 'outputs' | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
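The `$inputs.<name>` and `$steps.<step_name>.<output>` selector patterns lend themselves to a quick local sanity check before calling the tool. The sketch below is not how workflow_specs_validate works internally and does not replace it; it only flags selectors that point at an input or step name not declared in the definition. The function names are hypothetical.

```python
import re
from typing import Any, Dict, Iterator, List

# Matches "$inputs.<name>" or "$steps.<step_name>", with or without a trailing ".<output>".
SELECTOR = re.compile(r"^\$(inputs|steps)\.([^.]+)")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def find_dangling_references(spec: Dict[str, Any]) -> List[str]:
    """Return selectors that reference an undeclared input or step."""
    known = {
        "inputs": {item["name"] for item in spec.get("inputs", [])},
        "steps": {step["name"] for step in spec.get("steps", [])},
    }
    dangling = []
    for section in ("steps", "outputs"):
        for text in iter_strings(spec.get(section, [])):
            match = SELECTOR.match(text)
            if match and match.group(2) not in known[match.group(1)]:
                dangling.append(text)
    return dangling
```

An empty result only means the names line up; type compatibility and block-specific rules still need the real validation call.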
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds valuable context beyond annotations: it explains what validation entails ('syntactically and semantically correct'), provides a detailed example of the expected format, and describes the return behavior ('Returns validation status... A valid workflow returns {"status": "ok"}. An invalid one returns error details'). This goes beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with clear purpose and usage guidelines, but the extensive example (over 50 lines of JSON) makes it lengthy. While the example is informative, it could be shortened or moved to documentation. The structure is logical but not optimally concise for a tool description.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of workflow validation, the description is complete: it covers purpose, usage, parameter semantics with examples, and behavioral outcomes. With annotations covering safety/idempotency and an output schema implied by the return description, no critical gaps remain. It adequately prepares an agent for correct tool invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents the single 'specification' parameter. The description adds significant semantic context: it explains what the specification should contain ('version, inputs, steps, and outputs'), provides a comprehensive example with key patterns, and clarifies the expected JSON structure. This adds meaning beyond the schema's basic description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('validate a Workflow JSON definition') and distinguishes it from siblings like 'workflow_specs_run' by emphasizing it's 'without executing it'. It explicitly mentions what it validates ('syntactically and semantically correct') and what resource it works on ('workflow definition').
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('before saving or running it') and includes an 'IMPORTANT' directive to 'Always validate a workflow definition before running it'. It distinguishes from alternatives by contrasting validation vs. execution, though it doesn't name specific sibling tools beyond the implied distinction from run tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_run (grade A)
Execute a saved Workflow on one or more images.
Runs a previously created Workflow against the provided images on the Roboflow serverless infrastructure. This always hits the latest published version of the workflow.
IMPORTANT: Workflows created or edited via agent_chat are
saved as drafts, not published. If you want to run an agent's
latest changes, either call agent_workflow_publish first,
or pass the specification returned by agent_chat to
workflow_specs_run to execute the draft directly.
IMPORTANT: If processing more than 10 images, spawn a sub-agent to run this tool in the background so the user is not blocked.
Returns workflow outputs as defined by the workflow's output blocks.
| Name | Required | Description | Default |
|---|---|---|---|
| images | Yes | Map of input names to image values (HTTPS URLs or base64). Example: {'image': 'https://...'} | |
| parameters | No | Optional runtime parameters (e.g. confidence thresholds, class filters) | |
| workflow_id | Yes | Workflow ID to execute | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
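The "more than 10 images" rule can be encoded as a small planning step on the client side. The sketch below is illustrative only: the threshold comes from the description above, while the helper names and the choice of one workflows_run payload per image are assumptions, since the listing does not say how several images are packed into a single call.

```python
from typing import Any, Dict, List

# From the tool description: more than 10 images should run in the background.
BACKGROUND_THRESHOLD = 10


def should_run_in_background(image_count: int) -> bool:
    """True when the batch is large enough to hand off to a sub-agent."""
    return image_count > BACKGROUND_THRESHOLD


def build_run_calls(
    workflow_id: str, input_name: str, image_urls: List[str]
) -> List[Dict[str, Any]]:
    """Build one workflows_run argument payload per image.

    ASSUMPTION: one call per image; `input_name` must match the image
    input declared in the saved workflow.
    """
    return [
        {"workflow_id": workflow_id, "images": {input_name: url}}
        for url in image_urls
    ]
```

A caller would check `should_run_in_background(len(image_urls))` first and, if it returns true, dispatch the payloads from a background worker instead of the interactive session.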
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it specifies the execution environment ('Roboflow serverless infrastructure'), provides performance guidance for large batches (sub-agent for >10 images), and describes what gets returned ('workflow outputs as defined by the workflow's output blocks'). Annotations cover basic safety (non-destructive, non-idempotent, open-world) but the description adds practical implementation details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly structured and concise: first sentence states the core purpose, second provides technical context, third gives critical usage guidance, and fourth explains returns. Every sentence earns its place with no wasted words, and important information is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (execution tool with performance considerations), rich annotations, complete schema coverage, and presence of an output schema, the description provides excellent contextual completeness. It covers purpose, execution environment, scaling guidance, and return values - everything needed beyond what's in structured fields.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all three parameters thoroughly. The description doesn't add significant parameter semantics beyond what's in the schema - it mentions 'images' and 'workflow_id' but doesn't provide additional context about format, constraints, or usage patterns that aren't already in the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Execute a saved Workflow') on specific resources ('on one or more images') and distinguishes it from siblings by focusing on running existing workflows rather than creating, listing, or validating them. It explicitly mentions using 'Roboflow serverless infrastructure' which adds specificity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Execute a saved Workflow') and when to modify usage patterns ('If processing more than 10 images, spawn a sub-agent to run this tool in the background'). It distinguishes from siblings like workflows_create, workflows_list, and workflow_specs_run by focusing on execution of existing workflows.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_update (grade A, Idempotent)
Update an existing saved Workflow's name and definition.
IMPORTANT: Always validate the config with workflow_specs_validate before updating the workflow.
Use workflows_get to retrieve the current workflow first, then modify the config as needed.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Updated workflow name | |
| config | Yes | Updated workflow JSON definition | |
| workflow_id | Yes | Workflow document ID, NOT the URL slug. Get it from workflows_list or workflows_create response. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
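The recommended order for an update (retrieve, modify, validate, then update) can be sketched as a single helper. As before, `call_tool` is a hypothetical stand-in for an MCP client, and reading the definition from a `"config"` key in the workflows_get response is an assumption about a response shape the listing does not document.

```python
import copy
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]
ConfigEdit = Callable[[Dict[str, Any]], Dict[str, Any]]


def update_workflow(
    call_tool: ToolCaller, document_id: str, new_name: str, edit: ConfigEdit
) -> Dict[str, Any]:
    """Fetch a workflow, apply `edit` to its config, validate, then update."""
    # Step 1: retrieve the current definition.
    current = call_tool("workflows_get", {"workflow_id": document_id})

    # ASSUMPTION: the definition lives under "config" in the response.
    # Work on a copy so a failed validation leaves the fetched data untouched.
    config = edit(copy.deepcopy(current["config"]))

    # Step 2: validate the modified definition before writing it.
    check = call_tool("workflow_specs_validate", {"specification": config})
    if check.get("status") != "ok":
        raise ValueError(f"Updated config failed validation: {check}")

    # Step 3: write the update using the document ID (not the URL slug).
    return call_tool(
        "workflows_update",
        {"workflow_id": document_id, "name": new_name, "config": config},
    )
```

Passing the edit as a function keeps the retrieve and validate steps in one place, so callers cannot skip them by accident.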
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a non-destructive, idempotent mutation (readOnlyHint: false, destructiveHint: false, idempotentHint: true). The description adds valuable context beyond annotations: it emphasizes the importance of validation before updating and recommends retrieving the current workflow first, which provides practical behavioral guidance not captured in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with three sentences: purpose statement, important prerequisite, and usage recommendation. Each sentence adds clear value with zero waste, and it's appropriately front-loaded with the core purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has annotations covering safety profile, 100% schema coverage, and an output schema (implied by context signals), the description provides excellent contextual completeness. It adds crucial workflow-specific guidance about validation and retrieval that complements the structured data, making it fully adequate for this mutation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all three parameters. The description mentions 'name and definition' which aligns with the schema but doesn't add significant semantic detail beyond what's already in the parameter descriptions (e.g., workflow_id clarification about document ID vs URL slug is already in schema). Baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Update an existing saved Workflow's name and definition'), identifies the resource ('Workflow'), and distinguishes it from siblings like workflows_create (create new) and workflows_get (retrieve). It uses precise verbs and scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly provides when-to-use guidance: 'Always validate the config with workflow_specs_validate before updating' and 'Use workflows_get to retrieve the current workflow first, then modify the config as needed.' It names specific alternative tools for validation and retrieval.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.