Roboflow (Official)
Server Details
Roboflow computer vision for AI agents: datasets, annotation, versioning, workflows, inference.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average score 4/5 across all 30 tools scored; lowest 3.4/5.
Most tools have distinct purposes, but there is some overlap between workflow-related tools (e.g., workflows_run vs. workflow_specs_run) and image upload tools (images_prepare_upload vs. images_prepare_upload_zip) that could cause confusion. However, descriptions clarify the differences, so misselection is unlikely with careful reading.
Tool names follow a consistent snake_case pattern with clear verb_noun structures (e.g., annotation_batches_get, models_train, workflow_specs_validate). There are no deviations in naming conventions, making the set predictable and easy to navigate.
With 30 tools, the count feels heavy for a single server, potentially overwhelming for agents. While the domain (computer vision platform) is broad, the toolset includes many specialized or overlapping tools (e.g., multiple workflow tools) that could be consolidated or better organized.
The toolset provides comprehensive coverage of the Roboflow domain, including project management, dataset handling, model training/inference, annotation workflows, and workflow automation. There are no obvious gaps; tools support full CRUD/lifecycle operations and include utilities like meta_feedback_send for reporting issues.
Available Tools
30 tools

annotation_batches_get · Grade A · Read-only · Idempotent
Get details about a specific batch.
Returns batch details including image count and status.
| Name | Required | Description | Default |
|---|---|---|---|
| batch_id | Yes | Batch ID | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
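Over Streamable HTTP, a tool like annotation_batches_get is invoked through MCP's JSON-RPC `tools/call` method. A minimal sketch of the request envelope, with placeholder batch and project IDs:

```python
import json

def tool_call_request(name: str, arguments: dict, request_id: int = 1) -> str:
    # JSON-RPC 2.0 envelope used by MCP's tools/call method
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# batch_123 is a made-up ID; project_id uses the workspace/project form
req = tool_call_request(
    "annotation_batches_get",
    {"batch_id": "batch_123", "project_id": "workspace/my-project"},
)
```

In practice an MCP client library builds this envelope for you; the sketch only shows how the tool name and the two required arguments travel in the request.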
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (read-only, open-world, idempotent, non-destructive). The description adds value by specifying what details are returned (image count and status), which goes beyond annotations. It doesn't contradict annotations (which correctly indicate a safe read operation) and provides useful context about the return content.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with just two sentences. The first sentence states the core purpose, and the second specifies what details are returned. Every word earns its place with zero redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 required parameters), comprehensive annotations, and existence of an output schema, the description is reasonably complete. It specifies what details are returned (image count and status), which is helpful since the output schema isn't visible here. However, it could mention that this is for annotation batches specifically (implied by tool name but not stated).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters clearly documented in the schema. The description doesn't add any parameter-specific information beyond what the schema provides (batch_id and project_id). However, it doesn't need to compensate since schema coverage is complete, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get details about a specific batch' with specific resources (batch details including image count and status). It distinguishes from sibling 'annotation_batches_list' by focusing on a single batch rather than listing multiple. However, it doesn't explicitly contrast with other batch-related tools that might exist.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying 'a specific batch' and mentioning what details are returned. It differentiates from 'annotation_batches_list' by focusing on individual batch retrieval rather than listing. However, it lacks explicit guidance on when to use this versus alternatives like 'projects_get' or 'versions_get' for related metadata, or prerequisites for accessing batch details.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
annotation_batches_list · Grade B · Read-only · Idempotent
List upload batches in a project.
Returns a list of batches with id, name, image count, and upload info.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare this as read-only, non-destructive, idempotent, and open-world, so the agent knows it's a safe, repeatable query. The description adds useful context about what information is returned (id, name, image count, upload info), which isn't covered by annotations. However, it doesn't mention pagination, rate limits, or authentication needs beyond what annotations imply.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences: the first states the action and scope, the second specifies the return data. Every word serves a purpose, and key information is front-loaded. There's no redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one parameter, read-only operation), rich annotations, and the presence of an output schema (which handles return value documentation), the description is reasonably complete. It covers the core purpose and output structure. The main gap is lack of usage guidance relative to sibling tools, but overall it provides sufficient context for basic use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents the single required parameter 'project_id'. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., it doesn't clarify format variations or constraints). This meets the baseline of 3 when the schema handles parameter documentation effectively.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('upload batches in a project'), making the purpose immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'projects_list' or 'workflows_list', which also list resources within projects, leaving some ambiguity about when this specific listing tool is appropriate.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'annotation_batches_get' (which might retrieve a single batch) or explain why one would list batches instead of using other listing tools. There's no context about prerequisites, timing, or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
annotation_jobs_create · Grade A
Create an annotation job to assign a batch of images to a labeler.
Returns the created job details including id, name, and status.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Job name | |
| batch_id | Yes | Source batch ID containing images to annotate | |
| num_images | Yes | Number of images to include in the job | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| labeler_email | Yes | Email of the workspace member who will label | |
| reviewer_email | Yes | Email of the reviewer | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-readOnly, non-destructive, non-idempotent, open-world operation. The description adds value by specifying it's for batch image assignment and returns job details, but doesn't disclose additional behavioral traits like rate limits, auth needs, or side effects beyond what annotations cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence and adds return details in the second, with zero wasted words. It's appropriately sized and structured for clarity without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (6 required parameters), rich annotations, and presence of an output schema, the description is mostly complete. It covers creation purpose and return values, but could improve by addressing usage context or behavioral nuances, though the output schema reduces need for return value details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are fully documented in the schema. The description doesn't add meaning beyond the schema, such as explaining relationships between parameters (e.g., batch_id must exist). Baseline 3 is appropriate as the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Create an annotation job') and resource ('batch of images to a labeler'), distinguishing it from sibling tools like annotation_batches_get or models_train. It precisely defines the tool's function without being vague or tautological.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as workflows_create or projects_create, nor does it mention prerequisites like needing an existing batch or project. It lacks explicit context or exclusions for usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
annotations_save · Grade B · Idempotent
Save an annotation for an existing image.
| Name | Required | Description | Default |
|---|---|---|---|
| image_id | Yes | ID of the image to annotate | |
| labelmap | No | Label map for Darknet/TXT annotations, e.g. {'0': 'cat', '1': 'dog'} | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| annotation_name | Yes | Filename for the annotation (e.g. 'image1.xml') | |
| annotation_content | Yes | The annotation content (XML, JSON, or text) | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
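For illustration, hypothetical argument payloads for annotations_save: the image ID, box coordinates, and class names below are invented, and labelmap applies only to Darknet/TXT content.

```python
# Pascal VOC variant: content is XML, so no labelmap is needed
voc_xml = """<annotation>
  <object>
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>"""

voc_args = {
    "project_id": "my-project",
    "image_id": "img_0001",           # hypothetical image ID
    "annotation_name": "image1.xml",
    "annotation_content": voc_xml,
}

# Darknet/TXT variant: class indices in the content, resolved via labelmap
darknet_args = dict(
    voc_args,
    annotation_name="image1.txt",
    annotation_content="0 0.5 0.5 0.2 0.2",
    labelmap={"0": "cat", "1": "dog"},
)
```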
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=false, openWorldHint=true, idempotentHint=true, and destructiveHint=false, covering key behavioral traits. The description adds minimal context beyond this, stating it saves annotations for existing images but not detailing effects like overwriting behavior, authentication needs, or rate limits. It doesn't contradict annotations, so a baseline score is appropriate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's front-loaded and wastes no space, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (5 parameters, 4 required), rich annotations, and the presence of an output schema, the description is reasonably complete. It covers the core action but lacks details on error conditions or integration with sibling tools. The output schema reduces the need to explain return values, keeping the description adequate though not exhaustive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the input schema. The description doesn't add any meaningful semantic information beyond what the schema provides, such as explaining relationships between parameters or usage nuances. With high schema coverage, the baseline score of 3 is justified.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Save') and resource ('annotation for an existing image'), providing a specific verb+resource combination. However, it doesn't differentiate from sibling tools like 'annotation_batches_get' or 'annotation_jobs_create', which also involve annotations but serve different purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing image), exclusions, or comparisons to sibling tools like 'annotation_batches_get' or 'annotation_jobs_create', leaving the agent with no contextual usage information.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_prepare_upload · Grade A · Read-only · Idempotent
Get an upload URL to upload a single image to a project.
Returns a pre-built upload URL and instructions. The caller must perform the actual upload using curl since the MCP server cannot access local files.
This endpoint uploads images only. To add annotations, call annotations_save with the image ID from the upload response. For bulk uploads with annotations, use images_prepare_upload_zip.
| Name | Required | Description | Default |
|---|---|---|---|
| split | No | Dataset split | train |
| tag_names | No | Tags to attach to the image | |
| batch_name | No | Group uploads under a named batch | |
| image_name | Yes | Filename for the image (e.g. 'photo.jpg') | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
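Since the server only returns a pre-built URL, the caller runs curl locally. A sketch of assembling that command; a multipart POST of the file is an assumption here, and the exact flags should follow the instructions the tool returns:

```python
import shlex

def upload_command(upload_url: str, local_path: str) -> str:
    # multipart file POST is assumed; defer to the tool's returned instructions
    parts = ["curl", "-s", "-X", "POST",
             "-F", f"file=@{local_path}", upload_url]
    return " ".join(shlex.quote(p) for p in parts)

# placeholder URL; the real one comes from the tool response
cmd = upload_command("https://example.invalid/upload", "photo.jpg")
```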
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable context beyond the annotations. Annotations indicate read-only, open-world, idempotent, and non-destructive behavior, but the description explains that the tool returns a pre-built upload URL and instructions, and that the caller must perform the actual upload using curl since the MCP server cannot access local files. This clarifies the tool's operational behavior and limitations, though it doesn't detail rate limits or auth needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and well-structured, with three sentences that each serve a distinct purpose: stating the tool's function, explaining the upload process, and providing usage alternatives. There is no wasted text, and information is front-loaded, making it easy to understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, rich annotations (read-only, open-world, idempotent, non-destructive), 100% schema description coverage, and the presence of an output schema, the description is complete. It covers the tool's purpose, behavioral context, and usage guidelines without needing to explain parameters or return values, which are handled by structured fields.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description does not mention any input parameters, focusing instead on the tool's purpose and usage. However, the input schema has 100% description coverage, with all parameters well-documented (e.g., 'project_id' as a project slug, 'image_name' as a filename). Since the schema provides comprehensive parameter details, the description's lack of parameter information is acceptable, resulting in a baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get an upload URL to upload a single image to a project.' It specifies the verb ('Get'), resource ('upload URL'), and scope ('single image'), and distinguishes it from sibling tools like 'images_prepare_upload_zip' for bulk uploads and 'annotations_save' for adding annotations. This is specific and avoids tautology.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool versus alternatives. It states: 'This endpoint uploads images only. To add annotations, call annotations_save with the image ID from the upload response. For bulk uploads with annotations, use images_prepare_upload_zip.' This clearly defines the tool's scope and directs users to other tools for related tasks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_prepare_upload_zip · Grade A
Prepare a zip upload of images and annotations to a project.
Supports zip archives containing images with COCO, YOLO, Pascal VOC, or classification-by-folder annotations. Up to 2 GB / 10k files.
Returns a signed URL and task ID. The caller must:
- PUT the zip file to the signed URL
- Poll the task status until completed
The signed URL expires in 1 hour.
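The first step can be sketched with the standard library; the signed URL below is a placeholder for whatever images_prepare_upload_zip returns:

```python
import urllib.request

def build_put_request(signed_url: str, payload: bytes) -> urllib.request.Request:
    # Step 1: PUT the zip bytes to the signed URL before it expires
    req = urllib.request.Request(signed_url, data=payload, method="PUT")
    req.add_header("Content-Type", "application/zip")
    return req

req = build_put_request("https://example.invalid/signed", b"zip-bytes")
```

Step 2, polling, goes through images_upload_zip_status with the returned task ID.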
| Name | Required | Description | Default |
|---|---|---|---|
| split | No | Default split for images | train |
| tag_names | No | Tags to attach to every uploaded image | |
| batch_name | No | Group uploads under a named batch and annotation job | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-readOnly, non-destructive operation, which the description aligns with by describing a preparation step (not the actual upload). The description adds valuable behavioral context beyond annotations: it specifies file size/quantity limits (2 GB/10k files), the two-step process (signed URL + polling), and URL expiration (1 hour). However, it doesn't mention rate limits, authentication needs, or error handling, leaving some gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by key constraints and required steps. Every sentence adds essential information: supported formats, limits, return values, and caller responsibilities. There's no redundant or vague language, and it's structured logically for quick comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of a two-step upload process, the description is complete: it explains the tool's purpose, constraints, return values (signed URL and task ID), and required follow-up actions. With an output schema present (implied by 'Has output schema: true'), it doesn't need to detail return values further. The annotations cover safety aspects, and the description fills in behavioral gaps appropriately.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents all 4 parameters. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain how 'split' or 'tag_names' interact with the zip content). This meets the baseline of 3, as the schema carries the full burden of parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Prepare a zip upload') and resource ('images and annotations to a project'), distinguishing it from sibling tools like 'images_prepare_upload' (which likely handles individual files) and 'images_upload_zip_status' (which polls status). It explicitly mentions supported annotation formats (COCO, YOLO, Pascal VOC, classification-by-folder), making the purpose highly specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: for bulk uploads of images with annotations in zip format (up to 2 GB/10k files). It also outlines the required follow-up steps (PUT to signed URL, poll task status) and mentions the signed URL expiration (1 hour), which implicitly suggests not using it for small or immediate uploads without polling capability.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_search · Grade A · Read-only · Idempotent
Search for images inside a project.
| Name | Required | Description | Default |
|---|---|---|---|
| tag | No | Filter results by tag | |
| batch | No | Filter to images in any batch | |
| limit | No | Maximum number of results | |
| query | Yes | Search prompt for project-scoped image discovery | |
| fields | No | Fields to include in each result | |
| offset | No | Result offset for pagination | |
| batch_id | No | Filter to a specific batch id | |
| class_name | No | Filter results by class name | |
| in_dataset | No | Filter to images currently in the dataset | |
| like_image | No | Find images visually similar to this image id/name | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| annotation_job | No | Filter to images assigned to any annotation job | |
| annotation_job_id | No | Filter to a specific annotation job id | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
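A few hypothetical argument payloads for images_search; only query and project_id are required, everything else is an optional filter, and the project and image IDs are invented:

```python
base = {"project_id": "my-project", "query": "cats on sofas"}

# narrow to annotated dataset images of one class, paginated
filtered = dict(base, class_name="cat", in_dataset=True, limit=50, offset=0)

# visual-similarity lookup seeded by an existing image id
similar = dict(base, like_image="img_0001")
```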
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds minimal behavioral context beyond this, but doesn't contradict annotations. It implies a search operation which aligns with the read-only nature.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with zero wasted words. It's appropriately sized and front-loaded, communicating the essential purpose efficiently without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (covering safety and idempotency), 100% schema description coverage, and existence of an output schema, the description provides adequate context for a search tool. However, it lacks guidance on when to use this versus sibling search tools, which is a minor gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, all 13 parameters are well-documented in the schema itself. The description doesn't add any parameter-specific information beyond what's already in the schema, so it meets the baseline but doesn't enhance understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Search for images') and scope ('inside a project'), providing a specific verb+resource combination. However, it doesn't differentiate from sibling tools like 'universe_search' or 'images_prepare_upload', which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'universe_search' or other image-related tools. There's no mention of prerequisites, constraints, or comparative use cases with sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
images_upload_zip_status · Grade A · Read-only · Idempotent
Check the status of a zip upload task.
Returns status (created, running, completed, failed), progress, and result when completed (uploaded count, duplicates, annotation errors, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| task_id | Yes | Task ID from images_prepare_upload_zip response | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
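Because statuses progress from created through running to completed or failed, callers typically poll. A minimal sketch with an injected get_status callable standing in for the actual tool call:

```python
import time
from typing import Callable

def poll_zip_upload(get_status: Callable[[], dict],
                    interval_s: float = 2.0,
                    timeout_s: float = 600.0) -> dict:
    # get_status wraps a call to images_upload_zip_status for one task_id
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status()
        if result.get("status") in ("completed", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("zip upload task did not finish in time")
        time.sleep(interval_s)
```

The status sequence and key names mirror the description above; the interval and timeout values are arbitrary defaults.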
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, idempotentHint=true, etc., covering safety traits. The description adds valuable context beyond annotations: it details return values (status, progress, result with counts like duplicates) and hints at asynchronous task behavior (statuses like 'running'), which helps the agent understand operational semantics without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific return details. Both sentences earn their place by providing essential information without redundancy, making it appropriately sized and efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter), rich annotations (readOnly, idempotent, etc.), and presence of an output schema, the description is complete enough. It covers purpose, usage hint, and return semantics, aligning well with structured data without needing to explain basic behaviors or output details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the parameter 'task_id' fully documented in the schema. The description adds minimal semantics by referencing the source ('from images_prepare_upload_zip response'), but this is marginal beyond the schema. Baseline 3 is appropriate as the schema carries most of the burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Check') and resource ('status of a zip upload task'), making the purpose specific. It distinguishes from sibling tools like 'images_prepare_upload_zip' (which initiates uploads) and 'images_search' (which searches images), avoiding tautology by not just repeating the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by referencing 'task_id from images_prepare_upload_zip response', guiding when to use this tool after that sibling. However, it lacks explicit when-not-to-use statements or alternatives, such as whether to use other status-checking tools or retry mechanisms.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
meta_feedback_send (Grade A)
Report a bug, missing feature, UX friction, or documentation issue.
Call this proactively when you encounter errors using Roboflow tools, when the user expresses frustration, when a tool is missing for the task at hand, or when a parameter is poorly documented.
Returns confirmation that the feedback was recorded.
| Name | Required | Description | Default |
|---|---|---|---|
| message | Yes | What happened, what was expected, or what's missing | |
| category | No | Type of feedback | ux-friction |
| tool_name | No | Which tool this relates to, if any | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
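A sketch of assembling the call arguments from the table above. The builder function itself is hypothetical (the server only sees the resulting argument object); the field names and the server-side `ux-friction` default come from the parameter table:

```python
# Assemble arguments for meta_feedback_send. Omitting `category` lets the
# server apply its documented default ('ux-friction').
def build_feedback_args(message, category=None, tool_name=None):
    if not message:
        raise ValueError("message is required")
    args = {"message": message}
    if category is not None:
        args["category"] = category
    if tool_name is not None:
        args["tool_name"] = tool_name
    return args
```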
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it explains the proactive nature of calling ('call this proactively'), specifies the types of issues to report, and mentions the return value ('confirmation that feedback was recorded'). Annotations cover safety aspects (non-destructive, non-idempotent, open-world), but the description provides practical usage context that enhances transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly structured and concise: two sentences that efficiently communicate purpose, usage guidelines, and behavioral outcomes. Every sentence earns its place with no redundant information. It's front-loaded with the core purpose followed by specific usage instructions.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's straightforward purpose, comprehensive annotations, 100% schema coverage, and existence of an output schema, the description provides complete contextual information. It explains what the tool does, when to use it, and what to expect in return, which is sufficient for this type of feedback submission tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents all three parameters. The description doesn't add any parameter-specific information beyond what's in the schema, so it meets the baseline of 3. The description focuses on usage context rather than parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool's purpose as reporting bugs, missing features, UX friction, or documentation issues. It uses specific verbs ('Report') and resources ('feedback'), and clearly distinguishes itself from all sibling tools which are operational Roboflow tools, while this is a meta-feedback mechanism.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'when you encounter errors using Roboflow tools, when the user expresses frustration, when a tool is missing for the task at hand, or when a parameter is poorly documented.' It clearly defines the triggering conditions without needing to reference alternatives since this is the only feedback tool among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_get (Grade B, Read-only, Idempotent)
Get details for a trained model.
| Name | Required | Description | Default |
|---|---|---|---|
| model_id | Yes | Model id (URL slug). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a read-only, non-destructive, idempotent operation with open-world data, so the description doesn't need to repeat these safety traits. It adds value by specifying that it retrieves 'details' for a 'trained model', which implies comprehensive metadata beyond basic info, but doesn't elaborate on what those details include or any behavioral nuances like error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with no wasted words. It's front-loaded with the core action and resource, making it easy to parse quickly without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the annotations cover key behavioral aspects (read-only, non-destructive, etc.) and an output schema exists (so return values are documented elsewhere), the description is reasonably complete for a simple lookup tool. However, it could be more helpful by clarifying the scope of 'details' or distinguishing it from sibling tools, especially since the context includes many model-related alternatives.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents the single required parameter 'model_id' as a 'Model id (URL slug)'. The description doesn't add any parameter-specific information beyond implying that 'model_id' corresponds to a 'trained model', which is already inferred from the schema. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get details') and resource ('trained model'), making the purpose immediately understandable. However, it doesn't differentiate this from sibling tools like 'models_get_training_status' or 'models_list', which also retrieve model-related information but with different scopes or details.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose 'models_get' over 'models_list' (for listing all models) or 'models_get_training_status' (for status details), nor does it specify prerequisites like needing a specific model ID.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_get_training_status (Grade A, Read-only, Idempotent)
Get the training progress and metrics for a dataset version.
Use this tool to check on a training job started with models_train.
Returns training status, progress (current/total epochs), latest metrics (mAP, loss), and the URL to view training in the dashboard.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number being trained | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
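An agent typically turns this tool's response into a short progress report. A minimal sketch; the field names (`status`, `epoch`, `total_epochs`, `map50`, `loss`) are assumptions inferred from the description's "progress (current/total epochs)" and "latest metrics (mAP, loss)", and the real payload may differ:

```python
# Summarise a models_get_training_status response into one line.
# Field names are assumed, not taken from a documented schema.
def summarize_training(resp):
    pct = 100.0 * resp["epoch"] / resp["total_epochs"]
    return (f"{resp['status']}: epoch {resp['epoch']}/{resp['total_epochs']} "
            f"({pct:.0f}%), mAP={resp['map50']:.3f}, loss={resp['loss']:.3f}")
```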
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare this as read-only, non-destructive, idempotent, and open-world, covering the core safety profile. The description adds valuable behavioral context by specifying what information is returned (training status, progress, metrics, dashboard URL) and that it's for monitoring ongoing training jobs, which enhances understanding beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly structured: first sentence states the purpose, second provides usage guidance, third details the return values. Every sentence earns its place with zero waste, and it's front-loaded with the most important information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that annotations cover safety aspects, schema covers parameters fully, and an output schema exists (so return values don't need explanation in description), the description provides exactly what's needed: clear purpose, usage guidance, and context about what information is returned. It's complete for this tool's complexity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters (project_id and version_number) fully documented in the schema. The description doesn't add any parameter-specific information beyond what's already in the schema, so it meets the baseline of 3 for adequate coverage without extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get the training progress and metrics') and resource ('for a dataset version'), distinguishing it from sibling tools like models_train (which starts training) and models_get (which likely retrieves model metadata). It precisely identifies what this tool does.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool: 'Use this tool to check on a training job started with models_train.' This provides clear context and directly names the alternative tool (models_train) for comparison, giving perfect guidance on when this tool is appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_infer (Grade A, Read-only, Idempotent)
Run hosted inference on an image using a trained model. Returns JSON predictions only. For visualized/annotated images, use workflow_specs_run with a visualization block instead.
| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Image as an HTTPS URL or base64-encoded string. | |
| overlap | No | Overlap threshold between 0.0 and 1.0 for object detection | |
| model_id | Yes | Hosted model as 'project_id/version'. | |
| confidence | No | Confidence threshold between 0.0 and 1.0 | |
| project_type | No | Optional project type override. If omitted, MCP derives it from project metadata. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
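The parameter table encodes two constraints worth validating before the call: thresholds must lie in 0.0–1.0, and `model_id` must take the `'project_id/version'` shape. A sketch of a pre-flight check under those constraints; the validator itself is a hypothetical helper, not part of the server:

```python
# Validate and assemble models_infer arguments. The range and model_id
# constraints come from the parameter table; everything else is illustrative.
def build_infer_args(image, model_id, confidence=None, overlap=None):
    if "/" not in model_id:
        raise ValueError("model_id must be 'project_id/version'")
    args = {"image": image, "model_id": model_id}
    for name, val in (("confidence", confidence), ("overlap", overlap)):
        if val is not None:
            if not 0.0 <= val <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
            args[name] = val
    return args
```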
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it specifies the output format ('Returns JSON predictions only'), which annotations don't cover. Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, so the agent knows this is a safe, repeatable query operation. The description complements this by clarifying the output type, earning a high score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded: the first sentence states the core purpose, and the second provides critical usage guidance. Every sentence earns its place with no wasted words, making it highly efficient for an AI agent to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (inference with multiple parameters), rich annotations (covering safety and behavior), and the presence of an output schema, the description is complete enough. It clarifies the purpose, distinguishes from siblings, and specifies output format, addressing key contextual gaps without redundancy.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description does not add any parameter-specific information beyond what the input schema provides. However, with schema description coverage at 100%, the baseline score is 3. The schema already fully documents all parameters (image, overlap, model_id, confidence, project_type), so no additional compensation is needed from the description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Run hosted inference on an image using a trained model') and distinguishes it from a sibling tool ('For visualized/annotated images, use workflow_specs_run with a visualization block instead'). It identifies both the verb (run inference) and resource (image with trained model) with precision.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool versus an alternative: 'For visualized/annotated images, use workflow_specs_run with a visualization block instead.' This directly addresses the key decision point between this tool and its sibling, offering clear exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_list (Grade A, Read-only, Idempotent)
List trained models associated with a project.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a read-only, non-destructive, idempotent operation with open-world data. The description adds value by clarifying the scope ('trained models associated with a project'), which isn't covered by annotations. It doesn't contradict annotations—listing is consistent with read-only behavior—and provides useful context about what data is returned.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that efficiently conveys the tool's purpose without unnecessary words. It's front-loaded with the core action and resource, making it easy to parse. Every part of the sentence earns its place by specifying scope ('trained models') and context ('associated with a project').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (one required parameter), rich annotations (covering safety and behavior), and presence of an output schema (which handles return values), the description is reasonably complete. It covers the essential 'what' and 'scope', though it could be enhanced with more usage guidance. For a simple list operation with good structured data, this is sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the single parameter 'project_id' fully documented in the schema. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., no examples or constraints). Baseline is 3 since the schema handles parameter documentation adequately, and the description doesn't compensate with extra insights.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('trained models associated with a project'), making the purpose immediately understandable. It distinguishes from siblings like 'models_get' (singular) and 'models_train' (creation), though it doesn't explicitly name alternatives. The description avoids tautology by specifying scope beyond just the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'associated with a project', suggesting this tool is for retrieving models within a specific project. However, it doesn't provide explicit guidance on when to use this versus alternatives like 'models_get' (for single model details) or 'projects_list' (for listing projects). No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_train (Grade A)
Start training a model on a dataset version.
IMPORTANT: A dataset version must exist before training. Use the versions_generate tool first to create one with the desired preprocessing and augmentation settings.
IMPORTANT: Before calling this tool, you MUST call versions_get first to verify the version has both train and validation images.
This tool returns immediately. Training runs in the background on Roboflow servers.
Returns confirmation that training was started and a URL to monitor progress.
| Name | Required | Description | Default |
|---|---|---|---|
| speed | No | Training speed | |
| epochs | No | Number of training epochs | |
| checkpoint | No | Checkpoint to initialize from | |
| model_type | No | Model architecture (e.g. 'yolov8n', 'rf-detr-base', 'paligemma2-3b') | |
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number to train on | |
| business_context | No | Brief description of the business problem being solved. The AI assistant should fill this from conversation context. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
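The description mandates a call sequence: verify the version with versions_get, then start training. A sketch of that orchestration, assuming a generic `call_tool(name, args)` dispatcher and assumed response fields (`train_count`, `valid_count`); creating the version via versions_generate is out of scope here:

```python
# Enforce the pre-flight check the models_train description requires:
# the version must have both train and validation images before training.
def start_training(call_tool, project_id, version_number, **options):
    version = call_tool("versions_get", {
        "project_id": project_id, "version_number": version_number})
    if not (version.get("train_count") and version.get("valid_count")):
        raise ValueError("version needs both train and validation images")
    # models_train returns immediately; training continues server-side.
    return call_tool("models_train", {
        "project_id": project_id, "version_number": version_number, **options})
```

After this returns, the confirmation URL (or models_get_training_status) is the right way to monitor progress.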
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond what annotations provide: it explains the asynchronous nature ('returns immediately', 'training runs in the background'), specifies the prerequisite checks needed, and mentions the return includes a progress monitoring URL. Annotations already indicate this is not read-only, not idempotent, and not destructive, but the description enriches this with operational details without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with clear sectioning: it states the core purpose upfront, provides two IMPORTANT prerequisite warnings, explains the asynchronous behavior, and describes the return value. Every sentence serves a distinct purpose with zero redundancy, making it easy to scan and understand.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (asynchronous training operation with prerequisites), the description provides complete contextual guidance. It covers prerequisites, behavioral expectations, and return information. With both comprehensive annotations and an output schema available, the description appropriately focuses on operational context rather than repeating structured data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already documents all 7 parameters thoroughly. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., it doesn't explain default behaviors for optional parameters like speed or epochs). This meets the baseline expectation when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Start training a model') on a specific resource ('on a dataset version'). It distinguishes this tool from siblings like models_get_training_status (monitoring) and versions_generate (prerequisite). The verb 'start' accurately reflects the asynchronous nature of the operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit, imperative guidance on when to use this tool: it names two prerequisite tools (versions_generate and versions_get) that must be called first, specifies the conditions that must be met (dataset version must exist, must have train/validation images), and distinguishes this from monitoring tools by noting it 'returns immediately' while training runs in background.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_create (Grade B)
Create a new computer vision project.
Returns the created project's id, name, type, and url.
| Name | Required | Description | Default |
|---|---|---|---|
| license | No | Project license. Required for public/research workspaces. | |
| annotation | Yes | What you're annotating (e.g. 'defects', 'vehicles') | |
| project_name | Yes | Display name for the project | |
| project_type | Yes | Computer vision task type | |
| business_context | No | Brief description of the business problem being solved. The AI assistant should fill this from conversation context. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
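The one conditional rule in the table ("license. Required for public/research workspaces.") is easy to miss. A sketch of argument assembly that enforces it via a caller-supplied flag; the helper and the flag are illustrative, not part of the server:

```python
# Assemble projects_create arguments. The license-for-public-workspaces
# rule comes from the parameter table; `public_workspace` is an assumed
# piece of caller knowledge, not a tool parameter.
def build_project_args(project_name, project_type, annotation,
                       license=None, public_workspace=False):
    if public_workspace and license is None:
        raise ValueError("license is required for public/research workspaces")
    args = {"project_name": project_name, "project_type": project_type,
            "annotation": annotation}
    if license is not None:
        args["license"] = license
    return args
```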
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a write operation (readOnlyHint=false) that's non-destructive and non-idempotent. The description adds minimal behavioral context beyond this: it mentions the return values (which the output schema would cover) but doesn't provide additional behavioral details like authentication requirements, rate limits, or what happens if creation fails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two focused sentences: one stating the action and one describing the return values. Every word earns its place, and the information is front-loaded with the primary purpose stated first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a creation tool with comprehensive annotations and a complete input schema, the description provides adequate context. The presence of an output schema means the description doesn't need to detail return values. However, it could better address when to use this tool versus alternatives given the sibling tools available.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents all 5 parameters. The description adds no parameter-specific information beyond what's in the schema. The baseline score of 3 reflects adequate coverage when the schema does all the work.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a new computer vision project') and the resource ('computer vision project'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from sibling tools like 'workflows_create' or 'annotation_jobs_create', which might also create related resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. There's no mention of prerequisites, when this should be used instead of other creation tools like 'workflows_create', or what context should trigger its use. The agent must infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_get (Grade A, Read-only, Idempotent)
Get detailed info about a project including versions, classes, splits, and trained models.
Returns full project details with workspace, project info, versions, and classes.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (readOnly, openWorld, idempotent, non-destructive). The description adds context about what information is returned (workspace, project info, versions, classes) which helps the agent understand the scope of data retrieved. However, it doesn't mention rate limits, authentication requirements, or pagination behavior beyond what annotations cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two clear sentences that efficiently convey the tool's purpose and return value. The first sentence states what the tool does, the second clarifies the scope of returned data. No redundant information or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has comprehensive annotations, 100% schema coverage, and an output schema exists, the description provides adequate context. It clearly explains this is a retrieval operation for detailed project information. The main gap is lack of explicit guidance on when to use versus sibling tools like 'projects_list'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the single parameter 'project_id' fully documented in the schema. The description adds no parameter-specific information beyond the schema, and none is needed; with such high coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose as 'Get detailed info about a project' and names the specific components returned (versions, classes, splits, trained models). That specificity sets it apart from siblings like 'projects_list' (which lists projects without per-project detail) and 'projects_create' (which creates rather than retrieves), though the description text never draws those contrasts explicitly.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when detailed project information is needed, but doesn't explicitly state when to use this tool versus alternatives like 'projects_list' (for listing projects) or 'versions_get' (for specific version details). No explicit 'when-not' guidance or prerequisite context is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
projects_list (Read-only, Idempotent)
List all projects in the workspace associated with the API key.
Returns a list of projects with id, name, type, image count, and more.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (readOnlyHint: true, openWorldHint: true, idempotentHint: true, destructiveHint: false), so the description's burden is lower. It adds some value by specifying the return format ('list of projects with id, name, type, image count, and more'), but doesn't disclose additional behavioral traits like pagination, rate limits, or authentication requirements beyond the API key mention.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and well-structured with just two sentences: the first states the purpose, and the second describes the return format. Every word earns its place, and it's front-loaded with the core functionality.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (0 parameters, rich annotations, and an output schema exists), the description is reasonably complete. It covers the purpose and return format, though it could benefit from usage guidelines relative to siblings. The output schema likely details the return values, so the description doesn't need to fully explain them.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0 parameters and 100% schema description coverage, the schema fully documents the lack of inputs. The description appropriately doesn't discuss parameters, which is correct for this tool, earning a high baseline score. No additional parameter semantics are needed or provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List all projects') and resource ('projects in the workspace associated with the API key'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this from sibling tools like 'projects_get' or 'projects_create', which would be needed for a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'projects_get' (for a single project) or 'projects_create' (for creating new projects). It also doesn't mention any prerequisites, context, or exclusions for usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
universe_search (Read-only, Idempotent)
Search Roboflow Universe for datasets or models.
The query supports operators mixed with free-text:
- Add 'model' to only return datasets with trained models
- 'class:helmet,person' filters by class names
- 'images>500' filters by image count (also >=, <, <=)
- 'sort:stars' sorts results (stars, images, downloads, views, updated)
- 'object detection' filters by project type
- 'updated:30d' filters by recency
Example: 'fire smoke class:fire,smoke images>200 model sort:stars'
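The operator grammar above can be illustrated with a small parser. This is a hypothetical sketch, not Roboflow's server-side implementation; the real parser may tokenize differently, but the sketch shows how structured operators separate from free-text terms in the example query.

```python
import re

def parse_universe_query(query: str) -> dict:
    """Split a Universe query into free-text terms and operators.
    Illustrative sketch only; the real server-side parser may differ."""
    parsed = {"text": [], "filters": {}}
    for token in query.split():
        if token == "model":
            parsed["filters"]["has_model"] = True
        elif token.startswith("class:"):
            parsed["filters"]["classes"] = token[len("class:"):].split(",")
        elif token.startswith("sort:"):
            parsed["filters"]["sort"] = token[len("sort:"):]
        elif token.startswith("updated:"):
            parsed["filters"]["updated"] = token[len("updated:"):]
        else:
            # 'images>500' style comparisons (also >=, <, <=)
            m = re.fullmatch(r"images(>=|<=|>|<)(\d+)", token)
            if m:
                parsed["filters"]["images"] = (m.group(1), int(m.group(2)))
            else:
                parsed["text"].append(token)  # plain free-text term
    return parsed

# The example query from the description:
parsed = parse_universe_query(
    "fire smoke class:fire,smoke images>200 model sort:stars"
)
```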
| Name | Required | Description | Default |
|---|---|---|---|
| page | No | Page number starting at 1 | |
| limit | No | Maximum number of results | |
| query | Yes | Universe search query | |
| result_type | No | Optional result type filter | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover safety (readOnlyHint=true, destructiveHint=false) and idempotency, but the description adds valuable behavioral context: it explains the query syntax with operators (e.g., 'model', 'class:', 'images>500'), sorting options, and filtering capabilities. This goes beyond annotations by detailing how the search behaves and what inputs it supports.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by detailed query syntax and an example. Every sentence adds value: the first states the purpose, the next explains operators, and the last provides an illustrative query. No wasted words, and it's structured for clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (search with operators), rich annotations, 100% schema coverage, and the presence of an output schema, the description is complete. It explains the search behavior, query syntax, and provides an example, covering what's needed beyond structured fields without redundancy.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters. The description adds context for the 'query' parameter by explaining operators and examples, but it does not provide additional meaning for 'page', 'limit', or 'result_type' beyond what the schema states. Baseline 3 is appropriate as the schema handles most param documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Search Roboflow Universe for datasets or models.' It specifies the verb ('Search'), resource ('Roboflow Universe'), and target ('datasets or models'), distinguishing it from siblings like 'images_search' or 'models_list' which search different resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage by detailing query operators and examples, but it does not explicitly state when to use this tool versus alternatives like 'images_search' or 'models_list'. It implies usage for searching Universe content but lacks explicit comparisons or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
versions_export (Idempotent)
Check or trigger a dataset export for a version.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| export_format | Yes | Export format such as 'coco', 'yolov8', or 'jsonl' | |
| version_number | Yes | Version number | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide hints (e.g., not read-only, idempotent, non-destructive), but the description adds valuable context by specifying the dual functionality ('check or trigger'), which isn't covered by annotations. It doesn't disclose rate limits, auth needs, or detailed behavioral traits, but it compensates with operational clarity beyond the structured data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core functionality without any wasted words. It's appropriately sized for the tool's complexity, making it easy for an agent to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of annotations and an output schema, the description is reasonably complete for a tool with clear parameters. It covers the basic action and resource, but could benefit from more context on usage scenarios or output interpretation, though the output schema mitigates some of this need.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema fully documents the parameters. The description doesn't add any extra meaning or examples beyond what's in the schema (e.g., it doesn't explain the interaction between 'check' and 'trigger' modes with the parameters), resulting in a baseline score of 3 as the schema handles the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('check or trigger') and resource ('dataset export for a version'), making the purpose understandable. However, it doesn't differentiate from sibling tools like 'versions_generate' or 'versions_get', which might handle related version operations, leaving some ambiguity about when to choose this specific tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'versions_generate' or other export-related tools that might exist. It lacks context about prerequisites, timing, or scenarios where this is the appropriate choice, leaving the agent to infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
versions_generate
Create a new dataset version with optional preprocessing and augmentation.
IMPORTANT: Before calling this tool, you MUST ask the user which preprocessing and augmentation options they want to apply. Present them with the available options listed below and let them choose. Do not assume defaults — explicitly confirm their choices before generating.
IMPORTANT: This operation can take several minutes for large datasets. You MUST spawn a sub-agent to run this tool in the background.
Returns the generated version number, image count, and split sizes.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| augmentation | No | Augmentation settings dict | |
| preprocessing | No | Preprocessing settings dict | |
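Since both settings dicts are free-form, a caller may want a sanity check before dispatching the call. The payload below is a hypothetical sketch: the option names ('resize', 'flip') are illustrative assumptions, not a confirmed list of Roboflow options, which is exactly why the description insists on confirming choices with the user first.

```python
def validate_generate_arguments(args: dict) -> list:
    """Return a list of problems; an empty list means the payload looks sane."""
    problems = []
    if not args.get("project_id"):
        problems.append("project_id is required")
    for key in ("preprocessing", "augmentation"):
        if key in args and not isinstance(args[key], dict):
            problems.append(key + " must be a settings dict")
    return problems

# Hypothetical arguments for a versions_generate call; option names
# are assumptions for illustration only.
arguments = {
    "project_id": "my-workspace/my-project",
    "preprocessing": {"resize": {"width": 640, "height": 640}},
    "augmentation": {"flip": {"horizontal": True}},
}
```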
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover readOnlyHint=false, openWorldHint=true, idempotentHint=false, and destructiveHint=false. The description adds valuable behavioral context beyond annotations: it discloses that the operation 'can take several minutes for large datasets' (performance characteristic) and specifies the return format ('Returns the generated version number, image count, and split sizes'). This provides practical guidance not captured in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized but not optimally structured. It front-loads the purpose but includes lengthy IMPORTANT sections that could be streamlined. Every sentence earns its place by providing critical guidance, but the formatting with all-caps sections reduces readability slightly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (creation tool with processing options, long-running operation), the description provides excellent contextual completeness. It covers purpose, prerequisites, performance characteristics, execution strategy (sub-agent), and return values. With annotations covering safety aspects and an output schema presumably detailing the return structure, this description gives the agent everything needed to use the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters. The description mentions 'optional preprocessing and augmentation' which aligns with the schema but doesn't add semantic details beyond what's in the schema descriptions. The baseline score of 3 is appropriate since the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Create a new dataset version') and resources involved ('with optional preprocessing and augmentation'). It distinguishes from sibling tools like 'versions_get' (read) and 'versions_export' (export) by emphasizing creation with processing options.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage instructions: 'Before calling this tool, you MUST ask the user which preprocessing and augmentation options they want to apply' and 'Do not assume defaults — explicitly confirm their choices before generating.' It also specifies when to use a sub-agent ('You MUST spawn a sub-agent to run this tool in the background').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
versions_get (Read-only, Idempotent)
Get info about a dataset version including splits and model metrics.
Returns version details with id, name, images, splits, preprocessing, augmentation, and model info if trained.
| Name | Required | Description | Default |
|---|---|---|---|
| project_id | Yes | Project slug (e.g. 'my-project'); 'workspace/my-project' is also accepted. | |
| version_number | Yes | Version number | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds context about what information is returned (splits, model metrics, preprocessing, etc.), which is useful beyond annotations, but doesn't mention rate limits, auth needs, or other behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loaded with the core purpose and followed by return details. It's efficient with minimal waste, though the second sentence could be slightly more structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (2 required parameters), rich annotations, and presence of an output schema, the description is reasonably complete. It covers the purpose and return content adequately, though it could benefit from more usage context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters well-documented in the schema. The description doesn't add any parameter-specific details beyond what the schema provides, so it meets the baseline of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and resource 'info about a dataset version', specifying it includes splits and model metrics. It distinguishes from sibling tools like 'projects_get' or 'models_get' by focusing on dataset versions, though it doesn't explicitly contrast with them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing version details, but doesn't specify when to use this tool versus alternatives like 'versions_export' or 'versions_generate'. No explicit when-not-to-use guidance or prerequisites are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_blocks_get_schema (Read-only, Idempotent)
Get the full schema of a specific Workflow block.
Returns all properties, required fields, and descriptions for a
block identified by its manifest name (as returned by
workflow_blocks_list).
| Name | Required | Description | Default |
|---|---|---|---|
| manifest | Yes | Manifest key of the block (e.g. 'ObjectDetectionModelManifest') | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover read-only, open-world, idempotent, and non-destructive traits, but the description adds value by specifying that it returns 'all properties, required fields, and descriptions,' which clarifies the output scope beyond what annotations provide. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by additional details in a second sentence. Both sentences are essential, with no wasted words, making it highly efficient and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter), rich annotations, and the presence of an output schema, the description is complete. It adequately explains the purpose, usage, and output scope without needing to detail return values or behavioral traits already covered elsewhere.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the parameter 'manifest' fully documented in the schema. The description adds minimal semantics by noting it's a 'manifest name' from 'workflow_blocks_list,' but this is redundant with the schema's description. Baseline 3 is appropriate as the schema carries the burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and the resource 'full schema of a specific Workflow block,' specifying it returns properties, required fields, and descriptions. It distinguishes from sibling 'workflow_blocks_list' by focusing on a single block's schema rather than listing blocks.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool: for a block identified by its manifest name, as returned by 'workflow_blocks_list.' This provides clear context and an alternative tool, guiding the agent on proper usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_blocks_list (Read-only, Idempotent)
List all available Workflow blocks with a short summary of each.
Use this tool to discover which blocks can be used when building a
Workflow definition. To get the full schema (properties, required
fields, etc.) of a specific block, call workflow_blocks_get_schema
with the block's manifest name.
Returns a list of blocks, each with manifest (schema key), name, block_type, and short_description.
| Name | Required | Description | Default |
|---|---|---|---|
| block_type | No | Filter by block category. If omitted, all blocks are returned. | |
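The two-step discovery flow the description prescribes (list blocks, then fetch one block's schema by manifest key) can be sketched as follows. Here `call_tool` is a stand-in stub with canned responses, not a real MCP client; the response shapes are modeled on the field names the descriptions list.

```python
def call_tool(name, arguments):
    """Stand-in for an MCP client call; returns canned responses so the
    flow can be shown without a live connection."""
    if name == "workflow_blocks_list":
        return {"result": [{
            "manifest": "ObjectDetectionModelManifest",
            "name": "Object Detection Model",
            "block_type": "model",
            "short_description": "Run an object detection model.",
        }]}
    if name == "workflow_blocks_get_schema":
        return {"properties": {}, "required": []}  # schema body elided
    raise ValueError("unknown tool: " + name)

# Step 1: discover blocks, optionally filtered by category.
blocks = call_tool("workflow_blocks_list", {"block_type": "model"})["result"]

# Step 2: fetch the full schema using the manifest key from step 1.
schema = call_tool("workflow_blocks_get_schema",
                   {"manifest": blocks[0]["manifest"]})
```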
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide comprehensive behavioral hints (readOnly, openWorld, idempotent, non-destructive), so the bar is lower. The description adds valuable context about the return format ('list of blocks, each with manifest, name, block_type, and short_description') and the filtering capability via the block_type parameter. However, it doesn't mention pagination, rate limits, or authentication requirements, keeping it from a perfect score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with three sentences that each serve a distinct purpose: stating the tool's function, providing usage guidance, and describing the return format. There is no wasted text, and key information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (single optional parameter), comprehensive annotations, and the presence of an output schema, the description provides complete contextual information. It explains what the tool does, when to use it, what it returns, and how it relates to other tools, leaving no significant gaps for an AI agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with the parameter fully documented in the schema. The description mentions 'Filter by block category' which aligns with the schema but doesn't add significant semantic value beyond what's already in the structured data. This meets the baseline expectation when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verb ('List') and resource ('all available Workflow blocks'), and distinguishes it from its sibling 'workflow_blocks_get_schema' by explaining this tool provides summaries while the sibling provides full schemas. This explicit differentiation earns the highest score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('to discover which blocks can be used when building a Workflow definition') and when to use an alternative ('To get the full schema... call workflow_blocks_get_schema'). This clear context and named alternative meet the criteria for a perfect score.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_create
Create and save a new Workflow in the workspace.
IMPORTANT: Always validate the config with workflow_specs_validate before creating the workflow.
The config is the same JSON format used by workflow_specs_run and workflow_specs_validate. Once saved, the workflow can be executed by ID via workflows_run.
Returns the created workflow including its document ID. Save this ID — it is required for workflows_update.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Human-readable workflow name | |
| config | Yes | Workflow JSON definition with 'version', 'inputs', 'steps', and 'outputs' | |
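A minimal shape for the config parameter, built around the four top-level keys the schema names. The step type, model reference, and `$inputs.`/`$steps.` selector syntax are assumptions for illustration; as the description says, the real config should be run through workflow_specs_validate before calling workflows_create.

```python
import json

# Assumed step type and selector syntax, shown only to illustrate the
# four top-level keys the schema requires.
config = {
    "version": "1.0",
    "inputs": [{"type": "InferenceImage", "name": "image"}],
    "steps": [{
        "type": "ObjectDetectionModel",   # assumed block type
        "name": "detector",
        "image": "$inputs.image",         # assumed selector syntax
        "model_id": "my-project/1",       # assumed model reference
    }],
    "outputs": [{
        "name": "predictions",
        "selector": "$steps.detector.predictions",  # assumed
    }],
}

arguments = {"name": "Detect objects", "config": config}
payload = json.dumps(arguments)  # what an MCP client would serialize
```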
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover basic hints (e.g., not read-only, not destructive), but the description adds valuable behavioral context: it specifies that the config must be validated first, describes the return value ('Returns the created workflow including its document ID'), and advises saving the ID for future use with workflows_update. This goes beyond annotations without contradicting them.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the core purpose. Each sentence adds value: the first states the action, the second gives a critical prerequisite, the third explains config format and execution, and the fourth details the return. There is no wasted text, making it efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (creation tool with validation prerequisite), rich annotations, and the presence of an output schema, the description is complete. It covers purpose, usage guidelines, behavioral details like validation and ID usage, and references to sibling tools, without needing to explain return values since an output schema exists.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, providing clear details for both parameters (name and config). The description adds some semantics by noting that the config uses the 'same JSON format' as workflow_specs_run and workflow_specs_validate, but this is minimal enhancement. Baseline 3 is appropriate as the schema already does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Create and save') and resource ('a new Workflow in the workspace'), making the purpose specific. It distinguishes from sibling tools like workflows_get, workflows_list, workflows_update, and workflows_run by focusing on creation rather than retrieval, listing, modification, or execution.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: it instructs to 'Always validate the config with workflow_specs_validate before creating the workflow,' naming a specific alternative tool. It also mentions that the created workflow can be executed via workflows_run, offering context on related actions without redundancy.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_get (B) · Read-only · Idempotent
Get details for a saved workflow.
| Name | Required | Description | Default |
|---|---|---|---|
| workflow_id | Yes | Workflow URL slug or ID | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already cover key behavioral traits (readOnlyHint: true, destructiveHint: false, etc.), so the bar is lower. The description adds minimal context beyond this, stating it retrieves details but not elaborating on aspects like error handling or response format. It doesn't contradict annotations, but offers limited additional behavioral insight.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with no wasted words, making it highly concise and front-loaded. Every part of the sentence directly contributes to understanding the tool's purpose without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter), rich annotations, and the presence of an output schema, the description is reasonably complete. It covers the basic action, though it could benefit from more context on usage relative to siblings or error cases, but the structured data compensates well.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents the single parameter 'workflow_id'. The description adds no extra meaning about parameters beyond what the schema provides, such as format examples or usage tips, meeting the baseline for high coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('details for a saved workflow'), making the purpose evident. However, it doesn't differentiate itself from sibling tools like 'workflows_list' or 'workflows_run', which would require more specificity to score a 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as 'workflows_list' for enumerating workflows or 'workflows_run' for executing one. It lacks explicit when/when-not instructions or named alternatives, offering only basic usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_list (A) · Read-only · Idempotent
List saved workflows in the current workspace.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already cover key behavioral traits (read-only, open-world, idempotent, non-destructive), so the bar is lower. The description adds minimal context beyond this, specifying the scope ('current workspace') but not detailing aspects like pagination, sorting, or response format. It doesn't contradict annotations, providing some value but limited behavioral insight.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. Every word earns its place, making it highly concise and well-structured for quick understanding.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (0 parameters, no nested objects), rich annotations, and presence of an output schema, the description is reasonably complete. It covers the basic purpose and scope, though it could benefit from more usage guidance or behavioral details to be fully comprehensive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0 parameters and 100% schema description coverage, the baseline is high. The description doesn't need to explain parameters, and it appropriately avoids redundant information. It adds value by clarifying the resource scope ('saved workflows in the current workspace'), which is useful semantic context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('saved workflows'), with the scope 'in the current workspace' providing useful context. However, it doesn't explicitly differentiate from sibling tools like 'workflow_blocks_list' or 'projects_list', which also list resources in the workspace, missing full sibling distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'workflows_get' (for a single workflow) or 'workflow_specs_run' (for execution). It lacks explicit when/when-not instructions or named alternatives, offering only basic context without usage differentiation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_specs_run (A)
Execute a Workflow from an inline JSON definition.
Unlike workflows_run, which runs a previously saved workflow by ID, this tool accepts a full workflow JSON spec and executes it directly. This is useful for testing workflows before saving them.
IMPORTANT: Always call workflow_specs_validate first to check the definition is valid before running it.
IMPORTANT: If processing more than 10 images, spawn a sub-agent to run this tool in the background so the user is not blocked.
Returns workflow outputs as defined by the workflow's output blocks.
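A sketch of the call arguments, assuming a spec with a single image input named 'image'; the URL and model ID are placeholders:

```python
# Hypothetical workflow_specs_run arguments. Keys in 'images' must match the
# input names declared in the specification; values are HTTPS URLs or base64.
specification = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "type": "ObjectDetectionModel",
            "name": "detect",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.detect.predictions",
        }
    ],
}

arguments = {
    "images": {"image": "https://example.com/photo.jpg"},
    "specification": specification,
}

# Every key in 'images' should correspond to a declared workflow input.
input_names = {i["name"] for i in specification["inputs"]}
```

A mismatch between the 'images' keys and the declared input names is a common first-attempt failure, so checking it client-side is cheap insurance.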
| Name | Required | Description | Default |
|---|---|---|---|
| images | Yes | Map of input names to image values (HTTPS URLs or base64). Example: {'image': 'https://...'} | |
| parameters | No | Optional runtime parameters defined in the workflow | |
| specification | Yes | Full Workflow JSON definition with 'version', 'inputs', 'steps', and 'outputs' | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover basic hints (e.g., not read-only, not destructive), but the description adds valuable behavioral context beyond annotations: it advises validation before execution, recommends background processing for large image sets to prevent user blocking, and mentions that it returns workflow outputs. This enhances transparency without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the core purpose, followed by sibling differentiation and usage guidelines. Every sentence adds value (e.g., testing use case, validation requirement, background processing advice, return information), with no wasted words, making it efficient and clear.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (executing workflows with inline JSON), the description is complete: it covers purpose, sibling differentiation, usage guidelines, behavioral advice (validation and background processing), and output information. With annotations providing safety hints and an output schema existing, the description adds necessary context without redundancy, making it fully adequate for agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description does not add specific parameter details beyond what the schema provides (e.g., it mentions 'full workflow JSON spec' but doesn't elaborate on structure). Baseline 3 is appropriate as the schema handles parameter documentation adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Execute[s] a Workflow from an inline JSON definition,' specifying the verb (execute) and resource (workflow). It explicitly distinguishes itself from the sibling 'workflows_run' by noting this tool takes an inline JSON spec rather than a saved workflow ID, providing clear differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidelines: use this tool for testing workflows before saving them, versus 'workflows_run' for saved workflows by ID. It adds two IMPORTANT notes: always call 'workflow_specs_validate' first, and spawn a sub-agent when processing more than 10 images so the user is not blocked. This offers clear when/when-not guidance and named alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflow_specs_validate (A) · Read-only · Idempotent
Validate a Workflow JSON definition without executing it.
Use this tool to check whether a workflow definition is syntactically and semantically correct before saving or running it. The definition should follow the standard Workflow format with version, inputs, steps, and outputs.
IMPORTANT: Always validate a workflow definition before running it.
Example workflow definition — detects objects, enlarges bounding boxes, crops, runs a second detection filtering for dogs, and classifies the breed only when exactly one dog is found:
```json
{
  "version": "1.0",
  "inputs": [
    {"type": "WorkflowImage", "name": "image"}
  ],
  "steps": [
    {
      "type": "ObjectDetectionModel",
      "name": "first_detection",
      "image": "$inputs.image",
      "model_id": "yolov8n-640"
    },
    {
      "type": "DetectionsTransformation",
      "name": "enlarging_boxes",
      "predictions": "$steps.first_detection.predictions",
      "operations": [
        {"type": "DetectionsOffset", "offset_x": 50, "offset_y": 50}
      ]
    },
    {
      "type": "Crop",
      "name": "first_crop",
      "image": "$inputs.image",
      "predictions": "$steps.enlarging_boxes.predictions"
    },
    {
      "type": "ObjectDetectionModel",
      "name": "second_detection",
      "image": "$steps.first_crop.crops",
      "model_id": "yolov8n-640",
      "class_filter": ["dog"]
    },
    {
      "type": "ContinueIf",
      "name": "continue_if",
      "condition_statement": {
        "type": "StatementGroup",
        "statements": [
          {
            "type": "BinaryStatement",
            "left_operand": {
              "type": "DynamicOperand",
              "operand_name": "prediction",
              "operations": [{"type": "SequenceLength"}]
            },
            "comparator": {"type": "(Number) =="},
            "right_operand": {
              "type": "StaticOperand",
              "value": 1
            }
          }
        ]
      },
      "evaluation_parameters": {
        "prediction": "$steps.second_detection.predictions"
      },
      "next_steps": ["$steps.classification"]
    },
    {
      "type": "ClassificationModel",
      "name": "classification",
      "image": "$steps.first_crop.crops",
      "model_id": "dog-breed-xpaq6/1"
    }
  ],
  "outputs": [
    {
      "type": "JsonField",
      "name": "dog_classification",
      "selector": "$steps.classification.predictions"
    }
  ]
}
```

Key patterns shown above:
- `$inputs.<name>` references a workflow input.
- `$steps.<step_name>.<output>` references another step's output.
- `ContinueIf` enables conditional branching based on runtime values.
- Steps can chain: detect → transform → crop → detect → classify.
Returns validation status: a valid workflow returns {"status": "ok"}; an invalid one returns error details.
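Before calling the tool, a client can cheaply pre-check the documented top-level shape. This sketch mirrors only the required keys and is an illustration, not the server's actual validation logic:

```python
# Required top-level keys per the description.
REQUIRED_KEYS = {"version", "inputs", "steps", "outputs"}

def missing_keys(spec: dict) -> set:
    """Return which required top-level keys a workflow spec lacks."""
    return REQUIRED_KEYS - set(spec)

ok_spec = {"version": "1.0", "inputs": [], "steps": [], "outputs": []}
bad_spec = {"version": "1.0", "steps": []}
```

A non-empty result means workflow_specs_validate would almost certainly report errors anyway, so the round trip can be skipped.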
| Name | Required | Description | Default |
|---|---|---|---|
| specification | Yes | Full Workflow JSON definition with 'version', 'inputs', 'steps', and 'outputs' | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds valuable context beyond annotations: it explains what validation entails ('syntactically and semantically correct'), provides a detailed example of the expected format, and describes the return behavior ('Returns validation status... A valid workflow returns {"status": "ok"}. An invalid one returns error details'). This goes beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with clear purpose and usage guidelines, but the extensive example (over 50 lines of JSON) makes it lengthy. While the example is informative, it could be shortened or moved to documentation. The structure is logical but not optimally concise for a tool description.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of workflow validation, the description is complete: it covers purpose, usage, parameter semantics with examples, and behavioral outcomes. With annotations covering safety/idempotency and an output schema implied by the return description, no critical gaps remain. It adequately prepares an agent for correct tool invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents the single 'specification' parameter. The description adds significant semantic context: it explains what the specification should contain ('version, inputs, steps, and outputs'), provides a comprehensive example with key patterns, and clarifies the expected JSON structure. This adds meaning beyond the schema's basic description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('validate a Workflow JSON definition') and distinguishes it from siblings like 'workflow_specs_run' by emphasizing it's 'without executing it'. It explicitly mentions what it validates ('syntactically and semantically correct') and what resource it works on ('workflow definition').
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('before saving or running it') and includes an 'IMPORTANT' directive to 'Always validate a workflow definition before running it'. It distinguishes from alternatives by contrasting validation vs. execution, though it doesn't name specific sibling tools beyond the implied distinction from run tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_run (A)
Execute a saved Workflow on one or more images.
Runs a previously created Workflow against the provided images on the Roboflow serverless infrastructure.
IMPORTANT: If processing more than 10 images, spawn a sub-agent to run this tool in the background so the user is not blocked.
Returns workflow outputs as defined by the workflow's output blocks.
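The 10-image guidance above comes down to simple chunking. The helper below is an illustration under stated assumptions (one call per URL batch; the URLs are placeholders):

```python
def chunk(urls, size=10):
    """Split a URL list into batches no larger than `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

urls = [f"https://example.com/img_{i}.jpg" for i in range(23)]
batches = chunk(urls)
# Each batch would become one workflows_run call, handed to a background
# sub-agent so the user is not blocked while large jobs run.
```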
| Name | Required | Description | Default |
|---|---|---|---|
| images | Yes | Map of input names to image values (HTTPS URLs or base64). Example: {'image': 'https://...'} | |
| parameters | No | Optional runtime parameters (e.g. confidence thresholds, class filters) | |
| workflow_id | Yes | Workflow ID to execute | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it specifies the execution environment ('Roboflow serverless infrastructure'), provides performance guidance for large batches (sub-agent for >10 images), and describes what gets returned ('workflow outputs as defined by the workflow's output blocks'). Annotations cover basic safety (non-destructive, non-idempotent, open-world) but the description adds practical implementation details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well structured and concise: the first sentence states the core purpose, the second provides technical context, the third gives critical usage guidance, and the fourth explains the return value. Every sentence earns its place, and important information is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (execution tool with performance considerations), rich annotations, complete schema coverage, and presence of an output schema, the description provides excellent contextual completeness. It covers purpose, execution environment, scaling guidance, and return values: everything needed beyond what's in the structured fields.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all three parameters thoroughly. The description doesn't add significant parameter semantics beyond the schema; it mentions 'images' and 'workflow_id' but provides no additional context about format, constraints, or usage patterns beyond the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Execute a saved Workflow') on specific resources ('on one or more images') and distinguishes it from siblings by focusing on running existing workflows rather than creating, listing, or validating them. It explicitly mentions using 'Roboflow serverless infrastructure' which adds specificity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Execute a saved Workflow') and when to modify usage patterns ('If processing more than 10 images, spawn a sub-agent to run this tool in the background'). It distinguishes from siblings like workflows_create, workflows_list, and workflow_specs_run by focusing on execution of existing workflows.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
workflows_update (A) · Idempotent
Update an existing saved Workflow's name and definition.
IMPORTANT: Always validate the config with workflow_specs_validate before updating the workflow.
Use workflows_get to retrieve the current workflow first, then modify the config as needed.
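The read-modify-write flow can be sketched as plain data manipulation. The config and the appended step are hypothetical, and '<document-id>' stands in for the real workflow document ID:

```python
# Suppose workflows_get returned this (hypothetical) workflow.
current = {
    "name": "Detect objects",
    "config": {
        "version": "1.0",
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [
            {
                "type": "ObjectDetectionModel",
                "name": "detect",
                "image": "$inputs.image",
                "model_id": "yolov8n-640",
            }
        ],
        "outputs": [
            {
                "type": "JsonField",
                "name": "predictions",
                "selector": "$steps.detect.predictions",
            }
        ],
    },
}

# Copy the config and append a step; per the description, the result should
# pass workflow_specs_validate before being submitted via workflows_update.
updated_config = dict(current["config"])
updated_config["steps"] = current["config"]["steps"] + [
    {
        "type": "Crop",
        "name": "crop",
        "image": "$inputs.image",
        "predictions": "$steps.detect.predictions",
    }
]
arguments = {
    "workflow_id": "<document-id>",  # document ID, NOT the URL slug
    "name": "Detect and crop",
    "config": updated_config,
}
```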
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Updated workflow name | |
| config | Yes | Updated workflow JSON definition | |
| workflow_id | Yes | Workflow document ID — NOT the URL slug. Get it from workflows_list or workflows_create response. |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a non-destructive, idempotent mutation (readOnlyHint: false, destructiveHint: false, idempotentHint: true). The description adds valuable context beyond annotations: it emphasizes the importance of validation before updating and recommends retrieving the current workflow first, which provides practical behavioral guidance not captured in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with three sentences: purpose statement, important prerequisite, and usage recommendation. Each sentence adds clear value with zero waste, and it's appropriately front-loaded with the core purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has annotations covering safety profile, 100% schema coverage, and an output schema (implied by context signals), the description provides excellent contextual completeness. It adds crucial workflow-specific guidance about validation and retrieval that complements the structured data, making it fully adequate for this mutation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all three parameters. The description mentions 'name and definition' which aligns with the schema but doesn't add significant semantic detail beyond what's already in the parameter descriptions (e.g., workflow_id clarification about document ID vs URL slug is already in schema). Baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Update an existing saved Workflow's name and definition'), identifies the resource ('Workflow'), and distinguishes it from siblings like workflows_create (create new) and workflows_get (retrieve). It uses precise verbs and scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly provides when-to-use guidance: 'Always validate the config with workflow_specs_validate before updating' and 'Use workflows_get to retrieve the current workflow first, then modify the config as needed.' It names specific alternative tools for validation and retrieval.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
```json
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
```
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.