switch
Server Details
Generate, manage and explore your Switch AI image and video library, scoped to your account.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.1/5 across 29 of 29 tools scored. Lowest: 3.1/5.
Tools have mostly distinct purposes, especially the apply_* tools each targeting a specific photographic style. However, some conceptual overlap exists between apply_iphone_realism and apply_ugc, which both simulate casual, amateur-like captures, potentially confusing an agent.
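One concrete way to see this overlap is to compare the `style` values the two tools accept, as listed in their parameter tables on this page. The sketch below is illustrative only; the dictionary name `STYLES` is an assumption and not part of the server.

```python
# Style enum values copied from the two tools' parameter tables on this page.
# STYLES is a hypothetical name used only for this illustration.
STYLES = {
    "apply_iphone_realism": {"digital_phone", "film_pointshoot", "off_duty_intimate"},
    "apply_ugc": {"phone_shot", "film_pointshoot", "mirror_selfie", "car_selfie"},
}

# Values accepted by both tools are the ones most likely to confuse an agent.
shared = STYLES["apply_iphone_realism"] & STYLES["apply_ugc"]
print(sorted(shared))  # ['film_pointshoot']
```

Both tools expose a `film_pointshoot` style described with the same camera (Contax T2), so an agent choosing by style name alone cannot tell them apart.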
Tool names predominantly follow a verb_noun pattern (e.g., apply_cinematic_anamorphic, list_generations). Minor inconsistencies exist, such as 'lip_sync_video' vs 'talking_avatar_video' and the single noun 'voice', but the overall convention is clear and predictable.
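The naming observation can be checked mechanically. A minimal sketch, assuming a small set of accepted verbs (`KNOWN_VERBS` is invented for illustration and is not published by the server), applied to the tool names quoted above:

```python
# Verbs accepted as a valid prefix. This set is an assumption for illustration.
KNOWN_VERBS = {"apply", "list", "get", "create", "generate"}

def follows_verb_noun(name: str) -> bool:
    """Return True if a snake_case name starts with a known verb
    followed by at least one more word."""
    head, _, rest = name.partition("_")
    return head in KNOWN_VERBS and bool(rest)

names = [
    "apply_cinematic_anamorphic",
    "list_generations",
    "lip_sync_video",
    "talking_avatar_video",
    "voice",
]
for name in names:
    print(f"{name}: {follows_verb_noun(name)}")
```

Under this assumed verb set, the first two names pass and the last three are flagged, matching the inconsistencies noted above.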
With 29 tools, the server is rich but not excessive given the broad scope of image/video generation, style application, asset management, and account utilities. Each tool serves a distinct function, though the high count may feel heavy for simpler use cases.
The tool surface covers core workflows like generating media, applying styles, managing folders, and checking balance. However, notable gaps exist: there is no delete or cancel operation for assets or jobs, and editing capabilities are absent, which may hinder full lifecycle management.
Available Tools
32 tools
apply_cinematic_anamorphic: Apply Cinematic Anamorphic (A, Read-only, Idempotent)
ARRI Alexa anamorphic widescreen film look. Choose grade: warm golden, cool noir, or moody desaturated.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | warm_golden = late-afternoon honey. cool_noir = neon-fill desaturated. moody_desaturated = soft window low-contrast. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
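Over MCP, a tool like this is invoked with a JSON-RPC `tools/call` request whose `arguments` follow the table above. The sketch below only builds that payload; the helper name `build_call` and the request `id` are illustrative assumptions, and transport, authentication, and the shape of the response are out of scope because this page does not document them.

```python
import json

# Allowed values for `style`, copied from the parameter table above.
CINEMATIC_STYLES = {"warm_golden", "cool_noir", "moody_desaturated"}

def build_call(style, subject=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` payload (hypothetical helper)."""
    if style not in CINEMATIC_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    arguments = {"style": style}
    if subject is not None:  # `subject` is optional per the table
        arguments["subject"] = subject
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "apply_cinematic_anamorphic", "arguments": arguments},
    }

payload = build_call("cool_noir", "a woman walking through a hotel lobby")
print(json.dumps(payload, indent=2))
```

Validating `style` on the client side is a design choice in this sketch: it surfaces a typo before the request is sent instead of relying on the server to reject it.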
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint and idempotentHint are true, which is not contradicted. However, the description does not elaborate on behavioral aspects (e.g., whether it modifies the existing asset, returns a new image, or requires specific input). It adds minimal behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences front-load the core purpose and key options. No redundant or extraneous information; every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description lacks explanation of the return value (no output schema) and the role of the 'subject' parameter. It does not specify what the tool outputs (e.g., a modified image or a new generation) or how the style is applied, leaving gaps in contextual understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with full descriptions for both parameters. The description only summarizes the style enum labels without adding new meaning; it omits the 'subject' parameter entirely. Thus it does not enhance understanding beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it applies an ARRI Alexa anamorphic widescreen film look and lists three specific grades. This distinguishes it from sibling tools like 'apply_iphone_realism' or 'apply_high_fashion_editorial', making purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies the tool is for achieving cinematic anamorphic looks but does not explicitly state when to use it versus alternatives (e.g., 'apply_movie_scene'). No guidance on when not to use or how the grades differ contextually.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
apply_graphic_editorial_portrait: Apply Graphic Editorial Portrait (B, Read-only, Idempotent)
Sharp graphic editorial portrait — premium fashion-magazine grade, hard graphic composition. Classic studio or golden-hour outdoor.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | classic = Hasselblad H6D studio. golden_hour = Canon R5 outdoor. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true and idempotentHint=true, which already signal safe behavior. The description adds aesthetic context but does not disclose additional traits like processing time, quality guarantees, or any constraints beyond the schema.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is just two sentences: the first succinctly captures the tool's purpose and tone, the second lists the two variants. Every word contributes, and it is front-loaded with the core concept.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool with complete annotations, the description is minimally adequate. However, it lacks any mention of output format, expected results, or side effects, which would be helpful despite no output schema.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and the description enriches the 'style' parameter by mapping enum values to specific camera setups (Hasselblad H6D studio vs. Canon R5 outdoor). The 'subject' parameter is simply restated without added value.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it applies a 'graphic editorial portrait' with specific attributes like 'premium fashion-magazine grade' and 'hard graphic composition'. This distinguishes it from sibling tools like 'apply_high_fashion_editorial' and 'apply_cinematic_anamorphic', but does not explicitly differentiate them.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives such as 'apply_magic_hour_portrait' or 'apply_high_fashion_editorial'. The agent is left to infer context from the style description alone.
apply_high_fashion_editorial: Apply High Fashion Editorial (A, Read-only, Idempotent)
High-fashion magazine cover/editorial energy. Choose a photographer mood: Mario Testino glossy, Steven Klein dark cinematic, Inez & Vinoodh hard-flash, Annie Leibovitz painterly, Tim Walker dreamlike, Peter Lindbergh black-and-white natural, or Cass Bird off-duty.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | Photographer attribution drives the lighting + camera + grade stack. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, and the description does not contradict these. It adds context that style drives lighting, camera, and grade stack, but does not disclose other behavioral traits beyond that.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loaded with purpose, and efficient. No wasted words, but could be slightly more structured with bullet points for clarity.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (image style application) and complete schema, the description is sufficient. No output schema required. Provides enough context for an AI agent to use it correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds valuable interpretation: style enum as 'photographer attribution drives the lighting + camera + grade stack' and subject with concrete examples, enhancing schema meaning beyond parameter descriptions.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it applies high-fashion magazine cover/editorial energy with specific photographer moods, effectively differentiating from siblings like 'apply_graphic_editorial_portrait' and 'apply_cinematic_anamorphic'.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description gives no explicit guidance on when to use this tool versus alternatives, no when-not-to-use notes, and no mention of prerequisites or exclusions.
apply_iphone_realism: Apply Iphone Realism (A, Read-only, Idempotent)
Phone-shot amateur look — looks like a real person snapped it on their phone. Casual, candid, pore-level real, no professional gloss. Three flavors: digital phone, 35mm film point-and-shoot, or off-duty intimate.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | digital_phone = Sony A7IV + 50mm f/1.4 GM phone-style realism. film_pointshoot = Contax T2 35mm Portra 400. off_duty_intimate = Cass Bird natural-window editorial. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint and idempotentHint, suggesting non-destructive operation. The description adds stylistic details but does not elaborate on behavioral aspects like what exactly happens to an image (e.g., filtering or generation). It does not contradict annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise with two front-loaded sentences: one defining purpose, one listing flavors. No unnecessary words or repetition.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 params, no output schema), the description covers purpose, style options, and subject input. It does not explain output format or operational context, but this is sufficient for a style-application tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%. The description provides extra meaning for each enum style (e.g., camera and film references) and clarifies the 'subject' parameter, adding value beyond the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool applies a 'Phone-shot amateur look' and specifies three distinct flavors (digital_phone, film_pointshoot, off_duty_intimate), making it distinct from sibling tools like cinematic or editorial.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for casual, candid realism but does not explicitly state when to use this tool versus alternatives or provide exclusions. The context from sibling tool names helps, but the description itself lacks direct guidance.
apply_magic_hour_portrait: Apply Magic Hour Portrait (A, Read-only, Idempotent)
Golden-hour rim-light editorial portrait. Choose camera: Canon R5 + 85mm f/1.2 or Hasselblad H6D + 80mm.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | canon_85mm = Canon R5 portrait standard. hasselblad_80mm = medium-format luxury. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true and idempotentHint=true. The description adds that the tool creates a golden-hour portrait effect, but does not elaborate on any additional behavioral traits beyond what annotations specify. No contradiction.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with the core purpose. No filler words. Every sentence earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 params, high schema coverage, annotations present), the description is mostly complete. It lacks an explicit statement that the tool generates an image, but this is clear from context. Minor gap.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, with descriptions for both parameters. The description repeats the camera options but does not add new semantic info beyond the enum values already documented in the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool applies a golden-hour rim-light editorial portrait style, and specifies two camera options. This verb+resource combination distinguishes it from sibling tools like apply_cinematic_anamorphic or apply_graphic_editorial_portrait.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus other portrait styles like apply_graphic_editorial_portrait or apply_high_fashion_editorial. The description only lists camera options without contextual usage advice.
apply_movie_scene: Apply Movie Scene (A, Read-only, Idempotent)
Put me in a movie — full cinematic film look matching specific film genres. Choose: neon-noir action thriller, 80s finance excess, comic-book superhero blockbuster, video-game key art, or generic action thriller.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | neon_noir_action = wet streets + neon + anamorphic. glamour_finance_excess = 1980s Wall Street mahogany / gold. superhero_blockbuster = comic-book key art. video_game_character = Unreal-Engine character render. generic_action_thriller = ARRI cinematic. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint and idempotentHint, so the description adds non-conflicting context by listing genre options, but does not disclose additional behavioral traits such as whether the tool returns an image or video.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded, with a clear hook and a bullet-like list of options. Every sentence adds value without waste.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has only 2 parameters and no output schema, the description covers the style selection but omits what the output looks like (image vs video) and lacks usage guidelines, making it moderately complete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description does not add meaning beyond what is in the schema, so baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool applies a cinematic film look with specific genres. It uses a specific verb and resource, but does not explicitly differentiate from sibling tools like apply_cinematic_anamorphic, leaving some ambiguity.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies the tool is for applying movie-like styles with a list of genres, but it lacks explicit guidance on when to use this tool versus alternatives, and no exclusions or context are provided.
apply_product: Apply Product (B, Read-only, Idempotent)
Product photography. Choose: clean studio hero shot, real-world lifestyle, extreme macro detail, or top-down flat lay.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | clean_studio = seamless backdrop hero. lifestyle = product in use. macro_detail = extreme close-up texture. flat_lay = top-down catalog. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, non-destructive behavior. The description adds no further behavioral details (e.g., effects on input, return state), which is acceptable given the annotation coverage, but does not contradict them.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with two sentences, immediately stating the tool's purpose and listing key choices. Every word is purposeful, with no redundancy or filler.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool with complete schema descriptions, the high-level overview suffices but lacks integration context (e.g., what input it operates on, what output it produces). Without output schema, the agent may not fully understand the tool's role in the workflow.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers 100% of parameters with detailed descriptions (e.g., style options explained). The tool description only summarizes enum values without adding new meaning, so it provides marginal value beyond the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Product photography' and lists four style options, clearly indicating the tool applies a product photography aesthetic. It differentiates from sibling tools by its product focus. However, it does not explicitly state the action (e.g., 'apply' or 'generate'), relying on the title.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus sibling tools like apply_ugc or apply_magic_hour_portrait. The description does not specify appropriate contexts, prerequisites, or exclusions, leaving the agent without decision support.
apply_travel: Apply Travel (A, Read-only, Idempotent)
Luxury travel + hotel editorial. Real architecture is preserved exactly (no inventing buildings). Choose subject: hotel hero, rural property, scenic view, drone aerial, lifestyle moment, or interior. If you attach a reference image of a real property, the architecture lock kicks in automatically.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | hotel_hero = property is the star. rural_property = country estate. scenic_view = pure landscape. drone_aerial = top-down or 45° from above. lifestyle = model + destination. interior = inside the property. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description adds context beyond annotations: 'Real architecture is preserved exactly' and auto-lock for reference images. Annotations already indicate readOnly and idempotent, so description reinforces safe operation.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences front-loaded with purpose, a critical rule, and subject choices. No filler or redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Explains purpose, usage constraints, and subject options. Lacks description of return value or output behavior, but this is partially compensated by idempotent/read-only annotations.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline 3. Description repeats enum values but doesn't add new detail beyond schema's parameter descriptions. No amplification of parameter meaning.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states 'Luxury travel + hotel editorial' and enumerates specific subjects. Distinguishes from sibling apply tools by domain (travel/hotel) and unique architecture preservation rule.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Describes the tool's domain (travel/hotel editorial) and provides a key behavioral rule ('real architecture preserved exactly'). However, no explicit comparison to sibling tools or when-not-to-use scenarios.
apply_ugc: Apply Ugc (B, Read-only, Idempotent)
User-generated content — looks like a real person captured it casually. Choose: phone shot, film point-and-shoot, mirror selfie, or car selfie.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | phone_shot = iPhone-style snap. film_pointshoot = Contax T2 grain. mirror_selfie = bathroom/bedroom mirror. car_selfie = inside-the-car phone. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, so the description does not need to re-state safety. The description adds the style options but no extra behavioral context (e.g., effect on existing media, required references).
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is short and front-loaded with purpose. It lists styles concisely, though a more structured format (e.g., bullet points under a sentence) could improve readability.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With only 2 parameters and no output schema, the description should explain what the tool produces (e.g., a transformed image) and how the 'subject' parameter is used. Currently, it remains vague about the output and usage context.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with adequate parameter descriptions. The tool description adds stylistic context but no additional semantic value beyond what the schema already provides for the parameters.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool applies a user-generated content aesthetic, listing specific styles. However, it could be more explicit about the action (e.g., 'applies a UGC filter to an image' rather than just describing the style).
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus sibling tools like apply_iphone_realism or apply_movie_scene. The description only lists style options without providing selection criteria or exclusion conditions.
apply_wellness: Apply Wellness (A, Read-only, Idempotent)
Wellness / yoga / fitness / lifestyle campaign — warm amber tropical, tropical paradise cinematic, or high-key cyan beach.
| Name | Required | Description | Default |
|---|---|---|---|
| style | Yes | warm_amber_tropical = warm honey grade with golden haze. hanalei_cinematic = soft golden mist + infinity pool reflection. high_key_cyan_beach = bright daylit cyan ocean. | |
| subject | No | What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony". | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint and idempotentHint true, which the description does not contradict. The description adds no further behavioral context beyond the style options.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence with no redundancy; information is front-loaded with the domain and three distinct style options, achieving high efficiency.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 parameters, no output schema), the description adequately covers its purpose and style options, though additional guidance on when to use this vs. siblings could improve completeness.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters described in the schema. The tool description adds minimal additional meaning, mainly summarizing the style variants already detailed in the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool applies a 'Wellness / yoga / fitness / lifestyle campaign' with specific style options, distinguishing it from sibling tools like apply_movie_scene or apply_product.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies use for wellness-related campaigns but does not provide explicit when-to-use or when-not-to-use guidance, nor alternative tools.
cancel_my_task (Cancel Task): grade A, Idempotent
Stop one of your generation tasks by task id — works on queued AND running tasks. Already-saved images stay in your library; nothing is deleted or refunded. Returns how many images were saved out of how many you requested.
| Name | Required | Description | Default |
|---|---|---|---|
| taskId | Yes | Task id from generate_image or list_my_tasks. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate idempotentHint=true and readOnlyHint=false. The description adds key behavioral details: nothing is deleted or refunded, and it returns a count of saved/requested images. This goes beyond what annotations provide.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero fluff. The main purpose is front-loaded, and each sentence adds new information (scope, side effects, return value).
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter tool with no output schema and good annotations, the description fully covers purpose, scope, side effects, and return value. Nothing essential is missing.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds little beyond the schema for 'taskId', but does provide context that it works on tasks from specific generators.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Stop') and resource ('generation tasks'), clarifies scope ('queued AND running tasks'), and explicitly states what it does not affect ('Already-saved images stay'). This clearly distinguishes it from sibling tools like 'generate_image' or 'list_generations'.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description states when to use the tool (to cancel queued or running tasks). It does not explicitly mention when not to use it or name alternatives, but the context is clear enough for an agent to decide.
check_balance (Check Balance): grade A, Read-only, Idempotent
Check your daily Switch spending — what you have spent today, your daily limit, and what is remaining. Optionally pass an estimatedCost (USD) to also get whether you can afford it.
| Name | Required | Description | Default |
|---|---|---|---|
| estimatedCost | No | Optional dollar amount to test against your daily limit. | |
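The affordability check described above reduces to simple arithmetic. The following is a minimal sketch of that arithmetic only, not the server's implementation. The field names `spent_today` and `daily_limit` are hypothetical, since the actual response shape of check_balance is not documented here, and clamping the remaining budget at zero is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Balance:
    # Hypothetical field names; the real check_balance payload may differ.
    spent_today: Decimal
    daily_limit: Decimal

    @property
    def remaining(self) -> Decimal:
        # Assumption: remaining budget is clamped at zero once the limit is exceeded.
        return max(self.daily_limit - self.spent_today, Decimal("0"))

    def can_afford(self, estimated_cost: Optional[Decimal]) -> Optional[bool]:
        # Mirrors the optional estimatedCost parameter: no estimate, no verdict.
        if estimated_cost is None:
            return None
        return estimated_cost <= self.remaining
```

Decimal is used instead of float so that dollar amounts compare exactly.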
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations confirm read-only and idempotent behavior. The description adds value by detailing what information is returned (spent, limit, remaining) and the optional cost check.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences cover the tool's core function and optional capability without waste.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple tool with one optional parameter and no output schema, the description fully covers what the agent needs to know.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers the single parameter fully. The description adds meaning by explaining the parameter's purpose (test against daily limit).
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks daily Switch spending, limit, and remaining, with an optional affordability test. It is distinct from sibling creative tools.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains the tool's function and the optional parameter, though it does not explicitly contrast with siblings. The context makes the usage clear.
check_job_status (Check Job Status): grade A, Read-only, Idempotent
Polling-friendly status check for one of your tasks. Returns a slim shape with status, progressPct, and eta so you can poll without refetching the full payload.
| Name | Required | Description | Default |
|---|---|---|---|
| taskId | Yes | Task id to check. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare read-only and idempotent. Description adds valuable context: 'polling-friendly' and 'slim shape', enhancing transparency without contradiction.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, no fluff. Efficiently conveys purpose, return shape, and polling suitability.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple tool with annotations and full schema coverage, the description is complete. It explains the return value and use case without needing more detail.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear parameter description. Description adds no additional meaning beyond 'Task id to check.' Baseline score of 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it is a polling-friendly status check for a task, returning a slim shape with status, progressPct, and eta. Differentiates from siblings like get_video_status by emphasizing lightweight polling.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies use for polling status but does not explicitly state when not to use or mention alternatives among the many sibling tools. No exclusion criteria given.
explore_models (Explore Models): grade A, Read-only, Idempotent
Browse the image-generation models available to your Switch account. Returns model id, display name, brand, and credits-per-image so you can pick one before calling generate_image.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, so the safety profile is clear. Description adds account-specific scope and return fields but minimal extra behavioral context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with action verb, followed by key details. No wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a zero-parameter listing tool, the description fully covers purpose, output contents, and usage context. No additional details are needed.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters exist, so schema coverage is 100%. The description does not need to add parameter information, and the baseline for zero parameters is 4.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it browses image-generation models for the account, lists returned fields (id, name, brand, credits-per-image), and specifies its role before calling generate_image. Distinguishes from siblings like generate_image and list_video_models.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises using this before generate_image to pick a model. Although it lacks explicit when-not-to-use guidance, the context of a simple listing tool makes this adequate.
generate_image (Generate Image): grade A
Generate one or more Switch images. Auto-routes to the right model based on subject (Nano Banana 2 default, GPT Image 2 for swimwear/beach, Switch Model/Ultra/Pro for sexier content, Nano Banana Pro for typography-heavy). Counts <= 8 render inline in chat; counts > 8 queue to your Switch Studio with progress polling. All images persist to your Studio library and folder. Pass an optional style (e.g. "wellness/warm_amber_tropical", "high_fashion_editorial/testino_glossy", "movie_scene/neon_noir_action") to apply a curated photographic stack from the apply_* skill tools.
| Name | Required | Description | Default |
|---|---|---|---|
| count | No | How many images to generate. Default 4. <= 8 returns inline, > 8 queues to Studio. Beta limit: max 50 per request — larger asks are capped at 50 and the response says so. | |
| model | No | Optional explicit model. If omitted, auto-routed based on subject content (see tool description). | |
| style | No | Optional curated style stack from the apply_* skill tools. Format "<skill>/<style_key>", e.g. "wellness/warm_amber_tropical" or "high_fashion_editorial/leibovitz_painterly". | |
| subject | Yes | Plain-English description of what to generate. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony, model wearing a robe". | |
| folder_name | No | Optional Switch Studio folder name. Auto-created if missing. Defaults to the chat-derived title. | |
| aspect_ratio | No | Image aspect ratio. Default 9:16 (vertical, social-friendly). | |
| reference_image_urls | No | Optional public image URLs to use as face/body/scene references. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| asset | No | |
| images | No | |
| _widget | No | |
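The count thresholds and the style format above can be checked on the client before calling the tool. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the regular expression assumes style keys are lowercase snake_case (as in the examples given), and capping count at 50 locally merely mirrors what the server is said to do on its own.

```python
import re
from typing import Any, Dict, Optional

INLINE_MAX = 8    # counts <= 8 render inline, per the tool description
REQUEST_MAX = 50  # beta cap per request, per the count parameter
# Assumed shape of "<skill>/<style_key>": two lowercase snake_case parts.
STYLE_RE = re.compile(r"^[a-z0-9_]+/[a-z0-9_]+$")


def build_generate_image_args(
    subject: str, count: int = 4, style: Optional[str] = None
) -> Dict[str, Any]:
    """Build an arguments dict for generate_image (hypothetical helper)."""
    if not subject.strip():
        raise ValueError("subject is required")
    if count < 1:
        raise ValueError("count must be at least 1")
    if style is not None and not STYLE_RE.match(style):
        raise ValueError("style must look like '<skill>/<style_key>'")
    args: Dict[str, Any] = {"subject": subject, "count": min(count, REQUEST_MAX)}
    if style is not None:
        args["style"] = style
    return args


def delivery_mode(count: int) -> str:
    """Predict whether results come back inline or are queued to Studio."""
    return "inline" if min(count, REQUEST_MAX) <= INLINE_MAX else "queued"
```

For example, `delivery_mode(12)` returns `"queued"`, so a caller would expect to poll for progress rather than receive images in the chat.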
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses key behavioral traits: auto-routing to models, persistence to Studio library, inline vs queue behavior based on count, and the ability to optionally apply curated styles. No annotation contradictions.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear front-loading of purpose. It packs useful information but could be slightly more concise, e.g., the list of style examples is verbose.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Although an output schema is present (asset, images, _widget), its fields carry no descriptions, and the tool description does not state the return format (e.g., image URLs or IDs) or detail error cases. It adequately covers count thresholds and persistence but leaves some ambiguity.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. The description adds value by explaining auto-routing logic for model, linking style to apply_* tools, and clarifying count behavior. It goes beyond schema descriptions.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it generates Switch images, auto-routes to the appropriate model based on subject, and allows optional style application. It distinguishes the tool from the sibling apply_* tools by mentioning auto-routing and the built-in style option.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description indicates when to use the tool (to generate images), mentions count thresholds for inline vs queue, and hints at style usage from apply_* tools. However, it does not explicitly exclude alternatives or state when not to use this tool.
generate_video (Generate Video): grade A
Generate Switch video across the real provider lineup (Kling, Seedance, Switch Video/WAN 2.7, Switch Video Edit, Topaz upscale) and modes (text-to-video, image-to-video, frame-to-frame, motion, omni, reference-to-video, video-edit, upscale). ALWAYS call list_video_models first to pick the right model + mode and see its required inputs. Pass one shot, or shots:[...] for a storyboard (max 4 by default, hard max 10) where EACH shot is DIFFERENT — never repeat one prompt to get copies. Renders async (~30-90s); a background job delivers each clip to the library. Returns a task_id per shot — poll get_video_status or list_my_videos.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | Video mode. Must be supported by the chosen model (see list_video_models). | |
| audio | No | Omni / Seedance refs: generate audio. Omni is ON by default; set false for a silent clip. Other models ignore this. See list_video_models for which models generate audio and the max seconds with vs without audio. | |
| model | No | Model id from list_video_models (e.g. kling-v3, seedance-2.0-t2v, wan-2.7-t2v, topaz). Or prefer option_id from list_video_models. | |
| shots | No | A storyboard of 1-10 DISTINCT shots. Each item takes the same fields as a single shot (subject, model, mode, image_url, etc.). | |
| subject | No | The shot: subject + motion + scene (video needs motion language, e.g. "slow push-in"). | |
| duration | No | Clip length in seconds. Default 5. 15s only on Switch Video (WAN); other models cap at 10 — see list_video_models. | |
| image_url | No | Required for image-to-video / frame-to-frame / motion. A storage-backed (Switch) image URL. | |
| option_id | No | Optional catalog id from list_video_models (e.g. "kling-image"); use instead of model+mode. | |
| video_url | No | Required for video-edit and upscale (the source clip). Must be a publicly downloadable https URL. | |
| aspect_ratio | No | e.g. 9:16, 16:9, 1:1. Must be allowed for the model (see list_video_models). | |
| end_image_url | No | End frame for frame-to-frame mode. | |
| reference_audio_urls | No | Seedance reference/omni only: up to 3 reference audio files to drive synthesized audio. Requires at least one reference image or video. | |
| reference_image_urls | No | Reference images. Seedance reference/omni accepts up to 9; Kling Omni accepts up to 7. For Seedance, at least one image or video reference is required. | |
| reference_video_urls | No | Seedance reference/omni only: up to 3 reference video clips for motion/style guidance. A Seedance video ref can satisfy the required visual anchor. NOTE: the AUDIO track of these clips is IGNORED — never extracted or preserved. | |
| character_orientation | No | Motion mode only: follow the character image (default) or the reference video. | |
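The per-mode input requirements and the storyboard rules above can be sketched as a client-side pre-flight check. This is a minimal sketch, not the server's validation: the function names are hypothetical, the mode strings are copied from the prose and may not match the exact enum values the server accepts, and "each shot is different" is approximated by comparing normalized subjects. It does not replace calling list_video_models first.

```python
from typing import Any, Dict, List

# Mode names come from the tool description; the exact enum strings the
# server accepts are an assumption.
NEEDS_IMAGE = {"image-to-video", "frame-to-frame", "motion"}
NEEDS_VIDEO = {"video-edit", "upscale"}
HARD_MAX_SHOTS = 10


def validate_shot(shot: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems: List[str] = []
    mode = shot.get("mode")
    video_url = shot.get("video_url")
    if mode in NEEDS_IMAGE and not shot.get("image_url"):
        problems.append(f"{mode} requires image_url")
    if mode in NEEDS_VIDEO and not video_url:
        problems.append(f"{mode} requires video_url")
    if video_url and not str(video_url).startswith("https://"):
        problems.append("video_url must be an https URL")
    return problems


def validate_storyboard(shots: List[Dict[str, Any]]) -> List[str]:
    """Check shot count, distinctness, and per-shot required inputs."""
    problems: List[str] = []
    if not 1 <= len(shots) <= HARD_MAX_SHOTS:
        problems.append(f"expected 1-{HARD_MAX_SHOTS} shots, got {len(shots)}")
    # Distinctness is approximated by comparing normalized subjects.
    subjects = [(s.get("subject") or "").strip().lower() for s in shots]
    if len(set(subjects)) != len(subjects):
        problems.append("shots must be distinct; a subject is repeated")
    for index, shot in enumerate(shots):
        problems.extend(f"shot {index}: {p}" for p in validate_shot(shot))
    return problems
```

Returning a list of problems rather than raising on the first one lets an agent fix every issue in a storyboard before spending credits on a request.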
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses async behavior (~30-90s), background job delivery, max shots (4 default, 10 hard max), and distinctions between modes. Annotations only provide readOnlyHint=false, so the description adds significant behavioral context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is somewhat long but well-structured, starting with the main purpose, followed by instructions and details. It could be slightly more concise, but every sentence adds value and the purpose is front-loaded.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (15 parameters, no output schema), the description is remarkably complete. It covers the workflow, constraints, async nature, and how to get results without missing crucial information.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds value by grouping parameters (e.g., 'Pass one shot, or shots:[...]') and providing usage context (e.g., 'each shot is DIFFERENT'), which helps beyond the schema definitions.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it generates video across multiple providers and modes, distinguishing it from sibling tools like 'generate_image' by mentioning specific providers and the need to call 'list_video_models' first.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly instructs to call 'list_video_models' first to pick the right model and mode, explains when to use single shot vs. storyboard, and tells how to retrieve results (poll 'get_video_status' or 'list_my_videos'). No alternative generation tools are mentioned, but the flow is complete.
get_my_active_references (Get Active References): grade A, Read-only, Idempotent
Read the user's active reference strip in Switch Studio — the typed slots (face, body, outfit, scenery, product, general) the user fills with reference images before generating. Returns the count, per-type breakdown, and the refs themselves with their type labels and URLs. Call this BEFORE generate_image whenever the user says "use my refs", "use my reference images", references images they prepared in Studio, or wants to generate from a scene they already laid out. The strip is the bridge: pictures the user drops into Studio land here, and Studio's own generations read from here. Pass the returned URLs into generate_image's reference_image_urls so the same refs anchor the result.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
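The hand-off described above, from the reference strip into generate_image's reference_image_urls, might look like the following. This is a minimal sketch: the payload shape (`refs` entries with `type` and `url` keys) is a hypothetical reading of "the refs themselves with their type labels and URLs", and both helper names are illustrative.

```python
from typing import Any, Dict, List


def reference_urls(strip: Dict[str, Any]) -> List[str]:
    """Collect unique reference URLs, preserving the strip's order."""
    # Hypothetical payload shape:
    # {"count": 2, "refs": [{"type": "face", "url": "https://..."}, ...]}
    seen = set()
    urls: List[str] = []
    for ref in strip.get("refs", []):
        url = ref.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def with_active_references(
    generate_args: Dict[str, Any], strip: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of the generate_image arguments anchored to the strip."""
    urls = reference_urls(strip)
    if not urls:
        return dict(generate_args)
    return {**generate_args, "reference_image_urls": urls}
```

The input dict is copied rather than mutated, so the same base arguments can be reused with or without references.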
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint and idempotentHint, so no contradiction. The description adds behavioral context by explaining the reference strip concept and that it returns specific data, which is consistent with a safe read operation.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is moderately lengthy but every sentence adds value: purpose, usage trigger, and data flow. It could be slightly more concise but is well-structured with front-loaded purpose.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has 0 parameters and no output schema, but the description fully explains the return structure (count, breakdown, refs with type labels and URLs) and provides sufficient context for an agent to know when and how to use it.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters exist, so schema coverage is 100%. The description does not need to add parameter details. The baseline for 0 params is 4, and the description is clear about the tool's zero-input nature.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it reads the user's active reference strip, listing specific typed slots and what is returned (count, breakdown, refs with type labels and URLs). It distinguishes from sibling tools like generate_image by focusing on the reference retrieval step.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly instructs to call this BEFORE generate_image when the user mentions 'use my refs' or similar phrases. Explains the strip as a bridge and directs to pass returned URLs into generate_image's reference_image_urls parameter.
get_video_status (Get Video Status): grade A, Read-only, Idempotent
Check the status of one of your video jobs by task_id (from generate_video) or job_id. Returns status, a viewable view_url when finished, or the error if it failed. Poll this every ~20s — do not loop rapidly.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Alternatively, the job_id. | |
| task_id | No | Task id returned by generate_video. | |
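The polling guidance above (roughly every 20 seconds, no rapid looping) can be sketched as a small loop. This is a sketch under stated assumptions: `fetch_status` is a hypothetical callable standing in for however your client invokes get_video_status, the terminal status strings `completed` and `failed` are guesses rather than documented values, and the 10-minute timeout is arbitrary.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal status values; check the real payload before relying on them.
TERMINAL = {"completed", "failed"}


def poll_video(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 20.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        result = fetch_status(task_id)
        if result.get("status") in TERMINAL:
            return result
        # Stop if waiting another interval would overshoot the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in normal use the defaults apply.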
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true, so no contradiction. The description adds value by explaining return values (status, view_url, error) and polling behavior, which goes beyond the annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences with front-loaded purpose. Every sentence adds value: purpose and parameters, returns, and polling guidance. No wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description explains the return values (status, view_url, error) fully. It covers necessary context: identifiers, polling interval, and relationship to generate_video. Complete for a simple status-check tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for both parameters. The description adds minimal extra context (e.g., that they are alternatives), but does not significantly augment what the schema already provides.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks the status of video jobs using task_id or job_id. It specifies the resource (video jobs) and the action (check status), and the mention of 'from generate_video' distinguishes it from general job status tools.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit polling guidance ('Poll this every ~20s — do not loop rapidly') and context for when to use it (after generate_video). It does not explicitly state when not to use it or list alternatives among siblings.
lip_sync_video (Lip Sync Video): grade A
Lip-sync audio onto a face in a video (Kling). Three steps you orchestrate: (1) action="identify-face" with video_url to detect faces (video must be MP4/MOV, 2-60 seconds, <=100MB, 720p or 1080p); (2) action="create" with session_id + a face_id + audio (sound_file as a base64 data URI, or an audio_id) + timing IN MILLISECONDS (sound_start_time, sound_end_time, sound_insert_time) + optional speech_volume/original_audio_volume (0-100); (3) action="status" with the task_id to poll — returns a branded SwitchApp view_url when done. Charges credits on create; failed jobs are refunded.
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Which step to run. | |
| face_id | No | create: a face_id from identify-face (one face supported). | |
| task_id | No | status: the task_id from create. | |
| audio_id | No | create: alternative to sound_file — an existing audio id. | |
| video_url | No | identify-face: the source video (MP4/MOV, 2-60s, <=100MB, 720p/1080p). Use a SwitchApp/public URL. | |
| session_id | No | create: from identify-face. | |
| sound_file | No | create: base64 data URI of the audio (e.g. data:audio/mpeg;base64,...). | |
| speech_volume | No | create: 0-100 (default 100). | |
| sound_end_time | No | create: audio end, in MILLISECONDS. | |
| sound_start_time | No | create: audio start, in MILLISECONDS. | |
| sound_insert_time | No | create: where in the video to place the audio, in MILLISECONDS. | |
| original_audio_volume | No | create: 0-100 (default 0). | |
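Step (2) has two easy-to-miss details: timings are in milliseconds and sound_file must be a base64 data URI. The sketch below shows one way to assemble the create arguments in Python. The helper names, the seconds-to-milliseconds conversion, the string type for face_id, and the end-after-start check are illustrative assumptions; only the parameter names and the 0-100 volume range come from the table above.

```python
import base64
from typing import Any, Dict


def to_data_uri(audio_bytes: bytes, mime: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a base64 data URI."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_create_args(
    session_id: str,
    face_id: str,  # assumed to be a string id returned by identify-face
    audio_bytes: bytes,
    start_s: float,
    end_s: float,
    insert_s: float,
    speech_volume: int = 100,
    original_audio_volume: int = 0,
) -> Dict[str, Any]:
    """Build arguments for action="create", converting seconds to milliseconds."""
    if end_s <= start_s:
        # Assumed sanity check; the tool's own validation is not documented here.
        raise ValueError("end_s must be greater than start_s")
    for name, value in (
        ("speech_volume", speech_volume),
        ("original_audio_volume", original_audio_volume),
    ):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return {
        "action": "create",
        "session_id": session_id,
        "face_id": face_id,
        "sound_file": to_data_uri(audio_bytes),
        "sound_start_time": round(start_s * 1000),
        "sound_end_time": round(end_s * 1000),
        "sound_insert_time": round(insert_s * 1000),
        "speech_volume": speech_volume,
        "original_audio_volume": original_audio_volume,
    }
```

Keeping the unit conversion in one place reduces the risk of passing seconds where the tool expects milliseconds.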
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only provide readOnlyHint=false. The description adds multi-step orchestration, credit charges, refunds on failure, polling via status, and return of a view_url. This is extensive behavioral context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single dense paragraph. It is front-loaded with purpose and contains all necessary information without fluff, but could be structured more clearly (e.g., bullet points for steps).
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (multi-step, 12 parameters, no output schema), the description covers video/audio constraints, timing, volume options, billing, and expected output (view_url). It is thorough and leaves little ambiguity.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds context: grouping parameters by action, emphasizing timing in milliseconds, clarifying sound_file as base64 data URI, and explaining the relationship between face_id, session_id, etc. This adds meaningful value.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Lip-sync audio onto a face in a video (Kling)', which is a specific verb and resource. It clearly distinguishes from sibling tools like talking_avatar_video or generate_video.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly outlines three sequential steps with detailed parameter requirements and constraints (video format, size, duration). It also notes credit charging and refunds. However, it does not explicitly compare to alternatives or state when not to use.
list_generations (List Generations), A, Read-only
List your recent and active generation tasks. Returns counts per status (pending / running / completed / failed) plus an array of your tasks with id, status, prompts, model, ref counts, scheduledAt, finishedAt.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Default 10. Max 50. | |
| status | No | "all" for everything, or array like ["pending","running"]. Default: active + recent. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true. The description adds context about the return structure (counts per status, array of tasks with specific fields) and implies read-only behavior. No contradictions.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (two sentences) and front-loads the core purpose. Every sentence adds value.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simplicity (2 params, no output schema), the description adequately covers the tool's behavior and return structure, including status counts and task fields. Minor gaps like pagination details are acceptable.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed descriptions for limit and status. The description does not mention either parameter, so it adds no semantic meaning or examples beyond the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it lists your recent and active generation tasks, specifying the returned data including counts per status and an array with key fields. This distinguishes it from sibling tools like list_my_assets or show_generation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description does not explain when to use this tool versus alternatives like show_generation or list_my_assets, and it gives no when-not-to-use guidance.
list_my_assets (List Assets), A, Read-only
Return asset METADATA only (id, truncated prompt, model, created date), newest first. This does NOT display images and must NOT be used to show pictures — if the user says "show me / display my last image(s)", call show_media instead (it renders them; pass count=N for several). Use list_my_assets only when you need ids/metadata for another tool (e.g. move_asset) or a plain text list.
| Name | Required | Description | Default |
|---|---|---|---|
| count | No | Default 20. Max 50. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, so the read-only nature is covered. The description adds that results are ordered newest first and lists returned fields, which is useful behavioral context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with key information. No wasted words. Every sentence serves a purpose.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple tool with one parameter and no output schema, the description covers what the tool does, what it returns, and a use case. It is sufficiently complete for an agent to invoke correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear description for the single parameter 'count' (default 20, max 50). The description does not add further meaning, so baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it lists recent assets newest first, specifying returned fields. However, it doesn't explicitly distinguish from sibling list tools like list_generations or list_my_folders, though it provides a use case.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly recommends this tool only when ids or metadata are needed for another tool (e.g. move_asset) or for a plain text list. It also states when not to use it (to display images) and names the alternative, show_media.
list_my_folders (List Folders), A, Read-only, Idempotent
List the folders in your Switch library (id, name, parent). Use this to find an existing folder before move_asset or create_folder.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint and idempotentHint. The description adds that it returns id, name, parent, but does not discuss pagination, ordering, or error behavior. Adequate but not extra.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences, front-loaded with action and result, no wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple list tool with no output schema, the description covers purpose, fields returned, and usage context. Could mention if folders are top-level only but generally complete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters exist, and schema coverage is 100%. Baseline is 4; no parameter explanation needed.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it lists folders, specifies returned fields (id, name, parent), and distinguishes from siblings like list_my_assets by targeting folders with a specific use case.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises using this tool to find an existing folder before move_asset or create_folder, providing clear context for when to invoke it.
list_my_videos (List Videos), A, Read-only
List your recent Switch videos, newest first — id, status, prompt, model, and a viewable view_url for finished clips. Use this to check whether videos finished and to let the user choose which one they want.
| Name | Required | Description | Default |
|---|---|---|---|
| count | No | How many to return. Default 10. Max 50. | |
| status | No | Optional filter: submitted, processing, succeed, failed, or all. | |
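Since results come back newest first and only finished clips carry a view_url, picking the clip to present can be sketched as below. The response shape (a list of dicts with `status` and `view_url` keys) is an assumption for illustration; the tool publishes no output schema.

```python
from typing import Any, Dict, List, Optional


def newest_finished_view_url(videos: List[Dict[str, Any]]) -> Optional[str]:
    """Return the view_url of the first finished video, or None if none finished.

    Assumes `videos` is already ordered newest first, as the description states,
    and that finished clips report status "succeed" (the filter value above).
    """
    for video in videos:
        if video.get("status") == "succeed" and video.get("view_url"):
            return video["view_url"]
    return None
```

Returning None instead of raising lets the caller decide whether to keep polling or tell the user nothing has finished yet.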
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true. Description adds that list is 'recent' and includes viewable URLs for finished clips. No contradiction; useful behavioral details beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, concise, front-loaded with purpose, no filler. Every sentence serves a purpose.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, returned fields, and use case. No output schema but the described fields suffice. Lacks mention of pagination, but max count 50 reduces need. Adequate for a simple list tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Both parameters (count, status) have complete descriptions in the input schema. Description does not add extra meaning beyond 'how many' and 'optional filter'. Schema coverage is 100%, so baseline 3 applies.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states verb 'list', resource 'your recent Switch videos', sorting 'newest first', and lists returned fields (id, status, etc.). Differentiates from siblings like list_generations by specifying video-specific content.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says to use for checking if videos finished and for user selection. Provides a clear use case. Lacks explicit exclusion of alternatives, but sufficient for a simple list tool.
list_video_models (List Video Models), A, Read-only, Idempotent
List the video providers, models, and modes available to your Switch account, with each model's required inputs, allowed aspect ratios and durations, and a rough per-second cost. Call this before generate_video so you pick a real model + mode and supply the right inputs.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint as true, making the safe, non-destructive nature clear. The description enriches this by detailing the output content (providers, models, modes, inputs, aspect ratios, durations, cost), adding behavioral context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, no redundant words, front-loaded with actionable information. Every sentence serves a clear purpose: stating what the tool does and why it should be used.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has no parameters and a read-only, idempotent nature, the description fully explains what it returns and its role in the workflow. No output schema is needed because the description lists the output categories.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema coverage, the description has no need to explain parameters. Baseline score of 4 is appropriate as it adds no parameter detail required.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('List') and identifies the resource ('video providers, models, and modes') with detailed attributes (required inputs, aspect ratios, durations, cost). It clearly distinguishes from sibling tools like explore_models or generate_image by focusing on video-specific metadata for pre-generation selection.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises 'Call this before generate_video so you pick a real model + mode and supply the right inputs,' providing both when-to-use and the tool's role in a workflow. This directly guides an AI agent away from incorrect tool selection.
search_my_library (Search Library), A, Read-only
Search your library by prompt substring (metadata only — id, prompt, date). Optional folderId scopes to one folder. Only your own assets are returned. This does NOT display images; to show/display results to the user, pass their ids to show_media.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Default 20. | |
| query | Yes | | |
| folderId | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnly. Description adds context: only own assets are returned, and optional folder scoping. No contradictions, and provides value beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four concise sentences with no extraneous information. Front-loads the primary action effectively.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simplicity of the tool, the description fully covers purpose, parameter meanings, scope constraints, and ownership. No output schema needed.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Description explains 'query' as substring search and 'folderId' as scope, compensating for the 33% schema description coverage. 'limit' is only minimally described in schema as 'Default 20.' but lacks usage guidance.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it searches user's library assets by prompt substring, with optional folder scoping, and returns only own assets. Distinguishes from sibling listing tools.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly mentions when to use (search by prompt and optional folder scope). Provides clear context that only own assets are returned. Lacks explicit exclusion statements but is sufficient for correct selection.
show_generation (Show Generation), A, Read-only
Get the full detail of one of your generations by task id — prompts, model, ref counts, saved/failed counts, ETA hint, asset ids.
| Name | Required | Description | Default |
|---|---|---|---|
| taskId | Yes | Task id from generate_image or list_generations. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true. Description adds details on what the response contains (prompts, model, ref counts, etc.), providing behavioral context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence that is front-loaded with the main purpose and includes specific fields in a list, no redundant information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Low complexity tool with one parameter and no output schema. Description adequately explains the return value fields, though a more structured list could be clearer. Sufficient for an AI agent.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers taskId with 100% coverage. Description adds valuable context that taskId comes from generate_image or list_generations, which aids correct parameter selection.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Get' and resource 'full detail of one of your generations' and lists included fields (prompts, model, ref counts, etc.), clearly distinguishing from sibling tools like list_generations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
States usage context: requires a taskId from generate_image or list_generations, but does not explicitly mention when not to use or alternatives. Clear enough for this simple tool.
show_media (Show Media), A, Read-only, Idempotent
Display the user's images inline — one or many. Users speak plainly and will NOT know asset ids; never ask for one, resolve it yourself. For "show me" or "show me my last image" call with NO arguments (shows the most recent image). For "show me my last 4 images / my last 10 pictures" pass count=N (returns a clean grid, up to 12). For a specific known image pass assetId. Renders a branded SwitchApp media card with a Download action per result; do not just print URLs. (Videos are not shown here — use list_my_videos and return the newest finished video's view_url, which plays.)
| Name | Required | Description | Default |
|---|---|---|---|
| count | No | Optional. How many of the most recent images to show as a grid (default 1, max 12). Use when the user says "my last N images/pictures". | |
| assetId | No | Optional. A specific image id (from list_my_assets, search_my_library, or show_generation). Omit to show the most recent image(s). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| asset | No | |
| images | No | |
| _widget | No | |
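The three calling patterns in the description (no arguments for the latest image, count=N for the last N, assetId for a known image) can be sketched as a small argument builder. The function name and the choice to clamp count to 12 on the client side are illustrative assumptions; the parameter names and the maximum of 12 come from the table above.

```python
from typing import Any, Dict, Optional

MAX_GRID = 12  # maximum grid size stated in the count parameter description


def show_media_args(
    count: Optional[int] = None, asset_id: Optional[str] = None
) -> Dict[str, Any]:
    """Map a resolved user request onto show_media arguments."""
    if asset_id is not None:
        # A specific, already-known image.
        return {"assetId": asset_id}
    if count is None or count <= 1:
        # "show me my last image": call with no arguments.
        return {}
    # "show me my last N images": clamp to the documented maximum (assumed client-side choice).
    return {"count": min(count, MAX_GRID)}
```

For example, a request for "my last 4 images" maps to `show_media_args(count=4)`, while a plain "show me" maps to `show_media_args()`.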
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds context beyond annotations: results render as a branded SwitchApp media card with a Download action per result, and videos are not shown by this tool. Annotations already indicate read-only and idempotent, so this is extra useful detail.
Is the description appropriately sized, front-loaded, and free of redundancy?
Seven sentences. The most important information (inline display of one or many images) is front-loaded.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
An output schema exists (asset, images, _widget), but its fields are undescribed. The description explains what is rendered (a media card with a Download action per result, or a grid for several images), which is adequate for a display operation.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline 3. The description adds that assetId comes from specific tools (list_my_assets, search_my_library, show_generation), providing value beyond schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that it displays the user's images inline, one or many, as a media card. It distinguishes itself from siblings like list_my_assets by rendering images rather than returning metadata, and it redirects video requests to list_my_videos.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description gives explicit when-to-use guidance (no arguments for the most recent image, count=N for the last N, assetId for a specific image) and names list_my_videos as the alternative for videos.
stitch_videos (Stitch Videos), A
Stitch several of your Switch videos together into ONE video, played back-to-back in the order you give. Pass clip_asset_ids: an ORDERED list of your video ids (get them from list_my_videos) — the first id plays first. Optional orientation (landscape|portrait|square), fps, quality. Renders asynchronously via the Switch editor engine and returns a job_id — poll get_video_status or list_my_videos for the finished, downloadable video. Use this whenever the user wants to combine, join, merge, or concatenate multiple clips into one.
| Name | Required | Description | Default |
|---|---|---|---|
| fps | No | Frames per second. Default 30. | |
| quality | No | draft, standard (default), or high. | |
| orientation | No | landscape (1920x1080, default), portrait (1080x1920), or square (1080x1080). | |
| project_name | No | Optional name for the output video. | |
| clip_asset_ids | Yes | Ordered list of your video ids (from list_my_videos). At least 2. Output order = this order. | |
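A request builder that enforces the constraints in the table (at least two clips, order preserved, enumerated orientation and quality values) might look like the sketch below. The function name and the client-side validation are illustrative assumptions; the server may apply its own checks and defaults.

```python
from typing import Any, Dict, Optional, Sequence

ORIENTATIONS = {"landscape", "portrait", "square"}
QUALITIES = {"draft", "standard", "high"}


def build_stitch_args(
    clip_asset_ids: Sequence[str],
    orientation: Optional[str] = None,
    fps: Optional[int] = None,
    quality: Optional[str] = None,
    project_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build stitch_videos arguments; the first id in the list plays first."""
    clips = list(clip_asset_ids)  # keep caller order: output order = this order
    if len(clips) < 2:
        raise ValueError("clip_asset_ids needs at least 2 video ids")
    if orientation is not None and orientation not in ORIENTATIONS:
        raise ValueError(f"orientation must be one of {sorted(ORIENTATIONS)}")
    if quality is not None and quality not in QUALITIES:
        raise ValueError(f"quality must be one of {sorted(QUALITIES)}")
    if fps is not None and fps <= 0:
        raise ValueError("fps must be positive")

    args: Dict[str, Any] = {"clip_asset_ids": clips}
    # Omit unset options so the documented server-side defaults apply.
    for key, value in (
        ("orientation", orientation),
        ("fps", fps),
        ("quality", quality),
        ("project_name", project_name),
    ):
        if value is not None:
            args[key] = value
    return args
```

Because rendering is asynchronous, the returned job_id would then be polled with get_video_status or checked via list_my_videos, as the description says.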
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses key behavioral aspects: asynchronous rendering, return of a job_id, and polling instructions. Since annotations indicate readOnlyHint=false, the mutation is expected. No contradictions with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (five sentences) and front-loaded with the primary purpose. Every sentence adds necessary information without redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description adequately explains the async workflow and how to retrieve results. It could optionally mention limits (e.g., max clips), but is complete enough for use.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds value by emphasizing the ordered nature of clip_asset_ids (the first id plays first) and listing the allowed orientation values. It also clarifies the source of video ids.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: stitching several Switch videos into one back-to-back. It uses a specific verb (stitch) and resource (videos), and the scope ('several of your Switch videos') distinguishes it from sibling tools that apply effects or generate new content.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly tells when to use this tool ('whenever the user wants to combine, join, merge, or concatenate multiple clips into one') and how to get prerequisite data (video ids from list_my_videos). It doesn't mention alternatives or when not to use it, but the context is clear.
talking_avatar_video (Talking Avatar Video), A
Turn a face photo into a lip-synced talking-head video that speaks your text (or your audio). Provide image_url (a clear face photo) and either script (text to speak, max 2500 characters) or audio_url. Optional voice_id / language / voice_settings. Renders in ~1-5 minutes (single call, returns the finished branded video) and is saved to your library. Charged per video.
| Name | Required | Description | Default |
|---|---|---|---|
| script | No | Text the avatar speaks. Max 2500 characters. Required unless audio_url is given. | |
| language | No | Optional language code (default en). | |
| voice_id | No | Optional voice id (from clone_voice / your library). | |
| audio_url | No | Pre-recorded audio URL to lip-sync instead of generating speech from script. | |
| image_url | Yes | A clear face photo (Switch/public URL). Required. | |
| voice_settings | No | Optional: { stability, similarityBoost, style, useSpeakerBoost } 0-1. | |
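The parameter rules in the table above (image_url required, script or audio_url required, script capped at 2500 characters, voice_settings values in 0-1) can be checked client-side before the call is made. The following is a minimal sketch, not the server's own validation: the helper name `build_talking_avatar_args` is hypothetical, and treating `useSpeakerBoost` as a boolean that bypasses the 0-1 range check is an assumption, since the listing does not state its type.

```python
from typing import Any, Dict, Optional

MAX_SCRIPT_CHARS = 2500  # limit stated in the tool description
VOICE_SETTING_KEYS = {"stability", "similarityBoost", "style", "useSpeakerBoost"}


def build_talking_avatar_args(
    image_url: str,
    script: Optional[str] = None,
    audio_url: Optional[str] = None,
    voice_id: Optional[str] = None,
    language: Optional[str] = None,
    voice_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return an arguments dict for talking_avatar_video, or raise ValueError."""
    if not image_url:
        raise ValueError("image_url is required")
    if not script and not audio_url:
        raise ValueError("provide script or audio_url")
    if script is not None and len(script) > MAX_SCRIPT_CHARS:
        raise ValueError(f"script exceeds {MAX_SCRIPT_CHARS} characters")

    args: Dict[str, Any] = {"image_url": image_url}
    optional = {
        "script": script,
        "audio_url": audio_url,
        "voice_id": voice_id,
        "language": language,
    }
    # Only send the optional fields the caller actually set.
    for key, value in optional.items():
        if value is not None:
            args[key] = value

    if voice_settings is not None:
        unknown = set(voice_settings) - VOICE_SETTING_KEYS
        if unknown:
            raise ValueError(f"unknown voice_settings keys: {sorted(unknown)}")
        for key, value in voice_settings.items():
            if isinstance(value, bool):
                continue  # assumption: boolean flags are passed through as-is
            if not isinstance(value, (int, float)) or not 0 <= value <= 1:
                raise ValueError(f"voice_settings.{key} must be between 0 and 1")
        args["voice_settings"] = dict(voice_settings)
    return args
```

The returned dict would be passed as the tool arguments by whatever MCP client is in use. The server may apply stricter or different rules than this sketch.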
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description goes well beyond the minimal annotation (readOnlyHint=false) by detailing rendering time (~1-5 minutes), synchronous completion ('single call, returns the finished branded video'), persistence ('saved to your library'), and cost ('Charged per video'). This provides rich behavioral context for the agent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (~70 words) and front-loaded with the core action. Each sentence adds distinct information (purpose, required inputs, optional inputs, behavior, cost). Could be slightly more structured (e.g., bullet points), but it remains clear and efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool complexity (6 params, no output schema, nested objects), the description covers purpose, inputs, timing, storage, and cost. It lacks details on output format or error handling, but the mention of 'returns the finished branded video' provides some closure. The lack of differentiation from sibling 'lip_sync_video' is a minor gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema coverage is 100%, so baseline is 3. The description adds value by clarifying that image_url should be a 'clear face photo', emphasizing the required-or-alternative relationship between script and audio_url, and noting the optional voice_settings. It does not repeat all schema details, but the added nuance justifies a 4.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it creates a talking-head video from a face photo and text/audio. It specifies the key inputs (image_url, script/audio_url) and output (lip-synced video). However, it does not explicitly differentiate from the sibling tool 'lip_sync_video', which appears similar, so purpose clarity is strong but not perfect.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies when to use the tool (to create a talking-head video) but does not provide explicit guidance on when not to use it or alternatives. It mentions the two input modes (script or audio_url) but lacks context on scenarios where one might prefer the sibling 'lip_sync_video' or 'generate_video'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
upload_media · Upload Media · A
Upload one image into the user's Switch library in a single call. Pass url (any public https) OR base64 + mime. Switch fetches/decodes it server-side, stores it, and returns a clean public URL plus the new asset id. Use this URL directly in generate_image's reference_image_urls — no presigned PUT, no curl, no confirm-upload step needed.
| Name | Required | Description | Default |
|---|---|---|---|
| url | No | Any public https URL — Switch fetches it server-side. | |
| mime | No | MIME type when sending base64. Default image/png. | |
| base64 | No | Base64-encoded image bytes (use this when there is no public URL). | |
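The two input modes described above (a public https `url`, or `base64` plus `mime`) can be sketched as a small argument builder. This is an illustration under stated assumptions: the helper name `build_upload_media_args` is hypothetical, and treating the two modes as mutually exclusive is an interpretation of the "OR" in the description rather than documented behavior.

```python
import base64
from typing import Dict, Optional


def build_upload_media_args(
    url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    mime: str = "image/png",  # default stated in the mime parameter description
) -> Dict[str, str]:
    """Return an arguments dict for upload_media from a URL or raw image bytes."""
    # Assumption: exactly one of the two input modes is used per call.
    if (url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of url or image_bytes")
    if url is not None:
        if not url.startswith("https://"):
            raise ValueError("url must be a public https URL")
        return {"url": url}
    # No public URL available: send the bytes inline as base64 with a MIME type.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"base64": encoded, "mime": mime}
```

Per the description, the URL returned by the tool can then be used directly in generate_image's reference_image_urls.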
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses that Switch fetches/decodes the image server-side, stores it, and returns a clean public URL plus asset ID. This adds value beyond the annotations (readOnlyHint=false, idempotentHint=false) by explaining the side effects and return format. It does not mention rate limits or file size constraints, but the core behavioral traits are well covered.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is 70 words, well-structured with the purpose first, then options and workflow. Every sentence adds value—no fluff. It is concise yet informative, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (upload image, no output schema), the description covers input methods, processing, output, and how to use the result in generate_image. It does not mention file size limits, supported MIME types beyond image/png, or error handling, but these are not critical for basic usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and the description adds meaning by explaining the relationship between parameters: 'Pass url (any public https) OR base64 + mime'. The default MIME type (image/png) comes from the schema's mime parameter description rather than from the tool description itself, so the either/or relationship is the main clarification beyond the parameter descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool uploads one image into the user's Switch library and explains the two input methods (url or base64+mime). It distinguishes itself by mentioning the return of a clean URL for use in generate_image's reference_image_urls, which differentiates it from other tools like list_my_assets or generate_image itself.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear guidance on when to use url vs base64+mime: 'Pass url (any public https) OR base64 + mime'. It also states that no presigned PUT, curl, or confirm-upload step is needed, explaining the workflow concisely. However, it does not explicitly exclude other tools or mention cases where this tool should not be used.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
upload_reference_asset · Upload Reference · A
Upload an image, video, or audio reference into Switch cloud and get a ready-to-use reference URL. Pass kind=image|video|audio. Returns reference_image_urls / reference_video_urls / reference_audio_urls for generate_image and generate_video. Image and video references are also added to your active Studio reference strip (the same one your desktop uses) unless activate=false. PREFERRED for real files: call with presign=true to get an upload_url, PUT the bytes straight to it (no base64 through the model), then call again with confirm_path to verify and add it — works for image, video, and audio. base64/url is only for tiny inline files.
| Name | Required | Description | Default |
|---|---|---|---|
| url | No | Public https URL to fetch server-side. | |
| kind | Yes | Reference type to upload. | |
| mime | No | MIME for base64. Images: jpg/png/webp/gif. Videos: mp4/mov. Audio: mp3/wav/m4a/aac. | |
| base64 | No | Base64 bytes (optionally a data: URL). Best for small files; large video should use presign. | |
| presign | No | Return an upload_url to PUT the file bytes directly to (no base64). Video always; image/audio when enabled. | |
| activate | No | Image/video: add to the active Studio reference strip. Default true. Audio never touches the strip. | |
| filename | No | Optional source filename for extension/display. | |
| frame_type | No | Image strip label: ref (default), face, body, clothes, scenery, product, typography. | |
| confirm_path | No | The storage_path from a presign call, after you PUT the file — verifies the object, records it, and adds it to your strip. | |
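The preferred presign flow described above has three steps: call with presign=true, PUT the bytes to the returned upload_url, then call again with confirm_path. A minimal sketch follows. The `call_tool` and `put_bytes` callables are hypothetical stand-ins for an MCP client and an HTTP PUT, and the assumption that the presign response exposes `upload_url` and `storage_path` as top-level keys is inferred from the parameter descriptions, not from a documented output schema.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical client hooks: call_tool(tool_name, arguments) -> result dict,
# put_bytes(upload_url, data) performs the HTTP PUT.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]
BytePutter = Callable[[str, bytes], None]


def upload_reference_via_presign(
    call_tool: ToolCaller,
    put_bytes: BytePutter,
    kind: str,
    data: bytes,
    filename: Optional[str] = None,
    activate: bool = True,
) -> Dict[str, Any]:
    """Run the presign, PUT, confirm sequence and return the confirm result."""
    if kind not in ("image", "video", "audio"):
        raise ValueError("kind must be image, video, or audio")

    # Step 1: ask for an upload_url instead of sending base64 through the model.
    presign_args: Dict[str, Any] = {"kind": kind, "presign": True}
    if filename:
        presign_args["filename"] = filename
    presigned = call_tool("upload_reference_asset", presign_args)

    # Step 2: PUT the raw bytes straight to the presigned URL.
    put_bytes(presigned["upload_url"], data)

    # Step 3: confirm with the storage_path so the object is verified and recorded.
    confirm_args: Dict[str, Any] = {
        "kind": kind,
        "confirm_path": presigned["storage_path"],
    }
    if kind != "audio" and not activate:
        # Audio never touches the strip, so activate only matters for image/video.
        confirm_args["activate"] = False
    return call_tool("upload_reference_asset", confirm_args)
```

Injecting the two callables keeps the sketch independent of any particular MCP client or HTTP library. In practice `put_bytes` might wrap an HTTP client's PUT request.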
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds significant behavioral context beyond annotations: describes the two-step presign workflow, the effect on the active Studio reference strip (and activate=false to skip), and that audio never touches the strip. Annotations indicate readOnlyHint=false (mutation) and idempotentHint=false, which align with the description.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the main purpose. It's slightly verbose but each sentence adds value (e.g., presign flow, activate behavior). Could be tightened slightly, but overall efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (9 parameters, presign flow, no output schema), the description covers the key aspects: output URLs, strip addition, and the two-step upload process. Could mention error handling or size limits, but it's largely complete for a tool with good annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds meaning by explaining the presign/confirm_path workflow, the frame_type labeling, the activate behavior, and the size recommendation for base64 vs presign. This goes beyond the schema's basic parameter descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: uploading image, video, or audio references into Switch cloud to obtain a ready-to-use URL. It explicitly links to downstream tools (generate_image, generate_video) and distinguishes from sibling tools like upload_media by specifying the reference context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on preferred usage (presign for real files, base64/url for tiny inline) and explains the activate parameter difference between image/video and audio. However, it doesn't explicitly mention when to avoid this tool in favor of alternatives like upload_media.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
voice · Manage Voices · A
Manage custom voices for talking_avatar_video. action="clone" registers a voice from audio_sample_url (a 10-30 second clip) under voice_name (charged 2 credits, sample stored durably) and returns a voice_id; action="list" returns your saved voices; action="delete" removes one by voice_id. Use the returned voice_id as talking_avatar_video.voice_id.
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Which operation to run. | |
| voice_id | No | delete: the voice_id to remove. | |
| voice_name | No | clone: a name for the voice (unique per account). | |
| audio_sample_url | No | clone: a 10-30 second voice sample URL (reachable). | |
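Because the voice tool multiplexes three operations through one `action` parameter, each action needs a different set of fields. The mapping in the table above can be sketched as follows. The helper name `build_voice_args` is hypothetical, and dropping fields that do not belong to the chosen action is a design choice of this sketch, not documented server behavior.

```python
from typing import Any, Dict, Tuple

# Fields each action needs, taken from the parameter descriptions above.
REQUIRED_BY_ACTION: Dict[str, Tuple[str, ...]] = {
    "clone": ("voice_name", "audio_sample_url"),
    "list": (),
    "delete": ("voice_id",),
}


def build_voice_args(action: str, **params: Any) -> Dict[str, Any]:
    """Return an arguments dict for the voice tool, or raise ValueError."""
    if action not in REQUIRED_BY_ACTION:
        raise ValueError(f"unknown action: {action!r}")
    required = REQUIRED_BY_ACTION[action]
    missing = [name for name in required if not params.get(name)]
    if missing:
        raise ValueError(f"{action} requires: {', '.join(missing)}")
    args: Dict[str, Any] = {"action": action}
    # Send only the fields that belong to this action; extras are dropped.
    args.update({name: params[name] for name in required})
    return args
```

A clone call built this way would be followed by passing the returned voice_id to talking_avatar_video, as the description suggests. Note that clone is charged 2 credits per the listing, so a caller may want to run action="list" first to avoid re-cloning an existing voice.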
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses side effects of clone (charges 2 credits, stores sample durably, returns voice_id) and implies mutation (delete, clone). Annotations indicate readOnlyHint=false, which is consistent. The description adds value beyond annotations by detailing costs and durability.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is only two sentences, with the first sentence covering all actions and parameters concisely, and the second providing a critical usage hint. No wasted words; every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers all three actions with parameter requirements, side effects, and usage context. However, it lacks details on the return format for 'list' and 'delete' operations (e.g., array of voice objects). Given no output schema, slightly more detail would enhance completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already defines parameters. The description adds operational context (e.g., audio_sample_url must be reachable, 10-30 seconds; voice_name unique per account) and clarifies the relationship between action and required parameters, enhancing understanding beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the tool as managing custom voices for talking_avatar_video, listing three distinct actions (clone, list, delete) with specific operations and resources, differentiating it from sibling tools like talking_avatar_video.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit instructions for each action, including constraints (e.g., audio sample length of 10-30 seconds, unique voice_name, cost of 2 credits) and how to use the returned voice_id. While it lacks explicit 'when not to use' guidance, the context is sufficient for selecting the correct action.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
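For servers that generate static files at deploy time, the claim file can be written by a short script. This is a minimal sketch: the function names and the `web_root` layout are hypothetical, and only the `$schema` URL and the `maintainers` structure come from the listing above.

```python
import json
from pathlib import Path
from typing import List

SCHEMA_URL = "https://glama.ai/mcp/schemas/connector.json"


def render_glama_claim(emails: List[str]) -> str:
    """Return the glama.json body for the given maintainer emails."""
    if not emails:
        raise ValueError("at least one maintainer email is required")
    document = {
        "$schema": SCHEMA_URL,
        "maintainers": [{"email": email} for email in emails],
    }
    return json.dumps(document, indent=2) + "\n"


def write_glama_claim(web_root: Path, emails: List[str]) -> Path:
    """Write <web_root>/.well-known/glama.json and return its path."""
    target = web_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_glama_claim(emails), encoding="utf-8")
    return target
```

The file then needs to be served at /.well-known/glama.json on the server's domain, with an email matching the Glama account.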
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.