switch

by ai.switchapp

Server Details

Generate, manage and explore your Switch AI image and video library, scoped to your account.

Status: Healthy
Last Tested: 2026-07-24 19:38
Transport: Streamable HTTP

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

Grade A, 3.9/5.0

Tool Descriptions: A

Average 4.4/5 across 34 of 34 tools scored. Lowest: 3.6/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have clearly distinct purposes. The apply_* styles are differentiated by style name, though apply_iphone_realism and apply_ugc share some casual aesthetic overlap. Overall, an agent can reliably distinguish tools by their descriptions.

Naming Consistency: 4/5

The majority of tools follow a consistent verb_noun pattern (e.g., generate_image, list_my_videos). The 'voice' tool deviates as a noun, and some apply_ names are lengthy, but the pattern is recognizable.

Tool Count: 3/5

34 tools is on the higher end, but the server covers a broad domain (image, video, audio generation, style presets, asset management, voice, etc.). Each tool earns its place, though some could potentially be consolidated.

Completeness: 3/5

The tool surface is largely complete for media generation and management, but notable gaps exist: there is no tool to delete or move assets, which limits full lifecycle management. This omission may cause agent failures if users expect cleanup operations.

Available Tools

40 tools

analyze_video: Analyze Video (grade A)

Read-only, Idempotent

Switch Vision — watch and understand a video (or image) like a human and answer a question about it: scenes, subjects, actions, on-screen text, pacing, mood and sentiment. Pass video_url (a public https video URL, including YouTube) OR one of your own Switch videos (a video/asset id from list_my_videos / list_my_assets / upload_media). Add an optional question to focus the analysis (e.g. "what is the tone and energy?", "list the cuts and what each shot shows"). Use this whenever the user gives you a reference video and wants its style, energy, structure or content understood — for example before making a new video that matches it.

Parameters

Name	Required	Description	Default
`question`	No	Optional. What to find out about the video — tone, structure, on-screen text, sentiment, etc.
`video_url`	Yes	A public https video URL (YouTube ok), OR one of your own Switch video/asset ids.
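
As a rough illustration of the parameters above, the sketch below builds the JSON-RPC 2.0 `tools/call` request body that an MCP client would send for analyze_video. The tool name and the `video_url` and `question` arguments come from this listing; the helper names, the example URL, and the id counter are assumptions made for the example, and the transport (Streamable HTTP session, authentication) is deliberately left out.

```python
import json
from itertools import count
from typing import Any, Optional

_request_ids = count(1)  # hypothetical id source for JSON-RPC requests


def tools_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def analyze_video_request(video_url: str, question: Optional[str] = None) -> dict[str, Any]:
    """Request body for analyze_video: video_url is required, question is optional."""
    if not video_url:
        raise ValueError("video_url is required (a public https URL or a Switch video/asset id)")
    arguments: dict[str, Any] = {"video_url": video_url}
    if question:
        arguments["question"] = question
    return tools_call("analyze_video", arguments)


# Example: a reference video plus a focusing question (placeholder URL).
request = analyze_video_request(
    "https://example.com/reference.mp4",
    question="what is the tone and energy?",
)
print(json.dumps(request, indent=2))
```

Omitting `question` simply leaves it out of `arguments`, which matches its optional status in the table.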

Tool Definition Quality

Grade A, 4.8/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds valuable context about what the tool analyzes, but could be clearer on whether images are fully supported (mentions 'image' but schema requires video_url). Nevertheless, no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four well-structured sentences, no redundancy. The first packs the core purpose and capabilities; the rest cover inputs, the optional question, and usage guidance. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description sufficiently covers inputs, optional features, and usage context. It fits well among sibling tools as the analysis counterpart to generation tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions are clear (100% coverage), but the description elaborates further: explains video_url can be a public URL or Switch ID from specific tools, and gives example questions for the optional parameter. This adds significant meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'watch and understand a video (or image) like a human and answer a question about it', listing specific aspects (scenes, subjects, actions, etc.). It distinguishes itself from sibling tools which are mostly apply-style tools, making its purpose unique.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this whenever the user gives you a reference video and wants its style, energy, structure or content understood', and provides an example context ('before making a new video that matches it'). This guides when to use vs. the apply-style siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

analyze_video_report: Full Video Analysis

Run the FULL Switch Vision analysis on a video, the same premium report the Video Analysis page produces: it watches AND listens in three forensic passes and returns a structured report with every category: overview (scores and takeaways), a second by second timeline, audio, visual craft, story and retention, speech transcript, ready to run recreation prompts, and metadata. Pass video_url (a public https video URL, YouTube included) OR one of your own Switch video ids. For an external file also pass duration_seconds (YouTube and your own videos are measured automatically) because the analysis is billed per second. Optional question focuses the analysis. Returns a report_id right away; poll get_vision_report until status is succeeded (a few minutes). If it cannot finish, your tokens are returned automatically. For one quick question about a video use analyze_video instead; this tool is the full paid report.

Parameters

Name	Required	Description
`question`	No	Optional. Something to pay special attention to.
`video_url`	Yes	A public https video URL (YouTube ok), OR one of your own Switch video ids.
`duration_seconds`	No	Length in seconds. Required for external files; YouTube and your own Switch videos are measured automatically.
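
The start-then-poll flow described above could be sketched as follows. This is a minimal illustration, not the server's documented contract: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the shapes of the results (a dict with `report_id`, a dict with `status`), the `report_id` argument name for get_vision_report, and the `failed` status are all assumptions. Only the tool names, the input parameters, and the `succeeded` status come from the listing.

```python
import time
from typing import Any, Callable, Optional

# Hypothetical client hook: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def run_full_report(
    call_tool: ToolCaller,
    video_url: str,
    duration_seconds: Optional[float] = None,
    question: Optional[str] = None,
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Start analyze_video_report, then poll get_vision_report until it succeeds."""
    arguments: dict[str, Any] = {"video_url": video_url}
    if duration_seconds is not None:
        # Needed for external files; YouTube and own videos are measured automatically.
        arguments["duration_seconds"] = duration_seconds
    if question:
        arguments["question"] = question

    started = call_tool("analyze_video_report", arguments)
    report_id = started["report_id"]  # assumed result field name

    waited = 0.0
    while True:
        report = call_tool("get_vision_report", {"report_id": report_id})
        status = report.get("status")
        if status == "succeeded":
            return report
        if status == "failed":  # assumed terminal failure status
            raise RuntimeError(f"report {report_id} did not finish")
        if waited >= timeout:
            raise TimeoutError(f"report {report_id} still {status!r} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

Because the report takes a few minutes, a poll interval of several seconds is a reasonable starting point; passing `sleep` as a parameter keeps the loop easy to test without real waiting.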

apply_cinematic_anamorphic: Apply Cinematic Anamorphic (grade A)

Read-only, Idempotent

ARRI Alexa anamorphic widescreen film look. Choose grade: warm golden, cool noir, or moody desaturated. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	warm_golden = late-afternoon honey. cool_noir = neon-fill desaturated. moody_desaturated = soft window low-contrast.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".
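
The "pair it with generate_image" workflow might look like the sketch below. It assumes a hypothetical `call_tool` callable that returns the tool's text result, and it assumes generate_image accepts the prompt stack under a `prompt` argument; neither is confirmed by this listing. The style keys and the `subject` parameter are taken from the table above.

```python
from typing import Callable, Optional

# Style keys listed in the parameter table for apply_cinematic_anamorphic.
CINEMATIC_STYLES = ("warm_golden", "cool_noir", "moody_desaturated")


def cinematic_image(
    call_tool: Callable[[str, dict], str],  # hypothetical MCP client hook
    style: str,
    subject: Optional[str] = None,
) -> str:
    """Step 1: fetch the styled prompt stack. Step 2: hand it to generate_image."""
    if style not in CINEMATIC_STYLES:
        raise ValueError(f"style must be one of {CINEMATIC_STYLES}, got {style!r}")

    style_args = {"style": style}
    if subject:
        style_args["subject"] = subject

    # The apply_* tool is read-only: it only returns text for the next step.
    prompt_stack = call_tool("apply_cinematic_anamorphic", style_args)

    # Assumption: generate_image takes the stack as `prompt`.
    return call_tool("generate_image", {"prompt": prompt_stack})
```

The same two-step shape should carry over to the other apply_* tools, since each one is described as returning a prompt stack for generate_image.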

Tool Definition Quality

Grade A, 3.8/5.0

Behavior: 3/5

Annotations already indicate readOnlyHint=true and idempotentHint=true, so description's mention of 'returns the styled prompt stack' confirms non-destructive behavior. However, description does not add significant behavioral context beyond annotations, so score is moderate.

Conciseness: 5/5

Three sentences efficiently convey purpose, options, and usage. No redundancy or unnecessary information. Well-suited for quick agent comprehension.

Completeness: 4/5

Given no output schema, description explains the return value (styled prompt stack) and how it integrates with generate_image. For a simple 2-param tool, this is sufficient to understand the workflow.

Parameters: 3/5

Input schema has 100% description coverage with clear enum values and subject description. Description's 'Choose grade' and listing of grades adds minimal extra meaning beyond schema. Baseline score due to high schema coverage.

Purpose: 5/5

Description clearly states the tool applies an ARRI Alexa anamorphic widescreen film look with specific grades. It specifies the output (styled prompt stack) and usage context (pair with generate_image). This distinguishes it from sibling 'apply_*' tools like apply_movie_scene or apply_magic_hour_portrait.

Usage Guidelines: 3/5

Description mentions pairing with generate_image, implying typical use case, but lacks explicit guidance on when to choose this tool over similar ones (e.g., when to use anamorphic vs other cinematic looks). No alternative or exclusion criteria provided.

apply_graphic_editorial_portrait: Apply Graphic Editorial Portrait (grade A)

Read-only, Idempotent

Sharp graphic editorial portrait — premium fashion-magazine grade, hard graphic composition. Classic studio or golden-hour outdoor. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	classic = Hasselblad H6D studio. golden_hour = Canon R5 outdoor.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A, 4.0/5.0

Behavior: 3/5

The annotations already declare readOnlyHint and idempotentHint, so the description's note about returning a prompt stack is consistent but adds no new behavioral detail beyond what annotations provide. No contradictions.

Conciseness: 5/5

The description is three short sentences that convey the tool's purpose, output, and a sibling pairing hint. No wasted words.

Completeness: 4/5

For a tool with 2 parameters and no output schema, the description adequately explains the tool's role and output. Given the sibling tools, it provides enough context for an agent to select and use it correctly.

Parameters: 3/5

The input schema has 100% coverage with descriptions for both style (enum with studio/outdoor camera references) and subject (free text). The description adds minor context about the output but not about parameters. Baseline 3 is appropriate.

Purpose: 5/5

The description clearly states it applies a 'sharp graphic editorial portrait' and specifies the two style options ('classic studio' or 'golden-hour outdoor'). It also explains the output is a prompt stack for use with generate_image, distinguishing it from sibling tools like apply_high_fashion_editorial or apply_magic_hour_portrait.

Usage Guidelines: 4/5

The description explicitly says to pair it with generate_image, indicating the intended workflow. It implies when to use this tool (for graphic editorial portraits) but does not explicitly say when not to use it or suggest alternatives, though the sibling list provides context.

apply_high_fashion_editorial: Apply High Fashion Editorial (grade A)

Read-only, Idempotent

High-fashion magazine cover/editorial energy. Choose a photographer mood: Mario Testino glossy, Steven Klein dark cinematic, Inez & Vinoodh hard-flash, Annie Leibovitz painterly, Tim Walker dreamlike, Peter Lindbergh black-and-white natural, or Cass Bird off-duty. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	Photographer attribution drives the lighting + camera + grade stack.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A, 4.4/5.0

Behavior: 4/5

Annotations already declare readOnlyHint and idempotentHint. Description adds that it returns a prompt stack, not an actual image, clarifying its read-only nature. No contradictions.

Conciseness: 5/5

Three efficient sentences, front-loaded with purpose, examples, and usage instruction. No wasted words.

Completeness: 4/5

For a tool with 2 params and no output schema, description covers purpose, parameters, and integration with generate_image. Could mention optional subject or more on output format, but sufficient.

Parameters: 4/5

Schema coverage is 100% with descriptions. Tool description adds human-readable examples for style enum and clarifies subject as 'what you want to shoot', enhancing understanding.

Purpose: 5/5

Description clearly states it applies high-fashion editorial energy with specific photographer moods. Distinguishes from sibling tools like apply_cinematic_anamorphic or apply_graphic_editorial_portrait by naming its unique stylistic options.

Usage Guidelines: 4/5

Explicitly says 'pair it with generate_image', indicating it's a preparatory step. Implicitly tells when to use (for high-fashion) but doesn't explicitly exclude other cases. Naming photographer moods provides context.

apply_iphone_realism: Apply iPhone Realism (grade A)

Read-only, Idempotent

Phone-shot amateur look — looks like a real person snapped it on their phone. Casual, candid, pore-level real, no professional gloss. Three flavors: digital phone, 35mm film point-and-shoot, or off-duty intimate. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	digital_phone = Sony A7IV + 50mm f/1.4 GM phone-style realism. film_pointshoot = Contax T2 35mm Portra 400. off_duty_intimate = Cass Bird natural-window editorial.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A, 4.2/5.0

Behavior: 4/5

Annotations already indicate readOnlyHint and idempotentHint. The description adds that the tool returns a prompt stack and must be paired with generate_image, providing useful behavioral context beyond the annotations.

Conciseness: 5/5

The description is only four short sentences, front-loaded with the core purpose ('Phone-shot amateur look'), and contains no unnecessary words. Every sentence adds value.

Completeness: 4/5

The description explains the tool's output (styled prompt stack), its usage with generate_image, and the three style options. It does not detail the prompt format, but given the simplicity and good schema/annotations, it is sufficiently complete.

Parameters: 3/5

Schema coverage is 100% and the schema already provides detailed descriptions for both parameters. The tool description adds little new parameter information beyond listing the styles, so it meets the baseline without significant enhancement.

Purpose: 5/5

The description clearly states the tool applies a 'phone-shot amateur look' and returns a styled prompt stack for use with generate_image. It distinguishes itself from sibling style tools by specifying 'casual, candid, pore-level real, no professional gloss' and listing three specific flavors.

Usage Guidelines: 4/5

The description implies when to use this tool (when an amateur phone look is desired) and mentions pairing with generate_image. It does not explicitly state when not to use it or name alternatives, but the context is clear given the sibling tools.

apply_magic_hour_portrait: Apply Magic Hour Portrait (grade A)

Read-only, Idempotent

Golden-hour rim-light editorial portrait. Choose camera: Canon R5 + 85mm f/1.2 or Hasselblad H6D + 80mm. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	canon_85mm = Canon R5 portrait standard. hasselblad_80mm = medium-format luxury.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A, 4.2/5.0

Behavior: 4/5

Annotations already indicate readOnly and idempotent; the description adds that it returns a 'styled prompt stack' for use with another tool, which gives behavioral context beyond the annotations.

Conciseness: 5/5

Three concise sentences that front-load the style and purpose, with no wasted words.

Completeness: 4/5

Given only two parameters and no output schema, the description adequately explains the return value and usage context, though it could mention the output format.

Parameters: 3/5

Schema coverage is 100%, so baseline is 3. The description repeats camera choices already in the schema's enum descriptions and does not add new meaning to the parameters.

Purpose: 5/5

The description clearly states 'Golden-hour rim-light editorial portrait' and specifies camera choices, making the tool's purpose distinct from sibling tools like 'apply_cinematic_anamorphic'.

Usage Guidelines: 4/5

It instructs to pair with 'generate_image' but does not explicitly state when to use this tool vs alternatives; however, the context of 'editorial portrait' provides implicit guidance.

apply_movie_scene: Apply Movie Scene (grade A)

Read-only, Idempotent

Put me in a movie — full cinematic film look matching specific film genres. Choose: neon-noir action thriller, 80s finance excess, comic-book superhero blockbuster, video-game key art, or generic action thriller. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	neon_noir_action = wet streets + neon + anamorphic. glamour_finance_excess = 1980s Wall Street mahogany / gold. superhero_blockbuster = comic-book key art. video_game_character = Unreal-Engine character render. generic_action_thriller = ARRI cinematic.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A, 3.6/5.0

Behavior: 3/5

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description's statement about returning a prompt stack is consistent but adds little beyond what annotations provide. No behavioral surprises or additional context like side effects or rate limits.

Conciseness: 5/5

Three concise sentences that front-load the purpose and immediately list the options and output. Every word serves a purpose; no fluff.

Completeness: 4/5

Given the tool has only 2 simple parameters, no nested objects, and no output schema, the description adequately covers purpose and usage. It could briefly explain what a 'styled prompt stack' looks like, but it's sufficient for an agent to understand the tool's role in a pipeline.

Parameters: 3/5

Schema description coverage is 100%, and the input schema already provides detailed enum values with descriptions. The tool description repeats the enum list but adds no new semantic meaning beyond the schema. Baseline 3 is appropriate.

Purpose: 4/5

The description clearly states it applies a cinematic film look to a scene, listing specific genres. It pairs with generate_image. However, it doesn't explicitly differentiate from sibling tools like apply_cinematic_anamorphic or apply_graphic_editorial_portrait, missing a chance to clarify its unique role.

Usage Guidelines: 3/5

The description implies usage by saying 'pair it with generate_image', providing context on when to use it. However, it lacks explicit guidance on when not to use it or how it compares to alternatives among the many sibling apply tools.

apply_product: Apply Product (grade A)

Read-only, Idempotent

Product photography. Choose: clean studio hero shot, real-world lifestyle, extreme macro detail, or top-down flat lay. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	clean_studio = seamless backdrop hero. lifestyle = product in use. macro_detail = extreme close-up texture. flat_lay = top-down catalog.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A, 4.5/5.0

Behavior: 4/5

Informs that tool returns a styled prompt stack and is read-only/idempotent, consistent with annotations. Provides step-by-step context that annotations alone do not capture.

Conciseness: 5/5

Three efficient sentences with no filler. Front-loaded with purpose, then actionable output and next step. Every sentence adds value.

Completeness: 5/5

Given 2 parameters, full schema coverage, and no output schema, the description explains the tool's role, options, and integration with generate_image. No gaps remain.

Parameters: 4/5

Schema covers both parameters with descriptions. Description reiterates the enum values and adds the output context (prompt stack), which enhances understanding beyond schema.

Purpose: 5/5

Description clearly states the tool applies product photography styles, listing four distinct options. It distinguishes from sibling tools like apply_cinematic_anamorphic by focusing on product shots, not cinematic or portrait styles.

Usage Guidelines: 4/5

Explicitly says to pair output with generate_image, indicating a two-step workflow. Does not mention when not to use or alternatives, but context is clear enough for proper invocation.

apply_travel: Apply Travel (grade A)

Read-only, Idempotent

Luxury travel + hotel editorial. Real architecture is preserved exactly (no inventing buildings). Choose subject: hotel hero, rural property, scenic view, drone aerial, lifestyle moment, or interior. If you attach a reference image of a real property, the architecture lock kicks in automatically. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	hotel_hero = property is the star. rural_property = country estate. scenic_view = pure landscape. drone_aerial = top-down or 45° from above. lifestyle = model + destination. interior = inside the property.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".
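
The two-step workflow described above (style tool first, then generate_image) can be sketched as a pair of MCP `tools/call` request bodies. This is only an illustration: the helper names are hypothetical, nothing is sent over the network, and the `travel/hotel_hero` style string is an assumption extrapolated from the `<skill>/<style_key>` format documented for generate_image.

```python
import json

# Enum values copied from the `style` row of the table above.
TRAVEL_STYLES = {
    "hotel_hero", "rural_property", "scenic_view",
    "drone_aerial", "lifestyle", "interior",
}

def tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def apply_travel_request(style, subject=None):
    """Validate `style` locally, then build the apply_travel call."""
    if style not in TRAVEL_STYLES:
        raise ValueError(f"unknown travel style: {style!r}")
    arguments = {"style": style}
    if subject is not None:  # `subject` is optional per the table
        arguments["subject"] = subject
    return tool_call("apply_travel", arguments)

step_one = apply_travel_request("hotel_hero", "infinity pool at sunrise")

# Assumption: the style key follows generate_image's "<skill>/<style_key>" format.
step_two = tool_call(
    "generate_image",
    {"subject": "infinity pool at sunrise", "style": "travel/hotel_hero"},
    request_id=2,
)

print(json.dumps(step_one, indent=2))
```

Checking the enum on the client side is a convenience, not a requirement; the server's schema remains the source of truth for valid values.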

Tool Definition Quality

Grade A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) already indicate safe, deterministic behavior. The description adds valuable behavioral context: 'Real architecture is preserved exactly' and automatic architecture lock with reference images. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: five short sentences cover purpose, styles, behavioral rules, and output. It is front-loaded with the core function and has no superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 params, no output schema), the description covers the return value ('styled prompt stack'), behavioral edge cases (architecture lock), and example subjects. Minor gap: no explicit mention of required vs optional parameters, but schema covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed enum descriptions (e.g., 'hotel_hero = property is the star'). The description lists the same six options in plain language but does not significantly expand beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool applies a travel/hotel editorial style with specific subject options. It distinguishes from siblings by listing six concrete styles (hotel_hero, rural_property, etc.) and the architecture preservation rule.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies when to use the tool ('luxury travel + hotel editorial') and provides subject examples. It also notes a reference image triggers architecture lock. It lacks explicit exclusion of other styles, but the sibling context makes alternatives obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

apply_ugc · Apply Ugc · Grade A

Read-only · Idempotent

User-generated content — looks like a real person captured it casually. Choose: phone shot, film point-and-shoot, mirror selfie, or car selfie. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	phone_shot = iPhone-style snap. film_pointshoot = Contax T2 grain. mirror_selfie = bathroom/bedroom mirror. car_selfie = inside-the-car phone.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the tool returns a styled prompt stack, which is behavioral information beyond the readOnlyHint and idempotentHint annotations. It does not contradict annotations. The return format is not specified, but the description gives a good idea of the output.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three short sentences that front-load the purpose and end with the outcome. Every word is informative, with no redundancy or wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (two parameters, one required) and the presence of annotations, the description adequately covers what the tool does and how to use it. It could mention the subject parameter's role explicitly, but the schema already does, and the description implies it by referring to 'your shot.'

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description reiterates the style choices but does not add new meaning beyond the schema's detailed enum descriptions. No additional parameter semantics are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it applies a user-generated content style and returns a styled prompt stack. It specifies the four sub-styles (phone shot, film point-and-shoot, mirror selfie, car selfie), effectively distinguishing it from other apply_* sibling tools by focusing on casual, amateur aesthetics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: for casual, user-generated content looks. It also advises pairing with generate_image. However, it does not explicitly state when not to use it or compare to other style tools, but the specificity of the styles makes the use case clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

apply_wellness · Apply Wellness · Grade A

Read-only · Idempotent

Wellness / yoga / fitness / lifestyle campaign — warm amber tropical, tropical paradise cinematic, or high-key cyan beach. Returns the styled prompt stack for your shot — pair it with generate_image.

Parameters

Name	Required	Description	Default
`style`	Yes	warm_amber_tropical = warm honey grade with golden haze. hanalei_cinematic = soft golden mist + infinity pool reflection. high_key_cyan_beach = bright daylit cyan ocean.
`subject`	No	What you want to shoot. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony".

Tool Definition Quality

Grade A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and idempotentHint=true, so the tool is safe and deterministic. The description adds that it returns a 'styled prompt stack', which is the key behavioral outcome. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences; the first uses a dash to front-load the purpose. Efficient, but it could benefit from clearer structure (e.g., listing the styles).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 2 parameters, no output schema, and annotations present, the description covers all necessary aspects: what the tool does, what it returns, and how to use it. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed enum descriptions. The description repeats the style names but doesn't add new parameter semantics. Baseline 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it applies a wellness/yoga/fitness/lifestyle campaign style and returns a styled prompt stack. It names specific styles (warm amber tropical, tropical paradise cinematic, high-key cyan beach). This distinguishes it from sibling style tools like 'apply_cinematic_anamorphic' or 'apply_travel'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises pairing with 'generate_image', providing clear usage context. While it doesn't explicitly exclude alternative tools, the 'wellness' theme and specific style names guide appropriate selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cancel_my_task · Cancel Task · Grade A

Idempotent

Stop one of your generation tasks by task id — works on queued AND running tasks. Already-saved images stay in your library; nothing is deleted or refunded. Returns how many images were saved out of how many you requested.

Parameters

Name	Required	Description	Default
`taskId`	Yes	Task id from generate_image or list_my_tasks.

Tool Definition Quality

Grade A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotentHint=true and readOnlyHint=false. The description adds key behavioral details: nothing is deleted or refunded, and it returns a count of saved/requested images. This goes beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, zero fluff. The main purpose is front-loaded, and each sentence adds new information (scope, side effects, return value).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no output schema and good annotations, the description fully covers purpose, scope, side effects, and return value. Nothing essential is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds little beyond the schema for 'taskId', but does provide context that it works on tasks from specific generators.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Stop') and resource ('generation tasks'), clarifies scope ('queued AND running tasks'), and explicitly states what it does not affect ('Already-saved images stay'). This clearly distinguishes it from sibling tools like 'generate_image' or 'list_generations'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use it (to cancel queued or running tasks). It does not explicitly mention when not to use it or name alternatives, but the context is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_balance · Check Balance · Grade A

Read-only · Idempotent

Check your daily Switch spending — what you have spent today, your daily limit, and what is remaining. Optionally pass an estimatedCost (USD) to also get whether you can afford it.

Parameters

Name	Required	Description	Default
`estimatedCost`	No	Optional dollar amount to test against your daily limit.
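
The affordability check described above comes down to simple arithmetic on three numbers. The sketch below reproduces that arithmetic locally, assuming the remaining amount is the daily limit minus today's spend; the function names and the sample figures are hypothetical, and the actual response fields of check_balance are not documented in this listing.

```python
from decimal import Decimal

def remaining_budget(spent, limit):
    """Remaining daily budget in USD, never below zero."""
    return max(Decimal(limit) - Decimal(spent), Decimal("0"))

def can_afford(spent, limit, estimated_cost):
    """True when the estimated cost fits inside what is left today."""
    return Decimal(estimated_cost) <= remaining_budget(spent, limit)

# Hypothetical figures: 7.25 USD spent against a 10.00 USD daily limit.
print(remaining_budget("7.25", "10.00"))    # 2.75
print(can_afford("7.25", "10.00", "2.50"))  # True
print(can_afford("7.25", "10.00", "3.00"))  # False
```

Amounts are passed as strings and handled as `Decimal` so that currency values do not pick up binary floating-point rounding.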

Tool Definition Quality

Grade A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations confirm read-only and idempotent behavior. The description adds value by detailing what information is returned (spent, limit, remaining) and the optional cost check.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences cover the tool's core function and optional capability without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one optional parameter and no output schema, the description fully covers what the agent needs to know.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers the single parameter fully. The description adds meaning by explaining the parameter's purpose (test against daily limit).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks daily Switch spending, limit, and remaining, with an optional affordability test. It is distinct from sibling creative tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the tool's function and the optional parameter, though it does not explicitly contrast with siblings. The context makes the usage clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_job_status · Check Job Status · Grade A

Read-only · Idempotent

Polling-friendly status check for one of your tasks. Returns a slim shape with status, progressPct, and eta so you can poll without refetching the full payload.

Parameters

Name	Required	Description	Default
`taskId`	Yes	Task id to check.
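
A polling loop built on this tool might look like the sketch below. It assumes a hypothetical `call_tool(name, arguments)` client function that returns the tool result as a dict with the `status`, `progressPct`, and `eta` fields named above. The terminal status strings are also assumptions, since the listing does not enumerate the possible values.

```python
import time

# Assumption: illustrative status strings; the listing does not enumerate them.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def poll_task(call_tool, task_id, interval_s=2.0, max_polls=30, sleep=time.sleep):
    """Poll check_job_status until the task reaches a terminal status."""
    for _ in range(max_polls):
        result = call_tool("check_job_status", {"taskId": task_id})
        if result["status"] in TERMINAL_STATUSES:
            return result
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")

def fake_client():
    """Stand-in client for demonstration: two progress replies, then done."""
    replies = iter([
        {"status": "running", "progressPct": 10, "eta": 20},
        {"status": "running", "progressPct": 60, "eta": 8},
        {"status": "completed", "progressPct": 100, "eta": 0},
    ])
    return lambda name, arguments: next(replies)

# The sleep function is injected so the demo finishes immediately.
final = poll_task(fake_client(), "task-123", sleep=lambda seconds: None)
print(final["status"], final["progressPct"])
```

Capping the number of polls keeps an agent from waiting forever on a stuck task; a real client would probably also back off between attempts.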

Tool Definition Quality

Grade A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only and idempotent. Description adds valuable context: 'polling-friendly' and 'slim shape', enhancing transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, no fluff. Efficiently conveys purpose, return shape, and polling suitability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with annotations and full schema coverage, the description is complete. It explains the return value and use case without needing more detail.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter description. Description adds no additional meaning beyond 'Task id to check.' Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it is a polling-friendly status check for a task, returning a slim shape with status, progressPct, and eta. Differentiates from siblings like get_video_status by emphasizing lightweight polling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use for polling status but does not explicitly state when not to use or mention alternatives among the many sibling tools. No exclusion criteria given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_depth_map · Create Depth Map

Turn a video into a DEPTH MAP: a grayscale video where brightness encodes distance, used as a motion reference so a new generated subject moves exactly like your source clip. Pass video_url (a public https video URL) OR one of your own Switch video ids (from list_my_videos or list_my_assets). For an external URL also pass duration_seconds (the clip length; your own Switch videos carry it automatically) because the render is billed per second of video. Returns a task_id right away; poll get_depth_map_status until the download URL is ready (usually a few minutes). If the render fails, your tokens are returned automatically.

Parameters

Name	Required	Description	Default
`video_url`	Yes	A public https video URL, OR one of your own Switch video ids.
`duration_seconds`	No	Clip length in seconds. Required for external URLs; your own Switch videos are measured automatically.
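
The interaction between the two parameters can be sketched as a small argument builder. It assumes that any value starting with `https://` is an external URL and that anything else is a Switch video id; that heuristic and the sample id are hypothetical, and the server may classify inputs differently.

```python
def depth_map_arguments(source, duration_seconds=None):
    """Build create_depth_map arguments, enforcing the external-URL rule.

    Assumption: a value starting with "https://" is an external URL and
    anything else is a Switch video id.
    """
    arguments = {"video_url": source}
    if source.startswith("https://"):
        # External clips are billed per second, so the length must be supplied.
        if duration_seconds is None or duration_seconds <= 0:
            raise ValueError("external URLs need a positive duration_seconds")
        arguments["duration_seconds"] = duration_seconds
    # For Switch video ids the duration is omitted; the listing says it is
    # measured automatically.
    return arguments

external = depth_map_arguments("https://example.com/clip.mp4", duration_seconds=12)
own_video = depth_map_arguments("vid_hypothetical_123")
print(external)
print(own_video)
```

Failing fast on a missing duration avoids a round trip for a request that the description says cannot be billed correctly.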

explore_models · Explore Models · Grade A

Read-only · Idempotent

Browse the image-generation models available to your Switch account. Returns model id, display name, brand, and credits-per-image so you can pick one before calling generate_image.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

Grade A (4.3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the safety profile is clear. Description adds account-specific scope and return fields but minimal extra behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with action verb, followed by key details. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter listing tool, the description fully covers purpose, output contents, and usage context. No additional details are needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so schema coverage is trivially 100%. The description does not need to add parameter information, and the baseline for zero parameters is 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it browses image-generation models for the account, lists returned fields (id, name, brand, credits-per-image), and specifies its role before calling generate_image. Distinguishes from siblings like generate_image and list_video_models.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using this before generate_image to pick a model. Although it lacks explicit when-not-to-use guidance, the context of a simple listing tool makes this adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_audio · Generate Audio · Grade A

Generate spoken audio from text: narration, a voiceover, a read-aloud script, or a multi-voice dialogue. Pass text (up to 2048 chars) — the words to be spoken. To speak in one of YOUR saved voices, pass voice with the voice NAME (or id): users speak plain language and never know ids, so resolve the name yourself (the voice tool, action "list", shows every saved voice) and never ask the user for an id. Reference voices, trained clones and preset voices are all routed correctly by kind. To match a voice instantly from a clip instead, pass reference_audio_url (a short clip) or up to 3 reference_audio_urls and address them as @Audio1, @Audio2, @Audio3 in the text for dialogue. Alternatively pass image_url to voice a scene from a picture (cannot combine with reference audio). Optional speech_rate (-50..100), pitch (-12..12), loudness (-50..100). Returns a playable audio_url, duration_seconds, and generation_id (also saved to your library).

Parameters

Name	Required	Description
`text`	Yes	The words to speak / narrate / perform. Max 2048 chars. For dialogue, address voices as @Audio1, @Audio2, @Audio3.
`pitch`	No	Optional. Pitch, -12 to 12. 0 is normal.
`voice`	No	Optional. A saved voice — pass its NAME (or id); it is resolved and routed by kind automatically. Omit for a natural default voice.
`format`	No	Optional output format. Default mp3.
`loudness`	No	Optional. Loudness, -50 (quieter) to 100 (louder). 0 is normal.
`image_url`	No	Optional. Voice a scene from a picture. Cannot be combined with reference audio.
`speech_rate`	No	Optional. Speaking speed, -50 (slower) to 100 (faster). 0 is normal.
`reference_audio_url`	No	Optional. A short clip URL to instantly match that voice.
`reference_audio_urls`	No	Optional. Up to 3 reference clip URLs for multi-voice dialogue.
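
The constraints listed for this tool (text length, numeric ranges, the reference-clip limit, and the image/reference-audio exclusion) can be checked before a call is made. The validator below is a minimal client-side sketch with a hypothetical function name and sample URLs; it mirrors only the rules stated above and is not the server's own validation.

```python
def validate_audio_arguments(args):
    """Return a list of problems found in a generate_audio argument dict."""
    problems = []

    text = args.get("text", "")
    if not text:
        problems.append("text is required")
    elif len(text) > 2048:
        problems.append("text exceeds 2048 characters")

    # Documented numeric ranges, treated here as inclusive.
    ranges = {"pitch": (-12, 12), "loudness": (-50, 100), "speech_rate": (-50, 100)}
    for name, (low, high) in ranges.items():
        value = args.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    refs = list(args.get("reference_audio_urls") or [])
    if len(refs) > 3:
        problems.append("at most 3 reference_audio_urls are allowed")

    has_reference = bool(refs) or bool(args.get("reference_audio_url"))
    if args.get("image_url") and has_reference:
        problems.append("image_url cannot be combined with reference audio")

    return problems

ok = validate_audio_arguments({
    "text": "@Audio1 Hello there. @Audio2 Hi, good to see you.",
    "reference_audio_urls": ["https://example.com/a.mp3", "https://example.com/b.mp3"],
    "pitch": 2,
})
bad = validate_audio_arguments({
    "text": "Narrate this scene.",
    "image_url": "https://example.com/scene.jpg",
    "reference_audio_url": "https://example.com/a.mp3",
    "pitch": 20,
})
print(ok)
print(bad)
```

Returning every problem at once, rather than raising on the first, lets an agent repair all arguments in a single retry.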

Tool Definition Quality

Grade A (4.8/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses constraints (2048 char limit), voice routing, combination rules (no reference audio with image), optional ranges, and side effects (audio saved to library). Annotations are minimal, so description carries full burden.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Eight sentences, front-loaded with purpose, then logically covering options and constraints. No redundant phrases; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete coverage for a complex tool with 9 parameters: explains all options, constraints, return values (audio_url, duration_seconds, generation_id), and side effect of saving to library.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds significant meaning beyond schema: explains voice resolution process, dialogue addressing syntax, and constraints like reference audio vs image_url. Schema coverage is 100%, but description enriches usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it generates spoken audio from text, listing specific use cases like narration, voiceover, dialogue. It differentiates from sibling tools (e.g., voice, generate_image) by mentioning the voice tool for listing saved voices.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on when to use voice parameter vs reference audio vs image_url, and mentions the voice tool for resolving names. Lacks explicit 'when not to use' but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_image · Generate Image · Grade A

Generate one or more Switch images. Auto-routes to the right model based on subject (Nano Banana 2 default, GPT Image 2 for swimwear/beach, Switch Model/Ultra/Pro for sexier content, Nano Banana Pro for typography-heavy). Counts <= 8 render inline in chat; counts > 8 queue to your Switch Studio with progress polling. All images persist to your Studio library and folder. Pass an optional style (e.g. "wellness/warm_amber_tropical", "high_fashion_editorial/testino_glossy", "movie_scene/neon_noir_action") to apply a curated photographic stack from the apply_* skill tools.

Parameters

Name	Required	Description
`count`	No	How many images to generate. Default 4. <= 8 returns inline, > 8 queues to Studio. Beta limit: max 50 per request — larger asks are capped at 50 and the response says so.
`model`	No	Optional explicit model. If omitted, auto-routed based on subject content (see tool description).
`style`	No	Optional curated style stack from the apply_* skill tools. Format "<skill>/<style_key>", e.g. "wellness/warm_amber_tropical" or "high_fashion_editorial/leibovitz_painterly".
`subject`	Yes	Plain-English description of what to generate. E.g. "a woman walking through a hotel lobby" or "morning coffee on the balcony, model wearing a robe".
`folder_name`	No	Optional Switch Studio folder name. Auto-created if missing. Defaults to the chat-derived title.
`aspect_ratio`	No	Image aspect ratio. Default 9:16 (vertical, social-friendly).
`real_photo_look`	No	Optional. Adds the casual real-photo texture (film grain, amateur iPhone feel). OFF by default — only set true when the user asks for the realistic, unpolished look.
`face_reference_ids`	No	Face reference asset ids from upload_reference_asset (frame_type "face"). The ONLY way to use a face/likeness reference. Each id is verified server-side (your own untouched original + identity verification) before anything generates or is charged; a URL or generic upload here is rejected.
`reference_image_urls`	No	Optional public image URLs used as GENERIC references (products, scenery, outfits, style). These are never treated as face references — for a person's face/likeness use face_reference_ids.
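
The count thresholds and the style format described above can be summarised in a few lines. The sketch below predicts how a request would be handled, using the documented values (default 4, inline up to 8, beta cap of 50); the function name, the returned keys, and the rejection of counts below 1 are assumptions made for illustration.

```python
INLINE_LIMIT = 8  # counts <= 8 render inline in chat
BETA_CAP = 50     # larger requests are capped at 50

def plan_image_request(count=4, style=None):
    """Predict how a generate_image request would be handled."""
    if count < 1:  # assumption: the listing does not state a minimum
        raise ValueError("count must be at least 1")
    if style is not None:
        skill, separator, style_key = style.partition("/")
        if not (skill and separator and style_key):
            raise ValueError('style must look like "<skill>/<style_key>"')
    effective = min(count, BETA_CAP)
    return {
        "effective_count": effective,
        "capped": effective < count,
        "delivery": "inline" if effective <= INLINE_LIMIT else "queued",
    }

print(plan_image_request())                                       # default of 4
print(plan_image_request(9, "wellness/warm_amber_tropical"))      # just over the inline limit
print(plan_image_request(120))                                    # above the beta cap
```

An agent could run a check like this before calling the tool to decide whether it needs to poll for results afterwards.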

Output Schema

Parameters

Name	Required	Description
`asset`	No
`images`	No
`_widget`	No

Tool Definition Quality

Grade A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behaviors: auto-routing, inline vs queue, persistence to Studio, and face reference verification. Annotations only have readOnlyHint=false, so description adds substantial value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with purpose and well-structured. Somewhat verbose, though each sentence serves a purpose; it could be tightened but remains clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensively covers behavior, routing, persistence, style, face references, and limits. Given the tool's complexity and the existence of an output schema (not shown), the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description adds context beyond parameter descriptions, such as auto-routing logic, queue threshold, style format linking to apply_* tools, and examples. Enriches understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Generate one or more Switch images' and details auto-routing to specific models based on subject, inline vs queue behavior, and persistence. It distinguishes itself from sibling audio/video tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use by explaining auto-routing logic and inline vs queue thresholds. Lacks explicit 'when not to use' scenarios but given it's the primary image generation tool, the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_videoGenerate VideoA

Generate Switch video across the real provider lineup (Kling, Seedance, Switch Video/WAN 2.7, Switch Video Edit, Topaz upscale) and modes (text-to-video, image-to-video, frame-to-frame, motion, omni, reference-to-video, video-edit, upscale). ALWAYS call list_video_models first to pick the right model + mode and see its required inputs. Pass one shot, or shots:[...] for a storyboard (max 4 by default, hard max 10) where EACH shot is DIFFERENT — never repeat one prompt to get copies. Renders async (~30-90s); a background job delivers each clip to your library. Returns a task_id per shot — poll get_video_status or list_my_videos.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Video mode. Must be supported by the chosen model (see list_video_models).
`audio`	No	Omni / Seedance refs: generate audio. Omni is ON by default; set false for a silent clip. Other models ignore this. See list_video_models for which models generate audio and the max seconds with vs without audio.
`model`	No	Model id from list_video_models (e.g. kling-v3, seedance-2.0-t2v, wan-2.7-t2v, topaz). Or prefer option_id from list_video_models.
`shots`	No	A storyboard of 1-10 DISTINCT shots. Each item takes the same fields as a single shot (subject, model, mode, image_url, etc.).
`subject`	No	The shot: subject + motion + scene (video needs motion language, e.g. "slow push-in").
`duration`	No	Clip length in seconds. Default 5. Seedance does 4-15s; Switch Video (WAN) does 5/10/15; Kling/Switch Video Edit cap at 10 — see each model's durations in list_video_models.
`image_url`	No	Required for image-to-video / frame-to-frame / motion. Accepts EITHER a Switch asset id (from show_media / list_my_assets / upload_media) OR a public https url. An asset id is resolved server-side, so just pass the id you have — no need to fetch a url first.
`option_id`	No	Optional catalog id from list_video_models (e.g. "kling-image"); use instead of model+mode.
`video_url`	No	Required for video-edit and upscale (the source clip). Must be a publicly downloadable https URL.
`resolution`	No	Output resolution. Defaults to 1080p where the model supports it. 720p is cheaper and faster. 480p is the cheapest, only on Seedance 2.0 Mini (budget tier). 4K is only on Kling v3 text/image and Kling Omni; Seedance text-to-video is 720p only. Each model lists its available resolutions in list_video_models.
`aspect_ratio`	No	e.g. 9:16, 16:9, 1:1. Must be allowed for the model (see list_video_models).
`end_image_url`	No	End frame for frame-to-frame mode.
`face_reference_ids`	No	Face reference asset ids from upload_reference_asset (frame_type "face") — the ONLY way to use a face/likeness reference in video. Each id is verified server-side (your own untouched original + identity verification) before the shot fires or is charged; URLs and generic uploads here are rejected.
`reference_audio_urls`	No	Seedance reference/omni only: up to 3 reference audio files to drive synthesized audio. Requires at least one reference image or video.
`reference_image_urls`	No	GENERIC reference images (products, scenery, outfits, style). Each entry accepts EITHER a Switch asset id (from show_media / list_my_assets / upload_media / get_my_active_references) OR a public https url — asset ids are resolved server-side. Seedance reference/omni accepts up to 9; Kling Omni up to 7. For Seedance, at least one image or video reference is required. For a person's face/likeness use face_reference_ids instead.
`reference_video_urls`	No	Seedance reference/omni only: up to 3 reference video clips for motion/style guidance. A Seedance video ref can satisfy the required visual anchor. NOTE: the AUDIO track of these clips is IGNORED — never extracted or preserved.
`character_orientation`	No	Motion mode only: follow the character image (default) or the reference video.
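
The storyboard rules in the description (distinct shots, 4 by default, 10 at most) can be checked on the client before the call is made. The following is a minimal sketch, not the server's validation: `build_storyboard_args` and its `allow_up_to_hard_max` flag are hypothetical names, and the page does not say how the default cap of 4 is raised, so that switch is an assumption.

```python
from typing import Any

DEFAULT_MAX_SHOTS = 4  # "max 4 by default" in the description
HARD_MAX_SHOTS = 10    # "hard max 10" in the description


def build_storyboard_args(shots: list[dict[str, Any]],
                          allow_up_to_hard_max: bool = False) -> dict[str, Any]:
    """Return generate_video arguments for a storyboard of distinct shots.

    allow_up_to_hard_max is a hypothetical client-side switch; how the
    default cap is actually raised is not documented on this page.
    """
    if not shots:
        raise ValueError("a storyboard needs at least one shot")
    limit = HARD_MAX_SHOTS if allow_up_to_hard_max else DEFAULT_MAX_SHOTS
    if len(shots) > limit:
        raise ValueError(f"{len(shots)} shots exceeds the limit of {limit}")
    # Normalize case and whitespace so a repeated prompt is caught early.
    subjects = [" ".join(str(s.get("subject", "")).lower().split()) for s in shots]
    if len(set(subjects)) != len(subjects):
        raise ValueError("each shot must be different; do not repeat a prompt")
    return {"shots": shots}


storyboard = build_storyboard_args([
    {"subject": "slow push-in on a lighthouse at dawn", "duration": 5},
    {"subject": "drone orbit around the same lighthouse at dusk", "duration": 5},
])
```

The model, mode, and duration values for each shot would still come from list_video_models, as the description requires; the sketch only guards the shot count and uniqueness.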

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only state readOnlyHint=false and do not flag the tool as destructive, so the description carries the burden. It describes async rendering (~30-90s), background delivery, the shot limits, the requirement that each shot be distinct, and server-side verification of face_reference_ids. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose and key instruction. It covers many details in a structured manner, though a bit lengthy. Every sentence adds value; no redundancy with schema. Minor room for tighter phrasing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (17 params, no output schema), the description fully covers pre-conditions (call list_video_models), the async nature, and storyboard rules, and it references sibling tools for status. It also states the return value (a task_id per shot), which makes up for the missing output schema. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description adds meaningful context beyond schema, e.g., 'each shot is DIFFERENT', 'max 4 by default, hard max 10', and clarifies that image_url accepts asset IDs or URLs. These details help the agent avoid mistakes.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates videos across multiple providers and modes, with specific verb 'generate' and resource 'video'. It distinguishes from siblings by explicitly referencing list_video_models as a prerequisite and listing various modes and providers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to 'ALWAYS call list_video_models first' and provides when-to-use guidance for single shots vs. storyboards. Also directs agents to poll get_video_status or list_my_videos for async results, setting clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_depth_map_statusGet Depth Map Status

Read-onlyIdempotent

Check one of your depth map renders started with create_depth_map. Pass the task_id it returned. While rendering it reports processing; when finished it returns depth_video_url, a download link for the grayscale motion reference video. If the render failed, it says so and confirms your tokens were returned.

ParametersJSON Schema

Name	Required	Description	Default
`task_id`	Yes	The task_id returned by create_depth_map.

get_my_active_referencesGet Active ReferencesA

Read-onlyIdempotent

Read the user's staged references in Switch Studio. Returns TWO groups: (1) the image-generation reference strip (typed face/body/outfit/scenery/product slots) under refs, and (2) the VIDEO-tab references the user staged in the Omni/Image video tabs (the @Image1/@Image2 strip) under videoReferences, with usable signed URLs. Call this before generate_image or generate_video whenever the user says "use my refs" or refers to images they staged in Studio (including "the images in my video tab"). To make a video from the video-tab refs, pass videoReferences.imageUrls into generate_video reference_image_urls (and videoUrls into reference_video_urls) in reference-to-video / omni mode. Refs marked alive:false are dead (stored file gone) and are already excluded from the usable url lists. NOTE: a photo the user just attached in THIS chat is in neither group — for that, call upload_media and use its returned url/asset id directly.

ParametersJSON Schema

Name	Required	Description	Default
No parameters
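
The hand-off described above, from the videoReferences group into generate_video, might look like the sketch below. The keys `videoReferences`, `imageUrls`, and `videoUrls` come from the description; the helper name `video_args_from_refs`, the exact nesting of the response, and the default caps are assumptions (7 images is the Kling Omni limit listed under generate_video, and Seedance is listed as accepting up to 9).

```python
from typing import Any


def video_args_from_refs(active_refs: dict[str, Any], subject: str,
                         mode: str = "omni", max_images: int = 7,
                         max_videos: int = 3) -> dict[str, Any]:
    """Build generate_video arguments from a get_my_active_references result.

    Assumes the response is a dict with a "videoReferences" object holding
    "imageUrls" and "videoUrls" lists (names taken from the description).
    """
    video_refs = active_refs.get("videoReferences") or {}
    image_urls = list(video_refs.get("imageUrls") or [])[:max_images]
    video_urls = list(video_refs.get("videoUrls") or [])[:max_videos]
    if not image_urls and not video_urls:
        raise ValueError("no usable video-tab references are staged")
    args: dict[str, Any] = {"subject": subject, "mode": mode}
    if image_urls:
        args["reference_image_urls"] = image_urls
    if video_urls:
        # reference_video_urls is documented as Seedance reference/omni only.
        args["reference_video_urls"] = video_urls
    return args


example_refs = {"videoReferences": {"imageUrls": ["https://example.com/a.png"],
                                    "videoUrls": []}}
example_args = video_args_from_refs(example_refs, "slow pan across the product")
```

Dead references (alive:false) need no filtering here, since the description says they are already excluded from the usable URL lists.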

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description doesn't need to reiterate safety. It adds value by explaining that refs marked alive:false are excluded and that URLs are signed and usable. This provides useful behavioral context beyond what annotations offer.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is longer than necessary but packs essential information without fluff. It front-loads the main purpose and then details groups and usage. Every sentence adds value, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description fully describes the return structure (refs, videoReferences, alive:false, signed URLs) and explains how to use the output for generate_video. It covers purpose, usage, edge cases, and integration with other tools, making it complete for a zero-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are 0 parameters (schema coverage 100%), so the description has no parameter details to add. It compensates by explaining the return structure and how to use the output, which is valuable for a parameterless tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads the user's staged references in Switch Studio and returns two distinct groups: 'refs' (image-generation reference strip) and 'videoReferences' (VIDEO-tab references). This is specific and distinguishes it from sibling tools like upload_media or list_my_assets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use the tool: 'Call this before generate_image or generate_video whenever the user says "use my refs" or refers to images they staged in Studio.' It also notes what is not included (a photo attached in chat) and directs to upload_media for that, and gives instructions for using videoReferences with generate_video.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_video_statusGet Video StatusA

Read-onlyIdempotent

Check the status of one of your video jobs by task_id (from generate_video) or job_id. Returns status, a viewable view_url when finished, or the error if it failed. Poll this every ~20s — do not loop rapidly.

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	No	Alternatively, the job_id.
`task_id`	No	Task id returned by generate_video.
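
The polling guidance above (roughly every 20 seconds, no rapid loops) can be sketched as a bounded wait. This is an illustration under assumptions: `call_tool` stands in for whatever MCP client function issues the tool call, the result is assumed to be a dict with `status`, `view_url`, and `error` keys, and the terminal status strings `succeed` and `failed` are borrowed from the list_my_videos filter values rather than confirmed for this tool.

```python
import time
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_video(call_tool: ToolCaller, task_id: str,
                   interval_s: float = 20.0, max_polls: int = 30,
                   sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll get_video_status until the job finishes, fails, or polls run out."""
    for attempt in range(max_polls):
        result = call_tool("get_video_status", {"task_id": task_id})
        status = result.get("status")
        if status == "succeed":  # assumed terminal success value
            return result
        if status == "failed":   # assumed terminal failure value
            raise RuntimeError(result.get("error") or "video job failed")
        if attempt < max_polls - 1:
            sleep(interval_s)  # keep the ~20s spacing the description asks for
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays, and `max_polls` prevents an agent from waiting forever on a stuck job.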

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and idempotentHint=true, so no contradiction. The description adds value by explaining return values (status, view_url, error) and polling behavior, which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with front-loaded purpose. Every sentence adds value: purpose and parameters, returns, and polling guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description explains the return values (status, view_url, error) fully. It covers necessary context: identifiers, polling interval, and relationship to generate_video. Complete for a simple status-check tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters. The description adds minimal extra context (e.g., that they are alternatives), but does not significantly augment what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks the status of video jobs using task_id or job_id. It specifies the resource (video jobs) and the action (check status), and the mention of 'from generate_video' distinguishes it from general job status tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit polling guidance ('Poll this every ~20s — do not loop rapidly') and context for when to use it (after generate_video). It does not explicitly state when not to use it or list alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_vision_reportGet Analysis Report

Read-onlyIdempotent

Fetch one of your finished Video Analysis reports by report_id (from analyze_video_report or list_vision_reports). Returns the complete structured report: overview scores and takeaways, the timeline of scenes, audio, visual, story, speech, the recreation section with every master prompt, and metadata, plus recreation_prompt (the ready to run prompt) at the top level. While an analysis is still running this reports processing; poll it every 20 to 30 seconds.

ParametersJSON Schema

Name	Required	Description	Default
`report_id`	Yes	The report id to fetch.

lip_sync_videoLip Sync VideoA

Lip-sync audio onto one of your videos. RECOMMENDED: action="create" with engine="best" + video_url + sound_file (base64 data URI) — syncs the whole clip on the highest-quality engine, no face step needed. Kling flow (manual timing control): (1) action="identify-face" with video_url (MP4/MOV, 2-60s, <=100MB, 720p/1080p); (2) action="create" with session_id + face_id + audio + timing IN MILLISECONDS (sound_start_time, sound_end_time, sound_insert_time) + optional speech_volume/original_audio_volume (0-100); (3) action="status" with the task_id to poll — returns a branded SwitchApp view_url when done. Charges credits on create; failed jobs are refunded.

ParametersJSON Schema

Name	Required	Description
`action`	Yes	Which step to run.
`engine`	No	create: "best" = highest-quality whole-clip sync (needs only video_url + sound_file). Default "kling" (timeline flow).
`face_id`	No	create: a face_id from identify-face (one face supported).
`task_id`	No	status: the task_id from create.
`audio_id`	No	create: alternative to sound_file — an existing audio id.
`video_url`	No	identify-face: the source video (MP4/MOV, 2-60s, <=100MB, 720p/1080p). Use a SwitchApp/public URL.
`session_id`	No	create: from identify-face.
`sound_file`	No	create: base64 data URI of the audio (e.g. data:audio/mpeg;base64,...).
`speech_volume`	No	create: how loud the new speech is, as a percent 0-100 (default 100).
`sound_end_time`	No	create: audio end, in MILLISECONDS.
`sound_start_time`	No	create: audio start, in MILLISECONDS.
`sound_insert_time`	No	create: where in the video to place the audio, in MILLISECONDS.
`original_audio_volume`	No	create: how loud the clip's own sound stays, as a percent 0-100 (default 0).
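
Two details in this tool are easy to get wrong: `sound_file` must be a base64 data URI, and the Kling timing fields are in milliseconds. The sketch below shows one way to build both argument sets. The parameter names come from the table above; the helper names are hypothetical, and accepting timings in seconds on the client side is a design choice of this example, not something the tool offers.

```python
import base64
from typing import Any


def audio_data_uri(audio_bytes: bytes, mime_type: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a base64 data URI for sound_file."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def seconds_to_ms(seconds: float) -> int:
    """Convert seconds to the whole milliseconds the timing fields expect."""
    return round(seconds * 1000)


def best_engine_args(video_url: str, audio_bytes: bytes) -> dict[str, Any]:
    """Arguments for the recommended whole-clip flow (engine="best")."""
    return {"action": "create", "engine": "best",
            "video_url": video_url, "sound_file": audio_data_uri(audio_bytes)}


def kling_create_args(session_id: str, face_id: str, audio_bytes: bytes,
                      start_s: float, end_s: float, insert_s: float) -> dict[str, Any]:
    """Arguments for step (2) of the Kling flow, after identify-face."""
    return {"action": "create", "session_id": session_id, "face_id": face_id,
            "sound_file": audio_data_uri(audio_bytes),
            "sound_start_time": seconds_to_ms(start_s),
            "sound_end_time": seconds_to_ms(end_s),
            "sound_insert_time": seconds_to_ms(insert_s)}
```

In the Kling flow, `engine` is left out because the table lists "kling" as the default.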

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false, implying mutation. The description confirms this by stating it charges credits on create and refunds on failure, and details the multi-step process. It adds behavioral context beyond annotations, such as timing in milliseconds and volume ranges.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but well-structured with a recommended flow and numbered steps. It is front-loaded with the easiest path. Every sentence adds value, though some redundancy exists between the description and parameter descriptions in schema.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has 13 parameters, 3 actions, and no output schema, the description is complete. It covers all steps, parameter usage, timing, volume, and error handling (refunds), and adequately prepares an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description still adds meaning beyond the schema by explaining the recommended flow, timing units (milliseconds), and audio format (base64 data URI). That added context earns a score above the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lip-syncs audio onto a video, with specific verb 'lip-sync' and resource 'video'. It distinguishes itself from sibling tools like 'talking_avatar_video' or 'generate_video' by focusing on lip-syncing and providing detailed workflows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit recommendations for the best flow (action='create' with engine='best') and a step-by-step guide for the Kling flow. It explains when to use each action. However, it does not explicitly state when NOT to use this tool (e.g., for simple audio dubbing without lip sync).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_generationsList GenerationsA

Read-only

List your recent and active generation tasks. Returns counts per status (pending / running / completed / failed) plus an array of your tasks with id, status, prompts, model, ref counts, scheduledAt, finishedAt.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Default 10. Max 50.
`status`	No	"all" for everything, or array like ["pending","running"]. Default: active + recent.
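
The `status` parameter takes either the string "all" or an array of status names, which is a shape agents can easily get wrong. A minimal sketch of a client-side guard follows; the helper name is hypothetical, the status names are the four listed in the description, and the lower bound of 1 on `limit` is an assumption.

```python
from typing import Any, Optional, Sequence, Union

VALID_STATUSES = {"pending", "running", "completed", "failed"}
MAX_LIMIT = 50  # "Max 50" in the parameter table


def list_generations_args(limit: int = 10,
                          status: Optional[Union[str, Sequence[str]]] = None) -> dict[str, Any]:
    """Build list_generations arguments, rejecting values the table rules out."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    args: dict[str, Any] = {"limit": limit}
    if status is None:
        return args  # omit status to get the default: active + recent
    if isinstance(status, str):
        if status != "all":
            raise ValueError('a string status must be "all"; use a list otherwise')
        args["status"] = "all"
        return args
    unknown = set(status) - VALID_STATUSES
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    args["status"] = list(status)
    return args
```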

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. The description adds context about the return structure (counts per status, array of tasks with specific fields) and implies read-only behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences) and front-loads the core purpose. Every sentence adds value, though the parameter information is redundant with the schema.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (2 params, no output schema), the description adequately covers the tool's behavior and return structure, including status counts and task fields. Minor gaps like pagination details are acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for limit and status. The description merely repeats the schema's defaults and allowed values without adding new semantic meaning or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists your recent and active generation tasks, specifying the returned data including counts per status and an array with key fields. This distinguishes it from sibling tools like list_my_assets or show_generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides default limits and status behavior but does not explicitly guide when to use this tool versus alternatives like show_generation or list_my_assets. No when-not-to-use guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_my_assetsList AssetsA

Read-only

Return asset METADATA only (id, truncated prompt, model, created date), newest first. This does NOT display images and must NOT be used to show pictures — if the user says "show me / display my last image(s)", call show_media instead (it renders them; pass count=N for several). Use list_my_assets only when you need ids/metadata for another tool (e.g. move_asset) or a plain text list.

ParametersJSON Schema

Name	Required	Description	Default
`count`	No	Default 20. Max 50.

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds that it returns only metadata, not images, and orders by newest first. No contradictions, but could include more detail on pagination or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. Front-loaded with key information, efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Even without output schema, description explains the return structure (id, truncated prompt, model, created date) and ordering. For a single-parameter read-only tool, this is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear description for count (Default 20. Max 50.). Description does not add extra meaning beyond schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool returns asset metadata (id, truncated prompt, model, created date) newest first, and clearly distinguishes itself from show_media by stating it does not display images.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use and when-not-to-use guidance: use for ids/metadata for other tools or plain text list; do not use for showing images, instead call show_media with count=N.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_my_foldersList FoldersA

Read-onlyIdempotent

List the folders in your Switch library (id, name, parent). Use this to find an existing folder before move_asset or create_folder.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint. The description adds that it returns id, name, and parent, but does not discuss pagination, ordering, or error behavior. Adequate, but it adds little beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with action and result, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with no output schema, the description covers purpose, fields returned, and usage context. Could mention if folders are top-level only but generally complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100%. Baseline is 4; no parameter explanation needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists folders, specifies returned fields (id, name, parent), and distinguishes from siblings like list_my_assets by targeting folders with a specific use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using this tool to find an existing folder before move_asset or create_folder, providing clear context for when to invoke it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_my_videosList VideosA

Read-only

List your recent Switch videos, newest first — id, status, prompt, model, and a viewable view_url for finished clips. Use this to check whether videos finished and to let the user choose which one they want.

ParametersJSON Schema

Name	Required	Description	Default
`count`	No	How many to return. Default 10. Max 50.
`status`	No	Optional filter: submitted, processing, succeed, failed, or all.

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. Description adds that list is 'recent' and includes viewable URLs for finished clips. No contradiction; useful behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, concise, front-loaded with purpose, no filler. Every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, returned fields, and use case. No output schema but the described fields suffice. Lacks mention of pagination, but max count 50 reduces need. Adequate for a simple list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters (count, status) have complete descriptions in the input schema. Description does not add extra meaning beyond 'how many' and 'optional filter'. Schema coverage is 100%, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'list', resource 'your recent Switch videos', sorting 'newest first', and lists returned fields (id, status, etc.). Differentiates from siblings like list_generations by specifying video-specific content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use for checking if videos finished and for user selection. Provides a clear use case. Lacks explicit exclusion of alternatives, but sufficient for a simple list tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_video_models (List Video Models), grade A

Read-only, Idempotent

List the video providers, models, and modes available to your Switch account, with each model's required inputs, allowed aspect ratios and durations, and a rough per-second cost. Call this before generate_video so you pick a real model + mode and supply the right inputs.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint as true, making the safe, non-destructive nature clear. The description enriches this by detailing the output content (providers, models, modes, inputs, aspect ratios, durations, cost), adding behavioral context beyond annotations.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundant words, front-loaded with actionable information. Every sentence serves a clear purpose: stating what the tool does and why it should be used.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters and a read-only, idempotent nature, the description fully explains what it returns and its role in the workflow. No output schema is needed because the description lists the output categories.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With zero parameters, schema coverage is trivially 100% and the description has nothing to explain. The baseline score of 4 is appropriate because no parameter detail is required.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('List') and identifies the resource ('video providers, models, and modes') with detailed attributes (required inputs, aspect ratios, durations, cost). It clearly distinguishes from sibling tools like explore_models or generate_image by focusing on video-specific metadata for pre-generation selection.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises 'Call this before generate_video so you pick a real model + mode and supply the right inputs,' providing both when-to-use and the tool's role in a workflow. This directly guides an AI agent away from incorrect tool selection.

list_vision_reports (List Analysis Reports)

Read-only, Idempotent

List your Video Analysis history, newest first: report_id, date, status, source kind, duration, engine, tokens charged, and each report's headline. Use it to find a past analysis, then pass its report_id to get_vision_report (full report) or video_to_prompt (just the recreation prompt).

Parameters

Name	Required	Description	Default
No parameters

search_my_library (Search Library), grade A

Read-only

Search your library by prompt substring (metadata only — id, prompt, date). Optional folderId scopes to one folder. Only your own assets are returned. This does NOT display images; to show/display results to the user, pass their ids to show_media.

Parameters

Name	Required	Description
`limit`	No	Default 20.
`query`	Yes
`folderId`	No

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the read-only behavior is covered. The description adds key behavioral details: only returns user's own assets, searches only metadata (not content), and does not display images. This adds value beyond annotations.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four short, well-structured sentences. No superfluous information; every sentence adds value.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with no output schema, the description is fairly complete. It explains what is searched, what is returned, and what is not. Could mention pagination or response format, but not necessary for basic use.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (33%): only 'limit' (default 20) is described in the schema. The description compensates by explaining that 'query' is a prompt substring and that 'folderId' scopes the search to one folder, which adds meaning beyond the schema's limited descriptions.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches the library by prompt substring (metadata only), which is a specific verb and resource. It distinguishes from siblings like show_media and list_my_assets by noting it does not display images and returns only metadata, not the assets themselves.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use (searching metadata), when not to use (not for displaying images), and directs to the sibling tool show_media for displaying. It also mentions optional scoping by folderId, providing clear context.

show_generation (Show Generation), grade A

Read-only

Get the full detail of one of your generations by task id — prompts, model, ref counts, saved/failed counts, ETA hint, asset ids.

Parameters

Name	Required	Description	Default
`taskId`	Yes	Task id from generate_image or list_generations.

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds details on what the response contains (prompts, model, ref counts, etc.), providing behavioral context beyond annotations.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded with the main purpose and includes specific fields in a list, no redundant information.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Low complexity tool with one parameter and no output schema. Description adequately explains the return value fields, though a more structured list could be clearer. Sufficient for an AI agent.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers taskId with 100% coverage. Description adds valuable context that taskId comes from generate_image or list_generations, which aids correct parameter selection.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb 'Get' and resource 'full detail of one of your generations' and lists included fields (prompts, model, ref counts, etc.), clearly distinguishing from sibling tools like list_generations.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States usage context: requires a taskId from generate_image or list_generations, but does not explicitly mention when not to use or alternatives. Clear enough for this simple tool.

show_media (Show Media), grade A

Read-only, Idempotent

Display the user's images inline — one or many. Users speak plainly and will NOT know asset ids; never ask for one, resolve it yourself. For "show me" or "show me my last image" call with NO arguments (shows the most recent image). For "show me my last 4 images / my last 10 pictures" pass count=N (returns a clean grid, up to 12). For a specific known image pass assetId. Renders a branded SwitchApp media card with a Download action per result; do not just print URLs. (Videos are not shown here — use list_my_videos and return the newest finished video's view_url, which plays.)

Parameters

Name	Required	Description	Default
`count`	No	Optional. How many of the most recent images to show as a grid (default 1, max 12). Use when the user says "my last N images/pictures".
`assetId`	No	Optional. A specific image id (from list_my_assets, search_my_library, or show_generation). Omit to show the most recent image(s).
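
The three calling patterns in the description (no arguments, `count=N`, or `assetId`) can be sketched as a small argument picker. The function name `show_media_arguments` is hypothetical, and sending `assetId` on its own without `count` is an assumption about how the two optional parameters combine.

```python
def show_media_arguments(count=None, asset_id=None):
    """Pick show_media arguments following the usage rules in the description."""
    if asset_id is not None:
        # A specific image the agent has already resolved: pass only its id.
        return {"assetId": asset_id}
    if count is None or count <= 1:
        # "show me" / "show me my last image": call with no arguments.
        return {}
    # "my last N images": a grid of recent images, capped at the documented max of 12.
    return {"count": min(int(count), 12)}
```

Note that the picker never asks the user for an id; an `asset_id` only appears when the agent found it through another tool such as search_my_library.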

Output Schema

Parameters

Name	Required	Description
`asset`	No
`images`	No
`_widget`	No

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark readOnlyHint and idempotentHint as true. Description adds crucial behavioral context: it renders a branded SwitchApp media card with a Download action, not just URLs. Also states users will not know asset IDs, so the agent must resolve them.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact for what it covers, using seven short sentences to address purpose, use cases, constraints, and sibling differentiation. No fluff; every sentence carries weight.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has two optional parameters, full schema coverage, output schema, and clear annotations, the description completes the picture by adding usage patterns, behavioral traits, and exclusions. Nothing missing for an AI agent to use correctly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The tool description enriches understanding: count default is 1, max 12; assetId is for specific known images; omitting shows most recent. This adds practical meaning beyond schema definitions.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool displays user images inline. Distinguishes from list_my_videos by noting that videos are not shown here. Provides specific verbs like 'Display', 'show', and differentiates between showing most recent, a grid, or a specific asset.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly describes when to call with no arguments (most recent), count=N (grid up to 12), or assetId (specific image). Instructs not to ask users for asset IDs, resolving them automatically. Also tells agent to use list_my_videos for videos instead of this tool.

stitch_videos (Stitch Videos), grade A

Stitch several of your Switch videos together into ONE video, played back-to-back in the order you give. Pass clip_asset_ids: an ORDERED list of your video ids (get them from list_my_videos) — the first id plays first. Optional orientation (landscape|portrait|square), fps, quality. Renders the combined video with ffmpeg and returns the finished, downloadable video url right away (also saved to list_my_videos). Use this whenever the user wants to combine, join, merge, or concatenate multiple clips into one.

Parameters

Name	Required	Description
`fps`	No	Frames per second. Default 30.
`quality`	No	draft, standard (default), or high.
`orientation`	No	landscape (1920x1080, default), portrait (1080x1920), or square (1080x1080).
`project_name`	No	Optional name for the output video.
`clip_asset_ids`	Yes	Ordered list of your video ids (from list_my_videos). At least 2. Output order = this order.
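
A minimal sketch of assembling the arguments for this tool, assuming the client validates locally before sending. The helper name `stitch_videos_arguments` is hypothetical; the allowed values and the two-clip minimum come from the parameter table above. Optional fields are left out when unset so that the server's documented defaults apply.

```python
ORIENTATIONS = ("landscape", "portrait", "square")
QUALITIES = ("draft", "standard", "high")


def stitch_videos_arguments(clip_asset_ids, orientation=None, fps=None,
                            quality=None, project_name=None):
    """Validate and assemble stitch_videos arguments (hypothetical helper)."""
    clip_ids = list(clip_asset_ids)  # order is preserved: first id plays first
    if len(clip_ids) < 2:
        raise ValueError("stitch_videos needs at least 2 clip ids")
    if orientation is not None and orientation not in ORIENTATIONS:
        raise ValueError(f"orientation must be one of {ORIENTATIONS}")
    if quality is not None and quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    if fps is not None and fps <= 0:
        raise ValueError("fps must be positive")

    arguments = {"clip_asset_ids": clip_ids}
    optional = {
        "orientation": orientation,
        "fps": fps,
        "quality": quality,
        "project_name": project_name,
    }
    # Send only what the caller set; everything else falls back to defaults.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments
```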

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only indicate non-read-only. The description adds context: the rendering process uses ffmpeg, returns a downloadable URL immediately, and saves the result to list_my_videos. No contradictions; behavioral traits are well disclosed beyond annotations.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences, front-loaded with the core purpose, with no redundancy or fluff.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters and no output schema, the description covers input requirements (ordered list, source), optional parameters, output format (URL and saved to list), and even mentions the underlying technology (ffmpeg). It is sufficiently complete for an agent to invoke correctly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the ordering of clip_asset_ids and where to obtain them, plus naming the optional parameters (orientation, fps, quality); their defaults are given in the schema. This extra context enhances understanding beyond the schema alone.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool stitches several Switch videos into one, with a specific verb ('stitch') and resource ('videos'). It distinguishes from sibling tools (e.g., apply_* effects, generate_video), as none of them perform video concatenation.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says to use this tool when the user wants to combine, join, merge, or concatenate clips. It also advises getting clip_asset_ids from list_my_videos. However, it does not explicitly mention when not to use it or provide alternatives for other scenarios.

talking_avatar_video (Talking Avatar Video), grade A

Turn a face photo into a lip-synced talking-head video that speaks your text (or your audio). Provide image_url (a clear face photo) and either script (text to speak, max 2500 characters) or audio_url. Optional voice_id / language / voice_settings. Renders in ~1-5 minutes (single call, returns the finished branded video) and is saved to your library. Charged per video.

Parameters

Name	Required	Description
`script`	No	Text the avatar speaks. Max 2500 characters. Required unless audio_url is given.
`language`	No	Optional language code (default en).
`voice_id`	No	Optional voice id (from clone_voice / your library).
`audio_url`	No	Pre-recorded audio URL to lip-sync instead of generating speech from script.
`image_url`	Yes	A clear face photo (Switch/public URL). Required.
`voice_settings`	No	Optional: { stability, similarityBoost, style, useSpeakerBoost } 0-1.
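
The "script unless audio_url" rule and the 2500-character limit can be checked on the client before a charged render is started. The sketch below is one way to do that; the helper name `talking_avatar_arguments` is hypothetical, and allowing both `script` and `audio_url` together is an assumption, since the page only says that at least one is needed.

```python
MAX_SCRIPT_CHARS = 2500  # limit stated in the parameter table


def talking_avatar_arguments(image_url, script=None, audio_url=None,
                             voice_id=None, language=None):
    """Validate and assemble talking_avatar_video arguments (hypothetical helper)."""
    if not image_url:
        raise ValueError("image_url (a clear face photo) is required")
    if not script and not audio_url:
        raise ValueError("provide script or audio_url")
    if script and len(script) > MAX_SCRIPT_CHARS:
        raise ValueError(f"script exceeds {MAX_SCRIPT_CHARS} characters")

    arguments = {"image_url": image_url}
    optional = (
        ("script", script),
        ("audio_url", audio_url),
        ("voice_id", voice_id),
        ("language", language),
    )
    for key, value in optional:
        if value:  # skip None and empty strings
            arguments[key] = value
    return arguments
```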

Tool Definition Quality

A4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes well beyond the minimal annotation (readOnlyHint=false) by detailing rendering time (~1-5 minutes), synchronous completion ('single call, returns the finished branded video'), persistence ('saved to your library'), and cost ('Charged per video'). This provides rich behavioral context for the agent.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (~70 words) and front-loaded with the core action. Each sentence adds distinct information (purpose, required inputs, optional inputs, behavior, cost). Could be slightly more structured (e.g., bullet points), but it remains clear and efficient.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool complexity (6 params, no output schema, nested objects), the description covers purpose, inputs, timing, storage, and cost. It lacks details on output format or error handling, but the mention of 'returns the finished branded video' provides some closure. The lack of differentiation from sibling 'lip_sync_video' is a minor gap.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100%, so baseline is 3. The description adds value by clarifying that image_url should be a 'clear face photo', emphasizing the required-or-alternative relationship between script and audio_url, and noting the optional voice_settings. It does not repeat all schema details, but the added nuance justifies a 4.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it creates a talking-head video from a face photo and text/audio. It specifies the key inputs (image_url, script/audio_url) and output (lip-synced video). However, it does not explicitly differentiate from the sibling tool 'lip_sync_video', which appears similar, so purpose clarity is strong but not perfect.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (to create a talking-head video) but does not provide explicit guidance on when not to use it or alternatives. It mentions the two input modes (script or audio_url) but lacks context on scenarios where one might prefer the sibling 'lip_sync_video' or 'generate_video'.

upload_media (Upload Media), grade A

Upload one image into your Switch library in a single call. Pass url (any public https) OR base64 + mime. Switch fetches/decodes it server-side, stores it, and returns a clean public URL plus the new asset id. This is THE way to use a photo the user attached in chat as a reference: pass the returned url directly into generate_image's reference_image_urls, OR into generate_video's image_url (image-to-video) or reference_image_urls (reference / omni video). The returned URL is provider-fetchable as-is — no presigned PUT, no curl, no confirm-upload step. Do NOT call get_my_active_references for a chat-attached photo; that strip only holds Studio-managed refs.

Parameters

Name	Required	Description
`url`	No	Any public https URL — Switch fetches it server-side.
`mime`	No	MIME type when sending base64. Default image/png.
`base64`	No	Base64-encoded image bytes (use this when there is no public URL).
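
The "url OR base64 + mime" choice can be sketched as below. The helper name `upload_media_arguments` is hypothetical, and treating the two input modes as strictly exclusive is an assumption drawn from the wording of the description rather than a documented constraint.

```python
import base64


def upload_media_arguments(url=None, image_bytes=None, mime="image/png"):
    """Assemble upload_media arguments from a URL or raw bytes (hypothetical helper)."""
    # Assumption: exactly one input mode per call.
    if (url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of url or image_bytes")
    if url is not None:
        if not url.startswith("https://"):
            raise ValueError("url must be a public https URL")
        return {"url": url}
    # No public URL available: send the bytes inline as base64 with their MIME type.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"base64": encoded, "mime": mime}
```

Per the description, the URL returned by the tool would then go into generate_image's reference_image_urls, or into generate_video's image_url or reference_image_urls.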

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only show readOnlyHint=false and idempotentHint=false. Description adds that Switch fetches/decodes server-side, stores, returns public URL and asset id, and notes no presigned PUT or confirm-upload step needed.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Dense paragraph with every sentence adding value. Could be slightly restructured for clarity, but no wasted words given the complexity.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explicitly states return values (public URL and asset id). Provides usage context with generate_image/generate_video. Complete enough for agent to use tool correctly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds value by explaining usage of url vs base64+mime, implying mutual exclusivity, and giving context like 'any public https' and 'when there is no public URL'.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States verb (upload), resource (image into Switch library), and distinguishes from siblings like get_my_active_references. Also clarifies it's for chat-attached photos, not Studio-managed refs.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (for chat-attached photos) and what not to call instead (do NOT call get_my_active_references). Also gives the next step: pass the returned URL into generate_image or generate_video.

upload_reference_asset (Upload Reference), grade A

Upload an image, video, or audio reference into Switch cloud and get a ready-to-use reference URL. Pass kind=image|video|audio. Returns reference_image_urls / reference_video_urls / reference_audio_urls for generate_image and generate_video. Image and video references are also added to your active Studio reference strip (the same one your desktop uses) unless activate=false. PREFERRED for real files: call with presign=true to get an upload_url, PUT the bytes straight to it (no base64 through the model), then call again with confirm_path to verify and add it — works for image, video, and audio. base64/url is only for tiny inline files.

Parameters

Name	Required	Description
`url`	No	Public https URL to fetch server-side.
`kind`	Yes	Reference type to upload.
`mime`	No	MIME for base64. Images: jpg/png/webp/gif. Videos: mp4/mov. Audio: mp3/wav/m4a/aac.
`base64`	No	Base64 bytes (optionally a data: URL). Best for small files; large video should use presign.
`presign`	No	Return an upload_url to PUT the file bytes directly to (no base64). Video always; image/audio when enabled.
`activate`	No	Image/video: add to the active Studio reference strip. Default true. Audio never touches the strip.
`filename`	No	Optional source filename for extension/display.
`frame_type`	No	Image strip label: ref (default), face, body, clothes, scenery, product, typography. Use "face" for a person's face/likeness — face uploads are stored as untouched originals in the private reference bucket and their returned asset_id is the ONLY handle face-capable generation accepts (KYC-verified accounts only).
`confirm_path`	No	The storage_path from a presign call, after you PUT the file — verifies the object, records it, and adds it to your strip.
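
The preferred presign flow (request an upload_url, PUT the bytes, then confirm) might look like the sketch below. Everything about the transport is an assumption: `call_tool` and `put_bytes` are hypothetical callables supplied by the caller, and the response is assumed to be a dict exposing `upload_url` and `storage_path`, names inferred from the parameter descriptions rather than from a published output schema.

```python
KINDS = ("image", "video", "audio")


def upload_reference_via_presign(call_tool, put_bytes, kind, data, filename=None):
    """Presign, upload, confirm. `call_tool(name, arguments)` and
    `put_bytes(url, data)` are injected so no real network code is assumed."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}")

    # Step 1: ask for an upload URL instead of sending base64 through the model.
    presign_args = {"kind": kind, "presign": True}
    if filename:
        presign_args["filename"] = filename
    presigned = call_tool("upload_reference_asset", presign_args)

    # Step 2: PUT the raw bytes straight to the returned URL.
    # The `upload_url` key is an assumed response field name.
    put_bytes(presigned["upload_url"], data)

    # Step 3: call again with confirm_path so the object is verified and recorded.
    # The `storage_path` key is an assumed response field name.
    return call_tool(
        "upload_reference_asset",
        {"kind": kind, "confirm_path": presigned["storage_path"]},
    )
```

Injecting the two callables keeps the sketch independent of any particular MCP client or HTTP library; a real client would pass its own tool-call function and an HTTP PUT wrapper.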

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a mutating operation (readOnlyHint=false, idempotentHint=false). The description corroborates by detailing upload to cloud, modification of Studio strip for image/video, and the two-step presign/confirm workflow. No contradictions.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description packs substantial information into a single paragraph without fluff. It is front-loaded with purpose, then details workflow and caveats. Slightly verbose in spots (e.g., 'PREFERRED for real files') but overall efficient.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers most aspects: main purpose, two upload methods, Studio strip interaction, kind-specific behavior, and return value usage. Minor gap: does not explicitly describe the response structure beyond reference URLs, but given no output schema, the description is quite complete.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing baseline of 3. The description adds significant value by explaining the presign/confirm_path workflow, the meaning of frame_type (especially face upload handling), and the effect of activate on Studio strip. Enhances understanding beyond schema.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool uploads an image, video, or audio reference into Switch cloud and provides a ready-to-use URL. It distinguishes from sibling tools like upload_media by specifying the intended usage for generation tools.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance: prefer presign for real files, base64/url only for tiny inline files. Explains when to set activate=false to avoid adding to the Studio strip. Also notes audio never touches the strip.

video_to_prompt: Video To Prompt

Read-only

Turn one of your finished Video Analysis reports into ONE reusable generation prompt that recreates the source video's look, energy, pacing and mood, with a {your photo} placeholder where your own subject goes. Pass report_id (from analyze_video_report or list_vision_reports) or video_url (the exact source URL you already analyzed). Free: it rewrites the analysis you already paid for and never charges. If the video has not been analyzed yet, run analyze_video_report first. Optional focus: pass mode to control what the prompt describes, and engine to pick the model format — also free.

Parameters

Name	Required	Description
`mode`	No	What the prompt focuses on. action = motion/gestures only, no appearance or scene. scene = setting/camera/lighting only, no subject. action_scene = both, no appearance. description_scene (default) = full prompt including the subject's appearance.
`engine`	No	Which model format to return: seedance (default, control-format), kling (cinematic prose), or gemini (plain paragraph for Omni).
`report_id`	No	A finished report id from analyze_video_report or list_vision_reports.
`video_url`	No	Alternative: the exact public https URL you already analyzed.
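As a rough illustration of the parameters above, the sketch below assembles a video_to_prompt call in the JSON-RPC `tools/call` shape that MCP clients send. The helper name `build_video_to_prompt_call` and the id `rep_123` are hypothetical placeholders, not part of the server. The check that at least one of report_id or video_url is present is an assumption drawn from the description, not confirmed server behavior.

```python
import json

# Values listed in the parameter table above.
MODES = {"action", "scene", "action_scene", "description_scene"}
ENGINES = {"seedance", "kling", "gemini"}


def build_video_to_prompt_call(report_id=None, video_url=None,
                               mode=None, engine=None, request_id=1):
    """Return a JSON-RPC tools/call request for video_to_prompt (hypothetical helper)."""
    # Assumption: the tool needs a report_id or the already-analyzed video_url.
    if not report_id and not video_url:
        raise ValueError("pass report_id or video_url")
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if engine is not None and engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")

    # Only send what the caller set, so the server applies its own defaults
    # (description_scene for mode, seedance for engine).
    candidates = {"report_id": report_id, "video_url": video_url,
                  "mode": mode, "engine": engine}
    arguments = {key: value for key, value in candidates.items() if value is not None}

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "video_to_prompt", "arguments": arguments},
    }


# "rep_123" is a made-up report id used only for illustration.
request = build_video_to_prompt_call(report_id="rep_123",
                                     mode="action_scene", engine="kling")
print(json.dumps(request, indent=2))
```

Leaving mode and engine unset keeps the request minimal and defers to the defaults documented in the table.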

voice: Manage Voices (A)

Your saved voices — one tool for the whole voice library. Users speak plain language and never know ids: resolve every voice by NAME yourself (call action "list" first if unsure) and never ask the user for an id. action="list" returns every saved voice with voice_id, name, kind and ready — kind "reference" is an instant voice match saved from a clip and kind "clone" is a trained voice (both speak through generate_audio: pass the NAME as its voice param); kind "avatar" voices drive talking_avatar_video. action="create" saves a NEW reference voice from a clip: voice_name plus audio_url (e.g. the url upload_media returned) or audio_base64 (+ format) — free, ready instantly. action="rename" renames a saved voice (voice_id takes the id OR the current name, new_name is the new name). action="clone" registers a voice for talking_avatar_video from audio_sample_url + voice_name (charged 2 credits). action="delete" removes a voice by voice_id or name.

Parameters

Name	Required	Description
`action`	Yes	Which operation to run.
`format`	No	create: clip format when sending audio_base64. Default wav.
`new_name`	No	rename: the new name for the voice.
`voice_id`	No	delete/rename: the voice id OR its name — names are resolved for you.
`audio_url`	No	create: URL of a 10-30 second clip of the voice — e.g. the url returned by upload_media.
`voice_name`	No	create/clone: what to call the voice (unique per account).
`audio_base64`	No	create: the clip as base64 when there is no URL.
`audio_sample_url`	No	clone: a 10-30 second voice sample URL (reachable).
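Because a single action parameter switches between several operations, it can help to check per-action arguments before calling the tool. The sketch below encodes the pairings described above. The helper `build_voice_arguments` is hypothetical, the set of five actions is taken from the description (the schema itself does not enumerate them here), and the server's real validation may differ.

```python
# Parameters each action needs, as read from the tool description above.
REQUIRED = {
    "list": [],
    "create": ["voice_name"],
    "rename": ["voice_id", "new_name"],
    "clone": ["voice_name", "audio_sample_url"],
    "delete": ["voice_id"],
}


def build_voice_arguments(action, **params):
    """Return an arguments dict for the voice tool (hypothetical helper)."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action!r}")

    missing = [name for name in REQUIRED[action] if not params.get(name)]
    if missing:
        raise ValueError(f"{action} is missing: {', '.join(missing)}")

    # create needs the clip itself, either as a URL or as base64.
    if action == "create" and not (params.get("audio_url") or params.get("audio_base64")):
        raise ValueError("create needs audio_url or audio_base64")

    return {"action": action, **params}


# voice_id accepts the id or the current name, so a name is enough here.
rename_args = build_voice_arguments("rename", voice_id="Narrator", new_name="Narrator v2")
print(rename_args)
```

Checking locally is only a convenience. For example, clone is described as costing 2 credits, so catching a missing audio_sample_url before the call avoids a wasted round trip.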

Tool Definition Quality

A 4.8/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses multiple behavioral traits beyond the minimal annotations: actions have different costs (clone charged 2 credits), create is free and instant, rename/delete accept id or name, and readiness of created voices.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-organized with a front-loaded summary and action-by-action details. Every sentence adds value, though a more structured layout (e.g., bullets) could improve scannability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the return format for 'list' but lacks explicit return descriptions for the other actions. Given the tool's complexity and the absence of an output schema, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds significant meaning beyond the 100% schema coverage: clarifies that audio_url should come from upload_media, that audio_base64 is used when there is no URL, that voice_name is unique per account, and that voice_id also accepts a name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it manages the voice library, lists the available actions, and distinguishes itself from siblings like generate_audio and talking_avatar_video by explaining how they use voices from this tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance for when to use each action, prerequisites (e.g., upload_media for audio_url), and hints to use list first if unsure. Also explains how different voice kinds are used by other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
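Before publishing, a quick local check of the file's shape can catch typos. The sketch below only tests the structure shown above: the `$schema` value and a non-empty `maintainers` list with an email in each entry. The function name `check_glama_json` is hypothetical, and Glama's actual verification rules are not documented here, so passing this check does not guarantee that verification succeeds.

```python
import json

EXPECTED_SCHEMA = "https://glama.ai/mcp/schemas/connector.json"


def check_glama_json(text):
    """Return a list of problems found in a glama.json document (hypothetical helper)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems = []
    if doc.get("$schema") != EXPECTED_SCHEMA:
        problems.append("$schema does not match the documented value")

    maintainers = doc.get("maintainers")
    if not isinstance(maintainers, list) or not maintainers:
        problems.append("maintainers must be a non-empty list")
    else:
        for index, entry in enumerate(maintainers):
            email = entry.get("email") if isinstance(entry, dict) else None
            # Loose shape check only; this does not validate the address.
            if not isinstance(email, str) or "@" not in email:
                problems.append(f"maintainers[{index}] needs an email string")
    return problems


sample = json.dumps({
    "$schema": EXPECTED_SCHEMA,
    "maintainers": [{"email": "your-email@example.com"}],
})
print(check_glama_json(sample))
```

An empty list means the file has the documented shape. Whether the email matches the Glama account can only be confirmed by Glama itself.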
