Avocado AI
Server Details
Create ads inside any AI assistant with Avocado: generate and edit AI UGC directly in chat.
- Status
- Healthy
- Last Tested
- Transport
- Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.5/5 across 17 of 17 tools scored. Lowest: 3.8/5.
Each tool targets a distinct media type or action (credit check, job polling, flow creation, storyboard management, various media generation, upload preparation, model listing, onboarding). No two tools have overlapping purposes, and even the paired 'generate to storyboard' tools are clearly distinct from their base generation counterparts.
Most tools follow a verb_noun pattern (e.g., generate_image, list_storyboards, create_flow). However, 'models_list' deviates from the pattern (should be 'list_models'), and 'get_started' uses a different verb tense. Overall, the convention is mostly consistent.
17 tools is on the higher side but justified by the server's broad scope (image, video, music, speech, effects, flow, storyboard, account, models, upload). Each tool has a clear role, though a few (like 'generate_image_to_storyboard') could be optional parameters rather than separate tools.
The tool surface covers core generation tasks (image, video, music, speech, effects) and supporting operations (upload, polling, account, models). However, storyboard management is missing update/delete functionality, and there is no tool for cancelling jobs or editing generated videos/audio.
Available Tools
28 tools
account_check_credits (A, Read-only, Idempotent)
Check your Avocado AI credit balance. Returns available credits, membership tier, and what you can generate with your current balance.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds value by detailing the return data (credits, tier, generation capability), which is not covered by annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that conveys all necessary information without extraneous words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description covers the return values adequately. For a simple read tool with no inputs, this is complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema coverage, the description does not need to explain parameters. It provides no further detail but also has no gaps.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it checks credit balance and specifies what it returns (credits, membership tier, what can be generated). It is distinct from sibling tools like generate_image, which consume credits.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use the tool (to check credit balance) but does not explicitly state when not to use it or provide alternatives. It implies usage before generation but lacks direct guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_job (A, Read-only, Idempotent)
Always call this tool after generate_image, edit_image, or generate_video to retrieve the result. Pass the jobId returned by the generation tool. Returns status (queued, processing, completed, failed), result URLs when ready, and error details on failure. When an image job is completed, the resulting image(s) are returned as inline image content blocks so they render directly in chat alongside the JSON metadata. If status is queued or processing, wait 5 to 10 seconds and call again; image jobs typically finish in 10 to 60 seconds, video jobs in 2 to 10 minutes.
| Name | Required | Description | Default |
|---|---|---|---|
| jobId | Yes | The jobId returned by a generation tool. | |
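The polling guidance above can be sketched as a small loop. This is a minimal illustration, not the server's or any client's actual API: `call_tool` is a hypothetical stand-in for whatever MCP client call you use, and it is assumed to return the tool result as a dict with a `status` key. The 5-second interval and 600-second timeout are assumptions chosen to fit the timings quoted in the description.

```python
import time
from typing import Any, Callable, Dict

# Statuses the check_job description says mean "not finished yet".
PENDING = {"queued", "processing"}


def wait_for_job(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll check_job until the job leaves the queued/processing states.

    `call_tool` is hypothetical: it takes a tool name and an argument
    dict and returns the parsed result as a dict.
    """
    waited = 0.0
    while True:
        result = call_tool("check_job", {"jobId": job_id})
        if result.get("status") not in PENDING:
            return result  # completed or failed
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still pending after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; in normal use the default `time.sleep` applies.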
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant behavioral detail beyond the annotations: it specifies return values (status, URLs, error details, inline image blocks), polling guidance, and typical durations. No contradiction with annotations (readOnly, idempotent, non-destructive).
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a concise paragraph that front-loads the critical usage instruction. It could be structured with bullet points for improved readability, but it efficiently conveys all necessary information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity and lack of output schema, the description fully explains the return structure, behavior (polling), and typical timings. It covers all essential information for an agent to use the tool correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter jobId is fully described in the schema. The description repeats the same information without adding new semantics. With 100% schema coverage, baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks the status of generation jobs and retrieves results. It explicitly mentions the generation tools it follows (generate_image, edit_image, generate_video) and what it returns, distinguishing it from siblings that create jobs.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear when-to-use instructions ('always call this tool after...'), what to pass (jobId), and retry behavior with time estimates. It doesn't explicitly list when not to use, but the context is sufficiently clear.
create_flow (A)
Create a new Avocado AI Flow pre-built with a node-graph pipeline, and return
its id and direct URL so the user can open it on the canvas. You design the
whole pipeline: pass the nodes and edges and the server validates socket
compatibility, aligns video models to the input shape, lays the graph out
left-to-right, and adds a caption per step. Edges reference nodes by 0-based
index in the nodes array. This creates (does not run) the flow — the user
runs it from the editor.
Use the capability map below to choose node types, models, and handles:
You are Avo, a senior creative-workflow designer inside Avocado AI's Flow editor. The user describes a creative goal; you respond with a node-graph proposal that the editor previews on the canvas. Think like a production director: design the FULL pipeline needed to get a polished result, not the minimum number of nodes.
DESIGN PRINCIPLES — build capable, complete pipelines:
Match the pipeline's ambition to the request. A throwaway test is 2-3 nodes; a real deliverable (an ad, a UGC video, a product shot, a music video) is usually 5-12 nodes. Use up to 24 when it genuinely helps.
Prefer multi-stage quality: generate → refine (imageEditor) → upscale → animate, rather than a single generate node. Add an upscale step before any final image/video deliverable.
Use BRANCHING and FAN-OUT. One output can feed many nodes: e.g. one hero image → three different video models for variations the user can pick from; one script → both a voiceover and the video prompt.
Use PARALLEL TRACKS that converge: e.g. a voice track and an image track both feeding a lip-sync video; or a music track plus a visuals track.
Use the llm node to do creative thinking inside the graph — write or expand a script, brainstorm a prompt, turn a rough idea into a detailed image/video prompt — then wire its text output into the next node.
Pick the BEST model for each step (see the menus below). Don't leave everything on defaults — choosing models is a big part of the value.
Set per-node settings (aspect ratio, resolution, duration, voice, variations) when the request implies them (e.g. 'vertical' → 9:16, 'short' → duration 5, '3 options' → variations 3 or three branches).
HARD RULES:
Use only the node types listed below. Never invent new ones.
Every edge must connect compatible socket types (text→text, image→image, audio→audio, video→video).
Give every runnable node a short stepLabel ('Step N — …') — it renders as a caption beneath that node. stickyNote is only for standalone notes; never use it to caption a node (use stepLabel). Optionally add ONE stickyNote describing the workflow.
Any schema field you don't need must be null (numbers like variations too).
MODEL MENUS (set the node's model to one of these ids):
image (text-to-image) — model ids:
• fal-ai/nano-banana-2 — fast, strong all-rounder (default)
• fal-ai/gpt-image-2 — best instruction-following & legible text
• fal-ai/bytedance/seedream/v5/lite/text-to-image — photoreal
• fal-ai/flux-pro/v1.1-ultra — high detail / fidelity
• fal-ai/nano-banana-pro — premium quality
• fal-ai/recraft/v4/text-to-image — design, brand, vector-style
• fal-ai/ideogram/v3 — posters & typography
imageEditor (image + prompt → edited image) — model ids:
• fal-ai/nano-banana-2/edit — default, multi-image (up to 14 inputs)
• openai/gpt-image-2/edit — precise instruction edits
• fal-ai/bytedance/seedream/v5/lite/edit — photoreal edits
• fal-ai/flux-pro/kontext/max/text-to-image — style / context transfer
• fal-ai/gemini-25-flash-image/edit — fast edits
(the image input accepts MULTIPLE connections for compositing/restyle)
imageUpscale (image → larger image) — model ids:
• fal-ai/topaz/upscale/image — best quality (default)
• fal-ai/recraft-crisp-upscale, fal-ai/clarity-upscaler,
fal-ai/crystal-upscaler
llm (text → text) — model ids: claude-haiku (default), gpt-4o-mini,
kimi-k2, seed-1.8. Put the instruction in prompt.
voice (text → speech) — pick a voice by name. ElevenLabs (English-first):
Sarah (cheerful), Roger (deep), Laura (soft), Charlie (warm), George
(bold), Callum (energetic), River (calm), Liam (reliable). Seed Audio
(multilingual en/zh + more, cheaper for short lines): Vivi, Mindy, Kian,
Sophie, Magnus, Nadia. The script comes from an upstream text/llm
node wired into in — do NOT put the script in the voice node's prompt.
music (text → music) — set duration to one of 30,60,90,120,180,240,300
(seconds). Put the music description in prompt.
videoUpscale (video → sharper video) — add after a video node for final deliverables. No model field.
VIDEO node — choose model to match the input shape (it drives which input
handles the node renders):
• Text → video: kling3-pro, sora-2, veo3-1-fast, seedance-2.0-t2v.
Wire text to prompt.
• Image → video (I2V): veo3-1-fast, kling3-pro, seedance-2.0-i2v,
hailuo-pro. Wire the image to image. For keyframe models
(kling-o1, veo3-1) wire start-frame + end-frame.
• Lip-sync / talking-head: fabric (image + audio, NO prompt — never wire
text into Fabric) or infinitalk (prompt + image + audio). Wire audio
to audio. Audio-over-stills narration: ltx2-audio.
• Multi-image reference / character consistency: vidu (≤7),
veo3-1-ref (≤10), kling-elements (2-4 ordered frames),
happy-horse-ref (≤9). Wire EACH image to the SAME ref-images handle
(it accepts multiple connections). Never use the plain image handle.
• Seedance reference (image + video + audio refs): seedance-2.0-ref /
seedance-2.0-ref-fast. Wire to ref-images / ref-videos / ref-audio.
• Motion control (drive a character with a motion video):
kling3-motion-control. Wire character to image, motion clip
(videoUpload) to motion-video.
• Video edit (change an existing video with an instruction):
gemini-omni-flash-edit. Wire the source video (videoUpload or an
upstream video node) to motion-video and the edit instruction to
prompt. Output length follows the source video (3-10s).
• Text/Image → video with synced audio baked in: gemini-omni-flash
(3-10s, 720p, 16:9 or 9:16). Multi-image refs: gemini-omni-flash-ref
(≤10, wire to ref-images).
Edge handle hints:
When the target has multiple typed inputs (Video, Image Editor), set toHandle explicitly (prompt, image, audio, ref-images, start-frame, end-frame, motion-video). The editor otherwise picks the first type-compatible handle, which may be the wrong slot.
Never wire text into Fabric. Never wire a single image into a multi-ref model's image slot — use ref-images.
Available node types (id — purpose — inputs / outputs):
text — Prompt — in: in | out: out
llm — LLM — in: text, image, audio, video, document | out: out
upload — Image Upload — in: — | out: out
videoUpload — Video Upload — in: — | out: out
image — Image — in: in | out: out
imageEditor — Image Editor — in: prompt, image | out: out
imageUpscale — Image Upscale — in: image | out: out
video — Video — in: prompt, image, start-frame, end-frame, ref-images, ref-videos, ref-audio, audio, motion-video | out: out
videoUpscale — Video Upscale — in: video | out: out
voice — Voice — in: in, ref-audio | out: out
music — Music — in: in | out: out
stickyNote — Sticky Note — in: in | out: out
Edges reference nodes by index in the nodes array (0-based). In the
examples below, any field not shown is null.
EXAMPLES — study the PATTERNS (multi-stage, fan-out, parallel tracks), copy the handle names exactly:
Example 1 — UGC talking-head with scripted voice + final upscale:
nodes=[
  {type:"llm",stepLabel:"Step 1 — Write a punchy 15s script",prompt:"Write a 15-second energetic UGC script for the product.",model:"claude-haiku"},
  {type:"voice",stepLabel:"Step 2 — Voiceover",voice:"George"},
  {type:"upload",stepLabel:"Step 3 — Upload character photo"},
  {type:"video",stepLabel:"Step 4 — Lip-sync video",model:"fabric"},
  {type:"videoUpscale",stepLabel:"Step 5 — Upscale to deliver"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"audio"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"video"}
]
Example 2 — Text → image → refine → upscale (quality chain):
nodes=[
  {type:"text",stepLabel:"Step 1 — Prompt",prompt:"A cinematic product shot of a matte-black bottle on wet stone, golden hour"},
  {type:"image",stepLabel:"Step 2 — Generate hero",model:"fal-ai/flux-pro/v1.1-ultra",aspectRatio:"4:3"},
  {type:"imageEditor",stepLabel:"Step 3 — Add brand label",prompt:"Add a minimal embossed logo on the bottle",model:"fal-ai/nano-banana-2/edit"},
  {type:"imageUpscale",stepLabel:"Step 4 — Upscale",model:"fal-ai/topaz/upscale/image"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}
]
Example 3 — Fan-out: one image → three video variations (different models):
nodes=[
  {type:"upload",stepLabel:"Step 1 — Source image"},
  {type:"text",stepLabel:"Step 2 — Motion brief",prompt:"Slow cinematic push-in, gentle parallax"},
  {type:"video",stepLabel:"Variation A — Veo",model:"veo3-1-fast",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation B — Kling",model:"kling3-pro",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation C — Seedance",model:"seedance-2.0-i2v",aspectRatio:"9:16",duration:"5"}
]
edges=[
  {fromIndex:0,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"image"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]
Example 4 — Multi-image reference video (character consistency):
nodes=[
  {type:"upload",stepLabel:"Ref 1 — Character front"},
  {type:"upload",stepLabel:"Ref 2 — Character side"},
  {type:"upload",stepLabel:"Ref 3 — Outfit detail"},
  {type:"text",stepLabel:"Scene prompt",prompt:"The character walks through a neon market at night"},
  {type:"video",stepLabel:"Generate with refs",model:"veo3-1-ref",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:2,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]
Example 5 — Music video: parallel music + visuals tracks converging:
nodes=[
  {type:"music",stepLabel:"Track 1 — Score",prompt:"Dreamy lo-fi beat, 90 BPM",duration:"60"},
  {type:"text",stepLabel:"Track 2 — Scene",prompt:"A lone astronaut drifting past a glowing planet"},
  {type:"image",stepLabel:"Keyframe",model:"fal-ai/nano-banana-pro",aspectRatio:"16:9"},
  {type:"video",stepLabel:"Animate",model:"ltx2-audio",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"in"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"audio"}
]
Return only the structured object — no prose, no markdown.
| Name | Required | Description | Default |
|---|---|---|---|
| edges | No | Connections between nodes (by index). Omit for a single-node flow. | |
| nodes | Yes | The pipeline's nodes (1-24). | |
| title | No | Flow title. Defaults to 'Untitled Flow'. | |
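The edge rules in the description (0-based indices into the nodes array, and type-compatible sockets) can be pre-checked locally before calling create_flow. The sketch below is only an illustration: the two lookup tables are assumptions inferred from the node list and handle names in the description, not an official mapping, and the server remains the authority on validation. The generic `in` handle is deliberately left unchecked because its type depends on the target node.

```python
from typing import List

# Assumed output socket type per node type (inferred, not official).
OUTPUT_TYPE = {
    "text": "text", "llm": "text",
    "upload": "image", "image": "image",
    "imageEditor": "image", "imageUpscale": "image",
    "videoUpload": "video", "video": "video", "videoUpscale": "video",
    "voice": "audio", "music": "audio",
}

# Assumed socket type per named input handle (inferred from handle names).
HANDLE_TYPE = {
    "prompt": "text", "image": "image", "start-frame": "image",
    "end-frame": "image", "ref-images": "image", "ref-videos": "video",
    "ref-audio": "audio", "audio": "audio", "motion-video": "video",
    "video": "video",
}


def check_edges(nodes: List[dict], edges: List[dict]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, edge in enumerate(edges):
        src, dst = edge["fromIndex"], edge["toIndex"]
        # Edges reference nodes by 0-based index in the nodes array.
        if not (0 <= src < len(nodes) and 0 <= dst < len(nodes)):
            problems.append(f"edge {i}: index out of range")
            continue
        out_type = OUTPUT_TYPE.get(nodes[src]["type"])
        in_type = HANDLE_TYPE.get(edge["toHandle"])
        # Handles missing from HANDLE_TYPE (such as "in") are skipped.
        if out_type and in_type and out_type != in_type:
            problems.append(
                f"edge {i}: {out_type} -> {in_type} ({edge['toHandle']})"
            )
    return problems
```

Running it on the nodes and edges of Example 1 should report no problems, while wiring an llm node's text output into a video node's `audio` handle should be flagged.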
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses all salient behaviors: validation, alignment, layout, captioning, and the fact that it does not run the flow. Annotations are consistent and add no contradiction.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely verbose (mini-manual), far beyond typical tool descriptions. While informative, it lacks conciseness and could be much shorter for quick comprehension.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity and lack of output schema, the description is remarkably complete, covering node types, edges, models, examples, and edge cases, leaving minimal ambiguity.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds immense value by detailing valid model IDs, node I/O semantics, edge handle hints, and design patterns, which are critical for correct usage.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: creating a new Avocado AI Flow with a node-graph pipeline, returning id and URL. It distinguishes from sibling tools by emphasizing the multi-node pipeline design.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides extensive when-to-use and when-not-to-use guidance, including design principles, hard rules, model selections, and examples. It implicitly contrasts with single-node sibling tools.
create_storyboard (A)
Create a new empty Avocado AI storyboard for the user. Returns the new board's id and direct URL so the user can open it.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Title for the new storyboard. Defaults to 'Untitled Storyboard'. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations are minimal (readOnlyHint: false, destructiveHint: false), so the description adds valuable context: it creates an empty board and returns the id and URL. No contradictions with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences that are front-loaded with the key purpose and return information. No unnecessary words or details.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple creation tool with 1 optional parameter and no output schema, the description adequately explains the action and return values (id and URL). Could mention error handling or authorization but not essential.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage with a description for the only parameter (title). The description does not add extra meaning beyond what the schema already provides, meeting the baseline for high schema coverage.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Create', the resource 'new empty Avocado AI storyboard', and the return values (board id and direct URL). It effectively distinguishes from sibling tools like 'list_storyboards' and other creation tools.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for creating a storyboard but does not explicitly guide when to use it versus alternatives like 'generate_image_to_storyboard' or 'generate_video_to_storyboard'. No exclusions or prerequisites are mentioned.
describe_avocado (A, Read-only, Idempotent)
Describe what Avocado AI is and what it can do. Call this when a user asks about Avocado AI, wants to know what AI media tools are available, or is deciding whether to sign up. Returns capabilities, supported models, use cases, pricing overview, and how to connect.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds value by listing what the tool returns (capabilities, supported models, pricing, etc.), providing context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with three sentences. It front-loads the purpose and adds usage scenarios efficiently. Could be slightly more streamlined, but overall well-structured.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (no parameters, no output schema), the description covers all necessary context: purpose, when to use, and what information is returned. Annotations provide safety cues. No gaps.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has zero parameters with 100% schema coverage, so baseline is 4. The description does not need to add parameter information, and it correctly focuses on the tool's function.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: describing Avocado AI, its capabilities, and offerings. It specifies the verb 'Describe' and the resource 'Avocado AI', and distinguishes from siblings by being the only tool for general platform info.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly tells when to use this tool (when user asks about Avocado AI, wants available tools, or is deciding to sign up). It doesn't specify when not to use, but for a non-ambiguous tool this is sufficient.
edit_image: Edit Image (A)
Modify an existing image, standalone — the result is retrievable ONLY via check_job and is NOT placed anywhere automatically. If a flow_id is in play (the Flows Director / a storyboard), do NOT use this tool — use edit_image_to_flow instead, so the result actually lands in the user's Director Library. REQUIRED input: exactly one of file_id OR image_url. base64 is NOT accepted — do not try to pass image bytes as a tool argument, the call will be rejected. For chat-attached images you MUST first call prepare_image_upload to get a signed PUT URL, upload the bytes there (via the inline widget on Claude.ai, or via curl on Claude Desktop / Claude Code), then call this tool with the returned file_id. For URLs the user has pasted, use image_url directly. Returns a jobId immediately; call check_job with the jobId to retrieve the edited image inline. Models: 'nano-banana-2' (fast, default, 1 credit/image), 'nano-banana-2-lite' (fastest/cheapest, single-image touch-ups, 1 credit/image), and 'gpt-image-2' (higher quality, 1-4 credits/image by quality tier).
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Edit model. 'nano-banana-2' is fast and cheap (default). 'nano-banana-2-lite' is even faster/cheaper for simple single-image touch-ups (not optimized for multiple reference images). 'gpt-image-2' is higher quality but costs more credits. | |
| prompt | Yes | What to change about the image. Be specific. Example: 'Replace the background with a sunset beach' or 'Add reading glasses to the person'. | |
| file_id | No | file_id returned by prepare_image_upload after the image was uploaded to the signed URL. This is the ONLY supported path for chat-attached images. Format: 'mcp-source/{userId}/{uuid}.{ext}'. | |
| quality | No | Quality tier. Only applies to 'gpt-image-2'. low=1 credit, medium=1-2, high=4 credits per image. Defaults to 'medium'; only use 'high' when the user asks for maximum quality. Ignored by 'nano-banana-2'. | |
| image_url | No | HTTPS URL of the image to edit. Use only when the user pasted a public URL. Otherwise call prepare_image_upload first. | |
| num_images | No | Number of edited variants to produce (1-4). Defaults to 1. | |
| aspect_ratio | No | Output aspect ratio. Omit to keep the source image's shape (best for retouching an existing photo). Set it when composing a new layout around the input — e.g. '9:16' or '3:4' for a vertical poster, '2:3' for a tall portrait. For 4:5-style feed posts use '3:4' (closest supported portrait). | |
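The input constraints described for edit_image (exactly one of file_id or image_url, HTTPS URLs only, 1 to 4 variants, quality applying only to 'gpt-image-2') can be enforced before the call is made. The helper below is a hypothetical client-side sketch; its name and the choice to silently drop `quality` for other models are assumptions made for illustration, not documented server behavior.

```python
from typing import Any, Dict, Optional


def build_edit_image_args(
    prompt: str,
    file_id: Optional[str] = None,
    image_url: Optional[str] = None,
    model: Optional[str] = None,
    quality: Optional[str] = None,
    num_images: int = 1,
) -> Dict[str, Any]:
    """Assemble edit_image arguments, rejecting combinations the description rules out."""
    # Exactly one of file_id or image_url must be supplied.
    if (file_id is None) == (image_url is None):
        raise ValueError("pass exactly one of file_id or image_url")
    if image_url is not None and not image_url.startswith("https://"):
        raise ValueError("image_url must be an HTTPS URL")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")

    args: Dict[str, Any] = {"prompt": prompt, "num_images": num_images}
    if file_id is not None:
        args["file_id"] = file_id
    else:
        args["image_url"] = image_url
    if model is not None:
        args["model"] = model
    # quality only applies to gpt-image-2, so it is dropped for other models.
    if quality is not None and model == "gpt-image-2":
        args["quality"] = quality
    return args
```

The returned dict would then be passed as the tool arguments, and the jobId from the response handed to check_job.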
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses that the result is returned as a jobId and must be retrieved via check_job. Annotations show readOnlyHint=false and destructiveHint=false, but the description does not explicitly state whether the original image is altered or preserved. Minor omission keeps it from a 5.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is information-dense and front-loaded with the most critical points (purpose, flow exclusion, input requirement). While every sentence adds value, length could be slightly reduced without losing substance.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema coverage, the description provides essential workflow context, model selection guidance, and explains the asynchronous nature (jobId + check_job). No output schema exists, but the description sufficiently covers what the agent needs to know.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Adds significant meaning beyond the schema: explains that exactly one of file_id or image_url is required, base64 is rejected, and provides example prompts. Also clarifies quality tier application and model use cases, which are not fully detailed in schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool modifies an existing image and specifies that the result is only retrievable via check_job, not placed automatically. It distinguishes from the sibling tool edit_image_to_flow, which places results in the Director Library.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when not to use this tool (when a flow_id is in play, use edit_image_to_flow instead). Provides a step-by-step workflow for chat-attached images and direct URLs. Specifies that exactly one of file_id or image_url is required and that base64 input is rejected.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
edit_image_to_flow: Edit Image to Flow (Grade A)
Edit an existing image and place the result directly on a user's Avocado AI flow (the Flows Director). This is the flow-native edit: use it (NOT the plain edit_image) for ANY edit inside a flow, so the result lands in the user's Director Library. This MODIFIES ONE SPECIFIC existing image (file_id/image_url is required) — it does not generate new frames from a prompt. To generate a NEW image guided by one or more reference images (e.g. a storyboard beat that must keep the same cast/location consistent), use generate_image_to_flow with reference_image_urls instead. Drops a 'Generating...' tile immediately, then swaps in the edited image when ready (10-60s). REQUIRED: exactly one of file_id OR image_url (HTTPS). For chat-attached images call prepare_image_upload first, then pass the returned file_id. Models: 'nano-banana-2' (fast, default, 1 credit), 'nano-banana-2-lite' (fastest/cheapest, single-image touch-ups, 1 credit), and 'gpt-image-2' (higher quality, 1-4 credits by quality). To regenerate/retouch an EXISTING tile in place (same tile stays 'the' cast/location/beat reference — nothing downstream breaks) instead of creating a duplicate, pass replace_node_id (get it from a prior generate_image_to_flow/edit_image_to_flow response's nodeIds, or from list_flow_assets); typically image_url would then be that same tile's own current url.
| Name | Required | Description | Default |
|---|---|---|---|
| role | No | Director Library grouping. 'cast' = a character reference, 'location' = a place/world reference, 'beat' = a storyboard frame. Omit for a plain edited image. With replace_node_id, defaults to the existing tile's current role. | |
| label | No | Short human label shown in the Director Library, e.g. 'Pip - hero (v2)' or 'Beat 3 - Into the Backrooms'. With replace_node_id, defaults to the existing tile's current label. | |
| model | No | Edit model. 'nano-banana-2' is fast and cheap (default). 'nano-banana-2-lite' is even faster/cheaper for simple single-image touch-ups (not optimized for multiple reference images). 'gpt-image-2' is higher quality but costs more. | |
| prompt | Yes | What to change about the image. Be specific, e.g. 'Replace the background with a sunset beach' or 'Put the same character in the bank lobby'. | |
| file_id | No | file_id from prepare_image_upload (the ONLY path for chat-attached images). Provide exactly one of file_id or image_url. | |
| flow_id | Yes | The id of the flow to add the edited image to. Must be owned by, or shared with edit access to, the authenticated user. | |
| quality | No | Quality tier ('gpt-image-2' only). low=1 credit, medium=1-2, high=4 per image. Defaults to 'medium'. | |
| image_url | No | HTTPS URL of the image to edit (e.g. an existing flow asset, or a URL the user pasted). Provide exactly one of file_id or image_url. | |
| num_images | No | Number of edited variants (1-4). Defaults to 1. One tile per variant. Not compatible with replace_node_id (which always produces exactly one). | |
| beat_number | No | For role 'beat': the beat's position in story order (1..N). With replace_node_id, defaults to the existing tile's current beat_number. | |
| aspect_ratio | No | Output aspect ratio. Omit to keep the source image's shape (best for retouching). Set it when composing a new layout. Also controls the tile shape on the flow. | |
| replace_node_id | No | Regenerate an EXISTING tile in place instead of creating a new one. Use when the user asks to redo/regenerate/tweak a specific cast, location, or beat tile that's already on the flow. The node keeps its position, and (unless overridden) its role/label/beat_number. Get the id from a prior generate_image_to_flow/edit_image_to_flow response's nodeIds, or from list_flow_assets. | |
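The interaction between replace_node_id and num_images is easy to get wrong, so a small guard may help. This is a sketch under assumptions: `build_flow_edit_args` is a hypothetical helper that covers only the image_url path, and the parameter names are taken from the table above.

```python
from typing import Optional


def build_flow_edit_args(
    flow_id: str,
    prompt: str,
    image_url: str,
    replace_node_id: Optional[str] = None,
    num_images: int = 1,
) -> dict:
    """Arguments for edit_image_to_flow (hypothetical helper, image_url path only)."""
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")
    # In-place regeneration always yields exactly one tile.
    if replace_node_id is not None and num_images != 1:
        raise ValueError("replace_node_id is not compatible with num_images > 1")

    args = {"flow_id": flow_id, "prompt": prompt, "image_url": image_url}
    if replace_node_id is not None:
        # role, label and beat_number are deliberately left out so they
        # default to the existing tile's current values.
        args["replace_node_id"] = replace_node_id
    else:
        args["num_images"] = num_images
    return args
```

When retouching a tile in place, `image_url` would typically be that tile's own current URL, as the description notes.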
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With all annotations false, the description carries the full behavioral burden. It details the writing behavior, immediate tile placement, eventual swap (10-60s), model choices and credits, and replace_node_id behavior. No contradictions with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is long but well-structured, front-loading key purpose and distinctions. Minor redundancy (e.g., 'Generating...' tile timing) but overall efficient for the tool's complexity.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's 12 parameters, no output schema, and complexity of flow modifications, the description covers all essential contexts: prerequisites, optional parameters like replace_node_id, model choices, and timing. It is complete for agent selection and invocation.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so schema already describes parameters. The description adds extra context on relationships between parameters (e.g., file_id vs image_url, replace_node_id usage, prepare_image_upload prerequisite), enhancing understanding beyond the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool edits an existing image and places it on an Avocado flow. It distinguishes itself from the sibling 'edit_image' by specifying it's the flow-native edit, and contrasts with 'generate_image_to_flow' for new image generation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use this tool ('for ANY edit inside a flow') and when not to ('NOT the plain edit_image'), and names an alternative tool ('generate_image_to_flow') for generating new images. Also explains replace_node_id for in-place regeneration.
extract_brand (Grade A, Idempotent)
Extract the user's brand from their website (what they do, audience, tone, colors, logo) and save it as their brand + your profile of them. Use this during onboarding once the user CONFIRMS their website. Pass the confirmed https URL. No credits. After it succeeds, you already know their business: briefly confirm what you learned and move on. Do NOT also emit a SOUL_SAVE marker in the same turn; this tool saves the profile for you.
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The user's confirmed website URL, e.g. https://acme.com | |
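Because the tool expects a confirmed https URL, a caller might validate the string before passing it on. A minimal sketch, assuming a hypothetical helper name (`build_extract_brand_args`); only the `url` parameter comes from the table.

```python
from urllib.parse import urlparse


def build_extract_brand_args(confirmed_url: str) -> dict:
    """Validate the confirmed website URL (hypothetical client-side check)."""
    url = confirmed_url.strip()
    parsed = urlparse(url)
    # Require an explicit https scheme and a host, e.g. https://acme.com
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("expected a full https URL, e.g. https://acme.com")
    return {"url": url}
```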
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Explains side effects: saves the profile, uses no credits, and describes what to do after success. Aligns with annotations (idempotentHint, openWorldHint) and adds concrete behavioral context beyond them.
Is the description appropriately sized, front-loaded, and free of redundancy?
A short description, front-loaded with the action; each sentence serves a purpose. No redundant information. Highly efficient for the amount of guidance provided.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all essential aspects: action, timing, input, post-behavior, and exclusions (no extra save marker). Given the simple schema and no output schema, description is fully complete for correct tool invocation.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema already has 100% coverage with description for 'url', including example. The tool description adds only minor emphasis ('confirmed https URL') without new semantic value. Baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb 'Extract' and resource 'brand from their website', and distinguishes it from siblings like 'research_business' by specifying it's for onboarding after website confirmation. The purpose is unambiguous.
Does the description explain when to use this tool, when not to, or what alternatives exist?
States exact usage context 'during onboarding once the user CONFIRMS their website', requires passing the confirmed URL, and explicitly warns not to emit a SOUL_SAVE marker. Provides clear when-to-use and what not to do.
generate_image: Generate Image (Grade A)
Generate an AI image using Avocado AI. Returns a jobId immediately; image generation completes in 10-60 seconds. After calling, use the check_job tool with the returned jobId to retrieve the result. Once complete, check_job returns the image inline so it renders directly in chat. Run models_list to see available models. Costs 1-4 credits per image depending on model and quality.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model slug from models_list. Always honor an explicit model request from the user: generate with whatever model they ask for. When the user has NOT named a model, default to 'nano-banana-2' (fast, 1 credit); only reach for 'gpt-image-2' on your own when the task needs precise on-image text or composite layouts. | |
| prompt | Yes | Text description of the image to generate. Be descriptive for best results. | |
| quality | No | Quality tier. Only applies to 'gpt-image-2'. low=1 credit, medium=1-2 credits, high=4 credits per image. Ignored by other models. Defaults to 'medium'; only use 'high' when the user asks for maximum quality or needs flawless fine detail / text. | |
| num_images | No | Number of images to generate (1-4). Defaults to 1. | |
| aspect_ratio | No | Image aspect ratio. Defaults to '1:1'. | |
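The submit-then-poll pattern described for generate_image and check_job can be sketched as follows. Everything about the transport is an assumption: `call_tool` stands in for whatever MCP client function the caller has, and the `jobId` argument name for check_job and the `status` / `"pending"` response fields are hypothetical, since the page does not document check_job's schema here.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def generate_and_wait(
    call_tool: ToolCaller,
    prompt: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start generate_image, then poll check_job until the job leaves 'pending'."""
    job = call_tool("generate_image", {"prompt": prompt})
    job_id = job["jobId"]  # the description says a jobId is returned immediately

    waited = 0.0
    while waited <= timeout_seconds:
        # Assumed argument name and response shape; adjust to the real schema.
        result = call_tool("check_job", {"jobId": job_id})
        if result.get("status") != "pending":
            return result
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"job {job_id} still pending after {timeout_seconds}s")
```

The default timeout is set above the stated 10-60 second generation window so that a slow job is not abandoned early; the `sleep` parameter exists only so the loop can be exercised without real delays.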
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Clearly discloses asynchronous behavior (returns jobId immediately, completion in 10-60 seconds) and credit costs. Annotations are all false, so no contradiction; description adds essential behavioral context.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise: a few short sentences covering purpose, async flow, model discovery, and cost. No redundancy; every sentence adds value.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description fully explains return mechanism (jobId) and subsequent steps. Covers all parameters, costs, and sequencing, making the tool self-contained.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for all 5 parameters. The description adds valuable context beyond schema, such as default model, quality tier rules, and when to use high quality.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Generate an AI image using Avocado AI' with a specific verb and resource. It distinguishes from sibling tools like edit_image and generate_video by focusing on image generation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly instructs to use check_job for result retrieval and models_list for available models. Provides cost per image and default behaviors, guiding proper tool usage.
generate_image_to_flow: Generate Image to Flow (Grade A)
Generate an AI image and place it directly on a user's Avocado AI flow (the Flows Director). Drops a 'Generating...' tile on the flow immediately, then swaps it for the final image when generation completes (10-60s). It appears live on the open canvas and in the Director Library, grouped by role. For a MULTI-BEAT storyboard with a recurring character or setting, this (with reference_image_urls set) is the tool to use for every beat — not edit_image_to_flow, which only modifies one specific existing image. For role 'beat', if you omit reference_image_urls this tool AUTO-USES the flow's current cast/location tiles (the most recently (re)generated role='cast' and role='location' images), so consistency holds even across a fresh conversation with no memory of prior URLs — you rarely need to pass reference_image_urls yourself for beats. To regenerate a specific existing tile (cast, location, or one beat) IN PLACE instead of creating a duplicate, pass replace_node_id (get it from this tool's own past responses, or from list_flow_assets). Costs match generate_image (1-4 credits per image depending on model and quality).
| Name | Required | Description | Default |
|---|---|---|---|
| role | No | Director Library grouping. 'cast' = a character reference, 'location' = a place/world reference, 'beat' = a single storyboard frame (still) for a beat. Omit for a plain generated image. With replace_node_id, defaults to the existing tile's current role. | |
| label | No | Short human label shown in the Director Library, e.g. 'Pip - hero character' or 'Beat 3 - Into the Backrooms'. With replace_node_id, defaults to the existing tile's current label. | |
| model | No | Model slug from models_list. Honor an explicit user request. When the user has NOT named a model, default to 'nano-banana-2' (fast, 1 credit); only reach for 'gpt-image-2' when the task needs precise on-image text or composite layouts. | |
| prompt | Yes | Text description of the image to generate. | |
| flow_id | Yes | The id of the flow to add the image to. Must be owned by, or shared with edit access to, the authenticated user. | |
| quality | No | Quality tier ('gpt-image-2' only). low=1 credit, medium=1-2 credits, high=4 credits per image. Defaults to 'medium'. | |
| num_images | No | Number of images to generate (1-4). Defaults to 1. One tile per image. Not compatible with replace_node_id (which always produces exactly one). | |
| beat_number | No | For role 'beat': the beat's position in story order (1..N). Keeps frames and clips ordered in the Library. With replace_node_id, defaults to the existing tile's current beat_number. | |
| aspect_ratio | No | Image aspect ratio. Defaults to '1:1'. Also controls the tile shape on the flow. | |
| replace_node_id | No | Regenerate an EXISTING tile in place instead of creating a new one — use when the user asks to redo/regenerate/iterate on a specific cast, location, or beat tile that's already on the flow. The node keeps its position, and (unless overridden) its role/label/beat_number. Get the id from a prior generate_image_to_flow/edit_image_to_flow response's nodeIds, or from list_flow_assets. | |
| reference_image_urls | No | HTTPS URLs of already-generated reference images to KEEP CONSISTENT: pass the cast (character) image URL(s) and/or the location image URL to steer generation. Must be permanent URLs (Supabase or fal-hosted); it doesn't matter which tool produced them. Pasted/temporary URLs are dropped, and you'll be told in the response. When set, the image is built with an edit/reference model (nano-banana-2 by default, or gpt-image-2) instead of plain text-to-image. For role 'beat' you can normally OMIT this; see the tool description. | |
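For a multi-beat storyboard, the description suggests generating cast and location tiles first and then one beat per call, leaving reference_image_urls unset so the tool picks up the flow's current cast/location tiles. A sketch of the resulting argument payloads follows; `plan_storyboard_calls` and the label strings are hypothetical, and whether the reference tiles must finish generating before the beats are submitted is not stated on this page.

```python
from typing import Dict, List


def plan_storyboard_calls(
    flow_id: str,
    cast_prompt: str,
    location_prompt: str,
    beat_prompts: List[str],
) -> List[Dict]:
    """Ordered generate_image_to_flow payloads for one storyboard (hypothetical helper)."""
    calls: List[Dict] = [
        {"flow_id": flow_id, "role": "cast", "label": "Cast", "prompt": cast_prompt},
        {"flow_id": flow_id, "role": "location", "label": "Location", "prompt": location_prompt},
    ]
    # beat_number is the 1-based position in story order.
    for number, prompt in enumerate(beat_prompts, start=1):
        calls.append(
            {
                "flow_id": flow_id,
                "role": "beat",
                "beat_number": number,
                "label": f"Beat {number}",
                # reference_image_urls is omitted on purpose: for role 'beat'
                # the tool is described as auto-using the cast/location tiles.
                "prompt": prompt,
            }
        )
    return calls
```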
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (all false, implying mutation but no further clues), the description details the 'Generating...' tile behavior, timing (10-60s), live canvas and Library placement, cost equivalence to generate_image, and handling of reference URLs. This fully compensates for the lack of annotation detail.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately detailed for a complex tool with 11 parameters. It is well-organized: starts with core purpose, then immediate behavior, sibling differentiation, auto-use, replacement, and costs. Every sentence serves a purpose with no redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the high parameter count, lack of output schema, and sparse annotations, the description comprehensively covers behavior, usage patterns, common pitfalls (auto-use, temporary URLs), and outcome. It equips an agent to handle the tool correctly in various scenarios.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds significant value for nearly every parameter: role's auto-use logic, model default recommendation, replace_node_id source, reference_image_urls constraints, beat_number ordering, and more. This arms the agent with practical selection rules beyond the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('generate an AI image and place it directly on a flow') and resource, distinguishing from sibling tools like edit_image_to_flow. It also specifies the immediate behavior and auto-use of flow assets, making the purpose unmistakable.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on when to use this tool vs. edit_image_to_flow, especially for multi-beat storyboards. Also explains when to omit reference_image_urls and how to use replace_node_id for in-place regeneration, giving clear context for correct invocation.
generate_image_to_storyboard: Generate Image to Storyboard (Grade A)
Generate an AI image and place it directly on a user's Avocado AI storyboard. Drops 'Generating...' placeholder(s) on the board immediately, then the webhook swaps each placeholder for the final image when generation completes (10-60s). Use list_storyboards or create_storyboard first to obtain the storyboard_id. If the user has the storyboard tab open, they may need to refresh once for the image to appear (the canvas does not yet support live realtime updates from MCP). Costs match generate_image (1-4 credits per image depending on model and quality).
| Name | Required | Description | Default |
|---|---|---|---|
| role | No | Studio Library grouping for storyboard work. 'cast' = a character reference, 'location' = a place/world reference, 'beat' = a single storyboard frame (still) for a beat. Omit for a plain generated image. | |
| label | No | Short human label shown in the Studio Library, e.g. 'Pip - hero character' or 'Beat 3 - Into the Backrooms'. | |
| model | No | Model slug from models_list. Always honor an explicit model request from the user: generate with whatever model they ask for. When the user has NOT named a model, default to 'nano-banana-2' (fast, 1 credit); only reach for 'gpt-image-2' on your own when the task needs precise on-image text or composite layouts. | |
| prompt | Yes | Text description of the image to generate. | |
| quality | No | Quality tier ('gpt-image-2' only). low=1 credit, medium=1-2 credits, high=4 credits per image. Defaults to 'medium'; only use 'high' when the user asks for maximum quality. | |
| num_images | No | Number of images to generate (1-4). Defaults to 1. One placeholder per image. | |
| beat_number | No | For role 'beat': the beat's position in story order (1..N). Used to keep frames and clips ordered in the Library. | |
| aspect_ratio | No | Image aspect ratio. Defaults to '1:1'. Also controls placeholder shape on the board. | |
| storyboard_id | Yes | The id of the storyboard to add the image to. Must be owned by, or shared with edit access to, the authenticated user. | |
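The per-image credit figures quoted in these tables can be turned into a quick estimate before a call is made. This is a sketch only: `image_credit_range` is a hypothetical helper, it covers just the two models whose prices appear above, and the server remains the authority on actual billing.

```python
from typing import Tuple

# Per-image (min, max) credits for 'gpt-image-2', as quoted in the quality row.
GPT_IMAGE_2_CREDITS = {"low": (1, 1), "medium": (1, 2), "high": (4, 4)}


def image_credit_range(
    model: str = "nano-banana-2",
    quality: str = "medium",
    num_images: int = 1,
) -> Tuple[int, int]:
    """Return the (min, max) total credits for one image generation call."""
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")
    if model == "gpt-image-2":
        low, high = GPT_IMAGE_2_CREDITS[quality]
    elif model == "nano-banana-2":
        low, high = 1, 1  # quality applies only to 'gpt-image-2'
    else:
        raise ValueError(f"no price listed on this page for model {model!r}")
    return low * num_images, high * num_images
```

For example, three 'gpt-image-2' images at the default 'medium' tier fall between 3 and 6 credits.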
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses placeholder drop, webhook swap (10-60s), and refresh requirement. Explains credit costs relative to generate_image. Adds value beyond annotations (readOnlyHint=false, destructiveHint=false). No contradiction.
Is the description appropriately sized, front-loaded, and free of redundancy?
Well-structured, front-loaded with action, but slightly verbose. Every sentence adds value; could tighten minor redundancies.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers workflow, prerequisites, behavior, and cost adequately for a 9-parameter tool without an output schema. Return value details are missing, but that is acceptable given there is no output schema.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and description adds useful context like default model/quality and guidance for when to override. Complements schema without redundancy.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool generates an AI image and places it on a storyboard, with specific verb 'generate' and resource 'storyboard'. It distinguishes from sibling tools like generate_image (no placement) and generate_video_to_storyboard (video variant).
Does the description explain when to use this tool, when not to, or what alternatives exist?
It advises obtaining storyboard_id via list_storyboards or create_storyboard first, and notes potential refresh need. While not explicitly excluding alternatives, the context implies usage for storyboard placement vs. plain generation.
generate_music (Grade A)
Generate AI music using Avocado AI. Create original music tracks from text prompts describing genre, mood, tempo, and instruments. Tracks can be 30 seconds to 5 minutes. Costs 4 credits per 30-second block. The track is saved to your Music Studio at https://www.avocadoai.co/music-studio.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Title for the music track. | |
| prompt | Yes | Description of the music to generate. Include genre, mood, tempo, instruments, and style. Example: 'Upbeat electronic dance music with synth leads, punchy drums, 128 BPM, energetic and euphoric mood' | |
| duration_seconds | No | Duration in seconds (30-300). Defaults to 30. | |
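The music pricing (4 credits per 30-second block) reduces to simple arithmetic. The sketch below assumes that a partial block is billed as a full one, which this page does not state; `music_credits` is a hypothetical helper.

```python
import math


def music_credits(duration_seconds: int = 30) -> int:
    """Estimate credits for a track at 4 credits per 30-second block."""
    if not 30 <= duration_seconds <= 300:
        raise ValueError("duration_seconds must be between 30 and 300")
    # ASSUMPTION: a partial block rounds up to a whole block.
    blocks = math.ceil(duration_seconds / 30)
    return 4 * blocks
```

Under that assumption a 30-second track costs 4 credits and a full 5-minute track costs 40.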
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses credit consumption and persistent storage behavior (saved to Music Studio), adding value beyond the annotations which only indicate non-destructive/non-readOnly. No contradiction with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
A few concise sentences with no redundancy. The first sentence immediately clarifies the tool's purpose, and subsequent sentences add necessary constraints without verbosity.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description adequately explains the outcome (track saved to Music Studio) and the cost model, making the tool's behavior fully understandable for invocation.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, and the description does not add significant new meaning beyond the parameter descriptions already present in the schema. Example provided in schema, so baseline score is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states 'Generate AI music' and details the creation of original tracks from text prompts, clearly distinguishing it from sibling tools like generate_sfx or generate_speech.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides concrete constraints on duration (30s-5min) and cost (4 credits per 30s block), along with the output destination (Music Studio URL). While it doesn't explicitly state when not to use the tool, the context is sufficient for decision-making.
generate_music_to_flow (Grade A)
Generate an AI music track and place it directly on a user's Avocado AI flow (the Flows Director). Drops a 'Generating…' audio node on the flow immediately and returns right away; the finished track swaps in automatically (30s-5min later) — no need to wait or check_job (there is no check_job for audio). It appears live on the open canvas and in the Director Library (Audio). Tracks can be 30 seconds to 5 minutes. Costs 4 credits per 30-second block. Use this (not generate_music) when working on a flow.
| Name | Required | Description | Default |
|---|---|---|---|
| label | No | Short human label shown in the Director Library, e.g. 'Main theme'. | |
| title | No | Title for the music track. | |
| prompt | Yes | Description of the music: genre, mood, tempo, instruments, style. | |
| flow_id | Yes | The id of the flow to add the music to. Must be owned by, or shared with edit access to, the authenticated user. | |
| duration_seconds | No | Duration in seconds (30-300). Defaults to 30. | |
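The choice between generate_music and generate_music_to_flow depends only on whether a flow is in play. A minimal sketch of that routing, assuming a hypothetical helper name (`pick_music_call`); the tool and parameter names are the ones listed on this page.

```python
from typing import Any, Dict, Optional, Tuple


def pick_music_call(
    prompt: str,
    flow_id: Optional[str] = None,
    duration_seconds: int = 30,
) -> Tuple[str, Dict[str, Any]]:
    """Pick the music tool and build its arguments (hypothetical helper)."""
    if not 30 <= duration_seconds <= 300:
        raise ValueError("duration_seconds must be between 30 and 300")
    args: Dict[str, Any] = {"prompt": prompt, "duration_seconds": duration_seconds}
    if flow_id is not None:
        # On a flow, the finished track swaps in automatically; the
        # description says there is no check_job step for audio.
        return "generate_music_to_flow", {**args, "flow_id": flow_id}
    return "generate_music", args
```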
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses non-blocking nature: drops 'Generating…' node immediately, finished track swaps in automatically after 30s-5min. Also mentions live appearance on canvas and Director Library. Annotations are silent on these behavioral details.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single paragraph, front-loaded with main action. Every sentence adds necessary information: purpose, async behavior, duration, credit cost, and sibling comparison.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 5 params and no output schema, description covers credit cost, async update, sibling tool differentiation, and behavior on the flow. No obvious gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. Description adds value by explaining label usage ('Short human label shown in the Director Library'), prompt scope, and default duration. Provides context beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it generates AI music and places it directly on a flow. It explicitly distinguishes from sibling tool generate_music: 'Use this (not generate_music) when working on a flow.'
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Directly states when to use this tool over generate_music. Provides specific context: async behavior, no need to wait or check_job, credit cost, and duration range.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_sfx (grade A)
Generate AI sound effects using Avocado AI. Create short sound effects from text prompts describing the sound. Effects can be 1 to 22 seconds. Costs 1 credit per 5-second block. The effect is saved to your Music Studio at https://www.avocadoai.co/music-studio.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Title for the sound effect. | |
| prompt | Yes | Description of the sound effect to generate. Example: 'Glass shattering on a tile floor with sharp reverberation' or 'Heavy footsteps on wet concrete in a dark alley' | |
| duration_seconds | No | Duration in seconds (1-22). Defaults to 5. | |
| prompt_influence | No | How closely to follow the prompt (0-1). Higher = more literal, lower = more creative. Defaults to 0.35. | |
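To make the parameter table concrete, here is a minimal sketch of how a client might assemble an MCP `tools/call` request for this tool. The `tools/call` envelope follows the MCP JSON-RPC convention; the helper name `sfx_call` is hypothetical, and session setup, authentication, and transport are left out. The range checks simply mirror the limits listed in the table.

```python
import json


def sfx_call(prompt, duration_seconds=None, prompt_influence=None,
             title=None, request_id=1):
    """Build an MCP tools/call request for generate_sfx (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if duration_seconds is not None and not 1 <= duration_seconds <= 22:
        raise ValueError("duration_seconds must be between 1 and 22")
    if prompt_influence is not None and not 0 <= prompt_influence <= 1:
        raise ValueError("prompt_influence must be between 0 and 1")

    arguments = {"prompt": prompt}
    # Optional fields are sent only when set; per the table, the server
    # defaults are 5 seconds and a prompt_influence of 0.35.
    optional = (
        ("title", title),
        ("duration_seconds", duration_seconds),
        ("prompt_influence", prompt_influence),
    )
    for key, value in optional:
        if value is not None:
            arguments[key] = value

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_sfx", "arguments": arguments},
    }


request = sfx_call(
    "Glass shattering on a tile floor with sharp reverberation",
    duration_seconds=3,
)
print(json.dumps(request, indent=2))
```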
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (all false), the description adds important behavioral details: credit cost per 5-second block, duration limits, and where the effect is saved (Music Studio URL).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five short sentences, front-loaded with the main purpose, then constraints, cost, and save location. No fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, parameters, cost, and destination. With no output schema, though, the description never says what the call returns. Because check_job is among the sibling tools, the call may return a job ID asynchronously, and leaving this unstated is a significant gap for an agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%. The description restates the duration range (1-22 seconds) and adds the cost and save location; the prompt examples and the prompt_influence default (0.35) come from the schema itself.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it generates AI sound effects from text prompts, distinguishing from sibling tools like generate_music or generate_speech by specifying short sound effects and saving to Music Studio.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for short sound effects (1-22 seconds) and mentions credit cost, but does not explicitly say when to use this tool versus alternatives or provide exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_speech (grade A)
Convert text to natural-sounding speech using Avocado AI. Supports multiple voices and languages. Costs 3 credits per 1000 characters. Audio will appear in your Avocado AI workspace.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The text to convert to speech. | |
| voice | No | Voice to use. Defaults to 'rachel'. Options: rachel (female, calm), adam (male, deep), josh (male, young), bella (female, soft), sam (male, raspy). | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations, the description adds the cost per 1000 characters and where the audio appears, which are useful behavioral details not present in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four short sentences, front-loaded with the core purpose, then cost and result location. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple input schema and no output schema, the description covers the major aspects: what it does, cost, and where the result appears. It omits error and latency details but is adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the baseline is 3. The description adds cost context but does not add new parameter-level meaning beyond the schema's descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it converts text to speech using Avocado AI, mentions multiple voices and languages, and distinguishes from siblings like generate_music or generate_sfx.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies when to use (text-to-speech) but does not explicitly provide when-not or alternatives, leaving the agent to infer based on sibling tool names.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_speech_to_flow (grade A)
Convert text to natural-sounding speech and place the voiceover directly on a user's Avocado AI flow (the Flows Director). Drops a 'Generating…' audio node immediately and returns right away; the finished voiceover swaps in automatically — no need to wait or check_job (there is no check_job for audio). It appears live on the open canvas and in the Director Library (Audio). ElevenLabs voices (rachel..sam) cost 3 credits per 1000 characters. Seed Audio voices (vivi, mindy, kian, sophie, magnus, nadia — multilingual en/zh and more) are pro-rated at 5 credits per 1000 characters with a 1-credit minimum (cheaper for short lines; max 2048 characters). Use this (not generate_speech) when working on a flow.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The text to convert to speech. | |
| label | No | Short human label shown in the Director Library, e.g. 'VO - intro'. | |
| voice | No | Voice to use. Defaults to 'rachel' (ElevenLabs). vivi/mindy/kian/sophie/magnus/nadia are Seed Audio 1.0 (multilingual, max 2048 chars). | |
| flow_id | Yes | The id of the flow to add the voiceover to. Must be owned by, or shared with edit access to, the authenticated user. | |
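The two pricing schemes in the description can be compared with a small sketch. Several points here are assumptions rather than documented behavior: that "rachel..sam" means the five voices listed for generate_speech, that ElevenLabs voices are billed per started 1000-character block (the description only calls the Seed Audio rate pro-rated), and that pro-rated Seed Audio costs round up to a whole credit. The helper name `speech_credits` is hypothetical.

```python
import math

# Assumption: "rachel..sam" covers the five voices listed for generate_speech.
ELEVENLABS_VOICES = {"rachel", "adam", "josh", "bella", "sam"}
SEED_AUDIO_VOICES = {"vivi", "mindy", "kian", "sophie", "magnus", "nadia"}
SEED_AUDIO_MAX_CHARS = 2048  # limit stated in the tool description


def speech_credits(text: str, voice: str = "rachel") -> int:
    """Estimate credits for generate_speech_to_flow (hypothetical helper)."""
    n = len(text)
    if n == 0:
        raise ValueError("text must not be empty")
    if voice in SEED_AUDIO_VOICES:
        if n > SEED_AUDIO_MAX_CHARS:
            raise ValueError("Seed Audio voices accept at most 2048 characters")
        # Pro-rated at 5 credits per 1000 characters with a 1-credit minimum.
        # Assumption: fractional credits round up.
        return max(1, math.ceil(5 * n / 1000))
    if voice in ELEVENLABS_VOICES:
        # Assumption: each started block of 1000 characters costs 3 credits.
        return 3 * math.ceil(n / 1000)
    raise ValueError(f"unknown voice: {voice}")


line = "x" * 200
print(speech_credits(line, "rachel"))  # 3
print(speech_credits(line, "vivi"))    # 1
```

Under these assumptions a 200-character line is cheaper with a Seed Audio voice, while longer scripts favor the ElevenLabs rate.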
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations (readOnlyHint=false) indicate mutation, and description confirms by stating it drops an audio node and swaps in finished voiceover. It adds context about the immediate return and automatic completion, which goes beyond annotations. No contradiction detected.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is informative but slightly verbose, with some redundancy (e.g., 'immediately and returns right away'). However, it is well-structured with a clear flow: purpose, behavior, cost, and usage distinction. Could trim minor redundancies.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers key aspects: what it does, how it behaves, when to use, costs, voice options. Missing explicit error handling or prerequisites (e.g., auth context), but flow_id schema description mentions edit access. Overall sufficient given no output schema and clear annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description enhances understanding by adding cost details for voices (ElevenLabs vs Seed Audio pricing) and clarifying the label's purpose with an example. This provides actionable info beyond the basic schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool converts text to speech and places it on a flow, distinguishing from sibling 'generate_speech' by saying 'Use this (not generate_speech) when working on a flow.' It specifies the resource (Avocado AI flow) and action (generate speech and add as voiceover).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises when to use this tool instead of generate_speech, explains the asynchronous behavior (the node is dropped immediately, the call returns right away, and the finished voiceover swaps in automatically), and notes there is no check_job for audio. Provides a per-1000-character cost breakdown for each voice provider, giving clear guidance on usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_video: Generate Video (grade A)
Generate an AI video. Sixteen models: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), seedance-2.0-ref / -ref-fast (REFERENCE-to-video: locks character/style across generations from reference images — pass reference_image_urls and/or reference_file_ids; ideal for keeping a Storyboard Studio character consistent), seedance-2.0-{t2v,i2v,ref}-mini (cheapest Seedance 2.0 tier, 480p/720p, 4-15s — same three modes, lowest cost per second), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video — supplying image_url or file_id automatically picks image mode), grok-imagine-video-v1-5 (480p/720p, 1-15s, REQUIRES an image — image-to-video only), happy-horse-t2v (Happy Horse text-to-video, 720p/1080p, 3-15s, with native audio + lip-sync), happy-horse-i2v (Happy Horse image-to-video, REQUIRES an image, 720p/1080p, 3-15s), gemini-omni-flash (Google, 720p, 3-10s, 16:9/9:16 only, synchronized AUDIO always included; text-to-video AND image-to-video — supplying image_url/file_id picks image mode), gemini-omni-flash-ref (reference-to-video from 1-10 reference images via reference_image_urls/reference_file_ids, images only), gemini-omni-flash-edit (VIDEO EDIT: pass video_url of an existing video + an instruction prompt like 'Make this video anime. Keep everything else the same.'; output follows the source video; voice editing unsupported). For image-to-video on any host: call prepare_image_upload first, then pass the returned file_id here. Renders take 2-10 minutes; the inline result card polls for completion. Pricing is per-second, varies by model and resolution.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model. Defaults to 'seedance-2.0-t2v'. Use the -i2v variant, any kling3 variant, or happy-horse-i2v for image-to-video; happy-horse-t2v for Happy Horse text-to-video; or seedance-2.0-ref / -ref-fast for reference-to-video. | |
| prompt | Yes | Text description of the video. For image-to-video, describe the motion/action you want applied to the source image. For reference-to-video, describe the scene; the reference images lock the character/style. | |
| file_id | No | file_id from prepare_image_upload — preferred for chat attachments. Required for seedance-2.0-i2v / -i2v-fast. Optional for kling3-* (presence triggers image-to-video mode). Not used by reference-to-video — use reference_file_ids. | |
| duration | No | Video duration in seconds. Per-model bounds: seedance i2v/ref 4-15, seedance t2v 5-15, kling3-standard/pro 5-10, kling3-4k/o3-4k 3-15, happy-horse t2v/i2v 3-15. Defaults to 5. | |
| fast_mode | No | Legacy alias. true picks the -fast variant (t2v/i2v/ref) when no explicit model was given. Prefer setting model directly. | |
| image_url | No | HTTPS URL of the source image. Use only if you already have a public URL; otherwise call prepare_image_upload and pass file_id. Not used by reference-to-video — use reference_image_urls. | |
| video_url | No | gemini-omni-flash-edit only: HTTPS URL of the source video to edit (e.g. a previously generated video's URL). Required for that model; ignored by all others. | |
| resolution | No | Video resolution. Meaningful for seedance Standard (480p/720p/1080p/4k; Fast & Mini cap at 480p/720p — no 1080p/4k) and happy-horse (720p/1080p, default 720p). Seedance 4k is Growth/Pro only. Kling models lock resolution by variant. | |
| aspect_ratio | No | Aspect ratio. Defaults to '16:9'. Ignored for image-to-video (aspect derives from input). | |
| generate_audio | No | Generate audio (Kling 3 standard/pro only). Ignored for other models. | |
| reference_file_ids | No | Reference-to-video only. file_ids from prepare_image_upload, resolved to image references (counted alongside reference_image_urls toward the max of 9). Use for chat attachments. | |
| reference_audio_urls | No | Reference-to-video only. HTTPS URLs of reference audio (max 3). Optional. Combined references cap at 12 total assets. | |
| reference_image_urls | No | Reference-to-video only (seedance-2.0-ref / -ref-fast). HTTPS URLs of reference images that lock character/style. Storyboard Studio assets already have permanent URLs you can pass directly. At least one image reference is required; max 9. | |
| reference_video_urls | No | Reference-to-video only. HTTPS URLs of reference videos (max 3). Optional supplement to image references. | |
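Because the duration bounds and image requirements differ by model, a client may want to check arguments locally before spending credits. The sketch below copies the bounds from the duration row and the description; the helper name `video_arg_problems` is hypothetical, the table is deliberately partial, and treating the `-fast` variants as sharing the bounds of their base models is an assumption.

```python
# (min_seconds, max_seconds) per model, taken from the duration row and the
# description. Assumption: -fast variants share their base model's bounds.
DURATION_BOUNDS = {
    "seedance-2.0-t2v": (5, 15),
    "seedance-2.0-t2v-fast": (5, 15),
    "seedance-2.0-i2v": (4, 15),
    "seedance-2.0-i2v-fast": (4, 15),
    "seedance-2.0-ref": (4, 15),
    "seedance-2.0-ref-fast": (4, 15),
    "kling3-standard": (5, 10),
    "kling3-pro": (5, 10),
    "kling3-4k": (3, 15),
    "kling-o3-4k": (3, 15),
    "grok-imagine-video-v1-5": (1, 15),
    "happy-horse-t2v": (3, 15),
    "happy-horse-i2v": (3, 15),
    "gemini-omni-flash": (3, 10),
}

# Models the description marks as requiring a source image.
REQUIRES_IMAGE = {
    "seedance-2.0-i2v",
    "seedance-2.0-i2v-fast",
    "grok-imagine-video-v1-5",
    "happy-horse-i2v",
}


def video_arg_problems(model="seedance-2.0-t2v", duration=5, has_image=False):
    """Return a list of problems found in the arguments (hypothetical helper)."""
    problems = []
    bounds = DURATION_BOUNDS.get(model)
    if bounds is None:
        problems.append(f"no duration bounds recorded for {model}")
    else:
        low, high = bounds
        if not low <= duration <= high:
            problems.append(f"{model} accepts {low}-{high}s, got {duration}s")
    if model in REQUIRES_IMAGE and not has_image:
        problems.append(f"{model} needs image_url or file_id")
    return problems


print(video_arg_problems("seedance-2.0-t2v", duration=4))
print(video_arg_problems("happy-horse-i2v", duration=5, has_image=False))
```

An empty list means the arguments passed these local checks; the server may still reject a request for reasons not covered here.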
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide no special hints; the description adds key behavioral traits: async with 2-10 min renders, inline polling, per-second pricing, and prerequisites. It does not mention failure modes or concrete credit amounts.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Reasonably structured with front-loaded purpose, bullet-like model list, and async note. However, somewhat verbose with repeated details about reference-to-video across sections; could be more concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers models, input requirements, async behavior, and pricing. Missing explicit return format (e.g., job ID or video URL) and error handling. For a complex tool with 14 parameters and no output schema, it is fairly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions, but the tool description adds significant context: explains model variants, interplay between parameters (e.g., model determines if image_url is needed), and default behaviors. Adds value beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool generates AI videos, lists 16 models with specific modes (text-to-video, image-to-video, reference-to-video, video edit), and distinguishes from sibling tools like generate_video_to_flow by focusing on standalone generation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on which model to use based on input type (text, image, reference, edit), prerequisites like prepare_image_upload, and model-specific bounds (duration, resolution). Lacks explicit 'when not to use' but covers context well.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_video_to_flow: Generate Video to Flow (grade A)
Generate an AI video and place it directly on a user's Avocado AI flow (the Flows Director). Drops a 'Generating...' tile on the flow immediately, then swaps it for the final video when generation completes (2-10 minutes). It appears live on the open canvas and in the Director Library. Same models as generate_video_to_storyboard (seedance-2.0 t2v/i2v/ref + fast variants, kling3-standard/pro/4k, kling-o3-4k, happy-horse t2v/i2v, gemini-omni-flash — Google, 720p, 3-10s, 16:9/9:16 only, synced audio always included, does both t2v and i2v). For image-to-video: call prepare_image_upload first, then pass the returned file_id. Pricing is per-second.
| Name | Required | Description | Default |
|---|---|---|---|
| label | No | Short human label shown in the Director Library, e.g. 'Beat 3 - Into the Backrooms'. | |
| model | No | Model. Defaults to 'seedance-2.0-t2v'. Use a -i2v variant, any kling3 variant, or happy-horse-i2v for image-to-video; seedance-2.0-ref / -ref-fast for reference-to-video. | |
| prompt | Yes | Text description of the video. For image-to-video, describe the motion. For reference-to-video, describe the scene; the reference images lock the character/style. | |
| file_id | No | file_id from prepare_image_upload. Required for seedance-2.0-i2v / -i2v-fast and happy-horse-i2v. Optional for kling3-* (presence triggers image-to-video). | |
| flow_id | Yes | The id of the flow to add the video to. Must be owned by, or shared with edit access to, the authenticated user. | |
| duration | No | Duration in seconds. Per-model bounds: seedance i2v/ref 4-15, seedance t2v 5-15, kling3-standard/pro 5-10, kling3-4k/o3-4k 3-15, happy-horse 3-15. Defaults to 5. | |
| image_url | No | HTTPS URL of the source image. Prefer file_id from prepare_image_upload. | |
| resolution | No | Resolution. Meaningful for seedance (480p/720p/1080p; not 1080p on fast/ref-fast) and happy-horse (720p/1080p). Kling locks resolution by variant. | |
| beat_number | No | The beat's position in story order (1..N). Keeps clips ordered in the Library. | |
| aspect_ratio | No | Aspect ratio. Defaults to '16:9'. Ignored for image-to-video. | |
| generate_audio | No | Generate audio (Kling 3 standard/pro only). | |
| reference_file_ids | No | Reference-to-video only. file_ids from prepare_image_upload. | |
| reference_audio_urls | No | Reference-to-video only. HTTPS URLs of reference audio (max 3). | |
| reference_image_urls | No | Reference-to-video only (seedance-2.0-ref / -ref-fast). HTTPS URLs that lock character/style. At least one required; max 9. | |
| reference_video_urls | No | Reference-to-video only. HTTPS URLs of reference videos (max 3). | |
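The `label` and `beat_number` parameters are meant to keep clips ordered in the Library. One way to use them, sketched below, is to derive both from a list of story beats so numbering and labels stay in step. The helper name `beat_requests` and the flow id `flow_123` are hypothetical placeholders; the label pattern follows the 'Beat 3 - Into the Backrooms' example from the table.

```python
def beat_requests(flow_id, beats, model="seedance-2.0-t2v"):
    """Build one generate_video_to_flow argument dict per story beat.

    `beats` is a list of (title, prompt) tuples in story order.
    Hypothetical helper; it only assembles arguments and sends nothing.
    """
    requests = []
    for number, (title, prompt) in enumerate(beats, start=1):
        requests.append({
            "flow_id": flow_id,
            "model": model,
            "prompt": prompt,
            "beat_number": number,                # position in story order (1..N)
            "label": f"Beat {number} - {title}",  # shown in the Director Library
        })
    return requests


story = [
    ("Arrival", "A traveler steps off a night train onto an empty platform."),
    ("Into the Backrooms", "The traveler opens a door into endless yellow corridors."),
]
for arguments in beat_requests("flow_123", story):
    print(arguments["beat_number"], arguments["label"])
```

Each dict would then be passed as the `arguments` of a separate tool call, one per beat.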
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations, the description discloses the immediate 'Generating...' tile placement and swap after 2-10 minutes, the model families, the always-included synced audio on gemini-omni-flash, and per-second pricing. This adds significant behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is fairly long but front-loaded with key information. It packs many details efficiently, though some redundancy (e.g., model list) could be trimmed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 15 parameters and no output schema, the description covers the workflow thoroughly: tile behavior, models (with sibling reference), image-to-video prerequisite, duration bounds, aspect ratios, and more. The tool is well-documented.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds value by explaining model relationships, image-to-video usage, and duration bounds. It clarifies parameter interdependencies beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: generate an AI video and place it directly on a user's Avocado AI flow. It distinguishes from siblings like generate_video_to_storyboard by specifying the flow placement and immediate tile behavior.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides guidance on when to use (when adding video to a flow) and includes instructions for image-to-video (calling prepare_image_upload first). It also mentions pricing and model comparisons to siblings, but lacks explicit when-not-to-use conditions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_video_to_storyboard: Generate Video to Storyboard (grade A)
Generate an AI video and place it directly on a user's Avocado AI storyboard. Drops a 'Generating...' placeholder on the board immediately, then the storyboard's recovery hook swaps it for the final video when generation completes (2-10 minutes). Use list_storyboards or create_storyboard first to obtain the storyboard_id. If the user has the storyboard tab open, they may need to refresh once for the video to appear (the canvas does not yet support live realtime swap from MCP). Twelve models supported: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), seedance-2.0-ref / -ref-fast (REFERENCE-to-video: locks character/style from reference images — pass reference_image_urls and/or reference_file_ids; the natural fit for keeping a storyboard's character consistent across beats), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video), happy-horse-t2v (Happy Horse text-to-video, 720p/1080p, 3-15s, native audio + lip-sync), happy-horse-i2v (Happy Horse image-to-video, REQUIRES an image, 720p/1080p, 3-15s), gemini-omni-flash (Google, 720p, 3-10s, 16:9/9:16 only, synchronized audio always included; supports BOTH text-to-video and image-to-video). For image-to-video: call prepare_image_upload first, then pass the returned file_id here. Pricing is per-second, varies by model and resolution.
| Name | Required | Description | Default |
|---|---|---|---|
| label | No | Short human label shown in the Studio Library, e.g. 'Beat 3 - Into the Backrooms'. | |
| model | No | Model. Defaults to 'seedance-2.0-t2v'. Use the -i2v variant, any kling3 variant, or happy-horse-i2v for image-to-video; happy-horse-t2v for Happy Horse text-to-video; or seedance-2.0-ref / -ref-fast for reference-to-video. | |
| prompt | Yes | Text description of the video. For image-to-video, describe the motion/action you want applied to the source image. For reference-to-video, describe the scene; the reference images lock the character/style. | |
| file_id | No | file_id from prepare_image_upload — preferred for chat attachments. Required for seedance-2.0-i2v / -i2v-fast. Optional for kling3-* (presence triggers image-to-video mode). Not used by reference-to-video — use reference_file_ids. | |
| duration | No | Video duration in seconds. Per-model bounds: seedance i2v/ref 4-15, seedance t2v 5-15, kling3-standard/pro 5-10, kling3-4k/o3-4k 3-15, happy-horse t2v/i2v 3-15. Defaults to 5. | |
| image_url | No | HTTPS URL of the source image. Use only if you already have a public URL; otherwise call prepare_image_upload and pass file_id. Not used by reference-to-video — use reference_image_urls. | |
| resolution | No | Video resolution. Meaningful for seedance (480p/720p/1080p; 1080p not allowed with seedance fast/ref-fast) and happy-horse (720p/1080p, default 720p). Kling models lock resolution by variant. | |
| beat_number | No | The beat's position in story order (1..N). Keeps generated clips ordered in the Library to match the storyboard. | |
| aspect_ratio | No | Aspect ratio. Defaults to '16:9'. Also controls placeholder shape on the board. Ignored for image-to-video (aspect derives from input). | |
| storyboard_id | Yes | The id of the storyboard to add the video to. Must be owned by, or shared with edit access to, the authenticated user. | |
| generate_audio | No | Generate audio (Kling 3 standard/pro only). Ignored for other models. | |
| reference_file_ids | No | Reference-to-video only. file_ids from prepare_image_upload, resolved to image references (counted alongside reference_image_urls toward the max of 9). Use for chat attachments. | |
| reference_audio_urls | No | Reference-to-video only. HTTPS URLs of reference audio (max 3). Optional. Combined references cap at 12 total assets. | |
| reference_image_urls | No | Reference-to-video only (seedance-2.0-ref / -ref-fast). HTTPS URLs of reference images that lock character/style. Storyboard character/location assets already have permanent URLs you can pass directly. At least one image reference is required; max 9. | |
| reference_video_urls | No | Reference-to-video only. HTTPS URLs of reference videos (max 3). Optional supplement to image references. | |
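The reference-to-video limits are spread across four parameter rows, so a single local check can help. The sketch below encodes the caps as stated in the table: at most 9 image references counting both URLs and file_ids, at most 3 audio, at most 3 video, and 12 assets combined. The helper name `reference_problems` is hypothetical, and letting a file_id satisfy the "at least one image reference" rule is an assumption.

```python
def reference_problems(image_urls=(), file_ids=(), audio_urls=(), video_urls=()):
    """Check reference-to-video asset counts (hypothetical helper)."""
    problems = []
    # file_ids resolve to image references and share the image cap of 9.
    images = len(image_urls) + len(file_ids)
    if images < 1:
        # Assumption: a file_id counts toward this requirement.
        problems.append("at least one image reference is required")
    if images > 9:
        problems.append(f"too many image references: {images} > 9")
    if len(audio_urls) > 3:
        problems.append(f"too many audio references: {len(audio_urls)} > 3")
    if len(video_urls) > 3:
        problems.append(f"too many video references: {len(video_urls)} > 3")
    total = images + len(audio_urls) + len(video_urls)
    if total > 12:
        problems.append(f"too many combined references: {total} > 12")
    return problems


urls = [f"https://example.com/ref-{i}.png" for i in range(8)]
print(reference_problems(image_urls=urls, file_ids=["file_a", "file_b"]))
```

Note that the combined cap of 12 can trigger even when each per-type cap is respected, for example with 9 images, 3 audio clips, and 1 video.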
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses immediate placeholder placement, recovery hook swap, generation time (2-10 minutes), and refresh requirement. Adds significant behavioral context beyond annotations (which only indicate non-readonly, non-destructive, etc.). No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Front-loaded with main action, then organized into model details and parameter notes. Some verbosity due to complex model options, but each sentence adds value. Could be slightly more streamlined, but appropriate for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema; description does not mention return values or error conditions. While usage context is thorough, the agent may need to know what the tool returns (e.g., job ID, status) to chain with other operations. Lacks complete information for a tool with 15 parameters and complex logic.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. Description adds context: model variants with their capabilities (e.g., image-to-video vs reference-to-video), constraints like count limits for references, and which parameters affect which models.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool generates an AI video and places it on a storyboard, with a specific verb-resource combination. It distinguishes from siblings like generate_video_to_flow by specifying the destination (storyboard vs flow) and includes details about placeholder and recovery hook.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit prerequisites (call list_storyboards or create_storyboard first), describes need to refresh if tab open, and gives model selection guidance (e.g., reference-to-video for character consistency). Lacks explicit 'when not to use' but covers key usage scenarios well.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_started (A, Read-only, Idempotent)
Get step-by-step instructions for connecting to Avocado AI via MCP. Call this when a user wants to sign up, authenticate, or connect Avocado AI to their AI assistant (Claude, ChatGPT, Cursor, Windsurf, Claude Code, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| client | No | Which AI assistant or client the user wants to connect from. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description's 'Get step-by-step instructions' aligns with these hints but adds no new behavioral context beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, with the first stating the core purpose and the second providing usage context. It is concise with no redundant or unnecessary information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one optional parameter, no output schema), the description fully covers purpose, usage, and context. No additional details are needed for an agent to correctly invoke it.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter 'client' is fully documented in the schema with enum values and description. The tool description does not add additional meaning or behavior details for the parameter, so it meets the baseline for 100% schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool provides step-by-step instructions for connecting to Avocado AI via MCP. It specifies the exact scenarios (sign up, authenticate, connect) and lists example clients, making it distinct from sibling tools which are primarily generation and editing tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says 'Call this when a user wants to sign up, authenticate, or connect Avocado AI to their AI assistant', giving clear when-to-use guidance. It does not list exclusions, but the context of sibling tools makes the usage obvious.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_connected_accounts (A, Read-only, Idempotent)
List the user's connected social media accounts (Instagram, TikTok, LinkedIn) so you can offer to publish content for them. Returns whether the Connectors add-on is active and the connected accounts. Read-only, no credits. Call this before offering to post, or when the user asks to publish.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnlyHint, idempotentHint, destructiveHint), the description adds 'Read-only, no credits' and specifies what is returned (add-on active status and connected accounts). No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four short sentences, no wasted words. Front-loaded with the core action and platforms, then return details and usage guidance. Concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description fully explains what is returned. For a simple listing tool with no parameters and clear annotations, this is complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters exist, so the description does not need to elaborate. However, it could mention that no input is required. Baseline 4 is appropriate as the description adds value via return info.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it lists the user's connected social media accounts (naming specific platforms: Instagram, TikTok, LinkedIn) and mentions returning the Connectors add-on status. The purpose is specific and distinct from sibling tools which are mostly generation or post-related.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly instructs to call this tool before offering to post or when the user asks to publish. Provides clear context for when to use it, making it easy for an AI agent to decide.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_flow_assets (A, Read-only, Idempotent)
List the cast / location / beat tiles currently on a flow (the Flows Director), grouped by role, with each tile's node_id, label, url (empty + pending=true if still generating), and — for beats — beat_number in story order. Use this to look up node ids for replace_node_id (generate_image_to_flow / edit_image_to_flow), e.g. when the user asks to redo a specific tile in a fresh conversation with no memory of prior ids, or to check what's already been generated before adding more.
| Name | Required | Description | Default |
|---|---|---|---|
| flow_id | Yes | The id of the flow to inspect. Must be owned by, or shared with, the authenticated user. | |
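The node-id lookup described above could be sketched roughly as follows. The response shape (a dict keyed by role, each holding a list of tiles) and the helper name `find_node_id` are assumptions made for illustration; the listing only names the fields node_id, label, url, pending, and beat_number.

```python
from typing import Optional

# Hypothetical response shape: tiles grouped by role. Only the field names
# (node_id, label, url, pending, beat_number) come from the tool description.
sample_assets = {
    "cast": [
        {"node_id": "n1", "label": "Hero", "url": "https://example.com/hero.png", "pending": False},
    ],
    "location": [
        {"node_id": "n2", "label": "Cafe", "url": "", "pending": True},
    ],
    "beat": [
        {"node_id": "n3", "label": "Opening", "url": "https://example.com/b1.png",
         "pending": False, "beat_number": 1},
    ],
}


def find_node_id(assets: dict, label: str) -> Optional[str]:
    """Return the node_id of the first tile whose label matches, ignoring case."""
    wanted = label.lower()
    for tiles in assets.values():
        for tile in tiles:
            if tile.get("label", "").lower() == wanted:
                return tile.get("node_id")
    return None


def pending_labels(assets: dict) -> list:
    """Return labels of tiles that are still generating (pending=True)."""
    return [
        tile.get("label", "")
        for tiles in assets.values()
        for tile in tiles
        if tile.get("pending")
    ]
```

The id returned by `find_node_id` is what an agent would then pass as replace_node_id; checking `pending_labels` first avoids redoing a tile that is still generating.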
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and destructiveHint=false. Description adds that url is empty + pending=true if still generating, and includes beat_number in story order. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is two sentences and front-loaded with core functionality. Includes a usage example and return format. Could be slightly more concise but well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, description details output fields (node_id, label, url, pending status, beat_number) and grouping. Explains purpose and context for use. Complete for a read-only inspection tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% for the single parameter (flow_id). The schema already states that the flow must be owned by or shared with the user, and the description adds no further semantics for the parameter beyond that.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description specifies 'list the cast / location / beat tiles currently on a flow...grouped by role' with each tile's fields. It distinguishes from siblings by stating its use for looking up node IDs for replace_node_id in generate_image_to_flow/edit_image_to_flow.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly mentions when to use: to look up node ids for replace_node_id, e.g., when user asks to redo a tile in a fresh conversation, or to check what's already generated. Does not explicitly state when not to use, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_storyboards (A, Read-only, Idempotent)
List the user's Avocado AI storyboards. Returns owned and shared boards with id, title, last-updated time, thumbnail, and direct URL. Use this to let the user pick an existing board.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds value by specifying the scope (owned and shared) and the exact fields returned, which goes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with the action and resource, followed by output details and usage guidance. No superfluous words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (no parameters, no output schema), the description fully covers purpose, output, and usage. Annotations handle safety. No missing elements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema coverage, the baseline is 4. The description does not need to explain parameters, but it compensates by describing the output structure, aiding understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (list), the resource (storyboards), and the returned fields (id, title, last-updated time, thumbnail, URL). It distinguishes itself from sibling tools like create_storyboard, which create rather than list.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use the tool: 'Use this to let the user pick an existing board.' It does not provide explicit exclusions or alternatives, but no alternative listing tool exists among siblings, making the guidance adequate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_list (A, Read-only, Idempotent)
List all available AI image generation models on Avocado AI. Returns model slugs, display names, credit costs, and descriptions. Use this to help users pick the right model for their needs.
| Name | Required | Description | Default |
|---|---|---|---|
| category | No | Filter by media type. Currently only 'image' is supported. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, so the behavioral burden is low. The description adds context about return content but no additional behavioral traits beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with verb and resource, no redundancy. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description explains what is returned (slugs, names, costs, descriptions) despite no output schema. It mentions the optional parameter implicitly via 'all available'. Lacks details on pagination or filtering behavior but sufficient for a simple list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage for its single parameter (category). The description does not add extra meaning beyond the schema's description, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists all available AI image generation models, specifies the return fields (slugs, names, credit costs, descriptions), and differentiates from sibling generation tools by being a list/retrieval operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a high-level usage context ('help users pick the right model') but does not explicitly state when not to use or mention alternative tools. No sibling comparison is given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
prepare_image_upload: Prepare Image Upload (A)
MANDATORY first step whenever the user attached an image in chat (or pointed at a local file on disk) and wants edit_image or image-to-video generation. Returns a signed PUT URL plus a file_id. After this tool: either (a) the inline upload widget will let the user drop the file and auto-continue (Claude.ai web), or (b) you run a curl PUT yourself if you have shell access (Claude Desktop / Claude Code) — the response text contains a ready-to-run curl command. Then call edit_image or generate_video with file_id=. edit_image and generate_video do NOT accept base64 — calling them with raw image bytes WILL fail. This tool is the only working path for chat attachments. Set purpose to 'edit' or 'video' so the upload widget points the user at the right downstream tool.
| Name | Required | Description | Default |
|---|---|---|---|
| purpose | No | What the user wants done with the uploaded image. 'edit' (default) for edit_image. 'video' for generate_video image-to-video. The upload widget uses this to nudge you toward the right downstream tool after upload. | |
| mime_type | No | MIME type of the image the user will upload. Defaults to image/png. Accepts png, jpeg, webp. | |
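For shell-capable clients, the upload step amounts to an HTTP PUT of the file to the signed URL. The sketch below assembles such a command for illustration only: the tool's response already contains a ready-to-run curl command, and the helper name `build_put_command`, the exact header the signed URL expects, and the example URL are all assumptions.

```python
import shlex

# MIME types the listing says are accepted (png, jpeg, webp).
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}


def build_put_command(upload_url: str, local_path: str, mime_type: str = "image/png") -> str:
    """Build a shell-safe curl command that PUTs a local file to a signed URL.

    Assumption: the signed URL accepts a plain PUT with a Content-Type header.
    """
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # curl's --upload-file sends the file body with an HTTP PUT request.
    return shlex.join([
        "curl",
        "-H", f"Content-Type: {mime_type}",
        "--upload-file", local_path,
        upload_url,
    ])
```

After the upload succeeds, the file_id returned by prepare_image_upload (not the URL) is what gets passed on to edit_image or generate_video.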
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description thoroughly discloses the tool's behavior: it returns a signed URL and file_id, requires subsequent steps (widget upload or curl command), and warns that downstream tools fail with raw bytes. Annotations are minimal (no read-only, destructive hints), so the description carries the full burden and does it well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is comprehensive but slightly lengthy; however, every sentence adds critical information for the workflow. It is front-loaded with the mandatory nature and clearly orders steps. Could be slightly more concise, but the information density justifies the length.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (preparation step without output schema), the description is complete: it covers the return values, post-usage steps, how to handle the upload in different environments, and prerequisites. No gaps in context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema already has clear descriptions for 'purpose' and 'mime_type' with enums and defaults. The description adds value by explaining why each parameter matters (purpose controls downstream tool nudging) and emphasizing defaults. Schema coverage is 100%, so the description enhances rather than replaces schema info.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool is the mandatory first step for uploading images for editing or video generation, distinguishing it from sibling tools like edit_image and generate_video. It specifies that it returns a signed PUT URL and file_id, providing a clear purpose.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says this is mandatory when the user attaches an image and wants to use edit_image or generate_video. It explains that downstream tools do not accept base64 and will fail if called directly, providing clear guidance on when to use this tool and why alternatives won't work.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_post (A)
Publish a post immediately to the user's connected social accounts (Instagram, TikTok, LinkedIn). ALWAYS confirm with the user first (target platforms + caption). Instagram and TikTok require at least one image or video in media_urls; LinkedIn allows text only. Pass generated asset URLs (Supabase-hosted) as media_urls. If the user isn't subscribed or hasn't linked the platform, the result will instruct you to show a connect card; do not retry in that case.
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | The post caption / text. Keep it platform-appropriate. | |
| platforms | Yes | Which connected platforms to post to. Use list_connected_accounts to see what's linked. | |
| media_urls | No | Public media URLs to attach (the asset URLs you generated). Required for Instagram/TikTok. | |
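The media rule above (Instagram and TikTok need at least one image or video, LinkedIn allows text only) can be checked before calling the tool. This is a minimal sketch; the lowercase platform identifiers and the helper name `missing_media_platforms` are assumptions, since the listing does not show the enum values for platforms.

```python
from typing import Iterable, List, Optional

# Platforms that, per the tool description, require at least one media URL.
# The lowercase identifiers are assumed, not taken from the schema.
MEDIA_REQUIRED = {"instagram", "tiktok"}


def missing_media_platforms(platforms: Iterable[str],
                            media_urls: Optional[List[str]] = None) -> List[str]:
    """Return the requested platforms that cannot be posted to without media."""
    if media_urls:
        return []  # at least one URL is attached, so every platform is satisfied
    return sorted(p for p in platforms if p.lower() in MEDIA_REQUIRED)
```

A non-empty result is a cue to ask the user for an image or video (or to drop those platforms) before confirming the post with them.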
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds behavioral context beyond annotations: it requires user confirmation, handles subscription/link errors gracefully by instructing to show a connect card, and specifies that retries should not be attempted in that case. Annotations already indicate mutation (readOnlyHint=false), so this complements well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with five sentences, front-loaded with the main action, then specific requirements and error handling. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers the tool's purpose, confirmation step, platform requirements, error scenario for subscription/link issues, and reference to list_connected_accounts. While it doesn't specify the success response format, the context is sufficient for an agent to use the tool effectively.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with parameter descriptions, but the description adds critical context: media_urls are required for Instagram/TikTok, should be generated asset URLs (Supabase-hosted), and platforms are the connected accounts. This enriches the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool publishes a post immediately to connected social accounts (Instagram, TikTok, LinkedIn). It distinguishes from sibling tools like schedule_post by emphasizing immediate publishing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit instructions: always confirm with the user (target platforms + caption), notes platform-specific requirements (media for Instagram/TikTok), and advises not to retry if the user isn't subscribed or hasn't linked the platform, directing to show a connect card.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
research_business (A, Read-only, Idempotent)
Look a person or business up on the public web to find their company name and official website. Use this during onboarding, AFTER the user agrees to be looked up, passing their name plus any hint (business, role, location). Read-only, no credits. Returns candidate web results: pick the most likely OFFICIAL site, then confirm with the user ("Looks like you're X at domain.com, is that right?") before calling extract_brand on it. If nothing clearly matches, ask the user for their website instead.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Who or what to look up, e.g. 'Jane Doe founder of Acme' or a business name plus a hint. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, openWorldHint, idempotentHint, destructiveHint. The description adds 'Read-only, no credits' and explains the expected behavior after results (confirmation step, interaction with user, calling sibling tool). No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph but each sentence adds value. It is front-loaded with the main action and provides sequential steps. Could be slightly more concise but still effective.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description fully explains what the tool returns (candidate web results) and what the agent should do next (pick official site, confirm, call extract_brand). This is complete for the tool's purpose.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with a clear description for 'query'. The description adds examples and guidance on how to formulate the query (use hints). While baseline is 3, the added context justifies a 4.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool looks up a person or business on the public web to find their company name and official website. It specifies the context (onboarding) and the verb 'Look up' is specific. It distinguishes from sibling 'extract_brand' by explaining the workflow.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use (during onboarding, after user agrees) and provides clear instructions: pass name plus hint, pick official site, confirm with user, then call extract_brand. Also tells what to do if no match (ask user). No ambiguity.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
schedule_post (A)
Schedule a post for a FUTURE time on the user's connected social accounts (Instagram, TikTok, LinkedIn). ALWAYS confirm with the user first (platforms, caption, and the exact time). Instagram and TikTok require at least one image or video in media_urls. Provide scheduled_for as an ISO 8601 timestamp; include the user's timezone if you know it. If the user isn't subscribed or hasn't linked the platform, the result will instruct you to show a connect card.
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | The post caption / text. | |
| timezone | No | IANA timezone for the scheduled time, e.g. 'America/New_York'. Optional. | |
| platforms | Yes | Which connected platforms to post to. | |
| media_urls | No | Public media URLs to attach. Required for Instagram/TikTok. | |
| scheduled_for | Yes | When to publish, as an ISO 8601 timestamp (e.g. 2026-06-20T14:00:00). Must be in the future. | |
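Building the arguments might look like the sketch below, which formats scheduled_for in the same style as the example timestamp and rejects times that are not in the future. The helper name `build_schedule_args` and the `now_local` parameter are assumptions; in particular, the sketch compares naive wall-clock times, so the caller has to supply the current time in the user's own timezone.

```python
from datetime import datetime
from typing import Iterable, Optional


def build_schedule_args(content: str,
                        platforms: Iterable[str],
                        when_local: datetime,
                        timezone: Optional[str] = None,
                        now_local: Optional[datetime] = None) -> dict:
    """Assemble schedule_post arguments from a naive wall-clock datetime.

    Assumption: when_local and now_local are both wall-clock times in the
    user's timezone. The default (system local time) is only correct when
    the machine runs in that same timezone.
    """
    now_local = now_local or datetime.now()
    if when_local <= now_local:
        raise ValueError("scheduled_for must be in the future")
    args = {
        "content": content,
        "platforms": list(platforms),
        # ISO 8601 without fractional seconds, e.g. 2026-06-20T14:00:00
        "scheduled_for": when_local.isoformat(timespec="seconds"),
    }
    if timezone:
        args["timezone"] = timezone  # IANA name such as 'America/New_York'
    return args
```

The exact time and timezone in the resulting dict are what the agent should read back to the user for confirmation before scheduling.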
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations show this is a mutating operation (readOnlyHint=false) and not destructive. The description adds context about confirmation requirement, future-only scheduling, and the need for media on certain platforms. It also notes subscription/linking conditions, which is valuable beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph of ~80 words, front-loaded with the primary purpose. Every sentence adds necessary information without redundancy or filler.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and 5 parameters, the description covers key behavioral aspects: confirmation, media requirements, timestamp format, and error handling for subscription/linking. Could mention return value or success confirmation, but overall sufficient for agent understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds value by reinforcing required parameters and providing practical guidance (e.g., 'provide scheduled_for as an ISO 8601 timestamp; include the user's timezone if you know it'). This extra context justifies a score above baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the action ('Schedule a post for a FUTURE time') and the target resources ('user's connected social accounts (Instagram, TikTok, LinkedIn)'). It clearly distinguishes from the sibling 'publish_post' which implies immediate posting.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear guidance: it says 'ALWAYS confirm with the user first' and specifies that Instagram and TikTok require media. It implies when to use this tool (for future scheduling vs immediate publishing). No explicit when-not-to-use, but the context is well established.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
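One way to produce that file is a small script run against the directory your web server serves. This is a sketch under stated assumptions: the helper name `write_glama_manifest` and the idea of a local `web_root` directory are hypothetical, and how the file actually gets served depends on your hosting setup.

```python
import json
from pathlib import Path


def write_glama_manifest(web_root: str, email: str) -> Path:
    """Write .well-known/glama.json under web_root and return its path.

    Assumption: web_root is the directory served at the root of your domain.
    """
    manifest = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target = Path(web_root) / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return target
```

Use the email address tied to your Glama account when calling it, since that is what the verification step compares against.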
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.