Avocado AI
Server Details
Create ads inside any AI assistant with Avocado: create, edit, and make AI UGC in chat.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.4/5 across 17 of 17 tools scored. Lowest: 3.8/5.
Each tool has a clearly distinct purpose: generation (image, video, music, sfx, speech) is separated by media type, editing and storyboard/flow creation are separate, and helper tools (upload, job polling, info) are well-defined. No two tools appear to do the same thing.
Tools follow a consistent verb_noun pattern in snake_case, e.g., generate_image, list_storyboards, prepare_image_upload. Minor deviations like 'models_list' (noun_verb) and 'get_started' (phrasal verb) do not significantly hinder readability.
With 17 tools, the server covers all major aspects of AI media generation, editing, and storyboard/flow management without being overwhelming. Each tool earns its place for a comprehensive creative workflow.
Core generation and editing are covered, but there are notable gaps: no tools to delete or update generated media, storyboards, or flows, and no gallery or asset management for previous creations. These missing operations could hinder some workflows.
Available Tools
17 tools
account_check_credits (A, Read-only, Idempotent)
Check your Avocado AI credit balance. Returns available credits, membership tier, and what you can generate with your current balance.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds value by detailing the return data (credits, tier, generation capability), which is not covered by annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that conveys all necessary information without extraneous words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description covers the return values adequately. For a simple read tool with no inputs, this is complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema coverage, the description does not need to explain parameters. It provides no further detail but also has no gaps.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it checks credit balance and specifies what it returns (credits, membership tier, what can be generated). It is distinct from sibling tools like generate_image, which consume credits.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use the tool (to check credit balance) but does not explicitly state when not to use it or provide alternatives. It implies usage before generation but lacks direct guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_job (A, Read-only, Idempotent)
Always call this tool after generate_image, edit_image, or generate_video to retrieve the result. Pass the jobId returned by the generation tool. Returns status (queued, processing, completed, failed), result URLs when ready, and error details on failure. When an image job is completed, the resulting image(s) are returned as inline image content blocks so they render directly in chat alongside the JSON metadata. If status is queued or processing, wait 5 to 10 seconds and call again; image jobs typically finish in 10 to 60 seconds, video jobs in 2 to 10 minutes.
| Name | Required | Description | Default |
|---|---|---|---|
| jobId | Yes | The jobId returned by a generation tool. |
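The polling guidance above can be sketched in Python. This is a minimal sketch, not the server's client code: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the assumption that the response is a dict with a `status` key is inferred from the description rather than from a published output schema.

```python
import time
from typing import Any, Callable, Dict

# Statuses named in the check_job description that end the polling loop.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],  # hypothetical MCP client hook
    job_id: str,
    interval_s: float = 5.0,      # the description suggests waiting 5 to 10 seconds
    timeout_s: float = 600.0,     # video jobs may take up to about 10 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call check_job repeatedly until the job is completed or failed."""
    waited = 0.0
    while True:
        result = call_tool("check_job", {"jobId": job_id})
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in normal use the default `time.sleep` applies.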
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant behavioral detail beyond the annotations: it specifies return values (status, URLs, error details, inline image blocks), polling guidance, and typical durations. No contradiction with annotations (readOnly, idempotent, non-destructive).
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a concise paragraph that front-loads the critical usage instruction. It could be structured with bullet points for improved readability, but it efficiently conveys all necessary information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity and lack of output schema, the description fully explains the return structure, behavior (polling), and typical timings. It covers all essential information for an agent to use the tool correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter jobId is fully described in the schema. The description repeats the same information without adding new semantics. With 100% schema coverage, baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks the status of generation jobs and retrieves results. It explicitly mentions the generation tools it follows (generate_image, edit_image, generate_video) and what it returns, distinguishing it from siblings that create jobs.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear when-to-use instructions ('always call this tool after...'), what to pass (jobId), and retry behavior with time estimates. It doesn't explicitly say when not to use the tool, but the context is sufficiently clear.
create_flow (A)
Create a new Avocado AI Flow pre-built with a node-graph pipeline, and return
its id and direct URL so the user can open it on the canvas. You design the
whole pipeline: pass the nodes and edges and the server validates socket
compatibility, aligns video models to the input shape, lays the graph out
left-to-right, and adds a caption per step. Edges reference nodes by 0-based
index in the nodes array. This creates (does not run) the flow — the user
runs it from the editor.
Use the capability map below to choose node types, models, and handles:
You are Avo, a senior creative-workflow designer inside Avocado AI's Flow editor. The user describes a creative goal; you respond with a node-graph proposal that the editor previews on the canvas. Think like a production director: design the FULL pipeline needed to get a polished result, not the minimum number of nodes.
DESIGN PRINCIPLES — build capable, complete pipelines:
Match the pipeline's ambition to the request. A throwaway test is 2-3 nodes; a real deliverable (an ad, a UGC video, a product shot, a music video) is usually 5-12 nodes. Use up to 24 when it genuinely helps.
Prefer multi-stage quality: generate → refine (imageEditor) → upscale → animate, rather than a single generate node. Add an upscale step before any final image/video deliverable.
Use BRANCHING and FAN-OUT. One output can feed many nodes: e.g. one hero image → three different video models for variations the user can pick from; one script → both a voiceover and the video prompt.
Use PARALLEL TRACKS that converge: e.g. a voice track and an image track both feeding a lip-sync video; or a music track plus a visuals track.
Use the llm node to do creative thinking inside the graph (write or expand a script, brainstorm a prompt, turn a rough idea into a detailed image/video prompt), then wire its text output into the next node.
Pick the BEST model for each step (see the menus below). Don't leave everything on defaults; choosing models is a big part of the value.
Set per-node settings (aspect ratio, resolution, duration, voice, variations) when the request implies them (e.g. 'vertical' → 9:16, 'short' → duration 5, '3 options' → variations 3 or three branches).
HARD RULES:
Use only the node types listed below. Never invent new ones.
Every edge must connect compatible socket types (text→text, image→image, audio→audio, video→video).
Give every runnable node a short stepLabel ('Step N — …'); it renders as a caption beneath that node.
stickyNote is only for standalone notes; never use it to caption a node (use stepLabel). Optionally add ONE stickyNote describing the workflow.
Any schema field you don't need must be null (numbers like variations too).
MODEL MENUS (set the node's model to one of these ids):
image (text-to-image) — model ids:
• fal-ai/nano-banana-2 — fast, strong all-rounder (default)
• fal-ai/gpt-image-2 — best instruction-following & legible text
• fal-ai/bytedance/seedream/v5/lite/text-to-image — photoreal
• fal-ai/flux-pro/v1.1-ultra — high detail / fidelity
• fal-ai/nano-banana-pro — premium quality
• fal-ai/recraft/v4/text-to-image — design, brand, vector-style
• fal-ai/ideogram/v3 — posters & typography
imageEditor (image + prompt → edited image) — model ids:
• fal-ai/nano-banana-2/edit — default, multi-image (up to 14 inputs)
• openai/gpt-image-2/edit — precise instruction edits
• fal-ai/bytedance/seedream/v5/lite/edit — photoreal edits
• fal-ai/flux-pro/kontext/max/text-to-image — style / context transfer
• fal-ai/gemini-25-flash-image/edit — fast edits
(the image input accepts MULTIPLE connections for compositing/restyle)
imageUpscale (image → larger image) — model ids:
• fal-ai/topaz/upscale/image — best quality (default)
• fal-ai/recraft-crisp-upscale, fal-ai/clarity-upscaler,
fal-ai/crystal-upscaler
llm (text → text) — model ids: claude-haiku (default), gpt-4o-mini,
kimi-k2, seed-1.8. Put the instruction in prompt.
voice (text → speech) — pick a voice by name: Sarah (cheerful), Roger
(deep), Laura (soft), Charlie (warm), George (bold), Callum (energetic),
River (calm), Liam (reliable). The script comes from an upstream text/llm
node wired into in — do NOT put the script in the voice node's prompt.
music (text → music) — set duration to one of 30,60,90,120,180,240,300
(seconds). Put the music description in prompt.
videoUpscale (video → sharper video) — add after a video node for final deliverables. No model field.
VIDEO node — choose model to match the input shape (it drives which input
handles the node renders):
• Text → video: kling3-pro, sora-2, veo3-1-fast, seedance-2.0-t2v.
Wire text to prompt.
• Image → video (I2V): veo3-1-fast, kling3-pro, seedance-2.0-i2v,
hailuo-pro. Wire the image to image. For keyframe models
(kling-o1, veo3-1) wire start-frame + end-frame.
• Lip-sync / talking-head: fabric (image + audio, NO prompt — never wire
text into Fabric) or infinitalk (prompt + image + audio). Wire audio
to audio. Audio-over-stills narration: ltx2-audio.
• Multi-image reference / character consistency: vidu (≤7),
veo3-1-ref (≤10), kling-elements (2-4 ordered frames),
happy-horse-ref (≤9). Wire EACH image to the SAME ref-images handle
(it accepts multiple connections). Never use the plain image handle.
• Seedance reference (image + video + audio refs): seedance-2.0-ref /
seedance-2.0-ref-fast. Wire to ref-images / ref-videos / ref-audio.
• Motion control (drive a character with a motion video):
kling3-motion-control. Wire character to image, motion clip
(videoUpload) to motion-video.
Edge handle hints:
When the target has multiple typed inputs (Video, Image Editor), set toHandle explicitly (prompt, image, audio, ref-images, start-frame, end-frame, motion-video). The editor otherwise picks the first type-compatible handle, which may be the wrong slot.
Never wire text into Fabric. Never wire a single image into a multi-ref model's image slot; use ref-images.
Available node types (id — purpose — inputs / outputs):
text — Prompt — in: in | out: out
llm — LLM — in: in | out: out
upload — Upload — in: — | out: out
videoUpload — Video Upload — in: — | out: out
image — Image — in: in | out: out
imageEditor — Image Editor — in: prompt, image | out: out
imageUpscale — Image Upscale — in: image | out: out
video — Video — in: prompt, image, start-frame, end-frame, ref-images, ref-videos, ref-audio, audio, motion-video | out: out
videoUpscale — Video Upscale — in: video | out: out
voice — Voice — in: in | out: out
music — Music — in: in | out: out
stickyNote — Sticky Note — in: in | out: out
Edges reference nodes by index in the nodes array (0-based). In the
examples below, any field not shown is null.
EXAMPLES — study the PATTERNS (multi-stage, fan-out, parallel tracks), copy the handle names exactly:
Example 1: UGC talking-head with scripted voice + final upscale
nodes=[
  {type:"llm",stepLabel:"Step 1 — Write a punchy 15s script",prompt:"Write a 15-second energetic UGC script for the product.",model:"claude-haiku"},
  {type:"voice",stepLabel:"Step 2 — Voiceover",voice:"George"},
  {type:"upload",stepLabel:"Step 3 — Upload character photo"},
  {type:"video",stepLabel:"Step 4 — Lip-sync video",model:"fabric"},
  {type:"videoUpscale",stepLabel:"Step 5 — Upscale to deliver"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"audio"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"video"}
]
Example 2: Text → image → refine → upscale (quality chain)
nodes=[
  {type:"text",stepLabel:"Step 1 — Prompt",prompt:"A cinematic product shot of a matte-black bottle on wet stone, golden hour"},
  {type:"image",stepLabel:"Step 2 — Generate hero",model:"fal-ai/flux-pro/v1.1-ultra",aspectRatio:"4:3"},
  {type:"imageEditor",stepLabel:"Step 3 — Add brand label",prompt:"Add a minimal embossed logo on the bottle",model:"fal-ai/nano-banana-2/edit"},
  {type:"imageUpscale",stepLabel:"Step 4 — Upscale",model:"fal-ai/topaz/upscale/image"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}
]
Example 3: Fan-out, one image → three video variations (different models)
nodes=[
  {type:"upload",stepLabel:"Step 1 — Source image"},
  {type:"text",stepLabel:"Step 2 — Motion brief",prompt:"Slow cinematic push-in, gentle parallax"},
  {type:"video",stepLabel:"Variation A — Veo",model:"veo3-1-fast",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation B — Kling",model:"kling3-pro",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation C — Seedance",model:"seedance-2.0-i2v",aspectRatio:"9:16",duration:"5"}
]
edges=[
  {fromIndex:0,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"image"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]
Example 4: Multi-image reference video (character consistency)
nodes=[
  {type:"upload",stepLabel:"Ref 1 — Character front"},
  {type:"upload",stepLabel:"Ref 2 — Character side"},
  {type:"upload",stepLabel:"Ref 3 — Outfit detail"},
  {type:"text",stepLabel:"Scene prompt",prompt:"The character walks through a neon market at night"},
  {type:"video",stepLabel:"Generate with refs",model:"veo3-1-ref",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:2,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]
Example 5: Music video, parallel music + visuals tracks converging
nodes=[
  {type:"music",stepLabel:"Track 1 — Score",prompt:"Dreamy lo-fi beat, 90 BPM",duration:"60"},
  {type:"text",stepLabel:"Track 2 — Scene",prompt:"A lone astronaut drifting past a glowing planet"},
  {type:"image",stepLabel:"Keyframe",model:"fal-ai/nano-banana-pro",aspectRatio:"16:9"},
  {type:"video",stepLabel:"Animate",model:"ltx2-audio",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"in"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"audio"}
]
Return only the structured object — no prose, no markdown.
| Name | Required | Description | Default |
|---|---|---|---|
| edges | No | Connections between nodes (by index). Omit for a single-node flow. | |
| nodes | Yes | The pipeline's nodes (1-24). | |
| title | No | Flow title. Defaults to 'Untitled Flow'. |
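Before sending a graph to create_flow, a client could run a cheap local pre-check of the rules stated above (1 to 24 nodes, known node types, 0-based edge indices, valid handle names). The sketch below is an assumption-laden illustration: `check_flow` is a hypothetical helper, the handle table is transcribed from the node-type list in the description, and it does not reproduce the server's own socket-type validation.

```python
from typing import Any, Dict, List

# Input handles per node type, transcribed from the "Available node types" list.
# Every node type exposes a single output handle named "out".
INPUT_HANDLES: Dict[str, List[str]] = {
    "text": ["in"],
    "llm": ["in"],
    "upload": [],
    "videoUpload": [],
    "image": ["in"],
    "imageEditor": ["prompt", "image"],
    "imageUpscale": ["image"],
    "video": ["prompt", "image", "start-frame", "end-frame", "ref-images",
              "ref-videos", "ref-audio", "audio", "motion-video"],
    "videoUpscale": ["video"],
    "voice": ["in"],
    "music": ["in"],
    "stickyNote": ["in"],
}


def check_flow(nodes: List[Dict[str, Any]], edges: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    if not 1 <= len(nodes) <= 24:
        problems.append(f"expected 1-24 nodes, got {len(nodes)}")
    for i, node in enumerate(nodes):
        if node.get("type") not in INPUT_HANDLES:
            problems.append(f"node {i}: unknown type {node.get('type')!r}")
    for n, edge in enumerate(edges):
        src, dst = edge.get("fromIndex"), edge.get("toIndex")
        in_range = all(isinstance(k, int) and 0 <= k < len(nodes) for k in (src, dst))
        if not in_range:
            problems.append(f"edge {n}: index out of range")
            continue
        if edge.get("fromHandle") != "out":
            problems.append(f"edge {n}: fromHandle should be 'out'")
        target_type = nodes[dst].get("type")
        if edge.get("toHandle") not in INPUT_HANDLES.get(target_type, []):
            problems.append(
                f"edge {n}: {target_type} has no input handle {edge.get('toHandle')!r}"
            )
    return problems


# Node types and edges from Example 2 (other fields omitted for brevity).
example_nodes = [{"type": "text"}, {"type": "image"},
                 {"type": "imageEditor"}, {"type": "imageUpscale"}]
example_edges = [
    {"fromIndex": 0, "toIndex": 1, "fromHandle": "out", "toHandle": "in"},
    {"fromIndex": 1, "toIndex": 2, "fromHandle": "out", "toHandle": "image"},
    {"fromIndex": 2, "toIndex": 3, "fromHandle": "out", "toHandle": "image"},
]
print(check_flow(example_nodes, example_edges))
```

A check like this catches typos such as `images` instead of `image` before a round trip to the server, which remains the authority on socket compatibility.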
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations are minimal (readOnlyHint=false, destructiveHint=false). The description adds critical behavioral context: it creates but does not run flows, validates sockets, aligns models, and returns id/URL. This far exceeds what annotations provide.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is long but well-organized with sections: purpose, design principles, hard rules, model menus, node types, edge hints, and examples. Every sentence adds necessary information; no wasted text.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (numerous node types, models, edge rules, and constraints), the description is fully complete. It includes five detailed examples covering common patterns, and explains return values despite no output schema.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% but the description adds enormous value: valid model ids, voice names, node type purposes, edge handle hints, and example parameter values. It explains the structure of nodes and edges in detail, making the schema meaningful.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool creates a flow with a node-graph pipeline and returns its id and URL. It distinguishes itself from siblings (single media tools) by specifying it designs full pipelines.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides extensive usage guidance: when to use (creative goals), design principles (multi-stage, branching, parallel tracks), hard rules (node types, edge compatibility, step labels), and explicit contrast with running flows. Examples further clarify usage.
create_storyboard (A)
Create a new empty Avocado AI storyboard for the user. Returns the new board's id and direct URL so the user can open it.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Title for the new storyboard. Defaults to 'Untitled Storyboard'. |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations are minimal (readOnlyHint: false, destructiveHint: false), so the description adds valuable context: it creates an empty board and returns the id and URL. No contradictions with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences that are front-loaded with the key purpose and return information. No unnecessary words or details.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple creation tool with 1 optional parameter and no output schema, the description adequately explains the action and return values (id and URL). It could mention error handling or authorization, but that is not essential.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage with a description for the only parameter (title). The description does not add extra meaning beyond what the schema already provides, meeting the baseline for high schema coverage.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Create', the resource 'new empty Avocado AI storyboard', and the return values (board id and direct URL). It effectively distinguishes from sibling tools like 'list_storyboards' and other creation tools.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for creating a storyboard but does not explicitly guide when to use it versus alternatives like 'generate_image_to_storyboard' or 'generate_video_to_storyboard'. No exclusions or prerequisites are mentioned.
describe_avocado (A, Read-only, Idempotent)
Describe what Avocado AI is and what it can do. Call this when a user asks about Avocado AI, wants to know what AI media tools are available, or is deciding whether to sign up. Returns capabilities, supported models, use cases, pricing overview, and how to connect.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds value by listing what the tool returns (capabilities, supported models, pricing, etc.), providing context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with three sentences. It front-loads the purpose and adds usage scenarios efficiently. Could be slightly more streamlined, but overall well-structured.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (no parameters, no output schema), the description covers all necessary context: purpose, when to use, and what information is returned. Annotations provide safety cues. No gaps.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has zero parameters with 100% schema coverage, so baseline is 4. The description does not need to add parameter information, and it correctly focuses on the tool's function.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: describing Avocado AI, its capabilities, and offerings. It specifies the verb 'Describe' and the resource 'Avocado AI', and distinguishes from siblings by being the only tool for general platform info.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says when to use this tool (when a user asks about Avocado AI, wants to know which tools are available, or is deciding whether to sign up). It doesn't specify when not to use it, but for an unambiguous tool this is sufficient.
edit_image: Edit Image (A)
Modify an existing image. REQUIRED input: exactly one of file_id OR image_url. base64 is NOT accepted; do not try to pass image bytes as a tool argument, because the call will be rejected. For chat-attached images you MUST first call prepare_image_upload to get a signed PUT URL, upload the bytes there (via the inline widget on Claude.ai, or via curl on Claude Desktop / Claude Code), then call this tool with the returned file_id. For URLs the user has pasted, use image_url directly. Returns a jobId immediately; call check_job with the jobId to retrieve the edited image inline. Models (both 1 credit/image): 'nano-banana-2' (fast, default) and 'gpt-image-2' (higher quality).
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Edit model. 'nano-banana-2' is fast and cheap (default). 'gpt-image-2' is higher quality but costs more credits. | |
| prompt | Yes | What to change about the image. Be specific. Example: 'Replace the background with a sunset beach' or 'Add reading glasses to the person'. | |
| file_id | No | file_id returned by prepare_image_upload after the image was uploaded to the signed URL. This is the ONLY supported path for chat-attached images. Format: 'mcp-source/{userId}/{uuid}.{ext}'. | |
| quality | No | Quality tier. Only applies to 'gpt-image-2'. low=1 credit, medium=1-2, high=4-6 credits per image (varies by aspect). Defaults to 'high'. Ignored by 'nano-banana-2'. | |
| image_url | No | HTTPS URL of the image to edit. Use only when the user pasted a public URL. Otherwise call prepare_image_upload first. | |
| num_images | No | Number of edited variants to produce (1-4). Defaults to 1. | |
| aspect_ratio | No | Output aspect ratio. Omit to keep the source image's shape (best for retouching an existing photo). Set it when composing a new layout around the input — e.g. '9:16' or '3:4' for a vertical poster, '2:3' for a tall portrait. For 4:5-style feed posts use '3:4' (closest supported portrait). |
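The "exactly one of file_id OR image_url" rule is not expressed in the schema, so a caller may want to enforce it before the request goes out. The following is a minimal sketch under stated assumptions: `build_edit_image_args` is a hypothetical helper, not part of the Avocado AI API, and the sample `file_id` is a made-up value that merely follows the documented 'mcp-source/{userId}/{uuid}.{ext}' shape.

```python
from typing import Any, Dict, Optional


def build_edit_image_args(
    prompt: str,
    file_id: Optional[str] = None,
    image_url: Optional[str] = None,
    model: Optional[str] = None,
    num_images: int = 1,
) -> Dict[str, Any]:
    """Assemble edit_image arguments, rejecting combinations the tool would refuse."""
    # Exactly one image source must be supplied (neither both nor none).
    if (file_id is None) == (image_url is None):
        raise ValueError("pass exactly one of file_id or image_url")
    # The parameter table specifies an HTTPS URL for image_url.
    if image_url is not None and not image_url.startswith("https://"):
        raise ValueError("image_url must be an HTTPS URL")
    # The parameter table limits num_images to 1-4.
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")

    args: Dict[str, Any] = {"prompt": prompt, "num_images": num_images}
    if file_id is not None:
        args["file_id"] = file_id
    else:
        args["image_url"] = image_url
    if model is not None:
        args["model"] = model  # omit to let the server apply its default
    return args


# Hypothetical file_id, shaped like the documented format.
args = build_edit_image_args(
    "Replace the background with a sunset beach",
    file_id="mcp-source/user-123/0000-aaaa.png",
)
```

Optional fields are left out rather than set to a guessed default, so the server's documented defaults still apply.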
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations are minimal (no safety hints), but the description fully compensates by disclosing the async job pattern, the fact that it mutates an existing image, and credit costs. No contradictions with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single well-structured paragraph, front-loaded with purpose, then input requirements, async behavior, and model options. Every sentence is necessary and there is no redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With 7 parameters, 1 required, no output schema, and no nested objects, the description covers all necessary context: input selection, async return, model differentiation, and quality behavior. The agent can invoke correctly without gaps.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds crucial semantics: mutual exclusivity of file_id and image_url (not in schema), credit costs per model, and quality tier behavior. This justifies a 4.
Does the description clearly state what the tool does and how it differs from similar tools?
The description starts with 'Modify an existing image,' which clearly states the verb and resource. It distinguishes from siblings like generate_image and prepare_image_upload by specifying input sources and workflow.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit when-to-use guidance: exactly one of file_id or image_url, no base64, steps for chat-attached vs URL images, and the async workflow with check_job. It also explains model selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_image (Generate Image) - Grade A
Generate an AI image using Avocado AI. Returns a jobId immediately; image generation completes in 10-60 seconds. After calling, use the check_job tool with the returned jobId to retrieve the result. Once complete, check_job returns the image inline so it renders directly in chat. Run models_list to see available models. Costs 1-6 credits per image depending on model and quality.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model slug from models_list. Defaults to 'gpt-image-2'. | |
| prompt | Yes | Text description of the image to generate. Be descriptive for best results. | |
| quality | No | Quality tier. Only applies to 'gpt-image-2'. low=1 credit, medium=1-2 credits, high=4-6 credits per image (varies by aspect). Ignored by other models. Defaults to 'high'. | |
| num_images | No | Number of images to generate (1-4). Defaults to 1. | |
| aspect_ratio | No | Image aspect ratio. Defaults to '1:1'. | |
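The async pattern described above (call generate_image, keep the jobId, then poll check_job) can be sketched as follows. This is a minimal sketch, not the server's documented client code: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the `jobId` argument name for check_job and the `status` field with values `completed` and `failed` are assumptions, since the listing does not show check_job's schema or response shape here.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def generate_image_and_wait(
    call_tool: ToolCaller,
    prompt: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a generate_image job, then poll check_job until it finishes."""
    job = call_tool("generate_image", {"prompt": prompt, "quality": "low"})
    job_id = job["jobId"]  # the description says a jobId is returned immediately

    waited = 0.0
    while waited <= timeout_seconds:
        # Assumption: check_job takes the id as "jobId" and reports a "status".
        result = call_tool("check_job", {"jobId": job_id})
        if result.get("status") == "completed":
            return result
        if result.get("status") == "failed":
            raise RuntimeError(f"job {job_id} failed")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"job {job_id} not finished after {timeout_seconds}s")


class FakeServer:
    """Stand-in for a real MCP session: reports 'completed' on the third poll."""

    def __init__(self) -> None:
        self.polls = 0

    def __call__(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name == "generate_image":
            return {"jobId": "job-123"}
        self.polls += 1
        status = "completed" if self.polls >= 3 else "pending"
        return {"jobId": arguments["jobId"], "status": status}


server = FakeServer()
result = generate_image_and_wait(
    server, "a ripe avocado on a marble counter", sleep=lambda seconds: None
)
print(result["status"], server.polls)  # completed 3
```

The default timeout is set above the stated 10-60 second completion window so that a slow job is not abandoned early; both values are illustrative.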
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses async behavior, typical completion time (10-60 seconds), and credit cost (1-6 per image). It also explains that the image is rendered inline via check_job. Annotations are all false, so no contradiction, and the description adds valuable context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph with clear front-loading: first sentence states purpose, then async flow, follow-up tool, model listing, and cost. Every sentence adds value without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's async nature, multiple parameters, and cost model, the description covers the essential workflow and constraints. It explains what to do with the jobId and mentions inline rendering. Minor omission: no mention of rate limits or failure scenarios, but overall complete for typical use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, providing baseline 3. The description adds value by explaining credit costs for quality tiers, clarifying that quality only applies to 'gpt-image-2', and recommending descriptive prompts. This goes beyond the schema's descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool generates an AI image using Avocado AI, and explains the async behavior (returns jobId, completes in 10-60 seconds). It distinguishes itself from sibling tools like check_job (for retrieval) and edit_image (for editing).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly advises using check_job to retrieve the result and running models_list to see available models. It also mentions cost and time. However, it does not explicitly state when not to use this tool versus alternatives, though sibling tools provide context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_image_to_storyboard (Generate Image to Storyboard) - Grade A
Generate an AI image and place it directly on a user's Avocado AI storyboard. Drops 'Generating...' placeholder(s) on the board immediately, then the webhook swaps each placeholder for the final image when generation completes (10-60s). Use list_storyboards or create_storyboard first to obtain the storyboard_id. If the user has the storyboard tab open, they may need to refresh once for the image to appear (the canvas does not yet support live realtime updates from MCP). Costs match generate_image (1-6 credits per image depending on model and quality).
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model slug from models_list. Defaults to 'gpt-image-2'. | |
| prompt | Yes | Text description of the image to generate. | |
| quality | No | Quality tier ('gpt-image-2' only). Defaults to 'high'. | |
| num_images | No | Number of images to generate (1-4). Defaults to 1. One placeholder per image. | |
| aspect_ratio | No | Image aspect ratio. Defaults to '1:1'. Also controls placeholder shape on the board. | |
| storyboard_id | Yes | The id of the storyboard to add the image to. Must be owned by, or shared with edit access to, the authenticated user. | |
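The prerequisite step (obtain a storyboard_id from list_storyboards or create_storyboard before generating) might look like the sketch below. Several things here are assumptions rather than documented behavior: `call_tool` is a hypothetical client callable, list_storyboards is assumed to return a `storyboards` list of objects with `id` and `title` keys, and create_storyboard is assumed to accept a `title` argument and return an `id`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def resolve_storyboard_id(call_tool: ToolCaller, title: str) -> str:
    """Reuse a board whose title matches, otherwise create one."""
    # Assumed response shape: {"storyboards": [{"id": ..., "title": ...}, ...]}
    boards = call_tool("list_storyboards", {}).get("storyboards", [])
    for board in boards:
        if board["title"].lower() == title.lower():
            return board["id"]
    # Assumed argument and response names for create_storyboard.
    return call_tool("create_storyboard", {"title": title})["id"]


def image_to_storyboard(
    call_tool: ToolCaller, board_title: str, prompt: str, num_images: int = 1
) -> Dict[str, Any]:
    """Resolve the board first, then request the image(s) for it."""
    if not 1 <= num_images <= 4:  # range taken from the parameter table
        raise ValueError("num_images must be between 1 and 4")
    storyboard_id = resolve_storyboard_id(call_tool, board_title)
    return call_tool(
        "generate_image_to_storyboard",
        {"storyboard_id": storyboard_id, "prompt": prompt, "num_images": num_images},
    )


class FakeAvocado:
    """Stand-in server that records which tools were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append(name)
        if name == "list_storyboards":
            return {"storyboards": [{"id": "sb_1", "title": "Launch Ads"}]}
        if name == "create_storyboard":
            return {"id": "sb_new"}
        return {"storyboard_id": arguments["storyboard_id"]}


fake = FakeAvocado()
reply = image_to_storyboard(fake, "launch ads", "avocado toast, studio lighting", 2)
print(reply["storyboard_id"], fake.calls)
# sb_1 ['list_storyboards', 'generate_image_to_storyboard']
```

Matching on title is only one way to choose a board; the list_storyboards description suggests letting the user pick, which an interactive agent would normally do instead.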
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses async behavior with placeholder drops and webhook swaps, cost implications (1-6 credits), and the need for a page refresh. This adds context beyond annotations (readOnlyHint=false, destructiveHint=false), with no contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
A single paragraph that front-loads the main purpose, then explains mechanism, prerequisites, user experience, and costs. Each sentence adds value, though the paragraph is slightly dense and could be split for readability.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers key aspects: behavior, prerequisites, UI impact, and cost. However, lacks output/return value description and error handling, which would be helpful given no output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions. Description adds meaning to num_images (one placeholder per image) and aspect_ratio (controls placeholder shape), enhancing understanding beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool generates an AI image and places it on a storyboard, distinguishing it from sibling tools like generate_image (no storyboard placement) and generate_video_to_storyboard (video version).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises to use list_storyboards or create_storyboard first to obtain the storyboard_id. Implicitly differentiates from generate_image for standalone generation, but does not explicitly state when not to use this tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_music - Grade A
Generate AI music using Avocado AI. Create original music tracks from text prompts describing genre, mood, tempo, and instruments. Tracks can be 30 seconds to 5 minutes. Costs 4 credits per 30-second block. The track is saved to your Music Studio at https://www.avocadoai.co/music-studio.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Title for the music track. | |
| prompt | Yes | Description of the music to generate. Include genre, mood, tempo, instruments, and style. Example: 'Upbeat electronic dance music with synth leads, punchy drums, 128 BPM, energetic and euphoric mood' | |
| duration_seconds | No | Duration in seconds (30-300). Defaults to 30. | |
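The pricing rule above (4 credits per 30-second block, for tracks of 30 to 300 seconds) can be turned into a quick estimate before calling the tool. This is a rough sketch: the listing does not say how a partial block is billed, so rounding up to the next full block is an assumption, and the function name is illustrative.

```python
import math

CREDITS_PER_BLOCK = 4      # from the tool description
BLOCK_SECONDS = 30         # from the tool description
MIN_SECONDS, MAX_SECONDS = 30, 300


def music_credits(duration_seconds: int = 30) -> int:
    """Estimate the credit cost of one generate_music call."""
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError(
            f"duration_seconds must be {MIN_SECONDS}-{MAX_SECONDS}, "
            f"got {duration_seconds}"
        )
    # Assumption: a partial 30-second block is billed as a full block.
    blocks = math.ceil(duration_seconds / BLOCK_SECONDS)
    return blocks * CREDITS_PER_BLOCK


print(music_credits(30))   # 4
print(music_credits(45))   # 8 (under the round-up assumption)
print(music_credits(300))  # 40
```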
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses credit consumption and persistent storage behavior (saved to Music Studio), adding value beyond the annotations which only indicate non-destructive/non-readOnly. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five concise sentences with no redundancy. The first sentence immediately clarifies the tool's purpose, and subsequent sentences add necessary constraints without verbosity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description adequately explains the outcome (track saved to Music Studio) and the cost model, making the tool's behavior fully understandable for invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, and the description does not add significant new meaning beyond the parameter descriptions already present in the schema. Example provided in schema, so baseline score is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states 'Generate AI music' and details the creation of original tracks from text prompts, clearly distinguishing it from sibling tools like generate_sfx or generate_speech.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides concrete constraints on duration (30s-5min) and cost (4 credits per 30s block), along with the output destination (Music Studio URL). While it doesn't explicitly state when not to use, the context is sufficient for decision-making.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_sfx - Grade A
Generate AI sound effects using Avocado AI. Create short sound effects from text prompts describing the sound. Effects can be 1 to 22 seconds. Costs 1 credit per 5-second block. The effect is saved to your Music Studio at https://www.avocadoai.co/music-studio.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Title for the sound effect. | |
| prompt | Yes | Description of the sound effect to generate. Example: 'Glass shattering on a tile floor with sharp reverberation' or 'Heavy footsteps on wet concrete in a dark alley' | |
| duration_seconds | No | Duration in seconds (1-22). Defaults to 5. | |
| prompt_influence | No | How closely to follow the prompt (0-1). Higher = more literal, lower = more creative. Defaults to 0.35. | |
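The ranges and defaults in the table above can be checked client-side before a call is made. The sketch below builds a generate_sfx argument object; the helper name is hypothetical, and whether the server itself rejects or clamps out-of-range values is not stated in the listing.

```python
from typing import Any, Dict, Optional


def build_sfx_arguments(
    prompt: str,
    duration_seconds: float = 5,       # table default
    prompt_influence: float = 0.35,    # table default
    title: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs against the documented ranges and return tool arguments."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 1 <= duration_seconds <= 22:
        raise ValueError("duration_seconds must be between 1 and 22")
    if not 0 <= prompt_influence <= 1:
        raise ValueError("prompt_influence must be between 0 and 1")

    arguments: Dict[str, Any] = {
        "prompt": prompt,
        "duration_seconds": duration_seconds,
        "prompt_influence": prompt_influence,
    }
    if title is not None:  # title is optional, so omit it when unset
        arguments["title"] = title
    return arguments


args = build_sfx_arguments(
    "Glass shattering on a tile floor with sharp reverberation",
    prompt_influence=0.8,  # higher = more literal, per the table
)
print(args["duration_seconds"], args["prompt_influence"])  # 5 0.8
```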
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (all false), the description adds important behavioral details: credit cost per 5-second block, duration limits, and where the effect is saved (Music Studio URL).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, front-loaded with the main purpose, then constraints and cost. No fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, parameters, cost, and destination. However, there is no output schema, and the presence of the sibling tool check_job suggests that this tool returns a job ID asynchronously. The description omits this, which is a significant gap for an agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%; description adds context about purpose of duration (1-22 seconds) and prompt influence (default 0.35) via examples, plus cost and saving location, enhancing interpretation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it generates AI sound effects from text prompts, distinguishing from sibling tools like generate_music or generate_speech by specifying short sound effects and saving to Music Studio.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for short sound effects (1-22 seconds) and mentions credit cost, but does not explicitly say when to use this tool versus alternatives or provide exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_speech - Grade A
Convert text to natural-sounding speech using Avocado AI. Supports multiple voices and languages. Costs 3 credits per 1000 characters. Audio will appear in your Avocado AI workspace.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The text to convert to speech. | |
| voice | No | Voice to use. Defaults to 'rachel'. Options: rachel (female, calm), adam (male, deep), josh (male, young), bella (female, soft), sam (male, raspy). | |
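As an illustration of the voice options and the per-character pricing above, the sketch below validates a voice and gives a pro-rata credit estimate. The helper names are hypothetical, and the estimate is deliberately fractional because the listing does not say how the server rounds text that is not a multiple of 1000 characters.

```python
from typing import Dict

# Voice options copied from the parameter table.
VOICES: Dict[str, str] = {
    "rachel": "female, calm",
    "adam": "male, deep",
    "josh": "male, young",
    "bella": "female, soft",
    "sam": "male, raspy",
}
CREDITS_PER_1000_CHARS = 3  # from the tool description


def build_speech_arguments(text: str, voice: str = "rachel") -> Dict[str, str]:
    """Return generate_speech arguments after checking the voice name."""
    if not text:
        raise ValueError("text is required")
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; options: {sorted(VOICES)}")
    return {"text": text, "voice": voice}


def estimated_speech_credits(text: str) -> float:
    """Pro-rata estimate only; actual billing granularity is not documented here."""
    return CREDITS_PER_1000_CHARS * len(text) / 1000


script = "a" * 500
print(build_speech_arguments(script, "adam")["voice"])  # adam
print(estimated_speech_credits(script))                 # 1.5
```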
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations, the description adds cost per character and where the audio appears, which are useful behavioral details not present in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, front-loaded with core purpose, then cost and result location. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple input schema and no output schema, the description covers the major aspects: what it does, cost, and where the result appears. Missing error or speed details but adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the baseline is 3. The description adds cost context but does not add new parameter-level meaning beyond the schema's descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it converts text to speech using Avocado AI, mentions multiple voices and languages, and distinguishes from siblings like generate_music or generate_sfx.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies when to use (text-to-speech) but does not explicitly provide when-not or alternatives, leaving the agent to infer based on sibling tool names.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_video (Generate Video) - Grade A
Generate an AI video. Nine models: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video — supplying image_url or file_id automatically picks image mode), grok-imagine-video-v1-5 (480p/720p, 1-15s, REQUIRES an image — image-to-video only). For image-to-video on any host: call prepare_image_upload first, then pass the returned file_id here. Renders take 2-10 minutes; the inline result card polls for completion. Pricing is per-second, varies by model and resolution.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model. Defaults to 'seedance-2.0-t2v'. Use the -i2v variant or any kling3 variant for image-to-video. | |
| prompt | Yes | Text description of the video. For image-to-video, describe the motion/action you want applied to the source image. | |
| file_id | No | file_id from prepare_image_upload — preferred for chat attachments. Required for seedance-2.0-i2v / -i2v-fast. Optional for kling3-* (presence triggers image-to-video mode). | |
| duration | No | Video duration in seconds. Per-model bounds: seedance i2v 4-15, seedance t2v 5-15, kling3-standard/pro 5-10, kling3-4k/o3-4k 3-15. Defaults to 5. | |
| fast_mode | No | Legacy alias. true picks seedance-2.0-t2v-fast or seedance-2.0-i2v-fast when no explicit model was given. Prefer setting model directly. | |
| image_url | No | HTTPS URL of the source image. Use only if you already have a public URL; otherwise call prepare_image_upload and pass file_id. | |
| resolution | No | Video resolution. Only meaningful for seedance (480p/720p/1080p; 1080p not allowed with seedance fast). Kling models lock resolution by variant. | |
| aspect_ratio | No | Aspect ratio. Defaults to '16:9'. Ignored for image-to-video (aspect derives from input). | |
| generate_audio | No | Generate audio (Kling 3 standard/pro only). Ignored for other models. | |
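The per-model rules above (duration bounds and whether an image is required) are easy to get wrong, so a pre-flight check may help. The sketch below encodes the bounds listed in the description and the duration parameter. Two points are assumptions: that the -fast seedance variants share the bounds of their base models, and that supplying an image to a text-only model should be flagged rather than silently ignored.

```python
from typing import Dict, List, NamedTuple


class ModelRule(NamedTuple):
    min_seconds: int
    max_seconds: int
    image: str  # "never", "required", or "optional"


# Bounds taken from the tool description and the duration parameter.
MODEL_RULES: Dict[str, ModelRule] = {
    "seedance-2.0-t2v": ModelRule(5, 15, "never"),
    "seedance-2.0-t2v-fast": ModelRule(5, 15, "never"),   # assumed same as t2v
    "seedance-2.0-i2v": ModelRule(4, 15, "required"),
    "seedance-2.0-i2v-fast": ModelRule(4, 15, "required"),  # assumed same as i2v
    "kling3-standard": ModelRule(5, 10, "optional"),
    "kling3-pro": ModelRule(5, 10, "optional"),
    "kling3-4k": ModelRule(3, 15, "optional"),
    "kling-o3-4k": ModelRule(3, 15, "optional"),
    "grok-imagine-video-v1-5": ModelRule(1, 15, "required"),
}


def check_video_request(
    model: str = "seedance-2.0-t2v", duration: int = 5, has_image: bool = False
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    rule = MODEL_RULES.get(model)
    if rule is None:
        return [f"unknown model: {model}"]

    problems: List[str] = []
    if not rule.min_seconds <= duration <= rule.max_seconds:
        problems.append(
            f"{model} accepts {rule.min_seconds}-{rule.max_seconds}s, got {duration}s"
        )
    if rule.image == "required" and not has_image:
        problems.append(f"{model} requires file_id or image_url")
    if rule.image == "never" and has_image:
        # Assumption: treat an image on a text-only model as a mistake.
        problems.append(f"{model} is text only")
    return problems


print(check_video_request())                        # []
print(check_video_request("seedance-2.0-i2v", 4))   # one problem: image missing
print(check_video_request("kling3-pro", 12, True))  # one problem: duration
```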
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond neutral annotations, the description discloses render time (2-10 minutes), polling mechanism ('inline result card polls for completion'), pricing (per-second, model-dependent), and prerequisite workflow (upload image first). This sets accurate expectations for the agent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is dense but well-structured: starts with purpose, lists models with details, then prerequisites, timing, pricing. While long, it front-loads key information and avoids redundancy. Could be slightly tighter but is effective.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 9 parameters, 100% schema coverage, and no output schema, the description covers all essential aspects: model selection, parameter usage, prerequisites, timing, pricing, and polling behavior. It leaves no significant gaps for an agent to misuse the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, baseline is 3. The description adds extra meaning: grouping models by capability, explaining fast_mode as legacy, clarifying resolution relevance per model, and noting aspect_ratio and generate_audio scope. This enhances understanding beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Generate an AI video' and lists specific models with their modes (text-to-video vs image-to-video). It distinguishes this from sibling tools like generate_image or generate_speech by focusing solely on video generation, making purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides detailed guidance on when to use each model (e.g., the seedance i2v variants require an image, while the Kling variants support both text and image input), prerequisites (prepare_image_upload for images), and constraints (duration bounds, resolution limits). It lacks an explicit comparison to siblings, but the purpose itself naturally guides selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_video_to_storyboard (Generate Video to Storyboard) - Grade A
Generate an AI video and place it directly on a user's Avocado AI storyboard. Drops a 'Generating...' placeholder on the board immediately, then the storyboard's recovery hook swaps it for the final video when generation completes (2-10 minutes). Use list_storyboards or create_storyboard first to obtain the storyboard_id. If the user has the storyboard tab open, they may need to refresh once for the video to appear (the canvas does not yet support live realtime swap from MCP). Eight models supported: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video). For image-to-video: call prepare_image_upload first, then pass the returned file_id here. Pricing is per-second, varies by model and resolution.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model. Defaults to 'seedance-2.0-t2v'. Use the -i2v variant or any kling3 variant for image-to-video. | |
| prompt | Yes | Text description of the video. For image-to-video, describe the motion/action you want applied to the source image. | |
| file_id | No | file_id from prepare_image_upload — preferred for chat attachments. Required for seedance-2.0-i2v / -i2v-fast. Optional for kling3-* (presence triggers image-to-video mode). | |
| duration | No | Video duration in seconds. Per-model bounds: seedance i2v 4-15, seedance t2v 5-15, kling3-standard/pro 5-10, kling3-4k/o3-4k 3-15. Defaults to 5. | |
| image_url | No | HTTPS URL of the source image. Use only if you already have a public URL; otherwise call prepare_image_upload and pass file_id. | |
| resolution | No | Video resolution. Only meaningful for seedance (480p/720p/1080p; 1080p not allowed with seedance fast). Kling models lock resolution by variant. | |
| aspect_ratio | No | Aspect ratio. Defaults to '16:9'. Also controls placeholder shape on the board. Ignored for image-to-video (aspect derives from input). | |
| storyboard_id | Yes | The id of the storyboard to add the video to. Must be owned by, or shared with edit access to, the authenticated user. | |
| generate_audio | No | Generate audio (Kling 3 standard/pro only). Ignored for other models. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses that a placeholder is dropped immediately and the recovery hook swaps it after 2-10 minutes. Notes that refresh may be needed for live update. Annotations don't contradict; description adds rich behavioral context beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
A bit lengthy but well-structured with front-loaded main action. Every sentence adds necessary information. Could be slightly trimmed, but no waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all essential aspects: prerequisites (storyboard_id, image upload), model selection, duration bounds, pricing, and output behavior (placeholder + replacement). Despite no output schema, the description fully informs the agent of what to expect.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% coverage of parameter descriptions, but the description adds significant value: explains model variants in detail, per-model duration bounds, difference between file_id and image_url, and defaults. This helps the agent understand parameter constraints and relationships.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it generates an AI video and places it on a user's Avocado AI storyboard. Distinguishes from siblings like generate_video and generate_image_to_storyboard by specifying the placement on storyboard. Uses specific verb 'Generate' and resource 'video to storyboard'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly tells the agent to first use list_storyboards or create_storyboard to obtain storyboard_id. Provides guidance on when to use image-to-video vs text-to-video, and that for image-to-video one should call prepare_image_upload first. Also warns about refresh if the tab is open.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_started - Grade A, Read-only, Idempotent
Get step-by-step instructions for connecting to Avocado AI via MCP. Call this when a user wants to sign up, authenticate, or connect Avocado AI to their AI assistant (Claude, ChatGPT, Cursor, Windsurf, Claude Code, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| client | No | Which AI assistant or client the user wants to connect from. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description's 'Get step-by-step instructions' aligns with these hints but adds no new behavioral context beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, with the first stating the core purpose and the second providing usage context. It is concise with no redundant or unnecessary information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one optional parameter, no output schema), the description fully covers purpose, usage, and context. No additional details are needed for an agent to correctly invoke it.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter 'client' is fully documented in the schema with enum values and description. The tool description does not add additional meaning or behavior details for the parameter, so it meets the baseline for 100% schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool provides step-by-step instructions for connecting to Avocado AI via MCP. It specifies the exact scenarios (sign up, authenticate, connect) and lists example clients, making it distinct from sibling tools which are primarily generation and editing tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says 'Call this when a user wants to sign up, authenticate, or connect Avocado AI to their AI assistant', giving clear when-to-use guidance. It does not list exclusions, but the context of sibling tools makes the usage obvious.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_storyboards - Grade A, Read-only, Idempotent
List the user's Avocado AI storyboards. Returns owned and shared boards with id, title, last-updated time, thumbnail, and direct URL. Use this to let the user pick an existing board.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds value by specifying the scope (owned and shared) and the exact fields returned, which goes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with the action and resource, followed by output details and usage guidance. No superfluous words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (no parameters, no output schema), the description fully covers purpose, output, and usage. Annotations handle safety. No missing elements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema coverage, the baseline is 4. The description does not need to explain parameters, but it compensates by describing the output structure, aiding understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (list), the resource (storyboards), and the returned fields (id, title, last-updated time, thumbnail, URL). It is distinct from sibling tools such as create_storyboard, which create boards rather than list them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use the tool: 'Use this to let the user pick an existing board.' It does not provide explicit exclusions or alternatives, but no alternative listing tool exists among siblings, making the guidance adequate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
models_list · A · Read-only · Idempotent
List all available AI image generation models on Avocado AI. Returns model slugs, display names, credit costs, and descriptions. Use this to help users pick the right model for their needs.
| Name | Required | Description | Default |
|---|---|---|---|
| category | No | Filter by media type. Currently only 'image' is supported. | |
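As a rough sketch of how a client might invoke this tool, the snippet below builds an MCP `tools/call` JSON-RPC request for `models_list`. The `tools/call` method and the `name`/`arguments` fields follow the MCP specification; the helper name `build_tool_call` and the request id are illustrative assumptions. The response is not modeled because the listing defines no output schema.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body for an MCP tools/call invocation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # arbitrary id chosen for this example
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# 'image' is the only category the listing says is currently supported.
request = build_tool_call("models_list", {"category": "image"})
print(json.dumps(request, indent=2))
```

Because `category` is optional, passing an empty `arguments` dict should also be a valid call, although the listing does not say what the unfiltered result looks like.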
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, so the behavioral burden is low. The description adds context about return content but no additional behavioral traits beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with verb and resource, no redundancy. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description explains what is returned (slugs, names, costs, descriptions) despite the lack of an output schema. It only alludes to the optional category filter through the phrase 'all available' and gives no detail on pagination or filtering behavior, but this is sufficient for a simple list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage for its single parameter (category). The description does not add extra meaning beyond the schema's description, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists all available AI image generation models, specifies the return fields (slugs, names, credit costs, descriptions), and differentiates from sibling generation tools by being a list/retrieval operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a high-level usage context ('help users pick the right model') but does not explicitly state when not to use or mention alternative tools. No sibling comparison is given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
prepare_image_upload · Prepare Image Upload · A
MANDATORY first step whenever the user attached an image in chat (or pointed at a local file on disk) and wants edit_image or image-to-video generation. Returns a signed PUT URL plus a file_id. After this tool: either (a) the inline upload widget will let the user drop the file and auto-continue (Claude.ai web), or (b) you run a curl PUT yourself if you have shell access (Claude Desktop / Claude Code) — the response text contains a ready-to-run curl command. Then call edit_image or generate_video with file_id=. edit_image and generate_video do NOT accept base64 — calling them with raw image bytes WILL fail. This tool is the only working path for chat attachments. Set purpose to 'edit' or 'video' so the upload widget points the user at the right downstream tool.
| Name | Required | Description | Default |
|---|---|---|---|
| purpose | No | What the user wants done with the uploaded image. 'edit' (default) for edit_image. 'video' for generate_video image-to-video. The upload widget uses this to nudge you toward the right downstream tool after upload. | |
| mime_type | No | MIME type of the image the user will upload. Defaults to image/png. Accepts png, jpeg, webp. | |
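The upload flow described above (prepare, PUT to the signed URL, then pass `file_id` downstream) can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual contract: the response keys `upload_url` and `file_id`, the example URL, and both helper names are hypothetical, and the full MIME strings are inferred from the listing's "png, jpeg, webp". The request is built but not sent.

```python
import urllib.request

# Assumed full MIME strings for the formats the listing names (png, jpeg, webp).
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}


def build_upload_request(upload_url: str, image_bytes: bytes,
                         mime_type: str = "image/png") -> urllib.request.Request:
    """Build (but do not send) the HTTP PUT for a signed upload URL."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    return urllib.request.Request(
        upload_url,
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": mime_type},
    )


def build_followup_arguments(file_id: str) -> dict:
    """Arguments for the downstream tool: pass the file_id, never raw bytes."""
    return {"file_id": file_id}


# Hypothetical values standing in for a real prepare_image_upload response.
prepared = {"upload_url": "https://uploads.example.com/signed-put",
            "file_id": "file_123"}

put_request = build_upload_request(prepared["upload_url"], b"\x89PNG...", "image/png")
followup = build_followup_arguments(prepared["file_id"])
# To actually upload, a client would call urllib.request.urlopen(put_request).
```

The downstream tools likely require further arguments (such as a prompt); only `file_id` is confirmed by the listing, so the sketch stops there.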
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description thoroughly discloses the tool's behavior: it returns a signed URL and file_id, requires subsequent steps (widget upload or curl command), and warns that downstream tools fail with raw bytes. Annotations are minimal (no read-only, destructive hints), so the description carries the full burden and does it well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is comprehensive but slightly lengthy; however, every sentence adds critical information for the workflow. It is front-loaded with the mandatory nature and clearly orders steps. Could be slightly more concise, but the information density justifies the length.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (preparation step without output schema), the description is complete: it covers the return values, post-usage steps, how to handle the upload in different environments, and prerequisites. No gaps in context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema already has clear descriptions for 'purpose' and 'mime_type' with enums and defaults. The description adds value by explaining why each parameter matters (purpose controls downstream tool nudging) and emphasizing defaults. Schema coverage is 100%, so the description enhances rather than replaces schema info.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool is the mandatory first step for uploading images for editing or video generation, distinguishing it from sibling tools like edit_image and generate_video. It specifies that it returns a signed PUT URL and file_id, providing a clear purpose.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says this is mandatory when the user attaches an image and wants to use edit_image or generate_video. It explains that downstream tools do not accept base64 and will fail if called directly, providing clear guidance on when to use this tool and why alternatives won't work.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
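For server owners who generate static files in a deploy step, the claim file could be produced with a short script like the one below. This is only a sketch: the helper names and the idea of a local `web_root` directory are assumptions, and the demo writes into a temporary directory rather than a real web root. The `$schema` URL and the `maintainers` structure are taken from the example above.

```python
import json
import tempfile
from pathlib import Path


def build_glama_claim(email: str) -> dict:
    """Return the claim document in the structure shown in the listing."""
    return {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }


def write_glama_claim(web_root: Path, email: str) -> Path:
    """Write <web_root>/.well-known/glama.json and return its path."""
    target = web_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(build_glama_claim(email), indent=2) + "\n",
                      encoding="utf-8")
    return target


# Demo against a temporary directory; replace with the real web root to deploy.
with tempfile.TemporaryDirectory() as tmp:
    path = write_glama_claim(Path(tmp), "your-email@example.com")
    written = json.loads(path.read_text(encoding="utf-8"))
```

The file must then be served at `/.well-known/glama.json` on the server's domain, with the email replaced by the one tied to the Glama account.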
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.