image

Generate images from text prompts or edit existing images using AI models, supporting multiple aspect ratios and resolutions, with results saved as files.

Instructions

Generate images via OpenRouter-compatible or OpenAI-compatible endpoints.

CAPABILITIES:

  • Text-to-image generation with multiple providers

  • Image editing and transformation with reference images

  • Multiple aspect ratios and resolutions (1K/2K/4K)

RESPONSE FORMAT:

  • Returns XML with file paths to generated images

  • Images saved to disk (no base64 in response)

  • Includes text descriptions when available
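The description says the tool returns XML containing file paths but does not show the XML itself. The sketch below is one hedged way to pull image paths out of such a response without depending on specific tag names. The sample response, its element names (`response`, `image`, `file`, `description`), and the helper `extract_image_paths` are all assumptions made for illustration, not the server's documented output.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# File extensions treated as images; this set is an assumption, not from the tool docs.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def extract_image_paths(xml_text: str) -> list[str]:
    """Collect element text and attribute values that end in an image extension."""
    root = ET.fromstring(xml_text)
    found: list[str] = []
    for element in root.iter():
        # Check both the element's text and every attribute value.
        candidates = [element.text or ""] + list(element.attrib.values())
        for value in candidates:
            value = value.strip()
            suffix = PurePosixPath(value).suffix.lower()
            if suffix in IMAGE_SUFFIXES and value not in found:
                found.append(value)
    return found


# Hypothetical response shape; the real element names may differ.
sample = """<response>
  <image path="/tmp/out/hero-banner/image_1.png"/>
  <file>/tmp/out/hero-banner/image_2.png</file>
  <description>A wide hero banner at sunset.</description>
</response>"""

print(extract_image_paths(sample))
```

Scanning every element and attribute keeps the helper usable even if the actual schema nests paths differently, at the cost of possibly picking up unrelated values that happen to end in an image extension.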

BEST PRACTICES:

  • Be descriptive: describe scenes, lighting, style, composition

  • Use negative constraints in prompt: "no text", "no watermark", "no blur"

  • For editing: provide reference image and specify what to keep

Supports: reference images for editing.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Image generation prompt. Structure: <goal>what you want to generate (can be a statement)</goal> <context>detailed background info - the more the better</context> <hope>desired visual outcome, can be abstract</hope>. Example: <goal>Create a 4-panel comic about debugging</goal> <context>Developer finds a bug at 3am, tries multiple fixes, finally discovers it was a typo, comedic relief for tech blog</context> <hope>simple black-white line art, speech bubbles, exaggerated tired expressions</hope> | |
| images | No | Reference images for editing or style transfer. | |
| model | No | Model to use (default: from IMAGE_MODEL env). | |
| aspect_ratio | No | Output image aspect ratio. Default: 1:1 (square). | 1:1 |
| resolution | No | Output resolution. 1K (1024px), 2K (2048px), 4K (4096px). Default: 1K. | 1K |
| quality | No | Image quality (OpenAI generations API). Options: standard, hd. | standard |
| save_path | Yes | Base directory for saving images. Files saved to {save_path}/{task_note}/. | |
| api_type | No | API type to use. Default: from IMAGE_API_TYPE env var (openrouter_chat). | openrouter_chat |
| task_note | Yes | Subdirectory name for saving images (English recommended, e.g., 'hero-banner', 'product-shot'). Also shown in GUI. | |
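As a rough illustration of how the parameters above fit together, the following sketch assembles a tagged prompt, builds an arguments dictionary with the three required fields, and computes the directory where files should land. The helper names (`build_prompt`, `build_arguments`, `output_directory`) are hypothetical, and appending negative constraints to the `<hope>` part is one possible choice rather than something the schema prescribes.

```python
from pathlib import Path

# Resolution labels listed in the schema.
ALLOWED_RESOLUTIONS = {"1K", "2K", "4K"}


def build_prompt(goal: str, context: str, hope: str, negatives: tuple[str, ...] = ()) -> str:
    """Wrap the three parts in the <goal>/<context>/<hope> tags the schema describes."""
    if negatives:
        # Assumption: negative constraints such as "no text" are appended to <hope>.
        hope = f"{hope}, {', '.join(negatives)}"
    return f"<goal>{goal}</goal> <context>{context}</context> <hope>{hope}</hope>"


def build_arguments(prompt: str, save_path: str, task_note: str,
                    resolution: str = "1K", aspect_ratio: str = "1:1") -> dict:
    """Return tool arguments; prompt, save_path, and task_note are the required fields."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    return {
        "prompt": prompt,
        "save_path": save_path,
        "task_note": task_note,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }


def output_directory(arguments: dict) -> Path:
    """Files are saved to {save_path}/{task_note}/ according to the schema."""
    return Path(arguments["save_path"]) / arguments["task_note"]


prompt = build_prompt(
    goal="Create a 4-panel comic about debugging",
    context="Developer finds a bug at 3am and discovers it was a typo",
    hope="simple black-white line art, speech bubbles",
    negatives=("no watermark", "no blur"),
)
arguments = build_arguments(prompt, save_path="/tmp/images", task_note="debug-comic")
print(output_directory(arguments))
```

Optional fields such as `images`, `model`, `quality`, and `api_type` are left out here so the server-side defaults described in the table apply.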
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it describes the response format (XML with file paths, images saved to disk), mentions that images are saved rather than returned as base64, and includes best practices for effective usage. It doesn't cover rate limits, authentication needs, or error handling, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (CAPABILITIES, RESPONSE FORMAT, BEST PRACTICES) and efficiently conveys information. While slightly longer than minimal, each section adds value and the structure helps with quick scanning. The final 'Supports:' line feels redundant but doesn't significantly detract.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex 9-parameter tool with no annotations and no output schema, the description provides substantial context about capabilities, response format, and best practices. It covers the tool's scope well but doesn't address potential limitations, error cases, or provide examples of the XML response structure that would help the agent understand what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 9 parameters thoroughly. The description adds some context about reference images for editing and general capabilities, but doesn't provide additional parameter semantics beyond what's in the schema. This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Generate images via OpenRouter-compatible or OpenAI-compatible endpoints' with specific capabilities listed including text-to-image generation, image editing, and multiple aspect ratios/resolutions. It distinguishes from sibling tools (which appear to be text/chat models) by focusing exclusively on image generation and manipulation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (image generation and editing) and includes 'BEST PRACTICES' section with specific guidance on prompt construction and editing workflows. However, it doesn't explicitly state when NOT to use this tool or name alternatives for similar functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shiharuharu/cli-agent-mcp'
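For callers who prefer a script to curl, the same GET request can be sketched with Python's standard library. The helper names `build_server_request` and `fetch_server` are hypothetical, and treating the response body as JSON is an assumption, since the page does not describe the response format.

```python
import json
import urllib.request

# Base URL taken from the curl command above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one server entry."""
    url = f"{API_BASE}/{owner}/{name}"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """Send the request and decode the body, assuming the API returns JSON."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("shiharuharu", "cli-agent-mcp")` would issue the same request as the curl command; network errors and non-JSON responses are not handled in this sketch.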

If you have feedback or need assistance with the MCP directory API, please join our Discord server.