Glama
by MSWEIMZ

image_to_image

Generate new images from reference images and a text prompt. Upload existing images and describe the desired output to create customized versions.

Instructions

Generate new image(s) based on reference image(s) and a text prompt.

This is image-to-image generation: provide one or more reference images (as URLs) along with a text prompt describing the desired output.

Args:
    prompt: Text description guiding the generation.
    images: List of reference image URLs (at least one required).
    model: Model name (agnes-image-2.0-flash or agnes-image-2.1-flash).
    size: Output size (e.g. 1024x768, 1024x1024, 768x1024).
    n: Number of images to generate (1-4). Default: 1.
    output_dir: Directory to save the downloaded image(s). Defaults to ~/agnes_output.
    return_mode: 'url' for image URL, 'b64' for base64 + local save.

Returns: dict with url, local_path, model, size, n, images.
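
The constraints listed under Args can be checked before the tool is called. The following is a minimal sketch, not the server's own code: the helper name build_image_to_image_args is hypothetical, the WIDTHxHEIGHT size pattern is an assumption inferred from the example sizes, and the defaults for model, size, and return_mode are the ones shown in the Input Schema on this page.

```python
import re

# Allowed values copied from the tool description; the helper itself is hypothetical.
ALLOWED_MODELS = {"agnes-image-2.0-flash", "agnes-image-2.1-flash"}
ALLOWED_RETURN_MODES = {"url", "b64"}
# Assumption: sizes follow a WIDTHxHEIGHT pattern such as 1024x768.
SIZE_PATTERN = re.compile(r"^\d+x\d+$")


def build_image_to_image_args(prompt, images, model="agnes-image-2.1-flash",
                              size="1024x768", n=1, output_dir=None,
                              return_mode="url"):
    """Validate inputs against the documented constraints and return an arguments dict."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if isinstance(images, str) or not images:
        raise ValueError("images must be a list with at least one reference image URL")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if not SIZE_PATTERN.match(size):
        raise ValueError("size must look like WIDTHxHEIGHT, e.g. 1024x768")
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if return_mode not in ALLOWED_RETURN_MODES:
        raise ValueError("return_mode must be 'url' or 'b64'")

    args = {
        "prompt": prompt,
        "images": list(images),
        "model": model,
        "size": size,
        "n": n,
        "return_mode": return_mode,
    }
    # Leave output_dir out unless set, so the server can apply its own default.
    if output_dir is not None:
        args["output_dir"] = output_dir
    return args


example_args = build_image_to_image_args(
    "Turn this photo into a watercolor painting",
    ["https://example.com/reference.png"],  # placeholder URL
)
```

Omitting output_dir when it is not set is a design choice in this sketch: it lets the documented default (~/agnes_output) apply on the server side instead of being duplicated in the client.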

Input Schema

Name         Required  Description  Default
n            No
size         No                     1024x768
model        No                     agnes-image-2.1-flash
images       Yes
prompt       Yes
output_dir   No
return_mode  No                     url
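
For context, an MCP client passes these parameters as the arguments of a JSON-RPC tools/call request. The sketch below only builds that message; the helper name make_tools_call, the request id, and the example prompt and image URL are placeholders, and transport (stdio or HTTP) is left to the client library in use.

```python
import json


def make_tools_call(request_id, arguments, tool_name="image_to_image"):
    """Build a JSON-RPC 2.0 message that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


payload = make_tools_call(
    1,
    {
        # Only the two required parameters are set; the rest fall back to defaults.
        "prompt": "Render the reference photo as a pencil sketch",
        "images": ["https://example.com/reference.png"],  # placeholder URL
    },
)

print(json.dumps(payload, indent=2))
```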

Output Schema


No arguments

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It discloses that the tool generates images, accepts image URLs, and returns a dict with url, local_path, etc. It mentions saving locally and return modes. It does not cover potential rate limits or file size constraints, but the core behavior is transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with 'Args' and 'Returns' sections, making it easy to parse. It is concise, with only minor redundancy in the opening sentences. Every sentence provides value, though a slight trim could improve focus.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters (2 required) and an output schema, the description covers all inputs and outputs. It explains each parameter's role and the return format, and it already specifies that 'images' is a list of URLs. No critical gaps are present. Overall, it is complete enough for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning no parameter descriptions in the schema. However, the tool description lists all 7 parameters with explanations, defaults, and allowed values (e.g., model names, size formats). This adds significant meaning beyond the bare schema, fulfilling the compensation requirement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate new image(s) based on reference image(s) and a text prompt.' It uses specific verbs ('generate') and resources ('image(s)'), and distinguishes from sibling tools like text_to_image, which lacks reference images. The title is absent but the description compensates.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that reference images are required and a text prompt guides generation. It lists parameters and their defaults, providing context. However, it does not explicitly state when to use this tool over alternatives (e.g., text_to_image) or when not to use it. Still, the purpose is clear enough for an agent to infer appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/MSWEIMZ/agnes-mcp'
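
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not state explicitly.

```python
import json
import urllib.request

# URL taken from the curl command above.
SERVER_URL = "https://glama.ai/api/mcp/v1/servers/MSWEIMZ/agnes-mcp"


def build_request(url=SERVER_URL):
    """Create a GET request equivalent to the curl command."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url=SERVER_URL, timeout=10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_server_info() performs the network request; build_request() alone can be used to inspect the request without sending it.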

If you have feedback or need assistance with the MCP directory API, please join our Discord server.