
image_multi_reference

Combine 2-10 reference images and a text prompt to generate a single new image that blends visual elements from all inputs. Ideal for merging multiple styles or product angles into one coherent output.

Instructions

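Multi-image fusion reference → outputs 1 new image (supports 1K and an unstable true 2K; 4K is disabled).

[WHAT] Takes 2-10 reference images plus a prompt; the model combines the visual information from all of them and draws 1 entirely new image. The essential difference from image_batch_edit: batch is N in, N out (each image is edited independently), whereas this tool is N in, 1 out (combined reference).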

[WHEN TO USE]

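  • User: "These are different angles of the same product; draw a new angle in this style" → use this tool.

  • User: "These are styles I like; draw an X in a similar style" → use this tool.

  • User: "This is the main logo image and this is a supporting image; make them into a poster" → use this tool.

  • If the user only wants to edit the images one by one → use image_batch_edit instead.

  • If the user has only 1 image → use image_edit instead.

  • If the user provided no reference images → use image_generate instead.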
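The routing rules above can be sketched as a small decision function. This is only an illustration: `choose_tool` and its parameters are hypothetical names, and raising an error above 10 images is an assumption based on the 2-10 limit stated for this tool.

```python
def choose_tool(n_reference_images: int, edit_each_separately: bool = False) -> str:
    """Return the tool name suggested by the WHEN TO USE rules (illustrative only)."""
    if n_reference_images < 0:
        raise ValueError("n_reference_images must be non-negative")
    if n_reference_images == 0:
        return "image_generate"      # no reference images at all
    if n_reference_images == 1:
        return "image_edit"          # a single image to modify
    if edit_each_separately:
        return "image_batch_edit"    # N in, N out
    if n_reference_images > 10:
        # Assumption: image_multi_reference accepts at most 10 references.
        raise ValueError("image_multi_reference accepts 2-10 reference images")
    return "image_multi_reference"   # N in, 1 out


print(choose_tool(3))                             # image_multi_reference
print(choose_tool(3, edit_each_separately=True))  # image_batch_edit
```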

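[4K DISABLED] The origin consistently takes > 120s to process a 4K multi-image fusion and hits a CF (Cloudflare) 524 timeout, so such requests are rejected at the entry point. For true 4K multi-image fusion, use the two-step approach: this tool produces a 1K/2K fused image → image_generate(size="3840x2160") re-describes the same scene to upscale it to 4K.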
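A minimal sketch of the two-step approach, assuming the two tools are reachable as Python callables that accept keyword arguments and return a dict with an `ok` field. The helper name `fuse_then_upscale`, the `scene_description` parameter, and the `prompt` argument of image_generate are assumptions made for illustration; only `size="3840x2160"` comes from the description above.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[..., Dict[str, Any]]


def fuse_then_upscale(
    multi_reference: Tool,
    generate: Tool,
    prompt: str,
    image_paths: List[str],
    scene_description: str,
) -> Dict[str, Any]:
    """Step 1: fuse the references at 2K. Step 2: re-describe the scene at 4K."""
    fused = multi_reference(prompt=prompt, image_paths=image_paths, size="2048x2048")
    if not fused.get("ok"):
        # Do not spend a 4K generation if the fusion step already failed.
        return {"fused": fused, "upscaled": None}
    # Step 2 is text-only: the caller describes the fused scene in words.
    upscaled = generate(prompt=scene_description, size="3840x2160")
    return {"fused": fused, "upscaled": upscaled}
```

Because the second step works from a text description rather than from the fused image itself, the 4K result should be expected to resemble the fused scene, not reproduce it exactly.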

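[ROUTING] (dual path + automatic fallback)

  • Primary path: /v1/images/generations + image_urls=[...] (a Micu (米醋) extension field; size actually takes effect)

  • Fallback: /v1/chat/completions + top-level image_urls + stream:true SSE (never hits CF 524, but size has no effect and the output is ~1.57MP)

  • Automatic pro lock: max edge ≥1600 → gpt-image-2-pro

  • Primary path fails with 5xx/524 → automatic fallback to the chat stream, with the downgrade reason recorded in notes

  • The returned used_fallback field indicates which path was taken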
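The routing rules above could look roughly like the following. This is a sketch under stated assumptions, not the server's actual code: `UpstreamError`, `pick_model`, and `generate_with_fallback` are hypothetical names, and the two paths are passed in as plain callables so the control flow can be shown without any network access.

```python
from typing import Any, Callable, Dict

PRO_EDGE_THRESHOLD = 1600  # from the routing notes: max edge >= 1600 locks the pro model


class UpstreamError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def pick_model(size: str, requested: str = "gpt-image-2") -> str:
    """Switch to gpt-image-2-pro when the longest edge reaches the threshold."""
    width, height = (int(part) for part in size.lower().split("x"))
    if max(width, height) >= PRO_EDGE_THRESHOLD:
        return "gpt-image-2-pro"
    return requested


def generate_with_fallback(
    primary: Callable[[], Dict[str, Any]],
    fallback: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Try the primary path; on a 5xx status (524 included) use the fallback."""
    extra_notes = []
    try:
        result = dict(primary())
        used_fallback = False
    except UpstreamError as exc:
        if not 500 <= exc.status <= 599:
            raise  # 4xx and other statuses are not retried in this sketch
        result = dict(fallback())
        used_fallback = True
        extra_notes.append(
            f"primary path failed with HTTP {exc.status}; "
            "fell back to chat stream (size is ignored there)"
        )
    result["used_fallback"] = used_fallback
    result["notes"] = list(result.get("notes", [])) + extra_notes
    return result
```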

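[LIMITS] (actual state at present; subject to change)

  • image_paths length: 2-10 images.

  • 1K is stable: the primary path takes ~30-100s; size=1024² actually outputs ~1.57MP.

  • 2K is unstable: the primary path (generations + image_urls) intermittently returns HTTP 500 "系统繁忙" ("system busy") from the Micu backend; once fallback triggers, the request goes through the chat stream, the size field is ignored, and the output is still ~1.57MP.

  • Each reference image should be ≤2MB; total base64 size ≤8MB (the Micu proxy limit was measured at about 10MB).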
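A pre-flight check against these limits might look like the sketch below. The function names are hypothetical, and reading "MB" as MiB (1024 × 1024 bytes) is an assumption; the only fact relied on beyond the limits themselves is that padded base64 encodes every 3 raw bytes as 4 characters.

```python
from typing import List

MIN_IMAGES, MAX_IMAGES = 2, 10
MIB = 1024 * 1024                  # assumption: "MB" is read as MiB
MAX_IMAGE_BYTES = 2 * MIB          # recommended per-image ceiling
MAX_TOTAL_BASE64_BYTES = 8 * MIB   # ceiling on the combined base64 payload


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes raw bytes."""
    return 4 * ((n_bytes + 2) // 3)


def check_references(raw_sizes: List[int]) -> List[str]:
    """Return human-readable problems; an empty list means the limits are met."""
    problems = []
    if not MIN_IMAGES <= len(raw_sizes) <= MAX_IMAGES:
        problems.append(f"expected 2-10 reference images, got {len(raw_sizes)}")
    for index, size in enumerate(raw_sizes):
        if size > MAX_IMAGE_BYTES:
            problems.append(f"image {index} is {size} bytes, above the 2MB recommendation")
    total = sum(base64_length(size) for size in raw_sizes)
    if total > MAX_TOTAL_BASE64_BYTES:
        problems.append(f"total base64 size is {total} bytes, above the 8MB limit")
    return problems
```

Note that the total limit can be exceeded even when every image is under 2MB: five images of 1.5MiB each encode to roughly 10MiB of base64.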

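Args:
    prompt: Combined instruction. Example: "combine the colors from img1 and the composition from img2 into a sunset cityscape".
    image_paths: Paths to 2-10 reference images (absolute or relative).
    size: Output size. Actually takes effect (no longer ignored as it was on the old chat path). Recommended: "1024x1024" (1.57MP bonus) / "2048x2048" (true 2K, may fall back to 1.57MP). "3840x2160" / "2160x3840" are disabled (they hit the hard CF 524 limit) and are rejected on input. Default "1024x1024".
    model: "gpt-image-2" (default) / "gpt-image-2-pro" (required for ≥2K; switched automatically).
    save_dir: Output directory (must be under the safe root directory).
    basename: Filename prefix (only [A-Za-z0-9_-.]; values containing / or .. are rejected). Default "multiref_".
    api_key: Overrides MICU_API_KEY; base_url is locked to the startup env and is not accepted at runtime.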
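The basename and size constraints can be illustrated with a short validation sketch. The function names and error messages are hypothetical. One detail worth noting: written literally as a Python character class, `[A-Za-z0-9_-.]` is rejected by `re` as a bad range, so the sketch places the hyphen last.

```python
import re

# Allowed characters per the Args description; the hyphen goes last so that
# it is treated as a literal character rather than a range operator.
BASENAME_RE = re.compile(r"[A-Za-z0-9_.-]+")
DISABLED_SIZES = {"3840x2160", "2160x3840"}


def validate_basename(basename: str = "multiref_") -> str:
    """Reject path separators, parent-directory references, and other characters."""
    if "/" in basename or ".." in basename:
        raise ValueError(f"basename must not contain '/' or '..': {basename!r}")
    if not BASENAME_RE.fullmatch(basename):
        raise ValueError(f"basename has characters outside [A-Za-z0-9_.-]: {basename!r}")
    return basename


def validate_size(size: str = "1024x1024") -> str:
    """Reject the 4K sizes that the tool refuses at the entry point."""
    if size in DISABLED_SIZES:
        raise ValueError(f"size={size} (4K) is disabled in image_multi_reference")
    return size
```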

Returns:
    dict containing:
    ok (bool): Whether the call succeeded.
    model (str): The model actually used.
    n_references (int): Number of reference images actually embedded.
    saved (dict): { path, size_bytes, actual_size, actual_megapixels }.
    notes (list[str]): Decisions and hints.
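Since a 2K request may silently come back at ~1.57MP, a caller may want to compare the requested size with `saved.actual_megapixels`. The sketch below assumes that a megapixel means 10^6 pixels and that the field is a float; the helper names and the 0.9 tolerance are illustrative choices, not part of the tool.

```python
from typing import Any, Dict


def requested_megapixels(size: str) -> float:
    """Convert a "WIDTHxHEIGHT" string to megapixels (10**6 pixels)."""
    width, height = (int(part) for part in size.lower().split("x"))
    return width * height / 1_000_000


def fell_short(result: Dict[str, Any], size: str, tolerance: float = 0.9) -> bool:
    """True when the saved image is clearly smaller than the requested size."""
    actual = result["saved"]["actual_megapixels"]
    return actual < tolerance * requested_megapixels(size)


# Hypothetical result shaped like the Returns description above.
example = {"ok": True, "saved": {"actual_megapixels": 1.57}, "notes": []}
print(fell_short(example, "2048x2048"))  # True: 1.57MP is far below ~4.19MP
print(fell_short(example, "1024x1024"))  # False: 1.57MP exceeds ~1.05MP
```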

Examples:

# 1K multi-reference
image_multi_reference(
    prompt="combine these into a single cinematic poster",
    image_paths=["/p/sketch.png", "/p/character.png", "/p/background.png"],
)

# 2K multi-reference (unstable; may fall back to 1.57MP)
image_multi_reference(
    prompt="merge the architecture style from img1 with the lighting from img2",
    image_paths=["/p/img1.jpg", "/p/img2.jpg"],
    size="2048x2048",
)

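Common errors:
    "至少需要 2 张参考图" ("at least 2 reference images are required") → for 1 image, use image_edit.
    "请求体超 X MB" ("request body exceeds X MB") → reduce the number of images or compress them first.
    "size=3840x2160 (4K) 在 image_multi_reference 已禁用" ("size=3840x2160 (4K) is disabled in image_multi_reference") → 4K multi-image fusion inevitably hits CF 524; use the two-step approach: produce the fused image at 1K/2K first → image_generate(size="3840x2160") to upscale to 4K.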

Input Schema

Name | Required | Description | Default
prompt | Yes | |
image_paths | Yes | |
size | No | | 1024x1024
model | No | |
save_dir | No | |
basename | No | |
api_key | No | |

Output Schema

Name | Required | Description | Default

No arguments

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description fully carries the burden. Discloses dual-path routing with fallback, 4K disablement due to CF 524, size limitations (1K stable, 2K unstable), and the actual output resolution during fallback. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-organized with sections: summary, what, when, limits, args, returns, examples, errors. Front-loaded with essential purpose. Every section adds value without redundancy. Concise for the complexity covered.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complex tool with multiple paths, fallback, and size behavior. Description covers all aspects: input requirements, process, output schema, error messages, and workarounds. With output schema present, return values are fully explained. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the 'Args' section thoroughly explains each parameter: prompt, image_paths (count constraint), size (recommendations and disallowed values), model, save_dir, basename, api_key. Includes defaults, examples, and common errors. Fully compensates for missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb '多图融合参考' (multi-image fusion reference) and resource '输出 1 张新图' (outputs 1 new image). Distinguished from sibling 'image_batch_edit' by explaining N-in-1-out vs N-in-N-out. Purpose is immediately understandable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'WHEN TO USE' section with specific user commands and a 'when not to use' section directing to alternative tools (image_batch_edit, image_edit, image_generate). Provides clear decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Subaru486desuwa/micu-image-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.