xpay✦ Media Studio
Server Details
25+ AI media generation tools — FLUX Pro, Ideogram v3, Recraft v3, Stable Diffusion XL, MiniMax video, and Kokoro TTS. Images, video, and audio from one server. $0.01/call.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.2/5 across 8 of 8 tools scored.
Most tools are clearly distinct by model/API (e.g., FLUX Dev vs. FLUX Pro vs. Ideogram), but some overlap exists in image generation capabilities where multiple tools serve similar purposes (e.g., FLUX Pro, Recraft v3, and SDXL all target high-quality image generation). The descriptions help differentiate them by emphasizing specific strengths like speed, photorealism, or text rendering.
Naming is inconsistent. All eight names are snake_case, but some carry a long vendor prefix (black_forest_labs_flux_dev, stability_ai_sdxl) while others use a shorter model-and-version form (ideogram_v3, kokoro_tts, minimax_video_01). There is no uniform verb_noun pattern, which makes tool names harder to predict or remember.
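Naming claims like these can be checked mechanically. The sketch below tests the eight tool names listed on this page against a snake_case pattern and against a small verb list. The verb list is a hypothetical stand-in for a verb_noun convention, not something the server defines.

```python
import re

# Tool names as listed on this page.
TOOL_NAMES = [
    "black_forest_labs_flux_dev",
    "black_forest_labs_flux_pro",
    "black_forest_labs_flux_schnell",
    "ideogram_v3",
    "kokoro_tts",
    "minimax_video_01",
    "recraft_v3",
    "stability_ai_sdxl",
]

# Lowercase letters or digits, joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*")

# Hypothetical verb list for a verb_noun check (an assumption for illustration).
VERBS = {"generate", "create", "synthesize", "render"}


def is_snake_case(name: str) -> bool:
    """True if the whole name matches the snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


def starts_with_verb(name: str) -> bool:
    """True if the first underscore-separated word is in VERBS."""
    return name.split("_", 1)[0] in VERBS


snake = [n for n in TOOL_NAMES if is_snake_case(n)]
verb_first = [n for n in TOOL_NAMES if starts_with_verb(n)]
print(f"{len(snake)} of {len(TOOL_NAMES)} names are snake_case")
print(f"{len(verb_first)} of {len(TOOL_NAMES)} names start with a verb")
```

With these inputs, every name passes the snake_case check and none starts with a listed verb, so the variation is in prefix length and structure rather than in casing.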
With 8 tools, the count is well-scoped for a media studio server covering image generation, text-to-speech, and video creation. Each tool appears to serve a distinct model or API, and the number is reasonable for the domain without being overwhelming or too sparse.
The tool set covers core media generation tasks (images, speech, video) but has notable gaps. There are no tools for editing, processing, or managing generated media (e.g., resize, filter, delete), and coverage is limited to specific models without broader operations like listing available models or checking generation status. This could lead to agent failures in more complex workflows.
Available Tools
8 tools
black_forest_labs_flux_dev (B)
FLUX Dev — balanced quality and speed for image generation
| Name | Required | Description | Default |
|---|---|---|---|
| seed | No | Random seed. Set for reproducible generation | |
| image | No | Input image for image to image mode. The aspect ratio of your output will match this image | |
| prompt | Yes | Prompt for generated image | |
| go_fast | No | Run faster predictions with model optimized for speed (currently fp8 quantized); disable to run in original bf16. Note that outputs will not be deterministic when this is enabled, even if you set a seed. | |
| guidance | No | Guidance for generated image | |
| megapixels | No | Approximate number of megapixels for generated image | 1 |
| num_outputs | No | Number of outputs to generate | |
| aspect_ratio | No | Aspect ratio for the generated image | 1:1 |
| output_format | No | Format of the output images | webp |
| output_quality | No | Quality when saving the output images, from 0 to 100. 100 is best quality, 0 is lowest quality. Not relevant for .png outputs | |
| prompt_strength | No | Prompt strength when using img2img. 1.0 corresponds to full destruction of information in image | |
| num_inference_steps | No | Number of denoising steps. Recommended range is 28-50; fewer steps produce lower-quality outputs, faster. | |
| disable_safety_checker | No | Disable safety checker for generated images. | |
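As a rough illustration of how these parameters reach the server, the sketch below wraps a set of arguments in an MCP `tools/call` JSON-RPC request. The helper name `build_tool_call` and all argument values are assumptions for illustration; the table lists parameter names and defaults but not their types, so the types shown here are guesses.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Illustrative values only; value types are assumed, not documented here.
request = build_tool_call(
    "black_forest_labs_flux_dev",
    {
        "prompt": "a lighthouse at dusk, long exposure",
        "aspect_ratio": "1:1",    # default listed in the table
        "output_format": "webp",  # default listed in the table
        "seed": 42,
        "go_fast": False,  # the table notes go_fast makes outputs non-deterministic
    },
)
print(json.dumps(request, indent=2))
```

Only `prompt` is required; the other keys can be omitted to fall back on the server's defaults.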
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden. It fails to disclose critical behavioral traits present in the schema: img2img capability (via 'image' parameter), non-determinism when 'go_fast' is enabled, safety checker behavior, or output format options. Only the quality/speed characteristic is mentioned.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
While the single phrase is efficient and front-loads the model identity, it is inappropriately terse for a 13-parameter tool with complex modes (img2img, safety controls). The description underserves the tool's complexity despite not wasting words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex media generation tool with 13 parameters, img2img support, and safety features—but no annotations or output schema—the description is insufficient. It omits operational context that would help an agent understand capabilities, limitations, and side effects.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description adds no parameter-specific context beyond the schema (e.g., no explanation of prompt_strength vs guidance, or the implications of go_fast). It meets but does not exceed the schema's documentation level.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the tool as an image generator and positions it with 'balanced quality and speed,' implicitly distinguishing it from the sibling 'pro' (quality-focused) and 'schnell' (speed-focused) variants. However, it lacks explicit differentiation statements comparing it to siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'balanced quality and speed' implies when to use this tool (middle ground), but provides no explicit when-to-use guidance, prerequisites, or named alternatives. The agent must infer the trade-off positioning.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
black_forest_labs_flux_pro (B)
FLUX 1.1 Pro — highest quality photorealistic image generation
| Name | Required | Description | Default |
|---|---|---|---|
| seed | No | Random seed. Set for reproducible generation | |
| width | No | Width of the generated image in text-to-image mode. Only used when aspect_ratio=custom. Must be a multiple of 32 (if it's not, it will be rounded to nearest multiple of 32). Note: Ignored in img2img and inpainting modes. | |
| height | No | Height of the generated image in text-to-image mode. Only used when aspect_ratio=custom. Must be a multiple of 32 (if it's not, it will be rounded to nearest multiple of 32). Note: Ignored in img2img and inpainting modes. | |
| prompt | Yes | Text prompt for image generation | |
| aspect_ratio | No | Aspect ratio for the generated image | 1:1 |
| image_prompt | No | Image to use with Flux Redux. This is used together with the text prompt to guide the generation towards the composition of the image_prompt. Must be jpeg, png, gif, or webp. | |
| output_format | No | Format of the output images. | webp |
| output_quality | No | Quality when saving the output images, from 0 to 100. 100 is best quality, 0 is lowest quality. Not relevant for .png outputs | |
| safety_tolerance | No | Safety tolerance, 1 is most strict and 6 is most permissive | |
| prompt_upsampling | No | Automatically modify the prompt for more creative generation | |
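The `width` and `height` rows say that values are rounded to the nearest multiple of 32. A minimal sketch of that arithmetic follows, which a client could use to predict the size it will actually get. The function name is hypothetical, and the handling of exact ties (a remainder of 16) is an assumption, since the table does not specify it.

```python
def round_to_multiple_of_32(value: int) -> int:
    """Round a pixel dimension to the nearest multiple of 32.

    Assumption: exact ties (remainder 16) round up; the table does not say.
    """
    return (value + 16) // 32 * 32


for requested in (1000, 1024, 1050):
    print(requested, "->", round_to_multiple_of_32(requested))
```

Under these assumptions a requested width of 1000 becomes 992, 1024 is unchanged, and 1050 becomes 1056. Per the table, the values only apply when `aspect_ratio` is `custom`.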
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but only delivers the 'highest quality' positioning. It omits critical behavioral aspects: it doesn't mention the safety_tolerance filtering (range 1-6), the img2img and inpainting modes revealed in parameter descriptions, potential latency/cost implications of 'highest quality,' or what happens to generated images (temporary vs. persistent URLs).
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (single sentence/phrase) with zero filler words. Information is front-loaded with the model identifier and quality claim. However, for a tool with 10 parameters and multiple modes (text-to-image, img2img, inpainting), the brevity borders on under-specification, preventing a score of 5.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 100% schema coverage, the description adequately identifies the tool's core function. However, it fails to mention the return value (image data/URL) despite the absence of an output schema, and ignores the multi-modal capabilities (img2img/inpainting) evident in the parameter documentation. For a complex generation tool with safety controls, this is minimally viable but incomplete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds no explicit parameter guidance (e.g., it doesn't mention the seed for reproducibility or safety_tolerance for content filtering), but the schema comprehensively documents all 10 parameters including constraints like 'multiple of 32' for dimensions. No additional value is added by the description text itself.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the tool as an image generation resource using the specific 'FLUX 1.1 Pro' model and positions it as 'highest quality photorealistic,' which distinguishes it from sibling tools like flux_dev, flux_schnell, and other image generators. However, it lacks explicit differentiation criteria (e.g., speed vs. quality trade-offs) to definitively guide selection among the multiple image generation options available.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'highest quality photorealistic' implies usage for high-fidelity realistic imagery, providing implicit context for when to select this over lower-tier or non-photorealistic alternatives. However, it fails to explicitly state when to use this tool versus the other image generation siblings (e.g., ideogram_v3, recraft_v3) or mention specific prerequisites like API credits or rate limits.
black_forest_labs_flux_schnell (A)
FLUX Schnell — fastest image generation, 1-4 steps
| Name | Required | Description | Default |
|---|---|---|---|
| seed | No | Random seed. Set for reproducible generation | |
| prompt | Yes | Prompt for generated image | |
| go_fast | No | Run faster predictions with model optimized for speed (currently fp8 quantized); disable to run in original bf16. Note that outputs will not be deterministic when this is enabled, even if you set a seed. | |
| megapixels | No | Approximate number of megapixels for generated image | 1 |
| num_outputs | No | Number of outputs to generate | |
| aspect_ratio | No | Aspect ratio for the generated image | 1:1 |
| output_format | No | Format of the output images | webp |
| output_quality | No | Quality when saving the output images, from 0 to 100. 100 is best quality, 0 is lowest quality. Not relevant for .png outputs | |
| num_inference_steps | No | Number of denoising steps. 4 is recommended; fewer steps produce lower-quality outputs, faster. | |
| disable_safety_checker | No | Disable safety checker for generated images. | |
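Two rows in this table interact: `seed` promises reproducible output, while `go_fast` removes determinism even when a seed is set. The sketch below is a hypothetical client-side check (not part of the server) that flags this combination, along with step counts outside the 1-4 range given in the tool description. Because the table does not list a default for `go_fast`, an unset value is reported rather than guessed.

```python
def schnell_warnings(args: dict) -> list[str]:
    """Return human-readable warnings for a FLUX Schnell argument dict."""
    warnings = []

    steps = args.get("num_inference_steps")
    if steps is not None and not 1 <= steps <= 4:
        warnings.append(
            f"num_inference_steps={steps} is outside the described 1-4 range"
        )

    if "seed" in args:
        go_fast = args.get("go_fast")
        if go_fast is True:
            warnings.append("seed is set but go_fast=True, so output may still vary")
        elif go_fast is None:
            warnings.append(
                "seed is set but go_fast is unset; its default is not documented here"
            )

    return warnings


print(schnell_warnings({"prompt": "x", "seed": 7, "go_fast": True}))
print(schnell_warnings({"prompt": "x", "seed": 7, "go_fast": False}))
```

The same seed and `go_fast` check would apply to black_forest_labs_flux_dev, whose table documents the same interaction.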
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full disclosure burden but only mentions speed characteristics. It omits critical behavioral context: the non-deterministic nature when go_fast is enabled, safety checker implications, output delivery method (URL vs binary), and resource costs.
Is the description appropriately sized, front-loaded, and free of redundancy?
At seven words, the description is maximally efficient with zero redundancy. It front-loads the model identity (FLUX Schnell), key differentiator (fastest), function (image generation), and technical constraint (1-4 steps) without extraneous filler.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 100% schema coverage, the description adequately covers the tool's value proposition but leaves gaps regarding output format (no output schema exists) and behavioral safety characteristics that would help an agent understand the full implications of invocation.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description references '1-4 steps' which aligns with the num_inference_steps parameter constraints, but adds no additional semantic depth beyond what the schema already provides.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool performs 'image generation' (specific verb + resource) and distinguishes itself from siblings via 'fastest' and '1-4 steps', positioning it against flux_dev and flux_pro which are typically slower/higher quality variants.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While 'fastest' implies this tool should be selected when speed is prioritized over quality, the description lacks explicit guidance on when to choose alternatives (e.g., flux_dev for higher fidelity) or when to disable go_fast for determinism.
ideogram_v3 (C)
Ideogram v3 Quality — AI image generation with best-in-class text rendering
| Name | Required | Description | Default |
|---|---|---|---|
| mask | No | A black and white image. Black pixels are inpainted, white pixels are preserved. The mask will be resized to match the image size. | |
| seed | No | Random seed. Set for reproducible generation | |
| image | No | An image file to use for inpainting. You must also use a mask. | |
| prompt | Yes | Text prompt for image generation | |
| resolution | No | Resolution. Overrides aspect ratio. Ignored if an inpainting image is given. | None |
| style_type | No | The styles help define the specific aesthetic of the image you want to generate. | None |
| aspect_ratio | No | Aspect ratio. Ignored if a resolution or inpainting image is given. | 1:1 |
| style_preset | No | Apply a predefined artistic style to the generated image (V3 models only). | None |
| magic_prompt_option | No | Magic Prompt will interpret your prompt and optimize it to maximize variety and quality of the images generated. You can also use it to write prompts in different languages. | Auto |
| style_reference_images | No | A list of images to use as style references. | |
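The table implies a precedence order for output sizing: an inpainting image wins over everything, then `resolution`, then `aspect_ratio`. The sketch below encodes that order as read from the parameter descriptions. The function name is hypothetical, and treating the literal string `"None"` as unset is an assumption based on the Default column.

```python
def sizing_source(args: dict) -> str:
    """Return which parameter determines the output size for ideogram_v3."""
    has_image = args.get("image") is not None
    if has_image and args.get("mask") is None:
        # The table states that an inpainting image must come with a mask.
        raise ValueError("image requires a mask for inpainting")
    if has_image:
        return "image"  # resolution and aspect_ratio are ignored
    if args.get("resolution") not in (None, "None"):
        return "resolution"  # overrides aspect_ratio
    return "aspect_ratio"  # default listed as 1:1


print(sizing_source({"prompt": "poster", "aspect_ratio": "16:9"}))
print(sizing_source({"prompt": "poster", "resolution": "1024x1024", "aspect_ratio": "16:9"}))
print(sizing_source({"prompt": "fix sky", "image": "in.png", "mask": "m.png"}))
```

The resolution string and file names are placeholders; the table does not list accepted values.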
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the text rendering capability but fails to disclose operational traits such as output format (URL vs base64), whether generation is synchronous, resource costs, rate limits, or the fact that it supports inpainting (evident in schema parameters 'mask' and 'image').
Is the description appropriately sized, front-loaded, and free of redundancy?
The single sentence is efficiently structured without filler words, front-loading the model identifier followed immediately by the function and key differentiator. It achieves clarity with minimal length, though the brevity contributes to informational gaps in other dimensions.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex creative tool with 10 parameters and no output schema, the description is incomplete. It omits that the tool supports inpainting (despite relevant parameters), fails to describe the return value, and doesn't address error handling or generation limits that would help an agent orchestrate this tool effectively.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the structured documentation already explains all 10 parameters adequately. The description adds no additional parameter guidance (e.g., explaining the relationship between resolution/aspect_ratio overrides or inpainting requirements), meeting the baseline expectation for well-documented schemas.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'AI image generation' as the core function and identifies the specific resource being manipulated. It distinguishes itself from sibling image generation tools (Flux, SDXL, Recraft) by highlighting 'best-in-class text rendering' as a unique capability, though it doesn't explicitly name siblings for comparison.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides minimal guidance on when to use this tool versus alternatives. While 'best-in-class text rendering' implies a use case (images containing text), there are no explicit when-to-use conditions, prerequisites, or comparisons to the seven sibling tools available on the server.
kokoro_tts (A)
Kokoro 82M — fast, natural-sounding text-to-speech in multiple voices
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text input (long text is automatically split) | |
| speed | No | Speech speed multiplier (0.5 = half speed, 2.0 = double speed) | |
| voice | No | Voice to use for synthesis | af_bella |
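The `speed` row describes a multiplier, so the expected clip length scales inversely with it. The sketch below works through that arithmetic under the assumption that the multiplier applies uniformly to the whole clip. The function name is hypothetical, and real durations will depend on the model's output.

```python
def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length given its length at speed 1.0 and a speed multiplier."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


# A clip that lasts 10 s at normal speed:
print(estimated_duration(10.0, 0.5))  # half speed, so roughly twice as long
print(estimated_duration(10.0, 2.0))  # double speed, so roughly half as long
```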
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It adds model context ('82M') and performance traits ('fast'), but omits critical behavioral details: output format (audio bytes vs URL), synchronous vs async operation, and whether results are cached. Schema notes automatic text splitting, which is helpful behavioral context.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence with zero waste. Front-loaded with model identifier and key differentiators (fast, natural-sounding). Every word earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 3-parameter tool with complete schema documentation, description adequately covers the 'what' but lacks output specification (critical for TTS: does it return binary audio, a file path, or URL?). No output schema exists to compensate, creating a gap in invocation readiness.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage, establishing baseline 3. Description mentions 'multiple voices' (hinting at voice parameter) and 'fast' (relating to speed), but doesn't explain parameter semantics beyond what schema already provides (e.g., no guidance on voice selection criteria or speed trade-offs).
Does the description clearly state what the tool does and how it differs from similar tools?
Description explicitly states 'text-to-speech' with specific characteristics (fast, natural-sounding, multiple voices). It clearly distinguishes from image/video generation siblings (black_forest_labs_flux_dev, ideogram_v3, etc.) by specifying the audio/TTS domain.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit when-to-use or when-not-to-use guidance provided. While the TTS functionality is distinct from image/video siblings, the description doesn't clarify usage conditions like 'use for audio narration' or prerequisites like text length limits.
minimax_video_01 (B)
Video-01 — generate short cinematic videos from text prompts
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Text prompt for generation | |
| prompt_optimizer | No | Use prompt optimizer | |
| first_frame_image | No | First frame image for video generation. The output video will have the same aspect ratio as this image. | |
| subject_reference | No | An optional character reference image to use as the subject in the generated video (this will use the S2V-01 model) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but only offers minimal traits ('short', 'cinematic'). It omits critical generation context such as asynchronous processing time, output format/URL structure, rate limits, or whether the operation consumes credits/quota.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single-sentence description is efficiently front-loaded with the product identifier (Video-01) followed immediately by the core function. There is no redundant or extraneous text.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema and annotations, the description fails to specify the return value structure (e.g., video URL, base64, file reference) or delivery mechanism for the generated asset, which is essential context for a media generation tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, documenting all four parameters including the aspect ratio behavior of first_frame_image and the S2V-01 model trigger for subject_reference. The description adds no supplementary parameter guidance, meeting the baseline for high-coverage schemas.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (generate), resource type (short cinematic videos), and input method (text prompts). It effectively distinguishes this tool from its image-generation and TTS siblings by explicitly specifying 'videos'.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus the available image generation alternatives, nor does it explain when to use the subject_reference parameter (which triggers S2V-01) versus standard generation.
recraft_v3 (C)
Recraft v3 — professional design-quality image generation
| Name | Required | Description | Default |
|---|---|---|---|
| size | No | Width and height of the generated image. Size is ignored if an aspect ratio is set. | 1024x1024 |
| style | No | Style of the generated image. | any |
| prompt | Yes | Text prompt for image generation | |
| aspect_ratio | No | Aspect ratio of the generated image | Not set |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but only hints at output quality ('professional design-quality'). It fails to disclose critical traits like output format, rate limits, content safety policies, synchronous vs. asynchronous behavior, or whether generated images are persisted.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single-sentence description is efficiently structured and front-loaded with the model version and capability. However, given the complexity of the tool landscape (multiple image siblings) and lack of annotations, the extreme brevity arguably underserves the agent's need for decision-making context.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description is inadequate for the operational context. With five competing image generation siblings, no output schema, and zero annotations, the description must guide tool selection and set expectations for return values. The terse tagline does neither.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, documenting all four parameters (prompt, size, style, aspect_ratio) adequately. The description adds no semantic clarification beyond the schema, but given the complete schema coverage, this meets the baseline expectation without earning bonus points for additional context.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies this as an image generation tool with the qualifier 'professional design-quality,' which hints at high-fidelity output. However, it lacks explicit differentiation from the five sibling image generators (the three FLUX variants, ideogram_v3, and stability_ai_sdxl), leaving the agent to guess when Recraft is preferable over alternatives.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to select this tool versus the other image generation options available. Given a crowded sibling set that includes black_forest_labs_flux_dev, ideogram_v3, and stability_ai_sdxl, the absence of selection criteria forces the agent to make arbitrary choices.
stability_ai_sdxl (C)
SDXL — high-resolution image generation with fine-grained control
| Name | Required | Description | Default |
|---|---|---|---|
| mask | No | Input mask for inpaint mode. Black areas will be preserved, white areas will be inpainted. | |
| seed | No | Random seed. Leave blank to randomize the seed | |
| image | No | Input image for img2img or inpaint mode | |
| width | No | Width of output image | |
| height | No | Height of output image | |
| prompt | No | Input prompt | An astronaut riding a rainbow unicorn |
| refine | No | Which refine style to use | no_refiner |
| scheduler | No | scheduler | K_EULER |
| lora_scale | No | LoRA additive scale. Only applicable on trained models. | |
| num_outputs | No | Number of images to output. | |
| refine_steps | No | For base_image_refiner, the number of steps to refine, defaults to num_inference_steps | |
| guidance_scale | No | Scale for classifier-free guidance | |
| apply_watermark | No | Applies a watermark to enable determining if an image is generated in downstream applications. If you have other provisions for generating or deploying images safely, you can use this to disable watermarking. | |
| high_noise_frac | No | For expert_ensemble_refiner, the fraction of noise to use | |
| negative_prompt | No | Input Negative Prompt | |
| prompt_strength | No | Prompt strength when using img2img / inpaint. 1.0 corresponds to full destruction of information in image | |
| replicate_weights | No | Replicate LoRA weights to use. Leave blank to use the default weights. | |
| num_inference_steps | No | Number of denoising steps | |
| disable_safety_checker | No | Disable safety checker for generated images. This feature is only available through the API. See [https://replicate.com/docs/how-does-replicate-work#safety](https://replicate.com/docs/how-does-replicate-work#safety) | |
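As a rough illustration of how the parameters in the table above might be passed, the sketch below builds an MCP `tools/call` JSON-RPC request for `stability_ai_sdxl`. The helper name `build_sdxl_call` is hypothetical, and the argument values (dimensions, seed) are illustrative assumptions rather than documented defaults. The listing does not document the response format, so the sketch stops at constructing the request.

```python
import json


def build_sdxl_call(prompt, request_id=1, **options):
    """Build a JSON-RPC 2.0 `tools/call` request for the stability_ai_sdxl tool.

    `options` may hold any optional parameter from the table above
    (e.g. width, height, seed, negative_prompt). Values are passed through as-is.
    """
    arguments = {"prompt": prompt}
    # Drop unset options so the server can fall back to its own defaults.
    arguments.update({k: v for k, v in options.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "stability_ai_sdxl", "arguments": arguments},
    }


# Illustrative values only; the listing does not document defaults for these.
request = build_sdxl_call(
    "An astronaut riding a rainbow unicorn",
    width=1024,
    height=1024,
    seed=42,
    negative_prompt=None,  # left out of the payload because it is unset
)
print(json.dumps(request, indent=2))
```

Fixing `seed` is one way to make repeated calls comparable while tuning other parameters; leaving it out lets the server randomize it, as the table notes.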
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. While it mentions 'high-resolution' and 'fine-grained control,' it omits critical behavioral details: output format (URL vs base64), latency expectations, cost implications, safety checker behavior, or watermarking defaults (despite these being controllable via parameters).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise (7 words) with no filler or redundancy. While the single sentence is efficient and front-loaded, it is arguably undersized for a tool with 19 parameters and complex functionality, leaving substantial informational gaps.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (19 parameters), lack of output schema, presence of many direct siblings, and absence of annotations, the description is materially incomplete. It fails to explain return values, error modes, or the specific value proposition of SDXL versus competing models.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description mentions 'fine-grained control,' which vaguely alludes to the extensive parameter set (19 parameters), but adds no specific semantic meaning for individual parameters like 'refine,' 'scheduler,' or 'high_noise_frac' beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
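To make the idea of parameter relationships concrete, here is a minimal sketch of a client-side check for `stability_ai_sdxl` arguments. The rules are inferred only from the schema descriptions (for example, that `refine_steps` is described for `base_image_refiner` and `high_noise_frac` for `expert_ensemble_refiner`); they are assumptions, not documented server-side validation, and the function name `check_sdxl_arguments` is hypothetical.

```python
def check_sdxl_arguments(args):
    """Return warnings about parameter combinations that look inconsistent.

    The rules are inferred from the schema descriptions, not from documented
    server behaviour, so treat them as hints rather than hard errors.
    """
    warnings = []
    refine = args.get("refine", "no_refiner")  # default listed in the schema

    if "mask" in args and "image" not in args:
        warnings.append("mask is for inpaint mode and likely needs an input image")
    if "prompt_strength" in args and "image" not in args:
        warnings.append("prompt_strength only applies to img2img / inpaint")
    if "refine_steps" in args and refine != "base_image_refiner":
        warnings.append("refine_steps is described for base_image_refiner only")
    if "high_noise_frac" in args and refine != "expert_ensemble_refiner":
        warnings.append("high_noise_frac is described for expert_ensemble_refiner only")
    return warnings


# Hypothetical arguments: a mask without an image, and a refiner option
# supplied while `refine` is left at its default.
warnings = check_sdxl_arguments(
    {"prompt": "a lighthouse at dusk", "mask": "mask.png", "high_noise_frac": 0.8}
)
for message in warnings:
    print("warning:", message)
```

A description that stated these relationships directly would let an agent avoid such combinations without trial and error.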
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly identifies the core function (high-resolution image generation) and the specific model (SDXL). However, it fails to distinguish this tool from the six sibling image generation tools (Flux variants, Ideogram, Recraft), leaving users without criteria for selection.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this tool versus alternatives. With multiple competing image generation models available (Flux, Ideogram, etc.), the absence of selection criteria or prerequisites forces users to guess based on the model name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
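One possible way to produce the claim file is sketched below. It writes the structure shown above to `.well-known/glama.json` under a web root directory. The helper name `write_glama_claim` is hypothetical, and a temporary directory stands in for the real web root, which depends on how your server serves static files.

```python
import json
import tempfile
from pathlib import Path


def write_glama_claim(site_root, email):
    """Write .well-known/glama.json under `site_root` and return its path."""
    claim = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target = Path(site_root) / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(claim, indent=2) + "\n", encoding="utf-8")
    return target


# Assumption: a temporary directory stands in for the directory that your
# server exposes at the domain root. Replace the email with your Glama account email.
site_root = tempfile.mkdtemp()
path = write_glama_claim(site_root, "your-email@example.com")
print(path.read_text(encoding="utf-8"))
```

After deploying, the file should be reachable at `/.well-known/glama.json` on the server's domain so that Glama can detect it.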
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.