
text_to_image

Generate images from text prompts using ComfyUI's Stable Diffusion pipeline, with configurable parameters for seed, steps, CFG scale, and denoise strength.

Instructions

Generate an image from a prompt.

Args:
    prompt: The prompt to generate the image from.
    seed: The seed to use for the image generation.
    steps: The number of steps to use for the image generation.
    cfg: The CFG scale to use for the image generation.
    denoise: The denoise strength to use for the image generation.
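
As an illustration, an MCP client would invoke this tool with a `tools/call` request. The payload below is a hypothetical example of such a request; the argument values are illustrative choices, not server defaults:

```python
# Hypothetical MCP tools/call payload for text_to_image.
# The argument values are illustrative, not defaults.
call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "text_to_image",
        "arguments": {
            "prompt": "a watercolor painting of a lighthouse at dusk",
            "seed": 42,       # any integer; fixing it makes the output reproducible
            "steps": 20,      # sampler step counts of 20-50 are typical
            "cfg": 7.0,       # classifier-free guidance scale; ~7 is a common choice
            "denoise": 1.0,   # 1.0 = full generation from noise
        },
    },
}
```

All five arguments are required, since the schema marks none of them optional.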

Input Schema

Name      Required   Description   Default
prompt    Yes        —             —
seed      Yes        —             —
steps     Yes        —             —
cfg       Yes        —             —
denoise   Yes        —             —

Implementation Reference

  • The handler function for the text_to_image MCP tool: it is registered via @mcp.tool(), its type hints serve as the input schema, and its core logic delegates to ComfyUI.process_workflow.
    @mcp.tool()
    async def text_to_image(prompt: str, seed: int, steps: int, cfg: float, denoise: float) -> Any:
        """Generate an image from a prompt.
        
        Args:
            prompt: The prompt to generate the image from.
            seed: The seed to use for the image generation.
            steps: The number of steps to use for the image generation.
            cfg: The CFG scale to use for the image generation.
            denoise: The denoise strength to use for the image generation.
        """
        auth = os.environ.get("COMFYUI_AUTHENTICATION")
        comfy = ComfyUI(
            url=f'http://{os.environ.get("COMFYUI_HOST", "localhost")}:{os.environ.get("COMFYUI_PORT", 8188)}',
            authentication=auth
        )
        # update_workflow_params matches the "text" key for CLIPTextEncode nodes,
        # so the prompt is passed under that key.
        params = {"text": prompt, "seed": seed, "steps": steps, "cfg": cfg, "denoise": denoise}
        return_url = os.environ.get("RETURN_URL", "true").lower() == "true"
        images = await comfy.process_workflow("text_to_image", params, return_url=return_url)
        return images
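
    The handler resolves all of its connection settings from environment variables. A small standalone sketch of that resolution, mirroring the defaults used above (`comfyui_settings` is a hypothetical helper name, not part of this server):

```python
import os

def comfyui_settings(env=os.environ):
    """Resolve the environment variables the handler reads, with its defaults."""
    return {
        "url": f"http://{env.get('COMFYUI_HOST', 'localhost')}:{env.get('COMFYUI_PORT', 8188)}",
        "authentication": env.get("COMFYUI_AUTHENTICATION"),  # None means no auth header is sent
        "return_url": env.get("RETURN_URL", "true").lower() == "true",
    }
```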
  • Helper method in ComfyUI client that loads the 'text_to_image.json' workflow file when passed 'text_to_image' as string, updates params, queues the prompt to ComfyUI server, and waits for images via websocket.
    async def process_workflow(self, workflow: Any, params: Dict[str, Any], return_url: bool = False):
        if isinstance(workflow, str):
            workflow_path = os.path.join(os.environ.get("WORKFLOW_DIR", "workflows"), f"{workflow}.json")
            if not os.path.exists(workflow_path):
                raise Exception(f"Workflow {workflow} not found")
            with open(workflow_path, "r", encoding='utf-8') as f:
                prompt = json.load(f)
        else:
            prompt = workflow
    
        self.update_workflow_params(prompt, params)
    
        ws = websocket.WebSocket()
        ws_url = f"ws://{os.environ.get('COMFYUI_HOST', 'localhost')}:{os.environ.get('COMFYUI_PORT', 8188)}/ws?clientId={self.client_id}"
        
        if self.authentication:
            ws.connect(ws_url, header=[f"Authorization: {self.authentication}"])
        else:
            ws.connect(ws_url)
    
        try:
            images = self.get_images(ws, prompt, return_url)
            return images
        finally:
            ws.close()
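
    The `get_images` helper is not shown here. In ComfyUI's WebSocket protocol, execution of a queued prompt is finished when a message of type `executing` arrives with `node` set to null for that prompt id. A hedged sketch of that completion check (based on ComfyUI's published protocol, not this repository's code):

```python
import json

def is_execution_finished(raw_message: str, prompt_id: str) -> bool:
    """Return True when a ComfyUI websocket message signals that the
    given prompt has finished executing (type 'executing', node None)."""
    message = json.loads(raw_message)
    if message.get("type") != "executing":
        return False
    data = message.get("data", {})
    return data.get("node") is None and data.get("prompt_id") == prompt_id
```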
  • Helper method that updates specific nodes in the workflow JSON with the provided parameters (prompt as 'text' for CLIPTextEncode, seed/steps/cfg/denoise for KSampler), used by the text_to_image tool.
    def update_workflow_params(self, prompt, params):
        if not params:
            return
    
        for node in prompt.values():
            if node["class_type"] == "CLIPTextEncode" and "text" in params:
                if isinstance(node["inputs"]["text"], str):
                    node["inputs"]["text"] = params["text"]
            elif node["class_type"] == "KSampler":
                if "seed" in params:
                    node["inputs"]["seed"] = params["seed"]
                if "steps" in params:
                    node["inputs"]["steps"] = params["steps"]
                if "cfg" in params:
                    node["inputs"]["cfg"] = params["cfg"]
                if "denoise" in params:
                    node["inputs"]["denoise"] = params["denoise"]
            
            elif node["class_type"] == "LoadImage" and "image" in params:
                node["inputs"]["image"] = params["image"]
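
    To make the node matching concrete, here is a minimal, made-up API-format workflow fragment with the same update rules applied standalone. The node ids and input values are hypothetical, chosen only for illustration:

```python
# Minimal, made-up API-format workflow: node ids and inputs are illustrative.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "cfg": 8.0, "denoise": 1.0}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
}

# The same matching rules as update_workflow_params above.
params = {"text": "a red bicycle", "seed": 123, "steps": 30, "cfg": 7.0, "denoise": 0.9}
for node in workflow.values():
    if node["class_type"] == "CLIPTextEncode" and "text" in params:
        if isinstance(node["inputs"]["text"], str):
            node["inputs"]["text"] = params["text"]
    elif node["class_type"] == "KSampler":
        for key in ("seed", "steps", "cfg", "denoise"):
            if key in params:
                node["inputs"][key] = params[key]
```

After the loop, the CLIPTextEncode node carries the prompt text and the KSampler node carries the numeric parameters.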

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool generates an image but doesn't describe what happens (e.g., where the image is stored, if it's returned as data or a file, potential rate limits, or error conditions). For a tool with 5 parameters and no annotations, this is a significant gap in behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and concise. It opens with a clear purpose statement, followed by a short indented list of parameters with brief explanations. Every sentence earns its place, and there's no wasted text, making it easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description is incomplete. It doesn't explain what the tool returns (e.g., image data, file path, error messages), nor does it provide behavioral details like side effects or constraints. This leaves significant gaps for an AI agent to understand the tool fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds substantial meaning beyond the input schema, which has 0% description coverage. It lists all 5 parameters with brief explanations (e.g., 'The prompt to generate the image from'), providing essential semantic context that the schema lacks. However, it doesn't specify value ranges or units for numeric parameters like 'steps' or 'cfg'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate an image from a prompt.' This is a specific verb ('Generate') and resource ('image'), though it doesn't explicitly differentiate from sibling tools like 'download_image' or 'run_workflow_from_file'. The purpose is unambiguous but lacks sibling distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools or suggest scenarios where this tool is preferred over others, leaving the agent without context for selection among available options.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Overseer66/comfyui-mcp-server'
