ComfyUI MCP Server

by neutrinotek
server-plan.md
# ComfyUI MCP Server Implementation Plan

## Overview

This document outlines a comprehensive plan for implementing a Model Context Protocol (MCP) server that integrates with a locally hosted ComfyUI instance. The server will ingest workflow templates from a `/workflows` directory, interpret node semantics without relying on hard-coded IDs, and provide structured tools for large language models (LLMs) to customize and execute workflows. It will support real-time conversational updates, optional authentication to the ComfyUI backend, and batch execution scenarios where multiple models or stages run in sequence.

---

## 1. Project Scaffolding & Configuration

1. **Package layout**
   - Create a Python package at `src/comfyui_mcp/` with clear entry points for the MCP server runtime and shared utilities.
   - Add a `/workflows` directory at the repository root to store default `.json` workflow templates.
   - Include `docs/` (this plan) and potentially `examples/` for sample usage.
2. **Configuration management**
   - Provide a configuration file (e.g., `config.toml` or `settings.yaml`) or environment variables to define:
     - ComfyUI base URL (host, port). Allow overriding via CLI flags.
     - Optional API key/token for ComfyUI (even though the primary use case is local, support this for future flexibility).
     - Default workflow name and optional workflow metadata (description, tags, recommended assets).
     - Directories for checkpoints, LoRAs, VAEs, text encoders, embeddings.
     - Default parameter bounds (CFG, steps, resolution).
     - Feature toggles (e.g., enable real-time streaming, enable batch execution mode).
3. **Data layer utilities**
   - Implement a discovery layer to load all JSON workflows at startup, cache parsed versions, and optionally watch for file changes.
   - Validate workflow structure (node list, link definitions, metadata) using a schema validator.
   - Extract human-readable metadata for nodes if available; otherwise apply heuristics (see Section 2).

---

## 2. Workflow Introspection & Node Interpretation

1. **Graph parsing**
   - Parse each workflow JSON into an internal graph representation with indices by `id`, `class_type`, and available metadata (`title`, `meta`, etc.).
   - Capture input/output slot information to understand connections.
2. **Semantic indexing**
   - Build secondary indices for node roles: prompts (`CLIPTextEncode`, prompt nodes), model selection (`CheckpointLoader*`), LoRA loaders, VAE loaders, CLIP/text encoders, samplers (CFG, steps), resolution/latent size controllers, output nodes (preview/save), etc.
   - Normalize vendor-specific node names (including custom nodes) into canonical roles (e.g., `prompt_positive`, `prompt_negative`, `base_checkpoint`, `lora_list`).
3. **Heuristics for unlabeled nodes**
   - Infer node roles by analyzing topology (e.g., nodes feeding a `model` input on a sampler are checkpoint loaders).
   - Allow optional annotations in workflow JSON (e.g., `node_meta.display_name`) for manual overrides, but ensure defaults work without manual IDs.
   - Consult ComfyUI documentation and APIs when new node types appear; build an extensible mapping layer (e.g., a registry of known node signatures).
4. **Node role abstractions**
   - Create a normalization layer that surfaces canonical roles to the LLM, enabling instructions like “update the negative prompt” or “swap the base model” without referencing raw node IDs.

---

## 3. Template Mutation API

1. **High-level mutation functions**
   - Implement functions to modify prompts, base checkpoints, LoRAs (add/remove/adjust strength), VAEs, CLIP/text encoders, CFG/steps/scheduler, resolution/aspect ratios, seed control, and other common parameters.
   - Ensure functions handle both single-stage and multi-stage workflows (e.g., multiple samplers or chained models).
2. **Graph integrity**
   - Validate that mutations preserve required connections and update dependent nodes when necessary (e.g., ensuring encoder/decoder pairings match the chosen checkpoint).
   - Automatically insert auxiliary nodes when needed (e.g., injecting a LoRA loader before a sampler stage).
3. **Change tracking**
   - Maintain a diff summary showing before/after values for transparency and to inform the LLM of applied changes.
4. **Batch execution readiness**
   - Allow mutations to target specific stages or iterations (e.g., stage `A` vs. stage `B` in a multi-model workflow).
   - Support parameter sweeps or batched modifications when executing multiple variations.

---

## 4. Execution Pipeline

1. **ComfyUI client**
   - Build an HTTP/WebSocket client that can submit workflows, monitor queue status, receive progress updates, and download outputs.
   - Implement real-time streaming callbacks so conversational agents can receive incremental updates during execution.
2. **Handling outputs**
   - Inspect execution results for both `save_image` and `preview_image` nodes. If only previews are generated, retrieve the in-memory preview and optionally offer to save it upon request.
   - Structure output metadata with links or binary blobs so the LLM can present results.
3. **Batch execution and chaining**
   - Enable execution of workflows that produce multiple outputs (e.g., multiple sampler stages). Aggregate outputs per stage and provide context (e.g., which model generated each image).
   - Support batch submission for parameter variations or multi-model pipelines, returning a structured summary of results.
4. **Error handling**
   - Provide clear error states (missing assets, invalid parameters, API failures). Include recovery suggestions or fallback defaults where possible.

---

## 5. LLM-Facing Abstractions

1. **MCP tool definitions**
   - Expose structured tools such as:
     - `list_workflows()` → metadata about available templates.
     - `describe_workflow(name)` → semantic summary (node roles, adjustable parameters, required assets).
     - `customize_workflow(name, changes)` → apply mutations; return diff summary.
     - `execute_workflow(name, changes?, batch_options?, stream_updates?)` → optionally mutate, run, and stream results.
2. **Schema design**
   - Provide JSON schemas with enums and numeric constraints to help the LLM format requests (e.g., valid sampler types, CFG ranges).
   - Define structures for batch requests (e.g., arrays of parameter variations) and multi-stage workflows.
3. **Response format**
   - Return structured results containing execution metadata, change logs, and links/handles for generated images or previews.
   - Include conversational hints (e.g., suggested follow-up actions) to facilitate real-time interaction.

---

## 6. Model & Asset Management

1. **Asset discovery**
   - Scan configured directories for checkpoints, LoRAs, VAEs, text encoders, etc., caching metadata for quick lookup.
   - Optionally support metadata files (YAML/JSON) describing recommended usage, strengths, or compatible models.
2. **Validation & safeguards**
   - Validate requested assets exist; suggest closest matches if not found.
   - Enforce safe file paths to avoid arbitrary filesystem access.
3. **Dynamic updates**
   - Allow refreshing asset catalogs at runtime to pick up newly added files without restarting the server.

---

## 7. Safety, Robustness & Observability

1. **Input validation**
   - Enforce parameter bounds to prevent invalid sampler configurations.
   - Sanitize text inputs and ensure they meet ComfyUI API limits.
2. **Timeouts & retries**
   - Implement retry logic and configurable timeouts for network calls to ComfyUI.
   - Surface errors with actionable context.
3. **Logging & metrics**
   - Provide structured logging for requests, mutations, and execution results.
   - Optionally emit metrics (execution time, success rates) for observability.
4. **Authentication support**
   - Allow configuring optional API keys or tokens for ComfyUI endpoints, even though the default deployment is local.

---

## 8. Testing & Validation Strategy
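As a concrete starting point for the unit tests this section describes, the sketch below exercises the topology heuristics from Section 2 against a fixture workflow in ComfyUI's API (prompt) format. The function name `infer_roles` and the role keys are assumptions drawn from this plan, not an existing API.

```python
def infer_roles(workflow: dict) -> dict:
    """Map canonical roles to node ids without relying on hard-coded ids.

    Expects ComfyUI API format: {node_id: {"class_type": ..., "inputs": {...}}},
    where a linked input is encoded as [source_node_id, output_slot].
    """
    roles = {}
    for node_id, node in workflow.items():
        ctype = node.get("class_type", "")
        inputs = node.get("inputs", {})
        if ctype.startswith("CheckpointLoader"):
            roles["base_checkpoint"] = node_id
        elif ctype == "KSampler":
            roles["sampler"] = node_id
            # Prompt nodes are found by following the sampler's links,
            # not by their own ids or titles.
            pos, neg = inputs.get("positive"), inputs.get("negative")
            if isinstance(pos, list):
                roles["prompt_positive"] = pos[0]
            if isinstance(neg, list):
                roles["prompt_negative"] = neg[0]
    return roles

# Fixture: a tiny text-to-image graph, as a unit test would supply it.
WORKFLOW = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base.safetensors"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a castle"}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry"}},
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["4", 0], "positive": ["6", 0],
                     "negative": ["7", 0], "steps": 20, "cfg": 7.0}},
}

roles = infer_roles(WORKFLOW)
assert roles["base_checkpoint"] == "4"
assert roles["prompt_positive"] == "6"
assert roles["prompt_negative"] == "7"
```

The same fixture-plus-assertion shape extends naturally to mutation tests: apply a change, re-run `infer_roles`, and diff the two role maps.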
1. **Unit tests**
   - Cover workflow parsing, semantic indexing, mutation functions, and asset discovery. Use fixtures to simulate complex workflows (including multi-stage pipelines).
2. **Integration tests**
   - Mock ComfyUI endpoints to verify execution, preview-only outputs, error propagation, and batch runs.
   - Test real-time streaming hooks in a controlled environment.
3. **End-to-end scenarios**
   - Provide scripted examples demonstrating template selection, customization, batch execution, and output retrieval.

---

## 9. Documentation & Onboarding

1. **Developer documentation**
   - Expand `docs/` with architecture overviews, configuration guides, workflow JSON expectations, and instructions for adding new templates.
   - Document naming conventions and best practices, referencing ComfyUI’s API documentation when relevant.
2. **User guides**
   - Provide walkthroughs for common tasks (listing workflows, tweaking prompts/models, executing batches).
   - Include troubleshooting tips for missing assets or API connection issues.
3. **Examples**
   - Supply sample workflows in `/workflows` with descriptive metadata.
   - Add example MCP tool invocations demonstrating real-time update flows and batch execution.

---

## 10. Future Enhancements & Roadmap

- **Dynamic node discovery**: Automatically ingest installed custom nodes from ComfyUI and extend the normalization registry without manual coding.
- **Workflow diff visualization**: Generate visual or textual representations of workflow changes to aid understanding.
- **Persistent sessions**: Remember user preferences and previous modifications across conversations.
- **Advanced batching**: Offer grid or prompt-matrix style executions with automatic result collation.

---

## Open Question Responses

1. **Real-time conversational updates**: Support real-time streaming of execution progress and allow incremental adjustments during a session.
2. **Workflow labeling**: Follow ComfyUI’s existing metadata conventions and consult official documentation for node labels; provide gentle guidance but avoid rigid new standards unless needed.
3. **Authentication**: Default to local, no-auth operation but expose configuration options for base URL and optional API key/token.
4. **Batch and multi-stage workflows**: Ensure the system handles chained models and multi-stage pipelines, offering batched execution capabilities when required.

---

## Next Steps Checklist

- [ ] Scaffold the Python package and configuration system.
- [ ] Implement workflow discovery and semantic indexing.
- [ ] Build the mutation API with support for multi-stage workflows.
- [ ] Develop the ComfyUI client with streaming updates and preview handling.
- [ ] Expose MCP tools and JSON schemas for LLM interaction.
- [ ] Implement asset management, validation, and optional authentication.
- [ ] Establish testing suites (unit, integration, end-to-end).
- [ ] Produce user and developer documentation.
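To make the execution pipeline in Section 4 concrete, the sketch below submits a workflow to a local ComfyUI instance over its HTTP API (`POST /prompt` to queue, `GET /history/<prompt_id>` to poll for outputs) using only the standard library. This is a minimal illustration under the plan's default local, no-auth assumption; the retries, streaming callbacks, and preview handling the plan calls for are omitted.

```python
import json
import time
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # ComfyUI's default local address


def queue_workflow(workflow: dict) -> str:
    """Submit a workflow in API format; returns the prompt_id ComfyUI assigns."""
    body = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def wait_for_outputs(prompt_id: str, poll_s: float = 1.0) -> dict:
    """Poll /history until the prompt finishes; returns per-node outputs."""
    while True:
        with urllib.request.urlopen(f"{COMFYUI_URL}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            # Keys are output node ids; image entries carry filename/subfolder.
            return history[prompt_id]["outputs"]
        time.sleep(poll_s)
```

The full client would replace polling with the WebSocket progress stream so incremental updates can be forwarded to the conversation, but the queue-then-fetch shape above is the core loop either way.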
