# ComfyUI MCP Server Implementation Plan
## Overview
This document outlines a comprehensive plan for implementing a Model Context Protocol (MCP) server that integrates with a locally hosted ComfyUI instance. The server will ingest workflow templates from a `/workflows` directory, interpret node semantics without relying on hard-coded IDs, and provide structured tools for large language models (LLMs) to customize and execute workflows. It will support real-time conversational updates, optional authentication to the ComfyUI backend, and batch execution scenarios where multiple models or stages run in sequence.
---
## 1. Project Scaffolding & Configuration
1. **Package layout**
- Create a Python package at `src/comfyui_mcp/` with clear entry points for the MCP server runtime and shared utilities.
- Add a `/workflows` directory at the repository root to store default `.json` workflow templates.
- Include `docs/` (this plan) and potentially `examples/` for sample usage.
2. **Configuration management**
- Provide a configuration file (e.g., `config.toml` or `settings.yaml`) or environment variables to define:
  - ComfyUI base URL (host, port). Allow overriding via CLI flags.
  - Optional API key/token for ComfyUI (even though the primary use-case is local, support this for future flexibility).
  - Default workflow name and optional workflow metadata (description, tags, recommended assets).
  - Directories for checkpoints, LoRAs, VAEs, text encoders, embeddings.
  - Default parameter bounds (CFG, steps, resolution).
  - Feature toggles (e.g., enable real-time streaming, enable batch execution mode).
3. **Data layer utilities**
- Implement a discovery layer to load all JSON workflows at startup, cache parsed versions, and optionally watch for file changes.
- Validate workflow structure (node list, link definitions, metadata) using a schema validator.
- Extract human-readable metadata for nodes if available; otherwise apply heuristics (see Section 2).
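
As a rough illustration of the configuration and discovery steps in this section, the sketch below reads settings from environment variables and loads every workflow file with a minimal structural check. The `Settings` class, the `COMFYUI_*` variable names, and both helper functions are hypothetical. The check assumes templates are stored in ComfyUI's API-format export (an object keyed by node ID whose values carry `class_type` and `inputs`); templates saved in the UI format would need a different validator.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; field and variable names are illustrative."""

    base_url: str = "http://127.0.0.1:8188"  # assumed local ComfyUI address
    api_key: Optional[str] = None
    workflows_dir: Path = Path("workflows")

    @classmethod
    def from_env(cls, env=None):
        """Build settings from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            base_url=env.get("COMFYUI_BASE_URL", cls.base_url),
            api_key=env.get("COMFYUI_API_KEY") or None,
            workflows_dir=Path(env.get("COMFYUI_WORKFLOWS_DIR", str(cls.workflows_dir))),
        )


def validate_workflow(data):
    """Return a list of structural problems; an empty list means the shape looks valid."""
    if not isinstance(data, dict) or not data:
        return ["workflow must be a non-empty object keyed by node id"]
    problems = []
    for node_id, node in data.items():
        if not isinstance(node, dict):
            problems.append(f"node {node_id}: not an object")
            continue
        if not isinstance(node.get("class_type"), str):
            problems.append(f"node {node_id}: missing class_type")
        if not isinstance(node.get("inputs"), dict):
            problems.append(f"node {node_id}: missing inputs")
    return problems


def load_workflows(directory):
    """Load every *.json file in `directory`, keyed by file stem."""
    workflows = {}
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        problems = validate_workflow(data)
        if problems:
            raise ValueError(f"{path.name}: " + "; ".join(problems))
        workflows[path.stem] = data
    return workflows
```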
---
## 2. Workflow Introspection & Node Interpretation
1. **Graph parsing**
- Parse each workflow JSON into an internal graph representation with indices by `id`, `class_type`, and available metadata (`title`, `meta`, etc.).
- Capture input/output slot information to understand connections.
2. **Semantic indexing**
- Build secondary indices for node roles: prompts (`CLIPTextEncode`, prompt nodes), model selection (`CheckpointLoader*`), LoRA loaders, VAE loaders, CLIP/text encoders, samplers (CFG, steps), resolution/latent size controllers, output nodes (preview/save), etc.
- Normalize vendor-specific node names (including custom nodes) into canonical roles (e.g., `prompt_positive`, `prompt_negative`, `base_checkpoint`, `lora_list`).
3. **Heuristics for unlabeled nodes**
- Infer node roles by analyzing topology (e.g., nodes feeding a `model` input on a sampler are checkpoint loaders).
- Allow optional annotations in workflow JSON (e.g., `node_meta.display_name`) for manual overrides, but ensure defaults work without manual IDs.
- Consult ComfyUI documentation and APIs when new node types appear (for example, the `/object_info` endpoint lists installed node classes and their inputs); build an extensible mapping layer (e.g., a registry of known node signatures).
4. **Node role abstractions**
- Create a normalization layer that surfaces canonical roles to the LLM, enabling instructions like “update the negative prompt” or “swap the base model” without referencing raw node IDs.
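
The topology heuristic described in this section could look roughly like the following. It assumes API-format workflows in which a link is written as `[source_node_id, output_index]`, treats any node with `model`, `positive`, and `negative` inputs as a sampler, and walks the `model` chain upstream to separate LoRA loaders from the base checkpoint. The function names and role labels are illustrative, and a real implementation would also need to handle intermediate conditioning nodes between the text encoder and the sampler.

```python
def upstream(workflow, node_id, input_name):
    """Return the id of the node linked into `input_name`, or None for literals."""
    value = workflow.get(node_id, {}).get("inputs", {}).get(input_name)
    if isinstance(value, list) and len(value) == 2:
        return str(value[0])
    return None


def index_roles(workflow):
    """Map canonical role names to node ids using input names and topology only."""
    roles = {}

    def add(role, node_id):
        ids = roles.setdefault(role, [])
        if node_id not in ids:
            ids.append(node_id)

    for node_id, node in workflow.items():
        inputs = node.get("inputs", {})
        # Heuristic: a sampler consumes a model plus both conditionings.
        if not {"model", "positive", "negative"} <= inputs.keys():
            continue
        add("sampler", node_id)
        for role, name in (("prompt_positive", "positive"), ("prompt_negative", "negative")):
            source = upstream(workflow, node_id, name)
            if source is not None:
                add(role, source)
        # Walk the model chain; `seen` guards against malformed cyclic graphs.
        current, seen = upstream(workflow, node_id, "model"), set()
        while current is not None and current not in seen:
            seen.add(current)
            parent = upstream(workflow, current, "model")
            if "lora_name" in workflow.get(current, {}).get("inputs", {}):
                add("lora_list", current)
            elif parent is None:
                add("base_checkpoint", current)
            current = parent
    return roles
```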
---
## 3. Template Mutation API
1. **High-level mutation functions**
- Implement functions to modify prompts, base checkpoints, LoRAs (add/remove/adjust strength), VAEs, CLIP/text encoders, CFG/steps/scheduler, resolution/aspect ratios, seed control, and other common parameters.
- Ensure functions handle both single-stage and multi-stage workflows (e.g., multiple samplers or chained models).
2. **Graph integrity**
- Validate that mutations preserve required connections and update dependent nodes when necessary (e.g., ensuring encoder/decoder pairings match the chosen checkpoint).
- Automatically insert auxiliary nodes when needed (e.g., injecting a LoRA loader before a sampler stage).
3. **Change tracking**
- Maintain a diff summary showing before/after values for transparency and to inform the LLM of applied changes.
4. **Batch execution readiness**
- Allow mutations to target specific stages or iterations (e.g., stage `A` vs. stage `B` in a multi-model workflow).
- Support parameter sweeps or batched modifications when executing multiple variations.
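
A minimal sketch of a mutation function with change tracking follows. It takes a role index (role name to node IDs, as Section 2 would produce) and a `changes` mapping of role to new literal input values, and it returns a modified copy plus a before/after diff. The function name and diff shape are assumptions. As a simplification it applies a change to every node holding the role rather than targeting a single stage, and it refuses to overwrite inputs that are links so that graph connections stay intact.

```python
import copy


def apply_changes(workflow, roles, changes):
    """Return (mutated_copy, diff); the input workflow is left untouched."""
    updated = copy.deepcopy(workflow)
    diff = []
    for role, new_inputs in changes.items():
        node_ids = roles.get(role)
        if not node_ids:
            raise KeyError(f"workflow has no node with role {role!r}")
        for node_id in node_ids:
            inputs = updated[node_id]["inputs"]
            for name, after in new_inputs.items():
                before = inputs.get(name)
                if isinstance(before, list):
                    # Lists encode links between nodes; replacing one would rewire the graph.
                    raise ValueError(f"{role}.{name} is a link, not a literal")
                if before != after:
                    inputs[name] = after
                    diff.append({
                        "node_id": node_id,
                        "role": role,
                        "input": name,
                        "before": before,
                        "after": after,
                    })
    return updated, diff
```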
---
## 4. Execution Pipeline
1. **ComfyUI client**
- Build an HTTP/WebSocket client that can submit workflows, monitor queue status, receive progress updates, and download outputs.
- Implement real-time streaming callbacks so conversational agents can receive incremental updates during execution.
2. **Handling outputs**
- Inspect execution results for both `SaveImage` and `PreviewImage` nodes. If only previews are generated, retrieve the preview image (ComfyUI writes previews to its temporary directory rather than the output directory) and optionally offer to save it upon request.
- Structure output metadata with links or binary blobs so the LLM can present results.
3. **Batch execution and chaining**
- Enable execution of workflows that produce multiple outputs (e.g., multiple sampler stages). Aggregate outputs per stage and provide context (e.g., which model generated each image).
- Support batch submission for parameter variations or multi-model pipelines, returning a structured summary of results.
4. **Error handling**
- Provide clear error states (missing assets, invalid parameters, API failures). Include recovery suggestions or fallback defaults where possible.
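
The HTTP side of the client might be sketched as below, using only the standard library. It relies on ComfyUI's `POST /prompt` endpoint for queueing and on the `outputs` section of a `/history/{prompt_id}` entry for locating images, which are then fetched through `/view`. The function names, the bearer-token header (which assumes a reverse proxy performs authentication), and the returned descriptor shape are illustrative. Progress streaming over the WebSocket endpoint is left out of this sketch.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_prompt_request(base_url, workflow, client_id=None, api_key=None):
    """Build (without sending) the POST /prompt request that queues a workflow."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: a proxy in front of ComfyUI checks a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def submit(request, timeout=30.0):
    """Send the request and return the queued prompt id."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["prompt_id"]


def collect_images(base_url, history_entry):
    """Turn one history entry into a flat list of image descriptors."""
    results = []
    for node_id, output in history_entry.get("outputs", {}).items():
        for image in output.get("images", []):
            query = urllib.parse.urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            results.append({
                "node_id": node_id,
                # Saved images report type "output"; previews report "temp".
                "saved": image.get("type") == "output",
                "url": f"{base_url.rstrip('/')}/view?{query}",
            })
    return results
```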
---
## 5. LLM-Facing Abstractions
1. **MCP tool definitions**
- Expose structured tools such as:
  - `list_workflows()` → metadata about available templates.
  - `describe_workflow(name)` → semantic summary (node roles, adjustable parameters, required assets).
  - `customize_workflow(name, changes)` → apply mutations; return diff summary.
  - `execute_workflow(name, changes?, batch_options?, stream_updates?)` → optionally mutate, run, and stream results.
2. **Schema design**
- Provide JSON schemas with enums and numeric constraints to help the LLM format requests (e.g., valid sampler types, CFG ranges).
- Define structures for batch requests (e.g., arrays of parameter variations) and multi-stage workflows.
3. **Response format**
- Return structured results containing execution metadata, change logs, and links/handles for generated images or previews.
- Include conversational hints (e.g., suggested follow-up actions) to facilitate real-time interaction.
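
To make the schema design concrete, here is one possible shape for the `customize_workflow` tool descriptor, using the `name`, `description`, and `inputSchema` fields that MCP tool definitions carry. The parameter names, numeric bounds, and the short sampler list are placeholders rather than ComfyUI's real limits, and `check_changes` is a deliberately small hand-written checker that understands only the keywords used here. A real server would more likely delegate to a full JSON Schema validator.

```python
CHANGES_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "prompt_positive": {"type": "string"},
        "prompt_negative": {"type": "string"},
        "cfg": {"type": "number", "minimum": 1.0, "maximum": 30.0},  # placeholder bounds
        "steps": {"type": "integer", "minimum": 1, "maximum": 150},  # placeholder bounds
        "sampler_name": {"type": "string", "enum": ["euler", "euler_ancestral", "dpmpp_2m"]},
    },
}

CUSTOMIZE_TOOL = {
    "name": "customize_workflow",
    "description": "Apply parameter changes to a workflow template and return a diff.",
    "inputSchema": {
        "type": "object",
        "required": ["name", "changes"],
        "properties": {"name": {"type": "string"}, "changes": CHANGES_SCHEMA},
    },
}

_PY_TYPES = {"string": str, "number": (int, float), "integer": int}


def check_changes(changes, schema=CHANGES_SCHEMA):
    """Return human-readable errors for a `changes` object; empty list means OK."""
    errors = []
    for key, value in changes.items():
        rule = schema["properties"].get(key)
        if rule is None:
            errors.append(f"{key}: unknown parameter")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[rule["type"]]):
            errors.append(f"{key}: expected {rule['type']}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: must be one of {rule['enum']}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{key}: {value} is below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{key}: {value} is above maximum {rule['maximum']}")
    return errors
```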
---
## 6. Model & Asset Management
1. **Asset discovery**
- Scan configured directories for checkpoints, LoRAs, VAEs, text encoders, etc., caching metadata for quick lookup.
- Optionally support metadata files (YAML/JSON) describing recommended usage, strengths, or compatible models.
2. **Validation & safeguards**
- Validate requested assets exist; suggest closest matches if not found.
- Enforce safe file paths to avoid arbitrary filesystem access.
3. **Dynamic updates**
- Allow refreshing asset catalogs at runtime to pick up newly added files without restarting the server.
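
Asset discovery, path safeguarding, and closest-match suggestions could be combined along these lines. The suffix list, function names, and error messages are assumptions. The catalog is keyed by path relative to the asset root so that files in subfolders stay distinguishable, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
import difflib
from pathlib import Path

ASSET_SUFFIXES = {".safetensors", ".ckpt", ".pt"}  # assumed set of model file types


def scan_assets(root):
    """Map relative POSIX path -> absolute path for every model file under `root`."""
    root = Path(root).resolve()
    return {
        path.relative_to(root).as_posix(): path
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix.lower() in ASSET_SUFFIXES
    }


def resolve_asset(root, requested, catalog):
    """Return the path for `requested`, rejecting escapes and suggesting near matches."""
    root = Path(root).resolve()
    candidate = (root / requested).resolve()
    # Reject absolute paths and "../" tricks that leave the asset directory.
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} escapes the asset directory")
    if requested in catalog:
        return catalog[requested]
    suggestions = difflib.get_close_matches(requested, list(catalog), n=3, cutoff=0.6)
    hint = f"; did you mean {', '.join(suggestions)}?" if suggestions else ""
    raise FileNotFoundError(f"asset {requested!r} not found{hint}")
```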
---
## 7. Safety, Robustness & Observability
1. **Input validation**
- Enforce parameter bounds to prevent invalid sampler configurations.
- Sanitize text inputs and ensure they meet ComfyUI API limits.
2. **Timeouts & retries**
- Implement retry logic and configurable timeouts for network calls to ComfyUI.
- Surface errors with actionable context.
3. **Logging & metrics**
- Provide structured logging for requests, mutations, and execution results.
- Optionally emit metrics (execution time, success rates) for observability.
4. **Authentication support**
- Allow configuring optional API keys or tokens for ComfyUI endpoints, even though the default deployment is local.
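
One simple way to realize the retry policy is a wrapper with exponential backoff, sketched below. The function name, defaults, and the choice of retryable exception types are assumptions; callers would pass whichever exceptions their HTTP client raises for transient failures. The `sleep` parameter is injectable so tests can run without real delays.

```python
import time


def call_with_retries(
    operation,
    attempts=3,
    base_delay=0.5,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call `operation()` until it succeeds or `attempts` is exhausted.

    The delay doubles after each failure: base_delay, 2*base_delay, ...
    Exceptions not listed in `retry_on` propagate immediately.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            if attempt == attempts:
                break
            sleep(base_delay * 2 ** (attempt - 1))
    # Chain the original error so logs keep the actionable context.
    raise RuntimeError(f"gave up after {attempts} attempts: {last_error}") from last_error
```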
---
## 8. Testing & Validation Strategy
1. **Unit tests**
- Cover workflow parsing, semantic indexing, mutation functions, and asset discovery. Use fixtures to simulate complex workflows (including multi-stage pipelines).
2. **Integration tests**
- Mock ComfyUI endpoints to verify execution, preview-only outputs, error propagation, and batch runs.
- Test real-time streaming hooks in a controlled environment.
3. **End-to-end scenarios**
- Provide scripted examples demonstrating template selection, customization, batch execution, and output retrieval.
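
For the integration tests, one option is an in-process stub that imitates the small part of ComfyUI's HTTP surface under test. The sketch below answers `POST /prompt` with a fixed, made-up prompt ID and rejects everything else; the handler and helper names are hypothetical, and a fuller stub would also serve history and image endpoints. Binding to port 0 lets the operating system pick a free port.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeComfyHandler(BaseHTTPRequestHandler):
    """Stub handler that accepts queued prompts and returns a canned id."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/prompt" and "prompt" in body:
            self._send(200, {"prompt_id": "test-123"})  # made-up id for tests
        else:
            self._send(400, {"error": "bad request"})

    def _send(self, status, payload):
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        """Silence per-request logging during test runs."""


def start_fake_comfyui():
    """Start the stub on a free local port; call shutdown() when finished."""
    server = HTTPServer(("127.0.0.1", 0), FakeComfyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url, payload, timeout=5.0):
    """Small test helper: POST JSON without going through any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener.open(request, timeout=timeout) as response:
        return json.load(response)
```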
---
## 9. Documentation & Onboarding
1. **Developer documentation**
- Expand `docs/` with architecture overviews, configuration guides, workflow JSON expectations, and instructions for adding new templates.
- Document naming conventions and best practices, referencing ComfyUI’s API documentation when relevant.
2. **User guides**
- Provide walkthroughs for common tasks (listing workflows, tweaking prompts/models, executing batches).
- Include troubleshooting tips for missing assets or API connection issues.
3. **Examples**
- Supply sample workflows in `/workflows` with descriptive metadata.
- Add example MCP tool invocations demonstrating real-time update flows and batch execution.
---
## 10. Future Enhancements & Roadmap
- **Dynamic node discovery**: Automatically ingest installed custom nodes from ComfyUI and extend the normalization registry without manual coding.
- **Workflow diff visualization**: Generate visual or textual representations of workflow changes to aid understanding.
- **Persistent sessions**: Remember user preferences and previous modifications across conversations.
- **Advanced batching**: Offer grid or prompt-matrix style executions with automatic result collation.
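
The grid-style batching mentioned above amounts to taking the Cartesian product of the parameter axes. A minimal sketch, with a hypothetical function name and a flat parameter-to-values mapping as the assumed input shape, is shown here; each yielded dictionary could then be submitted as one variation.

```python
import itertools


def expand_grid(axes):
    """Yield one parameter dict per combination of the listed values.

    `axes` maps a parameter name to the list of values to try; combinations
    are produced in the order of the axes, with the last axis varying fastest.
    """
    names = list(axes)
    for combo in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))
```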
---
## Open Question Responses
1. **Real-time conversational updates**: Support real-time streaming of execution progress and allow incremental adjustments during a session.
2. **Workflow labeling**: Follow ComfyUI’s existing metadata conventions and consult official documentation for node labels; provide gentle guidance but avoid rigid new standards unless needed.
3. **Authentication**: Default to local, no-auth operation but expose configuration options for base URL and optional API key/token.
4. **Batch and multi-stage workflows**: Ensure the system handles chained models and multi-stage pipelines, offering batched execution capabilities when required.
---
## Next Steps Checklist
- [ ] Scaffold the Python package and configuration system.
- [ ] Implement workflow discovery and semantic indexing.
- [ ] Build the mutation API with support for multi-stage workflows.
- [ ] Develop the ComfyUI client with streaming updates and preview handling.
- [ ] Expose MCP tools and JSON schemas for LLM interaction.
- [ ] Implement asset management, validation, and optional authentication.
- [ ] Establish testing suites (unit, integration, end-to-end).
- [ ] Produce user and developer documentation.