ComfyUI GPU Optimizer
AI-controlled GPU VRAM management for ComfyUI. Run LTX 2.3 video, SDXL, AnimateDiff, and FLUX on an 8GB GPU without running out of memory.
What It Does
Two components that work together:
ComfyUI custom node -- runs inside ComfyUI, monitors VRAM every 3 seconds, and automatically offloads models to CPU when memory pressure hits your threshold (default 82%). No more OOM crashes mid-workflow.
MCP server -- lets Claude Code (or any MCP client) see your GPU state and control it in real time. Ask Claude to free up VRAM, start/stop ComfyUI, or check what models are loaded.
```
Claude Code
    |
    |--- MCP (stdio) ---> MCP Server (mcp_server.py)
                              |
                              |--- HTTP ---> GPU Optimizer (port 9111, inside ComfyUI)
                                                 |
                                                 |--- comfy.model_management
                                                 |--- torch.cuda
```
Why This Exists
ComfyUI's built-in Dynamic VRAM handles basic auto-offloading. This project adds:
Configurable thresholds -- set exactly when offloading kicks in (50-95%)
Model-level visibility -- see every loaded model, its size, and device
Selective offloading -- unload specific models by index, not just the largest
AI control via MCP -- Claude can manage your GPU while you work
ComfyUI lifecycle management -- start, stop, restart with tuned flags for 8GB cards
Process visibility -- see everything using your GPU, not just ComfyUI
Quick Start
Option A: Full setup (custom node + MCP for Claude)
```bash
# Clone into ComfyUI custom_nodes
git clone https://github.com/WaightTech/ComfyUI-GPU-Optimizer \
  ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer

# Install and register with Claude Code
cd ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer
bash setup.sh
```

Restart ComfyUI. Open Claude Code. Your GPU tools are ready.
Option B: Custom node only (no MCP)
```bash
git clone https://github.com/WaightTech/ComfyUI-GPU-Optimizer \
  ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer
```

Restart ComfyUI. The optimizer loads automatically and manages VRAM in the background. The HTTP API is available at http://127.0.0.1:9111 for custom integrations.
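A custom integration only needs the standard library to talk to that API. The sketch below assumes endpoints answer plain GET requests with a JSON object; the helper `optimizer_get` and the route `/status` are hypothetical, so take the real route names from the node's source.

```python
import json
import urllib.request

# README default: the optimizer API listens on localhost port 9111.
DEFAULT_BASE_URL = "http://127.0.0.1:9111"

# The API is localhost-only, so ignore any HTTP(S)_PROXY settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def optimizer_get(path: str, base_url: str = DEFAULT_BASE_URL,
                  timeout: float = 5.0) -> dict:
    """GET a JSON endpoint on the optimizer API (hypothetical helper).

    Assumes endpoints return a JSON object. Instead of raising, returns
    {"error": ...} when the request fails, e.g. because ComfyUI (and with it
    the custom node) isn't running.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return {"error": f"request to {url} failed: {exc}"}


if __name__ == "__main__":
    # "/status" is a hypothetical route name used only for illustration.
    print(optimizer_get("/status"))
```

Returning an error object instead of raising keeps callers simple when ComfyUI is down, which is the normal case between sessions.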
Option C: MCP server only (already have the node)
```bash
pip install comfyui-gpu-optimizer
```

Then add to your Claude Code MCP config:

```json
{
  "mcpServers": {
    "gpu-manager": {
      "command": "comfyui-gpu-optimizer"
    }
  }
}
```

MCP Tools
| Tool | What it does |
|------|--------------|
| | VRAM usage, utilization, temperature, power draw |
| | List all processes using the GPU |
| | Show loaded models with sizes and devices |
| | Auto-offload models to free VRAM (largest first) |
| | Offload a specific model by index |
| | Set auto-offload threshold (50-95%) |
| | Enable/disable automatic VRAM management |
| | Unload ALL models and empty CUDA cache |
| | Check if ComfyUI is running and responding |
| | Start ComfyUI with optimized flags for your GPU |
| | Graceful shutdown |
| | Stop, flush, restart |
| | Kill all GPU compute processes (nuclear option) |
| | Tail the ComfyUI log |
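The README does not list which flags the start tool passes. As an illustration only, the sketch below builds a launch command from options that exist in ComfyUI's own CLI (`--listen`, `--port`, `--reserve-vram`, `--lowvram`); whether the optimizer uses this exact set is an assumption, and `build_comfyui_command` and its defaults are hypothetical.

```python
import shlex
import sys
from pathlib import Path


def build_comfyui_command(comfyui_dir: str, port: int = 8188,
                          reserve_vram_gb: float = 1.0,
                          low_vram: bool = True) -> list[str]:
    """Build an argv list that launches ComfyUI's main.py (hypothetical helper).

    port: 8188 is ComfyUI's own default.
    reserve_vram_gb: illustrative value for the "GB to reserve for OS stability" setting.
    low_vram: assumption that an 8GB card should start in --lowvram mode.
    """
    main_py = Path(comfyui_dir).expanduser() / "main.py"
    cmd = [
        # Assumes the current interpreter is the one with ComfyUI's dependencies.
        sys.executable, str(main_py),
        "--listen", "127.0.0.1",  # keep ComfyUI bound to localhost
        "--port", str(port),
        "--reserve-vram", str(reserve_vram_gb),  # GB ComfyUI leaves for the OS/other apps
    ]
    if low_vram:
        cmd.append("--lowvram")
    return cmd


if __name__ == "__main__":
    # Print instead of launching; pass the list to subprocess.Popen(cmd, cwd=...) to start it.
    print(shlex.join(build_comfyui_command("~/ComfyUI")))
```

Passing a list rather than a shell string to `subprocess.Popen` avoids quoting problems with paths that contain spaces.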
Configuration
All settings are optional environment variables with sensible defaults:
| Variable | Default | Description |
|----------|---------|-------------|
| | | ComfyUI install path |
| | | ComfyUI API endpoint |
| | | Log file location |
| | | GB to reserve for OS stability |
| | 9111 | Optimizer HTTP API port |
| | | Optimizer API URL (MCP server side) |
| | 82 | Auto-offload threshold (%) |
| | 3 | Monitor check interval (seconds) |
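A minimal sketch of reading these settings, assuming hypothetical variable names (`GPU_OPTIMIZER_PORT`, `GPU_OPTIMIZER_THRESHOLD`, `GPU_OPTIMIZER_INTERVAL`); use the project's real names in practice. The defaults (9111, 82%, 3 seconds) and the 50-95% threshold range come from this README.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerSettings:
    port: int = 9111             # optimizer HTTP API port (README default)
    threshold_pct: float = 82.0  # auto-offload threshold (README default)
    interval_s: float = 3.0      # monitor check interval (README default)


def load_settings(env: Mapping[str, str] = os.environ) -> OptimizerSettings:
    """Read settings from the environment, falling back to defaults.

    The variable names are hypothetical placeholders, not the project's.
    """
    d = OptimizerSettings()
    port = int(env.get("GPU_OPTIMIZER_PORT", d.port))
    threshold = float(env.get("GPU_OPTIMIZER_THRESHOLD", d.threshold_pct))
    interval = float(env.get("GPU_OPTIMIZER_INTERVAL", d.interval_s))
    # The README documents 50-95% as the valid range, so clamp into it.
    threshold = min(95.0, max(50.0, threshold))
    if interval <= 0:
        raise ValueError("monitor interval must be positive")
    return OptimizerSettings(port=port, threshold_pct=threshold, interval_s=interval)
```

Clamping an out-of-range threshold (rather than rejecting it) is a design choice made here for the sketch; the real node may validate differently.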
How It Works
The custom node loads when ComfyUI starts. It spins up two daemon threads:
Monitor thread -- checks VRAM usage every 3 seconds. When usage exceeds the threshold, it offloads the largest loaded model to CPU via comfy.model_management, then clears the CUDA cache. Repeats until usage is below the threshold.
API server -- HTTP server on port 9111 (localhost only). Exposes endpoints for status, model listing, offloading, flushing, and threshold adjustment.
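The monitor's "largest first, until below threshold" policy can be sketched in plain Python. This is not the node's code: `LoadedModel`, `plan_offload` and `monitor_loop` are hypothetical, the real work (moving a model to CPU and calling `torch.cuda.empty_cache()`) is left to an injected callback, and the planner projects the drop from model sizes instead of re-measuring after each offload.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoadedModel:
    name: str
    size_gb: float
    device: str  # "cuda" while resident on the GPU, "cpu" once offloaded


def plan_offload(models: list[LoadedModel], used_gb: float, total_gb: float,
                 threshold_pct: float) -> list[int]:
    """Pick GPU-resident models to offload, largest first, until projected
    usage is no longer above the threshold. Returns indices into `models`."""
    limit_gb = total_gb * threshold_pct / 100
    on_gpu = [(i, m) for i, m in enumerate(models) if m.device == "cuda"]
    on_gpu.sort(key=lambda pair: pair[1].size_gb, reverse=True)
    plan: list[int] = []
    for index, model in on_gpu:
        if used_gb <= limit_gb:
            break
        plan.append(index)
        # Simplification: assume offloading frees exactly the model's size.
        used_gb -= model.size_gb
    return plan


def monitor_loop(read_state: Callable[[], tuple[list[LoadedModel], float, float]],
                 offload: Callable[[int], None],
                 stop: threading.Event,
                 interval_s: float = 3.0,
                 threshold_pct: float = 82.0) -> None:
    """Poll every `interval_s` seconds until `stop` is set.

    `read_state` returns (models, used_gb, total_gb) and `offload` moves one
    model off the GPU; both are hypothetical callbacks.
    """
    # Event.wait returns True as soon as stop is set, so shutdown is prompt.
    while not stop.wait(interval_s):
        models, used_gb, total_gb = read_state()
        for index in plan_offload(models, used_gb, total_gb, threshold_pct):
            offload(index)
```

Run it like the node does, as a daemon thread: `threading.Thread(target=monitor_loop, args=(read_state, offload, stop), daemon=True).start()`. Actual freed VRAM can differ from a model's nominal size because of PyTorch's caching allocator, which is why clearing the CUDA cache after offloading matters.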
The MCP server runs as a separate process. It queries the GPU directly via pynvml (NVML) for hardware stats, and calls the optimizer's HTTP API for model-level operations. If ComfyUI isn't running, hardware-level tools (status, processes) still work; model-level tools return a clear error.
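The hardware path needs nothing from ComfyUI. A minimal sketch of a status query through NVML, assuming the `pynvml` module from NVIDIA's `nvidia-ml-py` package; `gpu_status`, `summarize_memory` and the returned keys are illustrative, not the project's actual tool output.

```python
def summarize_memory(used_bytes: int, total_bytes: int) -> dict:
    """Convert raw NVML byte counts into GB/percent figures (GB = 1024**3 bytes)."""
    gib = 1024 ** 3
    return {
        "used_gb": round(used_bytes / gib, 2),
        "total_gb": round(total_bytes / gib, 2),
        "used_pct": round(100 * used_bytes / total_bytes, 1),
    }


def gpu_status(index: int = 0) -> dict:
    """Query one GPU through NVML. Needs an NVIDIA driver and nvidia-ml-py."""
    import pynvml  # imported lazily so summarize_memory works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        status = summarize_memory(mem.used, mem.total)
        status["gpu_util_pct"] = util.gpu
        status["temperature_c"] = pynvml.nvmlDeviceGetTemperature(
            handle, pynvml.NVML_TEMPERATURE_GPU)
        # NVML reports power draw in milliwatts.
        status["power_w"] = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
        # PIDs of all compute processes on this GPU, ComfyUI or not.
        status["pids"] = [p.pid for p in
                          pynvml.nvmlDeviceGetComputeRunningProcesses(handle)]
        return status
    finally:
        pynvml.nvmlShutdown()
```

Because this path talks to the driver directly, it keeps working when ComfyUI is stopped; only model-level tools depend on the optimizer's HTTP API.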
Tested On
NVIDIA GeForce RTX 4060 Ti 8GB
WSL2 (Ubuntu) on Windows
Python 3.10+
Verified workloads on 8GB:
| Workload | VRAM Usage | Notes |
|----------|------------|-------|
| LTX 2.3 Video (22B Q4 GGUF) | 52-70% | 10-sec video in ~6 min. Add |
| AnimateDiff + Realistic Vision | 54% | 5-sec video in ~5 min |
| SDXL (epiCRealismXL) | 60% | Image generation |
| FLUX (GGUF quantized) | varies | Works with auto-offloading |
System RAM matters. When the optimizer offloads models from GPU to CPU, they live in system RAM. LTX 2.3 keeps its 10.8GB Gemma text encoder on CPU full-time, and offloaded SDXL/FLUX models add to that. 32GB RAM recommended for video workloads, 16GB minimum for image generation.
It should work on any NVIDIA GPU with CUDA support. The defaults are tuned for low-VRAM 8GB cards, but the thresholds are configurable for any card size.
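Because the threshold $p$ is a percentage of total VRAM, the same setting triggers offloading at very different absolute levels on different cards. Worked through for the 82% default (the 12GB and 24GB card sizes are only illustrative):

```math
\begin{aligned}
V_{\text{trigger}} &= \tfrac{p}{100}\,V_{\text{total}} \\
8\,\text{GB card:}\quad 0.82 \times 8 &= 6.56\,\text{GB} \\
12\,\text{GB card:}\quad 0.82 \times 12 &= 9.84\,\text{GB} \\
24\,\text{GB card:}\quad 0.82 \times 24 &= 19.68\,\text{GB}
\end{aligned}
```

At 82%, offloading starts with about 1.44GB still free on an 8GB card but about 4.32GB free on a 24GB card, so a higher threshold may make sense on larger cards.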
License
MIT. See LICENSE.
Built by Waight Tech, LLC.