ComfyUI-GPU-Optimizer

by mwaight

Python

Local

ComfyUI GPU Optimizer

AI-controlled GPU VRAM management for ComfyUI. Run LTX 2.3 video, SDXL, AnimateDiff, and FLUX on an 8GB GPU without running out of memory.

License: MIT | Python 3.10+ | ComfyUI | MCP

What It Does

Two components that work together:

ComfyUI custom node -- runs inside ComfyUI, monitors VRAM every 3 seconds, and automatically offloads models to CPU when memory pressure hits your threshold (default 82%). No more OOM crashes mid-workflow.
MCP server -- lets Claude Code (or any MCP client) see your GPU state and control it in real time. Ask Claude to free up VRAM, start/stop ComfyUI, or check what models are loaded.

Claude Code                    ComfyUI
    |                              |
    |--- MCP (stdio) --->  MCP Server (mcp_server.py)
                               |
                               |--- HTTP --->  GPU Optimizer (port 9111)
                                                   |
                                                   |--- comfy.model_management
                                                   |--- torch.cuda

Why This Exists

ComfyUI's built-in Dynamic VRAM handles basic auto-offloading. This project adds:

Configurable thresholds -- set exactly when offloading kicks in (50-95%)
Model-level visibility -- see every loaded model, its size, and device
Selective offloading -- unload specific models by index, not just the largest
AI control via MCP -- Claude can manage your GPU while you work
ComfyUI lifecycle management -- start, stop, restart with tuned flags for 8GB cards
Process visibility -- see everything using your GPU, not just ComfyUI

Quick Start

Option A: Full setup (custom node + MCP for Claude)

# Clone into ComfyUI custom_nodes
git clone https://github.com/WaightTech/ComfyUI-GPU-Optimizer \
    ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer

# Install and register with Claude Code
cd ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer
bash setup.sh

Restart ComfyUI. Open Claude Code. Your GPU tools are ready.

Option B: Custom node only (no MCP)

git clone https://github.com/WaightTech/ComfyUI-GPU-Optimizer \
    ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer

Restart ComfyUI. The optimizer loads automatically and manages VRAM in the background. The HTTP API is available on http://127.0.0.1:9111 for custom integrations.
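For a custom integration, a minimal client sketch using only the Python standard library might look like the following. The base URL comes from this README, but the endpoint path (`/status`) and the shape of the JSON response are assumptions made for illustration; check the project source for the actual routes.

```python
import json
import urllib.error
import urllib.request

# Default address taken from this README.
OPTIMIZER_URL = "http://127.0.0.1:9111"


def call_optimizer(path, base_url=OPTIMIZER_URL, timeout=2.0):
    """GET a JSON endpoint on the optimizer API.

    `path` is hypothetical (for example "/status"); the real routes are
    not listed in this README.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
            return {"ok": True, "data": payload}
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers connection refused (ComfyUI not running), timeouts,
        # HTTP error statuses, and responses that are not valid JSON.
        return {"ok": False, "error": f"GPU optimizer request to {url} failed: {exc}"}


if __name__ == "__main__":
    print(call_optimizer("/status"))
```

Returning a dictionary with an `ok` flag instead of raising keeps the caller simple when ComfyUI is not running.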

Option C: MCP server only (already have the node)

pip install comfyui-gpu-optimizer

Then add to your Claude Code MCP config:

{
  "mcpServers": {
    "gpu-manager": {
      "command": "comfyui-gpu-optimizer"
    }
  }
}

MCP Tools

Tool	What it does
`gpu_status`	VRAM usage, utilization, temperature, power draw
`gpu_processes`	List all processes using the GPU
`gpu_memory_map`	Show loaded models with sizes and devices
`gpu_optimize`	Auto-offload models to free VRAM (largest first)
`gpu_offload_model`	Offload a specific model by index
`gpu_set_threshold`	Set auto-offload threshold (50-95%)
`gpu_auto_manage`	Enable/disable automatic VRAM management
`gpu_flush_models`	Unload ALL models and empty CUDA cache
`comfyui_status`	Check if ComfyUI is running and responding
`comfyui_start`	Start ComfyUI with optimized flags for your GPU
`comfyui_stop`	Graceful shutdown
`comfyui_restart`	Stop, flush, restart
`vram_flush`	Kill all GPU compute processes (nuclear option)
`comfyui_log`	Tail the ComfyUI log

Configuration

All settings are optional environment variables with sensible defaults:

Variable	Default	Description
`COMFYUI_DIR`	`~/ComfyUI`	ComfyUI install path
`COMFYUI_URL`	`http://localhost:8188`	ComfyUI API endpoint
`COMFYUI_LOG`	`~/.comfyui-gpu-optimizer.log`	Log file location
`COMFYUI_RESERVE_VRAM`	`1.5`	GB to reserve for OS stability
`GPU_OPTIMIZER_PORT`	`9111`	Optimizer HTTP API port
`GPU_OPTIMIZER_URL`	`http://127.0.0.1:9111`	Optimizer API URL (MCP server side)
`GPU_OPTIMIZER_THRESHOLD`	`82`	Auto-offload threshold (%)
`GPU_OPTIMIZER_POLL_INTERVAL`	`3`	Monitor check interval (seconds)
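
As an illustration of how these variables could be consumed, here is a minimal sketch that merges environment overrides onto the defaults from the table. The `load_settings` helper, the returned key names, and the clamping of the threshold to 50-95% are assumptions for this example, not the project's actual code.

```python
import os

# Defaults copied from the configuration table above.
DEFAULTS = {
    "COMFYUI_DIR": "~/ComfyUI",
    "COMFYUI_URL": "http://localhost:8188",
    "COMFYUI_LOG": "~/.comfyui-gpu-optimizer.log",
    "COMFYUI_RESERVE_VRAM": "1.5",
    "GPU_OPTIMIZER_PORT": "9111",
    "GPU_OPTIMIZER_URL": "http://127.0.0.1:9111",
    "GPU_OPTIMIZER_THRESHOLD": "82",
    "GPU_OPTIMIZER_POLL_INTERVAL": "3",
}


def load_settings(env=None):
    """Hypothetical helper: read overrides from the environment, else use defaults."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "comfyui_dir": os.path.expanduser(raw["COMFYUI_DIR"]),
        "comfyui_url": raw["COMFYUI_URL"],
        "log_path": os.path.expanduser(raw["COMFYUI_LOG"]),
        "reserve_vram_gb": float(raw["COMFYUI_RESERVE_VRAM"]),
        "port": int(raw["GPU_OPTIMIZER_PORT"]),
        "optimizer_url": raw["GPU_OPTIMIZER_URL"],
        # Assumption: values outside the documented 50-95% range are clamped.
        "threshold_pct": min(95.0, max(50.0, float(raw["GPU_OPTIMIZER_THRESHOLD"]))),
        "poll_interval_s": float(raw["GPU_OPTIMIZER_POLL_INTERVAL"]),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.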

How It Works

The custom node loads when ComfyUI starts. It spins up two daemon threads:

Monitor thread -- checks VRAM usage every 3 seconds (configurable via GPU_OPTIMIZER_POLL_INTERVAL). When usage exceeds the threshold, it offloads the largest loaded model to CPU via comfy.model_management, clears the CUDA cache, and repeats until usage is below the threshold.
API server -- HTTP server on port 9111 (localhost only). Exposes endpoints for status, model listing, offloading, flushing, and threshold adjustment.
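
The monitor thread's largest-first policy can be sketched as a small simulation. This is not the project's code: `LoadedModel` and `offload_until_below` are hypothetical names, and flipping a `device` field stands in for the real calls into comfy.model_management and the CUDA cache.

```python
from dataclasses import dataclass


@dataclass
class LoadedModel:
    """Hypothetical record of a loaded model (not a ComfyUI type)."""
    name: str
    size_gb: float
    device: str = "cuda"


def offload_until_below(models, total_gb, threshold_pct, base_used_gb=0.0):
    """Move the largest GPU-resident model to CPU until usage is at or below the threshold.

    Returns the names of the offloaded models, in the order they were moved.
    """
    def usage_pct():
        on_gpu = sum(m.size_gb for m in models if m.device == "cuda")
        return 100.0 * (base_used_gb + on_gpu) / total_gb

    offloaded = []
    while usage_pct() > threshold_pct:
        candidates = [m for m in models if m.device == "cuda"]
        if not candidates:
            break  # nothing left to offload; other processes hold the memory
        largest = max(candidates, key=lambda m: m.size_gb)
        largest.device = "cpu"  # stand-in for the real offload and cache clear
        offloaded.append(largest.name)
    return offloaded


if __name__ == "__main__":
    models = [
        LoadedModel("unet", 5.0),
        LoadedModel("text_encoder", 1.5),
        LoadedModel("vae", 0.5),
    ]
    # 0.5 GB base usage plus 7.0 GB of models on an 8 GB card is 93.75%.
    print(offload_until_below(models, total_gb=8.0, threshold_pct=82.0, base_used_gb=0.5))
```

In this example a single offload of the largest model is enough to bring usage under 82%, so the smaller models stay on the GPU.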

The MCP server runs as a separate process. It queries the GPU directly via pynvml (NVML) for hardware stats, and calls the optimizer's HTTP API for model-level operations. If ComfyUI isn't running, hardware-level tools (status, processes) still work; model-level tools return a clear error.
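
A hardware-level query of this kind might look roughly like the sketch below, which reads memory, utilization, temperature, and power through pynvml. The function names and the returned dictionary layout are assumptions for illustration; only the pynvml calls themselves are part of that library.

```python
try:
    import pynvml  # NVML bindings; optional so the sketch runs without a GPU
except ImportError:
    pynvml = None


def vram_percent(used_bytes, total_bytes):
    """Return VRAM usage as a percentage, rounded to one decimal place."""
    return round(100.0 * used_bytes / total_bytes, 1)


def read_gpu_status(index=0):
    """Hypothetical helper: collect basic stats for one GPU via NVML."""
    if pynvml is None:
        return {"ok": False, "error": "pynvml is not installed"}
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        return {"ok": False, "error": f"NVML unavailable: {exc}"}
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # NVML reports milliwatts
        return {
            "ok": True,
            "vram_used_pct": vram_percent(mem.used, mem.total),
            "gpu_util_pct": util.gpu,
            "temperature_c": temp,
            "power_w": power_mw / 1000.0,
        }
    except pynvml.NVMLError as exc:
        return {"ok": False, "error": f"NVML query failed: {exc}"}
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(read_gpu_status())
```

Because this path never touches ComfyUI, it keeps working when ComfyUI is stopped, which matches the behavior described above.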

Tested On

NVIDIA GeForce RTX 4060 Ti 8GB
WSL2 (Ubuntu) on Windows
Python 3.10+

Verified workloads on 8GB:

Workload	VRAM Usage	Notes
LTX 2.3 Video (22B Q4 GGUF)	52-70%	10-sec video in ~6 min. Add `--cache-lru 10` to ComfyUI launch flags.
AnimateDiff + Realistic Vision	54%	5-sec video in ~5 min
SDXL (epiCRealismXL)	60%	Image generation
FLUX (GGUF quantized)	varies	Works with auto-offloading

System RAM matters. When the optimizer offloads models from GPU to CPU, they live in system RAM. LTX 2.3 keeps its 10.8GB Gemma text encoder on CPU full-time, and offloaded SDXL/FLUX models add to that. 32GB of RAM is recommended for video workloads; 16GB is the minimum for image generation.

The optimizer should work on any NVIDIA GPU with CUDA support, although only the setup listed above has been tested. The defaults are tuned for 8GB cards, but the thresholds are configurable for any card size.

License

MIT. See LICENSE.

Built by Waight Tech, LLC.
