by mwaight

ComfyUI GPU Optimizer

AI-controlled GPU VRAM management for ComfyUI. Run LTX 2.3 video, SDXL, AnimateDiff, and FLUX on an 8GB GPU without running out of memory.

License: MIT | Python 3.10+ | ComfyUI | MCP

What It Does

Two components that work together:

  1. ComfyUI custom node -- runs inside ComfyUI, monitors VRAM every 3 seconds, and automatically offloads models to CPU when memory pressure hits your threshold (default 82%). No more OOM crashes mid-workflow.

  2. MCP server -- lets Claude Code (or any MCP client) see your GPU state and control it in real time. Ask Claude to free up VRAM, start/stop ComfyUI, or check what models are loaded.

Claude Code                    ComfyUI
    |                              |
    |--- MCP (stdio) --->  MCP Server (mcp_server.py)
                               |
                               |--- HTTP --->  GPU Optimizer (port 9111)
                                                   |
                                                   |--- comfy.model_management
                                                   |--- torch.cuda


Why This Exists

ComfyUI's built-in Dynamic VRAM handles basic auto-offloading. This project adds:

  • Configurable thresholds -- set exactly when offloading kicks in (50-95%)

  • Model-level visibility -- see every loaded model, its size, and device

  • Selective offloading -- unload specific models by index, not just the largest

  • AI control via MCP -- Claude can manage your GPU while you work

  • ComfyUI lifecycle management -- start, stop, restart with tuned flags for 8GB cards

  • Process visibility -- see everything using your GPU, not just ComfyUI

Quick Start

Option A: Full setup (custom node + MCP for Claude)

# Clone into ComfyUI custom_nodes
git clone https://github.com/WaightTech/ComfyUI-GPU-Optimizer \
    ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer

# Install and register with Claude Code
cd ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer
bash setup.sh

Restart ComfyUI. Open Claude Code. Your GPU tools are ready.

Option B: Custom node only (no MCP)

git clone https://github.com/WaightTech/ComfyUI-GPU-Optimizer \
    ~/ComfyUI/custom_nodes/ComfyUI-GPU-Optimizer

Restart ComfyUI. The optimizer loads automatically and manages VRAM in the background. The HTTP API is available at http://127.0.0.1:9111 for custom integrations.
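For custom integrations, a minimal standard-library client could look like the sketch below. The README does not document the HTTP routes, so the `/status` path used here and the assumption that responses are JSON are hypothetical; check the node's source for the real endpoints. If ComfyUI is not running, `urlopen` raises `URLError`, which the example reports instead of crashing.

```python
import json
import os
import urllib.error
import urllib.request

# Base URL of the optimizer's HTTP API; GPU_OPTIMIZER_URL is the documented override.
BASE_URL = os.environ.get("GPU_OPTIMIZER_URL", "http://127.0.0.1:9111")


def call_optimizer(path, payload=None, base_url=BASE_URL, timeout=5.0):
    """GET `path` (or POST `payload` as JSON) and return the decoded JSON body.

    The paths you pass in are assumptions; they are not a documented API.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib sends a POST when `data` is set, otherwise a GET.
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(call_optimizer("/status"))  # hypothetical route
    except urllib.error.URLError as exc:
        print(f"Optimizer not reachable at {BASE_URL}: {exc.reason}")
```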

Option C: MCP server only (node already installed)

pip install comfyui-gpu-optimizer

Then add it to your Claude Code MCP config:

{
  "mcpServers": {
    "gpu-manager": {
      "command": "comfyui-gpu-optimizer"
    }
  }
}
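If you would rather script this step than edit the file by hand, the sketch below merges the same `gpu-manager` entry into an existing JSON config while keeping any other servers. The config file location depends on your Claude Code setup, so it is left as a parameter; `register_gpu_manager` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def register_gpu_manager(config_path):
    """Add the gpu-manager MCP server entry to a JSON config, keeping other servers."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the JSON snippet above; the command is the pip-installed console script.
    servers["gpu-manager"] = {"command": "comfyui-gpu-optimizer"}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```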

MCP Tools

| Tool | What it does |
|---|---|
| gpu_status | VRAM usage, utilization, temperature, power draw |
| gpu_processes | List all processes using the GPU |
| gpu_memory_map | Show loaded models with sizes and devices |
| gpu_optimize | Auto-offload models to free VRAM (largest first) |
| gpu_offload_model | Offload a specific model by index |
| gpu_set_threshold | Set auto-offload threshold (50-95%) |
| gpu_auto_manage | Enable/disable automatic VRAM management |
| gpu_flush_models | Unload ALL models and empty CUDA cache |
| comfyui_status | Check if ComfyUI is running and responding |
| comfyui_start | Start ComfyUI with optimized flags for your GPU |
| comfyui_stop | Graceful shutdown |
| comfyui_restart | Stop, flush, restart |
| vram_flush | Kill all GPU compute processes (nuclear option) |
| comfyui_log | Tail the ComfyUI log |
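gpu_optimize offloads models largest first. The selection step can be sketched as a pure function over the kind of list gpu_memory_map reports. The `LoadedModel` fields and the "bytes to free" target are illustrative assumptions, not the server's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class LoadedModel:
    index: int       # position in the memory map (what gpu_offload_model takes)
    name: str
    size_bytes: int
    device: str      # e.g. "cuda:0" or "cpu"


def plan_offload(models, bytes_to_free):
    """Return GPU-resident models to offload, largest first, until enough is freed."""
    on_gpu = [m for m in models if m.device.startswith("cuda")]
    plan, freed = [], 0
    for model in sorted(on_gpu, key=lambda m: m.size_bytes, reverse=True):
        if freed >= bytes_to_free:
            break
        plan.append(model)
        freed += model.size_bytes
    return plan


GB = 1024 ** 3
models = [
    LoadedModel(0, "unet", 5 * GB, "cuda:0"),
    LoadedModel(1, "clip", 1 * GB, "cuda:0"),
    LoadedModel(2, "vae", 2 * GB, "cpu"),  # already offloaded, never selected
]
print([m.name for m in plan_offload(models, 4 * GB)])  # ['unet']
```

Sorting by size means a single large model is usually enough to relieve pressure, which keeps the number of offload/reload round trips low.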

Configuration

All settings are optional environment variables with sensible defaults:

| Variable | Default | Description |
|---|---|---|
| COMFYUI_DIR | ~/ComfyUI | ComfyUI install path |
| COMFYUI_URL | http://localhost:8188 | ComfyUI API endpoint |
| COMFYUI_LOG | ~/.comfyui-gpu-optimizer.log | Log file location |
| COMFYUI_RESERVE_VRAM | 1.5 | GB to reserve for OS stability |
| GPU_OPTIMIZER_PORT | 9111 | Optimizer HTTP API port |
| GPU_OPTIMIZER_URL | http://127.0.0.1:9111 | Optimizer API URL (MCP server side) |
| GPU_OPTIMIZER_THRESHOLD | 82 | Auto-offload threshold (%) |
| GPU_OPTIMIZER_POLL_INTERVAL | 3 | Monitor check interval (seconds) |
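A sketch of how these variables might be read with their documented defaults. Rejecting a threshold outside 50-95% mirrors the range given for gpu_set_threshold; whether the project validates this way at startup is an assumption, and `load_settings` is a hypothetical helper.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OptimizerSettings:
    comfyui_dir: Path
    comfyui_url: str
    log_path: Path
    reserve_vram_gb: float
    port: int
    optimizer_url: str
    threshold_pct: float
    poll_interval_s: float


def load_settings(env=os.environ):
    """Read the documented environment variables, falling back to the README defaults."""
    threshold = float(env.get("GPU_OPTIMIZER_THRESHOLD", "82"))
    if not 50 <= threshold <= 95:
        raise ValueError(f"GPU_OPTIMIZER_THRESHOLD must be 50-95, got {threshold}")
    return OptimizerSettings(
        comfyui_dir=Path(env.get("COMFYUI_DIR", "~/ComfyUI")).expanduser(),
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188"),
        log_path=Path(env.get("COMFYUI_LOG", "~/.comfyui-gpu-optimizer.log")).expanduser(),
        reserve_vram_gb=float(env.get("COMFYUI_RESERVE_VRAM", "1.5")),
        port=int(env.get("GPU_OPTIMIZER_PORT", "9111")),
        optimizer_url=env.get("GPU_OPTIMIZER_URL", "http://127.0.0.1:9111"),
        threshold_pct=threshold,
        poll_interval_s=float(env.get("GPU_OPTIMIZER_POLL_INTERVAL", "3")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation.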

How It Works

The custom node loads when ComfyUI starts. It spins up two daemon threads:

  1. Monitor thread -- checks VRAM usage every 3 seconds. When usage exceeds the threshold, it offloads the largest loaded model to CPU via comfy.model_management, then clears the CUDA cache. It repeats this until usage falls below the threshold.

  2. API server -- HTTP server on port 9111 (localhost only). Exposes endpoints for status, model listing, offloading, flushing, and threshold adjustment.
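The monitor thread's loop can be sketched independently of ComfyUI by injecting the operations it needs. In the real node these would be backed by torch.cuda and comfy.model_management; here `read_usage_pct`, `offload_largest` and `empty_cache` are hypothetical callables. The sketch also stops when nothing is left to offload, so a usage level that cannot drop below the threshold does not spin forever.

```python
import threading
from typing import Callable


def offload_until_below(read_usage_pct: Callable[[], float],
                        offload_largest: Callable[[], bool],
                        empty_cache: Callable[[], None],
                        threshold_pct: float) -> int:
    """Offload models one at a time until usage drops to the threshold or below.

    Returns how many models were offloaded. `offload_largest` returns False
    when no GPU-resident model remains.
    """
    offloaded = 0
    while read_usage_pct() > threshold_pct:
        if not offload_largest():
            break
        empty_cache()
        offloaded += 1
    return offloaded


def start_monitor(check: Callable[[], int], interval_s: float = 3.0) -> threading.Event:
    """Run `check` every `interval_s` seconds on a daemon thread; set the event to stop."""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            check()
            stop.wait(interval_s)  # sleeps, but wakes early once stop is set

    threading.Thread(target=loop, daemon=True, name="vram-monitor").start()
    return stop
```

Using `Event.wait` instead of `time.sleep` lets shutdown interrupt the poll interval immediately, and the daemon flag keeps the thread from blocking ComfyUI's exit.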

The MCP server runs as a separate process. It queries the GPU directly via pynvml (NVML) for hardware stats, and calls the optimizer's HTTP API for model-level operations. If ComfyUI isn't running, hardware-level tools (status, processes) still work; model-level tools return a clear error.
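On the MCP side, the hardware stats behind gpu_status can come straight from NVML. The sketch below uses the pynvml bindings (shipped in the nvidia-ml-py package); the returned field names are illustrative rather than the server's actual schema, and it returns an error dict when NVML is unavailable, in the spirit of the clear-error behavior described above.

```python
try:
    import pynvml  # from the nvidia-ml-py package
except ImportError:  # keep the module importable without NVML bindings
    pynvml = None


def vram_percent(used_bytes: int, total_bytes: int) -> float:
    """VRAM usage as a percentage, rounded to one decimal place."""
    return round(100.0 * used_bytes / total_bytes, 1) if total_bytes else 0.0


def gpu_status(index: int = 0) -> dict:
    """Hardware stats for one GPU via NVML, or an error dict if NVML is unavailable."""
    if pynvml is None:
        return {"error": "pynvml is not installed"}
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        return {"error": f"NVML unavailable: {exc}"}
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        return {
            "vram_used_mb": mem.used // (1024 * 1024),
            "vram_total_mb": mem.total // (1024 * 1024),
            "vram_percent": vram_percent(mem.used, mem.total),
            "utilization_pct": util.gpu,
            "temperature_c": pynvml.nvmlDeviceGetTemperature(
                handle, pynvml.NVML_TEMPERATURE_GPU),
            # NVML reports power in milliwatts.
            "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
        }
    except pynvml.NVMLError as exc:
        return {"error": f"NVML query failed: {exc}"}
    finally:
        pynvml.nvmlShutdown()
```

Because NVML talks to the driver rather than to ComfyUI, this path keeps working when ComfyUI is stopped.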

Tested On

  • NVIDIA GeForce RTX 4060 Ti 8GB

  • WSL2 (Ubuntu) on Windows

  • Python 3.10+

Verified workloads on 8GB:

| Workload | VRAM usage | Notes |
|---|---|---|
| LTX 2.3 Video (22B Q4 GGUF) | 52-70% | 10-sec video in ~6 min. Add --cache-lru 10 to ComfyUI launch flags. |
| AnimateDiff + Realistic Vision | 54% | 5-sec video in ~5 min |
| SDXL (epiCRealismXL) | 60% | Image generation |
| FLUX (GGUF quantized) | varies | Works with auto-offloading |

System RAM matters. When the optimizer offloads models from GPU to CPU, they live in system RAM. LTX 2.3 keeps its 10.8GB Gemma text encoder on CPU full-time, and offloaded SDXL/FLUX models add to that. 32GB RAM recommended for video workloads, 16GB minimum for image generation.

Should work on any NVIDIA GPU with CUDA support. The 8GB defaults are tuned for low-VRAM cards, but the thresholds are configurable for any card size.

License

MIT. See LICENSE.

Built by Waight Tech, LLC.
