gpu_watch
Take multiple GPU status snapshots at a fixed interval and return raw frames plus per-card min/max/avg statistics for utilization, temperature, power, and VRAM usage. Helps determine if a training run is stable.
Instructions
Take N snapshots of gpu_status at a fixed interval and return both the raw frames and per-card min/max/avg statistics for utilization, temperature, power, and VRAM usage. Useful for answering “is this training run stable?” Defaults: 5 samples at 1000 ms intervals.
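As a rough illustration of the per-card statistics described above, the following sketch aggregates a list of frames into min/max/avg values. The frame layout (a list of per-card dicts) and the field names `index`, `utilization_pct`, `temperature_c`, `power_w`, and `vram_used_mb` are assumptions made for this example; the actual gpu_status output may be shaped differently.

```python
from statistics import mean

# Hypothetical metric field names; the real gpu_status output may differ.
METRICS = ("utilization_pct", "temperature_c", "power_w", "vram_used_mb")


def summarize(frames):
    """Return {card_index: {metric: {"min", "max", "avg"}}} across all frames."""
    per_card = {}
    for frame in frames:
        for card in frame:
            # One list of readings per metric, keyed by the card's index.
            series = per_card.setdefault(card["index"], {m: [] for m in METRICS})
            for m in METRICS:
                series[m].append(card[m])
    return {
        idx: {
            m: {"min": min(values), "max": max(values), "avg": mean(values)}
            for m, values in series.items()
        }
        for idx, series in per_card.items()
    }


# Two made-up frames for a single card, purely for illustration.
frames = [
    [{"index": 0, "utilization_pct": 90, "temperature_c": 70,
      "power_w": 250, "vram_used_mb": 10000}],
    [{"index": 0, "utilization_pct": 100, "temperature_c": 72,
      "power_w": 270, "vram_used_mb": 10000}],
]
stats = summarize(frames)
```

A narrow min/max spread across the sampled window, as in the VRAM readings here, is the kind of signal that suggests a stable run; wide swings in utilization or power would point the other way.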
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| samples | No | Number of samples to take (2–60). | 5 |
| interval_ms | No | Milliseconds between samples (100–10000). | 1000 |
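The two parameters can be pictured as driving a simple sampling loop. The sketch below is not the tool's implementation: `collect_frames` and the `read_status` callable are hypothetical names, and rejecting out-of-range values with an error (rather than clamping them) is an assumption.

```python
import time


def collect_frames(read_status, samples=5, interval_ms=1000, sleep=time.sleep):
    """Call read_status() `samples` times, waiting interval_ms between calls."""
    # Ranges taken from the schema table; the rejection behavior is assumed.
    if not 2 <= samples <= 60:
        raise ValueError("samples must be between 2 and 60")
    if not 100 <= interval_ms <= 10000:
        raise ValueError("interval_ms must be between 100 and 10000")

    frames = []
    for i in range(samples):
        frames.append(read_status())
        if i < samples - 1:  # no wait is needed after the final sample
            sleep(interval_ms / 1000)
    return frames
```

With the defaults, a loop like this spends about 4 seconds waiting (four 1000 ms gaps between five samples), plus however long each status read takes.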