graph_frequency
Analyze CUDA Graph launch frequency per executable to identify hot and cold graphs, and detect graph pool saturation for vLLM batch size tuning.
Instructions
Analyze CUDA Graph launch frequency per executable. Identifies hot graphs (high replay rate), cold graphs (captured but rarely launched), and graph pool saturation. Essential for vLLM batch size tuning.
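The hot/cold distinction above can be illustrated with a minimal sketch. It assumes launch events are already available as a list of executable IDs; the function name, the "warm" middle bucket, and both rate thresholds are hypothetical choices for illustration, not values the tool is known to use.

```python
from collections import Counter


def classify_graphs(captured, launches, window_seconds=60.0,
                    hot_per_sec=10.0, cold_per_sec=0.1):
    """Label each captured graph executable as hot, warm, or cold.

    captured: executable IDs that were captured (hypothetical input shape).
    launches: executable IDs, one entry per launch seen in the window.
    hot_per_sec / cold_per_sec: illustrative thresholds (assumptions).
    """
    counts = Counter(launches)
    labels = {}
    for exec_id in captured:
        # Launch rate over the analysis window; never-launched graphs get 0.
        rate = counts[exec_id] / window_seconds
        if rate >= hot_per_sec:
            labels[exec_id] = "hot"
        elif rate <= cold_per_sec:
            labels[exec_id] = "cold"
        else:
            labels[exec_id] = "warm"
    return labels


# Made-up data: three captured graphs, one of which is never replayed.
example = classify_graphs(
    captured=["g_bs1", "g_bs8", "g_bs64"],
    launches=["g_bs1"] * 900 + ["g_bs8"] * 30,
    window_seconds=60.0,
)
print(example)
```

In a vLLM setting, a graph that stays cold across many windows may point to a captured batch size that traffic rarely reaches.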
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pid | Yes | Process ID to query graph launch frequency for | |
| window_seconds | No | Analysis window in seconds | 60 |
| since | No | Time range, e.g. 5m, 1h. Omit for saved DBs. | |
| tsc | No | Telegraphic compression | true |
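As a rough sketch of how the arguments in the table fit together, the helper below assembles an argument object for `graph_frequency`. The helper name and its validation rules are assumptions made for illustration; only the parameter names and defaults come from the schema, and how the object is sent depends on the client in use.

```python
import json


def build_graph_frequency_args(pid, window_seconds=60, since=None, tsc=True):
    """Assemble arguments for graph_frequency (hypothetical helper)."""
    # pid is the only required field in the schema.
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    args = {"pid": pid, "window_seconds": window_seconds, "tsc": tsc}
    # 'since' is optional and, per the schema, is omitted for saved DBs.
    if since is not None:
        args["since"] = since
    return args


# The pid here is a placeholder value.
live_args = build_graph_frequency_args(pid=4242, window_seconds=120, since="5m")
saved_db_args = build_graph_frequency_args(pid=4242)
print(json.dumps(live_args, indent=2))
```

Leaving `since` out of the object, rather than sending an empty value, follows the schema note about saved DBs.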