create-endpoint
Creates a Serverless endpoint on RunPod. Configures the Docker image, GPU pool, autoscaling bounds (min/max workers), idle timeout, and environment variables.
Instructions
Create a Serverless endpoint. On v2 (the default), pass an inline config: imageName and gpuPoolIds, plus optional worker, scaling, disk, and env settings. gpuPoolIds takes GPU pool names, i.e. the pool field returned by list-gpu-types (e.g. AMPERE_80, ADA_24), not GPU type ids. On v1, pass a templateId instead. workersMin and workersMax set the autoscaling bounds; a workersMin of 0 lets the endpoint scale to zero.
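The v1/v2 rules above can be sketched as a small pre-flight check on the argument object. This is only an illustration: the helper name `check_create_endpoint_args` and its `version` parameter are hypothetical (the source does not say how the version is selected), the image name is a placeholder, and the `workersMin >= 0` check is an assumption drawn from "min 0 = scale to zero".

```python
def check_create_endpoint_args(args, version="v2"):
    """Return a list of problems found in a create-endpoint argument dict.

    Hypothetical helper: it only encodes the rules stated in the
    instructions, not the server's actual validation.
    """
    problems = []
    if version == "v2":
        # v2 uses an inline config: image plus GPU pool names.
        if not args.get("imageName"):
            problems.append("v2 requires imageName")
        if not args.get("gpuPoolIds"):
            problems.append("v2 requires gpuPoolIds (pool names, not GPU type ids)")
    elif version == "v1":
        # v1 references an existing template instead.
        if not args.get("templateId"):
            problems.append("v1 requires templateId")
    else:
        problems.append(f"unknown version: {version}")

    lo, hi = args.get("workersMin"), args.get("workersMax")
    if lo is not None and lo < 0:
        problems.append("workersMin must be >= 0")  # assumption: 0 is the floor
    if lo is not None and hi is not None and lo > hi:
        problems.append("workersMin must not exceed workersMax")
    return problems


v2_args = {
    "name": "example-endpoint",
    "imageName": "example/worker:latest",  # placeholder image
    "gpuPoolIds": ["AMPERE_80"],           # pool name from list-gpu-types
    "workersMin": 0,                       # 0 = scale to zero
    "workersMax": 3,
}
bad_v1_args = {"name": "example-endpoint", "workersMin": 1, "workersMax": 0}

print(check_create_endpoint_args(v2_args))
print(check_create_endpoint_args(bad_v1_args, version="v1"))
```

The first call reports no problems; the second flags both the missing templateId and the inverted worker bounds.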
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| env | No | Environment variables (v2) | |
| args | No | Container start command/args (v2) | |
| name | No | Name for the endpoint | |
| ports | No | Ports to expose (v2), e.g. ['8000/http'] | |
| gpuCount | No | GPUs per worker (v2) | |
| flashboot | No | FlashBoot mode (v2) | |
| imageName | No | Docker image (v2). Required on v2 instead of a templateId. | |
| gpuPoolIds | No | GPU pool names (v2, required). The `pool` field from list-gpu-types, e.g. ["AMPERE_80"]. NOT GPU type ids. | |
| gpuTypeIds | No | List of acceptable GPU types (v1) | |
| scalerType | No | Autoscaler type | |
| templateId | No | Template ID (v1). Required on v1. | |
| workersMax | No | Maximum number of workers | |
| workersMin | No | Minimum number of workers | |
| computeType | No | GPU or CPU endpoint (v1) | |
| idleTimeout | No | Idle timeout in seconds before scaling a worker down | |
| scalerValue | No | Autoscaler target value | |
| dataCenterIds | No | List of preferred data centers | |
| networkVolumeIds | No | Network volume ids to attach (v2) | |
| containerDiskInGb | No | Container disk size in GB (v2) | |
| executionTimeoutMs | No | Per-job execution timeout in ms (v2) | |
| containerRegistryAuthId | No | Container registry auth id for a private image (v2) | |
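
Note that the two timeout fields in the table use different units: idleTimeout is in seconds, while executionTimeoutMs is in milliseconds. A minimal sketch of assembling a v2 argument object with both converted from one duration type follows. The helper `v2_timeouts` is hypothetical, and the image name, disk size, and durations are illustrative values, not defaults from the source.

```python
import json
from datetime import timedelta


def v2_timeouts(idle: timedelta, execution: timedelta) -> dict:
    """Convert durations to the units each field expects (hypothetical helper)."""
    return {
        "idleTimeout": int(idle.total_seconds()),                      # seconds
        "executionTimeoutMs": int(execution.total_seconds() * 1000),   # milliseconds
    }


args = {
    "name": "example-endpoint",
    "imageName": "example/worker:latest",  # placeholder image
    "gpuPoolIds": ["AMPERE_80"],           # pool names, not GPU type ids
    "ports": ["8000/http"],
    "containerDiskInGb": 20,               # illustrative size
    **v2_timeouts(idle=timedelta(seconds=30), execution=timedelta(minutes=10)),
}

print(json.dumps(args, indent=2))
```

Keeping the conversion in one place avoids passing a seconds value where milliseconds are expected.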