Alias
You can use sls as a shorthand for serverless.
Subcommands
List endpoints
List all your Serverless endpoints.
List flags
Include template information in the output.
Include workers information in the output.
Get endpoint details
Get detailed information about a specific endpoint.
Get flags
Include template information in the output.
Include workers information in the output.
Create an endpoint
Create a new Serverless endpoint from a template or from a Hub repo.
When you deploy with --hub-id, the GPU IDs and container disk size are pulled automatically from the Hub release config. You can override the GPU type with --gpu-id. Environment variables from the Hub release are also included automatically, and you can override them or add new ones with --env.
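As a rough illustration of how a Hub deployment combines the release config with command-line overrides, the Python sketch below merges the release's environment variables with repeated --env KEY=VALUE values and applies an optional --gpu-id override. The release dict and its gpu_ids, container_disk_gb, and env keys are hypothetical stand-ins rather than the actual Hub release format, and the function names are illustrative only. The sketch models just the Hub path; template-based deployments take their environment from the template.

```python
from __future__ import annotations


def parse_env_flags(env_flags: list[str]) -> dict[str, str]:
    """Turn repeated --env KEY=VALUE values into a dict; a later flag wins on a duplicate key."""
    parsed: dict[str, str] = {}
    for item in env_flags:
        key, sep, value = item.partition("=")  # split on the first "=" only
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        parsed[key] = value
    return parsed


def resolve_hub_settings(release: dict, gpu_id: str | None, env_flags: list[str]) -> dict:
    """Combine a (hypothetical) Hub release config with command-line overrides."""
    env = dict(release.get("env", {}))      # start from the release defaults (copied, not mutated)
    env.update(parse_env_flags(env_flags))  # --env overrides existing keys or adds new ones
    return {
        # --gpu-id, when given, replaces the GPU IDs from the release
        "gpu_ids": [gpu_id] if gpu_id else list(release["gpu_ids"]),
        # container disk size is taken from the release as-is
        "container_disk_gb": release["container_disk_gb"],
        "env": env,
    }


release = {"gpu_ids": ["ADA_24"], "container_disk_gb": 20, "env": {"MODEL": "base", "LOG_LEVEL": "info"}}
settings = resolve_hub_settings(release, None, ["LOG_LEVEL=debug", "EXTRA=1"])
print(settings["env"])  # {'MODEL': 'base', 'LOG_LEVEL': 'debug', 'EXTRA': '1'}
```

In this model, a release that sets LOG_LEVEL=info combined with --env LOG_LEVEL=debug ends up with LOG_LEVEL=debug, while release variables you don't mention are kept unchanged.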
Serverless templates vs Pod templates: Serverless endpoints require a Serverless-specific template. Pod templates (such as runpod-torch-v21) cannot be used because they include configuration that Serverless does not support. When creating a template with runpodctl template create, use the --serverless flag to create a Serverless template.
Each Serverless template can be bound to only one endpoint at a time. To create multiple endpoints with the same configuration, create a separate template for each.
Create flags
Name for the endpoint. Must be at least 3 characters. If omitted, a name is auto-generated in the format endpoint-XXXXXXXX.
Template ID to use (required if --hub-id is not specified). Use runpodctl template search to find templates.
Hub listing ID to deploy from (alternative to --template-id). Use runpodctl hub search to find repos.
GPU type for workers. Accepts either a GPU type ID (e.g., NVIDIA A40, NVIDIA GeForce RTX 4090) or a GPU pool ID (e.g., ADA_24, AMPERE_48). Use runpodctl gpu list to see available GPUs.
Number of GPUs per worker.
Compute type (GPU or CPU). For CPU endpoints, use --instance-id to specify the CPU instance type.
CPU instance ID when using --compute-type CPU. If omitted, defaults to cpu3g-4-16. Only valid with --compute-type CPU.
Minimum number of workers.
Maximum number of workers.
Comma-separated list of preferred datacenter IDs. Use runpodctl datacenter list to see available datacenters.
Network volume ID to attach for single-region deployments. Use runpodctl network-volume list to see available network volumes. Mutually exclusive with --network-volume-ids.
Comma-separated list of network volume IDs for multi-region deployments. Mutually exclusive with --network-volume-id.
Minimum CUDA version required for workers (e.g., 12.4). Workers will only be scheduled on machines that meet this CUDA version requirement.
Autoscaling strategy: delay (scales based on queue wait time in seconds) or requests (scales based on pending request count).
Trigger point for the autoscaler. For delay, this is the target queue wait time in seconds. For requests, this is the pending request count that triggers scaling.
Idle timeout in seconds. Workers shut down after being idle for this duration. Valid range: 1-3600 seconds.
Enable or disable flash boot for faster worker startup. When enabled, workers start from cached container images.
Execution timeout in seconds. Jobs that exceed this duration are terminated. The CLI accepts seconds but converts to milliseconds internally.
Environment variable in KEY=VALUE format. Use multiple --env flags to set multiple variables. These values only apply when deploying from --hub-id, where they override the Hub release defaults. With --template-id, environment variables come from the template, so --env is ignored and the CLI prints a note to that effect.
Model reference URL to attach to the endpoint. Use multiple --model-reference flags to attach multiple models. Works with both --template-id and --hub-id, and requires GPU compute type.
Update an endpoint
Update endpoint configuration.
Update flags
New name for the endpoint.
New minimum number of workers.
New maximum number of workers.
New idle timeout in seconds.
Scaler type (QUEUE_DELAY or REQUEST_COUNT).
Scaler value: the trigger point for the selected scaler type.
Enable or disable flash boot for faster worker startup.
Execution timeout in seconds. Jobs that exceed this duration are terminated.
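The create command calls its autoscaling strategies delay and requests, while update takes QUEUE_DELAY or REQUEST_COUNT. Going by the descriptions (queue wait time versus pending request count), these appear to name the same two strategies, and the Python sketch below assumes that correspondence. It also shows the seconds-to-milliseconds conversion that the create flag description mentions for the execution timeout, assuming the update flag is handled the same way. SCALER_TYPE_FOR_STRATEGY, to_update_scaler_type, and execution_timeout_ms are hypothetical names.

```python
# Assumed mapping between create-style strategy names and update-style scaler types,
# inferred from the flag descriptions rather than taken from runpodctl itself.
SCALER_TYPE_FOR_STRATEGY = {
    "delay": "QUEUE_DELAY",       # scale on queue wait time, in seconds
    "requests": "REQUEST_COUNT",  # scale on the number of pending requests
}


def to_update_scaler_type(strategy: str) -> str:
    """Translate a create-style strategy name into the matching update scaler type."""
    try:
        return SCALER_TYPE_FOR_STRATEGY[strategy.lower()]
    except KeyError:
        raise ValueError(f"unknown strategy {strategy!r}; expected 'delay' or 'requests'") from None


def execution_timeout_ms(seconds: int) -> int:
    """Convert a command-line execution timeout in seconds to milliseconds."""
    return seconds * 1000


print(to_update_scaler_type("delay"), execution_timeout_ms(600))  # QUEUE_DELAY 600000
```

Under this assumption, an endpoint created with the delay strategy keeps the same scaling behavior when later updated with QUEUE_DELAY, and the scaler value keeps its meaning (seconds of queue wait).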
Delete an endpoint
Delete an endpoint.
Serverless URLs
Access your Serverless endpoint using these URL patterns:

| Operation | URL |
|---|---|
| Async request | https://api.runpod.ai/v2/<endpoint-id>/run |
| Sync request | https://api.runpod.ai/v2/<endpoint-id>/runsync |
| Health check | https://api.runpod.ai/v2/<endpoint-id>/health |
| Job status | https://api.runpod.ai/v2/<endpoint-id>/status/<job-id> |
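The URL patterns in the table can be assembled programmatically. The Python sketch below builds each URL and prepares, without sending, a POST request for /run or /runsync. The {"input": ...} request body and the Authorization: Bearer header are assumptions based on Runpod's Serverless API conventions rather than something this page specifies, so check the API reference before relying on them. endpoint_url and build_run_request are illustrative names.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "https://api.runpod.ai/v2"


def endpoint_url(endpoint_id: str, operation: str, job_id: str | None = None) -> str:
    """Build one of the Serverless URLs listed in the table."""
    if operation in ("run", "runsync", "health"):
        return f"{BASE_URL}/{endpoint_id}/{operation}"
    if operation == "status":
        if not job_id:
            raise ValueError("status requires a job ID")
        return f"{BASE_URL}/{endpoint_id}/status/{job_id}"
    raise ValueError(f"unsupported operation {operation!r}")


def build_run_request(endpoint_id: str, api_key: str, payload: dict, sync: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /run (async) or /runsync (sync).

    Assumes an {"input": ...} JSON body and a Bearer-token Authorization header.
    """
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url(endpoint_id, "runsync" if sync else "run"),
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen would submit the job. An async /run call is expected to return a job ID that you can then poll with the status URL, while /runsync waits for the result in the same call.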