# serverless

Manage Serverless endpoints, including creating, listing, updating, and deleting endpoints.

```bash
runpodctl serverless [flags]
```

## Alias

You can use `sls` as a shorthand for `serverless`:

```bash
runpodctl sls list
```

## Subcommands

### List endpoints

List all your Serverless endpoints:

```bash
runpodctl serverless list
```

#### List flags

Include template information in the output.

Include workers information in the output.

### Get endpoint details

Get detailed information about a specific endpoint:

```bash
runpodctl serverless get <endpoint-id>
```

#### Get flags

Include template information in the output.

Include workers information in the output.

### Create an endpoint

Create a new Serverless endpoint from a template or from a Hub repo:

```bash
# Create from a template
runpodctl serverless create --template-id "tpl_abc123" --gpu-id "NVIDIA GeForce RTX 4090"

# Create from a template with a model reference
runpodctl serverless create --template-id "tpl_abc123" --gpu-id "NVIDIA GeForce RTX 4090" \
  --model-reference https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct:main

# Create a CPU endpoint
runpodctl serverless create --template-id "tpl_abc123" --compute-type CPU

# Create from a Hub repo
runpodctl hub search vllm  # Find the hub ID
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --name "my-vllm"

# Create from a Hub repo and attach a model reference
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --gpu-id "NVIDIA GeForce RTX 4090" \
  --model-reference https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct:main

# Create from a Hub repo with custom environment variables
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --name "my-vllm" \
  --env MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct \
  --env MAX_TOKENS=4096
```

When using `--hub-id`, GPU IDs and container disk size are automatically pulled from the Hub release config. You can override the GPU type with `--gpu-id`. Environment variables from the Hub release are included automatically, and you can override or add to them with `--env`.

**Serverless templates vs Pod templates**: Serverless endpoints require a Serverless-specific template. Pod templates (like `runpod-torch-v21`) cannot be used because they include volume disk configuration, which Serverless does not support. When creating a template with [`runpodctl template create`](/runpodctl/reference/runpodctl-template), use the `--serverless` flag to create a Serverless template.

Each Serverless template can only be bound to one endpoint at a time.
To create multiple endpoints with the same configuration, create separate templates for each.

#### Create flags

Name for the endpoint. Must be at least 3 characters. If omitted, a name is auto-generated in the format `endpoint-XXXXXXXX`.

Template ID to use (required if `--hub-id` is not specified). Use [`runpodctl template search`](/runpodctl/reference/runpodctl-template) to find templates.

Hub listing ID to deploy from (alternative to `--template-id`). Use [`runpodctl hub search`](/runpodctl/reference/runpodctl-hub) to find repos.

GPU type for workers. Accepts either a GPU type ID (e.g., `NVIDIA A40`, `NVIDIA GeForce RTX 4090`) or a GPU pool ID (e.g., `ADA_24`, `AMPERE_48`). Use [`runpodctl gpu list`](/runpodctl/reference/runpodctl-gpu) to see available GPUs.

Number of GPUs per worker.

Compute type (`GPU` or `CPU`). For CPU endpoints, use `--instance-id` to specify the CPU instance type.

CPU instance ID when using `--compute-type CPU`. If omitted, defaults to `cpu3g-4-16`. Only valid with `--compute-type CPU`.

Minimum number of workers.

Maximum number of workers.

Comma-separated list of preferred datacenter IDs. Use [`runpodctl datacenter list`](/runpodctl/reference/runpodctl-datacenter) to see available datacenters.

Network volume ID to attach for single-region deployments. Use [`runpodctl network-volume list`](/runpodctl/reference/runpodctl-network-volume) to see available network volumes. Mutually exclusive with `--network-volume-ids`.

Comma-separated list of network volume IDs for multi-region deployments. Mutually exclusive with `--network-volume-id`.

Minimum CUDA version required for workers (e.g., `12.4`). Workers will only be scheduled on machines that meet this CUDA version requirement.

Autoscaling strategy: `delay` (scales based on queue wait time in seconds) or `requests` (scales based on pending request count).

Trigger point for the autoscaler. For `delay`, this is the target queue wait time in seconds.
For `requests`, this is the pending request count that triggers scaling.

Idle timeout in seconds. Workers shut down after being idle for this duration. Valid range: 1-3600 seconds.

Enable or disable flash boot for faster worker startup. When enabled, workers start from cached container images.

Execution timeout in seconds. Jobs that exceed this duration are terminated. The CLI accepts seconds but converts to milliseconds internally.

Environment variable in `KEY=VALUE` format. Use multiple `--env` flags to set multiple variables. These values only apply when deploying from `--hub-id`, where they override the Hub release defaults. With `--template-id`, environment variables come from the template, so `--env` is ignored and the CLI prints a note to that effect.

Model reference URL to attach to the endpoint. Use multiple `--model-reference` flags to attach multiple models. Works with both `--template-id` and `--hub-id`, and requires GPU compute type.

### Update an endpoint

Update endpoint configuration:

```bash
runpodctl serverless update <endpoint-id> --workers-max 5
```

#### Update flags

New name for the endpoint.

New minimum number of workers.

New maximum number of workers.

New idle timeout in seconds.

Scaler type (`QUEUE_DELAY` or `REQUEST_COUNT`).

Scaler value.

Enable or disable flash boot for faster worker startup.

Execution timeout in seconds. Jobs that exceed this duration are terminated.

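The numeric constraints stated in the flag descriptions (endpoint names of at least 3 characters, an idle timeout between 1 and 3600 seconds, and an execution timeout given in seconds but stored as milliseconds) can be checked in a script before calling the CLI. The following is a minimal sketch of such pre-flight checks. The function names `valid_name`, `valid_idle_timeout`, and `seconds_to_ms` are hypothetical helpers, not part of `runpodctl`, and the CLI may apply further validation of its own.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks; these helpers are NOT part of runpodctl.

# Endpoint names must be at least 3 characters long.
valid_name() {
  [ "${#1}" -ge 3 ]
}

# Idle timeout must be a whole number of seconds in the range 1-3600.
valid_idle_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 3600 ]
}

# Execution timeout is passed in seconds; this shows the millisecond equivalent.
seconds_to_ms() {
  echo $(( $1 * 1000 ))
}

valid_name "my-vllm" && echo "name ok"
valid_idle_timeout 5 && echo "idle timeout ok"
seconds_to_ms 600   # prints 600000
```

Running checks like these first lets a deployment script fail with a clear message before any request is sent.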
### Delete an endpoint

Delete an endpoint:

```bash
runpodctl serverless delete <endpoint-id>
```

## Serverless URLs

Access your Serverless endpoint using these URL patterns:

| Operation     | URL                                                         |
| ------------- | ----------------------------------------------------------- |
| Async request | `https://api.runpod.ai/v2/<endpoint-id>/run`                |
| Sync request  | `https://api.runpod.ai/v2/<endpoint-id>/runsync`            |
| Health check  | `https://api.runpod.ai/v2/<endpoint-id>/health`             |
| Job status    | `https://api.runpod.ai/v2/<endpoint-id>/status/<job-id>`    |

## Related commands

* [`runpodctl hub`](/runpodctl/reference/runpodctl-hub)
* [`runpodctl template`](/runpodctl/reference/runpodctl-template)
* [`runpodctl gpu list`](/runpodctl/reference/runpodctl-gpu)
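
The URL patterns in the Serverless URLs table can be assembled in a script. The following is a minimal sketch, assuming the variable parts of each URL are the endpoint ID (after `/v2/`) and, for job status, a job ID (after `/status/`). The `endpoint_url` function and the `RUNPOD_API_BASE` variable are hypothetical names used for illustration and are not part of `runpodctl`. The sketch only builds URL strings and sends no requests.

```bash
#!/usr/bin/env bash
# Hypothetical helper; NOT part of runpodctl. Builds URLs only, sends no requests.
RUNPOD_API_BASE="https://api.runpod.ai/v2"

# Usage: endpoint_url <endpoint-id> <run|runsync|health|status> [job-id]
endpoint_url() {
  local endpoint_id="$1" operation="$2" job_id="${3:-}"
  case "$operation" in
    run|runsync|health)
      echo "${RUNPOD_API_BASE}/${endpoint_id}/${operation}"
      ;;
    status)
      # Job status needs a job ID as the final path segment (assumed layout).
      [ -n "$job_id" ] || { echo "status requires a job ID" >&2; return 1; }
      echo "${RUNPOD_API_BASE}/${endpoint_id}/status/${job_id}"
      ;;
    *)
      echo "unknown operation: ${operation}" >&2
      return 1
      ;;
  esac
}

endpoint_url "my-endpoint-id" health
```

Keeping URL construction in one function means an endpoint ID only needs to be supplied once per call, and an unsupported operation fails early instead of producing a malformed URL.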