# serverless

export const VolumeDiskTooltip = () => {
  return <Tooltip headline="Volume disk" tip="Persistent storage that remains available for the duration of the Pod's lease. It functions like a dedicated hard drive, allowing you to store data that needs to be retained even if the Pod is stopped or rebooted. Mounted at /workspace by default." cta="Learn more about volume disks" href="/pods/storage/types">volume disk</Tooltip>;
};

Manage Serverless endpoints: create, list, inspect, update, and delete them.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
runpodctl serverless <subcommand> [flags]
```

## Alias

You can use `sls` as a shorthand for `serverless`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
runpodctl sls list
```

## Subcommands

### List endpoints

List all your Serverless endpoints:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
runpodctl serverless list
```

#### List flags

<ResponseField name="--include-template" type="bool">
  Include template information in the output.
</ResponseField>

<ResponseField name="--include-workers" type="bool">
  Include workers information in the output.
</ResponseField>

### Get endpoint details

Get detailed information about a specific endpoint:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
runpodctl serverless get <endpoint-id>
```

#### Get flags

<ResponseField name="--include-template" type="bool">
  Include template information in the output.
</ResponseField>

<ResponseField name="--include-workers" type="bool">
  Include workers information in the output.
</ResponseField>
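
As a minimal sketch, both flags can be combined in one call to get the endpoint, its template, and its workers in a single response. The sketch assumes the boolean flags can be passed without a value (not confirmed by this page), and `DRY_RUN` and `describe_endpoint` are hypothetical names introduced here for illustration.

```bash
# DRY_RUN is a hypothetical variable for this sketch: unless it is set to an
# empty string, the command is printed instead of executed.
DRY_RUN="${DRY_RUN-1}"

# Usage: describe_endpoint <endpoint-id>
describe_endpoint() {
  # Assumption: the bool flags are accepted without an explicit value.
  ${DRY_RUN:+echo} runpodctl serverless get "$1" \
    --include-template \
    --include-workers
}

describe_endpoint "my-endpoint-id"
```

Run the script with `DRY_RUN=` (empty) to execute the command instead of printing it. The same two flags are documented for `runpodctl serverless list`.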

### Create an endpoint

Create a new Serverless endpoint from a template or from a Hub repo:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Create from a template
runpodctl serverless create --template-id "tpl_abc123" --gpu-id "NVIDIA GeForce RTX 4090"

# Create from a template with a model reference
runpodctl serverless create --template-id "tpl_abc123" --gpu-id "NVIDIA GeForce RTX 4090" \
  --model-reference https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct:main

# Create a CPU endpoint
runpodctl serverless create --template-id "tpl_abc123" --compute-type CPU

# Create from a Hub repo
runpodctl hub search vllm                                         # Find the hub ID
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --name "my-vllm"

# Create from a Hub repo and attach a model reference
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --gpu-id "NVIDIA GeForce RTX 4090" \
  --model-reference https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct:main

# Create from a Hub repo with custom environment variables
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --name "my-vllm" \
  --env MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct \
  --env MAX_TOKENS=4096
```

When using `--hub-id`, GPU IDs and container disk size are automatically pulled from the Hub release config. You can override the GPU type with `--gpu-id`. Environment variables from the Hub release are included automatically, and you can override or add to them with `--env`.

<Note>
  **Serverless templates vs Pod templates**: Serverless endpoints require a Serverless-specific template. Pod templates (like `runpod-torch-v21`) cannot be used because they include <VolumeDiskTooltip /> configuration, which Serverless does not support. When creating a template with [`runpodctl template create`](/runpodctl/reference/runpodctl-template), use the `--serverless` flag to create a Serverless template.

  Each Serverless template can only be bound to one endpoint at a time. To create multiple endpoints with the same configuration, create a separate template for each endpoint.
</Note>

#### Create flags

<ResponseField name="--name" type="string">
  Name for the endpoint. Must be at least 3 characters. If omitted, a name is auto-generated in the format `endpoint-XXXXXXXX`.
</ResponseField>

<ResponseField name="--template-id" type="string">
  Template ID to use (required if `--hub-id` is not specified). Use [`runpodctl template search`](/runpodctl/reference/runpodctl-template) to find templates.
</ResponseField>

<ResponseField name="--hub-id" type="string">
  Hub listing ID to deploy from (alternative to `--template-id`). Use [`runpodctl hub search`](/runpodctl/reference/runpodctl-hub) to find repos.
</ResponseField>

<ResponseField name="--gpu-id" type="string">
  GPU type for workers. Accepts either a GPU type ID (e.g., `NVIDIA A40`, `NVIDIA GeForce RTX 4090`) or a GPU pool ID (e.g., `ADA_24`, `AMPERE_48`). Use [`runpodctl gpu list`](/runpodctl/reference/runpodctl-gpu) to see available GPUs.
</ResponseField>

<ResponseField name="--gpu-count" type="int" default="1">
  Number of GPUs per worker.
</ResponseField>

<ResponseField name="--compute-type" type="string" default="GPU">
  Compute type (`GPU` or `CPU`). For CPU endpoints, use `--instance-id` to specify the CPU instance type.
</ResponseField>

<ResponseField name="--instance-id" type="string" default="cpu3g-4-16">
  CPU instance type for workers. Only valid with `--compute-type CPU`; if omitted, defaults to `cpu3g-4-16`.
</ResponseField>

<ResponseField name="--workers-min" type="int" default="0">
  Minimum number of workers.
</ResponseField>

<ResponseField name="--workers-max" type="int" default="3">
  Maximum number of workers.
</ResponseField>

<ResponseField name="--data-center-ids" type="string">
  Comma-separated list of preferred datacenter IDs. Use [`runpodctl datacenter list`](/runpodctl/reference/runpodctl-datacenter) to see available datacenters.
</ResponseField>

<ResponseField name="--network-volume-id" type="string">
  Network volume ID to attach for single-region deployments. Use [`runpodctl network-volume list`](/runpodctl/reference/runpodctl-network-volume) to see available network volumes. Mutually exclusive with `--network-volume-ids`.
</ResponseField>

<ResponseField name="--network-volume-ids" type="string">
  Comma-separated list of network volume IDs for multi-region deployments. Mutually exclusive with `--network-volume-id`.
</ResponseField>

<ResponseField name="--min-cuda-version" type="string">
  Minimum CUDA version required for workers (e.g., `12.4`). Workers will only be scheduled on machines that meet this CUDA version requirement.
</ResponseField>

<ResponseField name="--scale-by" type="string">
  Autoscaling strategy: `delay` (scales based on queue wait time in seconds) or `requests` (scales based on pending request count).
</ResponseField>

<ResponseField name="--scale-threshold" type="int">
  Trigger point for the autoscaler. For `delay`, this is the target queue wait time in seconds. For `requests`, this is the pending request count that triggers scaling.
</ResponseField>

<ResponseField name="--idle-timeout" type="int">
  Idle timeout in seconds. Workers shut down after being idle for this duration. Valid range: 1-3600 seconds.
</ResponseField>

<ResponseField name="--flash-boot" type="bool">
  Enable or disable flash boot for faster worker startup. When enabled, workers start from cached container images.
</ResponseField>

<ResponseField name="--execution-timeout" type="int">
  Execution timeout in seconds. Jobs that exceed this duration are terminated. The CLI accepts seconds but converts to milliseconds internally.
</ResponseField>

<ResponseField name="--env" type="string">
  Environment variable in `KEY=VALUE` format. Repeat `--env` to set multiple variables. Applies only when deploying with `--hub-id`, where the values override or add to the Hub release defaults. With `--template-id`, environment variables come from the template, so `--env` is ignored and the CLI prints a note to that effect.
</ResponseField>

<ResponseField name="--model-reference" type="string">
  Model reference URL to attach to the endpoint. Use multiple `--model-reference` flags to attach multiple models. Works with both `--template-id` and `--hub-id`, and requires GPU compute type.
</ResponseField>
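
The flags above can be combined in a single `create` call. The following is a sketch rather than a recommended configuration: it checks the documented limits for `--name` (at least 3 characters) and `--idle-timeout` (1-3600 seconds) on the client side, then prints the resulting command. The template ID, endpoint name, and all numeric values are placeholders, and `DRY_RUN` and `create_endpoint` are hypothetical names introduced for this example.

```bash
# DRY_RUN is a hypothetical variable for this sketch: unless it is set to an
# empty string, the command is printed instead of executed.
DRY_RUN="${DRY_RUN-1}"

# Usage: create_endpoint <template-id> <name> <idle-timeout-seconds>
create_endpoint() {
  # Client-side checks that mirror the limits documented for the flags.
  if [ "${#2}" -lt 3 ]; then
    echo "name must be at least 3 characters" >&2
    return 1
  fi
  case "$3" in
    ''|*[!0-9]*)
      echo "idle timeout must be a whole number of seconds" >&2
      return 1 ;;
  esac
  if [ "$3" -lt 1 ] || [ "$3" -gt 3600 ]; then
    echo "idle timeout must be between 1 and 3600 seconds" >&2
    return 1
  fi

  # Placeholder values: a GPU pool ID, 0-5 workers, and delay-based scaling
  # with a 4-second target queue wait time.
  ${DRY_RUN:+echo} runpodctl serverless create \
    --template-id "$1" \
    --name "$2" \
    --gpu-id "ADA_24" \
    --workers-min 0 \
    --workers-max 5 \
    --scale-by delay \
    --scale-threshold 4 \
    --idle-timeout "$3" \
    --execution-timeout 600
}

create_endpoint "tpl_abc123" "my-endpoint" 30
```

Printing the command first makes it easy to review the flag values before creating billable resources. Run the script with `DRY_RUN=` (empty) to execute it.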

### Update an endpoint

Update endpoint configuration:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
runpodctl serverless update <endpoint-id> --workers-max 5
```

#### Update flags

<ResponseField name="--name" type="string">
  New name for the endpoint.
</ResponseField>

<ResponseField name="--workers-min" type="int">
  New minimum number of workers.
</ResponseField>

<ResponseField name="--workers-max" type="int">
  New maximum number of workers.
</ResponseField>

<ResponseField name="--idle-timeout" type="int">
  New idle timeout in seconds.
</ResponseField>

<ResponseField name="--scaler-type" type="string">
  Scaler type (`QUEUE_DELAY` or `REQUEST_COUNT`).
</ResponseField>

<ResponseField name="--scaler-value" type="int">
  Threshold value for the scaler selected with `--scaler-type`.
</ResponseField>

<ResponseField name="--flash-boot" type="bool">
  Enable or disable flash boot for faster worker startup.
</ResponseField>

<ResponseField name="--execution-timeout" type="int">
  Execution timeout in seconds. Jobs that exceed this duration are terminated.
</ResponseField>
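
Several update flags can be passed in one call. The sketch below raises the worker cap and switches the endpoint to queue-delay scaling. It assumes that `--scaler-value` is interpreted in seconds for `QUEUE_DELAY`, which this page does not state explicitly. `DRY_RUN` and `set_scaling` are hypothetical names, and the endpoint ID is a placeholder.

```bash
# DRY_RUN is a hypothetical variable for this sketch: unless it is set to an
# empty string, the command is printed instead of executed.
DRY_RUN="${DRY_RUN-1}"

# Usage: set_scaling <endpoint-id> <max-workers> <scaler-value>
set_scaling() {
  # Assumption: for QUEUE_DELAY the scaler value is a delay in seconds.
  ${DRY_RUN:+echo} runpodctl serverless update "$1" \
    --workers-max "$2" \
    --scaler-type QUEUE_DELAY \
    --scaler-value "$3"
}

set_scaling "my-endpoint-id" 5 4
```

Run the script with `DRY_RUN=` (empty) to apply the change instead of printing the command.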

### Delete an endpoint

Delete an endpoint:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
runpodctl serverless delete <endpoint-id>
```

## Serverless URLs

Access your Serverless endpoint using these URL patterns:

| Operation     | URL                                                      |
| ------------- | -------------------------------------------------------- |
| Async request | `https://api.runpod.ai/v2/<endpoint-id>/run`             |
| Sync request  | `https://api.runpod.ai/v2/<endpoint-id>/runsync`         |
| Health check  | `https://api.runpod.ai/v2/<endpoint-id>/health`          |
| Job status    | `https://api.runpod.ai/v2/<endpoint-id>/status/<job-id>` |
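
The URL patterns in the table can be assembled with a small helper, sketched below. The `runpod_url` function and the `RUNPOD_API_KEY` and `ENDPOINT_ID` variables are hypothetical names for this example. The `Authorization: Bearer` header is an assumption about how requests are authenticated, since this page does not cover authentication.

```bash
# Build a Serverless URL from an endpoint ID and an operation name.
# Usage: runpod_url <endpoint-id> <run|runsync|health|status> [job-id]
runpod_url() {
  case "$2" in
    run|runsync|health)
      printf 'https://api.runpod.ai/v2/%s/%s\n' "$1" "$2" ;;
    status)
      if [ -z "${3:-}" ]; then
        echo "status requires a job ID" >&2
        return 1
      fi
      printf 'https://api.runpod.ai/v2/%s/status/%s\n' "$1" "$3" ;;
    *)
      echo "unknown operation: $2" >&2
      return 1 ;;
  esac
}

# Call the health check only when both (assumed) variables are set.
if [ -n "${RUNPOD_API_KEY:-}" ] && [ -n "${ENDPOINT_ID:-}" ]; then
  # Assumption: the API accepts a bearer token in the Authorization header.
  curl -s -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
    "$(runpod_url "$ENDPOINT_ID" health)"
fi
```

Without the two variables set, the script only defines the helper, so it is safe to source from other scripts.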

## Related commands

* [`runpodctl hub`](/runpodctl/reference/runpodctl-hub)
* [`runpodctl template`](/runpodctl/reference/runpodctl-template)
* [`runpodctl gpu list`](/runpodctl/reference/runpodctl-gpu)
