# Create Async Serverless Endpoint

If you don't have a Novita account, <Link href="https://novita.ai/user/register" target="_blank">sign up</Link> first. For details, see the <Link href="/guides/quickstart">Quickstart guide</Link>.

This article uses the ComfyUI worker image `novitalabs/comfyui-worker:v0.0.1` as an example to show how to create and call an Async Serverless Endpoint.

## 1. Prepare Container Image

Package your runtime environment into a Docker image and upload it to an image registry in advance. Both public and private image registries are supported. Private registries require image pull credentials.

* You can upload your image to Docker Hub. The platform currently provides an [image warm-up service](https://novita.ai/gpus-console/image) for Docker Hub images.

This example uses `novitalabs/comfyui-worker:v0.0.1`. The image includes ComfyUI and the Novita worker SDK. The task input is a ComfyUI workflow JSON, and the worker handler returns generated image results. We recommend configuring object storage environment variables such as `BUCKET_ENDPOINT_URL`, so generated images and videos can be uploaded to your bucket and returned as URLs in the job output.

## 2. Select Instance Specification

Async Serverless Endpoint currently supports the following GPU instance types:

* RTX 4090 24GB
* H100 SXM 80GB

For this `comfyui-worker` example, we recommend **RTX 4090 24GB**.

For additional requirements, [contact us](mailto:support@novita.ai).

## 3. Create Cloud Storage (Optional)

If you need shared or persistent storage, create cloud storage on the [storage management page](https://novita.ai/gpus-console/storage), then mount the storage when creating the endpoint. For details, see [Manage Cloud Storage](https://novita.ai/docs/guides/gpu-instance-quickstart-manage-network-volume).

## 4. Create Endpoint

1. Go to the [Async Serverless GPUs](https://novita.ai/gpus-console/serverless) page, select an instance type, and click "Create Endpoint".
2. Complete the Endpoint parameter configuration.

* **Endpoint Name**: Uniquely identifies the Endpoint and forms part of the URL used to create jobs. The system generates a random default name. You can customize it, but we recommend keeping the default.
* **Worker Configuration**

<table class="table table-big">
  <thead>
    <tr>
      <th>Configuration Item</th>
      <th>Description</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>Min Worker Count</td>
      <td>The minimum number of worker instances to keep for the Endpoint. Setting a higher minimum helps reduce cold start time. If set to 0, there will be no idle workers when there are no requests, which may increase response latency for new requests. Use 0 with caution for latency-sensitive scenarios.</td>
    </tr>

    <tr>
      <td>Max Worker Count</td>
      <td>The maximum number of worker instances that the Endpoint can scale up to. When request volume increases, the platform automatically increases workers up to this maximum. This limit helps control costs.</td>
    </tr>

    <tr>
      <td>Idle Timeout (seconds)</td>
      <td>When a worker is about to be released due to scale-down, the platform keeps it for the configured idle timeout so it can respond quickly to new requests. You are charged for the worker during this period.</td>
    </tr>

    <tr>
      <td>Max Concurrent Requests</td>
      <td>The maximum number of concurrent requests handled by one worker. If this is exceeded, requests are routed to other workers. If all workers are fully occupied, excess requests are queued until execution is possible.</td>
    </tr>

    <tr>
      <td>GPUs / Worker</td>
      <td>Number of GPU cards allocated to each worker.</td>
    </tr>

    <tr>
      <td>CUDA Version</td>
      <td>CUDA version used by the worker.</td>
    </tr>
  </tbody>
</table>

For this example, select **RTX 4090 24GB** and set `GPUs / Worker` to `1`.

* **Type**:
  * Select **Async**.
* **Elastic Policy**:
  * Select **Queue request policy**.
  * Set **Single worker target concurrency** to `1`. The ComfyUI worker in this example processes one job at a time. When queued requests exceed current worker capacity, the platform scales workers based on the queue request count until reaching the maximum worker count.
* **Image Configuration**:
  * Image address: `novitalabs/comfyui-worker:v0.0.1`.
  * Image repository credentials: If the image is private, provide image pull credentials. You can create credentials on the [security credentials management page](https://novita.ai/gpu-instance/console/settings).
  * HTTP Port: The port on which the worker's HTTP service listens.
  * Container start command: Command executed when the container starts.
* **Storage Configuration**:
  * System disk: System disk size per worker instance.
  * Cloud storage: Select cloud storage if you need to mount it. For details, see [Manage Cloud Storage](https://novita.ai/docs/guides/gpu-instance-quickstart-manage-network-volume).
* **Other**:
  * Health check path: This parameter is currently not enabled.
  * Environment variables: Set environment variables required by the service. Example S3 configuration:

```bash theme={"system"}
BUCKET_ENDPOINT_URL=https://s3.<aws-region>.amazonaws.com
BUCKET_ACCESS_KEY_ID=<your-access-key-id>
BUCKET_SECRET_ACCESS_KEY=<your-secret-access-key>
BUCKET_NAME=<your-bucket-name>
```

When using `comfyui-worker`, we strongly recommend configuring object storage so output images are uploaded to a bucket and returned as URLs.
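If you build your own worker image, a startup check along the following lines could surface missing bucket settings early. This is a minimal sketch, not part of `comfyui-worker`: the helper name `missing_bucket_vars` is hypothetical, and treating all four variables as required is an assumption based on the example above.

```python
import os
from typing import List, Mapping

# Variable names taken from the example S3 configuration in this guide.
# ASSUMPTION: all four are treated as required; your worker may need fewer.
REQUIRED_BUCKET_VARS = (
    "BUCKET_ENDPOINT_URL",
    "BUCKET_ACCESS_KEY_ID",
    "BUCKET_SECRET_ACCESS_KEY",
    "BUCKET_NAME",
)


def missing_bucket_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of bucket variables that are unset or blank."""
    return [name for name in REQUIRED_BUCKET_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_bucket_vars()
    if missing:
        print("Missing object storage settings:", ", ".join(missing))
    else:
        print("Object storage settings look complete.")
```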

3. Review pricing and click "Deploy with One Click".
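To reason about how the worker settings in this step interact, the following sketch models the queue request policy as "enough workers to cover the queue, bounded by the minimum and maximum worker counts". It is a simplified mental model under that assumption, not the platform's actual scaling algorithm, and it ignores the idle timeout. The function name `desired_workers` is hypothetical.

```python
import math


def desired_workers(
    queued_requests: int,
    target_concurrency: int,
    min_workers: int,
    max_workers: int,
) -> int:
    """Estimate the worker count for a given queue depth (illustrative model only)."""
    if target_concurrency < 1:
        raise ValueError("target_concurrency must be at least 1")
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    # Workers needed if each one handles `target_concurrency` requests at a time.
    needed = math.ceil(queued_requests / target_concurrency)
    # Never go below the minimum or above the maximum worker count.
    return max(min_workers, min(needed, max_workers))


if __name__ == "__main__":
    # With target concurrency 1 (as in this example) and a maximum of 5 workers,
    # 12 queued requests are capped at 5 workers; the rest wait in the queue.
    print(desired_workers(queued_requests=12, target_concurrency=1,
                          min_workers=0, max_workers=5))
```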

## 5. Access the Service

1. On the [Async Serverless GPUs](https://novita.ai/gpus-console/serverless) page, find the newly created Endpoint and ensure its status is "Running".
2. Ensure that at least one Worker in the Endpoint is running.
3. Ensure you have an API Key for authentication. The Endpoint creator and the API Key owner must belong to the same team.

You need the following information to call an Async Serverless Endpoint:

| Parameter       | Description                                                                                                                        |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| Public base URL | `https://async-public.serverless.novita.ai/v1`                                                                                     |
| Endpoint Name   | The name generated after creating the Endpoint, for example `0f43a6867e05fddd`. This name is part of the job URL.                  |
| API Key         | Create or copy an API Key from the API Key / Key Management page. Pass it in the `Authorization: Bearer <API_KEY>` request header. |

Get an API Key:

1. Log in to the Novita console.
2. Go to the API Key / Key Management page.
3. Create an API Key and copy the generated `sk_...` value.
4. Ensure the API Key owner and Endpoint owner are in the same team.
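The three values above combine into the request URLs and headers used throughout this guide. The sketch below shows one way to assemble them in Python. The helper names `job_urls` and `auth_headers` are hypothetical; the URL paths (`/run`, `/status/<job_id>`, `/cancel/<job_id>`, `/health`) are the ones used by the curl examples in this guide.

```python
from typing import Dict, Optional
from urllib.parse import quote

BASE_URL = "https://async-public.serverless.novita.ai/v1"


def job_urls(endpoint_name: str, job_id: Optional[str] = None) -> Dict[str, str]:
    """Build the endpoint URLs; job-specific URLs are added when job_id is given."""
    root = f"{BASE_URL}/{quote(endpoint_name, safe='')}"
    urls = {"run": f"{root}/run", "health": f"{root}/health"}
    if job_id is not None:
        job = quote(job_id, safe="")
        urls["status"] = f"{root}/status/{job}"
        urls["cancel"] = f"{root}/cancel/{job}"
    return urls


def auth_headers(api_key: str) -> Dict[str, str]:
    """Headers for authenticated JSON requests."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    for name, url in job_urls("0f43a6867e05fddd", "8cb6a77c-62aa-4eb4-9226-1ca5724fd9dd").items():
        print(f"{name}: {url}")
```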

### 5.1 Create a Job and Retrieve Output via curl

The following request is a runnable `comfyui-worker` example that matches the tested case. Replace `0f43a6867e05fddd` in the URL with your Endpoint name, and replace `sk_xxxx` with your API Key.

<Note>
  The maximum job size accepted by Async Serverless Endpoint is 4 MiB.
</Note>

```bash theme={"system"}
curl -X POST https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk_xxxx' \
  -d '{
    "input": {
      "workflow": {
        "4": {
          "class_type": "CheckpointLoaderSimple",
          "inputs": {
            "ckpt_name": "flux1-dev-fp8.safetensors"
          }
        },
        "5": {
          "class_type": "EmptyLatentImage",
          "inputs": {
            "width": 512,
            "height": 512,
            "batch_size": 1
          }
        },
        "6": {
          "class_type": "CLIPTextEncode",
          "inputs": {
            "clip": ["4", 1],
            "text": "a red apple on a table"
          }
        },
        "7": {
          "class_type": "CLIPTextEncode",
          "inputs": {
            "clip": ["4", 1],
            "text": "blurry, low quality"
          }
        },
        "3": {
          "class_type": "KSampler",
          "inputs": {
            "model": ["4", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
            "seed": 42,
            "steps": 10,
            "cfg": 7,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1
          }
        },
        "8": {
          "class_type": "VAEDecode",
          "inputs": {
            "samples": ["3", 0],
            "vae": ["4", 2]
          }
        },
        "9": {
          "class_type": "SaveImage",
          "inputs": {
            "filename_prefix": "test",
            "images": ["8", 0]
          }
        }
      },
      "output_node_id": "9"
    }
}'
```

Response example, where `id` is the `job_id`:

```json theme={"system"}
{"id":"8cb6a77c-62aa-4eb4-9226-1ca5724fd9dd","status":"PENDING"}
```
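Because jobs larger than 4 MiB are rejected, it can be useful to measure a payload before submitting it. The sketch below is one way to do that client-side. It assumes the limit applies to the serialized JSON request body, which this guide does not specify, so treat the result as an estimate. The helper names are hypothetical.

```python
import json

MAX_JOB_BYTES = 4 * 1024 * 1024  # 4 MiB job size limit stated in this guide


def request_body_size(input_payload: dict) -> int:
    """Size in bytes of the compact JSON body {"input": ...} encoded as UTF-8."""
    body = json.dumps({"input": input_payload}, separators=(",", ":"))
    return len(body.encode("utf-8"))


def check_job_size(input_payload: dict) -> int:
    """Return the body size, or raise ValueError if it exceeds the limit.

    ASSUMPTION: the limit is measured on the serialized request body.
    """
    size = request_body_size(input_payload)
    if size > MAX_JOB_BYTES:
        raise ValueError(
            f"job payload is {size} bytes, above the {MAX_JOB_BYTES}-byte limit"
        )
    return size


if __name__ == "__main__":
    print(check_job_size({"workflow": {}, "output_node_id": "9"}))
```

Large inputs such as embedded base64 images are the most likely cause of oversized jobs; passing a URL to the asset instead keeps the request small.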

**Check job status and retrieve results**

<Note>
  The maximum output size returned by the Async Serverless Endpoint `status` API is 4 MiB. To avoid this limitation, configure object storage environment variables and return uploaded file URLs in the output.

  Job results are kept in the Async Serverless Endpoint for up to 6 hours after completion.
</Note>

```bash theme={"system"}
curl -X GET https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/status/8cb6a77c-62aa-4eb4-9226-1ca5724fd9dd \
  -H 'Authorization: Bearer sk_xxxx'
```
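If you call the HTTP API directly rather than through the SDK, you typically poll the `status` URL until the job finishes. The following is a minimal polling sketch using only the Python standard library. Note the assumptions: this guide only shows the `PENDING` status, so the terminal status names in `TERMINAL_STATUSES` are hypothetical placeholders that you should replace with the values your endpoint actually returns. The function names are also hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Collection

BASE_URL = "https://async-public.serverless.novita.ai/v1"

# ASSUMPTION: hypothetical terminal status names; only "PENDING" appears in this guide.
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def fetch_status(endpoint_name: str, job_id: str, api_key: str) -> dict:
    """GET the job status document for one job."""
    request = urllib.request.Request(
        f"{BASE_URL}/{endpoint_name}/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    terminal: Collection[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` repeatedly until the job reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout} seconds")
        sleep(interval)


# Example usage (performs real HTTP requests):
# result = wait_for_job(
#     lambda: fetch_status("0f43a6867e05fddd", "<job_id>", "sk_xxxx")
# )
```

Passing `fetch` as a callable keeps the loop independent of the HTTP client, so it can be exercised with canned responses before pointing it at a live endpoint.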

**Cancel Job**

```bash theme={"system"}
curl -X POST https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/cancel/8cb6a77c-62aa-4eb4-9226-1ca5724fd9dd \
  -H 'Authorization: Bearer sk_xxxx'
```

**Check Endpoint Job Queue Status**

```bash theme={"system"}
curl -X GET https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/health \
  -H 'Authorization: Bearer sk_xxxx'
```

Response example:

```json theme={"system"}
{
  "workers": {
    "idle": 0,
    "running": 0,
    "throttled": 0,
    "total": 0
  },
  "jobs": {
    "completed": 0,
    "failed": 0,
    "inProgress": 0,
    "inQueue": 0,
    "retried": 0
  }
}
```
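A response of this shape can be condensed into a few numbers for monitoring. The sketch below is illustrative only: the function name `summarize_health` is hypothetical, and the meaning of each field is inferred from its name rather than documented in this guide.

```python
def summarize_health(health: dict) -> dict:
    """Condense a health response into queue and worker indicators.

    ASSUMPTION: field meanings are inferred from their names
    (for example, `inQueue` = jobs waiting, `total` = all workers).
    """
    workers = health.get("workers", {})
    jobs = health.get("jobs", {})
    return {
        "queued": jobs.get("inQueue", 0),
        "in_progress": jobs.get("inProgress", 0),
        "idle_workers": workers.get("idle", 0),
        # With no workers at all, a new job likely has to wait for a cold start.
        "no_workers": workers.get("total", 0) == 0,
    }


if __name__ == "__main__":
    sample = {
        "workers": {"idle": 0, "running": 0, "throttled": 0, "total": 0},
        "jobs": {"completed": 0, "failed": 0, "inProgress": 0, "inQueue": 0, "retried": 0},
    }
    print(summarize_health(sample))
```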

### 5.2 Create a Job and Retrieve Output via the Novita SDK

Install the SDK:

```bash theme={"system"}
pip install novita-gpus
```

```python theme={"system"}
import novita_gpus

novita_gpus.api_key = "sk_xxxx"

input_payload = {
    "workflow": {
        "4": {
            "class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": "flux1-dev-fp8.safetensors"},
        },
        "5": {
            "class_type": "EmptyLatentImage",
            "inputs": {"width": 512, "height": 512, "batch_size": 1},
        },
        "6": {
            "class_type": "CLIPTextEncode",
            "inputs": {"clip": ["4", 1], "text": "a red apple on a table"},
        },
        "7": {
            "class_type": "CLIPTextEncode",
            "inputs": {"clip": ["4", 1], "text": "blurry, low quality"},
        },
        "3": {
            "class_type": "KSampler",
            "inputs": {
                "model": ["4", 0],
                "positive": ["6", 0],
                "negative": ["7", 0],
                "latent_image": ["5", 0],
                "seed": 42,
                "steps": 10,
                "cfg": 7,
                "sampler_name": "euler",
                "scheduler": "normal",
                "denoise": 1,
            },
        },
        "8": {
            "class_type": "VAEDecode",
            "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
        },
        "9": {
            "class_type": "SaveImage",
            "inputs": {"filename_prefix": "test", "images": ["8", 0]},
        },
    },
    "output_node_id": "9",
}

endpoint = novita_gpus.Endpoint("0f43a6867e05fddd")
job = endpoint.run(input_payload)

print(job.status())
output = job.output(timeout=300)
print(output)
```

By default, the `novita-gpus` SDK sends requests to `https://async-public.serverless.novita.ai/v1`.

## 6. Manage Async Serverless Endpoint

See [Manage Serverless Endpoint](https://novita.ai/docs/guides/serverless-gpus-quickstart-manage-endpoint).