If you don’t have a Novita account, you need to sign up first; for details, see the Quickstart guide. This article demonstrates how to create an Async Serverless Endpoint, using the deployment of runpod/worker-comfyui:5.5.0-flux1-dev as an example.

1. Prepare Container Image

You need to package your runtime environment into a Docker image and upload it to an image repository in advance. Both public and private image repositories are supported (credentials required for private repositories).
  • You can upload your image to Docker Hub; the platform currently provides an image warm-up service for images hosted there.
This example uses the runpod/worker-comfyui:5.5.0-flux1-dev image. When using the worker-comfyui image, configure the S3-related settings such as BUCKET_ENDPOINT_URL so that images and videos generated by the Async Serverless Endpoint are uploaded to your S3 bucket.

2. Select Instance Specification

Currently, Async Serverless Endpoint supports the following GPU instance types:
  • RTX 4090 24GB
  • H100 SXM 80GB
For additional requirements, please contact us.

3. Create Cloud Storage (Optional)

If you need shared or persistent storage, you can create cloud storage on the storage management page, and mount this storage when creating an instance. For more details, see Manage Cloud Storage.

4. Create Endpoint

  1. Go to the Async Serverless GPUs page, choose an instance type, and click “Create Endpoint”.
  2. Complete the Endpoint parameter configuration:
  • Endpoint Name: Used to uniquely identify the endpoint; it will be part of the URL when creating jobs. The system will generate a random default name; you may customize it but using the default name is recommended.
  • Worker Configuration
    • Min Worker Count: The minimum number of worker instances kept for the endpoint. A higher minimum helps reduce cold start time. If set to 0, no idle workers remain when there are no requests, which may increase response time for incoming requests; for latency-sensitive scenarios, use 0 with caution.
    • Max Worker Count: The maximum number of worker instances the endpoint can scale up to. When request volume increases, the platform automatically adds workers up to this maximum, which helps control costs.
    • Idle Timeout (seconds): When a worker is about to be released during scale-down, the platform keeps it alive for this period so it can react quickly to new requests. Note that you are charged for the worker during this period.
    • Max Concurrent Requests: The maximum number of concurrent requests a single worker handles. Requests beyond this limit are routed to other workers; if all workers are fully occupied, excess requests are queued until they can run.
    • GPUs / Worker: The number of GPU cards allocated to each worker.
    • CUDA Version: The CUDA version the worker requires.
  • Type:
    • Select the Endpoint type: choose Async (asynchronous).
  • Elastic Policy:
    • Queue request policy: The number of Workers is scaled automatically according to the number of queued requests. By default, each Worker processes only one job at a time; you need to specify the maximum number of requests each Worker supports. For example, with 10 requests queued and each Worker supporting 2 concurrent requests, the platform scales toward 5 Workers, bounded by Max Worker Count.
  • Image Configuration:
    • Image address: The address of the image to deploy, e.g., runpod/worker-comfyui:5.5.0-flux1-dev.
    • Image repository credentials: If using a private image, provide access credentials so the image can be pulled. You can create credentials at the security credentials management page.
    • HTTP Port: The HTTP port to expose on the Worker.
    • Container start command: The command to run when the container starts.
  • Storage Configuration:
    • System disk: System disk size per Worker instance.
    • Cloud storage: Select your cloud storage if you wish to mount it. For details, see Manage Cloud Storage.
  • Other:
    • Health check path: This parameter is currently not enabled.
    • Environment variables: Set necessary environment variables for the service. These will be initialized automatically when the Worker starts. For example:
      • BUCKET_ENDPOINT_URL=https://<your-bucket-name>.s3.<aws-region>.amazonaws.com
      • BUCKET_ACCESS_KEY_ID=AKIASVYYYN6L4S6TTTTTT
      • BUCKET_SECRET_ACCESS_KEY=maVz2OwY98UUUUUUGjMsmR/Yo8/Zzw0qWMMMMMMM
    When using the worker-comfyui image, it is highly recommended to configure the S3 BUCKET settings so that output images are uploaded to S3 (a sketch of how a custom handler could use these variables follows this list).
  3. Review pricing, then click “Deploy with One Click”.
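The worker-comfyui image consumes the BUCKET_* variables internally. If you build a custom worker image instead, your handler can read the same variables and upload results itself. The following is a minimal sketch with boto3 (not part of this guide's API: the upload_output helper and bucket name are hypothetical, and it assumes BUCKET_ENDPOINT_URL points at the S3 service root, so adjust if yours already embeds the bucket name):
import os
import boto3

# Credentials and endpoint are injected via the environment variables configured above.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["BUCKET_ENDPOINT_URL"],
    aws_access_key_id=os.environ["BUCKET_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["BUCKET_SECRET_ACCESS_KEY"],
)

def upload_output(local_path: str, key: str, bucket: str = "your-bucket-name") -> str:
    """Upload a generated file and return its object URL. Hypothetical helper."""
    s3.upload_file(local_path, bucket, key)
    return f"{os.environ['BUCKET_ENDPOINT_URL'].rstrip('/')}/{bucket}/{key}"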

5. Access the Service

  1. In Async Serverless GPUs, find the newly created Endpoint and ensure its status is “Running”.
  2. Ensure that at least one Worker in the Endpoint is running.
  3. Ensure you have the corresponding API key for authentication. The Endpoint creator and the API key owner must belong to the same team.

5.1 Create a Job and Retrieve Output via Curl

Below is an example showing actual use of the worker-comfyui worker. Replace 0f43a6867e05fddd in the URL with your real endpointName, and replace sk_xxxx in the example with your actual API key.
The maximum job size accepted by an Async Serverless Endpoint is 4 MiB.
curl -X POST https://async.novita.ai/v1/0f43a6867e05fddd/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk_xxxx' \
  -d '{
    "input": {
        "workflow": {
            "8": {
                "inputs": {
                    "samples": [
                        "31",
                        0
                    ],
                    "vae": [
                        "39",
                        0
                    ]
                },
                "class_type": "VAEDecode",
                "_meta": {
                    "title": "VAE Decode"
                }
            },
            "9": {
                "inputs": {
                    "filename_prefix": "ComfyUI",
                    "images": [
                        "8",
                        0
                    ]
                },
                "class_type": "SaveImage",
                "_meta": {
                    "title": "Save Image"
                }
            },
            "27": {
                "inputs": {
                    "width": 1024,
                    "height": 1024,
                    "batch_size": 1
                },
                "class_type": "EmptySD3LatentImage",
                "_meta": {
                    "title": "Empty Latent Image (SD3)"
                }
            },
            "31": {
                "inputs": {
                    "seed": 890266796055272,
                    "steps": 20,
                    "cfg": 1,
                    "sampler_name": "euler",
                    "scheduler": "simple",
                    "denoise": 1,
                    "model": [
                        "38",
                        0
                    ],
                    "positive": [
                        "41",
                        0
                    ],
                    "negative": [
                        "42",
                        0
                    ],
                    "latent_image": [
                        "27",
                        0
                    ]
                },
                "class_type": "KSampler",
                "_meta": {
                    "title": "K Sampler"
                }
            },
            "38": {
                "inputs": {
                    "unet_name": "flux1-dev.safetensors",
                    "weight_dtype": "default"
                },
                "class_type": "UNETLoader",
                "_meta": {
                    "title": "UNet Loader"
                }
            },
            "39": {
                "inputs": {
                    "vae_name": "ae.safetensors"
                },
                "class_type": "VAELoader",
                "_meta": {
                    "title": "VAE Loader"
                }
            },
            "40": {
                "inputs": {
                    "clip_name1": "clip_l.safetensors",
                    "clip_name2": "t5xxl_fp8_e4m3fn.safetensors",
                    "type": "flux",
                    "device": "default"
                },
                "class_type": "DualCLIPLoader",
                "_meta": {
                    "title": "Dual CLIP Loader"
                }
            },
            "41": {
                "inputs": {
                    "clip_l": "A beautiful fantasy dog with long curly red hair and big blue eyes, wearing a green transparent fairy dress with lace and puffed sleeves. Surrounded by iridescent butterflies and giant glass roses, dreamy lighting, ethereal atmosphere, soft glow, magical realism, highly detailed, cinematic, 8K render.\n",
                    "t5xxl": "A fairy tale scene of a fire dog with red curly hair wearing a delicate blue dress, standing among crystal butterflies and glowing glass roses. The scene is filled with soft magical light, like a dream from a fantasy world.",
                    "guidance": 3.5,
                    "clip": [
                        "40",
                        0
                    ]
                },
                "class_type": "CLIPTextEncodeFlux",
                "_meta": {
                    "title": "CLIP Text Encode Flux"
                }
            },
            "42": {
                "inputs": {
                    "conditioning": [
                        "41",
                        0
                    ]
                },
                "class_type": "ConditioningZeroOut",
                "_meta": {
                    "title": "Conditioning Zero Out"
                }
            }
        }
    }
}'
Response example (where id is the job_id):
{"id":"8cb6a77c-62aa-4eb4-9226-1ca5724fd9dd","status":"PENDING"}
Check job status and retrieve results:
The maximum output you can retrieve via the status API of the Async Serverless Endpoint is 4 MiB. To avoid this limit, configure the S3 environment variables and upload output images or videos to S3 in your handler.py, so the output size is not constrained. Job results are kept in the Async Serverless Endpoint for up to 6 hours after completion.
curl -X GET https://async.novita.ai/v1/0f43a6867e05fddd/status/33a0bc4b-7312-41f6-ad15-eb9016bd68f9 \
-H "Authorization: Bearer sk_yyy"
Cancel Job:
curl -X GET https://async.novita.ai/v1/0f43a6867e05fddd/cancel/e5f3c3c0-c3b1-49c2-9452-bb96eaa34ce6 \
 -H "Authorization: Bearer sk_yyy"
Check Endpoint Job Queue Status:
curl -X GET https://async.novita.ai/v1/e41a4c7e58a4eddd/stats \
-H 'Authorization: Bearer sk_yyy'
Response example:
{"busy_workers":0,"endpoint":"e41a4c7e58a4eddd","in_progress":0,"pending":0,"total_workers":0}

5.2 Create Job & Get Results via Runpod SDK

import runpod

runpod.api_key = "sk_xxx"
runpod.endpoint_url_base = "https://async.novita.ai/v1"

# Job input; fill in the workflow from Section 5.1 (or your own payload).
input_payload = {"input": {"workflow": {}}}

endpoint = runpod.Endpoint("e41a4c7e58a4e7bc")
run_request = endpoint.run(input_payload)

# Initial check without blocking, useful for quick tasks
status = run_request.status()
print(f"Initial job status: {status}")

if status != "COMPLETED":
    # Polling with timeout for long-running tasks
    output = run_request.output(timeout=60)
else:
    output = run_request.output()
print(f"Job output: {output}")

6. Manage Async Serverless Endpoint

See Manage Serverless Endpoint.