> ## Documentation Index
> Fetch the complete documentation index at: https://novita.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# LLM API Billing

> Understand how novita AI LLM API billing works, including token-based pricing, HTTP status code billing rules, and the 499 client disconnection billing policy.

## Pricing

novita AI LLM API is billed per token. Each request incurs charges for:

* **Input Tokens** — tokens consumed by the prompt
* **Output Tokens** — tokens generated by the model

Total Cost = Input Tokens × Input Rate + Output Tokens × Output Rate

<Info>
  Rates vary by model. Refer to the [Pricing Page](https://novita.ai/pricing) for details.
</Info>

## HTTP Status Codes

| HTTP Status | Name                  | Description                                    | Charged |
| :---------- | :-------------------- | :--------------------------------------------- | :------ |
| 200         | Success               | Request completed successfully                 | **Yes** |
| 400         | Bad Request           | Invalid request parameters                     | No      |
| 401         | Unauthorized          | Invalid or missing API key                     | No      |
| 403         | Forbidden             | Insufficient permissions                       | No      |
| 429         | Rate Limited          | TPM or RPM limit exceeded                      | No      |
| 499         | Client Disconnected   | Client closed the connection before completion | **Yes** |
| 500         | Internal Server Error | Server-side error                              | No      |
| 503         | Service Unavailable   | Service temporarily unavailable                | No      |
| 504         | Gateway Timeout       | Upstream response timed out                    | No      |

**Billing principle:**

* Requests rejected before reaching the model (400/401/403/429): not charged
* Error due to platform issues (500/503/504): not charged
* Requests where the model has begun inference (200/499): charged

## 499 — Client Disconnected

Once a request reaches the model, inference begins immediately on the server side. If the client disconnects before receiving the full response, compute resources have already been consumed. The request is charged based on actual token usage.

| Request Mode  | Billing                                                        |
| :------------ | :------------------------------------------------------------- |
| Non-Streaming | Charged for actual token usage regardless of disconnect timing |
| Streaming     | Charged for actual token usage regardless of disconnect timing |

<Tip>
  **Recommendations**

  * Use `max_tokens` to limit output length at the request level
  * Set client timeout to ≥ 60s to avoid unintended disconnections
  * Prefer controlling generation via `max_tokens` rather than closing the connection
</Tip>

## FAQ

<AccordionGroup>
  <Accordion title="Why am I charged after disconnecting?">
    Inference begins as soon as the request reaches the model. Disconnecting does not halt computation already in progress.
  </Accordion>

  <Accordion title="How can I avoid 499 charges?">
    Set the `max_tokens` parameter to control output length before sending the request.
  </Accordion>

  <Accordion title="Where do 499 charges appear?">
    In your usage dashboard, alongside standard requests, with actual token consumption recorded.
  </Accordion>
</AccordionGroup>
