# LLM FAQs

## Voucher Usage Guide

### What types of vouchers does Novita provide?

Novita currently offers the following types of vouchers to help users experience the platform's services:

* **New User Voucher**: Available to first-time registered users for trying out services across different model types.
* **Referral Voucher**: Earned by inviting others to register and complete tasks on the platform.

## Rate Limits for Model Usage

### What are the RPM limits for different users?

RPM (Requests Per Minute) limits vary based on a user's verification level and account status. For detailed limits, please refer to the official documentation.

<Tip>
  👉 For further assistance, please [book a call with our sales team](https://meet.brevo.com/novita-ai/contact-sales)
</Tip>

## RPM Adjustment and Upgrade Policy

### Can users apply for increased RPM limits?

Yes. Novita allows flexible RPM upgrades based on usage needs, under the following rules:

* **DeepSeek models**: The platform will strive to accommodate reasonable scaling needs.
* **Other models**: RPM increases are evaluated based on model cost and actual user behavior, subject to capacity availability.

### Application process

User → Customer Manager / Tech Support → Product Team Review & Approval

### What happens if actual usage falls short of the committed RPM?

If a user's actual RPM stays below the committed level for a full week, the platform lowers the rate limit to whichever of the following is lower:

* The peak RPM over the past week, or
* The model's default rate limit.
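
The rule above can be read as a small function. This is only an illustrative sketch, not Novita's implementation: the name `adjusted_rpm_limit` and its parameters are hypothetical, and it assumes "below the committed level" means the weekly peak never reached the committed RPM.

```python
def adjusted_rpm_limit(committed_rpm: int, peak_rpm_past_week: int, default_rpm: int) -> int:
    """Return the RPM limit after a weekly review (hypothetical helper)."""
    if peak_rpm_past_week >= committed_rpm:
        # Usage reached the committed level, so the limit is left unchanged.
        return committed_rpm
    # Otherwise take whichever is lower: last week's peak or the model default.
    return min(peak_rpm_past_week, default_rpm)
```

For example, with a committed limit of 1000 RPM, a weekly peak of 400 RPM, and a model default of 60 RPM, this sketch returns 60.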

### Is self-service RPM upgrade supported?

Self-service upgrades are planned. Novita intends to launch an **RPM Upgrade Package** that will let users manage and increase RPM limits independently, without manual approval.

For further assistance, please [book a call with our sales team](https://meet.brevo.com/novita-ai/contact-sales).

### How do I control the thinking function of zai-org/GLM-4.5 when calling its API?

When calling the zai-org/glm-4.5 API, there are situations where the thinking function is not needed. To turn it off, add the following field to the request body:

```json theme={"system"}
"enable_thinking": false
```

For example, with the field placed at the end of the request body:

```json theme={"system"}
{
  "model": "zai-org/glm-4.5",
  "messages": [
    {
      "role": "user",
      "content": "How is the weather in New York?"
    }
  ],
  "temperature": 0.7,
  "stream": false,
  "max_tokens": 500,
  "tool_choice": "auto",
  "enable_thinking": false
}
```

## Billing

### What is the billing priority when using the Coding Plan?

The system deducts from your **Coding Plan quota first**. Only after the Coding Plan quota is fully exhausted will charges be applied to your API account balance.
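
The deduction order can be sketched as follows. The helper name `split_usage` is hypothetical, and the sketch assumes that a single request may be split between the remaining quota and the account balance, which this FAQ does not state explicitly.

```python
def split_usage(tokens_used: int, plan_quota_remaining: int) -> tuple[int, int]:
    """Split usage into (tokens covered by Coding Plan quota, tokens billed to balance)."""
    # The Coding Plan quota is consumed first.
    from_quota = min(tokens_used, plan_quota_remaining)
    # Whatever the quota cannot cover falls through to the API account balance.
    from_balance = tokens_used - from_quota
    return from_quota, from_balance
```

With 1,000 tokens of quota left, a 1,200-token request would be split as 1,000 from the quota and 200 from the balance under this assumption.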

### How are the 50M tokens in the Coding Plan calculated?

Usage is counted in base-rate-equivalent tokens. Each model carries a pricing weight proportional to its cost relative to the base rate, so tokens from more expensive models count for more. Both input tokens and output tokens count toward your total usage.
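
As a rough illustration of the arithmetic, the sketch below applies a per-model weight to the combined input and output tokens. The model names and weights are invented for illustration and are not real Novita values; the sketch also assumes one weight per model that applies equally to input and output tokens.

```python
# Hypothetical pricing weights relative to the base rate (NOT real Novita values).
MODEL_WEIGHTS = {"model-a": 1.0, "model-b": 2.5}


def base_equivalent_tokens(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert raw token usage into base-rate-equivalent tokens (hypothetical helper)."""
    weight = MODEL_WEIGHTS[model]
    # Both input and output tokens count toward plan usage.
    return (input_tokens + output_tokens) * weight
```

Under these made-up weights, 1,000 input tokens plus 200 output tokens on `model-b` would consume 3,000 tokens of the plan quota.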

### How are cached tokens billed?

Cached read tokens are billed at a reduced rate, approximately **1/10th of the standard input token price** (exact rates vary by model). Cached write operations are billed at the normal input token rate.
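
A minimal sketch of this billing rule, assuming a per-million-token input price. The function name, its parameters, and the default ratio of 0.1 are illustrative assumptions; check the pricing page of each model for the exact cached-read rate.

```python
def input_cost(
    uncached_tokens: int,
    cache_read_tokens: int,
    cache_write_tokens: int,
    price_per_million: float,
    cache_read_ratio: float = 0.1,
) -> float:
    """Estimate input-side cost (hypothetical helper, hypothetical prices)."""
    per_token = price_per_million / 1_000_000
    # Uncached input and cache writes are billed at the normal input rate.
    full_rate = (uncached_tokens + cache_write_tokens) * per_token
    # Cache reads are billed at a reduced rate (roughly one tenth; varies by model).
    reduced_rate = cache_read_tokens * per_token * cache_read_ratio
    return full_rate + reduced_rate
```

For instance, at a hypothetical price of 1.00 per million input tokens, 200,000 uncached tokens plus 800,000 cached-read tokens would cost about 0.28 rather than 1.00.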

## Thinking Mode

### How do I disable thinking mode for reasoning models?

Behavior varies by model:

* **moonshotai/Kimi-K2-Instruct** and **zai-org/GLM-4.5**: Add `"enable_thinking": false` to your request body.
* **MiniMax-M1**: Thinking mode **cannot currently be disabled** for this model.

```json theme={"system"}
{
  "model": "moonshotai/Kimi-K2-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "enable_thinking": false
}
```
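
If you build request bodies in code, a small helper can apply the per-model behavior described above. This is a sketch under stated assumptions: `build_request` and `CAN_DISABLE_THINKING` are hypothetical names, model IDs are compared case-insensitively, and the helper rejects any model not listed in this FAQ rather than guessing how it behaves.

```python
import json

# Models this FAQ lists as accepting "enable_thinking": false (lowercased for comparison).
CAN_DISABLE_THINKING = {"moonshotai/kimi-k2-instruct", "zai-org/glm-4.5"}


def build_request(model: str, prompt: str, thinking: bool = True) -> dict:
    """Build a chat request body (hypothetical helper), optionally disabling thinking."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        if model.lower() not in CAN_DISABLE_THINKING:
            # Covers MiniMax-M1 and any model this FAQ does not mention.
            raise ValueError(f"thinking cannot be disabled for {model!r} in this sketch")
        body["enable_thinking"] = False
    return body


print(json.dumps(build_request("zai-org/GLM-4.5", "What is the capital of France?", thinking=False), indent=2))
```

Failing early for unsupported models keeps the request from silently running with thinking enabled when you expected it to be off.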

## Feature Support

### Is the `top_logprobs` parameter supported?

The `top_logprobs` parameter is currently **not supported**.

### Is structured output (`response_format`) supported?

Currently, only `"type": "json_object"` is supported. `"type": "json_schema"` (with a strict schema definition) is not yet available.
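
Because only `json_object` is available, the response is not checked against a schema, so you may want to validate the fields yourself. The sketch below shows one way to do that; the helper names and the `required_keys` parameter are hypothetical, and the request body otherwise follows the shape used elsewhere in this FAQ.

```python
import json


def build_json_mode_request(model: str, prompt: str) -> dict:
    """Build a request body that asks for a JSON object reply (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(content: str, required_keys: list[str]) -> dict:
    """Parse the assistant message content and check the keys you depend on."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

It usually also helps to describe the expected keys in the prompt itself, since the model has no schema to follow.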

## Troubleshooting

### Why am I receiving 403 errors when calling a model?

Common causes include:

1. **Insufficient account balance** — top up your account and retry.
2. **Model requires whitelist access** — certain models are restricted. See [How do I apply for model whitelist access?](#how-do-i-apply-for-model-whitelist-access) below.
3. **Plan restriction** — the model may not be included in your current subscription plan.

### Why am I receiving 400 errors?

Common causes include:

* Input length exceeds the model's context window limit.
* Incorrect parameter types or formatting.
* Missing required fields in the request body.

Please provide your `trace_id` to our support team for detailed investigation.

### What should I do if the API returns a "server overload" error?

This indicates temporary resource pressure on the platform. Retry your request after a brief delay (we recommend exponential backoff). If the issue persists, <a href="https://discord.gg/YyPRAzwp7P" target="_blank">contact us</a> for status confirmation.
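
A minimal sketch of retrying with exponential backoff is shown below. `ServerOverloadError` is a hypothetical exception standing in for however your client surfaces the overload error, and the delay values are illustrative defaults rather than recommendations from Novita.

```python
import random
import time


class ServerOverloadError(Exception):
    """Hypothetical exception standing in for a "server overload" API response."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() and retry on overload, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ServerOverloadError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller see the error.
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Here `send` is any zero-argument callable that performs the request and raises `ServerOverloadError` when the platform reports overload. The `sleep` parameter exists so the waiting behavior can be replaced in tests.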

## Model Access

### How do I apply for model whitelist access?

Contact our support team via <a href="https://discord.gg/YyPRAzwp7P" target="_blank">Discord</a> or the in-platform chat with your account information and the specific model name(s) you require. Our team will review and process your request.