# Fish Audio S2 Pro Text to Speech

The Fish Audio S2 Pro text-to-speech model converts text into natural-sounding speech, with support for reference voices, sampling controls, text chunking, multiple audio formats, and prosody controls.

## Request Headers

<ParamField header="Content-Type" type="string" required={true}>
  Supports: `application/json`
</ParamField>

<ParamField header="Authorization" type="string" required={true}>
  Bearer authentication format, for example: Bearer \{\{API Key}}.
</ParamField>
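
As a minimal sketch, the two required headers can be assembled as follows. The helper name `build_headers` is an assumption made for illustration and is not part of the API.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the two required request headers for a given API key."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # The only content type the endpoint documents.
        "Content-Type": "application/json",
        # Bearer authentication: the literal word "Bearer", a space, then the key.
        "Authorization": f"Bearer {api_key}",
    }
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.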

## Request Body

<ParamField body="text" type="string" required={true}>
  Text to convert to speech. For multi-speaker synthesis with S2-Pro, prefix each segment with a speaker tag, for example \<|speaker:0|>Hello\<|speaker:1|>Hi.
</ParamField>
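
The speaker-tag format shown above can be produced from a list of dialogue turns. This is a sketch only: `tag_dialogue` is a hypothetical helper, and it assumes speaker indices are non-negative integers, as in the example tags.

```python
def tag_dialogue(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_index, line) pairs into a single tagged text string."""
    parts = []
    for speaker, line in turns:
        if speaker < 0:
            raise ValueError("speaker index must be non-negative")
        # Each segment is prefixed with its speaker tag, with no separator added.
        parts.append(f"<|speaker:{speaker}|>{line}")
    return "".join(parts)


text = tag_dialogue([(0, "Hello"), (1, "Hi")])
# text == "<|speaker:0|>Hello<|speaker:1|>Hi"
```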

<ParamField body="top_p" type="number" default={0.7}>
  Nucleus sampling diversity control.

  Value range: \[0, 1]
</ParamField>

<ParamField body="format" type="string" default="mp3">
  Output audio format.

  Optional values: `wav`, `pcm`, `mp3`, `opus`
</ParamField>

<ParamField body="latency" type="string" default="normal">
  Latency profile.

  Optional values: `low`, `normal`, `balanced`
</ParamField>

<ParamField body="prosody" type="object">
  Prosody controls.

  <Expandable title="properties" defaultOpen={true}>
    <ParamField body="speed" type="number" default={1}>
      Speech speed multiplier.
    </ParamField>

    <ParamField body="volume" type="number" default={0}>
      Volume adjustment.
    </ParamField>

    <ParamField body="normalize_loudness" type="boolean" default={true}>
      Normalize output loudness.
    </ParamField>
  </Expandable>
</ParamField>

<ParamField body="normalize" type="boolean" default={true}>
  Normalize English and Chinese text before synthesis.
</ParamField>

<ParamField body="references" type="array">
  Reference audio samples for zero-shot voice cloning.

  <Expandable title="properties" defaultOpen={true}>
    <ParamField body="text" type="string">
      Transcript for the reference audio.
    </ParamField>

    <ParamField body="audio" type="string">
      Reference audio as base64 or URL, depending on provider support.
    </ParamField>
  </Expandable>
</ParamField>
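
A reference entry pairs a transcript with its audio. The sketch below encodes raw audio bytes as base64; `make_reference` is a hypothetical helper. Whether the provider expects plain base64 (as assumed here), a data URI, or a URL is not stated on this page, so check provider support before relying on it.

```python
import base64


def make_reference(audio_bytes: bytes, transcript: str) -> dict[str, str]:
    """Build one item for the `references` array from raw audio bytes."""
    return {
        "text": transcript,
        # Assumption: plain base64 with no data-URI prefix.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    }


# Usage: read a local sample and attach it to a request body.
# with open("sample.wav", "rb") as f:
#     body = {"text": "Hello", "references": [make_reference(f.read(), "Sample transcript")]}
```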

<ParamField body="mp3_bitrate" type="integer" default={128}>
  MP3 bitrate in kbps.

  Optional values: `64`, `128`, `192`
</ParamField>

<ParamField body="sample_rate" type="integer" nullable={true}>
  Output sample rate in Hz. When null, the default for the selected format is used: 48000 Hz for opus and 44100 Hz for other formats.
</ParamField>

<ParamField body="temperature" type="number" default={0.7}>
  Expressiveness control.

  Value range: \[0, 1]
</ParamField>

<ParamField body="chunk_length" type="integer" default={300}>
  Text segment size.

  Value range: \[100, 300]
</ParamField>

<ParamField body="opus_bitrate" type="integer" default={-1000}>
  Opus bitrate in bps. -1000 means automatic.

  Optional values: `-1000`, `24000`, `32000`, `48000`, `64000`
</ParamField>

<ParamField body="reference_id" type="string">
  ID of the voice model to use. For multi-speaker text, pass an array of IDs ordered to match the speaker indices.
</ParamField>

<ParamField body="max_new_tokens" type="integer" default={1024}>
  Maximum audio tokens per chunk.
</ParamField>

<ParamField body="min_chunk_length" type="integer" default={50}>
  Minimum characters before splitting text.

  Value range: \[0, 100]
</ParamField>

<ParamField body="repetition_penalty" type="number" default={1.2}>
  Penalty to reduce repeated audio patterns.
</ParamField>

<ParamField body="early_stop_threshold" type="number" default={1}>
  Early stopping threshold.

  Value range: \[0, 1]
</ParamField>

<ParamField body="condition_on_previous_chunks" type="boolean" default={true}>
  Use previous audio chunks as context.
</ParamField>
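
The value ranges and optional values listed above can be checked on the client before sending a request. The following is a sketch rather than the service's own validation: `validate_body` is a hypothetical helper, and it assumes all ranges are inclusive, as the bracket notation suggests.

```python
# Inclusive numeric ranges taken from the parameter list above.
RANGES = {
    "top_p": (0, 1),
    "temperature": (0, 1),
    "chunk_length": (100, 300),
    "min_chunk_length": (0, 100),
    "early_stop_threshold": (0, 1),
}

# Enumerated values taken from the parameter list above.
CHOICES = {
    "format": {"wav", "pcm", "mp3", "opus"},
    "latency": {"low", "normal", "balanced"},
    "mp3_bitrate": {64, 128, 192},
    "opus_bitrate": {-1000, 24000, 32000, 48000, 64000},
}


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    errors = []
    # `text` is the only required field.
    if not isinstance(body.get("text"), str):
        errors.append("text: required string")
    for name, (low, high) in RANGES.items():
        if name in body and not (low <= body[name] <= high):
            errors.append(f"{name}: must be in [{low}, {high}]")
    for name, allowed in CHOICES.items():
        if name in body and body[name] not in allowed:
            errors.append(f"{name}: must be one of {sorted(allowed)}")
    return errors
```

Fields that are omitted are not reported, since the service applies its documented defaults.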

## Response

Generated audio.

Format: `binary`
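
Because the response body is raw audio, it should be read as bytes and written to a file unchanged. The sketch below uses only the Python standard library. The endpoint URL is not given on this page, so it is passed in as a parameter, and the use of `POST` is an assumption; `synthesize` and `save_audio` are hypothetical helper names.

```python
import json
import urllib.request
from pathlib import Path


def synthesize(url: str, api_key: str, body: dict) -> bytes:
    """Send a JSON request body and return the binary audio response."""
    request = urllib.request.Request(
        url,  # Assumption: supply the endpoint URL from the provider's documentation.
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",  # Assumption: the HTTP method is not stated on this page.
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.read()


def save_audio(audio: bytes, stem: str, fmt: str = "mp3") -> Path:
    """Write audio bytes to `<stem>.<fmt>` and return the path."""
    path = Path(f"{stem}.{fmt}")
    path.write_bytes(audio)
    return path
```

Using the requested `format` value as the file extension keeps the saved file consistent with what was asked for, although raw `pcm` output has no header and players will need the sample rate supplied separately.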
