# Fish Audio S2 Pro Text to Speech

The Fish Audio S2 Pro text-to-speech model converts text into natural speech, with support for reference voices, sampling controls, chunking, multiple audio formats, and prosody controls.

## Request Headers

- `Content-Type`: Supports: `application/json`
- `Authorization`: Bearer authentication format, for example: `Bearer {{API Key}}`.

## Request Body

- Text to convert to speech. S2-Pro multi-speaker text can use tags such as `<|speaker:0|>Hello<|speaker:1|>Hi`.
- Nucleus sampling diversity control. Value range: [0, 1]
- Output audio format. Optional values: `wav`, `pcm`, `mp3`, `opus`
- Latency profile. Optional values: `low`, `normal`, `balanced`
- Prosody controls.
  - Speech speed multiplier.
  - Volume adjustment.
  - Normalize output loudness.
- Normalize English and Chinese text before synthesis.
- Reference audio samples for zero-shot voice cloning.
  - Transcript for the reference audio.
  - Reference audio as base64 or URL, depending on provider support.
- MP3 bitrate in kbps. Optional values: `64`, `128`, `192`
- Output sample rate in Hz. Null uses the format default: 44100 Hz, or 48000 Hz for opus.
- Expressiveness control. Value range: [0, 1]
- Text segment size. Value range: [100, 300]
- Opus bitrate in bps. `-1000` means automatic. Optional values: `-1000`, `24000`, `32000`, `48000`, `64000`
- Voice model ID. For multi-speaker use, pass an array matching the speaker indices.
- Maximum audio tokens per chunk.
- Minimum characters before splitting text. Value range: [0, 100]
- Penalty to reduce repeated audio patterns.
- Early stopping threshold. Value range: [0, 1]
- Use previous audio chunks as context.

## Response

Generated audio. Format: `binary`
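
## Example

The request described above can be sketched in Python as follows. This is a minimal illustration, not the provider's client. The endpoint URL and every JSON field name (`text`, `format`, `latency`, `top_p`, `mp3_bitrate`, `opus_bitrate`, `chunk_length`) are assumptions made for the example, since this page does not state them. Only the allowed values and ranges come from the parameter descriptions. Check the actual names against the API schema before use.

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint; this page does not give the real URL.
API_URL = "https://api.example.invalid/fish-audio-s2-pro/tts"

# Allowed values taken from the Request Body descriptions.
AUDIO_FORMATS = {"wav", "pcm", "mp3", "opus"}
LATENCY_PROFILES = {"low", "normal", "balanced"}
MP3_BITRATES = {64, 128, 192}
OPUS_BITRATES = {-1000, 24000, 32000, 48000, 64000}


def build_payload(text, audio_format="mp3", latency="normal", top_p=None,
                  mp3_bitrate=None, opus_bitrate=None, chunk_length=None):
    """Validate inputs and return a JSON-serializable request body.

    All dictionary keys below are hypothetical field names.
    """
    if not text:
        raise ValueError("text must not be empty")
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"format must be one of {sorted(AUDIO_FORMATS)}")
    if latency not in LATENCY_PROFILES:
        raise ValueError(f"latency must be one of {sorted(LATENCY_PROFILES)}")

    payload = {"text": text, "format": audio_format, "latency": latency}

    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be in [0, 1]")
        payload["top_p"] = top_p
    if mp3_bitrate is not None:
        if mp3_bitrate not in MP3_BITRATES:
            raise ValueError("mp3_bitrate must be 64, 128, or 192 (kbps)")
        payload["mp3_bitrate"] = mp3_bitrate
    if opus_bitrate is not None:
        if opus_bitrate not in OPUS_BITRATES:
            raise ValueError("unsupported opus_bitrate (bps)")
        payload["opus_bitrate"] = opus_bitrate
    if chunk_length is not None:
        if not 100 <= chunk_length <= 300:
            raise ValueError("chunk_length must be in [100, 300]")
        payload["chunk_length"] = chunk_length
    return payload


def build_request(payload, api_key, url=API_URL):
    """Build a POST request with the two documented headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(payload, api_key, out_path, url=API_URL):
    """Send the request and write the binary audio response to out_path."""
    with urllib.request.urlopen(build_request(payload, api_key, url)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a real endpoint and key, a call such as `synthesize(build_payload("Hello", audio_format="wav"), api_key, "hello.wav")` would save the returned audio bytes to disk. Validating ranges on the client side before sending is a design choice of this sketch: it surfaces out-of-range values without a network round trip.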