# MiniMax Speech 2.8 Turbo Sync Text-to-Speech

MiniMax synchronous text-to-speech API over HTTP. Supports voice, emotion, speed, and other parameter settings.

## Request Headers

- Content type. Supports: `application/json`.
- Bearer authentication, for example: `Bearer {{API Key}}`.

## Request Body

- Text to synthesize into speech. Must be shorter than 10,000 characters; for text longer than 3,000 characters, streaming output is recommended. Supports paragraph breaks (newline), pause control (`<#x#>` tag), and interjection tags such as (laughs) and (coughs); interjection tags are only supported by speech-2.8-hd/turbo.
- Controls whether to enable streaming output. Default is false.
- Pitch adjustment (deep/bright), range \[-100, 100]. Values closer to -100 produce a deeper voice; values closer to 100 produce a brighter voice.
- Timbre adjustment (rich/crisp), range \[-100, 100]. Values closer to -100 produce a richer voice; values closer to 100 produce a crisper voice.
- Intensity adjustment (powerful/soft), range \[-100, 100]. Values closer to -100 produce a more powerful voice; values closer to 100 produce a softer voice.
- Sound effect setting; only one can be selected at a time. Options: `spacious_echo` (spacious echo), `auditorium_echo` (auditorium broadcast), `lofi_telephone` (telephone distortion), `robotic` (electronic).
- Audio output format. Options: `mp3`, `pcm`, `flac`, `wav`. `wav` is only supported in non-streaming output.
- Audio bitrate. Options: `32000`, `64000`, `128000`, `256000`, default is 128000. Only applies to the mp3 format.
- Number of audio channels. Options: \[1, 2], where 1 is mono and 2 is stereo.
  Default is 1.
- Controls constant bitrate (CBR) encoding. When set to true, audio is encoded at a constant bitrate. Note: this parameter only takes effect when streaming output is enabled and the audio format is mp3.
- Audio sample rate. Options: `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, default is 32000.
- Controls the output format. Options: `url` or `hex`, default is hex. Only valid in non-streaming scenarios. The URL is valid for 24 hours.
- Audio volume; a higher value means louder. Range (0, 10], default is 1.0.
- Audio pitch, range \[-12, 12], default is 0, where 0 outputs the original voice.
- Speech speed; a higher value means faster. Range \[0.5, 2], default is 1.0.
- Controls the emotion of the synthesized speech. Options: `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `whisper`. The model automatically matches an appropriate emotion based on the input text.
- Voice ID for audio synthesis. If a mixed voice is needed, set the `timber_weights` parameter and leave this empty. Supports system voices, cloned voices, and text-generated voices.
- Controls whether to read LaTeX formulas, default is false. Only supports Chinese. When enabled, `language_boost` is set to Chinese.
- Whether to enable Chinese/English text normalization, which can improve number reading but slightly increases latency. Default is false.
- Controls whether to add an audio rhythm identifier at the end of the synthesized audio, default is false. Only valid for non-streaming synthesis.
- Whether to enhance recognition of specified minor languages and dialects.
  Default is null; can be set to `auto` to let the model decide automatically. Optional values: `Chinese`, `Chinese,Yue`, `English`, `Arabic`, `Russian`, `Spanish`, `French`, `Portuguese`, `German`, `Turkish`, `Dutch`, `Ukrainian`, `Vietnamese`, `Indonesian`, `Japanese`, `Italian`, `Korean`, `Thai`, `Polish`, `Romanian`, `Greek`, `Czech`, `Finnish`, `Hindi`, `Bulgarian`, `Danish`, `Hebrew`, `Malay`, `Persian`, `Slovak`, `Swedish`, `Croatian`, `Filipino`, `Hungarian`, `Norwegian`, `Slovenian`, `Catalan`, `Nynorsk`, `Tamil`, `Afrikaans`, `auto`.
- Sets whether the last chunk contains the concatenated audio hex data. Default is false, meaning the last chunk contains the complete concatenated audio hex data.
- Mixed voice settings; supports mixing up to 4 voices.
  - Weight of each voice in the mix; must be set together with `voice_id`. Range \[1, 100]. A higher weight means more similarity to that voice.
  - Voice ID for audio synthesis; must be set together with the weight parameter. Supports system voices, cloned voices, and text-generated voices.
- Controls whether to enable the subtitle service, default is false. Only valid in non-streaming output scenarios, and only for the speech-2.6-hd, speech-2.6-turbo, speech-02-turbo, speech-02-hd, speech-01-turbo, and speech-01-hd models.
- Enable this parameter to make clause transitions more natural. Only supported by the speech-2.8-hd and speech-2.8-turbo models.
- Defines pronunciation or replacement rules for special characters or symbols. For Chinese text, tones are represented by numbers: 1st tone = 1, 2nd tone = 2, 3rd tone = 3, 4th tone = 4, neutral tone = 5. Example: `["omg/oh my god"]`.

## Response

- Returned synthesis data object. May be null, so check for null before use.
  - Synthesized audio data in hex encoding. The format matches the output format specified in the request.
  - Current audio stream status: 1 means synthesizing, 2 means synthesis completed.
  - Subtitle download link.
    Subtitles for the audio file, accurate to the sentence (no more than 50 characters), in milliseconds, in JSON format.
- Session ID for this request, used for troubleshooting.
- Status code and details for this request.
  - Status details.
  - Status code. `0`: success, `1000`: unknown error, `1001`: timeout, `1002`: rate limit triggered, `1004`: authentication failed, `1039`: TPM rate limit triggered, `1042`: invalid characters exceed 10%, `2013`: invalid input parameters.
- Additional audio information.
  - Audio bitrate.
  - Audio file size in bytes.
  - Word count of the pronounced text. Includes Chinese characters, numbers, and letters; excludes punctuation.
  - Generated audio file format. Options: `mp3`, `pcm`, `flac`.
  - Audio duration in milliseconds.
  - Generated audio channel count. `1`: mono, `2`: stereo.
  - Billable character count.
  - Audio sample rate.
  - Invalid character ratio. If invalid characters do not exceed 10% (inclusive), audio is generated normally and this ratio is returned; if they exceed 10%, an error is returned.
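
The rules above (text length limits, hex-encoded audio, a data object that may be null, and the status code table) could be handled on the client side roughly as follows. This is a minimal sketch, not the official client: the helper names `describe_status`, `streaming_recommended`, and `decode_hex_audio` are hypothetical, the request and response field names are deliberately left out because this page does not list them, and counting characters with `len()` is an assumption about how the service counts them.

```python
from typing import Optional

# Status codes exactly as listed in the Response section.
STATUS_MESSAGES = {
    0: "success",
    1000: "unknown error",
    1001: "timeout",
    1002: "rate limit triggered",
    1004: "authentication failed",
    1039: "TPM rate limit triggered",
    1042: "invalid characters exceed 10%",
    2013: "invalid input parameters",
}

MAX_TEXT_CHARS = 10_000       # text must be shorter than this
STREAMING_HINT_CHARS = 3_000  # above this, streaming output is recommended


def describe_status(code: int) -> str:
    """Map a status code to its documented meaning."""
    return STATUS_MESSAGES.get(code, f"unrecognized status code {code}")


def streaming_recommended(text: str) -> bool:
    """Reject over-long text, then report whether streaming is advised.

    Assumption: len() (Unicode code points) approximates the service's
    character count.
    """
    if len(text) >= MAX_TEXT_CHARS:
        raise ValueError(f"text must be shorter than {MAX_TEXT_CHARS} characters")
    return len(text) > STREAMING_HINT_CHARS


def decode_hex_audio(hex_audio: Optional[str]) -> bytes:
    """Decode hex-encoded audio; pass None when the data object is null."""
    if hex_audio is None:
        return b""
    return bytes.fromhex(hex_audio)
```

For instance, `decode_hex_audio("494433")` yields the three bytes `b"ID3"`, and `describe_status(1002)` yields `"rate limit triggered"`. Returning empty bytes for a null data object keeps the caller's write path simple; raising an exception instead would be an equally reasonable choice.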