Settings

API Key

Enter your API key to enable voice generation.

Parameter Guide

Language

Sets the target language for voice generation. The model synthesizes speech in this language regardless of the language of the input text.

Instruct (Preset only)

A text instruction that guides the speaking style of a preset voice, for example "Speak slowly and warmly" or "Read as a news anchor". This option is available only with preset voices.

Temperature

Controls randomness in token sampling. Lower values (0.1-0.3) produce more deterministic, consistent output. Higher values (0.7-1.0) increase variation and expressiveness. Default: 0.7.
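The guide does not describe how the model applies temperature internally. The sketch below assumes the common convention of dividing each logit by the temperature before a softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T), which produces the behavior described above. The function name and the example logits are hypothetical and for illustration only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by `temperature`.

    temperature < 1.0 sharpens the distribution (more deterministic);
    temperature > 1.0 flattens it (more varied). Hypothetical helper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Example logits for three candidate tokens (made up for illustration).
logits = [2.0, 1.0, 0.5]
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, the most likely token absorbs more of the probability mass. That is why low settings sound more consistent from run to run.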

Top-K

Limits the candidate tokens at each generation step to the K most probable. Lower values (10-30) produce more conservative, predictable output; higher values (50-100) allow more diversity. Default: 50.
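Here is a minimal sketch of top-K filtering on a probability list. It assumes the common approach of keeping the K highest-probability tokens and renormalizing them. The app's actual implementation is not documented, and `top_k_filter` is a hypothetical name.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize.

    Hypothetical helper illustrating top-K sampling.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Token indices ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_k_filter(probs, 2))  # only the two most likely tokens remain
```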

Top-P (Nucleus Sampling)

Cumulative probability threshold for token selection. The model considers the smallest set of most-probable tokens whose combined probability reaches or exceeds this value. Lower values give more focused output; higher values give more diverse output. Default: 0.9.
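Below is a minimal sketch of nucleus (top-P) filtering. It assumes the standard formulation: sort tokens by probability, add them to the set until the running total reaches P, and then renormalize. `top_p_filter` is a hypothetical name, and the app's own code may differ.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches p, zero out the rest, and renormalize.

    Hypothetical helper illustrating nucleus sampling.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:  # stop as soon as the nucleus is large enough
            break
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_p_filter(probs, 0.9))  # 0.5 + 0.2 + 0.15 + 0.1 first reaches 0.9
```

Unlike top-K, the number of surviving tokens adapts to the distribution. A confident step keeps few candidates, while an uncertain step keeps many.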

Repetition Penalty

Penalizes tokens that have already appeared, reducing loops and stuttering. A value of 1.0 applies no penalty. Values above 1.2 can reduce naturalness. Default: 1.0.
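The guide does not say which penalty formula the model uses. The sketch below assumes the widely used CTRL-style rule, the same convention Hugging Face Transformers uses for `repetition_penalty`. Under that rule, the logit of each previously seen token is divided by the penalty when it is positive and multiplied by it when it is negative. `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, previous_tokens, penalty):
    """CTRL-style repetition penalty (assumed; not confirmed for this app).

    For every token id already generated, a positive logit is divided by
    `penalty` and a non-positive logit is multiplied by it, so with
    penalty > 1.0 repeated tokens always become less likely.
    penalty == 1.0 leaves the logits unchanged.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for token_id in set(previous_tokens):  # each seen token is penalized once
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out

logits = [3.0, -1.0, 0.5]
history = [0, 1]  # token ids that already appeared
print(apply_repetition_penalty(logits, history, 1.2))
```

The sign-dependent rule handles negative logits correctly. Dividing a negative logit would move it toward zero and make the token more likely, the opposite of the intent.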

Seed

A fixed random seed makes generations reproducible: the same seed with the same input and parameters produces the same audio. Leave this field empty to get different output on each run.
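The following sketch shows why a seed gives repeatable results. It assumes, hypothetically, that the sampler draws tokens from a single seeded random generator. A generator seeded with the same value replays the same sequence of draws, and `random.Random(None)` seeds itself from system entropy instead. `sample_tokens` is an illustrative name, not part of the app.

```python
import random

def sample_tokens(probs, n, seed=None):
    """Draw n token ids from a fixed probability distribution.

    With the same seed and the same inputs, the draws are identical;
    seed=None seeds from system entropy, so each call differs.
    Hypothetical helper illustrating seeded sampling.
    """
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
first = sample_tokens(probs, 10, seed=42)
second = sample_tokens(probs, 10, seed=42)
print(first == second)
```

In a real neural model, some GPU operations can be nondeterministic. Identical seeds therefore guarantee identical audio only if the rest of the pipeline is deterministic too.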

Voice Samples

Audio clips used to clone a voice. More samples with clear speech and low background noise produce better results. Each sample should be under 30 seconds. A transcript matching the spoken words is highly recommended.
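Before uploading, you can check locally that a sample is under the 30-second limit. The sketch below uses Python's standard `wave` module and therefore handles only WAV files; MP3 and M4A would need a third-party decoder. The helper names are hypothetical, and the silent clip exists only so the example runs on its own.

```python
import io
import wave

MAX_SECONDS = 30.0  # per-sample limit stated in this guide

def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a WAV file given its raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono, 16-bit silent WAV in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()

sample = make_silent_wav(12.5)
duration = wav_duration_seconds(sample)
print(f"{duration:.1f} s, within limit: {duration <= MAX_SECONDS}")
```

For a real file, read its bytes with `open(path, "rb").read()` and pass them to `wav_duration_seconds`.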

Clone Voice

Add an audio sample to get started immediately. You can add more samples later.

Supported formats: WAV, MP3, M4A. Maximum duration: 30 seconds per sample. Click "Transcribe" to extract the spoken text from the audio automatically.