audio quality is weird

#29
by AnusreeDas01 - opened

When serving via our previous HF Transformers pipeline, long inputs produced full-length audio with occasional US-accent drift. When switching to vLLM for faster inference, short utterances are usually OK, but long outputs end early or sound robotic; accent drift to a mild American accent occurs in outputs(with or without quantization).

also sometimes the model would read the voice description text with original input.

Maya Research org
Maya Research org

Also to explain better, the model works best when you create your description similarly as verbose as possible like the template in the description to more stable and consistent results. this alone would fix everything of what you've mentioned.

bharathkumarK changed discussion status to closed

Sign up or log in to comment