Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
📰 ArXiv cs.AI
arXiv:2409.18512v2 Announce Type: replace-cross Abstract: Recent advancements in speech synthesis have enabled large language model (LLM)-based systems to perform zero-shot generation with controllable content, timbre, speaker identity, and emotion through input prompts. As a result, these models heavily rely on prompt design to guide the generation process. However, existing prompt selection methods often fail to ensure that prompts contain sufficiently stable speaker identity cues and appropri
DeepCamp AI