RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

📰 ArXiv cs.AI

arXiv:2510.14628v2 · Announce Type: replace-cross

Abstract: Recent advances in text-to-speech (TTS) synthesis have achieved near-human speech quality in neutral speaking styles. However, most existing approaches either depend on costly emotion annotations or optimize surrogate objectives that fail to capture perceived emotional quality. As a result, the generated speech, while semantically accurate, often lacks expressiveness and emotional richness. To address these limitations…

Published 8 Apr 2026