X-VC: Zero-shot Streaming Voice Conversion in Codec Space

📰 ArXiv cs.AI

arXiv:2604.12456v1 (announce type: cross)

Abstract: Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion quality, building zero-shot VC systems for interactive scenarios remains challenging because high-fidelity speaker transfer and low-latency streaming inference are difficult to achieve simultaneously. In this work, we present X-VC, a zero-shot stream

Published 15 Apr 2026