DuoTok: Source-Aware Dual-Track Tokenization for Multi-Track Music Language Modeling

📰 ArXiv cs.AI

DuoTok is a source-aware dual-track tokenizer for multi-track music language modeling. It is designed to balance three goals: reconstruction quality, token predictability, and cross-track correspondence.

Advanced | Published 2 Apr 2026

Action Steps

Pretrain a semantic encoder to learn representations of audio data
Apply staged disentanglement to separate tokens into dual tracks
Evaluate the tokenizer on metrics such as reconstruction loss, perplexity, and cross-track correlation
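
The evaluation step above can be sketched with generic definitions of the three metrics. This is a minimal illustration, not DuoTok's actual evaluation code: the summary does not say how the paper computes these quantities, so the function names, the use of mean squared error for reconstruction, and the use of Pearson correlation between per-frame features of the two tracks are all assumptions.

```python
import math


def reconstruction_mse(original, reconstructed):
    """Mean squared error between two equal-length sample sequences.

    Assumption: MSE stands in for whatever reconstruction loss the paper uses.
    """
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    squared_errors = ((a - b) ** 2 for a, b in zip(original, reconstructed))
    return sum(squared_errors) / len(original)


def perplexity(token_log_probs):
    """Perplexity from the natural-log probability a language model
    assigned to each observed token."""
    mean_negative_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_negative_log_prob)


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length feature sequences.

    Assumption: one scalar feature per frame for each track; raises
    ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)
```

Lower reconstruction error and lower perplexity are better. How the correlation score should be read depends on which features are compared across the two tracks.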

Who Needs to Know This

AI engineers and researchers working on music language models: DuoTok aims to keep reconstruction fidelity high and tokens predictable while maintaining correspondence between the two tracks.

Key Insight

💡 DuoTok's staged disentanglement separates multi-track music into two token tracks while balancing reconstruction quality, predictability, and cross-track correspondence.