PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging

📰 ArXiv cs.AI

arXiv:2604.22823v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) rely on multimodal pre-training over diverse data sources, where different datasets often induce complementary cross-modal alignment capabilities. Model merging provides a cost-effective mechanism for integrating multiple expert MLLMs with complementary strengths into a unified model. However, existing model merging research mainly focuses on post-finetuning scenarios, leaving the pre-training stage largely …
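The "model merging" the abstract refers to is typically done in parameter space. As a minimal sketch (not the paper's PivotMerge method), the simplest variant linearly combines the matching parameters of several expert models that share one architecture; the names `merge_weights`, `expert_a`, and `expert_b` below are illustrative:

```python
# Toy illustration of parameter-space model merging: linearly combine
# the parameters of several "expert" models with the same architecture.
# Real MLLM parameters would be tensors; scalars keep the sketch self-contained.

def merge_weights(state_dicts, coeffs=None):
    """Linearly combine matching parameters from several models.

    state_dicts: list of {parameter_name: value} mappings, same keys in each.
    coeffs: per-model mixing weights; defaults to a uniform average.
    """
    if coeffs is None:
        coeffs = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        name: sum(c * sd[name] for c, sd in zip(coeffs, state_dicts))
        for name in state_dicts[0]
    }

expert_a = {"proj.weight": 0.8, "proj.bias": 0.2}  # toy scalar "parameters"
expert_b = {"proj.weight": 0.4, "proj.bias": 0.6}
merged = merge_weights([expert_a, expert_b])  # uniform average of the two
```

A uniform average is the baseline; the paper's contribution is presumably in how experts pre-trained on heterogeneous data are aligned before such a combination is meaningful.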

Published 28 Apr 2026