Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training
📰 ArXiv cs.AI
arXiv:2303.05330v1 Announce Type: cross Abstract: Geo-distributed ML training can benefit many emerging ML scenarios (e.g., large model training and federated learning) by using multi-regional cloud resources and wide area networks (WANs). However, its efficiency is limited by two challenges. First, efficient elastic scheduling of multi-regional cloud resources is usually missing, which hurts both resource utilization and training performance. Second, training communication over the WAN is still the main overhead, eas