Scalable Pretraining of Large Mixture of Experts Language Models on the Aurora Supercomputer
📰 ArXiv cs.AI
Action Steps
- Developing an in-house training library (Optimus) to support large-model training techniques
- Utilizing large-scale computing infrastructure such as Aurora, with thousands of GPU tiles
- Pretraining large mixture-of-experts language models such as Mula-1B with the developed library and infrastructure (a minimal MoE layer sketch follows this list)
- Scaling up the pretraining process for efficient and effective LLM training
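The paper's Optimus library and exact model architecture are not reproduced here; below is a minimal, self-contained sketch of a top-k gated mixture-of-experts feed-forward layer in PyTorch, the building block such pretraining scales up. Class and parameter names (MoEFeedForward, d_model, num_experts, top_k) are illustrative assumptions, and a real Aurora-scale run would shard experts across devices (expert parallelism) rather than keep them on one device as here.

```python
# Minimal sketch of a top-k gated mixture-of-experts (MoE) feed-forward layer.
# Illustrative only; not the Optimus library's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent two-layer MLP.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                       # (tokens, num_experts)
        weights = F.softmax(logits, dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # route each token to k experts
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize selected weights

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Select the tokens whose top-k choices include this expert.
            token_ids, slot = (top_idx == expert_id).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = MoEFeedForward(d_model=256, d_hidden=1024, num_experts=8, top_k=2)
    y = layer(torch.randn(2, 16, 256))
    print(y.shape)  # torch.Size([2, 16, 256])
```

At the scale described in the paper, layers like this are combined with data, tensor, and expert parallelism so that experts live on different GPU tiles; the single-device loop above is only meant to show the routing logic.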
Who Needs to Know This
AI researchers and engineers working on large language models can benefit from this work, which demonstrates the feasibility of pretraining LLMs at scale on a supercomputer such as Aurora.
Key Insight
💡 Scalable pretraining of large language models is feasible on a supercomputer such as Aurora using a custom training library.
Share This
💡 Pretraining large language models on the Aurora supercomputer with thousands of GPU tiles!
DeepCamp AI