MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

📰 ArXiv cs.AI

arXiv:2604.06798v2 Announce Type: replace-cross Abstract: Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs struggle with MoE-specific issues, including cross-expert redundancy, task-agnostic importance estimation, and quantization-induced routing shifts. To this end, we propose MoBiE, the first binarization framework […]
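The abstract's "weight binarization" refers to replacing full-precision weights with 1-bit codes plus a per-row scale. The sketch below illustrates only the standard baseline scheme (B = sign(W), scale α = mean|w| per row, which minimizes the per-row L2 reconstruction error); it is a generic illustration, not MoBiE's MoE-specific method, and the function names are hypothetical.

```python
import numpy as np

def binarize_rowwise(W: np.ndarray):
    """Generic 1-bit weight binarization (illustrative, not MoBiE):
    approximate each row w by alpha * b with b in {-1, +1}.
    For fixed b = sign(w), the L2-optimal scale is alpha = mean(|w|)."""
    B = np.sign(W)
    B[B == 0] = 1.0                                  # keep codes strictly binary
    alpha = np.abs(W).mean(axis=1, keepdims=True)    # per-row scaling factor
    return alpha, B

def dequantize(alpha: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Reconstruct the dense approximation alpha * B used at inference time."""
    return alpha * B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8)).astype(np.float32)   # toy expert weight matrix
    alpha, B = binarize_rowwise(W)
    W_hat = dequantize(alpha, B)
    err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
```

In an MoE setting this per-expert approximation is where the issues the abstract lists arise: applying it expert by expert ignores redundancy across experts and can perturb the router's top-k choices.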

Published 14 Apr 2026