I Compressed a 14GB Model to 3.5GB and Kept 95% of Its Quality — Here's How
📰 Dev.to · jidonglab
Quantization alone cut the model to 25% of its original size, from 14 GB to 3.5 GB. Combined with distillation and MoE routing, it runs on a laptop.
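The headline numbers are consistent with moving from 16-bit to 4-bit weights, since 4/16 = 25%. The summary does not state the bit widths or the parameter count, so the 7-billion-parameter figure and both bit widths below are assumptions chosen to match 14 GB and 3.5 GB. This is a rough sketch of the arithmetic, not the author's method:

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores quantization scales, zero-points, and file metadata,
    so real quantized files come out somewhat larger.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: 14 GB corresponds to about 7e9 parameters at 16 bits each.
NUM_PARAMS = 7e9

fp16_gb = model_size_gb(NUM_PARAMS, 16)  # assumed original precision
int4_gb = model_size_gb(NUM_PARAMS, 4)   # assumed quantized precision
ratio = int4_gb / fp16_gb

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, ratio: {ratio:.0%}")
```

The same function gives a quick estimate for other bit widths, for example 8 bits per weight would land at half the original size under the same assumptions.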