I Compressed a 14GB Model to 3.5GB and Kept 95% of Its Quality — Here's How

📰 Dev.to · jidonglab

Quantization alone cut the model to 25% of its original size. Combined with distillation and mixture-of-experts (MoE) routing, the compressed model runs on a laptop.
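The 25% figure follows from simple arithmetic: weight storage is roughly parameter count times bits per weight. The post does not say which precision or quantization scheme was used. The sketch below assumes the 14GB original stores 16-bit weights (which implies about 7 billion parameters) and that they were quantized to 4 bits. It also includes a toy per-tensor symmetric 4-bit quantizer to show the basic idea. The helper names `model_size_gb`, `quantize_int4`, and `dequantize` are hypothetical, and this is an illustration, not the author's actual method.

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Toy symmetric absmax quantization to signed 4-bit integers in [-8, 7]."""
    # One scale per tensor: map the largest magnitude onto 7.
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]


# Assumption: the 14 GB original holds 16-bit (FP16/BF16) weights,
# so it has about 14e9 bytes * 8 bits / 16 bits = 7e9 parameters.
params = 14e9 * 8 / 16

fp16_gb = model_size_gb(params, 16)
int4_gb = model_size_gb(params, 4)
ratio = int4_gb / fp16_gb
print(f"{fp16_gb:.1f} GB -> {int4_gb:.1f} GB ({ratio:.0%})")

weights = [0.12, -0.5, 0.33, 0.07, -0.21]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
print(codes, [round(w, 3) for w in restored])
```

Real 4-bit formats usually keep one scale per small block of weights rather than per tensor, which adds a little storage overhead but reduces rounding error. That overhead is one reason quantized files tend to land slightly above the ideal 25%.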

Published 10 Mar 2026