📰 Dev.to · Tech_Nuggets

4 articles · Updated every 3 hours

Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production

Dev.to · Tech_Nuggets 11h ago

A production-oriented comparison of LLM sampling parameters -- how temperature, top-p, top-k, and min-p reshape the output distribution, what combos actually wo

Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4

Dev.to · Tech_Nuggets 1d ago

A practical comparison of the four major LLM weight quantization formats — which one to use for CPU, GPU serving, and fine-tuning, with current version numbers

LoRA and QLoRA fine-tuning: what they actually do under the hood

Dev.to · Tech_Nuggets 2d ago

A practical walkthrough of LoRA and QLoRA -- how low-rank adaptation works, what NF4 quantization brings, and when to use each.

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

Dev.to · Tech_Nuggets 6d ago

FP8 and INT8 KV caches cut attention state ~50%, but they shift the target model's logit distribution — and that can quietly halve the gains from speculative de