📰 Dev.to · Tech_Nuggets
8 articles · Updated every 3 hours

Dev.to · Tech_Nuggets
9h ago
KV cache and PagedAttention: what they do and why they matter
An explanation of the KV cache memory problem in production LLM serving and how PagedAttention (the technique behind vLLM) solves it with OS-inspired virtual me

Dev.to · Tech_Nuggets
5d ago
The Model Context Protocol (MCP): what it is and how to build a server
A practical walkthrough of the Model Context Protocol — what it is, how JSON-RPC 2.0 transports work, and how to build an MCP server in Python with FastMCP.

Dev.to · Tech_Nuggets
6d ago
Structured output from LLMs: JSON mode, function calling, and grammar-constrained decoding
A practical comparison of three approaches to getting structured data from LLMs: prompt-only JSON, API-level JSON mode and function calling, and grammar-constra

Dev.to · Tech_Nuggets
📐 ML Fundamentals
⚡ AI Lesson
1w ago
Mixture of Experts (MoE): what it actually does under the hood, and when it pays off
MoE explained for practitioners: how the router works, load-balancing loss, why Mixtral has 45B params but activates 13B, and when not to use it. Practical, no

Dev.to · Tech_Nuggets
1w ago
Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production
A production-oriented comparison of LLM sampling parameters -- how temperature, top-p, top-k, and min-p reshape the output distribution, what combos actually wo

Dev.to · Tech_Nuggets
1w ago
Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4
A practical comparison of the four major LLM weight quantization formats — which one to use for CPU, GPU serving, and fine-tuning, with current version numbers

Dev.to · Tech_Nuggets
1w ago
LoRA and QLoRA fine-tuning: what they actually do under the hood
A practical walkthrough of LoRA and QLoRA -- how low-rank adaptation works, what NF4 quantization brings, and when to use each.

Dev.to · Tech_Nuggets
2w ago
KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break
FP8 and INT8 KV caches cut attention state ~50%, but they shift the target model's logit distribution — and that can quietly halve the gains from speculative de
DeepCamp AI