📰 Dev.to · plasmon

Articles from Dev.to · plasmon · 24 articles · Updated every 3 hours

Running Just One LLM on 8GB VRAM Is a Waste
Dev.to · plasmon 4d ago
Light Just Cut KV Cache Memory Traffic to 1/16th
Dev.to · plasmon 4d ago
The bottleneck in long-context LLM...
They Routed Power Through the Back of the Chip and 30% IR Drop Vanished
Dev.to · plasmon 5d ago
Every...
Letting AI Control RAG Search Improved Accuracy by 79%
Dev.to · plasmon 6d ago
Most RAG (Retrieval-Augmented...
If Memory Could Compute, Would We Still Need GPUs?
Dev.to · plasmon 6d ago
The bottleneck for LLM inference isn't...
I Couldn't Build a Local LLM PC for $1,300 — Budget Tiers and the VRAM Cliffs Between Them
Dev.to · plasmon 1w ago
8-Bit Quantization Destroyed 92% of Code Generation — The Culprit Wasn't Bit Count
Dev.to · plasmon 1w ago
If you...
ML Hit 99% Accuracy on Yield Prediction — The Factory Floor Ignored It
Dev.to · plasmon 1w ago
The pitch to bring...
3 Classifiers, 3 Answers: Why CoT Faithfulness Scores Are Meaningless
Dev.to · plasmon 1w ago
LLM Chain-of-Thought...
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM
Dev.to · plasmon 1w ago
I've been running local LLMs...
The Memory Bandwidth Gap Is 49x and Growing — Why Local LLMs Hit a Ceiling
Dev.to · plasmon 1w ago
The Wall I Hit on an RTX 4060 Was a Bandwidth Wall. Running Qwen3.5-9B on an RTX 4060 8GB...
MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected
Dev.to · plasmon 1w ago
Start with the benchmarks. In a previous article, I compared three Qwen3.5 models on the...
I Designed a Memory System for Claude Code — 'Forgetting' Was the Hardest Part
Dev.to · plasmon 1w ago
Everyone talks about making AI remember things. Handoff prompts. System instructions. Memory files....
80% of LLM 'Thinking' Is a Lie — What CoT Faithfulness Research Actually Shows
Dev.to · plasmon 1w ago
When You're Reading CoT, the Model Is Thinking Something Else. Thinking models are...
I Let Claude Code Run My Tech Blog. A Fake Article Passed Every Quality Check.
Dev.to · plasmon 2w ago
I've been letting Claude Code autonomously run a tech blog. Topic selection, article generation,...
Still Picking API vs Local LLM by Gut Feeling? A Framework With Real Benchmarks
Dev.to · plasmon 2w ago
"Just use...
I Tried Speculative Decoding on RTX 4060 8GB — Every Config Was Slower Than Baseline
Dev.to · plasmon 2w ago
All...
What Happens When You Bring LLMs Into a Semiconductor FAB — 5 ArXiv Papers, Brutally Honest Reviews
Dev.to · plasmon 2w ago
ArXiv papers on semiconductor manufacturing x AI have been surging. From late 2024 onward, proposals...
I Built a Fully Local Paper RAG on an RTX 4060 8GB — BGE-M3 + Qwen2.5-32B + ChromaDB
Dev.to · plasmon 2w ago
I was...