📰 Dev.to · Randy AP
3 articles

Dev.to · Randy AP
1w ago
Running Mixtral 8x7B at 21+ TPS on Pure CPU via io_uring and Predictive Caching
The current consensus in AI infrastructure is unyielding: if you want to run frontier Mixture of...

Dev.to · Randy AP
1w ago
I streamed Mixtral 8x7B from NVMe on a $0.40/hour VM and got 3.32 tps, here's how
Most...

Dev.to · Randy AP
2w ago
I built a Rust inference engine that streams MoE expert weights from NVMe SSDs, no GPU required
Most people trying to run Mixtral or DeepSeek-V3 locally hit the same wall: they don't have 80GB of...