My M5 Max, Gemma 4, MLX LOCAL Stack. (This KILLS MODEL PROVIDERS)
Model providers DON'T want you to see this video. The M5 Max just exposed the dirty secret of the cloud LLM economy: you're renting what you could already OWN.
🔥 While Anthropic and OpenAI APIs go down AGAIN mid-recording, my local stack keeps shipping. Private. Cheap. Fast. On-device. This is the beginning of the end for the API rental racket.
🎥 FEATURED LINKS:
• MLX, Gemma4, Qwen3.6, Pi agent live-bench codebase: https://github.com/disler/live-bench
• Tactical Agentic Coding: https://agenticengineer.com/tactical-agentic-coding?y=00Y-p62sk0s
📚 RESOURCES
• Nvidia NVFP4: https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/
• Apple M5 GPU Neural Accelerators: https://machinelearning.apple.com/research/exploring-llms-mlx-m5
• mlx-lm: https://github.com/ml-explore/mlx-lm
• Ollama Gemma4 Model: https://ollama.com/library/gemma4
• Ollama MLX Blog: https://ollama.com/blog/mlx
• Pi coding agent: http://pi.dev
• Gemma4 26 nvfp4: https://huggingface.co/mlx-community/gemma-4-26b-a4b-it-nvfp4
• Vitalik Eth Secure LLMs: https://vitalik.eth.limo/general/2026/04/02/secure_llms.html
⚡ Here's the uncomfortable truth most engineers are ignoring: you're paying a premium for cloud inference when the M5 Max, M4 Max, or other Apple Silicon you already own can run state-of-the-art local LLMs RIGHT NOW. Gemma 4, Qwen 3.5, and MLX variants optimized for Apple AI hardware are quietly eating the model providers' lunch.
🧠 In this head-to-head benchmark, I pit the M5 Max vs the M4 Max across three brutal local inference tests: raw prompt throughput, context scaling with Graph Walks, and full agentic coding workflows via the Pi coding agent. The results are going to reshape how you think about local agents.
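For readers who want to try the raw throughput test on their own machine, here is a minimal sketch of how tokens per second could be measured and compared. It is not the live-bench code: `measure_tokens_per_second`, `speedup`, and `fake_generate` are hypothetical names made up for illustration, and the stand-in generator would need to be swapped for a real call into your local runtime that returns the number of tokens it produced.

```python
import time
from typing import Callable


def measure_tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Call `generate` several times and return the best tokens/sec observed.

    `generate` is a hypothetical callable (an assumption of this sketch):
    it runs one generation and returns how many tokens were produced.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate()
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, n_tokens / elapsed)
    return best


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of two throughput numbers, e.g. one machine or format vs another."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps


def fake_generate() -> int:
    """Stand-in for a real model call: pretends to emit 50 tokens."""
    time.sleep(0.01)
    return 50


if __name__ == "__main__":
    tps = measure_tokens_per_second(fake_generate)
    print(f"throughput: {tps:.1f} tok/s")
```

Taking the best of several runs is one way to reduce noise from warm-up and background load. Averaging is an equally reasonable choice, as long as the same method is used for every machine and model format being compared.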
💣 THE CONTROVERSIAL FINDING: If you're running GGUF models on Apple Silicon in 2026, you're leaving 2x performance on the table. MLX smokes GGUF. Not by a little. By a LOT. 118 tokens per second vs 60. Almost double the pre-fill sp