NEW Gemma 4 26B A4B Update

Julian Goldie SEO · Beginner · 🧠 Large Language Models · 1h ago
Want to make money and save time with AI? Join here: https://www.skool.com/ai-profit-lab-7462/about (video notes and tool links are at the same page). Get a FREE AI Course + Community + 1,000 AI Agents 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about

Google's new Gemma 4 26B A4B runs 10 simultaneous AI requests on a MacBook Pro: no API costs, no cloud dependency. This video breaks down the MoE architecture that makes it possible and exactly how to run it locally on consumer hardware today. Full chapter timestamps are listed below.
Watch on YouTube ↗


Chapters (16)

0:00 Intro – Why you're still paying for API calls you don't need to
0:20 What is Gemma 4 26B A4B? – The model overview
0:42 Gemma 4 vs Gemini 3 – Same foundations, open-source and free
1:14 Full model lineup – E2B, E4B, 26B A4B, and 31B Dense explained
1:52 MoE architecture explained – Why only 4B of 26B parameters activate
2:32 Dense vs MoE – Speed and compute comparison
3:12 Hardware requirements – What you need to run it locally
3:36 Multi-instance inference – 10 concurrent requests on one laptop
4:28 256K context window – What that actually means in practice
4:39 Multimodal + thinking mode – Images, reasoning chains, and more
4:52 Agentic framework support – LangChain, LlamaIndex, JSON output
5:16 Best tools to run it – Ollama, llama.cpp, LM Studio, MLX
5:57 Known Apple Silicon issue – Flash attention fix for Gemma 4
6:11 Windows/Linux GPU guide – RTX 4090 and partial offload options
6:19 Try it free – No hardware? Use Google AI Studio
6:25 The big picture – What this shift means for local AI in 2025
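The MoE point in the 1:52–2:32 chapters can be sketched with simple arithmetic: of the 26B parameters stored in memory, only about 4B (the "A4B" in the name) are activated per token, so per-token compute is a fraction of a dense 26B model's. A rough back-of-envelope, assuming 16-bit weights (the bytes-per-weight figure is an assumption for illustration, not a published spec):

```python
# Back-of-envelope for Gemma 4 26B A4B's mixture-of-experts setup.
# Parameter counts come from the model name; bytes_per_weight is an
# fp16/bf16 assumption, not an official spec.
total_params = 26e9       # parameters that must fit in RAM/VRAM
active_params = 4e9       # parameters activated per token ("A4B")
bytes_per_weight = 2      # assumed 16-bit weights

active_fraction = active_params / total_params
weight_memory_gb = total_params * bytes_per_weight / 1e9

print(f"active fraction per token: {active_fraction:.0%}")
print(f"memory to hold all weights at 16-bit: {weight_memory_gb:.0f} GB")
```

This is why MoE models like this one load like a large model but run like a small one: memory scales with the 26B total, while per-token compute scales with the 4B active slice, which is what makes multiple concurrent requests on one laptop plausible.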
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch โ†’