NEW Gemma 4 26B A4B Update
Want to make money and save time with AI? Join here: https://www.skool.com/ai-profit-lab-7462/about
Video notes + links to the tools 👇 https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI Course + Community + 1,000 AI Agents 👇 https://www.skool.com/ai-seo-with-julian-goldie-1553/about
Google's new Gemma 4 26B A4B runs 10 simultaneous AI requests on a MacBook Pro, with no API costs and no cloud dependency. This video breaks down the MoE architecture that makes that possible and shows exactly how to run the model locally on consumer hardware today.
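The back-of-the-envelope math behind that claim can be sketched as follows. The 26B-total / 4B-active split comes from the model name in the video; the 4-bit quantization figure (0.5 bytes per parameter) is an assumption for illustration, not something stated in the video:

```python
# Rough memory/compute arithmetic for a Mixture-of-Experts model.
# Assumption: 4-bit quantized weights (0.5 bytes per parameter).

def model_memory_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (ignores KV cache and runtime overhead)."""
    return total_params_billions * 1e9 * bytes_per_param / 1e9

total_b, active_b = 26, 4

# All experts must sit in memory even though few are used per token.
weights_4bit = model_memory_gb(total_b, 0.5)

# Per-token compute scales with *active* parameters, which is why
# the MoE model generates tokens far faster than a dense 26B would.
compute_ratio = total_b / active_b

print(f"Weights at 4-bit: ~{weights_4bit:.0f} GB")      # ~13 GB
print(f"Dense-vs-MoE per-token compute: {compute_ratio:.1f}x")  # 6.5x
```

Roughly 13 GB of weights fits comfortably in the unified memory of a higher-spec MacBook Pro, which is consistent with the hardware-requirements chapter below.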
00:00 Intro – Why you're still paying for API calls you don't need
00:20 What is Gemma 4 26B A4B? – The model overview
00:42 Gemma 4 vs Gemini 3 – Same foundations, open-source and free
01:14 Full model lineup – E2B, E4B, 26B A4B, and 31B Dense explained
01:52 MoE architecture explained – Why only 4B of 26B parameters activate
02:32 Dense vs MoE – Speed and compute comparison
03:12 Hardware requirements – What you need to run it locally
03:36 Multi-instance inference – 10 concurrent requests on one laptop
04:28 256K context window – What that actually means in practice
04:39 Multimodal + thinking mode – Images, reasoning chains, and more
04:52 Agentic framework support – LangChain, LlamaIndex, JSON output
05:16 Best tools to run it – Ollama, llama.cpp, LM Studio, MLX
05:57 Known Apple Silicon issue – Flash attention fix for Gemma 4
06:11 Windows/Linux GPU guide – RTX 4090 and partial offload options
06:19 Try it free – No hardware? Use Google AI Studio
06:25 The big picture – What this shift means for local AI in 2025
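The multi-instance chapter boils down to a simple fan-out pattern. A minimal sketch, with the caveat that `query_model` below is a runnable stand-in: local servers such as Ollama expose an HTTP API (e.g. `POST http://localhost:11434/api/generate`), and in practice you would replace the stub body with that request while keeping the same fan-out:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real HTTP call to a local inference server.
# Replace this body with an actual request (Ollama, LM Studio, etc.);
# the concurrency pattern around it stays the same.
def query_model(prompt: str) -> str:
    return f"response to: {prompt}"

prompts = [f"question {i}" for i in range(10)]

# Fan out 10 requests at once. Because an MoE model only activates
# a few billion parameters per token, serving several concurrent
# requests on one laptop is plausible in a way it isn't for a
# dense model of the same total size.
with ThreadPoolExecutor(max_workers=10) as pool:
    answers = list(pool.map(query_model, prompts))

print(len(answers))  # 10
```

Threads (rather than processes) are the right choice here because each worker is I/O-bound, waiting on the server, not doing the inference itself.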