Round 2 - I use CodeLlama 70B vs Mixtral MoE to write code to finetune a model on 16 GPUs 🤯🤯

William Falcon · Beginner · 🧠 Large Language Models · 2y ago
I test how well three different LLMs work at writing a Python script to finetune a model on 16 GPUs (multi-node). This video is not edited in any way; it shows a realistic coding workflow without gimmicks or hype. I ask CodeLlama 70B, Mixtral 8x7B (MoE), and Mistral 7B to write a Python program to finetune a computer vision model on the CIFAR10 dataset. You can validate all this for yourself by running the 3 studios for free.

Since this is an unedited video, here are some corrections:
- To clarify what "based on Llama 2" means: Mistral 7B tweaks the way Llama 2 does attention but is then pretrained from scratch.
- I think I forgot about Llama 2 7B... Mixtral was just working super well.
Watch on YouTube ↗

Related AI Lessons

⚡
The Orbital Response Network
Learn about the Orbital Response Network, a concept network architecture similar to transformers, and its potential applications
Medium · LLM
⚡
Measuring What Matters with NeMo Agent Toolkit
Learn to measure what matters in LLMs using NeMo Agent Toolkit for observability, evaluations, and model comparisons
Medium · LLM
⚡
Without Google's transformers, there is no GPT-ishs
Learn how Google's Transformers enabled the creation of GPT-2 and the modern generative AI industry
Dev.to AI
⚡
Cache-Augmented Generation (CAG): A RAG-less Approach to Document QA
Learn about Cache-Augmented Generation (CAG), a novel approach to document QA that eliminates the need for Retrieval-Augmented Generation (RAG)
Medium · Machine Learning

Chapters (26)

0:00 Introduction
0:40 Run CodeLlama 70B
1:13 Run Mixtral 8x7B (MoE)
1:34 Run Mistral 7B
1:47 How to get a GPU
2:08 What is a Lightning Studio
3:47 Basic CodeLlama 70B test
4:20 Basics of model monitoring
4:39 Connect a local VSCode
6:20 Basic Mixtral MoE coding test
8:46 Create the prompt to generate the ML code
9:04 Connect an S3 bucket
10:10 Full prompt for ML code
13:16 Prompt Mistral 7B
13:50 Debug the finetuning script
14:16 About the Lightning Trainer
14:56 Sanity check the finetuning script
15:30 Monitor with Tensorboard
16:20 About model RAM and model size
16:44 A quick TL;DR about profiling a model
17:40 Scale to multi-node (16 GPUs)
19:10 CodeLlama 70B results
20:00 About finetuning
22:10 Monitoring the 16 GPUs
22:54 CodeLlama 70B code results
25:35 Look at multi-node logs, weights
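The "About the Lightning Trainer" and "Scale to multi-node (16 GPUs)" chapters map onto the Lightning Trainer's multi-node arguments. A hedged configuration sketch (the 2-nodes-of-8-GPUs split and the DDP strategy are assumptions; `model` stands for a LightningModule wrapping the CIFAR10 finetuning logic, and actually launching this requires GPU nodes):

```python
from lightning.pytorch import Trainer

# 16 GPUs expressed as 2 nodes x 8 GPUs each, coordinated with
# distributed data parallel (DDP). Lightning spawns one process per GPU
# and shards each batch across them.
trainer = Trainer(
    accelerator="gpu",
    devices=8,       # GPUs per node
    num_nodes=2,     # 2 nodes x 8 GPUs = 16 GPUs total
    strategy="ddp",
    max_epochs=5,
)
trainer.fit(model)   # model: a LightningModule for the CIFAR10 finetune
```

The same script runs unchanged on 1 GPU or 16; only the `devices`/`num_nodes` arguments (and the launcher, e.g. a Lightning Studio multi-machine job) change.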
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)