Round 2 - I use CodeLlama 70B vs Mixtral MoE to write code to finetune a model on 16 GPUs 🤯🤯

William Falcon · Beginner · 🧠 Large Language Models · 2y ago
I test how well three different LLMs work at writing a Python script to finetune a model on 16 GPUs (multi-node). This video is not edited in any way; it shows a realistic coding workflow without gimmicks or hype. I ask CodeLlama 70B, Mixtral 8x7B (MoE), and Mistral 7B to write a Python program to finetune a computer vision model on the CIFAR10 dataset. You can validate all this for yourself by running the 3 studios for free.

Since this is an unedited video, here are some corrections:
- To clarify what "based on Llama 2" means: Mistral 7B tweaks the way Llama 2 does attention but is then pretrained from scratch.
- I think I forgot about Llama 2 7B... Mixtral was just working super well.
Watch on YouTube ↗

Related AI Lessons

⚡
The Orbital Response Network
Learn about the Orbital Response Network, a concept network architecture similar to transformers, and its potential applications
Medium · LLM
⚡
Measuring What Matters with NeMo Agent Toolkit
Learn to measure what matters in LLMs using NeMo Agent Toolkit for observability, evaluations, and model comparisons
Medium · LLM
⚡
Without Google's transformers, there is no GPT-ishs
Learn how Google's Transformers enabled the creation of GPT-2 and the modern generative AI industry
Dev.to AI
⚡
Cache-Augmented Generation (CAG): A RAG-less Approach to Document QA
Learn about Cache-Augmented Generation (CAG), a novel approach to document QA that eliminates the need for Retrieval-Augmented Generation (RAG)
Medium · Machine Learning

Chapters (26)

0:00 Introduction
0:40 Run CodeLlama 70B
1:13 Run Mixtral 8x7B (MoE)
1:34 Run Mistral 7B
1:47 How to get a GPU
2:08 What is a Lightning Studio
3:47 Basic CodeLlama 70B test
4:20 Basics of model monitoring
4:39 Connect a local VSCode
6:20 Basic Mixtral MoE coding test
8:46 Create the prompt to generate the ML code
9:04 Connect an S3 bucket
10:10 Full prompt for ML code
13:16 Prompt Mistral 7B
13:50 Debug the finetuning script
14:16 About the Lightning Trainer
14:56 Sanity check the finetuning script
15:30 Monitor with Tensorboard
16:20 About model RAM and model size
16:44 A quick TL;DR about profiling a model
17:40 Scale to multi-node (16 GPUs)
19:10 CodeLlama 70B results
20:00 About finetuning
22:10 Monitoring the 16 GPUs
22:54 CodeLlama 70B code results
25:35 Look at multi-node logs, weights
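The "About the Lightning Trainer" and "Scale to multi-node (16 GPUs)" chapters map onto the Lightning Trainer's multi-node arguments. A hedged configuration sketch (the 2-nodes-of-8-GPUs split and the DDP strategy are assumptions; `model` stands for a LightningModule wrapping the CIFAR10 finetuning logic, and actually launching this requires GPU nodes):

```python
from lightning.pytorch import Trainer

# 16 GPUs expressed as 2 nodes x 8 GPUs each, coordinated with
# distributed data parallel (DDP). Lightning spawns one process per GPU
# and shards each batch across them.
trainer = Trainer(
    accelerator="gpu",
    devices=8,       # GPUs per node
    num_nodes=2,     # 2 nodes x 8 GPUs = 16 GPUs total
    strategy="ddp",
    max_epochs=5,
)
trainer.fit(model)   # model: a LightningModule for the CIFAR10 finetune
```

The same script runs unchanged on 1 GPU or 16; only the `devices`/`num_nodes` arguments (and the launcher, e.g. a Lightning Studio multi-machine job) change.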
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)