Training at Scale
Train large models with mixed precision, gradient checkpointing, and distributed strategies.
After completing this skill you can:
- Use FP16/BF16 mixed precision training
- Apply gradient accumulation for large batches
- Set up DDP and FSDP on multi-GPU clusters
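The first outcome above can be sketched with PyTorch's `torch.autocast` context manager. This is a minimal illustration, not the course's exact code: it uses BF16 on CPU because BF16 needs no loss scaling (with FP16 on CUDA you would additionally wrap the backward pass and optimizer step in `torch.cuda.amp.GradScaler`). The tiny model and data here are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model; master weights stay in FP32 while
# autocast runs eligible forward ops in a lower-precision dtype.
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randn(8, 4)

# BF16 shown here because it needs no loss scaling; on CUDA with FP16
# you would use device_type="cuda" plus a GradScaler.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)

loss.backward()   # gradients accumulate into the FP32 parameters
opt.step()
opt.zero_grad()
```

Note that only the forward pass runs under autocast; the backward pass reuses the dtypes recorded during the forward, so it is kept outside the context.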
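Gradient accumulation, the second outcome, simulates a large batch by summing gradients over several micro-batches before each optimizer step. A minimal sketch, assuming a hypothetical model and an accumulation factor of 4; dividing the loss by the step count makes the accumulated gradient match the average over one large batch:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 4  # hypothetical: 4 micro-batches per optimizer step
data = [(torch.randn(8, 16), torch.randn(8, 4)) for _ in range(8)]

for step, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y)
    # Scale the loss so the summed gradients equal one large-batch average.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()  # clear gradients only after the real step
```

The effective batch size is `micro_batch_size * accum_steps`, at the cost of `accum_steps` forward/backward passes per update.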
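For the third outcome, the skeleton of a DDP setup looks roughly like the sketch below. It runs as a single CPU process with the `gloo` backend purely for illustration; a real multi-GPU job is launched with `torchrun`, which sets the rank and world-size environment variables per process, and FSDP follows the same pattern with `torch.distributed.fsdp.FullyShardedDataParallel` as the wrapper instead.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# In a real job, each process is started via
#   torchrun --nproc_per_node=N train.py
# which sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT.
# The single-process CPU values below are for illustration only.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(16, 4))  # registers gradient all-reduce hooks
loss = model(torch.randn(8, 16)).sum()
loss.backward()  # gradients are averaged across all ranks here

dist.destroy_process_group()
```

On GPUs you would use the `nccl` backend and pass `device_ids=[local_rank]` to DDP so each process owns one device.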
DeepCamp AI