Latest PyTorch's Secret Power to Handle Sequences of 10K or 100K Length

DeepLearning Hero · Beginner · 🧠 Large Language Models · 2y ago
In this video, we'll explore a very cool feature in PyTorch 1.13+ that you may not be harnessing the full power of: Flash Attention. I'll show you how conservative the new PyTorch is with memory and how that lets us fit sequences of 10K or even 100K tokens on a modest GPU.
Flash Attention repo: https://github.com/HazyResearch/flash-attention
GitHub code: https://github.com/thushv89/tutorials_deeplearninghero/blob/master/llms/flash_attention_torch.ipynb
Chapter timestamps are listed below.
Watch on YouTube ↗
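Below is a minimal sketch of the idea the video demonstrates: a naive attention implementation next to PyTorch's fused scaled_dot_product_attention, which dispatches to Flash Attention on supported GPUs. This is not the notebook's exact code; the shapes, dtype, and backend selection are illustrative assumptions. The public scaled_dot_product_attention API landed in PyTorch 2.0 (the torch.backends.cuda.sdp_kernel context manager used here was later superseded by torch.nn.attention.sdpa_kernel), and the Flash Attention backend needs a CUDA GPU with fp16/bf16 inputs.

```python
# Minimal sketch (not the notebook's code): naive attention vs. PyTorch's fused
# scaled_dot_product_attention. Assumes PyTorch 2.0+ and a CUDA GPU; shapes,
# dtype, and sequence length are illustrative only.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 10_000, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def naive_attention(q, k, v):
    # Materialises the full (seq_len x seq_len) score matrix: for these shapes
    # that is 8 x 10,000 x 10,000 fp16 values, roughly 1.6 GB, before softmax.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# The fused kernel never builds the full score matrix, so memory grows roughly
# linearly with sequence length and much longer inputs fit on a modest GPU.
# (On PyTorch 2.2+ you would use torch.nn.attention.sdpa_kernel instead.)
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 10000, 64])
```

The "Torch without Flash Attention" and "Torch with Flash Attention" chapters compare these two paths; the exact measurement code is in the linked notebook.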


Chapters (13)

0:00 Introduction
0:28 Scaled dot product in PyTorch
1:51 Google Colab environment
2:21 PyTorch version for Flash Attention
2:51 Input data
3:01 Hyperparameters and the architecture
4:17 A few important arguments to the model
4:55 Utility functions
5:39 Torch without Flash Attention
7:06 Torch with Flash Attention
7:49 Limitations of Flash Attention
8:59 Analysing the results
10:47 Conclusion