Latest PyTorch's Secret Power to Handle Sequences of 10K or 100K Length

DeepLearning Hero · Beginner · 🧠 Large Language Models · 2y ago
In this video, we'll explore a very cool feature in PyTorch 1.13+ that you may not be harnessing the full power of: Flash Attention. I'll show you how conservative the new PyTorch is with memory and how that lets us fit sequences of 10K or even 100K tokens on a modest GPU.
Flash Attention repo: https://github.com/HazyResearch/flash-attention
GitHub code: https://github.com/thushv89/tutorials_deeplearninghero/blob/master/llms/flash_attention_torch.ipynb
Chapter timestamps are listed below.
Watch on YouTube ↗
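Below is a minimal sketch of the idea the video demonstrates: a naive attention implementation next to PyTorch's fused scaled_dot_product_attention, which dispatches to Flash Attention on supported GPUs. This is not the notebook's exact code; the shapes, dtype, and backend selection are illustrative assumptions. The public scaled_dot_product_attention API landed in PyTorch 2.0 (the torch.backends.cuda.sdp_kernel context manager used here was later superseded by torch.nn.attention.sdpa_kernel), and the Flash Attention backend needs a CUDA GPU with fp16/bf16 inputs.

```python
# Minimal sketch (not the notebook's code): naive attention vs. PyTorch's fused
# scaled_dot_product_attention. Assumes PyTorch 2.0+ and a CUDA GPU; shapes,
# dtype, and sequence length are illustrative only.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 10_000, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def naive_attention(q, k, v):
    # Materialises the full (seq_len x seq_len) score matrix: for these shapes
    # that is 8 x 10,000 x 10,000 fp16 values, roughly 1.6 GB, before softmax.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# The fused kernel never builds the full score matrix, so memory grows roughly
# linearly with sequence length and much longer inputs fit on a modest GPU.
# (On PyTorch 2.2+ you would use torch.nn.attention.sdpa_kernel instead.)
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 10000, 64])
```

The "Torch without Flash Attention" and "Torch with Flash Attention" chapters compare these two paths; the exact measurement code is in the linked notebook.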


Chapters (13)

0:00 Introduction
0:28 Scaled dot product in PyTorch
1:51 Google Colab environment
2:21 PyTorch version for Flash Attention
2:51 Input data
3:01 Hyperparameters and the architecture
4:17 A few important arguments to the model
4:55 Utility functions
5:39 Torch without Flash Attention
7:06 Torch with Flash Attention
7:49 Limitations of Flash Attention
8:59 Analysing the results
10:47 Conclusion