Paper Highlights: Grokking Structure with Transformers

Pister Labs · Advanced ·📄 Research Papers Explained ·2y ago
Reading "Grokking of Hierarchical Structure in Vanilla Transformers" by Murty et al. This paper challenges the idea that validation accuracy should determine when you stop training your transformer models. https://arxiv.org/abs/2305.18741