124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

📰 Dev.to · Ingero Team

TL;DR: PyTorch's DataLoader can be 50-124x slower than direct tensor indexing for in-memory GPU...

Published 1 Apr 2026