Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on

📰 Reddit r/deeplearning

submitted by /u/Kill_Streak308

Published 13 Apr 2026
