Fast Models Need Slow Developers — Sarah Chieng, Cerebras

Uploaded: 2026-05-22T18:00:06Z
Channel: AI Engineer

Codex Spark, a model Cerebras built with OpenAI, generates code at 1,200 tokens per second. The Sonnet and Opus families run at 40 to 60. At that 20x to 30x difference, a context window that used to take ten minutes to fill now takes 30 seconds, and every habit built around slow generation starts producing technical debt at a scale nobody has dealt with before. Sarah Chieng from Cerebras covers what the playbook looks like in this regime. Validation and linting at every step are now instant, so there is no excuse not to run them continuously. Generating 75 component variations across five sub-agents and cherry-picking the best one becomes practical where it was not before. And when context burns in 30 seconds, a four-file external memory system (agents, plan, progress, verify) is what lets each new session pick up where the last one left off instead of starting from scratch.

Speaker info:
- https://x.com/sarahchieng
- https://www.linkedin.com/in/sarah-chieng-888595139/
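
The throughput figures in the description can be checked with quick arithmetic. The sketch below is illustrative only: the 36,000-token budget is not stated in the talk description, it is simply what 1,200 tokens per second for 30 seconds implies, and the function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the description.
# Assumption: the "context budget" is whatever 1,200 tok/s fills in 30 s.

def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate `tokens` at a steady rate."""
    return tokens / tokens_per_second

FAST_TPS = 1200          # Codex Spark rate quoted in the description
SLOW_TPS = (40, 60)      # Sonnet/Opus range quoted in the description

# Implied budget (an inference, not a number from the source).
implied_budget = FAST_TPS * 30

for tps in (*SLOW_TPS, FAST_TPS):
    minutes = seconds_to_generate(implied_budget, tps) / 60
    speedup = FAST_TPS / tps
    print(f"{tps:>5} tok/s: {minutes:5.1f} min to fill, fast model is {speedup:.0f}x")
```

At 60 tokens per second the same budget takes ten minutes, which matches the "ten minutes to 30 seconds" claim; at 40 tokens per second it takes fifteen.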