Share your speculative settings for llama.cpp and Gemma4

📰 Reddit r/LocalLLaMA

I had totally missed the boat on speculative decoding. Today, while generating some frontend code again, I found myself staring at some rather monotonous JavaScript. I decided to give llama.cpp's speculative decoding settings a go and was pleasantly surprised to see a 15-30% generation speedup for this exact use case. The code was an arcade game on canvas (lots of simple for loops and if statements for boundary checks and simple game
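Why repetitive code benefits can be sketched with the standard back-of-the-envelope model of speculative decoding (Leviathan et al., 2023). A cheap drafter proposes γ tokens and the main model verifies them in a single pass. If each drafted token is accepted independently with probability α, the expected number of tokens produced per verification pass is (1 - α^(γ+1)) / (1 - α). The Python sketch just evaluates that formula. The function name and the α/γ values are illustrative assumptions. They are not settings from this post and not measured llama.cpp numbers.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model verification pass.

    Simplified model (assumption): each of the gamma drafted tokens is
    accepted independently with probability alpha, and the target model
    always contributes one token of its own per pass.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the one the target model adds.
        return float(gamma + 1)
    # Geometric series 1 + alpha + ... + alpha**gamma in closed form.
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates: higher alpha ~ more predictable output.
    for alpha in (0.5, 0.7, 0.9):
        tokens = expected_tokens_per_step(alpha, gamma=8)
        print(f"alpha={alpha}: {tokens:.2f} tokens per verification pass")
```

Predictable code such as loops and boundary checks plausibly pushes α up, which is where the gain would come from. The observed wall-clock speedup is much smaller than these per-pass counts because running the drafter and discarding rejected tokens still costs time.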

Published 14 Apr 2026