From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google

AI Engineer · Intermediate · 🧠 Large Language Models · 10h ago
FunctionGemma ships at 270 million parameters and prefills at nearly 2,000 tokens per second on a Pixel 7. Out of the box, on a fixed set of app intents, it hits 46% accuracy. Fine-tuned on a synthetically generated dataset, it clears 90% on eight of ten functions. Cormac Brick covers the two options developers have for on-device AI: Gemini Nano via AICore for common tasks, and LiteRT-LM for custom models that ship inside your app. The session walks through a live skill harness built on Gemma 4, with a restaurant roulette demo running fully on-device, and Eloquent, a production transcription app built by chaining two models under a few hundred million parameters.

Speaker info: https://www.linkedin.com/in/cbrick/
