[Workshop] AI Engineering 201: Inference

AI Engineer · Beginner ·🧠 Large Language Models ·2y ago

Skills: LLM Engineering80%

Optional introductory course for AI Engineers, free for all Summit attendees. Advanced knowledge of AI Engineering, led by instructor Charles Frye of the massively popular Full Stack LLM Bootcamp. Part I: Running Inference What is the workload? Open vs Proprietary Models Execution End User Device Over a Network Serving Inference Timestamps 0:00:00 Intro & Overview 0:03:52 What is Inference? 0:10:16 Proprietary Models for Inference 0:21:22 Open Models for Inference 0:30:41 Will Open or Proprietary Models Win Long-Term? 0:36:19 Q&A on Models 0:44:12 Inference on End-User Devices 1:04:32 Inference-as-a-Service Providers 1:10:00 Cloud Inference and Serverless GPUs 1:17:46 Rack-and-Stack for Inference 1:20:12 Inference Arithmetic for GPUs 1:27:07 TPUs and Other Custom Silicon for Inference 1:36:11 Containerizing Inference and Inference Services

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from AI Engineer · AI Engineer · 17 of 60

← Previous Next →

AI Engineer Summit 2023 — DAY 1 Livestream

AI Engineer Summit 2023 — DAY 1 Livestream

AI Engineer Summit 2023 — DAY 2 Livestream

AI Engineer Summit 2023 — DAY 2 Livestream

Principles for Prompt Engineering - Karina Nguyen (Claude Instant @ Anthropic)

Principles for Prompt Engineering - Karina Nguyen (Claude Instant @ Anthropic)

Announcing the AI Engineer Network: Benjamin Dunphy

Announcing the AI Engineer Network: Benjamin Dunphy

The 1,000x AI Engineer: Swyx

The 1,000x AI Engineer: Swyx

Building AI For All: Amjad Masad & Michele Catasta

Building AI For All: Amjad Masad & Michele Catasta

The Age of the Agent: Flo Crivello

The Age of the Agent: Flo Crivello

See, Hear, Speak, Draw: Logan Kilpatrick & Simón Fishman

See, Hear, Speak, Draw: Logan Kilpatrick & Simón Fishman

Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase

Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase

Pydantic is all you need: Jason Liu

Pydantic is all you need: Jason Liu

Building Blocks for LLM Systems & Products: Eugene Yan

Building Blocks for LLM Systems & Products: Eugene Yan

The Intelligent Interface: Sam Whitmore & Jason Yuan of New Computer

The Intelligent Interface: Sam Whitmore & Jason Yuan of New Computer

Climbing the Ladder of Abstraction: Amelia Wattenberger

Climbing the Ladder of Abstraction: Amelia Wattenberger

Supabase Vector: The Postgres Vector database: Paul Copplestone

Supabase Vector: The Postgres Vector database: Paul Copplestone

[Workshop] AI Engineering 101

[Workshop] AI Engineering 101

The Hidden Life of Embeddings: Linus Lee

The Hidden Life of Embeddings: Linus Lee

[Workshop] AI Engineering 201: Inference

[Workshop] AI Engineering 201: Inference

The AI Pivot: With Chris White of Prefect & Bryan Bischof of Hex

The AI Pivot: With Chris White of Prefect & Bryan Bischof of Hex

The AI Evolution: Mario Rodriguez, GitHub

The AI Evolution: Mario Rodriguez, GitHub

Move Fast Break Nothing: Dedy Kredo

Move Fast Break Nothing: Dedy Kredo

AI Engineering 201: The Rest of the Owl

AI Engineering 201: The Rest of the Owl

Building Reactive AI Apps: Matt Welsh

Building Reactive AI Apps: Matt Welsh

Pragmatic AI with TypeChat: Daniel Rosenwasser

Pragmatic AI with TypeChat: Daniel Rosenwasser

Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan

Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan

Retrieval Augmented Generation in the Wild: Anton Troynikov

Retrieval Augmented Generation in the Wild: Anton Troynikov

Building Production-Ready RAG Applications: Jerry Liu

Building Production-Ready RAG Applications: Jerry Liu

120k players in a week: Lessons from the first viral CLIP app: Joseph Nelson

120k players in a week: Lessons from the first viral CLIP app: Joseph Nelson

The Weekend AI Engineer: Hassan El Mghari

The Weekend AI Engineer: Hassan El Mghari

Harnessing the Power of LLMs Locally: Mithun Hunsur

Harnessing the Power of LLMs Locally: Mithun Hunsur

Trust, but Verify: Shreya Rajpal

Trust, but Verify: Shreya Rajpal

Open Questions for AI Engineering: Simon Willison

Open Questions for AI Engineering: Simon Willison

Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD

Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD

GPT Web App Generator - 10,000 apps created in a month: Matija Sosic

GPT Web App Generator - 10,000 apps created in a month: Matija Sosic

Using AI to Build an Infinite Game: Jeff Schomay

Using AI to Build an Infinite Game: Jeff Schomay

How to Become an AI Engineer from a Fullstack Background - Reid Mayo

How to Become an AI Engineer from a Fullstack Background - Reid Mayo

The Code AI Maturity Model and What It Means For You: Ado Kukic

The Code AI Maturity Model and What It Means For You: Ado Kukic

AI Engineer World’s Fair 2024 - Keynotes & Multimodality track

AI Engineer World’s Fair 2024 - Keynotes & Multimodality track

From Text to Vision to Voice Exploring Multimodality with Open AI: Romain Huet

From Text to Vision to Voice Exploring Multimodality with Open AI: Romain Huet

The Making of Devin by Cognition AI: Scott Wu

The Making of Devin by Cognition AI: Scott Wu

The Future of Knowledge Assistants: Jerry Liu

The Future of Knowledge Assistants: Jerry Liu

Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney

Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney

Open Challenges for AI Engineering: Simon Willison

Open Challenges for AI Engineering: Simon Willison

Lessons From A Year Building With LLMs

Lessons From A Year Building With LLMs

From Software Developer to AI Engineer: Antje Barth

From Software Developer to AI Engineer: Antje Barth

Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner

Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner

Copilots Everywhere: Thomas Dohmke and Eugene Yan

Copilots Everywhere: Thomas Dohmke and Eugene Yan

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

Low Level Technicals of LLMs: Daniel Han

Low Level Technicals of LLMs: Daniel Han

Emergence Launch: AI Agents and the future enterprise: Dr. Satya Nitta

Emergence Launch: AI Agents and the future enterprise: Dr. Satya Nitta

How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou

How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou

What's new from Anthropic and what's next: Alex Albert

What's new from Anthropic and what's next: Alex Albert

Using agents to build an agent company: Joao Moura

Using agents to build an agent company: Joao Moura

Decoding the Decoder LLM without de code: Ishan Anand

Decoding the Decoder LLM without de code: Ishan Anand

Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner

Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner

Building with Anthropic Claude: Prompt Workshop with Zack Witten

Building with Anthropic Claude: Prompt Workshop with Zack Witten

Building Reliable Agentic Systems: Eno Reyes

Building Reliable Agentic Systems: Eno Reyes

10x Development: LLMs For the working Programmer - Manuel Odendahl

10x Development: LLMs For the working Programmer - Manuel Odendahl

Disrupting the $15 Trillion Construction Industry with Autonomous Agents: Dr. Sarah Buchner

Disrupting the $15 Trillion Construction Industry with Autonomous Agents: Dr. Sarah Buchner

Hypermode Launch: Kevin Van Gundy

Hypermode Launch: Kevin Van Gundy

Git push get an AI API: Ryan Fox-Tyler

Git push get an AI API: Ryan Fox-Tyler

More on: LLM Engineering

View skill →

Build an LLM and RAG-based Chat Application using AlloyDB and LangChain

How to Make an Asteroids Game Bot (LIVE)

How to Make an Asteroids Game Bot (LIVE)

Using Claude Code + Nano Banana Pro To Create a Dataset of Engineering Drawings

Using Claude Code + Nano Banana Pro To Create a Dataset of Engineering Drawings

Automata Learning Lab

Advanced AI and Machine Learning Techniques and Capstone

Advanced AI and Machine Learning Techniques and Capstone

AI Development with DeepSeek for Developers

AI Development with DeepSeek for Developers

I built the most expensive CPU ever! (Every instruction is a prompt)

I built the most expensive CPU ever! (Every instruction is a prompt)

Related AI Lessons

DeepSeek Just Dropped V4. Here's What the Benchmarks Actually Tell You.

DeepSeek-V4-Pro narrows the gap with frontier AI models, and here's what the benchmarks reveal

Dev.to · Om Shree

Article: Orchestrating Agentic and Multimodal AI Pipelines with Apache Camel

Learn to orchestrate agentic and multimodal AI pipelines using Apache Camel and LangChain4j for advanced AI applications

Context Engineering for Developers: A Practical Guide (2026)

Learn context engineering to improve LLM responses by providing relevant information, increasing productivity and accuracy

GPT-5.5 is here. So is DeepSeek V4. And honestly, I am tired of version numbers.

Learn how to keep up with the latest AI model releases without getting bogged down in version numbers, and why understanding the actual capabilities is more important than the version number

Chapters (13)

Intro & Overview

3:52 What is Inference?

10:16 Proprietary Models for Inference

21:22 Open Models for Inference

30:41 Will Open or Proprietary Models Win Long-Term?

36:19 Q&A on Models

44:12 Inference on End-User Devices

1:04:32 Inference-as-a-Service Providers

1:10:00 Cloud Inference and Serverless GPUs

1:17:46 Rack-and-Stack for Inference

1:20:12 Inference Arithmetic for GPUs

1:27:07 TPUs and Other Custom Silicon for Inference

1:36:11 Containerizing Inference and Inference Services

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)