Model Serving Systems: Containers, APIs & Scalability

Coursera · Beginner · 🏗️ Systems Design & Architecture · 3d ago
Docker and Model Serving: Deploy ML APIs with FastAPI and ONNX is designed for ML engineers, MLOps practitioners, and backend developers who want to take models from notebooks to production. You'll learn to build Docker containers for ML workloads, design scalable REST APIs with FastAPI, serialize models with ONNX and SavedModel, and deploy with zero-downtime strategies like blue-green and canary releases.

The first module covers Docker fundamentals, image optimization, multi-stage builds, secrets management, and Docker Compose for multi-container ML apps. The second module focuses on REST API design with FastAPI, model versioning, input validation with Pydantic, structured logging, and production-grade error handling. The third module teaches scaling strategies: horizontal scaling, async queues, load balancing, batch vs. real-time inference, and latency optimization for high-throughput serving. The final module covers model serialization formats (ONNX, pickle, SavedModel), blue-green and canary deployments, automated rollback, and disaster recovery.

By the end of this course, you will:
- Build and optimize Docker images for ML models using multi-stage builds and Compose
- Design scalable FastAPI endpoints with versioning, validation, and observability
- Scale ML inference with async queues, load balancing, and latency optimization
- Deploy models with ONNX serialization and zero-downtime blue-green rollbacks
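
To make the canary release and automated rollback topics in the description more concrete, here is a minimal sketch in plain Python. It is not taken from the course. The class name `CanaryRouter`, its fields, and the thresholds are all hypothetical, chosen only for illustration. It assumes each request carries a string ID, that a fixed percentage of IDs should reach the new model version, and that traffic should fall back to the stable version once the canary's observed error rate passes a limit.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CanaryRouter:
    """Hypothetical helper: sends a fixed share of requests to a canary version."""

    canary_percent: int = 10       # share of traffic sent to the canary, 0-100
    max_error_rate: float = 0.05   # roll back once the canary exceeds this rate
    min_requests: int = 20         # canary results required before judging it
    canary_requests: int = 0
    canary_errors: int = 0
    rolled_back: bool = False

    def route(self, request_id: str) -> str:
        """Return "canary" or "stable"; the same ID always gets the same answer."""
        if self.rolled_back:
            return "stable"
        # Hash the ID into one of 100 buckets so routing is stable per caller.
        digest = hashlib.sha256(request_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return "canary" if bucket < self.canary_percent else "stable"

    def record(self, version: str, ok: bool) -> None:
        """Record a request outcome and trigger rollback if the canary is failing."""
        if version != "canary" or self.rolled_back:
            return
        self.canary_requests += 1
        if not ok:
            self.canary_errors += 1
        enough_data = self.canary_requests >= self.min_requests
        if enough_data and self.canary_errors / self.canary_requests > self.max_error_rate:
            self.rolled_back = True


if __name__ == "__main__":
    router = CanaryRouter(canary_percent=10)
    version = router.route("user-123")
    router.record(version, ok=True)
    print(version, router.rolled_back)
```

Hashing the request ID, rather than choosing at random per request, keeps a given caller on one version for the duration of the rollout, which tends to make canary errors easier to attribute. In a real deployment this decision would more likely live in a load balancer or service mesh than in application code.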