Apache Iceberg: From Zero to Production Data Lakehouse

Free to audit on Coursera

Coursera · Advanced · 🏗️ Systems Design & Architecture · 1mo ago
This course is designed for data engineers, analytics engineers, data platform engineers, and data architects who work with data lakes and want to modernize their data infrastructure. It's also valuable for software engineers transitioning into data roles and technical leads evaluating Apache Iceberg for their data.

By the end of this course, you will be able to:
- Build and configure an Apache Iceberg lakehouse using catalogs, object storage, and query engines like Spark and Trino
- Design optimal table structures using hidden partitioning, sort orders, and column metrics to maximize query performance
- Migrate existing data from Hive tables, Parquet files, CSV, and databases into Iceberg using snapshot, migrate, and reserialization approaches
- Implement production workflows using Write-Audit-Publish for validation, branching for testing, and rollback for recovery
- Evolve table schemas and partition specifications without downtime or rewriting data
- Execute maintenance operations including data file compaction, metadata compaction, and snapshot expiration
- Configure write strategies (merge-on-read vs copy-on-write) and distribution modes for different workload requirements
- Manage concurrent operations and avoid conflicts in multi-writer scenarios

To be successful in this course, you should have:
- Working knowledge of SQL and relational database concepts (tables, schemas, queries)
- Basic understanding of data engineering concepts including ETL/ELT, data warehouses, and data lakes
- Familiarity with command-line interfaces and Docker for running the course environment
- Comfort reading and understanding code examples in Python/PySpark (code is provided; you don't need to write from scratch)
- Experience with Apache Spark or distributed computing is helpful but not required; core concepts are explained throughout the course

Apache Iceberg, Iceberg, Apache, and the Apache feather logo are either registered trademarks or trademarks of The Apache S