The 2021 Slack Outage - Detailed analysis of what happened to Slack on Jan 4th 2021
On Jan 4th 2021, Slack experienced a global outage that prevented customers from using the service for nearly 5 hours.
Slack has released the Root cause analysis incident report which I’m going to summarize in the first part of this video. After that Ill provide a lengthy deep dive of the incident so make sure to stick around for that.
If you are new here, I make backend engineering videos and also cover software news, so make sure to Like comment and subscribe if you would like to see more plus it really helps the channel, lets jump into it.
So This is an approximation of Slack’s architecture based on what was the described in the reports. Clients connects to load balancers, load balancers distribute requests to backend servers and backend servers finally make requests to database servers which is powered by mysql through vitess sharding. All of those are connected by routers in cross boundary network.
Around 6AM jan 4 , the cross network boundary routers setting between LB and backend and backend to DB started to drop packets.
This lead to the load balancers slowly marking backends as unhealthy and removing them from the fleet Which compounded the number of requests
The number of failed requests eventually triggered the provisioning service to start spinning an absurdly large number of backend servers
However the provisioning service couldn’t keep up with the huge demand and shortly started to time out for the same networking reasons and eventually ran out of maximum open file handles.
Eventually Slack’s cloud provider increased the networking capacity and backend servers went back to normal around 11 AM PST
This was a summary of the slack outage, Now set back, grab your favorite beverage and lets go through the detailed incident report!
0:00 Outage Summary
2:00 Detailed Analysis Starts
5:20 The Root Cause
30:00 Corrective Actions
Links
https://slack.engineering/slacks-outage-on-january-4th-2021/
https://status.slack.com/2021-01/9ecc1bc75347b
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from Hussein Nasser · Hussein Nasser · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Extending ArcObjects (IGeometry) - 01 - Getting Started
Hussein Nasser
Extending ArcObjects (IGeometry) - 02 - The Document, The Map and The Layers
Hussein Nasser
Channel Update - New Book, New Job, New Videos
Hussein Nasser
Learn Programming with VB.NET - 01 - Getting Started
Hussein Nasser
Learn Programming with VB.NET - 02 - Classes and Objects (Part 1)
Hussein Nasser
Learn Programming with VB.NET - 03 - Classes and Objects (Part 2)
Hussein Nasser
Learn Programming with VB.NET - 04 - User Interface
Hussein Nasser
Learn Programming with VB.NET - 05 - By Value v. By Reference
Hussein Nasser
Learn Programming with VB.NET - 06 - Variable size, 32 bit vs 64 bit
Hussein Nasser
Learn Programming with VB.NET - 07 - Conditional Statements
Hussein Nasser
Learn Programming with VB.NET - 08 - Inheritance
Hussein Nasser
Learn Programming with VB.NET - 09 - Strategy Design Pattern
Hussein Nasser
Learn Programming with VB.NET - 10 - How did I learn programming
Hussein Nasser
IGeometry 2016 Retrospective - Channel Update
Hussein Nasser
Javascript by Example - The Vook
Hussein Nasser
Vlog - Keep your servers close and your database closer
Hussein Nasser
Vlog - Client/Server Programming Languages
Hussein Nasser
Javascript By Example L1E01 - Getting Started
Hussein Nasser
Persistent Connections (Pros and Cons)
Hussein Nasser
Javascript By Example L1E02 - Building the Calculator Interface
Hussein Nasser
Happy new Year from IGeometry!
Hussein Nasser
Synchronous v. Asynchronous
Hussein Nasser
Javascript By Example L1E03 - Displaying the Digits on Calculator Screen
Hussein Nasser
Show Your Work. Blog, Vlog, Write, Create and Develop!
Hussein Nasser
Relational Database Atomicity Explained By Example
Hussein Nasser
Javascript By Example L1E04 - Operators, All Clear with Arrow Functions
Hussein Nasser
What Comes First, User Experience or Software Architecture?
Hussein Nasser
Javascript By Example L1E05 - Evaluate the Calculator Expressions with eval
Hussein Nasser
Fastest Way to Learn Programming Language or Technology
Hussein Nasser
Javascript By Example L1E06 - Fix Leading Zero Bug with Conditions
Hussein Nasser
Stateful vs Stateless Applications (Explained by Example)
Hussein Nasser
Javascript By Example L1E07 - Running our Calculator on the Mobile Phone
Hussein Nasser
Advice for New Software Engineers and Developers
Hussein Nasser
Why JSON is so Popular?
Hussein Nasser
Building Scalable Software - SLA, HS, VS
Hussein Nasser
Vlog (Istanbul) - Datacenter Proximity
Hussein Nasser
Should Software Engineers Learn Bleeding-Edge Technologies?
Hussein Nasser
Do Developers Build Bad User Interfaces/Experience?
Hussein Nasser
Learn By Doing.
Hussein Nasser
I Wrote Bad Front-End Code That Broke Chrome
Hussein Nasser
My Story
Hussein Nasser
Vlog - Horizontal vs Vertical Scaling
Hussein Nasser
Can User Experience Help Build Better Rest API?
Hussein Nasser
Reverse engineering Instagram in flight mode
Hussein Nasser
The Benefits of the 3-Tier Architecture (e.g. REST API)
Hussein Nasser
Stateless v. Stateful Architecture (Podcast)
Hussein Nasser
The evolution from virtual machines to containers
Hussein Nasser
Proxy vs. Reverse Proxy (Explained by Example)
Hussein Nasser
Canary Deployment (Explained by Example)
Hussein Nasser
No Excuses
Hussein Nasser
Synchronous vs Asynchronous Applications (Explained by Example)
Hussein Nasser
What is an Asynchronous service?
Hussein Nasser
Difference between Client Polling vs Server Push in Notifications
Hussein Nasser
Software vs. Hardware AdBlockers (Explained by Example)
Hussein Nasser
HTTP Caching with E-Tags - (Explained by Example)
Hussein Nasser
Simple Object Access Protocol Pros and Cons (Explained by Example)
Hussein Nasser
Nodejs Express "Hello, World"
Hussein Nasser
Reverse Engineering Instagram feed
Hussein Nasser
Popup Modal Dialog with Javascript and HTML
Hussein Nasser
MIME and Media Type sniffing explained and the type of attacks it leads to
Hussein Nasser
More on: Systems Design Basics
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
Gartner Just Found a 4x Gap Between AI Winners and Losers
Medium · AI
The year is 2043, and you are one of the world’s last original thinkers
Medium · LLM
Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to AI
The era of the Indian H-1B programmer is over. US graduates will take their place.
Medium · AI
Chapters (4)
Outage Summary
2:00
Detailed Analysis Starts
5:20
The Root Cause
30:00
Corrective Actions
🎓
Tutor Explanation
DeepCamp AI