Outlier detection and removal: z score, standard deviation | Feature engineering tutorial python # 3

codebasics · Beginner ·📐 ML Fundamentals ·5y ago
If we have a dataset that follows normal distribution than we can use 3 or more standard deviation to spot outliers in the dataset. Many times these are legitimate values and it really depends on the situation if you want to remove them or not. But removing outliers can significantly increase the statistical power of machine learning model hence it is recommended that you treat outliers before building a model. Z score indicates how many standard deviation away a given sample is. We are going to go through all this theory and write python code to remove outliers from heights dataset that I have taken it from kaggle. Link for kaggle dataset: https://www.kaggle.com/mustafaali96/weight-height Code & Exercise: https://github.com/codebasics/py/blob/master/ML/FeatureEngineering/2_outliers_z_score/2_outliers_z_score.ipynb CSV file for exercise: https://github.com/codebasics/py/tree/master/ML/FeatureEngineering/2_outliers_z_score/Exercise Topics 00:00 Introduction 00:20 Exploratory analysis on a kaggle dataset 01:14 Plot histogram and bell curve 06:30 Use 3 standard deviation to remove outliers 12:14 Use Z score to remove outliers 17:39 Exercise Do you want to learn technology from me? Check https://codebasics.io/ for my affordable video courses. Website: https://codebasics.io/ Facebook: https://www.facebook.com/codebasicshub Twitter: https://twitter.com/codebasicshub
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from codebasics · codebasics · 0 of 60

← Previous Next →
1 Python Tutorial - 1. Install python on windows
Python Tutorial - 1. Install python on windows
codebasics
2 Python Tutorial - 2. Variables
Python Tutorial - 2. Variables
codebasics
3 Python Tutorial - 3. Numbers
Python Tutorial - 3. Numbers
codebasics
4 Python Tutorial - 4. Strings
Python Tutorial - 4. Strings
codebasics
5 Python Tutorial - 5. Lists
Python Tutorial - 5. Lists
codebasics
6 Python Tutorial - 6. Install PyCharm on Windows
Python Tutorial - 6. Install PyCharm on Windows
codebasics
7 PyCharm Tutorial - 7. Debug python code using PyCharm
PyCharm Tutorial - 7. Debug python code using PyCharm
codebasics
8 Python Tutorial -  8. If Statement
Python Tutorial - 8. If Statement
codebasics
9 Python Tutorial - 9. For loop
Python Tutorial - 9. For loop
codebasics
10 Python Tutorial -  10. Functions
Python Tutorial - 10. Functions
codebasics
11 Python Tutorial - 11. Dictionaries and Tuples
Python Tutorial - 11. Dictionaries and Tuples
codebasics
12 Python Tutorial - 12. Modules
Python Tutorial - 12. Modules
codebasics
13 Python Tutorial - 13. Reading/Writing Files
Python Tutorial - 13. Reading/Writing Files
codebasics
14 How to install Julia on Windows
How to install Julia on Windows
codebasics
15 Python Tutorial - 14. Working With JSON
Python Tutorial - 14. Working With JSON
codebasics
16 Julia Tutorial - 1. Variables
Julia Tutorial - 1. Variables
codebasics
17 Julia Tutorial - 2. Numbers
Julia Tutorial - 2. Numbers
codebasics
18 Python Tutorial - 15. if __name__ == "__main__"
Python Tutorial - 15. if __name__ == "__main__"
codebasics
19 Julia Tutorial - Why Should I Learn Julia Programming Language
Julia Tutorial - Why Should I Learn Julia Programming Language
codebasics
20 Python Tutorial  - 16. Exception Handling
Python Tutorial - 16. Exception Handling
codebasics
21 Julia Tutorial - 3. Complex and Rational Numbers
Julia Tutorial - 3. Complex and Rational Numbers
codebasics
22 Julia Tutorial - 4. Strings
Julia Tutorial - 4. Strings
codebasics
23 Python Tutorial -  17. Class and Objects
Python Tutorial - 17. Class and Objects
codebasics
24 Julia Tutorial - 5. Functions
Julia Tutorial - 5. Functions
codebasics
25 Julia Tutorial - 6. If Statement and Ternary Operator
Julia Tutorial - 6. If Statement and Ternary Operator
codebasics
26 Julia Tutorial - 7. For While Loop
Julia Tutorial - 7. For While Loop
codebasics
27 Python Tutorial  - 18. Inheritance
Python Tutorial - 18. Inheritance
codebasics
28 Julia Tutorial - 8. begin and (;) Compound Expressions
Julia Tutorial - 8. begin and (;) Compound Expressions
codebasics
29 Python Tutorial - 12.1 - Install Python Module (using pip)
Python Tutorial - 12.1 - Install Python Module (using pip)
codebasics
30 Julia Tutorial - 9. Tasks (a.k.a. Generators or Coroutines)
Julia Tutorial - 9. Tasks (a.k.a. Generators or Coroutines)
codebasics
31 Julia Tutorial - 10. Exception Handling
Julia Tutorial - 10. Exception Handling
codebasics
32 Python Tutorial  - 19. Multiple Inheritance
Python Tutorial - 19. Multiple Inheritance
codebasics
33 Python Tutorial - 20. Raise Exception And Finally
Python Tutorial - 20. Raise Exception And Finally
codebasics
34 Python Tutorial - 21. Iterators
Python Tutorial - 21. Iterators
codebasics
35 Python Tutorial - 22. Generators
Python Tutorial - 22. Generators
codebasics
36 Python Tutorial - 23. List Set Dict Comprehensions
Python Tutorial - 23. List Set Dict Comprehensions
codebasics
37 Python Tutorial - 24. Sets and Frozen Sets
Python Tutorial - 24. Sets and Frozen Sets
codebasics
38 Python Tutorial - 25. Command line argument processing using argparse
Python Tutorial - 25. Command line argument processing using argparse
codebasics
39 Debugging Tips - What is bug and debugging?
Debugging Tips - What is bug and debugging?
codebasics
40 Debugging Tips - Conditional Breakpoint
Debugging Tips - Conditional Breakpoint
codebasics
41 Debugging Tips - Watches and Call Stack
Debugging Tips - Watches and Call Stack
codebasics
42 Python Tutorial - 26. Multithreading - Introduction
Python Tutorial - 26. Multithreading - Introduction
codebasics
43 Git Tutorial 3:  How To Install Git
Git Tutorial 3: How To Install Git
codebasics
44 Git Tutorial 1: What is git / What is version control system?
Git Tutorial 1: What is git / What is version control system?
codebasics
45 Git Tutorial 2 : What is Github? | github tutorial
Git Tutorial 2 : What is Github? | github tutorial
codebasics
46 Git Tutorial 4: Basic Commands: add, commit, push
Git Tutorial 4: Basic Commands: add, commit, push
codebasics
47 Git Tutorial 5: Undoing/Reverting/Resetting code changes
Git Tutorial 5: Undoing/Reverting/Resetting code changes
codebasics
48 Git Tutorial 6: Branches (Create, Merge, Delete a branch)
Git Tutorial 6: Branches (Create, Merge, Delete a branch)
codebasics
49 Git Github Tutorial 10: What is Pull Request?
Git Github Tutorial 10: What is Pull Request?
codebasics
50 Git Tutorial 7: What is HEAD?
Git Tutorial 7: What is HEAD?
codebasics
51 Git Tutorial 9: Diff and Merge using meld
Git Tutorial 9: Diff and Merge using meld
codebasics
52 Difference between Multiprocessing and Multithreading
Difference between Multiprocessing and Multithreading
codebasics
53 Python Tutorial - 27. Multiprocessing Introduction
Python Tutorial - 27. Multiprocessing Introduction
codebasics
54 Python Tutorial - 28. Sharing Data Between Processes Using Array and Value
Python Tutorial - 28. Sharing Data Between Processes Using Array and Value
codebasics
55 Git Tutorial 8 - .gitignore file
Git Tutorial 8 - .gitignore file
codebasics
56 Python Tutorial - 29. Sharing Data Between Processes Using Multiprocessing Queue
Python Tutorial - 29. Sharing Data Between Processes Using Multiprocessing Queue
codebasics
57 Python Tutorial - 30. Multiprocessing Lock
Python Tutorial - 30. Multiprocessing Lock
codebasics
58 Python Tutorial - 31. Multiprocessing Pool (Map Reduce)
Python Tutorial - 31. Multiprocessing Pool (Map Reduce)
codebasics
59 What is code?
What is code?
codebasics
60 Python unit testing - pytest introduction
Python unit testing - pytest introduction
codebasics

Related AI Lessons

(EDA Part-5) Multivariate Analysis — Wrapping Up EDA and What Comes Next
Learn to perform multivariate analysis as the final step in exploratory data analysis (EDA) for machine learning, and discover what comes next in the data science workflow.
Dev.to AI
What Building a Flight Delay Prediction System Taught Me About Real-World Data Science
Learn how building a flight delay prediction system taught valuable lessons about real-world data science, including handling large datasets and model deployment
Medium · Machine Learning
THE EVALUATION PROBLEM
Learn to measure AI system performance to build trust in machine learning models
Medium · Machine Learning
Probability & Statistics — Deep Dive + Problem: Connected Components Labeling
Learn Probability & Statistics fundamentals and apply them to solve the Connected Components Labeling problem
Dev.to AI

Chapters (6)

Introduction
0:20 Exploratory analysis on a kaggle dataset
1:14 Plot histogram and bell curve
6:30 Use 3 standard deviation to remove outliers
12:14 Use Z score to remove outliers
17:39 Exercise
Up next
AI Engineer Roadmap 2026 🤖🔥 How to Become an AI Engineer Step by Step #AIEngineer #AIRoadmap #AI
CydexCode
Watch →