Data Pre-processing in R: Handling Missing Data

Data Professor · Beginner · 🛠️ AI Tools & Apps · 6y ago
In this video, I show how to handle missing data in your own data science project. It is the first in a multi-part series on data pre-processing in R. The idea for this video was suggested in a comment by Marco Festugato.

📎 DATA: https://raw.githubusercontent.com/dataprofessor/data/master/dhfr.csv
📎 CODE: https://github.com/dataprofessor/code/blob/master/dhfr/dhfr-handling-missing-data.R
📎 SLIDES: https://github.com/dataprofessor/slides/blob/master/Handling-missing-data.pdf

Chapters (17)

0:33 First part in Data pre-processing series
1:11 DHFR dataset
2:41 Outline of this episode
4:08 Open up RStudio or RStudio.cloud
4:15 Let's start
4:21 1. Load in the dataset
4:59 2. Check for missing data
5:48 3. Let's make the data dirty!
5:58 The custom function na.gen()
8:38 4. Check for missing data
9:08 What does is.na(dhfr) look like?
10:18 Let's look at rows containing NA
11:29 Let's find the NA in the data
12:45 5. Handling the missing data
12:54 5.1 Simply delete data samples containing NA
13:30 5.2 Perform imputation
16:59 Preview of next episode of this series (on Data pre-processing)
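
The steps in the chapter list can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the code from the video: the toy data frame stands in for the DHFR dataset, `inject_na()` is a hypothetical stand-in for the video's custom `na.gen()` function (whose implementation is not shown here), and median imputation is just one simple choice of imputation strategy.

```r
# Toy data frame standing in for the DHFR dataset (values are made up).
set.seed(123)
df <- data.frame(
  Y  = factor(c("active", "inactive", "active",
                "inactive", "active", "inactive")),
  x1 = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.8),
  x2 = c(10, 20, 30, 40, 50, 60)
)

# Hypothetical helper (NOT the video's na.gen()): set n randomly chosen,
# distinct cells of the numeric columns to NA.
inject_na <- function(data, n) {
  num_cols <- unname(which(sapply(data, is.numeric)))
  cells <- expand.grid(row = seq_len(nrow(data)), col = num_cols)
  picked <- cells[sample(nrow(cells), n), ]
  for (i in seq_len(n)) {
    data[picked$row[i], picked$col[i]] <- NA
  }
  data
}

# Make the data dirty.
dirty <- inject_na(df, n = 4)

# Check for missing data.
total_na     <- sum(is.na(dirty))                   # number of NA cells
na_per_col   <- colSums(is.na(dirty))               # NA count per column
rows_with_na <- dirty[!complete.cases(dirty), ]     # rows containing NA
na_positions <- which(is.na(dirty), arr.ind = TRUE) # row/column of each NA

# Option 1: delete every sample (row) that contains an NA.
clean_deleted <- na.omit(dirty)

# Option 2: impute, here by replacing each NA with its column median.
impute_median <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      missing <- is.na(data[[col]])
      data[[col]][missing] <- median(data[[col]], na.rm = TRUE)
    }
  }
  data
}
clean_imputed <- impute_median(dirty)
```

Deleting rows is the simplest option but discards samples, which matters for small datasets. Imputation keeps every row at the cost of inserting estimated values. Replacing `median()` with `mean()` in the helper gives mean imputation instead.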