My top 50 scikit-learn tips

Data School · Beginner · 🧠 Large Language Models · 3y ago
If you already know the basics of scikit-learn, but you want to be more efficient and get up-to-date with the latest features, then THIS is the video for you.

My name is Kevin Markham, and I've been teaching Machine Learning in Python with scikit-learn for more than 8 years. Over the next 3 hours, I'm going to share with you my top 50 scikit-learn tips. Each tip ranges from 2 to 8 minutes, and you can use the timestamp links below to skip along if you're already familiar with a particular tip.

👩‍💻 Code: https://github.com/justmarkham/scikit-learn-tips
🤖 Learn ML from me: https://courses.dataschool.io/ml-courses
💌 Weekly Data Science tips: https://tuesday.tips/

50 TIPS:
0:00 - Introduction
1:03 - 1. Transform data with ColumnTransformer
4:19 - 2. Seven ways to select columns
8:18 - 3. "fit" vs "transform"
10:53 - 4. Don't use "fit" on new data!
15:05 - 5. Don't use pandas for preprocessing!
19:00 - 6. Encode categorical features
24:07 - 7. Handle new categories in testing data
27:16 - 8. Chain steps with Pipeline
30:19 - 9. Encode "missingness" as a feature
33:12 - 10. Why set a random state?
35:40 - 11. Better ways to impute missing values
41:22 - 12. Pipeline vs make_pipeline
44:08 - 13. Inspect a Pipeline
47:03 - 14. Handle missing values automatically
49:47 - 15. Don't drop the first categorical level
54:15 - 16. Tune a Pipeline
1:01:09 - 17. Randomized search vs grid search
1:05:42 - 18. Examine grid search results
1:08:10 - 19. Logistic regression tuning parameters
1:12:41 - 20. Plot a confusion matrix
1:15:37 - 21. Plot multiple ROC curves
1:17:21 - 22. Use the correct Pipeline methods
1:18:59 - 23. Access model coefficients
1:20:11 - 24. Visualize a decision tree
1:23:57 - 25. Improve a decision tree by pruning it
1:25:23 - 26. Use stratified sampling when splitting data
1:29:40 - 27. Impute missing values for categoricals
1:32:10 - 28. Save a model or Pipeline
1:33:47 - 29. Add multiple text columns to a model
1:35:35 - 30. More ways to inspect a Pip
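As a rough illustration of tips 3 and 4 ("fit" vs "transform", and not using "fit" on new data), here is a minimal pure-Python sketch. It is not code from the video or from scikit-learn: `ToyScaler` is a hypothetical stand-in written for this example, and the sample numbers are made up. It only mirrors the general pattern that scikit-learn transformers such as `StandardScaler` expose, where `fit` learns parameters from training data and `transform` applies them.

```python
from statistics import mean, pstdev


class ToyScaler:
    """Hypothetical stand-in for a scikit-learn transformer (not library code)."""

    def fit(self, values):
        # "fit" learns parameters (mean and population std) from the given data.
        self.mean_ = mean(values)
        self.scale_ = pstdev(values) or 1.0  # fall back to 1.0 for constant data
        return self

    def transform(self, values):
        # "transform" applies the parameters that were already learned.
        return [(v - self.mean_) / self.scale_ for v in values]


train = [10.0, 20.0, 30.0]  # made-up training values
new = [20.0, 40.0]          # made-up values seen after training

# Recommended pattern: fit once on training data, then only transform new data.
scaler = ToyScaler().fit(train)
train_scaled = scaler.transform(train)
new_scaled = scaler.transform(new)

# Anti-pattern: fitting again on the new data learns different parameters,
# so the same raw value no longer maps to the same scaled value.
refit_scaled = ToyScaler().fit(new).transform(new)

print("transform only:", new_scaled)
print("refit on new data:", refit_scaled)
```

In this sketch the raw value 20.0 scales to 0.0 when the training parameters are reused but to -1.0 after refitting, which is the kind of inconsistency the tip warns about.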