Web scraping in Python (Part 4): Exporting a CSV with pandas
Skills:
ML Pipelines70%
This is part 4 of an introductory web scraping tutorial. In this video, we'll use Python's pandas library to apply a tabular data structure to our scraped dataset and then export it to a CSV file. I'll end the video with advice and resources for getting better at web scraping.
Watch the 4-video series: https://www.youtube.com/playlist?list=PL5-da3qGB5IDbOi0g5WFh1YPDNzXw4LNL
== RESOURCES ==
Download the Jupyter notebook: https://github.com/justmarkham/trump-lies
New York Times article: https://www.nytimes.com/interactive/2017/06/23/opinion/trumps-lies.html
Beautiful Soup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
pandas installation: http://pandas.pydata.org/pandas-docs/stable/install.html
== DATA SCHOOL VIDEOS ==
Machine learning with scikit-learn: https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1
Data analysis with pandas: https://www.youtube.com/watch?v=yzIMircGU5I&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=1
Version control with Git: https://www.youtube.com/watch?v=xKVlZ3wFVKA&list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD&index=1
== SUBSCRIBE FOR MORE VIDEOS ==
https://www.youtube.com/user/dataschool?sub_confirmation=1
== JOIN THE DATA SCHOOL COMMUNITY ==
Newsletter: http://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
Patreon: https://www.patreon.com/dataschool
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from Data School · Data School · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Setting up Git and GitHub
Data School
Navigating a GitHub Repository - Part 1
Data School
Forking a GitHub Repository
Data School
Creating a New GitHub Repository
Data School
Copying a GitHub Repository to Your Local Computer
Data School
Committing Changes in Git and Pushing to a GitHub Repository
Data School
Syncing Your GitHub Fork
Data School
Allstate Purchase Prediction Challenge on Kaggle
Data School
Troubleshooting: Updates Rejected When Pushing to GitHub
Data School
Hands-on dplyr tutorial for faster data manipulation in R
Data School
ROC Curves and Area Under the Curve (AUC) Explained
Data School
Going deeper with dplyr: New features in 0.3 and 0.4 (tutorial)
Data School
What is machine learning, and how does it work?
Data School
Setting up Python for machine learning: scikit-learn and Jupyter Notebook
Data School
Getting started in scikit-learn with the famous iris dataset
Data School
Training a machine learning model with scikit-learn
Data School
Comparing machine learning models in scikit-learn
Data School
Data science in Python: pandas, seaborn, scikit-learn
Data School
Selecting the best model in scikit-learn using cross-validation
Data School
How to find the best model parameters in scikit-learn
Data School
How to evaluate a classifier in scikit-learn
Data School
What is pandas? (Introduction to the Q&A series)
Data School
How do I read a tabular data file into pandas?
Data School
How do I select a pandas Series from a DataFrame?
Data School
Why do some pandas commands end with parentheses (and others don't)?
Data School
How do I rename columns in a pandas DataFrame?
Data School
How do I remove columns from a pandas DataFrame?
Data School
How do I sort a pandas DataFrame or a Series?
Data School
How do I filter rows of a pandas DataFrame by column value?
Data School
How do I apply multiple filter criteria to a pandas DataFrame?
Data School
Your pandas questions answered!
Data School
How do I use the "axis" parameter in pandas?
Data School
How do I use string methods in pandas?
Data School
How do I change the data type of a pandas Series?
Data School
When should I use a "groupby" in pandas?
Data School
How do I explore a pandas Series?
Data School
How do I handle missing values in pandas?
Data School
What do I need to know about the pandas index? (Part 1)
Data School
What do I need to know about the pandas index? (Part 2)
Data School
How do I select multiple rows and columns from a pandas DataFrame?
Data School
Machine Learning with Text in scikit-learn (PyCon 2016)
Data School
When should I use the "inplace" parameter in pandas?
Data School
How do I make my pandas DataFrame smaller and faster?
Data School
How do I use pandas with scikit-learn to create Kaggle submissions?
Data School
More of your pandas questions answered!
Data School
How do I create dummy variables in pandas?
Data School
How do I work with dates and times in pandas?
Data School
How do I find and remove duplicate rows in pandas?
Data School
How do I avoid a SettingWithCopyWarning in pandas?
Data School
How do I change display options in pandas?
Data School
How do I create a pandas DataFrame from another object?
Data School
How do I apply a function to a pandas Series or DataFrame?
Data School
Getting started with machine learning in Python (webcast)
Data School
Q&A about Machine Learning with Text (online course)
Data School
Your pandas questions answered! (webcast)
Data School
Machine Learning with Text in scikit-learn (PyData DC 2016)
Data School
Write Pythonic Code for Better Data Science (webcast)
Data School
Web scraping in Python (Part 1): Getting started
Data School
Web scraping in Python (Part 2): Parsing HTML with Beautiful Soup
Data School
Web scraping in Python (Part 3): Building a dataset
Data School
More on: ML Pipelines
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
Magnific vs Topaz vs Krea: Which AI Upscaler Actually Works
Dev.to AI
Forget productivity: Here are 5 strategic shifts that drive real AI value
ZDNet
This AI Tool Feels Illegal to Use (But It’s Not):
Medium · AI
Learning a New Skill Is Hard - Even When It’s Something You Were Born to Do
Medium · Startup
🎓
Tutor Explanation
DeepCamp AI