Web Scraping in Python: Tools, Techniques, and Legality | Real Python Podcast #12
Do you want to get started with web scraping using Python? Are you concerned about the potential legal implications? Which tools do you need, and what are some of the best practices? This week on the show, Kimberly Fessel discusses her excellent tutorial, created for PyCon 2020 Online, titled “It’s Officially Legal so Let’s Scrape the Web.”
We discuss getting started with web scraping and cover tools and techniques. Kimberly gives advice on finding elements inside the HTML and on techniques for cleaning your data. She also notes a recent change to the legal landscape regarding scraping the web.
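To make the two ideas above concrete (finding elements in the HTML, then cleaning what you pulled out), here is a minimal sketch. It is not taken from the episode or from Kimberly's tutorial: it uses only Python's standard-library `html.parser`, and the sample markup, the `price` class name, and the names `ClassTextCollector`, `clean_price`, and `extract_prices` are all assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a page you inspected in the browser.
SAMPLE_HTML = """
<ul>
  <li><span class="title">Widget</span> <span class="price"> $1,299.00 </span></li>
  <li><span class="title">Gadget</span> <span class="price">$15.50</span></li>
</ul>
"""


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element that carries a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._tag = None    # tag name of the element currently being captured
        self._depth = 0     # nesting depth of same-named tags inside it
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def clean_price(raw):
    """Turn a scraped string such as ' $1,299.00 ' into a float."""
    return float(raw.strip().lstrip("$").replace(",", ""))


def extract_prices(html):
    """Find every element with class 'price' and return cleaned numbers."""
    parser = ClassTextCollector("price")
    parser.feed(html)
    parser.close()
    return [clean_price(text) for text in parser.results]
```

Calling `extract_prices(SAMPLE_HTML)` returns a list of floats with the whitespace, dollar signs, and thousands separators removed. A dedicated parsing library would usually be more convenient on real pages; the standard-library parser is used here only so the sketch runs without extra installs.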
Kimberly is a Senior Data Scientist at Metis Data Science Bootcamp in New York City. She holds a Ph.D. in applied mathematics. We talk about her switch from academia to data science, and discuss her passion for data storytelling and visualizations.
Topics:
00:00:00 – Introduction
00:01:31 – Kimberly’s background and Metis Data Science Bootcamp
00:02:19 – NLP and work in advertising
00:03:27 – Changes in the legality of web scraping
00:06:12 – What are good projects for web scraping?
00:06:56 – Tools to start web scraping
00:07:51 – How to find the elements you want?
00:09:00 – How much HTML should you know?
00:10:49 – Inspecting elements in the browser
00:14:30 – What are good sites to practice on?
00:16:20 – Pausing between requests
00:19:02 – Saving as you go
00:20:54 – Real Python Video Course Spotlight
00:21:55 – Navigating the DOM
00:23:10 – Data cleaning and formatting
00:28:26 – Dynamic sites and Selenium
00:32:16 – Scrapy
00:33:55 – PyOhio 2020
00:35:40 – Transition out of academia
00:38:40 – What are you excited about in the world of Python?
00:41:05 – What do you want to learn next in Python?
00:48:00 – What is a less known Python tip or trick?
00:49:17 – Thanks and Goodbye
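Two of the topics above, pausing between requests and saving as you go, fit together in one small loop. The sketch below is an illustration under assumptions, not code from the episode: `scrape_politely` and `fake_fetch` are hypothetical names, the one-second default delay is an arbitrary choice, and the page download is passed in as a function so the example runs offline.

```python
import csv
import time


def fake_fetch(url):
    """Stand-in for a real HTTP call, so this sketch needs no network."""
    return f"<html><body>{url}</body></html>"


def scrape_politely(urls, fetch, out_path, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each URL in turn and append one CSV row per page right away.

    Returns the number of pages that were saved.
    """
    saved = 0
    with open(out_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for index, url in enumerate(urls):
            if index:
                # Pause between requests (not before the first one) so the
                # target site is not hit in a rapid burst.
                sleep(delay_seconds)
            try:
                body = fetch(url)
            except Exception as error:
                # One bad page should not cost the rows already written.
                print(f"skipping {url}: {error}")
                continue
            writer.writerow([url, len(body)])
            # Flush after every row: if the run stops partway through,
            # everything scraped so far is already in the file.
            handle.flush()
            saved += 1
    return saved
```

For example, `scrape_politely(["page-1", "page-2"], fake_fetch, "pages.csv")` writes two rows and waits once between them. Swapping `fake_fetch` for a function that performs a real download leaves the pausing and saving logic unchanged.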
👉Links from the show: https://realpython.com/podcasts/rpp/12/