Talks # 14: Martin Henze; Knowledge is Power: Understanding your Data through EDA and Visualisations
Title:
Knowledge is Power: Understanding your Data through EDA and Visualisations
Abstract: We will discuss the power of exploratory data analysis (EDA) to gain a deeper understanding of your data sets and machine learning problems. In particular, data visualisation techniques can quickly reveal key characteristics, promising features, and important caveats. Beyond data exploration, visuals are crucial for interpreting and communicating your findings on a variety of levels. Kaggle is a treasure trove for EDA tools and techniques - as well as for masterful examples of how to structure your ana…
Uploads from Abhishek Thakur
AI Lesson Summary
The video explains why exploratory data analysis (EDA) and data visualization matter for understanding data and building machine learning models, drawing on examples from Kaggle notebooks and various datasets. It stresses questioning assumptions, paying close attention to the data, and watching for bias in training data. It also covers visualization techniques, including heat maps, pie charts, and circular plots.
Key Takeaways
- Do exploratory data analysis (EDA) to understand your data and avoid "garbage in, garbage out"
- Plot your data
- Apply a log transform to make skewed distributions easier to inspect and model
- Communicate findings to stakeholders
- Be aware of bias in training data
- Subsample data
- Perform EDA on subsamples
- Build models on subsamples
- Create custom visualizations using libraries like matplotlib or seaborn
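
Two of the takeaways above, subsampling and the log transform, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not code from the talk. The log-normal data, the 5% sample size, and the `skewness` helper are all assumptions made for the example.

```python
import math
import random
import statistics


def skewness(values):
    """Biased sample skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed feature (log-normal), standing in for real data.
full_data = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]

# Subsample first: explore 5% of the rows instead of the full set.
subsample = rng.sample(full_data, k=5_000)

# log1p(x) = log(1 + x) is defined at x = 0, unlike a plain log.
logged = [math.log1p(v) for v in subsample]

raw_skew = skewness(subsample)
log_skew = skewness(logged)
print(f"skewness before log transform: {raw_skew:.2f}")
print(f"skewness after log transform:  {log_skew:.2f}")
```

On right-skewed data like this, the skewness after the transform should be noticeably closer to zero. The same check can be repeated on a few different subsamples to see whether the conclusion is stable before running it on the full data.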
💡 EDA and visualizations are crucial for understanding data and building machine learning models. They help identify data issues such as bias and outliers, and can improve model performance.
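
As one possible way to make "identify outliers" concrete, the sketch below flags values with the common 1.5 × IQR rule. The summary does not say which method the talk uses, so the rule, the `iqr_outliers` helper, and the sample measurements are all assumptions for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one suspicious entry.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 55.0]
flagged = iqr_outliers(measurements)
print(flagged)
```

A flagged value is a prompt to look closer, not proof of an error. Plotting the data alongside a check like this helps decide whether the point is a mistake or a genuine extreme.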